Datasets can be automatically generated for schemas for use in testing queries for grading. The datasets are built with random data and foreign keys are respected.
Data Generation Use
Data generation does not function properly on schemas with “-“ in the table or schema names.
Data Generation Implementation
Data generation is implemented through the Python faker library, which generates SQL INSERT
statements to load generated data into a table. Foreign keys and uniqueness constraints are detected to specify data generation ordering and ensure the generated data adheres to the schema requirements.
For efficiency during grading, a single container is created and the data inserted before a snapshot is taken to preserve the completed environment state. The snapshot is used as a container image for all student submissions to be graded, eliminating the need to run the large insert statements on additional database instances.
Data Types Observed
The following data types are observed for data generation:
- string, nvarchar, varchar, nchar, char, text
- uniqueidentifier
- int, tinyint, smallint, bigint
- float
- decimal, numeric
- money, currency
- binary
- bit
- date, time, datetime, datetime2, datetimeoffset
Data types not listed may generate a blank column.
Data Generation Limitations
- Data generation does not function properly on schemas with “-“ in the table or schema names.
- Certain data types will not generate data, including XML, array