Learning Intentions
- Understand what a data dictionary is and why databases need one
- Identify the four types of validation: presence check, restricted choice, field length, range
- Create a complete data dictionary with all required fields and validation rules
- Explain how validation enforces data quality, and how referential integrity (a separate design concept) protects linked tables
- Use a query design table to plan a SELECT query before writing SQL
Success Criteria
- You can name all four validation types with a real-world example for each
- You can create a complete data dictionary for a one or two-entity database
- You can identify which validation type to apply to each attribute
- You can explain why validation prevents errors and protects data integrity
- You can complete a 4-row query design table (Field(s), Table(s), Search criteria, Sort order) for single and two-table queries
1. What is a data dictionary? TYPE 1
2. Which validation type ensures a field is not left blank? TYPE 1
3. A field should only accept the values "Bronze", "Silver", or "Gold". Which validation type applies? TYPE 1
Key vocabulary
Data Dictionary and Validation
What is a Data Dictionary?
A data dictionary is a document (often a table) that describes every aspect of a database's structure. It answers: What tables exist? What columns does each table have? What type of data does each column hold? What rules prevent bad data from being entered?
A data dictionary is created during the Design phase and updated as the database evolves. It serves as:
- Documentation: Future developers understand the database without reading code.
- Specification: Database developers know exactly what to build.
- Quality assurance: Testers know what validation rules to test.
- Audit trail: If data looks wrong, the data dictionary shows what rules should have prevented it.
Components of a Data Dictionary
A complete data dictionary includes columns for:
- Entity: The table name (e.g., Student, Course, Enrollment).
- Attribute: The field/column name (e.g., StudentID, FirstName, DateOfBirth).
- Key Type: Is this a Primary Key (PK), Foreign Key (FK), or neither?
- Data Type: Text, Integer, Decimal, Date, Boolean, etc.
- Size: For text, max length (e.g., VARCHAR(50)). For numbers, precision (Integer 0–32,767).
- Validation Rule: What constraints apply? (e.g., "Presence check," "Range 0–100," "Restricted to A–F").
The Four Validation Types
1. Presence Check
A presence check ensures a field is not left blank. If a field is mandatory, the database rejects any record where that field is empty.
Example: StudentID is a primary key, so it must have a value. If someone tries to add a student without an ID, the database rejects the entry.
- Rule: StudentID must not be NULL
- Error: "Student ID is required"
2. Restricted Choice
A restricted choice validation limits entries to a predefined set of allowed values. The field only accepts values from a list.
Example: A Grade field should only accept A, B, C, D, E, or F. Entering "X" is rejected.
- Rule: Grade must be one of {A, B, C, D, E, F}
- Error: "Grade must be A, B, C, D, E, or F"
3. Field Length Validation
Field length validation limits how many characters can be entered into a text field. This prevents data that's too long for the field.
Example: A FirstName field is defined as VARCHAR(30), meaning up to 30 characters. A 9-character name is accepted. A 50-character name is rejected.
- Rule: FirstName must be ≤ 30 characters
- Error: "First name cannot exceed 30 characters"
4. Range Validation
Range validation ensures numeric or date values fall within acceptable limits. Test scores must be 0–100, dates must be between 2020 and 2030, ages must be 0–120.
Example: A TestScore field should only accept 0–100. Entering –5 or 150 is rejected.
- Rule: TestScore must be ≥ 0 AND ≤ 100
- Error: "Test score must be between 0 and 100"
Foreign keys are not a validation type — they are a separate design concept called referential integrity. The database enforces that every foreign key value must match an existing primary key in the linked table. In the data dictionary, the Validation column for a foreign key field reads: "Existing [fieldName] from [TableName] table" — for example, "Existing teamName from Team table". Referential integrity is set up using a FOREIGN KEY constraint when the table is created, not through one of the four validation types above.
Why Validation Matters
Validation prevents garbage-in-garbage-out (GIGO). Without validation:
- Users might leave required fields blank, creating incomplete records.
- Typos might be entered (e.g., Grade="X" instead of "A").
- Out-of-range values might be stored (e.g., Age=–5).
- Orphaned records might exist (e.g., enrollment for a deleted student).
- Reports and analysis become unreliable because data is inconsistent.
Validation catches errors before they're stored, ensuring data quality. This is much cheaper than fixing bad data later.
Worked examples
Scenario: Design a data dictionary for a library that tracks books, borrowers, and loans.
Scenario: Design validation rules for a Grades table with GradeID, StudentID (FK), CourseID (FK), Mark, Grade, DateRecorded.
Scenario: A Booking table has CustomerID referencing Customer table. What happens without and with foreign key validation?
Design validation rules for a Hospital Appointment System with Patient, Doctor, and Appointment tables.
Attributes to validate: PatientID, FirstName, LastName, DateOfBirth, PhoneNumber (11 chars), DoctorID, Speciality, AppointmentDate, AppointmentTime
Answer:
- PatientID: Presence check, Primary key, Auto-increment
- FirstName/LastName: Presence check, Field length ≤ 50
- DateOfBirth: Presence check, Range (0–120 years old)
- PhoneNumber: Presence check, Field length = 11
- DoctorID (FK): Foreign key → Doctor.DoctorID
- Speciality: Presence check, Restricted choice (Cardiology, Surgery, etc.)
- AppointmentDate: Presence check, Range ≥ today
- AppointmentTime: Presence check, Restricted choice (09:00, 10:00, 14:00, etc.)
Query Design
A query design is a planning table you fill in before writing SQL. Just as a data dictionary plans the database structure before it is built, a query design plans what a SELECT query should do. It uses a 4-row table that you complete top to bottom, then translate directly into SQL.
| Row | What to write |
|---|---|
| Field(s) | The column names you want to display (or * for all columns) |
| Table(s) | The table name(s) the fields come from |
| Search criteria | The conditions (the WHERE clause): comparisons, ranges, AND, OR, and the join condition for linked tables |
| Sort order | Which field to sort by and whether ASC or DESC. Leave blank if no sorting is required. |
The sort order row is left blank when the query has no sorting requirement — do not write "None" or "N/A", just leave the cell empty.
Request: Display the title and genre of all films with a rating of 8.0 or higher.
| Row | Value |
|---|---|
| Field(s) | title, genre |
| Table(s) | film |
| Search criteria | rating >= 8.0 |
| Sort order | (blank — no sort required) |
The SQL: SELECT title, genre FROM film WHERE rating >= 8.0;
Request: Display the title, year, and rating of all Drama films. Show newest films first.
| Row | Value |
|---|---|
| Field(s) | title, year, rating |
| Table(s) | film |
| Search criteria | genre = "Drama" |
| Sort order | year DESC |
The SQL: SELECT title, year, rating FROM film WHERE genre = "Drama" ORDER BY year DESC;
Request: Display the film title and the director's last name for all films released after 2010. Sort alphabetically by last name.
| Row | Value |
|---|---|
| Field(s) | title, lastName |
| Table(s) | film, director |
| Search criteria | film.director_id = director.director_id AND year > 2010 |
| Sort order | lastName ASC |
When two tables are involved, both names appear in the Table(s) row. The join condition (the link between primary key and foreign key) is listed as the first item in Search criteria. Additional filters follow with AND.
Data dictionary: Create a table with Entity, Attribute, Key Type, Data Type, Size, and Validation. The N5 spec covers one or two entities — you will not be asked to design for three or more tables. Include at least one of each of the four validation types (presence check, restricted choice, field length, range). Explain why each validation is necessary. Weak answers list attributes without validation; strong answers explain the rules.
Query design: In the 2024 paper (Q16b) pupils were asked to complete a query design table — worth 4 marks. Always fill in all four rows. The sort order row is left blank when no sorting is needed. When a join is involved, both table names go in the Table(s) row and the join condition (FK = PK) goes first in the Search criteria row.
1. Which validation type prevents a user from leaving a field blank? TYPE 1
2. A field is defined as VARCHAR(25). Which validation does this enforce? TYPE 1
3. What is referential integrity? TYPE 1
4. Which is NOT one of the four N5 validation types? TYPE 1
5. Describe what a data dictionary is and explain why databases need one. TYPE 2
Write 2-3 sentences.
6. A Mark field should only accept 0–100. What validation type applies and how would you specify it? TYPE 2
Write 1-2 sentences.
7. Explain what referential integrity means and how it prevents orphaned records in a database. TYPE 2
Write 2-3 sentences.
8. A library database stores information about books and borrowers. The Book table has attributes: ISBN (primary key), title, genre, publicationYear. The Borrower table has attributes: borrowerID (primary key), firstName, lastName, ISBN (foreign key). Design a data dictionary for both tables. For each attribute include: Entity, Attribute, Key Type, Data Type, Size, and Validation. Include at least one of each of the four validation types across the two tables. TYPE 3
Create a complete table covering all 8 attributes.
9. Complete the query design table for this request: "Display the title and certificate of all films rated 7.5 or higher, sorted alphabetically by title." TYPE 2
Draw or write the 4-row table with correct values for Field(s), Table(s), Search criteria, and Sort order.
| Row | Value |
|---|---|
| Field(s) | title, certificate |
| Table(s) | film |
| Search criteria | rating >= 7.5 |
| Sort order | title ASC |
10. Complete the query design table for this request: "Display the film title and the director's last name for PG films only. Sort by last name A to Z." (This query uses both the film and director tables.) TYPE 2
Note: in the films database, the foreign key is film.director_id and the primary key is director.director_id.
| Row | Value |
|---|---|
| Field(s) | title, lastName |
| Table(s) | film, director |
| Search criteria | film.director_id = director.director_id AND certificate = "PG" |
| Sort order | lastName ASC |
Lesson duration: 50–60 minutes. Work through examples step-by-step with pupils.
Key points:
- Data dictionary is a specification, not just a list. Every field needs validation.
- Validation is a design decision, not an afterthought.
- Referential integrity (FK constraints) is a separate design concept, not a fifth validation type. The spec has exactly four: presence check, restricted choice, field length, range.
- Validation prevents garbage-in-garbage-out (GIGO).
Common misconceptions: Students confuse data type with validation. Some think foreign keys are optional. Underestimate complexity of validation rules.
Past paper tips: Q5 asks students to design a data dictionary. Weak answers list attributes without validation; strong answers create clear tables with rules.