DDD  ·  Database Design & Development

Data Dictionary and Validation

Lesson DDD4 of 10 Approx 50 min Design data validation rules to maintain data quality

Learning Intentions

  • Understand what a data dictionary is and why databases need one
  • Identify the four types of validation: presence check, restricted choice, field length, range
  • Create a complete data dictionary with all required fields and validation rules
  • Explain how validation enforces data quality, and how referential integrity (a separate design concept) protects linked tables
  • Use a query design table to plan a SELECT query before writing SQL

Success Criteria

  • You can name all four validation types with a real-world example for each
  • You can create a complete data dictionary for a one or two-entity database
  • You can identify which validation type to apply to each attribute
  • You can explain why validation prevents errors and protects data integrity
  • You can complete a 4-row query design table (Field(s), Table(s), Search criteria, Sort order) for single and two-table queries
🔥 Warm-up
Before we start, test what you already know.

1. What is a data dictionary? TYPE 1

2. Which validation type ensures a field is not left blank? TYPE 1

3. A field should only accept the values "Bronze", "Silver", or "Gold". Which validation type applies? TYPE 1

Key vocabulary

Data Dictionary
A document describing the structure of a database: all tables, fields, data types, constraints, and validation rules.
Validation
Rules enforced to ensure data entered into a database is valid, accurate, and complete before storage.
Presence Check
Validation rule that ensures a field is not left empty; the field must contain a value.
Restricted Choice
Validation rule that limits entries to a predefined set of allowed values (e.g., Grade: A, B, C, D, E only).
Field Length
Validation rule that limits how many characters can be entered into a text field.
Range Validation
Validation rule that ensures a numeric or date value falls within acceptable minimum and maximum limits.
Foreign Key
A field in one table that references the primary key of another table, enforcing referential integrity.
Referential Integrity
A database constraint ensuring foreign key values match existing primary key values in related tables.
Data Type
The category of data a field can store: Text, Integer, Decimal, Date, Boolean, etc.
Attribute
A column in a table; a property or characteristic of an entity.

Data Dictionary and Validation

What is a Data Dictionary?

A data dictionary is a document (often a table) that describes every aspect of a database's structure. It answers: What tables exist? What columns does each table have? What type of data does each column hold? What rules prevent bad data from being entered?

A data dictionary is created during the Design phase and updated as the database evolves. It serves as:

  • Documentation: Future developers understand the database without reading code.
  • Specification: Database developers know exactly what to build.
  • Quality assurance: Testers know what validation rules to test.
  • Audit trail: If data looks wrong, the data dictionary shows what rules should have prevented it.

Components of a Data Dictionary

A complete data dictionary includes columns for:

  • Entity: The table name (e.g., Student, Course, Enrollment).
  • Attribute: The field/column name (e.g., StudentID, FirstName, DateOfBirth).
  • Key Type: Is this a Primary Key (PK), Foreign Key (FK), or neither?
  • Data Type: Text, Integer, Decimal, Date, Boolean, etc.
  • Size: For text, max length (e.g., VARCHAR(50)). For numbers, precision (Integer 0–32,767).
  • Validation Rule: What constraints apply? (e.g., "Presence check," "Range 0–100," "Restricted to A–F").

The Four Validation Types

Data dictionary extract — Student entity Attribute Key Data Type Size Validation Rule StudentID PK INT Presence check — must not be NULL FirstName VARCHAR 30 Field length — max 30 characters Grade VARCHAR 1 Restricted choice — A, B, C, D, E or F TestScore INT Range — must be 0 to 100 inclusive

1. Presence Check

A presence check ensures a field is not left blank. If a field is mandatory, the database rejects any record where that field is empty.

Example: StudentID is a primary key, so it must have a value. If someone tries to add a student without an ID, the database rejects the entry.

  • Rule: StudentID must not be NULL
  • Error: "Student ID is required"

2. Restricted Choice

A restricted choice validation limits entries to a predefined set of allowed values. The field only accepts values from a list.

Example: A Grade field should only accept A, B, C, D, E, or F. Entering "X" is rejected.

  • Rule: Grade must be one of {A, B, C, D, E, F}
  • Error: "Grade must be A, B, C, D, E, or F"

3. Field Length Validation

Field length validation limits how many characters can be entered into a text field. This prevents data that's too long for the field.

Example: A FirstName field is defined as VARCHAR(30), meaning up to 30 characters. A 9-character name is accepted. A 50-character name is rejected.

  • Rule: FirstName must be ≤ 30 characters
  • Error: "First name cannot exceed 30 characters"

4. Range Validation

Range validation ensures numeric or date values fall within acceptable limits. Test scores must be 0–100, dates must be between 2020 and 2030, ages must be 0–120.

Example: A TestScore field should only accept 0–100. Entering –5 or 150 is rejected.

  • Rule: TestScore must be ≥ 0 AND ≤ 100
  • Error: "Test score must be between 0 and 100"
The four validation types at a glance 1. Presence Check Field must not be left blank (not NULL) e.g. StudentID — required for every record ✓ ACCEPTED 1042 ✗ REJECTED (nothing entered) 2. Restricted Choice Value must come from a fixed list of options e.g. Grade must be one of: A B C D E F ✓ ACCEPTED B ✗ REJECTED X 3. Field Length Text value must not exceed max character count e.g. FirstName VARCHAR(30) — max 30 chars ✓ 9 chars — OK "Alexandra" ✗ 35 chars — too long "Bartholomew-Fitzgerald"… 4. Range Validation Number or date must fall within min–max limits e.g. TestScore must be 0 to 100 0 100 ✓ valid range −5 150
What about foreign keys?

Foreign keys are not a validation type — they are a separate design concept called referential integrity. The database enforces that every foreign key value must match an existing primary key in the linked table. In the data dictionary, the Validation column for a foreign key field reads: "Existing [fieldName] from [TableName] table" — for example, "Existing teamName from Team table". Referential integrity is set up using a FOREIGN KEY constraint when the table is created, not through one of the four validation types above.

Why Validation Matters

Validation prevents garbage-in-garbage-out (GIGO). Without validation:

  • Users might leave required fields blank, creating incomplete records.
  • Typos might be entered (e.g., Grade="X" instead of "A").
  • Out-of-range values might be stored (e.g., Age=–5).
  • Orphaned records might exist (e.g., enrollment for a deleted student).
  • Reports and analysis become unreliable because data is inconsistent.

Validation catches errors before they're stored, ensuring data quality. This is much cheaper than fixing bad data later.

Worked examples

Example 1: Complete Data Dictionary for a Library System

Scenario: Design a data dictionary for a library that tracks books, borrowers, and loans.

1
Identify entities: Book, Borrower, Loan
2
Identify attributes: Book (ISBN, Title, Author, YearPublished, Copies); Borrower (BorrowerID, FirstName, LastName, MembershipDate); Loan (LoanID, BorrowerID FK, ISBN FK, DateBorrowed, DueDate)
3
Create data dictionary with validation rules for each attribute.
Example 2: Applying Validation to a Student Grading System

Scenario: Design validation rules for a Grades table with GradeID, StudentID (FK), CourseID (FK), Mark, Grade, DateRecorded.

1
GradeID (Primary Key): Presence check, Auto-increment
2
StudentID (Foreign Key): Must match StudentID in Student table (prevents grades for non-existent students)
3
CourseID (Foreign Key): Must match CourseID in Course table
4
Mark (Numeric): Range 0–100 (prevents nonsensical values like –5 or 150)
5
Grade (Letter): Restricted choice {A, B, C, D, E, F} (prevents typos like "X")
6
DateRecorded (Date): Presence check, Range ≤ today (can't record future grades)
Example 3: Preventing Orphaned Records with Referential Integrity

Scenario: A Booking table has CustomerID referencing Customer table. What happens without and with foreign key validation?

1
Without foreign key validation: Admin deletes customer 123. Now the booking for CustomerID=123 is orphaned—it references a non-existent customer. Reports fail.
2
With foreign key validation: Database rejects deletion: "Cannot delete customer 123: 3 bookings reference this customer." Admin must delete bookings first, or use cascade delete to delete both.
3
Result: Foreign key validation prevents orphaned records and maintains data integrity.
Now you try

Design validation rules for a Hospital Appointment System with Patient, Doctor, and Appointment tables.

Attributes to validate: PatientID, FirstName, LastName, DateOfBirth, PhoneNumber (11 chars), DoctorID, Speciality, AppointmentDate, AppointmentTime

Answer:

  • PatientID: Presence check, Primary key, Auto-increment
  • FirstName/LastName: Presence check, Field length ≤ 50
  • DateOfBirth: Presence check, Range (0–120 years old)
  • PhoneNumber: Presence check, Field length = 11
  • DoctorID (FK): Foreign key → Doctor.DoctorID
  • Speciality: Presence check, Restricted choice (Cardiology, Surgery, etc.)
  • AppointmentDate: Presence check, Range ≥ today
  • AppointmentTime: Presence check, Restricted choice (09:00, 10:00, 14:00, etc.)
⚠️ Common mistakes
Confusing data type with validation: "The field is Text" is a data type. "Field length ≤ 50" is validation. A data dictionary must include both.
Forgetting presence checks on required fields: If a field is mandatory (like StudentID), it must have a presence check. Without one, a record can be created with blank StudentID.
Calling referential integrity a "validation type": Referential integrity (enforced by a foreign key constraint) is a separate database design concept — it is not one of the four N5 validation types. In the data dictionary, write the FK row's Validation column as "Existing [fieldName] from [TableName] table".
Backwards range validation: Students sometimes write "Range 0" and "Range 100" separately. Write "Range 0–100" or "0 ≤ Mark ≤ 100" for clarity.
Over-validating: "Field length ≤ 1000000" for a name field is overkill. Use reasonable limits (50 characters for names).

Query Design

A query design is a planning table you fill in before writing SQL. Just as a data dictionary plans the database structure before it is built, a query design plans what a SELECT query should do. It uses a 4-row table that you complete top to bottom, then translate directly into SQL.

RowWhat to write
Field(s)The column names you want to display (or * for all columns)
Table(s)The table name(s) the fields come from
Search criteriaThe conditions (the WHERE clause): comparisons, ranges, AND, OR, and the join condition for linked tables
Sort orderWhich field to sort by and whether ASC or DESC. Leave blank if no sorting is required.

The sort order row is left blank when the query has no sorting requirement — do not write "None" or "N/A", just leave the cell empty.

Query Design Example 1 — single table, no sort

Request: Display the title and genre of all films with a rating of 8.0 or higher.

RowValue
Field(s)title, genre
Table(s)film
Search criteriarating >= 8.0
Sort order(blank — no sort required)

The SQL: SELECT title, genre FROM film WHERE rating >= 8.0;

Query Design Example 2 — single table, with sort

Request: Display the title, year, and rating of all Drama films. Show newest films first.

RowValue
Field(s)title, year, rating
Table(s)film
Search criteriagenre = "Drama"
Sort orderyear DESC

The SQL: SELECT title, year, rating FROM film WHERE genre = "Drama" ORDER BY year DESC;

Query Design Example 3 — two tables (join)

Request: Display the film title and the director's last name for all films released after 2010. Sort alphabetically by last name.

RowValue
Field(s)title, lastName
Table(s)film, director
Search criteriafilm.director_id = director.director_id AND year > 2010
Sort orderlastName ASC

When two tables are involved, both names appear in the Table(s) row. The join condition (the link between primary key and foreign key) is listed as the first item in Search criteria. Additional filters follow with AND.

💡 Exam tip

Data dictionary: Create a table with Entity, Attribute, Key Type, Data Type, Size, and Validation. The N5 spec covers one or two entities — you will not be asked to design for three or more tables. Include at least one of each of the four validation types (presence check, restricted choice, field length, range). Explain why each validation is necessary. Weak answers list attributes without validation; strong answers explain the rules.

Query design: In the 2024 paper (Q16b) pupils were asked to complete a query design table — worth 4 marks. Always fill in all four rows. The sort order row is left blank when no sorting is needed. When a join is involved, both table names go in the Table(s) row and the join condition (FK = PK) goes first in the Search criteria row.

Task set

1. Which validation type prevents a user from leaving a field blank? TYPE 1

2. A field is defined as VARCHAR(25). Which validation does this enforce? TYPE 1

3. What is referential integrity? TYPE 1

4. Which is NOT one of the four N5 validation types? TYPE 1

5. Describe what a data dictionary is and explain why databases need one. TYPE 2

Write 2-3 sentences.

6. A Mark field should only accept 0–100. What validation type applies and how would you specify it? TYPE 2

Write 1-2 sentences.

7. Explain what referential integrity means and how it prevents orphaned records in a database. TYPE 2

Write 2-3 sentences.

8. A library database stores information about books and borrowers. The Book table has attributes: ISBN (primary key), title, genre, publicationYear. The Borrower table has attributes: borrowerID (primary key), firstName, lastName, ISBN (foreign key). Design a data dictionary for both tables. For each attribute include: Entity, Attribute, Key Type, Data Type, Size, and Validation. Include at least one of each of the four validation types across the two tables. TYPE 3

Create a complete table covering all 8 attributes.

9. Complete the query design table for this request: "Display the title and certificate of all films rated 7.5 or higher, sorted alphabetically by title." TYPE 2

Draw or write the 4-row table with correct values for Field(s), Table(s), Search criteria, and Sort order.

RowValue
Field(s)title, certificate
Table(s)film
Search criteriarating >= 7.5
Sort ordertitle ASC

10. Complete the query design table for this request: "Display the film title and the director's last name for PG films only. Sort by last name A to Z." (This query uses both the film and director tables.) TYPE 2

Note: in the films database, the foreign key is film.director_id and the primary key is director.director_id.

RowValue
Field(s)title, lastName
Table(s)film, director
Search criteriafilm.director_id = director.director_id AND certificate = "PG"
Sort orderlastName ASC
Teacher notes — Shift+T to hide

Lesson duration: 50–60 minutes. Work through examples step-by-step with pupils.

Key points:

  • Data dictionary is a specification, not just a list. Every field needs validation.
  • Validation is a design decision, not an afterthought.
  • Referential integrity (FK constraints) is a separate design concept, not a fifth validation type. The spec has exactly four: presence check, restricted choice, field length, range.
  • Validation prevents garbage-in-garbage-out (GIGO).

Common misconceptions: Students confuse data type with validation. Some think foreign keys are optional. Underestimate complexity of validation rules.

Past paper tips: Q5 asks students to design a data dictionary. Weak answers list attributes without validation; strong answers create clear tables with rules.