- Explain why related data is often stored in two tables instead of one repeated table.
- Use primary key and foreign key fields to match records in an equi-join.
- Write SQL joins using
FROM Table1, Table2and a matching condition inWHERE. - Explain what referential integrity means and why a foreign key constraint enforces it.
- Use a query design table to plan a join query before writing SQL.
- I can identify the primary key and foreign key fields that should be matched.
- I can write a correct equi-join and add an extra filter using
AND. - I can explain why a missing join line produces repeated, incorrect results.
- I can explain referential integrity and describe what happens when a foreign key constraint is violated.
- I can complete a 4-row query design table (Field(s) / Table(s) / Search criteria / Sort order) for a join query.
Use handball.sql in DataGrip. It creates two linked tables: Team and Player.
Answer before the lesson begins. These recap keys, fields, and filtering from earlier DDD lessons.
1. What is the main job of a primary key?
2. In the handball database, Team.teamName is the primary key. Why does Player.teamName store a team name too?
3. Which condition finds players who scored more than 20 goals?
Key vocabulary
=, such as Player.teamName = Team.teamName.Player.teamName. This avoids confusion when two tables have a field with the same name.WHERE line that tells SQL how records from the two tables should match.Joining tables
Why two tables are better than one repeated table
The handball database stores teams in one table and players in another. At first, it can feel easier to put every team detail beside every player: player name, position, goals, team town, training venue, email address, and contact person all in one huge table. The problem is repetition. If five players belong to Edinburgh Eagles, the town, venue, contact person, and email address would be repeated five times. If the venue changed, every repeated copy would need to be updated. One missed update would make the data inconsistent.
A relational database avoids that by storing each type of thing once. The Team table stores one record per team. The Player table stores one record per player. The link is the team name: Team.teamName is the primary key, and Player.teamName is a foreign key that points to the matching team.
teamName PKtowntrainingVenue→
playerID PKfirstname, surnameteamName FKReferential integrity
Referential integrity is a rule enforced by a foreign key constraint: every value stored in a foreign key field must match an existing primary key value in the linked table. In the handball database, Player.teamName is a foreign key. Referential integrity means you cannot enter a player with a team name that does not exist in the Team table.
This matters in two directions:
- Adding records: If you try to INSERT a player with
teamName = "Fake FC"and no such team exists, MySQL refuses the INSERT. The foreign key prevents the orphaned record from being created. - Deleting records: If you try to DELETE a team that still has players listed under it, MySQL refuses the DELETE. You must remove the child records (players) before you can remove the parent record (team).
Why set up referential integrity before adding records? Once data is in the database, it is harder to enforce. Setting up the foreign key constraint first means every new record is checked automatically. If you add records first and apply the constraint later, you may find existing data already violates it — and the constraint will fail to activate.
In the handball.sql file, referential integrity is already set up: the Player table has a FOREIGN KEY (teamName) REFERENCES Team(teamName) line. This is why MySQL prevents invalid player inserts automatically.
The N5 join pattern
In this course, we will use equi-joins only. That means the two tables are joined by checking that one field is equal to another field. The structure is deliberately simple:
FROM Table1, Table2
WHERE Table1.matchingField = Table2.matchingField;
The FROM clause names both tables, separated by a comma. The WHERE clause then gives the join condition. Without that condition, SQL does not know which player belongs with which team. When a field name appears in both tables, write the table name first, such as Player.teamName and Team.teamName. This is called qualifying the field name.
A first join
Suppose we want each player's name and the town of their team. The player name is in the Player table, but the town is in the Team table. One table is not enough, so the query needs a join.
FROM Player, Team
WHERE Player.teamName = Team.teamName;
Read the join line aloud as: "only show combinations where the player's team name is the same as the team's team name." This keeps each player with their real team. The result has one sensible row per player.
Adding an extra filter with AND
A join condition is still a condition inside WHERE. If the question also asks for a filter, add it with AND. Keep the join line first so it is easy to spot, then add the extra filter below it.
FROM Player, Team
WHERE Player.teamName = Team.teamName
AND Team.district = "East";
This says: first match each player to their correct team, then only keep records where the team is in the East district. The join condition and the district filter must both be true.
What goes wrong when the join line is missing
The most important mistake in this lesson is leaving out the join condition. If the query says FROM Player, Team but has no WHERE Player.teamName = Team.teamName, SQL combines every player with every team. This is called a Cartesian product. It looks like a result, but it is wrong because most of the combinations are fake.
Worked examples
Task: Display each player's first name, surname, and team town. Sort by surname A to Z.
Player. The town is in Team, so both tables are needed.Player.teamName with the primary key Team.teamName.ORDER BY Player.surname ASC because the task asks for surname order.FROM Player, Team
WHERE Player.teamName = Team.teamName
ORDER BY Player.surname ASC;
Verified result: 29 records are returned. The first three surnames are Ahmed, Baxter, and Benedetti.
Task: Display the first name, surname, position, team name, and district for players whose team is in the East district. Sort by team name, then surname.
Player.teamName = Team.teamName.AND Team.district = "East".FROM Player, Team
WHERE Player.teamName = Team.teamName
AND Team.district = "East"
ORDER BY Team.teamName ASC, Player.surname ASC;
Verified result: 8 records are returned. The teams are Dundee Dynamos and Edinburgh Eagles.
Task: Find Division 1 players who scored at least 30 goals. Display first name, surname, goals scored, team name, and league division. Sort highest goals first.
Team, while goals are stored in Player, so this must be a join.AND Team.leagueDivision = 1 and AND Player.goalsScored >= 30.Player.goalsScored DESC so the top scorers appear first.FROM Player, Team
WHERE Player.teamName = Team.teamName
AND Team.leagueDivision = 1
AND Player.goalsScored >= 30
ORDER BY Player.goalsScored DESC;
Verified result: 6 records are returned. The first three are Eilidh Buchanan (42), Sienna Park (41), and Luca Benedetti (38).
Task: Predict what is wrong with this query.
FROM Player, Team;
WHERE line telling SQL how to match them.WHERE Player.teamName = Team.teamName.FROM Player, Team
WHERE Player.teamName = Team.teamName;
Query Design for Join Queries
When a query needs fields from two tables, use the query design table as usual — but the Table(s) row lists both table names, and the Search criteria row always starts with the join condition (FK = PK) before any other filters.
Request: Display the first name, surname, and team town for all players in the East district. Sort by surname A to Z.
| Row | Value |
|---|---|
| Field(s) | firstname, surname, town |
| Table(s) | Player, Team |
| Search criteria | Player.teamName = Team.teamName AND Team.district = "East" |
| Sort order | surname ASC |
The join condition appears first in Search criteria. The district filter follows with AND. Both table names appear in Table(s). The resulting SQL maps directly from each row: SELECT → FROM → WHERE → ORDER BY.
Request: Display the first name, surname, goals scored, and league division for players who scored 30 or more goals. No sort required.
| Row | Value |
|---|---|
| Field(s) | firstname, surname, goalsScored, leagueDivision |
| Table(s) | Player, Team |
| Search criteria | Player.teamName = Team.teamName AND Player.goalsScored >= 30 |
| Sort order | (blank) |
When no sorting is needed, the Sort order row is left blank. The field leagueDivision comes from Team — so even though the display doesn't seem to need Team data, a join is still required to access it.
A coach wants a list of all goalkeepers and the training venue for each goalkeeper's team. Display firstname, surname, teamName, and trainingVenue. Sort by team name A to Z.
Answer the following:
- Which two tables are needed?
- Which join condition matches the records correctly?
- Write the complete SQL query.
- Use
Playerfor the player name and position. UseTeamfor the training venue. - The join condition is
Player.teamName = Team.teamName. - SELECT Player.firstname, Player.surname, Team.teamName, Team.trainingVenue
FROM Player, Team
WHERE Player.teamName = Team.teamName
AND Player.position = "Goalkeeper"
ORDER BY Team.teamName ASC;
Verified result: 4 records are returned — Airdrie Lions, Borders Bandits, Clyde Flyers, and Edinburgh Eagles.
FROM Player, Team on its own combines every player with every team. Always add WHERE Player.teamName = Team.teamName.Player.playerID = Team.teamName makes no sense.teamName. Write Player.teamName and Team.teamName so the query is clear.Team.district = "West" is not a join. Keep the join line, then add filters with AND.For a join question, build the answer in this order: choose the fields for SELECT, write both table names in FROM, write the PK/FK match as the first WHERE line, then add any extra filters with AND. In an exam, the missing join condition is the line most likely to lose marks.
Referential integrity questions often ask you to explain or describe — give two facts: (1) every FK value must match an existing PK value; (2) if you try to insert a record whose FK has no matching PK, the database rejects it.
If asked to complete a query design table for a join, remember: list both table names in the Table(s) row, and write the PK/FK matching condition (e.g. Player.teamName = Team.teamName) as the first line of the Search criteria row, then add any other filters below it.
Questions 1–5 are auto-checked. Questions 6–7 are self-marked prediction/debugging questions. Question 7 is a query design table. Questions 8–9 are practical DataGrip tasks. Question 10 is a self-marked referential integrity question.
1. Which pair of fields correctly joins the Player and Team tables? TYPE 1
2. What does this line do? WHERE Player.teamName = Team.teamName TYPE 1
3. Which FROM clause uses both tables needed for a player/team join? TYPE 1
4. The handball database has 29 players and 8 teams. If a query says FROM Player, Team but forgets the join condition, how many rows are produced? TYPE 1
5. Which extra line would filter a correct join to West district teams only? TYPE 1
6. Debugging: Explain what is wrong with this query and write the missing line. TYPE 2
FROM Player, Team
AND Player.position = "Goalkeeper";
The query has both tables but no join condition. It also tries to use AND before there is a WHERE clause. The missing line is:
Then the goalkeeper filter can follow with AND Player.position = "Goalkeeper".
7. Complete the query design table for this request: "Display the first name, surname, and league division for all players who play for a West district team. Sort by surname A to Z." TYPE 2
Fill in all four rows of the query design table.
| Row | Value |
|---|---|
| Field(s) | firstname, surname, leagueDivision |
| Table(s) | Player, Team |
| Search criteria | Player.teamName = Team.teamName AND Team.district = "West" |
| Sort order | surname ASC |
Note: leagueDivision is stored in Team, so even though it appears as a display field rather than a filter, you still need a join to access it.
8. Practical: In DataGrip, write and run a join query to find West district players who scored more than 20 goals. Display firstname, surname, goalsScored, and district. Sort by goals scored highest first. Paste your SQL and record the returned players. TYPE 3
Practical Task 1 of 2 — Join plus district filter
handball database in DataGrip. Keep the join condition as the first WHERE line, then add the district and goals filters with AND.- Paste the SQL and list the returned players with their goals.
FROM Player, Team
WHERE Player.teamName = Team.teamName
AND Team.district = "West"
AND Player.goalsScored > 20
ORDER BY Player.goalsScored DESC;
Verified result: 4 records are returned — Sienna Park (41), Jack Edwards (37), Lewis MacDonald (29), and Aisha Munro (22).
9. Practical: In DataGrip, write and run a join query to list all right wing players with their team town and team email address. Display firstname, surname, teamName, town, and emailAddress. Sort by team name A to Z. Paste your SQL and record how many records are returned. TYPE 3
Practical Task 2 of 2 — Join for contact details
Player. The town and email address are stored in Team. This is a join because the answer needs fields from both tables.- Paste the SQL and record the number of rows returned by DataGrip.
FROM Player, Team
WHERE Player.teamName = Team.teamName
AND Player.position = "Right wing"
ORDER BY Team.teamName ASC;
Verified result: 5 records are returned — Airdrie Lions, Borders Bandits, Clyde Flyers, Edinburgh Eagles, and Harris Hurricanes.
10. The Player table has a foreign key teamName that references the Team table. A pupil tries to insert a new player record with teamName = "Stirling Stars", but there is no team called "Stirling Stars" in the Team table. TYPE 2
(a) What will the database do when the pupil runs the INSERT statement? Explain why.
(b) What is the term for this rule that prevents invalid foreign key values being stored?
(c) A different pupil tries to delete the row for "Clyde Flyers" from the Team table, but several players have teamName = "Clyde Flyers". What will the database do, and why?
(a) The database will reject the INSERT and return an error. Because of the foreign key constraint, every value stored in Player.teamName must match an existing value in Team.teamName. "Stirling Stars" does not exist in the Team table, so the constraint is violated and the record is not added.
(b) Referential integrity.
(c) The database will reject the DELETE. The foreign key constraint works in both directions: you cannot delete a parent record while child records still reference it. Deleting "Clyde Flyers" from Team would leave the player records with a teamName that no longer exists, which would violate referential integrity. The players must be deleted (or moved to another team) first.
Suggested timing: 60 minutes. Warm up and vocabulary 10 min; notes with PK/FK visual 15 min; live DataGrip missing-join demo 10 min; worked examples 12 min; now you try and task set 13 min.
Practical setup: Pupils need Laragon running, DataGrip open, and the handball database installed from handball.sql. Use a query console connected to the handball database.
Key misconception to address: All pupils may think listing two tables in FROM is enough. Demonstrate that the join condition is the important matching instruction, not just a filter.
Live demo suggestion: Run SELECT Player.firstname, Player.surname, Team.town FROM Player, Team; first and show the 232-row incorrect result. Then add WHERE Player.teamName = Team.teamName and show the result drop to 29 sensible rows.
Extension question: Ask all pupils to write a join that finds Division 1 goalkeepers and displays their team contact person. Then ask which table each displayed field came from.
Referential integrity (Q10): A common exam question type. Pupils often understand the concept but struggle to give both directions — blocked INSERT (FK value doesn't exist in parent) and blocked DELETE (parent has dependent child records). Model answer (c) is the harder direction; worth explicitly teaching both in class.
Query design table (Q7): The key distinction from a single-table query is that both table names appear in Table(s), and the first line of Search criteria is always the PK/FK join condition, not the filter. Pupils can lose marks by omitting the join condition row entirely.
SQA command words covered: "identify", "describe", "explain", "write", "predict", "debug". Emphasise that "write a query" requires the full SQL statement in the correct clause order.