DDD  ·  Database Design & Development

SQL: Joining Tables

Lesson DDD9 of 10 Approx 60 min Handball database required
Learning intentions
  • Explain why related data is often stored in two tables instead of one repeated table.
  • Use primary key and foreign key fields to match records in an equi-join.
  • Write SQL joins using FROM Table1, Table2 and a matching condition in WHERE.
  • Explain what referential integrity means and why a foreign key constraint enforces it.
  • Use a query design table to plan a join query before writing SQL.
Success criteria
  • I can identify the primary key and foreign key fields that should be matched.
  • I can write a correct equi-join and add an extra filter using AND.
  • I can explain why a missing join line produces repeated, incorrect results.
  • I can explain referential integrity and describe what happens when a foreign key constraint is violated.
  • I can complete a 4-row query design table (Field(s) / Table(s) / Search criteria / Sort order) for a join query.
🧱
Handball database required for this lesson

Use handball.sql in DataGrip. It creates two linked tables: Team and Player.

↓ Download handball.sql
Warm up — what do you already know?

Answer before the lesson begins. These recap keys, fields, and filtering from earlier DDD lessons.

1. What is the main job of a primary key?

2. In the handball database, Team.teamName is the primary key. Why does Player.teamName store a team name too?

3. Which condition finds players who scored more than 20 goals?

Key vocabulary

Join
A query that uses related records from more than one table.
Equi-join
A join where matching fields are compared using =, such as Player.teamName = Team.teamName.
Primary key
The field that uniquely identifies each record in its own table.
Foreign key
A field that stores a value from another table's primary key so the two tables can be linked.
Qualified field name
A field written with its table name, for example Player.teamName. This avoids confusion when two tables have a field with the same name.
Join condition
The WHERE line that tells SQL how records from the two tables should match.

Joining tables

Why two tables are better than one repeated table

The handball database stores teams in one table and players in another. At first, it can feel easier to put every team detail beside every player: player name, position, goals, team town, training venue, email address, and contact person all in one huge table. The problem is repetition. If five players belong to Edinburgh Eagles, the town, venue, contact person, and email address would be repeated five times. If the venue changed, every repeated copy would need to be updated. One missed update would make the data inconsistent.

A relational database avoids that by storing each type of thing once. The Team table stores one record per team. The Player table stores one record per player. The link is the team name: Team.teamName is the primary key, and Player.teamName is a foreign key that points to the matching team.

Team
teamName PK
town
trainingVenue
matches
Player
playerID PK
firstname, surname
teamName FK

Referential integrity

Referential integrity is a rule enforced by a foreign key constraint: every value stored in a foreign key field must match an existing primary key value in the linked table. In the handball database, Player.teamName is a foreign key. Referential integrity means you cannot enter a player with a team name that does not exist in the Team table.

This matters in two directions:

  • Adding records: If you try to INSERT a player with teamName = "Fake FC" and no such team exists, MySQL refuses the INSERT. The foreign key prevents the orphaned record from being created.
  • Deleting records: If you try to DELETE a team that still has players listed under it, MySQL refuses the DELETE. You must remove the child records (players) before you can remove the parent record (team).

Why set up referential integrity before adding records? Once data is in the database, it is harder to enforce. Setting up the foreign key constraint first means every new record is checked automatically. If you add records first and apply the constraint later, you may find existing data already violates it — and the constraint will fail to activate.

In the handball.sql file, referential integrity is already set up: the Player table has a FOREIGN KEY (teamName) REFERENCES Team(teamName) line. This is why MySQL prevents invalid player inserts automatically.

The N5 join pattern

In this course, we will use equi-joins only. That means the two tables are joined by checking that one field is equal to another field. The structure is deliberately simple:

SELECT fields to display
FROM Table1, Table2
WHERE Table1.matchingField = Table2.matchingField;

The FROM clause names both tables, separated by a comma. The WHERE clause then gives the join condition. Without that condition, SQL does not know which player belongs with which team. When a field name appears in both tables, write the table name first, such as Player.teamName and Team.teamName. This is called qualifying the field name.

The N5 join pattern — every part labelled SELECT fields to display FROM Player, Team WHERE Player.teamName = Team.teamName ; Which fields to show (use Table.field format) List BOTH tables, separated by a comma The join condition — NEVER omit this line FK in Player table must equal PK in Team table Omitting the WHERE join condition produces a Cartesian product — 29 × 8 = 232 wrong rows

A first join

Suppose we want each player's name and the town of their team. The player name is in the Player table, but the town is in the Team table. One table is not enough, so the query needs a join.

SELECT Player.firstname, Player.surname, Team.town
FROM Player, Team
WHERE Player.teamName = Team.teamName;

Read the join line aloud as: "only show combinations where the player's team name is the same as the team's team name." This keeps each player with their real team. The result has one sensible row per player.

Row matching: WHERE Player.teamName = Team.teamName Player name teamName (FK) Jack Edwards Edinburgh Eagles Luca Benedetti Clyde Flyers Sienna Park Edinburgh Eagles Team teamName (PK) town Edinburgh Eagles Edinburgh Clyde Flyers Glasgow Paisley Panthers Paisley ← no Player has this teamName ↓ Result — 3 rows returned (each player matched to their team's town) Player.firstname Player.surname Team.town Jack Edwards Edinburgh Luca Benedetti Glasgow Sienna Park Edinburgh

Adding an extra filter with AND

A join condition is still a condition inside WHERE. If the question also asks for a filter, add it with AND. Keep the join line first so it is easy to spot, then add the extra filter below it.

SELECT Player.firstname, Player.surname, Team.teamName, Team.district
FROM Player, Team
WHERE Player.teamName = Team.teamName
AND Team.district = "East";

This says: first match each player to their correct team, then only keep records where the team is in the East district. The join condition and the district filter must both be true.

What goes wrong when the join line is missing

The most important mistake in this lesson is leaving out the join condition. If the query says FROM Player, Team but has no WHERE Player.teamName = Team.teamName, SQL combines every player with every team. This is called a Cartesian product. It looks like a result, but it is wrong because most of the combinations are fake.

Missing join line demonstration: the handball database has 29 player records and 8 team records. Without the join line, SQL produces 29 × 8 = 232 rows. Jack Edwards appears once with Clyde Flyers, but also incorrectly with Paisley Panthers, Airdrie Lions, and every other team.
Missing join line vs correct join line ✗ No join condition FROM Player, Team — no WHERE join line Player row Jack Edwards Edinburgh Eagles ✗ Clyde Flyers ✗ Paisley Panthers ✗ Airdrie Lions ✗ + 4 more teams ✗ 29 players × 8 teams = 232 rows ✗ ✓ With join condition WHERE Player.teamName = Team.teamName Player row Jack Edwards Edinburgh Eagles teamName matches ✓ 7 other teams excluded — no match with Jack Edwards Clyde Flyers — excluded 29 players, 1 team each = 29 rows ✓

Worked examples

Example 1 — join players to team towns

Task: Display each player's first name, surname, and team town. Sort by surname A to Z.

1
The player names are in Player. The town is in Team, so both tables are needed.
2
Join the tables by matching the foreign key Player.teamName with the primary key Team.teamName.
3
Add ORDER BY Player.surname ASC because the task asks for surname order.
SELECT Player.firstname, Player.surname, Team.town
FROM Player, Team
WHERE Player.teamName = Team.teamName
ORDER BY Player.surname ASC;

Verified result: 29 records are returned. The first three surnames are Ahmed, Baxter, and Benedetti.

Example 2 — join plus one text filter

Task: Display the first name, surname, position, team name, and district for players whose team is in the East district. Sort by team name, then surname.

1
Start with the same join: Player.teamName = Team.teamName.
2
Add the extra filter with AND Team.district = "East".
3
Use two sort fields because the question asks for team name order, then surname order.
SELECT Player.firstname, Player.surname, Player.position, Team.teamName, Team.district
FROM Player, Team
WHERE Player.teamName = Team.teamName
AND Team.district = "East"
ORDER BY Team.teamName ASC, Player.surname ASC;

Verified result: 8 records are returned. The teams are Dundee Dynamos and Edinburgh Eagles.

Example 3 — join plus two number filters

Task: Find Division 1 players who scored at least 30 goals. Display first name, surname, goals scored, team name, and league division. Sort highest goals first.

1
The league division is stored in Team, while goals are stored in Player, so this must be a join.
2
After the join condition, add AND Team.leagueDivision = 1 and AND Player.goalsScored >= 30.
3
Sort by Player.goalsScored DESC so the top scorers appear first.
SELECT Player.firstname, Player.surname, Player.goalsScored, Team.teamName, Team.leagueDivision
FROM Player, Team
WHERE Player.teamName = Team.teamName
AND Team.leagueDivision = 1
AND Player.goalsScored >= 30
ORDER BY Player.goalsScored DESC;

Verified result: 6 records are returned. The first three are Eilidh Buchanan (42), Sienna Park (41), and Luca Benedetti (38).

Example 4 — missing join line

Task: Predict what is wrong with this query.

SELECT Player.firstname, Player.surname, Team.town
FROM Player, Team;
1
The query names both tables, but there is no WHERE line telling SQL how to match them.
2
SQL combines every player with every team, making 232 rows instead of 29 sensible rows.
3
Fix it by adding WHERE Player.teamName = Team.teamName.
SELECT Player.firstname, Player.surname, Team.town
FROM Player, Team
WHERE Player.teamName = Team.teamName;

Query Design for Join Queries

When a query needs fields from two tables, use the query design table as usual — but the Table(s) row lists both table names, and the Search criteria row always starts with the join condition (FK = PK) before any other filters.

Query Design — join example 1

Request: Display the first name, surname, and team town for all players in the East district. Sort by surname A to Z.

RowValue
Field(s)firstname, surname, town
Table(s)Player, Team
Search criteriaPlayer.teamName = Team.teamName AND Team.district = "East"
Sort ordersurname ASC

The join condition appears first in Search criteria. The district filter follows with AND. Both table names appear in Table(s). The resulting SQL maps directly from each row: SELECT → FROM → WHERE → ORDER BY.

Query Design — join example 2

Request: Display the first name, surname, goals scored, and league division for players who scored 30 or more goals. No sort required.

RowValue
Field(s)firstname, surname, goalsScored, leagueDivision
Table(s)Player, Team
Search criteriaPlayer.teamName = Team.teamName AND Player.goalsScored >= 30
Sort order(blank)

When no sorting is needed, the Sort order row is left blank. The field leagueDivision comes from Team — so even though the display doesn't seem to need Team data, a join is still required to access it.

Now you try

A coach wants a list of all goalkeepers and the training venue for each goalkeeper's team. Display firstname, surname, teamName, and trainingVenue. Sort by team name A to Z.

Answer the following:

  1. Which two tables are needed?
  2. Which join condition matches the records correctly?
  3. Write the complete SQL query.
  1. Use Player for the player name and position. Use Team for the training venue.
  2. The join condition is Player.teamName = Team.teamName.
  3. SELECT Player.firstname, Player.surname, Team.teamName, Team.trainingVenue
    FROM Player, Team
    WHERE Player.teamName = Team.teamName
    AND Player.position = "Goalkeeper"
    ORDER BY Team.teamName ASC;

Verified result: 4 records are returned — Airdrie Lions, Borders Bandits, Clyde Flyers, and Edinburgh Eagles.

Common mistakes
Forgetting the join condition. FROM Player, Team on its own combines every player with every team. Always add WHERE Player.teamName = Team.teamName.
Joining the wrong fields. Match the foreign key to the related primary key. In this database, Player.playerID = Team.teamName makes no sense.
Not qualifying repeated field names. Both tables have teamName. Write Player.teamName and Team.teamName so the query is clear.
Replacing the join line with the extra filter. A filter such as Team.district = "West" is not a join. Keep the join line, then add filters with AND.
Exam tip

For a join question, build the answer in this order: choose the fields for SELECT, write both table names in FROM, write the PK/FK match as the first WHERE line, then add any extra filters with AND. In an exam, the missing join condition is the line most likely to lose marks.

Referential integrity questions often ask you to explain or describe — give two facts: (1) every FK value must match an existing PK value; (2) if you try to insert a record whose FK has no matching PK, the database rejects it.

If asked to complete a query design table for a join, remember: list both table names in the Table(s) row, and write the PK/FK matching condition (e.g. Player.teamName = Team.teamName) as the first line of the Search criteria row, then add any other filters below it.

Task Set

Questions 1–5 are auto-checked. Questions 6–7 are self-marked prediction/debugging questions. Question 7 is a query design table. Questions 8–9 are practical DataGrip tasks. Question 10 is a self-marked referential integrity question.

1. Which pair of fields correctly joins the Player and Team tables? TYPE 1

2. What does this line do? WHERE Player.teamName = Team.teamName TYPE 1

3. Which FROM clause uses both tables needed for a player/team join? TYPE 1

4. The handball database has 29 players and 8 teams. If a query says FROM Player, Team but forgets the join condition, how many rows are produced? TYPE 1

5. Which extra line would filter a correct join to West district teams only? TYPE 1

6. Debugging: Explain what is wrong with this query and write the missing line. TYPE 2

SELECT Player.firstname, Player.surname, Team.trainingVenue
FROM Player, Team
AND Player.position = "Goalkeeper";

The query has both tables but no join condition. It also tries to use AND before there is a WHERE clause. The missing line is:

WHERE Player.teamName = Team.teamName

Then the goalkeeper filter can follow with AND Player.position = "Goalkeeper".

7. Complete the query design table for this request: "Display the first name, surname, and league division for all players who play for a West district team. Sort by surname A to Z." TYPE 2

Fill in all four rows of the query design table.

RowValue
Field(s)firstname, surname, leagueDivision
Table(s)Player, Team
Search criteriaPlayer.teamName = Team.teamName AND Team.district = "West"
Sort ordersurname ASC

Note: leagueDivision is stored in Team, so even though it appears as a display field rather than a filter, you still need a join to access it.

8. Practical: In DataGrip, write and run a join query to find West district players who scored more than 20 goals. Display firstname, surname, goalsScored, and district. Sort by goals scored highest first. Paste your SQL and record the returned players. TYPE 3

💻

Practical Task 1 of 2 — Join plus district filter

Open a query console for the handball database in DataGrip. Keep the join condition as the first WHERE line, then add the district and goals filters with AND.
Evidence checklist:
  • Paste the SQL and list the returned players with their goals.
SELECT Player.firstname, Player.surname, Player.goalsScored, Team.district
FROM Player, Team
WHERE Player.teamName = Team.teamName
AND Team.district = "West"
AND Player.goalsScored > 20
ORDER BY Player.goalsScored DESC;

Verified result: 4 records are returned — Sienna Park (41), Jack Edwards (37), Lewis MacDonald (29), and Aisha Munro (22).

9. Practical: In DataGrip, write and run a join query to list all right wing players with their team town and team email address. Display firstname, surname, teamName, town, and emailAddress. Sort by team name A to Z. Paste your SQL and record how many records are returned. TYPE 3

💻

Practical Task 2 of 2 — Join for contact details

The position is stored in Player. The town and email address are stored in Team. This is a join because the answer needs fields from both tables.
Evidence checklist:
  • Paste the SQL and record the number of rows returned by DataGrip.
SELECT Player.firstname, Player.surname, Team.teamName, Team.town, Team.emailAddress
FROM Player, Team
WHERE Player.teamName = Team.teamName
AND Player.position = "Right wing"
ORDER BY Team.teamName ASC;

Verified result: 5 records are returned — Airdrie Lions, Borders Bandits, Clyde Flyers, Edinburgh Eagles, and Harris Hurricanes.

10. The Player table has a foreign key teamName that references the Team table. A pupil tries to insert a new player record with teamName = "Stirling Stars", but there is no team called "Stirling Stars" in the Team table. TYPE 2

(a) What will the database do when the pupil runs the INSERT statement? Explain why.

(b) What is the term for this rule that prevents invalid foreign key values being stored?

(c) A different pupil tries to delete the row for "Clyde Flyers" from the Team table, but several players have teamName = "Clyde Flyers". What will the database do, and why?

(a) The database will reject the INSERT and return an error. Because of the foreign key constraint, every value stored in Player.teamName must match an existing value in Team.teamName. "Stirling Stars" does not exist in the Team table, so the constraint is violated and the record is not added.

(b) Referential integrity.

(c) The database will reject the DELETE. The foreign key constraint works in both directions: you cannot delete a parent record while child records still reference it. Deleting "Clyde Flyers" from Team would leave the player records with a teamName that no longer exists, which would violate referential integrity. The players must be deleted (or moved to another team) first.

Teacher notes — Shift+T to hide

Suggested timing: 60 minutes. Warm up and vocabulary 10 min; notes with PK/FK visual 15 min; live DataGrip missing-join demo 10 min; worked examples 12 min; now you try and task set 13 min.

Practical setup: Pupils need Laragon running, DataGrip open, and the handball database installed from handball.sql. Use a query console connected to the handball database.

Key misconception to address: All pupils may think listing two tables in FROM is enough. Demonstrate that the join condition is the important matching instruction, not just a filter.

Live demo suggestion: Run SELECT Player.firstname, Player.surname, Team.town FROM Player, Team; first and show the 232-row incorrect result. Then add WHERE Player.teamName = Team.teamName and show the result drop to 29 sensible rows.

Extension question: Ask all pupils to write a join that finds Division 1 goalkeepers and displays their team contact person. Then ask which table each displayed field came from.

Referential integrity (Q10): A common exam question type. Pupils often understand the concept but struggle to give both directions — blocked INSERT (FK value doesn't exist in parent) and blocked DELETE (parent has dependent child records). Model answer (c) is the harder direction; worth explicitly teaching both in class.

Query design table (Q7): The key distinction from a single-table query is that both table names appear in Table(s), and the first line of Search criteria is always the PK/FK join condition, not the filter. Pupils can lose marks by omitting the join condition row entirely.

SQA command words covered: "identify", "describe", "explain", "write", "predict", "debug". Emphasise that "write a query" requires the full SQL statement in the correct clause order.