Unions
The UNION operation is a fundamental part of SQL that serves a very different purpose than JOINs or subqueries. Let’s break it down comprehensively to grasp the concept in depth.
By understanding UNION, you add a powerful tool for data aggregation and reporting to your SQL toolkit, allowing you to solve a different class of problems than JOINs or subqueries address.
1. What is a SQL UNION?
Section titled “1. What is a SQL UNION?”A SQL UNION operator is used to combine the result sets of two or more SELECT statements into a single result set. It stacks the results vertically.
Core Concept: UNION is used for concatenating rows, not for combining columns like a JOIN.
Basic Syntax:
Section titled “Basic Syntax:”SELECT column1, column2 FROM table1UNIONSELECT column1, column2 FROM table2;2. Why Use UNION? (The “Why”)
Section titled “2. Why Use UNION? (The “Why”)”UNION is designed for specific scenarios where data needs to be aggregated from different sources or queries into a single, unified list.
- Combine Similar Data from Different Tables: Useful when you have the same category of data split across multiple tables (e.g., a
2023_salestable and a2024_salestable). - Merge Data from Different Perspectives: To create a combined list from different queries against the same table (e.g., getting a list of both the highest-paid and lowest-paid employees).
- Data Normalization and Reporting: Essential for creating comprehensive reports that need to pull together disparate but similar data sets.
- Alternative to Full Outer Join (sometimes): In databases that don’t support
FULL OUTER JOIN(like MySQL),UNIONcan be used to simulate one by combining aLEFT JOINand aRIGHT JOIN.
3. When and How to Use UNION? (The “When” & “How”)
Section titled “3. When and How to Use UNION? (The “When” & “How”)”The key to using UNION lies in understanding its strict rules.
A. UNION vs. UNION ALL
Section titled “A. UNION vs. UNION ALL”This is the most important distinction and a very common interview question.
| Feature | UNION | UNION ALL |
|---|---|---|
| Purpose | Combines result sets and removes duplicate rows. | Combines result sets and keeps all duplicate rows. |
| Performance | Slower because it must sort and deduplicate the entire combined result set. | Faster because it simply appends the results without any checks. |
| When to Use | When you know there might be duplicates and you want a distinct list. | When you know there are no duplicates, or when duplicates are meaningful (e.g., counting items). |
Example: Get a distinct list of all cities where either employees or departments are located.
SELECT city FROM employees_locationsUNIONSELECT city FROM departments_locationsORDER BY city; -- ORDER BY can only be used on the final resultExample: Get a combined list of all product names, including duplicates if they exist.
SELECT product_name FROM current_productsUNION ALLSELECT product_name FROM discontinued_products;B. The Rules of UNION
Section titled “B. The Rules of UNION”For a UNION to work, each SELECT statement must adhere to two rules:
- Same Number of Columns: Each
SELECTmust have the same number of columns. - Compatible Data Types: The corresponding columns in each
SELECTmust have compatible data types (e.g.,VARCHARtoTEXT,INTtoBIGINT). The column names do not need to be the same; the column names from the firstSELECTstatement will be used in the final result.
Visualization with Mermaid:
This diagram shows how UNION and UNION ALL process the data differently.
flowchart TD
A[Start UNION Operation] --> B{Which Operator?};
B --> C[UNION];
B --> D[UNION ALL];
subgraph C_Flow [UNION Process]
direction TB
C1[Execute Query 1] --> C2[Execute Query 2];
C2 --> C3[Combine All Rows];
C3 --> C4[Sort Combined Results];
C4 --> C5[Remove Duplicate Rows];
C5 --> C6[Final Distinct Result];
end
subgraph D_Flow [UNION ALL Process]
direction TB
D1[Execute Query 1] --> D2[Execute Query 2];
D2 --> D3[Append Results No Deduplication];
D3 --> D4[Final Result with All Rows];
end
C --> C_Flow;
D --> D_Flow;
Example of the Rules: This query will work even though the column names are different.
-- Gets a combined list of people's names and company namesSELECT first_name AS 'Name' FROM employees -- Column is VARCHARUNIONSELECT department_name FROM departments -- Column is also VARCHARUNIONSELECT 'Fixed String'; -- Compatible with VARCHAR4. UNION vs. JOIN: The Fundamental Difference
Section titled “4. UNION vs. JOIN: The Fundamental Difference”This is a crucial interview topic. UNION and JOIN solve completely different problems.
| Operation | Combines Data… | Visual Analogy | Purpose |
|---|---|---|---|
| JOIN | Horizontally (adds more columns) | Venn Diagram (overlap) | Combine related data from different tables based on a key. |
| UNION | Vertically (adds more rows) | Stacking Lists | Aggregate similar data from different sources into one column list. |
Example Scenario: You have an employees table and a departments table.
-
A JOIN Question:
“Show me employees alongside their department name.” (You need columns from both tables).
SELECT e.first_name, d.department_nameFROM employees eINNER JOIN departments d ON e.department_id = d.department_id;Result: A wide table with columns from both tables.
-
A UNION Question:
“Get a single list of all names we have in the system, both employee names and department names.”
SELECT first_name AS 'Name' FROM employeesUNIONSELECT department_name FROM departments;Result: A single column list with values from both tables stacked on top of each other.
5. Practical Use Cases and Interview Questions
Section titled “5. Practical Use Cases and Interview Questions”Common Use Cases:
Section titled “Common Use Cases:”- Combining Historical and Current Data:
SELECT sale_id, amount, '2023' AS year FROM sales_2023UNION ALLSELECT sale_id, amount, '2024' AS year FROM sales_2024;
- Creating Summary Reports:
SELECT 'High Salary' AS category, COUNT(*) FROM employees WHERE salary > 100000UNIONSELECT 'Medium Salary', COUNT(*) FROM employees WHERE salary BETWEEN 50000 AND 100000UNIONSELECT 'Low Salary', COUNT(*) FROM employees WHERE salary < 50000;
- Simulating a FULL OUTER JOIN in MySQL:
-- Find all employees and departments, showing matches and unassigned/missing onesSELECT e.name, d.department_nameFROM employees eLEFT JOIN departments d ON e.dept_id = d.dept_idUNIONSELECT e.name, d.department_nameFROM employees eRIGHT JOIN departments d ON e.dept_id = d.dept_id;
Classic Interview Questions:
Section titled “Classic Interview Questions:”-
“What is the difference between
UNIONandUNION ALL?”Answer:
UNIONremoves duplicate records from the combined result set, whileUNION ALLincludes all records, including duplicates.UNION ALLis faster because it doesn’t require the database to process duplicates. -
“What are the rules for using a
UNIONoperator?”Answer: Each
SELECTstatement within theUNIONmust have the same number of columns. The corresponding columns must also have compatible data types. -
“When would you use a
UNIONinstead of aJOIN?”Answer: I would use a
JOINwhen I need to combine columns from different tables based on a relationship to create a wider result. I would use aUNIONwhen I need to combine rows from different queries or tables into a single, unified list, essentially making the result set longer. -
“How would you combine data from two tables with different structures?”
Answer: I would use
UNIONby selecting the same number of columns and usingNULLplaceholders or constant values for missing columns to make the structures match.SELECT product_id, product_name, NULL AS supplier_name FROM productsUNIONSELECT supplier_id, NULL, supplier_name FROM suppliers;