SQL 多表查询是数据库操作的核心技能之一,主要用于从多个关联表中整合和分析数据。以下是主要的多表查询技术和应用场景:
一、主要多表查询类型
1. JOIN 连接查询
-- INNER JOIN(内连接):返回两个表匹配的记录
SELECT orders.order_id, customers.name
FROM orders
INNER JOIN customers ON orders.customer_id = customers.customer_id;
-- LEFT JOIN(左连接):返回左表所有记录,右表匹配的记录
SELECT employees.name, departments.dept_name
FROM employees
LEFT JOIN departments ON employees.dept_id = departments.dept_id;
-- RIGHT JOIN(右连接):返回右表所有记录,左表匹配的记录
SELECT orders.order_id, customers.name
FROM orders
RIGHT JOIN customers ON orders.customer_id = customers.customer_id;
-- FULL OUTER JOIN(全外连接):返回两个表的所有记录
SELECT employees.name, departments.dept_name
FROM employees
FULL OUTER JOIN departments ON employees.dept_id = departments.dept_id;
2. UNION 联合查询
-- 合并多个查询结果(去重)
SELECT product_name FROM products_2023
UNION
SELECT product_name FROM products_2024;
-- 合并所有结果(不去重)
SELECT city FROM suppliers
UNION ALL
SELECT city FROM customers;
3. 子查询(Subqueries)
-- 在 WHERE 中使用子查询
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- 在 FROM 中使用子查询(派生表)
SELECT dept_name, avg_salary
FROM (
SELECT dept_id, AVG(salary) as avg_salary
FROM employees
GROUP BY dept_id
) dept_stats
JOIN departments ON dept_stats.dept_id = departments.dept_id;
-- 在 SELECT 中使用子查询
SELECT
order_id,
order_date,
(SELECT name FROM customers WHERE customer_id = orders.customer_id) as customer_name
FROM orders;
二、高级多表查询技巧
1. 多表 JOIN 链式连接
-- 连接三个或更多表
SELECT
orders.order_id,
customers.name,
products.product_name,
order_items.quantity,
suppliers.supplier_name
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id
JOIN order_items ON orders.order_id = order_items.order_id
JOIN products ON order_items.product_id = products.product_id
JOIN suppliers ON products.supplier_id = suppliers.supplier_id;
2. 自连接(Self Join)
-- 查询员工的经理信息
SELECT
e.employee_name as employee,
m.employee_name as manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;
3. CROSS JOIN(笛卡尔积)
-- 生成所有可能的组合
SELECT sizes.size_name, colors.color_name
FROM sizes
CROSS JOIN colors;
三、性能优化建议
1. 索引优化
-- 为连接字段创建索引
CREATE INDEX idx_customer_id ON orders(customer_id);
CREATE INDEX idx_order_id ON order_items(order_id);
2. 查询优化技巧
- 使用 EXISTS 代替 IN(当子查询结果集较大时)
- 避免 SELECT *,只选择需要的列
- 使用 LIMIT 限制返回的行数进行测试
- 合理使用临时表和CTE(Common Table Expressions)
3. EXPLAIN 分析
EXPLAIN SELECT * FROM orders
JOIN customers ON orders.customer_id = customers.customer_id;
四、实际应用场景
1. 电商数据分析
-- 分析客户购买行为
SELECT
c.customer_id,
c.name,
COUNT(o.order_id) as order_count,
SUM(oi.quantity * p.price) as total_spent,
MAX(o.order_date) as last_order_date
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
GROUP BY c.customer_id, c.name
ORDER BY total_spent DESC;
2. 库存管理查询
-- 查询需要补货的产品
SELECT
p.product_id,
p.product_name,
p.current_stock,
s.supplier_name,
s.reorder_level
FROM products p
JOIN suppliers s ON p.supplier_id = s.supplier_id
WHERE p.current_stock < p.minimum_stock;
五、现代SQL特性
1. CTE(公共表表达式)
WITH sales_summary AS (
SELECT
product_id,
SUM(quantity) as total_sold,
SUM(quantity * price) as total_revenue
FROM order_items
JOIN products USING(product_id)
GROUP BY product_id
)
SELECT
p.product_name,
ss.total_sold,
ss.total_revenue
FROM sales_summary ss
JOIN products p ON ss.product_id = p.product_id;
2. 窗口函数结合多表查询
SELECT
d.dept_name,
e.employee_name,
e.salary,
AVG(e.salary) OVER(PARTITION BY d.dept_id) as avg_dept_salary
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id;
最佳实践总结
明确需求:先确定需要哪些数据,来自哪些表
选择合适的JOIN类型:根据业务逻辑选择INNER、LEFT、RIGHT等
优化查询性能:使用索引,避免不必要的连接
保持可读性:使用表别名,格式化SQL语句
测试验证:先用少量数据测试,再应用到生产环境
掌握多表查询是进行复杂数据分析的基础,合理运用这些技术可以大幅提升数据处理效率和数据洞察能力。