I guess its time to refresh my sql - moving into finance means I could be running a lot of adhoc sql queries.

I am convinced I have used correlated subqueries before but had never heard of the term.
A correlated subquery is a query where the subquery uses values from the outer queries where clause.


SELECT empnum, name
FROM employee AS e1
WHERE salary > (SELECT avg(salary)
FROM employee
WHERE department = e1.department);

e1.department joins with the where cause of the other query.

This causes the query engine to execute the inner query for each match on the outer 'where' cause
This could produce some very inefficient queries. I guess as a programmer you can think of it as a 'for loop' - as in get all the data matching the where cause, loop though each row and perform another query.


mysql> select * from employee;
+------+-------+--------+---------------+
| id | name | salary | department_id |
+------+-------+--------+---------------+
| 2 | paul | 101 | 1 |
| 1 | bob | 100 | 1 |
| 3 | randy | 300 | 2 |
| 4 | sally | 305 | 2 |
+------+-------+--------+---------------+
4 rows in set (0.00 sec)


mysql> SELECT avg(salary) FROM employee
-> ;
+-------------+
| avg(salary) |
+-------------+
| 201.5000 |
+-------------+
1 row in set (0.06 sec)


mysql> SELECT id, name FROM employee AS e1 WHERE salary > (SELECT avg(salary) FROM employee WHERE department_id = e1.department_id);
+------+-------+
| id | name |
+------+-------+
| 2 | paul |
| 4 | sally |
+------+-------+
2 rows in set (0.00 sec)

The above query gives employees who have above average salary.

It much more effient to create a in memory table called a derived table called sal_ave using the syntax 'AS':

mysql> select name from employee e, (SELECT department_id, avg(salary) salary FROM employee group by department_id) AS sal_ave where e.salary > sal_ave.salary and e.department_id = sal_ave.department_id ;
+-------+
| name |
+-------+
| paul |
| sally |
+-------+
2 rows in set (0.00 sec)

Another common question in interviews is what is the difference between a WHERE and a HAVING cause

"HAVING specifies a search condition for a group or an aggregate function used in SELECT statement.

HAVING can be used only with the SELECT statement. HAVING is typically used in a GROUP BY clause. When GROUP BY is not used, HAVING behaves like a WHERE clause.

A HAVING clause is like a WHERE clause, but applies only to groups as a whole, whereas the WHERE clause applies to individual rows. A query can contain both a WHERE clause and a HAVING clause.

The optimizer always automatically analyzes each statement you enter and selects an efficient means of executing it. In general, that means eliminating undesired rows in earlier clauses."