SQL: Find a Row That You Don’t Have

LEFT OUTER JOINSeveral years ago I had to create Oracle DB structure for one of my side projects. One of the data types to be stored contained results of certain observation. Each object had some meta-information (like timestamp, name of the observer, location and so on) plus dynamic collection of various key-value pairs – zero or more per observation. I went with the classic “one to many” pattern – main table with observation metadata and additional table for key-value pairs, connected via observation ID. And this worked pretty well for some time…

OBSERVATIONS:               OBS_VALUES:
+----+-----------+-----+	+--------+-----+-------+
| ID | TIMESTAMP | ... |	| OBS_ID | KEY | VALUE |
+----+-----------+-----+	+--------+-----+-------+
|  1 |           |     |	|      1 | A   |   123 |
|  2 |           |     |	|      1 | B   |   456 |
|  . |           |     |	|      2 | A   |   ... |
|  . |           |     |	|      . |     |       |
+----+-----------+-----+	+--------+-----+-------+

Recently I worked on the next version of that tool. One of the requests that I got was less trivial than others – “we need to find all observations in given time period that do not have key-value records with some given key”. It took me time to recall about outer joins (I rarely work with databases directly nowadays), but the initial version of the query was ready almost instantly:

SELECT obs.id FROM observations obs
LEFT JOIN obs_values v ON obs.id=v.obs_id
WHERE key is null

Unfortunately, this worked only for observations that did not have any key-value pairs at all – the rest of the records had real data rows on the right side of the joined table and did not match the NULL key condition. So, the final version was bit more complex and took much more time to produce. I had to “enrich” main table with additional column containing the key in question (see line 2 below) – and only then I was able to use proper left outer join on both record ID and the key, which resulted in “null” row parts for records with no key in the values table.

SELECT obs.id FROM
(SELECT *,? as key from observations) obs
LEFT JOIN obs_values v ON obs.id=v.obs_id and obs.key=v.key
WHERE v.key is null

It was fun to recall a bit of Oracle world, but I’m glad I don’t have to deal with that in my everyday work 🙂