SQL: Find a Row That You Don’t Have

July 28, 2016April 9, 2020 ~ Anton Khitrenovich

LEFT OUTER JOIN Several years ago I had to create Oracle DB structure for one of my side projects. One of the data types to be stored contained results of certain observation. Each object had some meta-information (like timestamp, name of the observer, location and so on) plus dynamic collection of various key-value pairs – zero or more per observation. I went with the classic “one to many” pattern – main table with observation metadata and additional table for key-value pairs, connected via observation ID. And this worked pretty well for some time…

OBSERVATIONS:               OBS_VALUES:
+----+-----------+-----+	+--------+-----+-------+
| ID | TIMESTAMP | ... |	| OBS_ID | KEY | VALUE |
+----+-----------+-----+	+--------+-----+-------+
|  1 |           |     |	|      1 | A   |   123 |
|  2 |           |     |	|      1 | B   |   456 |
|  . |           |     |	|      2 | A   |   ... |
|  . |           |     |	|      . |     |       |
+----+-----------+-----+	+--------+-----+-------+

Recently I worked on the next version of that tool. One of the requests that I got was less trivial than others – “we need to find all observations in given time period that do not have key-value records with some given key”. It took me time to recall about outer joins (I rarely work with databases directly nowadays), but the initial version of the query was ready almost instantly:

SELECT obs.id FROM observations obs
LEFT JOIN obs_values v ON obs.id=v.obs_id
WHERE key is null

Unfortunately, this worked only for observations that did not have any key-value pairs at all – the rest of the records had real data rows on the right side of the joined table and did not match the NULL key condition. So, the final version was bit more complex and took much more time to produce. I had to “enrich” main table with additional column containing the key in question (see line 2 below) – and only then I was able to use proper left outer join on both record ID and the key, which resulted in “null” row parts for records with no key in the values table.

SELECT obs.id FROM
(SELECT *,? as key from observations) obs
LEFT JOIN obs_values v ON obs.id=v.obs_id and obs.key=v.key
WHERE v.key is null

It was fun to recall a bit of Oracle world, but I’m glad I don’t have to deal with that in my everyday work 🙂

Published by Anton Khitrenovich

View all posts by Anton Khitrenovich

5 thoughts on “SQL: Find a Row That You Don’t Have”

Dror Harari says:

July 29, 2016 at 01:16

Nice example Anton but can use a bit clarification – you could use:

SELECT DISTINCT obs.id FROM
(SELECT *, ? as key from observations) obs
LEFT JOIN obs_values v ON obs.id=v.obs_id and obs.key=v.key
WHERE v.key is null

and it would work. Why did you have to go with the more obscure variation that uses DUAL?

Reply
1. Anton Khitrenovich says:
  
  July 29, 2016 at 14:28
  
  I was not aware that you can select constant from any table, not just DUAL…
  Thank you for pointing that out!
  
  BTW, are there any performance differences between those options? (And what is wrong with using DUAL?)
  
  Reply
  1. Dror Harari says:
    
    July 31, 2016 at 16:43
    
    In a perfect world there should not be any performance difference but ‘the dual approach’ specifies a redundant nested join which a savvy enough optimizer may identify and transform into the simpler approach – this is not trivial because dual could have contained more rows. The optimizer usually have an estimate of the number of rows of a table, not exactly the count so it may be premature for it to switch to the simpler form. Oracle may be more intimately familiar with dual and thus may ‘know’ it to have just one row so maybe for Oracle this is easier.
    
    Regardless of the above discussion, the simpler form is also portable to other databases and, frankly, the nice bit of advice in the post is more about adding the key to be able to use the LOJ so the simpler the query that shows it, the better.
    
    Reply
    1. Anton Khitrenovich says:
      
      July 31, 2016 at 22:28
      
      According to the docs, DUAL always contains single row (https://docs.oracle.com/cd/B19306_01/server.102/b14200/queries009.htm), so Oracle optimizer should probably be able to deal with that. Yet, I fully agree with you that having it cross-platform and without the need to rely on optimizer sounds much better option. I will update the post shortly.
      
      Thank you for the advice!
      
      Reply
    2. Anton Khitrenovich says:
      
      July 31, 2016 at 22:36
      
      Just found that starting from 10g, Oracle does not use physical DUAL table in most of the cases (actually, all but fetching DUMMY column).
      https://docs.oracle.com/cd/E11882_01/server.112/e41084/queries009.htm#SQLRF20036
      
      Reply

SQL: Find a Row That You Don’t Have

Related

Published by Anton Khitrenovich

5 thoughts on “SQL: Find a Row That You Don’t Have”

Leave a ReplyCancel reply