Several years ago I had to create Oracle DB structure for one of my side projects. One of the data types to be stored contained results of certain observation. Each object had some meta-information (like timestamp, name of the observer, location and so on) plus dynamic collection of various key-value pairs – zero or more per observation. I went with the classic “one to many” pattern – main table with observation metadata and additional table for key-value pairs, connected via observation ID. And this worked pretty well for some time…
OBSERVATIONS: OBS_VALUES: +----+-----------+-----+ +--------+-----+-------+ | ID | TIMESTAMP | ... | | OBS_ID | KEY | VALUE | +----+-----------+-----+ +--------+-----+-------+ | 1 | | | | 1 | A | 123 | | 2 | | | | 1 | B | 456 | | . | | | | 2 | A | ... | | . | | | | . | | | +----+-----------+-----+ +--------+-----+-------+
Recently I worked on the next version of that tool. One of the requests that I got was less trivial than others – “we need to find all observations in given time period that do not have key-value records with some given key”. It took me time to recall about outer joins (I rarely work with databases directly nowadays), but the initial version of the query was ready almost instantly:
SELECT obs.id FROM observations obs LEFT JOIN obs_values v ON obs.id=v.obs_id WHERE key is null
Unfortunately, this worked only for observations that did not have any key-value pairs at all – the rest of the records had real data rows on the right side of the joined table and did not match the NULL key condition. So, the final version was bit more complex and took much more time to produce. I had to “enrich” main table with additional column containing the key in question (see line 2 below) – and only then I was able to use proper left outer join on both record ID and the key, which resulted in “null” row parts for records with no key in the values table.
SELECT obs.id FROM (SELECT *,? as key from observations) obs LEFT JOIN obs_values v ON obs.id=v.obs_id and obs.key=v.key WHERE v.key is null
It was fun to recall a bit of Oracle world, but I’m glad I don’t have to deal with that in my everyday work 🙂
Nice example Anton but can use a bit clarification – you could use:
SELECT DISTINCT obs.id FROM
(SELECT *, ? as key from observations) obs
LEFT JOIN obs_values v ON obs.id=v.obs_id and obs.key=v.key
WHERE v.key is null
and it would work. Why did you have to go with the more obscure variation that uses DUAL?
I was not aware that you can select constant from any table, not just DUAL…
Thank you for pointing that out!
BTW, are there any performance differences between those options? (And what is wrong with using DUAL?)
In a perfect world there should not be any performance difference but ‘the dual approach’ specifies a redundant nested join which a savvy enough optimizer may identify and transform into the simpler approach – this is not trivial because dual could have contained more rows. The optimizer usually have an estimate of the number of rows of a table, not exactly the count so it may be premature for it to switch to the simpler form. Oracle may be more intimately familiar with dual and thus may ‘know’ it to have just one row so maybe for Oracle this is easier.
Regardless of the above discussion, the simpler form is also portable to other databases and, frankly, the nice bit of advice in the post is more about adding the key to be able to use the LOJ so the simpler the query that shows it, the better.
According to the docs, DUAL always contains single row (https://docs.oracle.com/cd/B19306_01/server.102/b14200/queries009.htm), so Oracle optimizer should probably be able to deal with that. Yet, I fully agree with you that having it cross-platform and without the need to rely on optimizer sounds much better option. I will update the post shortly.
Thank you for the advice!
Just found that starting from 10g, Oracle does not use physical DUAL table in most of the cases (actually, all but fetching DUMMY column).
https://docs.oracle.com/cd/E11882_01/server.112/e41084/queries009.htm#SQLRF20036