2012/03/18

Double joins

I finally groked something I was probably told to do 5 years ago. Actually I have no memory of being told to do this trick, but I should have been. It's so sweet. It's allowed me to rewrite and rethink the database queries for my archiving app.

If you don't know that a table join is, you'll have to learn about that first. So go do that and come back.

OK, now what is so sweet about a double join? First off, I'll point out that I'm not joining 3 tables together, I'm joining 2 tables together TWICE.

Why would I do this? A common idiom is to have a main table that contains all the fields that every widget (documents, articles in your store, whatever) has and another table that contains one or more rows per-widget that contain fields that only that widget or only that class of widget has. The alternative is to have one table that contains a field for every possible field of every possible class of widget, which can be very wasteful. And then you have to MODIFY TABLE each time you want to add or remove a field. Or you could have a table per class of widget, which reduces waste, but increases complexity.

So, a hypothetical ecommerce schema would looks roughly like:

CREATE TABLE main (
    int SKU PRIMARY KEY,
    timestamp created,
    int price,
    int in_stock,
    int category
);

CREATE TABLE extra {
    int SKU PRIMARY KEY,
    varchar(20) field,
    varchar(255) value
};

(Yes, extra.field really should be an ENUM or an INT.)

Now you want to find all the items with the keyword 'tablet', but you want to sort on the french name of the product:

SELECT SKU,E1.value as keyword,ES.value AS sort_field FROM main 
    JOIN extra AS E1 ON main.SKU = extra.SKU AND field = 'keyword'
    JOIN extra AS E2 ON main.SKU = extra.SKU AND field = 'fr'
    WHERE E1.value = 'tablet'
    ORDER BY sort_field;

Look at that, it's like you've temporarily added 2 new fields to main, 'keyword' and 'title'. What's more, to sort in English, I just change 'fr' to 'en'. To sort by manufacturer, change it to 'manufacture.'

Seems to me this invalidates 75% of the reason for NoSQL. The other 25% is that joins like the one above aren't very efficient for huge (aka web-scale) databases, the kind with multimillion records. The answer to that is that RAM IS CHEAP and USE SSDs.

2012/03/16

Shitlist

Added: Agrupur.
For dedicating a entire face of a 2 liter milk jug to an ad for a certain privacy invading website who's name begins with eff.

Though I do give them points for the decent use of whitespace.