Friday, October 2, 2009

Trees & Hierarchies in SQL


Hierarchies and trees are a challenge in SQL.
This is an example of a very simple tree

Parent ID
ID
Name

0001
Earth
0001
0002
EU
0001
0003
USA
0002
0004
UK
0002
0005
France
0004
0006
England
0006
0007
Yorkshire

With our application the Parent ID and ID columns are both GUIDs. To help with readability I have used numbers in the examples.


This looks like
Earth
  EU
    France
    UK
      England
        Yorkshire
  USA


The table structure above will work very well in most cases. It's very easy to select the parent of an item and select the children of an item. You can easily add new items or delete items. You get problems when you need to find things like all the decedents of an item.

You can use CTE (common table expressions) but under heavy load and with lots of data it just does not work.

We looked at the nested model. This does not work with simple deletes and inserts. (http://www.developersdex.com/gurus/articles/112.asp )

The final approach we took was to add an incremental number to our table and a row path column. As follows :

Row bitint
RowPath varbinary(160)
When we generate the rowpath we pad it with zeros so each row number starts at the same place. We do not use delimiters we used fixed spaces.

This is a simple representation of how the table will look. This is simple as I am just padding the rowpath with 2 zeros (not 16).

Row
Parent ID
ID
Name
RowPath
1

0001
Earth
01
2
0001
0002
EU
0102
3
0001
0003
USA
0103
4
0002
0004
UK
010204
5
0002
0005
France
010205
6
0004
0006
England
01020406
7
0006
0007
Yorkshire
0102040607


This is an example of how the data looks in SQL with it padded with 16 zeros.

0x0000000000000795
0x00000000000007950000000000000796
0x0000000000000797
0x00000000000007970000000000000798
0x000000000000079700000000000007980000000000000799
0x00000000000007970000000000000798000000000000079A
0x00000000000007970000000000000798000000000000079B
To get the child items, then we need to run the following SQL :

@path is the path that we need to find the child items from
@pathplusone is our path plus one, see the function below for how to increment this.
Select Name from Table where RowPath > @path and RowPath < @pathplusone


We increment the path with the following SQL

CREATE FUNCTION [dbo].[IncrementRowPath]
(
    @VarBinary VARBINARY(160) = 0x0000000000000000
)
RETURNS VARBINARY(160)
AS
BEGIN
    DECLARE @Result    VARBINARY(160)
    -- If the data is the wrong size.
    IF(@VarBinary IS NULL OR DATALENGTH(@VarBinary) % 8 != 0)
        SET @Result = 0x0000000000000000
    -- Increment the last 8 byte segment.
    ELSE
    BEGIN
        DECLARE @LastSegment BIGINT
        SET @LastSegment = CAST(SUBSTRING(@VarBinary, DATALENGTH(@VarBinary) - 7, 8) AS BIGINT)
        SET @Result = SUBSTRING(@VarBinary, 1, DATALENGTH(@VarBinary) - 8) + CONVERT(VARBINARY(8), @LastSegment + 1)
    END
    RETURN @Result
END

I hope this helps you out with trees.

The only issue is that you need and index on rowpath, this index is going to be large, but in our testing and our live implementations it's not caused an issue.

1 comment:

  1. This is really really great!!!! I had to stumble across something like the first exmaple in the past, it works, however, this is much much better.

    ReplyDelete