Celko's SQL Puzzle
From IBM Database Magazine Wiki
SQL PUZZLE #1
If you remember your freshman data structures class, there was a chapter on graphs. These were general diagrams made up of nodes and edges that model everything from roadways to flowcharts. The edges connect pairs of nodes and might have a weight assigned to them to represent distances or cost of a trip on that edge.
Here is a simple five-node graph. Your job is to write an SQL statement that will give us the cheapest paths between any two nodes. The table that holds your data is what is called an adjacency list for obvious reasons.
CREATE TABLE Graph
(source_node CHAR(2) NOT NULL,
dest_node CHAR(2) NOT NULL,
CHECK (source_node <> dest_node),
wgt INTEGER DEFAULT 0 NOT NULL
CHECK (wgt >= 0),
PRIMARY KEY (source_node, dest_node));
INSERT INTO Graph
VALUES('a', 'd', 1),
('d', 'e', 1),
('e', 'c', 1),
('c', 'b', 1),
('b', 'd', 1);
Notice that we do not allow a node to be its own source and destination in this table. There is only one edge between any two nodes and all the weights are one. Yet another simplification for you is that this is a functional graph. That means each node has one and only one exiting edge so that each node models a function -- many inputs, but only one output.
That gives the graph some important properties that we will see shortly.
Here is a first attempt in SQL/PSM code to get you started.
CREATE PROCEDURE FirstTry()
LANGUAGE SQL
MODIFIES SQL DATA
BEGIN
DECLARE old_row_count INTEGER;
DECLARE new_row_count INTEGER;
SET old_row_count = 0;
SET new_row_count = (SELECT COUNT(*) FROM Graph); WHILE old_row_count < new_row_count
DO INSERT INTO Graph (source_node, dest_node, wgt)
SELECT DISTINCT G1.source_node, G2.dest_node, (G1.wgt + G2.wgt)
FROM Graph AS G1, Graph AS G2
WHERE G1.dest_node = G2.source_node
AND NOT EXISTS
(SELECT *
FROM Graph AS G3
WHERE G1.source_node = G3.source_node
AND G2.dest_node = G3.dest_node
AND G3.wgt <= (G1.wgt + G2.wgt));
SET old_row_count = new_row_count;
SET new_row_count = (SELECT COUNT(*) FROM Graph);
END WHILE;
END;
The idea is that you concatenate the adjacent edges and add up their weights and insert the constructed rows into the table. During each iteration, the paths get longer and longer until no more can be added. If you test it on the original graph, it works just fine and stops when it gets to paths of weight 4.
However, if you add more nodes, more edges or change the weights, things do not work out so well. Your real task is to fix this code so that it actually works in the general case.
Ideas? Talk about them on the Discussion page. See Solution: Celko's SQL Puzzle when you're ready.
About Joe Celko
Joe Celko served 10 years on ANSI/ISO SQL Standards Committee and contributed to the SQL-89 and SQL-92 Standards. He has written more than 800 columns in the computer trade and academic press, mostly dealing with data and databases.
Celko is the author of seven books on SQL for Morgan-Kaufmann: SQL for Smarties (1995, second edition 1999, third edition 2005), SQL Puzzles & Answers (1997), Data & Databases (1999) and Trees & Hierarchies in SQL (2004), SQL Programming Style (2005), Analytics & OLAP in SQL (2005), Thinking in Sets (2008) and Data Measurements & Standards in SQL (working title, 2009)
