MESSAGE
DATE | 2014-12-13 |
FROM | Ruben Safir
|
SUBJECT | Subject: [LIU Comp Sci] (fwd) Re: relational algrabra division?
|
From owner-learn-outgoing-at-mrbrklyn.com Sat Dec 13 23:26:57 2014 Return-Path: X-Original-To: archive-at-mrbrklyn.com Delivered-To: archive-at-mrbrklyn.com Received: by mrbrklyn.com (Postfix) id E766E16115D; Sat, 13 Dec 2014 23:26:56 -0500 (EST) Delivered-To: learn-outgoing-at-mrbrklyn.com Received: by mrbrklyn.com (Postfix, from userid 28) id D1314161163; Sat, 13 Dec 2014 23:26:56 -0500 (EST) Delivered-To: learn-at-nylxs.com Received: from mailbackend.panix.com (mailbackend.panix.com [166.84.1.89]) by mrbrklyn.com (Postfix) with ESMTP id 2259916115D for ; Sat, 13 Dec 2014 23:26:56 -0500 (EST) Received: from panix2.panix.com (panix2.panix.com [166.84.1.2]) by mailbackend.panix.com (Postfix) with ESMTP id D7CCC13F98 for ; Sat, 13 Dec 2014 23:26:55 -0500 (EST) Received: by panix2.panix.com (Postfix, from userid 20529) id AF86F33C87; Sat, 13 Dec 2014 23:26:55 -0500 (EST) From: Ruben Safir To: learn-at-nylxs.com Subject: [LIU Comp Sci] (fwd) Re: relational algrabra division? User-Agent: tin/2.0.0-20110823 ("Ardenistiel") (UNIX) (NetBSD/6.1.5 (i386)) Message-Id: <20141214042655.AF86F33C87-at-panix2.panix.com> Date: Sat, 13 Dec 2014 23:26:55 -0500 (EST) Sender: owner-learn-at-mrbrklyn.com Precedence: bulk Reply-To: learn-at-mrbrklyn.com
-- forwarded message -- X-Received: by 10.50.27.65 with SMTP id r1mr2469700igg.7.1418390449334; Fri, 12 Dec 2014 05:20:49 -0800 (PST) X-Received: by 10.182.65.229 with SMTP id a5mr7778obt.26.1418390449132; Fri, 12 Dec 2014 05:20:49 -0800 (PST) Path: reader1.panix.com!panix!usenet.stanford.edu!h15no18576442igd.0!news-out.google.com!jh1ni11954igb.0!nntp.google.com!h15no18576431igd.0!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail Newsgroups: comp.databases.theory Date: Fri, 12 Dec 2014 05:20:48 -0800 (PST) In-Reply-To: Complaints-To: groups-abuse-at-google.com Injection-Info: glegroupsg2000goo.googlegroups.com; posting-host=107.207.76.36; posting-account=eTE9_AoAAAD1dS9O9Ccywd_vfKFzS40A NNTP-Posting-Host: 107.207.76.36 References: User-Agent: G2/1.0 MIME-Version: 1.0 Message-ID: <1affebc8-525b-4862-ae5a-5ee4ae196d2c-at-googlegroups.com> Subject: Re: relational algrabra division? From: -CELKO- Injection-Date: Fri, 12 Dec 2014 13:20:49 +0000 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Lines: 155 Xref: panix comp.databases.theory:75273
Relational division is one of the eight basic operations in Codd's relation= al algebra. The idea is that a divisor table is used to partition a dividen= d table and produce a quotient or results table. The quotient table is made= up of those values of one column for which a second column had all of the = values in the divisor. There is a really good presentation on four ways to = do this at: http://www.cs.arizona.edu/people/mccann/research/divpresentatio= n.pdf
This is easier to explain with an example. We have a table of pilots and th= e planes they can fly (dividend); we have a table of planes in the hangar (= divisor); we want the names of the pilots who can fly every plane (quotient= ) in the hangar. To get this result, we divide the PilotSkills table by the= planes in the hangar.=20
CREATE TABLE PilotSkills=20 (pilot CHAR(15) NOT NULL,=20 plane CHAR(15) NOT NULL,=20 PRIMARY KEY (pilot, plane)); =20 PilotSkills=20 pilot plane=20 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D 'Celko' 'Piper Cub' 'Higgins' 'B-52 Bomber' 'Higgins' 'F-14 Fighter' 'Higgins' 'Piper Cub' 'Jones' 'B-52 Bomber' 'Jones' 'F-14 Fighter' 'Smith' 'B-1 Bomber' 'Smith' 'B-52 Bomber' 'Smith' 'F-14 Fighter' 'Wilson' 'B-1 Bomber' 'Wilson' 'B-52 Bomber' 'Wilson' 'F-14 Fighter' 'Wilson' 'F-17 Fighter' =20 CREATE TABLE Hangar (plane CHAR(15) NOT NULL PRIMARY KEY); =20 Hangar plane=20 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D 'B-1 Bomber' 'B-52 Bomber' 'F-14 Fighter' =20 PilotSkills DIVIDED BY Hangar pilot =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D 'Smith' 'Wilson'
In this example, Smith and Wilson are the two pilots who can fly everything= in the hangar. Notice that Higgins and Celko know how to fly a Piper Cub, = but we don't have one right now. In Codd's original definition of relationa= l division, having more rows than are called for is not a problem.=20
The important characteristic of a relational division is that the CROSS JOI= N (Cartesian product) of the divisor and the quotient produces a valid subs= et of rows from the dividend. This is where the name comes from, since the = CROSS JOIN acts like a multiplication operator.=20
Division with a Remainder
There are two kinds of relational division. Division with a remainder allow= s the dividend table to have more values than the divisor, which was Codd's= original definition. For example, if a pilot can fly more planes than just= those we have in the hangar, this is fine with us. The query can be writte= n in SQL-89 as
SELECT DISTINCT pilot=20 FROM PilotSkills AS PS1 WHERE NOT EXISTS=20 (SELECT * FROM Hangar WHERE NOT EXISTS=20 (SELECT * FROM PilotSkills AS PS2 WHERE (PS1.pilot =3D PS2.pilot) AND (PS2.plane =3D Hangar.plane)));
The quickest way to explain what is happening in this query is to imagine a= n old World War II movie where a cocky pilot has just walked into the hanga= r, looked over the fleet, and announced, "There ain't no plane in this hang= ar that I can't fly!" We are finding the pilots for whom there does not exi= st a plane in the hangar for which they have no skills. The use of the NOT = EXISTS() predicates is for speed. Most SQL systems will look up a value in = an index rather than scan the whole table. The SELECT * clause lets the que= ry optimizer choose the column to use when looking for the index.=20
This query for relational division was made popular by Chris Date in his te= xtbooks, but it is not the only method nor always the fastest. Another vers= ion of the division can be written so as to avoid three levels of nesting. = While it is not original with me, I have made it popular in my books.=20
SELECT PS1.pilot=20 FROM PilotSkills AS PS1, Hangar AS H1 WHERE PS1.plane =3D H1.plane GROUP BY PS1.pilot=20 HAVING COUNT(PS1.plane) =3D (SELECT COUNT(plane) FROM Hangar);
There is a serious difference in the two methods. Burn down the hangar, so = that the divisor is empty. Because of the NOT EXISTS() predicates in Date's= query, all pilots are returned from a division by an empty set. Because of= the COUNT() functions in my query, no pilots are returned from a division = by an empty set.=20
In the sixth edition of his book, INTRODUCTION TO DATABASE SYSTEMS (Addison= -Wesley; 1995 ;ISBN 0-201-82458-2), Chris Date defined another operator (DI= VIDEBY ... PER) which produces the same results as my query, but with more = complexity.=20
Exact Division
The second kind of relational division is exact relational division. The di= vidend table must match exactly to the values of the divisor without any ex= tra values.=20
SELECT PS1.pilot=20 FROM PilotSkills AS PS1 LEFT OUTER JOIN Hangar AS H1 ON PS1.plane =3D H1.plane GROUP BY PS1.pilot=20 HAVING COUNT(PS1.plane) =3D (SELECT COUNT(plane) FROM Hangar) AND COUNT(H1.plane) =3D (SELECT COUNT(plane) FROM Hangar); =20 This says that a pilot must have the same number of certificates as there p= lanes in the hangar and these certificates all match to a plane in the hang= ar, not something else. The "something else" is shown by a created NULL fro= m the LEFT OUTER JOIN.
Please do not make the mistake of trying to reduce the HAVING clause with a= little algebra to:
HAVING COUNT(PS1.plane) =3D COUNT(H1.plane)=20
because it does not work; it will tell you that the hangar has (n) planes i= n it and the pilot is certified for (n) planes, but not that those two sets= of planes are equal to each other.
Note on Performance=20
The nested EXISTS() predicates version of relational division was made popu= lar by Chris Date's textbooks, while the author is associated with populari= zing the COUNT(*) version of relational division. The Winter 1996 edition o= f DB2 ON-LINE MAGAZINE (http://www.db2mag.com/96011ar:htm) had an article e= ntitled "Powerful SQL:Beyond the Basics" by Sheryl Larsen which gave the re= sults of testing both methods. Her conclusion for DB2 was that the nested E= XISTS() version is better when the quotient has less than 25% of the divide= nd table's rows and the COUNT(*) version is better when the quotient is mor= e than 25% of the dividend table.=20
-- end of forwarded message --
|
|