MESSAGE
DATE | 2014-11-26
FROM | Ruben
SUBJECT | [LIU Comp Sci] Re: Database Management Systems: CS 649 Announcement - Homework – Relational Algebra, Relational Calculus, and Normalization
On 11/22/2014 07:15 AM, Ping.Chung-at-liu.edu wrote:
> CS 649 Database Management Systems Fall 2014
> Instructor: Prof. Ping-Tsai Chung
> Homework – Relational Algebra, Relational Calculus, and Normalization
> (Total: 200 Points) Due: Dec. 10, 2014 (One day before our Thursday class)
>
> Send your file to pingtsaichung-at-gmail.com

http://lambda-the-ultimate.org/node/3762
Why Normalization Failed to Become the Ultimate Guide for Database Designers?
While trying to find marshall's claim that Alberto Mendelzon says the universal relation is an idea re-invented once every 3 years (and later finding a quote by Jeffrey Ullman that the universal relation is re-invented 3 times a year), I stumbled across a very provocative rant by a researcher/practitioner: Why Normalization Failed to Become the Ultimate Guide for Database Designers? by Martin Fotache. It shares a wealth of experience and knowledge about logical design. The author is obviously well-read and, unlike the usual debates I've seen about this topic, presents the argument thoroughly and comprehensively.
The abstract is:
With an impressive theoretical foundation, normalization was supposed to bring rigor and relevance into such a slippery domain as database design is. Almost every database textbook treats normalization to a certain extent, usually suggesting that the topic is so clear and consolidated that it does not deserve deeper discussion. But the reality is completely different. After more than three decades, normalization not only has lost much of its interest in the research papers, but is also still looking for practitioners to apply it effectively. Despite the vast amount of database literature, comprehensive books illustrating the application of normalization to effective real-world applications are still awaited. This paper reflects the point of view of an Information Systems academic who incidentally has been, for almost twenty years, a practitioner in developing database applications. It outlines the main weaknesses of normalization and offers some explanations about the failure of a generous framework to become the so much needed universal guide for database designers. Practitioners might be interested in finding out (or confirming) some of the normalization misformulations, misinterpretations, inconsistencies and fallacies. Theorists could find useful the presentation of some issues where the normalization theory was proved to be inadequate, not relevant, or a source of confusion.
The body of the paper presents an explanation for why practitioners have rejected normalization. The author also shares his opinion on potentially underexplored ideas, drawing from an obviously well-researched depth of knowledge. In recent years, some researchers, such as Microsoft's Pat Helland, have even said "Normalization is for sissies"
(only to follow this up with later formal publications, such as advocating that we should be Building on Quicksand). Yet the PLT community is pushing for the exact opposite. Language theory is firmly rooted in formal grammars and proven-correct 'tricks' for manipulating and using those formal grammars; it does no good to define a language if it does not have mathematical properties ensuring reliability and repeatability of results. This represents a real tension between systems theory and PLT.
I realize this paper focuses on methodologies for creating model primitives, comparing mathematical frameworks to frameworks guided by intuition and then mapped to mathematical notions (relations in the relational model), and some may not see it as PLT. Others, such as Date, closely relate the understanding of primitives to PLT: Date claims the SQL language is to blame and has gone to the length of creating a teaching language, Tutorial D, to teach relational theory. In my experience, nothing seems to affect the lines of code in an enterprise system more than schema design, both in the data layer and the logic layer, and often an inverse relationship exists between the two; hence the use of object-relational mapping layers to consolidate the inevitable problems where there will be The Many Forms of a Single Fact (Kent, 1988). Mapping stabilizes the problem domain by labeling the correspondences between all the possible unique structures. Among friends and coworkers I refer to this as the N+1 Schema Problem, as there is generally one schema thought to be canonical, either extensionally or intensionally, and N other versions of that schema.
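As a minimal, invented illustration of that N+1 situation (none of these names come from the post or the paper), here is a Python sketch in which one canonical Customer structure has to be reconstructed from two variant schemas that store the same fact differently; the mapping layer is just a labeled set of correspondences, and every new variant adds one more:

    # Hypothetical sketch of the "N+1 Schema Problem": one schema treated as
    # canonical, plus N variants that record the same fact in other shapes.
    from dataclasses import dataclass

    @dataclass
    class Customer:                    # the schema we treat as canonical
        first_name: str
        last_name: str

    def from_billing_row(row: dict) -> Customer:
        # the (invented) billing system stores a single "full_name" column
        first, _, last = row["full_name"].partition(" ")
        return Customer(first, last)

    def from_crm_row(row: dict) -> Customer:
        # the (invented) CRM stores the parts separately, under other names
        return Customer(row["fname"], row["lname"])

    # The mapping layer is a labeled set of correspondences; each additional
    # variant schema adds one more translation that someone has to maintain.
    mappers = {"billing": from_billing_row, "crm": from_crm_row}

    print(mappers["billing"]({"full_name": "Ada Lovelace"}))
    print(mappers["crm"]({"fname": "Ada", "lname": "Lovelace"}))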
*Question: Should interactive programming languages aid practitioners in reasoning about their bad data models, (hand waving) perhaps by modeling each unique structure and explaining how they relate?* I could see several reasons why that would be a bad idea, but as the above paper suggests, math is not always the best indicator of what practitioners will adopt. In many ways this seems to be the spirit of the idea behind such work as Stephen Kell's interest in approaching modularity by supporting evolutionary compatibility between APIs (source texts) and ABIs (binaries), as covered in his Onward! paper, The Mythical Matched Modules: Overcoming the Tyranny of Inflexible Software Construction. Similar ideas have been in middleware systems for years and are known as /wrapper architectures/ (e.g., Don’t Scrap It, Wrap It!), but they haven't seen much PLT interest that I'm aware of; "middleware" might as well be a synonym for Kell's "integration domains" concept.
By Z-Bo at 2010-01-09 00:24 | Critiques | History | other blogs | 26452 reads
live programming
This is sort of related to one of my principles for live programming: the program should always run in a reasonable state, even if it has errors in it. That there is code and that something was specified should always be apparent in the running program, even if what the code does is undefined because of its erroneous state. Likewise, defaults should be reasonable so that we can see things; e.g., if you create a rectangle and forget to set its size, it should not be invisibly small (like in WPF...), but rather something you can see and remember... oops, I forgot to set the size. Likewise, NaN shouldn't mean the shape flies off the screen into imaginary space; perhaps it could just start shaking or something. The point is to provide visible feedback so the programmer can more quickly understand what's wrong.
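A tiny Python sketch of that principle (the class and its defaults are invented here, not taken from WPF or any real UI framework): a rectangle whose size was never set still gets a visible default, and a NaN position shakes the shape in place instead of flinging it off screen:

    # Minimal, hypothetical sketch of "the program should always run in a
    # reasonable state": visible defaults instead of silent degenerate ones.
    import math, random

    DEFAULT_SIZE = 50          # forgot to set a size? show something visible

    class Rectangle:
        def __init__(self, width=DEFAULT_SIZE, height=DEFAULT_SIZE, x=0.0, y=0.0):
            self.width, self.height = width, height
            self.x, self.y = x, y

        def draw_position(self):
            # NaN shouldn't send the shape into imaginary space; keep it on
            # screen and "shake" it so the error is visible to the programmer.
            if math.isnan(self.x) or math.isnan(self.y):
                return (random.uniform(-2, 2), random.uniform(-2, 2))
            return (self.x, self.y)

    r = Rectangle()            # size was never set, but it is still visible
    r.x = float("nan")         # some erroneous computation landed here
    print(r.width, r.height, r.draw_position())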
Likewise, why are systems so brittle? In PL, we expect that a program has one rigid, unambiguous meaning, which means that any bug/mistake will cause the system to explode rather than degrade gracefully. So let's say you fail to read a file because it doesn't exist... why not just log the error and return some random file anyways? Sometimes, it won't even matter. Martin Rinard's work on run-time software patching comes to mind here; e.g., the Living in the comfort zone paper from Onward! 2007.
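In that spirit, a hedged Python sketch (the function and file name are hypothetical, not from Rinard's work) of a read that logs the failure and keeps going with a fallback value instead of exploding:

    # Sketch of "log it and keep going": a file read that degrades gracefully.
    import logging

    def read_config(path, fallback=""):
        try:
            with open(path) as f:
                return f.read()
        except OSError as exc:
            # Record that something went wrong, but hand back a usable value;
            # sometimes the missing file simply won't matter.
            logging.warning("could not read %s: %s; using fallback", path, exc)
            return fallback

    print(read_config("settings-that-do-not-exist.ini"))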
This is not mainstream PLT, but maybe it should be. At any rate, the systems community is pragmatic enough that it is exploring this area fairly well.
By Sean McDirmid at Sat, 2010-01-09 05:25 | login or register to post comments
Normalization Failed?
Seems to me like academic twaddle.
In practice (as opposed to the world of Date), a good understanding of third normal form is the essential starting point for any database designer. That is analysis, not design.
And then, as every serious analysis and design methodology has explained since 1980, you denormalise to support the required processes - as need be.
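A minimal sketch of that workflow, using an invented two-table schema and SQLite from Python (nothing here is prescribed by the comment above): model the facts in roughly third normal form first, then denormalise a copy for the one reporting process that needs it:

    # Hypothetical schema: normalize for analysis, denormalise for a process.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        -- 3NF-ish analysis model: each fact lives in exactly one place
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders   (id INTEGER PRIMARY KEY,
                               customer_id INTEGER REFERENCES customer(id),
                               total REAL);

        -- deliberate denormalisation for a reporting process: the customer
        -- name is copied next to the order so the report never needs the join
        CREATE TABLE order_report (order_id INTEGER, customer_name TEXT, total REAL);
    """)
    db.execute("INSERT INTO customer VALUES (1, 'Acme')")
    db.execute("INSERT INTO orders   VALUES (10, 1, 99.0)")
    db.execute("""INSERT INTO order_report
                  SELECT o.id, c.name, o.total
                  FROM orders o JOIN customer c ON c.id = o.customer_id""")
    print(db.execute("SELECT * FROM order_report").fetchall())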
By grahamberrisford at Sat, 2010-01-09 15:14 | login or register to post comments
I agree
3NF (actually, BCNF) is extremely helpful, especially given the recent changes in hardware (solid state disks) and database research (Adam Marcus's MIT master's thesis on heap file structures suitable for "navigable" relational databases; see BlendDB: Blending Table Layouts to Support Efficient Browsing of Relational Databases). In my book, if the hardware guys can solve the "write problem" with solid state (which I don't believe they have), then you will see a dramatic reshaping of scaling practices. Solid state is simply a game-changer; it removes the "denormalize for performance" advice from the equation, because with constant-time disk access, redundant data actually slows clusters of disks down!
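For anyone who hasn't seen BCNF stated operationally, here is a toy Python check (the movie-theater relation and its dependencies are a textbook-style invention, not from the thesis or paper above): a schema is in BCNF when the left-hand side of every nontrivial functional dependency X -> Y is a superkey:

    # Toy BCNF check via attribute closures (nontrivial FDs assumed).
    def closure(attrs, fds):
        """Attributes determined by `attrs` under functional dependencies `fds`."""
        result = set(attrs)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in fds:
                if set(lhs) <= result and not set(rhs) <= result:
                    result |= set(rhs)
                    changed = True
        return result

    def bcnf_violations(schema, fds):
        # an FD violates BCNF when its left-hand side is not a superkey
        return [(lhs, rhs) for lhs, rhs in fds
                if closure(lhs, fds) != set(schema)]

    # R(title, theater, city) with: theater -> city, (title, city) -> theater
    schema = {"title", "theater", "city"}
    fds = [({"theater"}, {"city"}), ({"title", "city"}, {"theater"})]
    print(bcnf_violations(schema, fds))   # theater -> city violates BCNF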
I am not just pitching this topic out there. I am fairly well-read in relational database theory. You can't just call it academic twaddle. There is real tension between systems theory, database systems theory, and PLT views on how to best solve problems. See: The Great MapReduce Debate and the follow-up, Mike Stonebraker's counterarguments to MapReduce's popularity. Obviously, head technical folks at Google were very much in disagreement with Stonebraker, calling his comparison a "category error" and saying Stonebraker is no longer on the cutting edge (mind you, Stonebraker has the best track record for start-up ventures using cutting-edge research of anybody in IT history; this was like saying Brett Favre should just retire). Outside of Google, others criticized Stonebraker as well. To me, this seems like a modularity problem with database systems, and opens the gateway for using MapReduce-like techniques to help build SELF-* based systems.
as every serious analysis and design methodology has explained since 1980, you denormalise to support the required processes - as need be.
Understanding behavioral requirements (processes) is non-trivial, especially in the face of mergers and acquisitions. This is why model checking tools like Alloy exist (and are based on relational logic). Where I work, we try to avoid enterprise-style integration wherever possible. For clients that don't need it, it is simply more costly and just a development hassle. I agree with Stonebraker here; there is just too much middleware:
I think my pet peeve is one of the things I talked about this morning in my invited talk at SIGMOD 2002: there is just too much middleware. The average corporation has bought a portal system, has bought an enterprise application integration system, has bought an ETL (Extraction, Transformation, and Loading) system, has bought an application server, maybe has bought a federated data system. All of these are big pieces of system infrastructure that run in the middle tier; they have high overlap in functionality, and are complicated, and require system administrators. The average enterprise has more than one of all of these things, and so they have this spaghetti environment of middleware, big pieces of moving parts that are expensive to maintain and expensive to use.
Everyone seems to recognize this problem, and the conventional commercial wisdom is to expand the role of an application server so it does what all of these packages do. Web Sphere, for example, from IBM, is becoming a very, very rich package which does a lot of middleware functionality.
I think a federated database system is a much better base on which to build a middleware platform than is an application server. And the reason is that application servers only manage code, and then the data is relegated to the bottom tier. If an application needs some data, it runs in the middle tier and requests data from the bottom tier. You end up moving data to the code. If you had a federated data system, so that the data manager was running at the middle tier and at the bottom tier---and object-relational engines are perfectly happy to store and activate functions---then code and data could be co-mingled on a platform. And you could then do physical database design in such a way that you put the data near the code that needed it, and you wouldn’t end up shipping the data to the code all the time. I think that’s a much more long-term, robust way to build sophisticated middleware. So I’d work on trying to prove that that was a good idea if I had some more cycles at work---but I don’t.