Saturday, April 13, 2013
The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd edition, Ralph Kimball
A compact tome with a perfect font for the mature crowd. I like the clear and understandable way in which the various topics are presented, and the use of simple diagrams.
The book's complete title is "Data warehouse toolkit: the complete guide to dimensional modeling". What is dimensional modeling? Chapter 1, "Dimensional modeling primer", will surely explain. Page 1 - nothing, page 2 - nothing... page 8 - nothing, page 9 - "By default, normalized databases are excluded from the presentation area, which should be strictly dimensionally structured". What is "dimensionally structured" though? Have I missed the definition? Leafing back... no, Kimball is just using a concept before he defined it, moving on... Page 10: "Dimensional modeling is a new name for an old technique for making databases simple and understandable". Great, what is it then? Page 11 - "Dimensional modeling is quite different from third-normal-form (3NF) modeling". Yees? Page 12 - nothing, page 13 - "If the presentation area is based on a relational database, then these dimensionally modeled tables are referred to as star schemas". Finally! Now, this sort-of-definition would not help someone who did not know about star schema, but thankfully I do, and anyway, this is the closest thing to a definition that you get - although things start to get clearer on page 16, where fact tables and dimension tables are introduced. The essence of dimensional modeling, it seems, is "Star is good; snowflake is bad".
A couple of pages later, on page 18, I see this passage. "The fact table itself generally has its own primary key made up of a subset of the foreign keys. This key is often called a composite or concatenated key. Every fact table in a dimensional model has a composite key, and conversely, every table that has a composite key is a fact table. Another way to say this is that in a dimensional model, every table that expresses a many-to-many relationship must be a fact table". I am confused, for three reasons. A fact table's primary key is "generally" made up of a subset of foreign keys? This is not the case with Kimball's own first fact table on page 36 - "POS Transaction Number" definitely should be part of a primary key (he does not define one, so I assume), but it does not foreign-key into anything. Oh, and Sentence 3 means it's "always", not "generally", if we follow the "conversely" path. Is the "another way to say this" part true? ... And overall, isn't this all just a confused way to say that fact tables have foreign keys and dimension tables don't? "Stars, no snowflakes". (What's wrong with snowflakes, apart from the increased design complexity? Among the reasons listed on page 60 - views are never mentioned - the technical and the scariest one is "snowflaking defeats the use of bitmap indexes").
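To make the key question above concrete, here is a minimal sketch of a retail star schema in SQLite. The table and column names (`sales_fact`, `date_dim`, `pos_transaction_number`, and so on) are illustrative assumptions, not the book's exact schema, and the choice of primary key is the reviewer's guess rather than anything Kimball states. The sketch checks which primary-key columns of the fact table are not foreign keys.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE date_dim    (date_key    INTEGER PRIMARY KEY, full_date    TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, store_name   TEXT);

-- Fact table: foreign keys to every dimension, plus a transaction number
-- that references no dimension table at all.
CREATE TABLE sales_fact (
    date_key               INTEGER NOT NULL REFERENCES date_dim(date_key),
    product_key            INTEGER NOT NULL REFERENCES product_dim(product_key),
    store_key              INTEGER NOT NULL REFERENCES store_dim(store_key),
    pos_transaction_number INTEGER NOT NULL,
    sales_quantity         INTEGER,
    PRIMARY KEY (pos_transaction_number, product_key)  -- assumed key
);

INSERT INTO date_dim    VALUES (20130413, '2013-04-13');
INSERT INTO product_dim VALUES (1, 'Milk'), (2, 'Bread');
INSERT INTO store_dim   VALUES (1, 'Downtown');
INSERT INTO sales_fact  VALUES (20130413, 1, 1, 1001, 2),
                               (20130413, 2, 1, 1001, 1);
""")

# Columns that take part in the fact table's primary key (pk position > 0).
pk_cols = {row[1] for row in conn.execute("PRAGMA table_info(sales_fact)") if row[5] > 0}

# Columns of the fact table that are foreign keys (the "from" column).
fk_cols = {row[3] for row in conn.execute("PRAGMA foreign_key_list(sales_fact)")}

# Primary-key columns that do not point into any dimension table.
non_fk_pk_cols = pk_cols - fk_cols

# A typical star join: total quantity sold per store.
totals = conn.execute("""
    SELECT s.store_name, SUM(f.sales_quantity)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_key = f.store_key
    GROUP BY s.store_name
""").fetchall()
```

Under these assumptions, `non_fk_pk_cols` contains only `pos_transaction_number`, so the primary key is not "a subset of the foreign keys", which is the inconsistency the paragraph above points at.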
The two examples above are representative of the book's style, and I am quite sure that it could use a lot more editing. I wish that somebody did a better job, but don't know a reasonable substitute. (Yes, I have seen Inmon's book - not a fan). Nonetheless, it's an impressive, concrete book that will give you a lot of practical ideas, and, when it suggests something that looks suboptimal or incomplete or self-contradictory, will make you think about schema design. Not a sufficient reference on the subject, but a very necessary one.
PS. I recommend "Kimball Group Reader" as the alternative to this book: I believe that it covers the material here, and offers a lot of additional information.
I had the fact and dimension tables backwards. A very useful book on modeling a data warehouse. It shows how much thought needs to go into the design of a functional data warehouse.
I read this book in 2007. At first, I was a little put off by the examples not fitting our business; but around the 4th or 5th chapter, it started to make sense. By the end of it, I had enough understanding to design and build a data warehouse that works for our international company. It has been a long process, and it is not finished, but the fundamental design is based on the things discussed in the book. We can now deliver reports that start out with names (raw materials, products, customers) that are all over the map but come out standardized and meaningful.
I highly recommend it if you want to build your own data warehouse or understand the underlying concepts of how data can be standardized when it comes from disparate sources.
After reading the reviews, I thought this would be a good book. After reading the first four chapters, I decided to return it because I believe it is not direct and to the point; there is way too much description to make a simple point.
Having been around data warehousing since 2004, I have relied on Kimball's teaching. Having gone to his training sessions in person, I can say this is as close as you can get to taking him home. I really find the various examples timeless and a great starting point for almost any industry. Anyone starting a data warehouse, or trying to put the concepts together, should give this book a serious look.
Although I got a second hand book, it looks as good as a new book. No tear or wear on the book.
This book is an amazing one. This is the best book to learn about data warehousing. Ralph explains it using very realistic scenarios.
If you are new to Data Warehousing, I say you should definitely read this book. And if you are working, then you sure must go through the individual chapters based on domains. He has explained it very beautifully!!
This is not a book on how to create cubes, a step-by-step tutorial, or a one-time read. It shows what is needed to create cubes. It helps you form the fundamentals and your perception of solutions to problems. It has to be read from cover to cover; the contents and index don't have much significance in this book. After each chapter, writing down the concepts and applying them to your real or imaginary problems will help you understand the next chapter more easily. At times I found it boring and too wordy. But it's not a sweet energy drink to consume quickly before a run. It is a slow, steady study book.
Ralph got it right with this book: focus on the model and the system will perform. This book takes you through step by step on how to model for the data warehouse. It provides practical examples on how to build real industry-relevant data models. It teaches concepts like Conforming Dimensions and Slowly Changing Dimensions. This book is a must for technical AND business practitioners starting out in the BI world.
This is my first Kindle version of any book, so keep this in mind. I just wanted to say that the quality of the scan is not perfect. In some places the words are scrunched together. In other places there seem to be ink blotches. I just realized these blemishes might be in the paperback version too, so if they are, disregard what I'm saying. But if the paperback version does not have them, then the eBook translation is not 100%. No biggie, just my first impression of a Kindle eBook.
Content-wise, I just started reading so I expect this book to still be useful even though it's a bit long in the tooth.
The first chapter is completely filled with run-on sentences. I found this thing impossible to read. SCD is referenced in the index for page 201 with no other references in the index, but the real explanation is on page 403. I think the author is sick in the head and should definitely stop writing books. That is why this one is not a New York Times Bestseller. Scaring 'techs' away with a first chapter like that does not help ideas get across to the reader. I definitely vote that this is badly written. Literary professionals would probably sum it up as: do not bother buying this.
I am a complete newbie to data warehousing, though I have been in the industry for 10 years and I have basic knowledge of databases and data structures. For anyone who needs to understand not only the hows but also the whys of data warehousing, this book is a must-read. After reading just two chapters of this book, I thought I was a master of data warehousing :). I will recommend this book to everyone in the software industry. Even if you are not on a data warehousing project, this book will still be a great asset.
The title for this review of Ralph Kimball's book is chosen with set purpose in mind. Within the past several days, I've completed reading both Kimball's data warehousing tome, here reviewed, and that of Bill Inmon. Now, ostensibly there rages a debate within the corporate data warehousing community between the disciples of Kimball's and Inmon's competing approaches. For this reason, it was interesting and enlightening to read both books in short succession. It is also important to note that my assessment of the debate is influenced by over twenty-five years' worth of experience in the disciplines of logical data modeling and relational database design. And this is the reason for the selection of my review title, mentioned above. I employ a term from the relational world to draw attention to the fact that, when we are talking about databases today, we still must reasonably do so from a relational perspective. And it is this important perspective that is so evidently absent from Kimball's approach.
Kimball's concept is founded on the notion of a "dimensional model" for database. Quite interestingly, Kimball pleads ignorance relative to the question of the actual origins of this dimensional approach. With this, I can be of assistance. In the early days of the Decision Support Software industry, there was a product known as Express. I believe the vendor was Management Decision Sciences, Inc., or something like that. This product competed, at one level, with the Integrated Financial Planning System (IFPS), which was sort of like fancy Fortran, and at another level with the then-emerging world of relational database software. I still remember meetings from back in the early '80s when proponents of Express would argue passionately that data ought to be organized in "cubes", the forerunner, and predecessor, Ralph, of dimensions. Now, when you pinned down the technical folks advocating such an approach, they would finally admit that what they were talking about was really nothing more than a fancy array processor. That's what it was. And that is the essence of this whole "dimensional model" concept.
It is interesting to compare and to contrast the approaches taken by Inmon and Kimball in their respective books on Data Warehousing. Inmon acknowledges that there is a debate extant. He also respectfully cites Kimball's contributions to the debate within the corpus of his text. Kimball is silent on the identity of his rival. And this silence really speaks volumes. He, Kimball, that is, is also strangely silent on even the efficacy of a relational design of any warehouse data structure, finally allowing that you may allow such a thing in a "staging area". But you mustn't let your users know about it. This is the strangest sort of censorship of important corporate data I've ever encountered. Consider the following: Suppose we work for an organization with say, seven million customers. Should we not, in this instance, have a relational database table somewhere that has seven million rows, one row representing each customer? And should not this table be readily available to our user community? These questions are intended to be rhetorical. However, on reading Kimball's book, we judge that he, and his followers, would strongly resist such a common sense line of reasoning.
Kimball's book is noteworthy in so far as he does present many interesting, and potentially useful, designs. However, his mute avoidance of the essence of the ongoing debate says all we really need to know about his outreach. Were the good Dr. Codd, inventor of the Relational Model for Database, alive today, it seems clear that he would give Ralph Kimball a good scolding, and direct him to stick to end user analysis, leaving actual issues of database design to more fully arrived professionals.
Very well written. We are guided step by step with many recaps. The methodology is logical and strong. The authors are very good teachers. From now on this is my reference book for dimensional modeling.
Product Details :
Paperback: 580 pages
Publisher: Big Nerd Ranch Guides; 1 edition (April 7, 2013)
Language: English
ISBN-10: 0321804333
ISBN-13: 978-0321804334
Product Dimensions: 7 x 1.5 x 9.9 inches