We debunk another commonly proposed approach for making a row-store perform like a column-store: vertically partitioning a row-store. Vertical partitioning is a technique that some DBAs use to improve performance on read-mostly data warehouse workloads. The idea is to store an n-column table in n new tables. Each of these new tables contains two columns: a tuple ID column and a data value column from the original table.
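As a rough illustration of the layout described above, here is a minimal Python sketch that splits a small table into one two-column (tuple ID, value) table per original column and then rejoins only the columns a query needs. The table contents and the function names `vertically_partition` and `reconstruct` are hypothetical, chosen for illustration only; a real row-store would do this with ordinary tables and joins on the tuple ID.

```python
def vertically_partition(columns, rows):
    """Split an n-column table into n two-column tables of (tuple_id, value)."""
    partitions = {name: [] for name in columns}
    for tuple_id, row in enumerate(rows):
        for name, value in zip(columns, row):
            partitions[name].append((tuple_id, value))
    return partitions


def reconstruct(partitions, wanted):
    """Rejoin only the requested columns by matching on tuple_id."""
    lookups = [dict(partitions[name]) for name in wanted]
    tuple_ids = sorted(lookups[0])
    return [tuple(lookup[tid] for lookup in lookups) for tid in tuple_ids]


# Hypothetical three-column table (assumption, not from the post).
columns = ["customer", "region", "amount"]
rows = [("alice", "east", 30), ("bob", "west", 45), ("carol", "east", 25)]

parts = vertically_partition(columns, rows)
region_amount = reconstruct(parts, ["region", "amount"])
```

Note that every partition repeats the tuple ID, and answering a multi-column query requires a join on it; the sketch makes both costs visible without measuring them.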
Continue reading "Debunking Another Myth: Column-Stores vs. Vertical Partitioning" »
Consider a traditional, row-oriented database. Indexes are known to improve performance in database systems. They can greatly reduce I/O costs by avoiding the need to perform table scans since they directly contain the data you need to answer a query or contain pointers to such data. If you have a query that accesses only two out of thirty columns from a large table, and you have an index on these two columns, then you can use the indexes to avoid scanning all of the data in a table.
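The two-out-of-thirty-columns scenario can be tried out with Python's built-in `sqlite3` module. This is only a sketch of the premise, not of the post's conclusion: the table name `wide`, the column names `c0` through `c29`, the index name, and the data are all assumptions made up for the example, and SQLite stands in for whatever row-store you have in mind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical 30-column table; names and values are illustrative only.
column_defs = ", ".join(f"c{i} INTEGER" for i in range(30))
conn.execute(f"CREATE TABLE wide ({column_defs})")
conn.executemany(
    f"INSERT INTO wide VALUES ({', '.join('?' * 30)})",
    [tuple(row * 100 + i for i in range(30)) for row in range(1000)],
)

# Index on exactly the two columns the query touches.
conn.execute("CREATE INDEX idx_c0_c1 ON wide (c0, c1)")

query = "SELECT c0, c1 FROM wide WHERE c0 >= 50000"

# Ask SQLite how it plans to run the query; the last field of each
# plan row is a human-readable description of the step.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_text = " ".join(str(step[-1]) for step in plan)

result = conn.execute(query).fetchall()
print(plan_text)
```

When the index contains every column the query needs, the plan description mentions a covering index, meaning the base table's other 28 columns are never read.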
Continue reading "Debunking a Myth: Column-Stores vs. Indexes" »
Both column-stores and data cubes are designed to provide high performance on analytical database workloads (often referred to as Online Analytical Processing, or OLAP). These workloads are characterized by queries that select a subset of tuples, and then aggregate and group along one or more dimensions. In this post, we study how column-stores and data cubes would evaluate a query on a sample database.
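To make the workload shape concrete, here is a small Python sketch of one select, group, and aggregate query answered two ways: by scanning only the needed columns, and by reading from a precomputed cube of per-cell totals. The data, the column names, and both function names are hypothetical and are not the sample database from the post.

```python
from collections import defaultdict

# Hypothetical fact table stored column by column (assumption).
region = ["east", "west", "east", "west", "east"]
year = [2007, 2007, 2008, 2008, 2008]
sales = [10, 20, 30, 40, 50]


def columnar_query(min_year):
    """Scan only the three needed columns, filter, then group by region."""
    totals = defaultdict(int)
    for r, y, s in zip(region, year, sales):
        if y >= min_year:
            totals[r] += s
    return dict(totals)


# A tiny "cube": total sales precomputed for every (region, year) cell.
cube = defaultdict(int)
for r, y, s in zip(region, year, sales):
    cube[(r, y)] += s


def cube_query(min_year):
    """Answer the same question from the precomputed cells."""
    totals = defaultdict(int)
    for (r, y), cell_total in cube.items():
        if y >= min_year:
            totals[r] += cell_total
    return dict(totals)


from_columns = columnar_query(2008)
from_cube = cube_query(2008)
```

Both paths return the same totals; the difference is whether the aggregation work happens at query time or was paid for in advance when the cube was built.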
Continue reading "Understanding the Difference Between Column-Stores and OLAP Data Cubes" »
In a two-part post, we will illustrate how -- using a database query optimizer as an example -- the strategy of retrofitting databases in a distributed, shared-nothing grid computing architecture can fail. This argument will require some understanding of how centralized query optimizers work. Therefore, we will divide this discussion into two parts. The first installment will provide a background on centralized query optimization; the second installment will show why retrofitting a centralized query optimizer to work on the grid can lead to poor performance when evaluating queries.
Continue reading "Designing Systems for the Grid: The Problem with "Retrofitting," Part 1" »
There will soon be a myriad of announcements of DBMS offerings in the cloud. Many of these will NOT be marriages made in heaven. However, the most innovative new DBMS software, married to new cloud computing services, is here today and truly takes advantage of the cloud architecture to change the economics and the responsiveness of business analytics.
Continue reading "DBMS innovations that will make analytics in the cloud a reality" »
Daniel's research interests are in database system architecture and implementation, cloud computing, and the Semantic Web. He currently serves on the Yale computer science faculty as an Assistant Professor. At Yale he teaches both undergraduate and graduate level classes on database systems, and directs DR@Y, the database research group at Yale....
Continue reading "Database Column contributor: Daniel Abadi" »
Cloud computing is ushering in a new era of analytic data management for business intelligence by enabling organizations to analyze terabytes of data faster and more economically than ever before. The key change: it's delivered on demand. This alternative to traditional, in-house data analytics infrastructure will transform the economics of BI and open up many new possibilities for organizations of all sizes.
Continue reading "There's a bright cloud on the horizon ... and it will transform the economics of BI" »
In this post, Mike Stonebraker tackles two issues with regard to row- versus column-store databases. First, he looks at performance challenges given the demands of users. Second, he discusses the availability of third-party connectivity as well as automatic database design tools.
Continue reading "Supporting Column Store Performance Claims" »
In this response to a Curt Monash post over at the DBMS2 blog, Mike Stonebraker offers his reactions. He sees two categories of relational analytic/data warehouse databases, row stores and column stores, and notes that they have very different characteristics and should not be lumped together. He also points out that if high performance is required, current high-end relational engines can be beaten by a factor of 80 or so on TPC-C.
Continue reading "In response to Monash's post on the four categories of RDBMS" »
In this post, Mike Stonebraker comments on a post over at DBMS2 titled "Database management system choices - overview." Mike makes two points. First, he offers his list of the different types of DBMSs that he sees as viable. Second, he discusses OLTP and the shared nothing architecture.
Continue reading "Responding to Monash's recent post on diversity of database systems" »