Following last week’s blog Roy asked why DBMS vendors continue to implement logical rows as physical rows indistinguishable from records in files and suggested that we at the conclusion of his comment he questioned how far we’d come since the mid-70′s. Bill Maimone, Chief Architect at Actian, had the following thoughts on the topic:
Actually, we’ve come a long way since the mid-70′s.
The beginning of the forward march of progress was making that separation between logical tables and physical tables. This abstraction made it easy to write a program that accessed those records without depending on the actual physical location or layout of the records. It’s not that one couldn’t have written programs like this in 1975. The code base I was working in back then, written in Fortran and Macro, separated out data access routines into separate modules, and when record formats changed only those few modules needed to modified. Nonetheless it was not all that convenient. Adding a field or changing widths required not just the isolated module changes, but also conversion of old existing data files (made less fun at the time as most data sets were on archived on punch cards). And I remember noticing that the one guy who designed and wrote those low level data access routines was the smartest, most senior person in the team. One of many advances of formal database products was to take care of the messy business of physical layout, largely maintaining compatibility of existing programs and data. In the early days of relational databases I saw quite a few new customers adopt relational databases by creating one table for each existing physical file — a clear affront to the relational purist. At first I thought this was strange; why buy a relational database and then use like a file system? Because these customers saw tremendous value in the “DBMS”, even if they didn’t care so much yet about the “R”. You could add, widen, and shuffle fields in a file without breaking programs, and you didn’t have to be a genius to do it. As an added bonus these new “RDBMS” products enabled new access paths and indexing, including access spanning multiple files.
And then the real fun starts. The rows in a relational table might seem at first to correspond to physical file records, and certainly you’re welcome to think of them that way. But databases give you much more. The logical record might correspond to a physical table structured as a heap, a b-tree, or a hash table, or the table might be horizontally or vertically partitioned as with Stonebraker’s Vertica. Depending on the form of request, the same piece of data might be retrieved from the table itself, or from a secondary index. Your “table” might actually be composed by joining together or subsetting several tables. The accessing program doesn’t need to know the actual physical structure, and the same program can run against data sets with different physical layouts. Certainly, sometimes a row in a table is just like a record in a file, but quite often they aren’t at all alike. To this programmer we have indeed come a very long way from the mid-70′s.