These days, the phrase “data models are dead” keeps surfacing in heated conversations among IT application development teams, business intelligence (BI) teams, and data management vendors. It is driven by the confluence of several recent major trends in IT, BI, and technology that challenge the classic data modeling paradigm. The real debate, however, should be about how semantics are analyzed or discovered, and where that definition should be maintained for data going forward.
One major driver in this debate is the current shift in technology adoption: the rise of NoSQL data stores, such as flexible, schema-less key-value stores, along with the mainstream acceptance of analytic databases that leverage similar columnar and in-memory architectural benefits for BI. These technologies allow data elements to be arranged into “tuples” (records based on a programmer’s definition) outside of the physical data model, and they answer the business’s ever-increasing demand that applications be built more quickly and flexibly for competitive advantage (i.e., being first to market and adapting faster than competitors). There is also growing acceptance of the fact that the business and its analysts don’t always know up front what is needed, and instead want to discover it through user interaction.
Living More and More Without Data Models
From a programmer-centric view, data in key-value format matches the objects that applications load data into at execution time. Object databases never became as mainstream as many had hoped, and the “object-to-relational” mapping layer gained traction with incumbent relational databases instead. Primarily, it has been flexibility, adaptability, and speed that have driven many application developers to key-value stores: these stores move the semantic definition away from the rigid, structured physical data model and into the application layer, where a developer can make simple changes or add data elements in code, then simply compile and redeploy. Depending on the application at hand, many developers are also embracing document stores as their data repository.
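To make the point about semantics living in the application layer concrete, here is a minimal sketch in Python. It assumes an in-memory dictionary as a stand-in for a key-value store and a hypothetical `Customer` record; none of these names come from a real product. The store holds only opaque strings, so adding the `loyalty_tier` element is purely a code change.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Optional

# Stand-in for a key-value store: the store only sees opaque string values.
store: Dict[str, str] = {}


@dataclass
class Customer:
    """Hypothetical application-side definition of a customer record."""
    name: str
    email: str
    # Added in a later release; no change on the store side was required.
    loyalty_tier: Optional[str] = None


def save(key: str, customer: Customer) -> None:
    """Serialize the object; its meaning is known only to this code."""
    store[key] = json.dumps(asdict(customer))


def load(key: str) -> Customer:
    """Rebuild the object; fields missing from old records take defaults."""
    return Customer(**json.loads(store[key]))


# A record written before loyalty_tier existed still loads after redeploy.
store["cust:1"] = json.dumps({"name": "Ada", "email": "ada@example.com"})
save("cust:2", Customer("Grace", "grace@example.com", loyalty_tier="gold"))
```

The trade-off is visible in the same sketch: nothing in `store` tells a downstream consumer what a customer is or that `loyalty_tier` exists. That knowledge sits entirely in the `Customer` class.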
BI developers, on the other hand, have been finding value in what key-value stores and schema-flexible platforms (like Hadoop) have to offer from both an information discovery and an analysis perspective. Once again, when you remove the semantic definition (or perspective bias) from data, analysts are able to discover, test, and witness new relationships among data elements. They can work with semantic definitions quickly and iteratively through abstraction or data virtualization layers above the data. Testing semantic definitions early in BI projects is proving invaluable for attaining a more complete understanding of data quality and for avoiding business rule issues that could be disruptive and costly later. Finally, the interactive process involving business users alongside analysts and modelers is proving to produce more accurate BI products faster, much like the agile BI process.
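The idea of testing competing semantic definitions over the same untyped data can be sketched as follows. This is an illustrative Python example only: the event records and the two definitions of “revenue” are invented for the sketch, and plain functions stand in for a virtualization layer.

```python
from typing import Dict, List

# Raw records stored with no semantic definition attached (invented data).
raw_events: List[dict] = [
    {"cust": "a", "type": "order", "amount": 120.0},
    {"cust": "a", "type": "return", "amount": 20.0},
    {"cust": "b", "type": "order", "amount": 50.0},
]


def revenue_gross(events: List[dict]) -> Dict[str, float]:
    """Candidate definition 1: revenue is the sum of all orders."""
    totals: Dict[str, float] = {}
    for event in events:
        if event["type"] == "order":
            totals[event["cust"]] = totals.get(event["cust"], 0.0) + event["amount"]
    return totals


def revenue_net(events: List[dict]) -> Dict[str, float]:
    """Candidate definition 2: revenue is orders minus returns."""
    totals = revenue_gross(events)
    for event in events:
        if event["type"] == "return":
            totals[event["cust"]] = totals.get(event["cust"], 0.0) - event["amount"]
    return totals


# Analysts and business users can compare both views before one is agreed on.
gross = revenue_gross(raw_events)
net = revenue_net(raw_events)
```

Because each definition is just a view over the raw records, a disagreement about what “revenue” means surfaces early as a difference between `gross` and `net`, before anything has been fixed in a physical model.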
Can’t Live Without Data Models
As we see more applications move the semantics of data into their application layer and away from physical data models, we must also recognize that those applications are the source systems for many data warehouses (DW). If the business meaning of operational data sits in the application itself and not in the physical database, then BI analysts and integration specialists are flying blind or, worse yet, may misrepresent operational data in the DW.
When working with application and BI development teams, we have seen two approaches (or a hybrid of the two) work well. First, we argue that “an order is an order” for well-understood entities in operational data models, encouraging application teams to make only those parts of their data models “dynamic” where they need to be, as in sub-typing. Such a model would have a super-type for product with its respective sub-types, plus one specialized sub-type that allows new product sub-types to be created dynamically and migrated later to a formal sub-type. This approach satisfies the need for flexibility and speed. Second, metadata models support the desire for metadata-driven applications while providing the BI team with a “key” to unlock data semantics and advance warning of dynamic data changes.
Most important, however, is the distinction that data models exist not only for an application’s use, but also to persist data within context for many information consumers throughout the enterprise. BI and information applications not only deliver reports and information, but also support (and should encourage) ad-hoc requests and analytics within the proper context. Data models, especially in BI, are becoming part of the data governance umbrella that governs whether data is made available to the right people, at the right time, and used properly.
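Both approaches can be sketched together in a few lines of Python. All class, function, and attribute names here are hypothetical: `Product` is the super-type, `Book` is a formal sub-type, `DynamicProduct` is the specialized sub-type for sub-types created at run time, and `subtype_registry` plays the role of a very small metadata model that a BI team could read.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Product:
    """Super-type: the stable, well-understood part of the model."""
    sku: str
    name: str


@dataclass
class Book(Product):
    """A formal sub-type with a fixed, modeled attribute."""
    isbn: str = ""


@dataclass
class DynamicProduct(Product):
    """Specialized sub-type for product sub-types created at run time."""
    subtype_name: str = ""
    attributes: Dict[str, Any] = field(default_factory=dict)


# Minimal metadata model: sub-type name -> declared attribute names and types.
subtype_registry: Dict[str, Dict[str, str]] = {}


def register_subtype(name: str, attribute_types: Dict[str, str]) -> None:
    """Declare a dynamic sub-type so that other teams can see it coming."""
    subtype_registry[name] = dict(attribute_types)


def new_dynamic_product(sku: str, name: str, subtype_name: str,
                        **attributes: Any) -> DynamicProduct:
    """Create a dynamic product, rejecting attributes the metadata lacks."""
    declared = subtype_registry[subtype_name]
    unknown = set(attributes) - set(declared)
    if unknown:
        raise ValueError(f"undeclared attributes: {sorted(unknown)}")
    return DynamicProduct(sku, name, subtype_name, attributes)


register_subtype("ebook", {"file_format": "str", "size_mb": "float"})
ebook = new_dynamic_product("E-1", "Modeling Notes", "ebook", file_format="epub")
```

In this sketch the registry is the “key” for the BI team: a new entry in `subtype_registry` is the advance warning, and a dynamic sub-type that stabilizes can later be promoted to a formal class alongside `Book`.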
There will always be a strong need for a reference data warehouse. With good data governance, this data platform will enable business users to have self-service capabilities and prevent the misuse of information that could cripple an organization.
Where Data Models are Born
What is being discussed today is not really whether the data model itself is dead, but rather how analysis is being conducted (increasingly in a discovery-oriented way) and where the results of that analysis, the context, should be persisted. Sometimes context should reside in application code, which can deal with change faster; sometimes it should reside in physical data models, which can ensure that as many business users as possible share a commonly agreed upon and proper context for decision-making.
Modern data platforms balance and integrate both flexible and structured data stores, such as Hadoop and RDBMSs, but it is analytics lifecycle methodologies that will enable information discovery, and governance that will decide whether to migrate and manage analytics throughout the enterprise.
Modeling is about performing thorough analysis and understanding of the business; the resulting data models should represent the data persisted by the business in databases or virtual data layers. Key-value stores may be where a discovery process – as a form of analysis – leads to the “birth of data models,” which then can be properly persisted for business information consumers to share and leverage.