DATAVERSITY’s second annual Data Architecture Summit brought more than 350 data architects, CIOs, CDOs, IT directors, and other data professionals together in Chicago this year to learn, network, and discuss the challenges and trends surrounding enterprise data architecture and platforms. Conference programming and conversations revealed that even with rapid advances in technologies – such as those associated with AI, cloud, and data science – many of the traditional challenges of enterprise data warehouse initiatives remain relevant today, including effective data governance, metadata management, and data quality. Data is still data, except now there’s more of it to manage, arriving faster than ever – hence the need for AI assistance in the world of data management.
Many attendees weren’t shopping for specific solutions; most were more interested in hearing about broader topics such as best practices and recommendations for new data engineering efforts, team requirements, and conceptual understanding of new data platforms. More than one attendee described conference goals that went beyond the specific mechanics of data engineering. Instead, common concerns related to the operational aspects of communication between analytics delivery teams (including data engineering) and core data teams from which they get the data they need. At Radiant Advisors, we typically recommend that clients establish a dedicated core data management team that supports analytics teams as a matter of organizational strategy; this emphasizes the customer service aspect of their work while enabling others as part of the core data services mission.
Other discussions revealed the conundrum of whether new business-side data engineering teams should operate with great autonomy, and thus be forced to become full-stack developers; such an approach grants freedom but lacks sufficient guidance in areas where technical skills may be thin. Whether the subject is Kubernetes, Docker, or centralized best practices with GitHub, these teams would benefit from a Data Engineering Competency Center for DataOps – a modern equivalent of the standards and practices that traditional BI Competency Centers (BICCs) established.
The session “Architecting an Analytics Solution on Amazon Web Services” by Gabriel Villa from RevGen Partners prompted good discussions, as attendees questioned whether AWS Glue was a capable replacement for their Informatica or DataStage deployments on AWS EC2 or suited only for lightweight, basic transformations. The discussion really centered on common drivers for migrating off of traditional ETL servers and what that migration strategy would look like. While this was a valuable discussion, the point of Gabriel’s session was instead to demonstrate the ease of provisioning services with AWS. Within 45 minutes he was able to create an S3 bucket, load a CSV file, create an RDS instance, set up a Glue and Lambda task to load the RDS database, and then demonstrate how the Amazon Machine Learning service could use that newly created data pipeline to predict customer churn.
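For readers curious what that 45-minute provisioning sequence might look like, the steps could be sketched with the AWS CLI roughly as follows. This is a minimal illustration, not the session’s actual commands: the bucket, database, job, and function names are placeholders, and the IAM roles and script locations are assumptions.

```shell
# Hypothetical sketch of the demo's provisioning steps using the AWS CLI.
# All resource names, account IDs, and roles are illustrative placeholders.

# 1. Create an S3 bucket and load a CSV file into it
aws s3 mb s3://example-churn-data
aws s3 cp customers.csv s3://example-churn-data/raw/customers.csv

# 2. Create an RDS instance to serve as the pipeline target
aws rds create-db-instance \
  --db-instance-identifier churn-db \
  --db-instance-class db.t3.micro \
  --engine mysql \
  --allocated-storage 20 \
  --master-username admin \
  --master-user-password 'ChangeMe123!'

# 3. Register a Glue job that transforms the CSV and loads the RDS database
#    (the ETL script itself would live in S3)
aws glue create-job \
  --name load-churn-data \
  --role arn:aws:iam::123456789012:role/GlueServiceRole \
  --command Name=glueetl,ScriptLocation=s3://example-churn-data/scripts/load_rds.py

# 4. Create a Lambda function to trigger the load when new files arrive
aws lambda create-function \
  --function-name trigger-churn-load \
  --runtime python3.12 \
  --role arn:aws:iam::123456789012:role/LambdaExecRole \
  --handler handler.main \
  --zip-file fileb://trigger.zip
```

In practice these commands require valid AWS credentials and pre-existing IAM roles, and the RDS instance must finish provisioning before the Glue job can load it – but the sketch conveys the point of the session: each piece of the pipeline is a single service call away.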
Graph databases and data catalogs were both top discussion topics, as attendees asked about practical applications, how they integrate into enterprise environments, and real-world success stories. Graph databases were widely recognized as highly flexible for storing the complexities of business context for data in a fast-changing data world. Data catalog discussions usually first tackled how to compare different data catalog products, which often prompted the thought-provoking question, “Is the data catalog a product or a feature?” Others discussed the relationship between their existing metadata repository and new data catalogs: Is there a difference between top-down and bottom-up grassroots efforts for capturing departmental tribal knowledge? Is the data catalog relegated only to self-service data efforts and data lakes? And is it necessary to purchase a data catalog when many data prep and data lake management ecosystems may include one as part of their offering?
Personally, I disagreed with a couple of the key statements presented during the Summit (as happens with most people at most conferences). First, one speaker recommended that architecture projects go find application project budgets in order to secure funding. In my experience, application teams don’t appreciate having a “tax” placed on their projects to pay for something that benefits other teams while increasing their own requested budget, duration, and risk. We promote the approach that any architecture project or initiative must justify its own ROI and business value to stand on its own. Architecture should be able to quantify ROI in faster analytics delivery, better data quality, or a new analytic capability. Second, rather than committing to a rigid three- to five-year recommended roadmap for architecture frameworks or conceptual architectures – however compelling and visionary they may be – it is essential to incorporate periodic reviews and evaluations to assess the impact of changes in the industry. I believe that enterprise architects should shift their mindset to an agile, evolving architecture that accounts for course adjustments as business priorities and technologies change over time.
Overall, the event demonstrated growth since the inaugural Summit last year and provided a valuable forum for architects to learn, network, and exchange ideas with peers.