The Strata Data conference once again took over New York’s Javits Center last week, celebrating its 9th year in New York City. This conference seemed markedly different from the super-techy Hadoop, data science, and deep learning focus of the past, with more emphasis on solving data needs and challenges across the whole organization – including the business side.
During the opening keynotes, O’Reilly Media Chief Data Scientist Ben Lorica shared research on the state of the data industry and highlighted important themes:
- Data ingestion and processing
- Data governance, catalogs, and lineage
- Data quality
- Data management and storage
- Data lakes, evolved
- Business intelligence and visualization / Analytics and machine learning
These themes were certainly evident in the conference sessions, the vendor exhibit hall, and conversations among thought leaders and attendees. Three areas led the way: artificial intelligence and machine learning; architecture constructs, particularly platforms, containers, and hybrid and multicloud; and data governance, including data catalog and lineage.
AI and ML: Now is the Time
The dominant trend was certainly anything related to AI/ML. Vendors, conference planners, and participants wanted to be at the front of the wave and make AI relevant to the data crowd, while practitioners and speakers sought to take AI beyond the hype. (In light of this, it’s unsurprising that in 2020 the Strata Data conference will be co-located with the Strata AI conference.) It was common to hear “Without data you can’t have AI,” and “Without quality data you won’t have good AI” – after all, if you use bad data for machine learning, you will have a poorly functioning and inaccurate model.
In his keynote chat with Tim O’Reilly, founder of O’Reilly Media, IBM’s Rob Thomas pointed to real-world examples in different industry verticals and stated that “AI is going to be about empowering humans to do their jobs better.” This will largely occur through automation that enables people to focus on more important work. Addressing the fear factor behind AI, Thomas stated: “AI isn’t going to replace managers, but managers that use AI will replace managers that do not.” Accordingly, data science skills are becoming increasingly important.
Architecture: the Foundation for AI/ML and Analytics
IBM’s Thomas also stated “There is no AI without IA [information architecture],” adding that architecture considerations are not the primary challenge. The supporting technology exists, largely through hybrid and multicloud options and platforms designed to unify data ecosystems. During Cloudera’s opening keynote with IBM’s Hillery Hunter, Hunter shared research insights into the landscape for hybrid and multicloud. Notably, 94% of enterprises are using multiple clouds today, and two-thirds of those are using multiple public clouds.
Further, Cloudera announced the general availability of its Cloudera Data Platform (CDP), which it is promoting as the way for IT and business organizations to be agile and flexible. Its “enterprise data cloud” is meant to let users quickly do more with data by providing multifunction analytics in a secure, governed, open manner. Cloudera emphasized how CDP supports various data needs across different roles throughout the enterprise, enabling the business to act with agility and IT to respond quickly and appropriately to business challenges.
Architecture considerations were a topic frequently tackled by speakers and vendors. Many acknowledged that most organizations face complex combinations of hybrid and multicloud scenarios, spanning public and private clouds as well as their own data centers. With the goal of speeding the time to insight, cloud or on-prem platforms must support all the data ingestion, processing, self-service analytics, workflows, AI/ML development and scoring, and orchestration required for the enterprise. DevOps is positioned to develop these capabilities and deliver use cases while continuously refining and improving along the way.
Governance: Now at the Forefront
One of the evident ways the Strata Data Conference has evolved over the years is that it now encompasses broader data needs from the business side as well as the tech/engineering side. This was clear in the prominent theme of data governance – and the number of attendees who came specifically for the topic. Clearly, data governance is now accepted as necessary to safely and appropriately address the data needs of today’s self-enabled users across the data environment. Enterprises are embracing governance as more than just regulatory compliance or risk avoidance. Many vendors and speakers emphasized the foundational need for a data catalog, data lineage, and metadata management to help organizations know what data they have, where it is, and what state it’s in – both for protection/regulation and for practical use.
Evolution of Messaging, Evolution of Technology, Evolution of Needs
We now live in a world where data and analytics are pervasive. Ultimately, the journey, as Rob Thomas spelled it out, is from data collection to organization, to analysis, to infusing the data and models throughout the organization.
The diverse job titles of attendees at Strata are representative of different roles in the enterprise – each with its specific data needs and resources. Fortunately, conferences, learning resources, and technology are advancing to create and support teams of enabled, empowered people working together for the most impact.
I would like to thank the Cloudera Analyst Relations team for hosting the Cloudera Analyst conference before Strata, yielding greater insights.