Insights from Strata Data Conference 2018

September 11-13, 2018, New York

The 2018 Strata Data Conference in New York had a different buzz than the hype and promise of years past. The sentiment this year was a bit more subdued as attendees grappled with how hard it is to operationalize the modern data and analytic platforms necessary to deliver machine learning and artificial intelligence. This shift in focus could be considered a balancing act for AI maturity, combining people (the data scientists), technology (machine learning), and now process (application and operationalization). As the pendulum swings, data governance, environment management, and multi-cloud architectures surfaced as key topics due to their fundamental roles in applying AI and ML for business impact.

Strata Hadoop may have started as the original Hadoop big data and MapReduce conference, but Strata Data has become the data engineering conference for implementing data lakes, data streams, Spark, and Kubernetes to drive enterprise analytics. (Some attendees commented that a search in the Strata event app for “Hadoop” returned zero results.) Conference host O’Reilly Media also has a separate conference dedicated to applied artificial intelligence, which allows Strata Data to focus on the data side, including details regarding data engineering and managing modern data platforms across Hadoop, multiple clouds, or containers for the reliable delivery of business analytics and AI.

Many attendees consider themselves “data engineers” and were typically interested in learning how to make the plumbing of modern data platforms work in a production environment, using new and better data streaming technologies and techniques that reliably deliver and monitor analytics and ML at enterprise scale. Many wanted to know: “Are we doing this right?” or “Can we do this better?” One of the primary challenges in data and analytic platform modernization is maintaining the platform reliably with new and open source technologies on new multi-cloud or hybrid cloud environments, with all of the overhead of the data management work. Technology can help with that, too, and a common theme among vendors at the conference was how automation, abstraction, or AI can mask or assist with managing the complexity of fast-changing and large-scale data integration.

If you looked closely, you could see the re-emergence of the “Data Warehouse” term (worn by Cloudera and new exhibitor Yellowbrick, as well as noticeable elsewhere) as Strata shifts from Big Data to All Data. I believe this is a recognition of the fact that companies still need to grow and evolve their enterprise data warehouses alongside their new enterprise analytic capabilities for self-service data prep and data science. Or it could be due to the progress that Snowflake, noticeably not an exhibitor at the event, is making in marketing its data warehouse-as-a-service offering. Personally, I believe the revival of the term is due to the convergence of data warehousing and big data analytics platforms to deliver comprehensive enterprise analytics while sharing scarce ETL and data engineering resources.

Judging from the messaging, sessions, and attendee discussions, one key takeaway from the event is that Kubernetes is the way forward. As data platforms become ubiquitous, so must the application layer, through containers. The Hortonworks announcement of the Open Hybrid Architecture Initiative backs that, along with similar signals from MapR and Cloudera. Deploying Hadoop in Kubernetes containers in a multi-cloud environment will continue to evolve how modern data and analytics are deployed, moving on from Docker containers in YARN.

Strata typically arranges star-studded keynote presentations, and this year was no exception. Notably, Google Cloud’s Chief Decision Scientist Cassie Kozyrkov presented insights on how to manage the development of AI based on experience from training over 17,000 people on decision intelligence. Kozyrkov defined decision intelligence as data science plus social science plus managerial science, and emphasized that data science cannot be relied upon alone. Echoing Google’s approach, she urged testing AI as much as possible to build the trust it needs, and she reminded everyone that all technology is an echo of the people who designed it. She called upon members of the audience to responsibly apply their skills to increase decision intelligence and to look to make a difference by solving the right problems.

I would like to thank all the vendors that shared their company, product, and customer updates, especially Infoworks, data Artisans, MapR, Cazena, Hortonworks, Yellowbrick, and Iguazio.

*Image courtesy of O'Reilly Conferences via Flickr


John O’Brien is Principal Advisor and CEO of Radiant Advisors. A recognized thought leader in data strategy and analytics, John’s unique perspective comes from the combination of his roles as a practitioner, consultant and vendor CTO.
