The 2017 Strata + Hadoop World conference in San Jose, CA, was notably different from past Strata + Hadoop conferences. In the past, the emphasis on big data and Hadoop carried an enthusiasm and energy that propelled attendees into almost frantic learning. This year, keynotes and session presenters blissfully geeked out on machine learning and artificial intelligence topics, leaving Hadoop out of the conversation almost entirely.
The opening keynote was a cautionary tale on the conference theme of “Machine Learning Renaissance,” tracing the origins of AI back to the 1960s. Mike Olson, chief strategy officer and chairman of Cloudera, referred to what followed that era as an “AI winter,” when progress and much of the research stalled: the theory was right, but neither the technology nor enough data existed to support it. Only recently have we entered a renaissance – a turning point where we now have the computing and processing power to move AI routines from theoretical to practical. The word of caution is that the ML libraries, services and platforms coming out are so easy to use that we may make huge mistakes in the effort to move quickly. Olson urged people to be careful and slow down to avoid future setbacks.
A common theme surfaced throughout the conference: enabling more people to develop machine learning applications with easier and faster platforms, specifically improved or enhanced data science platforms and environments. Most of the keynotes centered on successful machine learning stories, many of which focused on the greater good of people and business. Coursera Co-founder Daphne Koller did an excellent job of showing how to tackle the challenges of massive online learning with machine learning; Jason Waxman, Intel corporate VP for the data center solutions group, shared how AI is helping tackle cancer in China; and Eric Frenkiel, CEO of MemSQL, described MemSQL’s support for the nonprofit Thorn, which uses facial recognition processing to prevent the exploitation of children on the Internet. Other keynotes shared examples of AI and machine learning in “Fake News,” robotics, genome sciences, disaster response and behavioral sciences in government agencies.
The definitions of AI and ML are not concrete, and despite the intense focus on both at the event, many attendees were still trying to resolve confusion about what was what. The sessions focused primarily on machine learning topics, presumably because you don’t really teach AI, per se. As such, sessions on deep learning and neural networks fit the bill. In an attempt to define and categorize the space, many people and vendors have published hierarchies and taxonomies that jockey over the positions of, and relationships between, these related technologies. Rob Craft, the machine learning product lead for Google Cloud, presented a hierarchy in his keynote (see slide below) that did provide a bit of the desired clarification. He went on to share the advancements that Google was working on for AI and its 15 competencies, from machine vision to deep learning.
Sessions throughout the conference focused mostly on machine learning topics by vendor sponsors. Sessions by AI model companies such as Uber and Netflix sometimes seemed difficult to relate to, since most companies do not have infrastructure and platforms to accommodate some 9 petabytes of data per day. My favorite session was the AWS half-day tutorial, a hands-on session to create a fully working end-to-end data pipeline from Kinesis Firehose to Kinesis Analytics, Lambda, Elasticsearch, Redshift, S3, Athena, Zeppelin and QuickSight. The second major theme throughout the conference was data streaming and data pipelines, with great sessions on Kafka by Confluent and Kinesis by AWS.
For those on the bleeding edge, Michael Jordan of UC Berkeley AMPLab introduced “Ray,” a new software platform for machine learning that researchers developed to replace Spark. It is intended to be more robust and efficient at ML, whereas Spark is primarily a multi-advanced analytics platform. The memory models and processing of Ray are all built to function as a machine learning platform. The platform is in its early days, and its alpha release is on GitHub now, with a solid platform anticipated by next year.
With all the AI and ML buzz, one couldn’t help but notice what people weren’t talking about: Hadoop. Highlighting this departure from past events, halfway through the second day of keynote sessions, Doug Cutting came on stage to proclaim that it was time to change the name of the conference. Because attendees are moving forward and finding valuable new ways to leverage data with machine learning and AI, the “+Hadoop” part of the conference name is not as relevant as it had been in the past, he stated. The conference will officially be called the “Strata Data Conference” going forward.
This Strata conference marked the tipping point where attendees – and the data industry as a whole – are moving into the next generation of AI-driven applications to deliver value to society and meet business needs. While AI is rooted in the 1960s, it’s a green field of opportunity today due to the tremendous availability of data and accessible computing power. Hype is easy to generate, but AI is an advanced field, and the next era of data will require prudence and patience in the beginning.