October 15, 2015
By Radiant Advisors Staff
Big data practitioners, analysts, architects, and providers gathered at the O’Reilly Strata + Hadoop World in New York City for what proved to be an insightful, educational, and thought-provoking event. More than 6,300 people registered for the event at the Javits Center, and many standing-room-only sessions demonstrated the pervasive industry need for practical approaches and techniques for making better decisions with data.
The air was buzzing with methods and technologies for improving decision-making, and the impact of big data was emphasized through the theme of making a difference by solving practical, important problems. Unsurprisingly given the name of the event, Hadoop and Spark were the most common topics of discussion as attendees searched for knowledge about the latest components and advances in a rapidly changing ecosystem. Beyond specific technologies, several themes emerged in the packed keynote sessions, presentations, and panel discussions.
- A call to use data to make a difference. Big data’s potential for value is huge, but speakers encouraged attendees to use data to enable decision-making for solving real problems. Incremental improvements in recommendation engines are great, but where can the use of data really make a difference? Jake Porway of DataKind emphasized the need to use data for the best of intentions, sharing how thousands of data scientists are coming together to do pro bono work on more than 60 DataKind projects with the goal of using data for social good. Read more about applying data science for social good in this O’Reilly report: https://www.oreilly.com/ideas/five-principles-for-applying-data-science-for-social-good.
- How data is used can be scary, and people are spooked. Along with big data comes responsibility and accountability; we must be thoughtful about what data is collected and how it is collected, analyzed, stored, shared, and so on. Practitioners and business leaders must consider appropriate data practices and foster an environment of ethical data use. DJ Patil, chief data scientist for the White House, called on attendees to be involved and to continue the data ethics conversation, which he says is the first stage in gaining traction.
- Start with a business problem in mind. This point was emphasized again and again. As Doug Wolfe, CIO for the CIA, put it: “Start with the question we’re trying to answer.” From there, the approach he takes is to look at the type of data that would make a big difference, and then at which types of data, when joined together, would make a big difference.
- Data may be king now, but don’t underestimate the value of skills and expertise to help make the decisions. In a keynote addressing data versus creativity, David Boyle of BBC Worldwide explained that sometimes the perfect answer as identified by an algorithm may be far from ideal in the real world. Contributing factors exist that humans can identify but machines cannot.
- People and organizations want to do data science. Many discussions among speakers and attendees centered on how to advance the culture of data science with machine learning, advanced analytics, and practical, how-to-do-data-science activities, especially when the skills around Hadoop (and other advanced technologies) may not be available within the organization. The CIA’s Wolfe commented that the culture change of bringing in data science is a challenge and called for intuitive tools and UIs to enable operators and analysts to increase their impact.
Many attendees were looking for practical tips and techniques on architectures, applying data science, and moving from traditional business analysis to data science, and a bustling expo hall filled with vendors provided an environment to foster such discussions. Several case studies, such as those from Netflix and The Wall Street Journal, served as inspiration and, based on the number of photos taken of the architecture slides, perhaps as model design architectures for attendees’ organizations.
Overall, big data and advanced analytics have moved into the mainstream, and it is no longer just the early adopters or bleeding-edge companies that are finding value in such initiatives. Hadoop and the open source components in its ecosystem, supported by vibrant, engaged communities, are making a difference when it comes to data-driven decision-making. Further, the potential of Spark will drive innovation within such environments.