SQL-on-Hadoop Performance Benchmark – Q2 2016
With so many new and rapidly evolving technologies for big data analytics, how do you know which SQL engine to use to optimize queries and workloads in your Hadoop environment? This independent benchmark, conducted by Radiant Advisors and sponsored by Teradata, tested the performance of the latest GA versions of Hive, Impala, Presto, and SparkSQL at the time of testing, and also measured the impact of Hadoop distributions, file formats, data volumes, and the degree of query customization required. Our intent was to derive pragmatic applications of each SQL engine that help companies implement and optimize analysis and reporting in their Hadoop environments. Click here to access the report from Teradata’s website.
Open-Source SQL-on-Hadoop Performance
This independent performance benchmark focused on the performance dimensions of speed and SQL capabilities by comparing SQL-on-Hadoop options that are open-source and free to download for existing Apache, Hortonworks, and Cloudera distributions. We also illustrated the relationship to available data file encodings — such as Optimized Row Columnar (ORC), Sequential, Parquet, or InfiniDB — for compression, performance, and openness. Click here to download the report. Miss the webinar? Click here to watch the recording!
Three Checkpoints for Governed Data Discovery
For successful data discovery, analysts must be able to move quickly and iteratively through the discovery process with as little friction – and as much IT-independence – as possible. With self-sufficient people involved, ensuring data and analytics are both trustworthy and protected becomes more difficult and imperative. This becomes a careful balance of freedom versus control, and brings the role of governance to the forefront of the discovery conversation. But discovery and governance are seemingly at odds — how is it possible to create an environment that facilitates discovery while providing secure, self-sufficient access to data and insights? In this new e-book, learn what governed data discovery is and how to institute checkpoints within the discovery process to reduce friction between analysts and IT and enable governance and sharing of trusted, valid data and insights. Click here to download the ebook. Or, click here to read the full research report.
Data Lake Adoption and Maturity Survey Findings Report
By surveying both current and potential adopters in the industry, this study documents key perceptions, challenges and successes by focusing on data organization, integration, security, and definitional clarification to address key areas of concern and interest in ongoing data lake adoption.
The study sheds light on how companies perceive and are addressing critical lake success factors, including rethinking data for the long-term, establishing governance first, and tackling security needs up front.
Click here to download the report.
The Data Visualization Competency Center™
Data visualization offers a tremendous opportunity to reach insights from data by leveraging our intrinsic hard-wiring to understand complex information visually. However, successful data visualization requires using the right kind of graphicacy to correctly interpret and analyze the data, as well as employing the right combination of design principles to curate a meaningful story. This report introduces the role of the Data Visualization Competency Center (DVCC)™ to support the use of effective self-service data visualization by providing best practices, standards, and education on how these information assets should be designed, created, and leveraged in the business.
- Educate users on visual design principles and key cognitive elements affected by data visualization
- Provide best practices and proven standards for understanding types of data and how to visually present them
- Foster a culture of communication, collaboration, and collective learning that enables a review network for newly created data visualizations
Enabling Governed Data Discovery
Today, data discovery and self-service are making data governance a charged topic. As business-driven data discovery continues to become fundamental, ensuring data and analytics are trustworthy and protected becomes more difficult and imperative. This research explains how to manage the barriers and risks of self-service and enable agile data discovery across the organization by extending existing data governance framework concepts to the data-driven and discovery-oriented business.
- Understand the “freedom vs. control” paradox
- How to design for iterative, “frictionless” discovery
- Key governance checkpoints in data discovery
Click here to download the report.
The Definitive Guide to the Data Lake
It would be an understatement to say that the hype surrounding the data lake is causing confusion in the industry. Today’s newcomer to the data world vernacular – the “data lake” – is a term that has endured both the scrutiny of pundits who harp on the risk of digging a data swamp and, likewise, the vision of those who see the potential of the concept to have a profound impact on enterprise data architecture. As the data lake term begins to come off its hype cycle and face the pressures of pragmatic IT and business stakeholders, the demand for clear data lake definitions, use cases, and best practices continues to grow. This paper aims to clarify the data lake concept by combining fundamental data and information management principles with the experiences of existing implementations to explain how current data architectures will transform into enterprise data operating systems. While the data lake is a metaphor for this transformation, enterprise data management will continue to evolve the data lake according to established principles, drivers, and best practices that will quickly emerge as hindsight is applied at companies.
Click here to download the report.
Miss the webinar? Click here to watch the recording!
Overcoming Barriers to Data Virtualization Adoption
The challenges of data management are getting exponentially harder. With the ever-increasing quantities, sources, and structures of data – as well as the influx of new tools and techniques for storing, analyzing, and deriving deeper insights from this information – data-driven companies continue to evaluate and explore data management technologies that better integrate, consolidate, and unify data in a way that offers tangible business value. Data virtualization is so compelling because it addresses business demands for data unification and supports high iteration and fast response times, all while enabling self-service user access and data navigability. However, adopting data virtualization is not without its set of barriers. Primarily, these relate to building a business case that can articulate the value of data virtualization in terms of speed of integration alongside the ability to manage ever-growing amounts of data in a timely, cost-efficient way. Supported by Cisco Data Virtualization, Radiant Advisors had the opportunity to further explore and understand the barriers experienced by companies considering data virtualization adoption, and then to pose these questions to companies that have already adopted data virtualization to glean their insights, best practices, and lessons learned. Together, the two halves of this research facilitate a practicable, independent, and unscripted “cross-talk” to fill information gaps and assist companies in overcoming barriers to data virtualization adoption. Click here to download the report.
Why Spark Matters
Spark is quickly becoming a standard for writing deep analytics that leverage in-memory performance, streaming data, machine learning libraries, SQL, and graph analytics. The Spark environment provides big data developers and data scientists a quicker way to build advanced analytics programs, with its ability to overcome the shortcomings of MapReduce and to meet the demand for faster and more powerful processing across the full data pipeline.
- The rise of Spark
- Overview of what Spark is
- The drivers for its adoption
- What to expect as the ecosystem continues to evolve
- Five questions that are top-of-mind when considering adoption
Click here to download the paper.
The Visual Design Checklist: Balance, Emphasis and Unity
User-friendly features and functionality of modern BI, analytics and visualization tools allow more people to independently discover and communicate insights within organizations. However, this freedom requires understanding of proper design principles in order to create meaningful, accurate visualizations in ways our brains are naturally wired to perceive. The new Visual Design Checklist teaches emerging strategies in visual design in order to facilitate effective communication of trends and insights within data. This checklist is a guide to creating effective data visualizations and designing for visual dialogue. Key concepts explored include:
- Emphasis, balance and unity between design elements
- Understanding the picture superiority effect — and why it matters
- Operating within the triangle of forces visualization constraints
Click here to download the paper.
Driving the Next-Generation Data Architecture with Hadoop Adoption
Hadoop has been a phenomenon, both as a framework for big data workloads and for its operational capabilities. Major initiatives have brought Hadoop from its batch-oriented roots to the interactive capabilities now delivering improved performance in SQL engines and distributed in-memory engines. Operational analytics are leading the way as one of the first steps toward operationalizing Hadoop as a platform. Core data management principles will guide Hadoop adoption; however, a change in mindset is also needed to rethink the role of Hadoop beyond a big data and analytics platform. This paper examines the emergence of Hadoop as an operational data platform, and how complementary data strategies and increasing year-over-year adoption can accelerate consolidation and realize business value in agility and reduced development effort. Click here to download the paper. Miss the webinar? View it here.
The Power of Strategic Partnerships in a Thriving Hadoop Ecosystem
Aligning the organization’s enterprise technology strategy and existing technologies with a Hadoop distribution vendor’s partner alliances and product ecosystem is critical on the journey to a successful data lake. In this white paper, we classify the hierarchy of technology vendor-partnerships, providing a method for understanding vendor-partner alliances and recognizing the key relationships that matter most for companies adopting Hadoop. We then apply these classifications to several of Hortonworks’ strategic vendor-partner alliances to highlight the significant collaborations and shared-vision commitments that provide a unique competitive advantage for customers. Click here to download the paper.
Key Considerations for Analytic Solutions for Life Sciences
From pharmaceuticals to global health to the environment, twenty-first century life sciences companies are transforming into data-driven life sciences companies, leveraging vast amounts (and new forms) of data. A strong emphasis on analytics and data discovery for new insights is introducing challenges in how data is woven into the fabric of life sciences organizations. Today’s analytic challenges for life sciences companies, then, can be separated into three distinct categories: the integration challenge, the management challenge, and the discovery challenge. However, the answer to these challenges isn’t the development of new tools or technologies; instead, life sciences companies should turn to collaborative and transformative solutions that already exist. Download this white paper to see how life sciences companies can become even more data-capable organizations by embracing a data unification strategy — adopting and continually refining a governed semantic layer that enables agility, access, and virtual federation of data — and by incorporating scalable, cloud-based solutions that provide advanced analytic and discovery capabilities, including visualization. Click here to download the paper. Miss the webinar? Click here to watch the recording.
Enabling Competitive Advantage with Modern Data Platforms
While understanding of the term “big data” continues to vary, its underlying business value proposition is clear: the ability to affordably store all the data you can imagine, and work with it in ways never before possible. Big data has opened the door to the next true revolution – the Age of Data. Many liken big data to today’s booming oil wells: so much crude is available that anyone who can refine it into valuable, consumable, data-driven products will have a successful business model. Businesses today can aspire to leverage data in ways previously only attainable by data giants like Google, Facebook, Amazon, and LinkedIn. However, it’s the experience – giving the consumer the information they want in a simple, intuitive, and instantaneous manner – that makes services like Google, Yahoo, or Bing truly valuable. Providing consumers with correct information is important, and accuracy increases statistically as more data becomes available to work with. Simplicity is the result of a product or service’s ability to mask back-end complexity for the user. Being instantaneous, in turn, comes from having the right technologies and platform that deliver the right information in the fleeting “moment of opportunity” for users. The key to successfully creating new, competitive business value in the Age of Data requires equal parts big data, ultra-performance, and relevant context.
Unleashing Business Processes with SAP HANA In-Memory Database
Learn about the SAP® HANA® in-memory database and discover how the technology can help transform and optimize your business processes. Review best practices, analytic capabilities, and a business process framework to guide your strategy and planning. HANA is a fast, agile, and scalable database solution that uses columnar database technology to compress data efficiently. It leverages advanced features such as vector processing and single instruction, multiple data (SIMD) parallelization. HANA implements three best-of-breed database engines — columnar, text analytic, and graph — in a single in-memory system. To get the most out of HANA, you need to implement, integrate, and coordinate across multiple business and technology domains. You also need to identify business processes that are good candidates for analytic enablement and enrichment. The following information can help you learn more about HANA and take advantage of its speed, flexibility, and analytic power:
- Business process and decision framework enabled by HANA technology and its features
- HANA technology convergence and its analytical capabilities
- Business Process and Technology Maturity Index
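Why columnar storage compresses so well can be shown with a short sketch. This is not SAP code — just a minimal, hypothetical example of run-length encoding applied to a low-cardinality attribute column, the kind of encoding that columnar layouts make effective:

```python
# Illustrative sketch (not SAP HANA internals): storing values column-by-column
# groups repeated values together, so a simple run-length encoding (RLE)
# collapses the column dramatically.

def rle_encode(column):
    """Compress a column into (value, run_length) pairs."""
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((value, 1))              # start a new run
    return runs

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]

# A low-cardinality column, typical for attributes like region or status:
region_column = ["EMEA"] * 5000 + ["APAC"] * 3000 + ["AMER"] * 2000

encoded = rle_encode(region_column)
assert rle_decode(encoded) == region_column
print(f"{len(region_column)} values stored as {len(encoded)} runs")
```

The same columnar layout is also what makes the SIMD parallelization mentioned above pay off: a scan operates on long, contiguous runs of same-typed values rather than hopping across heterogeneous rows.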
iVEDiX: Making Visual Discovery Mobile-First
Over the past few years, iVEDiX has emerged as a proven solution differentiated by its highly customizable visual analytics platform. As a true mobile-first platform that focuses on intuitive user interaction to drive engagement, iVEDiX embraces the mobile paradigm with unique, cutting-edge visual analytics in a compelling, meaningful, and collaborative way. Click here to download the solution brief.
Dundas BI: From Dashboard to Self-Service Platform
Over its more than twenty years in the data visualization industry, Dundas has progressed on a product roadmap to grow from developer components to a compelling dashboard framework. Today, with the Dundas BI platform, Dundas is introducing an enterprise-class BI solution designed to deliver a self-service experience on one flexible platform. Click here to download the solution brief.
Hortonworks and Microsoft: Democratizing Big Data
The challenges of big data are well worth the opportunities. The ability to gain insights from new forms of data (and previously difficult-to-work-with data) is now a matter of choice among architected deployment options. With the democratization of big data, too, everyone can enjoy its benefits. The strategic Hadoop ecosystem partnership between Hortonworks and Microsoft, which has demonstrated a shared vision for democratizing big data since it was first announced in 2011, delivers a new set of architected solutions with key benefits for more companies. Because of the combined capabilities of both companies in data technologies, this partnership of more than two years has significant benefits for the enterprise. Click here to download the solution brief. Miss the webinar? Click here to watch the recording!
Modern Data Platform Playbook Series
CIOs and their chief enterprise and information architects are quickly realizing that the reference architectures and best practices that served them well in the past now struggle to meet the business demands of today’s data-intensive and analytical environments. Companies require economical scalability for big data, but also want high performance; they require information discovery and flexibility, but want to govern semantics and enterprise consistency; they want to benefit from advanced analytics and unstructured data, but also want broad accessibility with SQL. Despite all the recent mega-hype of big data, business analytics, and data scientist headlines, there is real business value and competitive advantage to be realized in these data technologies and skills. Based on the same fundamental set of data management principles that created past reference architectures, the Modern Data Platform (MDP) from Radiant Advisors is a framework that envisions a new reference architecture able to meet today’s challenges. The MDP strategy incorporates accepted and emerging technologies that allow existing data warehousing (DW) and business intelligence (BI) environments to transform in an agile, iterative process of adopting, integrating, and growing a powerful data platform — one that companies can drive at their own pace and needs.
Dell Information Management
As one of the world’s largest software companies, Dell focuses its Information Management strategy on the challenges facing IT organizations that result from an increasingly heterogeneous data environment. A highly optimized or “best of breed” architecture, like the MDP, doesn’t come without trade-offs, such as managing a more complex environment and integrating it into a singular platform for users, applications, and devices. By focusing on end-to-end solution domains and key partnerships, Dell’s portfolio offers solutions directly targeted at the ongoing struggle to balance IT standardization and control while unlocking the business value of innovation and empowering users. Download
Kognitio Playbook for Modern Data Platforms
By aligning the strengths and unique differentiators of the Kognitio Analytical Platform with the MDP framework and principles, enterprise and information architects can cultivate a strategy that enables big data and advanced analytics capabilities for the business in ways that are clear and planned within their roadmap. This Kognitio Playbook for Modern Data Platforms focuses on understanding the MDP, the Kognitio technology, and its role within a big data strategy to enable today’s companies to transform into tomorrow’s competitive data-driven organizations. Download
Visual Series #1: The Science of Data Visualization
The best data visualizations are designed to properly take advantage of “pre-attentive features” – visual properties hard-wired into our visual systems that help facilitate quick understanding. Thus, to create the most effective visuals, it’s important to understand the science behind visual cognition. In this first brief in a four-part series we take a high-level look at:
- An introduction to the science of data visualization
- Key cognitive ingredients to have a visual dialogue with data
- How to curate meaning in data through visual cues
Click here to download the paper.
Visual Series #2: The Building Blocks of Visual Design
Visual elements like lines, textures, shapes, colors, and typography help us organize information in a way that quickly facilitates meaning. But do we fully understand how to best use them as we design visualizations?
As we continue this four-part series originally published by the International Institute of Analytics (IIA), author Lindy Ryan teaches about these fundamental building blocks of visual discovery and how they work together to maximize the visual capacity of data visualization.
Click here to download the paper.
Visual Series #3: Designing for Experience
Part 3 of this visual design series moves beyond the premise of achieving balance in art and science, to understanding how to create a visual experience for learning complex information through the lens of data visualization.
Click here to download the paper.
Visual Series #4: Designing for Influence
This final edition of the visual series explores the importance of viewer perception and how this can be leveraged to influence the user by uniting an idea with emotion for effective data storytelling.
Click here to download the paper.
Big Data Total Cost of Ownership: Evaluating Hard Costs and Options
Big Data platforms are extending data infrastructures, enabling capacity and performance increases in ways that are becoming more and more economical and attainable for today’s data-driven companies. However, estimating the total cost of ownership (TCO) for Hadoop can be challenging, with many costs hidden or not well understood when Hadoop environments are initially built and deployed. In this Insight paper, we explore the total costs of ownership from a tangible costs perspective. We analyze and illuminate actual and hidden hard costs of implementation with a Hadoop environment by option (including self-managed clusters and service-based environments). This includes hardware and software costs, as well as the support staffing and skills involved. Click here to download the paper.
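The hard-cost tally the paper walks through amounts to straightforward arithmetic across cost categories. The sketch below is purely illustrative — every function name and figure is a hypothetical placeholder, not the report’s actual model or numbers:

```python
# Hypothetical hard-cost tally for a self-managed Hadoop cluster.
# All figures are placeholders for illustration; the report's cost model
# and real-world numbers will differ.

def hadoop_hard_costs(nodes, hw_per_node, sw_support_per_node_yr,
                      admins, admin_salary_yr, years=3):
    """Sum tangible ownership costs over a planning horizon, in dollars."""
    hardware = nodes * hw_per_node                       # one-time server purchase
    software = nodes * sw_support_per_node_yr * years    # annual support subscriptions
    staffing = admins * admin_salary_yr * years          # cluster admin headcount
    return hardware + software + staffing

# Example: a 10-node cluster over a 3-year horizon (placeholder figures).
total = hadoop_hard_costs(nodes=10, hw_per_node=10_000,
                          sw_support_per_node_yr=1_000,
                          admins=2, admin_salary_yr=100_000, years=3)
print(f"3-year hard-cost TCO: ${total:,}")
```

Even in this toy version, staffing dominates the hardware line — which is consistent with the paper’s point that many Hadoop costs are hidden or underestimated at initial deployment.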
Big Data Total Cost of Ownership: Evaluating Soft Costs and Options
Big Data platforms are extending data infrastructures, enabling capacity and performance increases in ways that are becoming more and more economical and attainable for today’s data-driven companies. However, estimating the total cost of ownership (TCO) for Hadoop can be challenging, with many costs hidden or not well understood when Hadoop environments are initially built and deployed. In this Insight paper, we explore the total costs of ownership of Hadoop Big Data environments from a soft costs (or hidden) perspective. We analyze and discuss “soft costs” – those skills and resources necessary to build, run, and maintain a Hadoop environment, including the costs of opportunity and time to market; staffing Hadoop knowledge experience; and executing analytics. Click here to download the paper.
Self-Sufficient Data Discovery by Design
In today’s emerging discovery culture, business users demand more independence to acquire, analyze, and sustain new insights from their data. As discovery continues to reshape how we earn insights from our data, discovery tools must continue to balance user intuition and self-service capabilities with high-performance for sharable, actionable timeframe insights across the organization. This paper reviews the elements that enable discovery by design, and discusses how disruptive discovery tools allow companies to truly capitalize on the business process of discovery. Click here to download the paper.
From Self-Service to Self-Sufficiency: How Discovery is Driving the Business Shift
In the past several years, “self-service” has come to be understood as users having self-service access to the information they need. Today, that definition is being redefined. Now, self-service is less about access and much more about ability: it’s a fundamental shift from being able to consume something that has been predefined and provided, to being able to develop it – to discover it – yourself. With the advent of increasingly robust technologies, there is no shortage of self-service tools on the market today. And more important, these tools – BI and beyond – are good. In fact, they are more than good: these next-generation tools are the catalyst enabling business users to be increasingly self-sufficient from IT in their data needs – if they choose. This paper identifies the key aspects empowering new, savvy business users with those BI/DW capabilities and roles traditionally reserved for IT via intuitive and powerful enabling tools, and how that shift will change IT forever. Download
All About Analytics
These days, few terms seem more meaningless than “analytics.” As a predicate, “analytics” gets applied to a confusing diversity of assets or resources – from banal operational reports to a machine analysis involving terabytes of information and thousands of predictive models. The confusion is regrettable, but understandable: the truth is that there’s simply a surfeit of analytic technologies, starting with bread-and-butter multidimensional assets – i.e., reports, dashboards, scorecards, and the like. Even in an era of so-called “big data analytics,” these assets aren’t going anywhere. Increasingly, they’re being buttressed by analytic insights from a host of other sources. Advanced practices such as analytic discovery and “investigative computing” – this last describes the methodological application of machine learning (also known as predictive analytics) at massive scale – involve different tools, different methods, and (to some extent, anyway) very different kinds of thinking. This raises a question: how do you meaningfully distinguish between analytic categories and technologies? How do you grow – or establish – a richly varied analytic practice? What must you change in your existing data warehouse environment to support or enable more sophisticated analytic practices? What can – and probably should – stay the same? Sponsored by Dell. Visit www.software.dell.com to learn more. Download