The integrated data platform

Earlier this year, I wrote an article entitled “Why an integrated platform is the key to real-time insights.”* In the article, I discussed how a fully integrated and optimized big data and analytics platform, leveraging both in-memory solutions and the Hadoop Distributed File System (Hadoop), was key to paving the way toward achieving deep, advanced analytics capabilities. This is a topic of great interest to today’s data-driven organizations that are seeking to increase forward-looking visibility to improve operations, increase profitability and even reduce cost.

Due to the interest in the article that I have seen on Twitter, LinkedIn and other social media sites, I thought it prudent to write a follow-up piece that goes into a bit more depth from a technical perspective and showcases some specific architectural considerations that should be taken into account when heading down the path toward creating an integrated data platform.

Why have an integrated data platform?

Let’s first explore the underlying reasons why an organization would want, or need, such a platform. I’ve detailed five key drivers here:

1. The need to have both structured and unstructured data available and to have this data fully integrated for a 360 degree view of the customer or other specific business situation.

What drives this need is no longer the size of the data, but how companies collect and manage disparate data from multiple sources and use advanced data analytics to derive value from all of it.
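
As a rough illustration of what bringing structured and unstructured data together can look like, the sketch below attaches free-text interactions to structured customer records that share a key. The field names, the sample records and the build_customer_view helper are all hypothetical and are not drawn from any particular product.

```python
from collections import defaultdict

# Hypothetical structured records, e.g. exported from a CRM system.
crm_records = [
    {"customer_id": "C001", "name": "Acme Corp", "segment": "Enterprise"},
    {"customer_id": "C002", "name": "Birch Ltd", "segment": "Mid-market"},
]

# Hypothetical unstructured items, e.g. support emails or social posts.
unstructured_items = [
    {"customer_id": "C001", "source": "email", "text": "Delivery was late again."},
    {"customer_id": "C001", "source": "social", "text": "Great support team!"},
    {"customer_id": "C003", "source": "email", "text": "Please send a quote."},
]


def build_customer_view(structured, unstructured):
    """Combine each structured record with the unstructured items for that customer."""
    # Group the free-text items by customer key.
    items_by_customer = defaultdict(list)
    for item in unstructured:
        items_by_customer[item["customer_id"]].append(item)

    # Attach the grouped items to each known customer record.
    views = {}
    for record in structured:
        cid = record["customer_id"]
        views[cid] = {**record, "interactions": items_by_customer.get(cid, [])}
    return views


views = build_customer_view(crm_records, unstructured_items)
```

In this sketch, items for customers with no structured record (C003 here) are simply left out of the view; a real platform would need an explicit policy for such unmatched data.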

2. The desire or requirement to reduce the cost and streamline the process of storing cold data (archival) and warm data (data not required for regular analytics) outside of the core systems (ERP, CRM, EDW, etc.), yet still have this data available for near real-time analytics.

Leveraging emerging technologies, such as Hadoop, to supplement and complement existing technology investments allows an organization to manage data in a way that reduces the overall cost associated with a traditional data warehouse. It lets you have the right data available at the right place and at the right time, while aggressively exploiting new analytical capabilities.
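
To make the idea of data tiers a little more concrete, here is a minimal sketch that assigns a data set to a tier based on how recently it was accessed. The age thresholds, the "hot" label for data in regular analytical use, and the mapping of tiers to storage targets are assumptions made for illustration; in practice they would be set by the business.

```python
from datetime import date

# Hypothetical thresholds; real values would come from the business.
HOT_MAX_AGE_DAYS = 30
WARM_MAX_AGE_DAYS = 365

# Hypothetical mapping of tiers to storage targets.
TIER_TO_STORE = {
    "hot": "core systems / in-memory engine",
    "warm": "Hadoop",
    "cold": "Hadoop (archival)",
}


def classify_tier(last_accessed: date, today: date) -> str:
    """Return 'hot', 'warm' or 'cold' based on days since last access."""
    age_days = (today - last_accessed).days
    if age_days <= HOT_MAX_AGE_DAYS:
        return "hot"
    if age_days <= WARM_MAX_AGE_DAYS:
        return "warm"
    return "cold"


today = date(2015, 9, 15)
tier = classify_tier(date(2015, 1, 1), today)
target = TIER_TO_STORE[tier]
```

Age since last access is only one possible criterion; regulatory retention rules or query frequency could drive the same decision.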

3. The need for actionable intelligence (the desire to leverage all available data to make key business decisions) and the ability to be nimble and alter course at a moment’s notice based on new insights.

Advanced analytics (analytics that provide for the forecasting of future events and behaviors, allowing businesses to conduct “what-if” analyses to predict the effects of potential changes in business strategies) can turn poor business decisions, made using haphazard guesswork, into well-thought-out and successful business decisions that improve operational efficiencies, drive revenue and provide a competitive advantage in the market.**

4. The opportunity for driving significant bottom-line improvements through the automation of routine decisions.

Automated decision processing can optimize a multitude of business processes and is manifested through a diverse set of applications that span everything from recommendation engines that enable cross-sell and up-sell opportunities, to ad platforms that enable microtargeting of promotions, to predictive asset maintenance systems that minimize disruption and downtime.

5. The imperative to have a fully scalable, business-focused solution that gives the organization the flexibility required to hold exponentially increasing amounts of new, mostly ad hoc, unstructured data sources while providing the flexibility to adapt to the complex, ever-changing business and data environment.

A properly designed advanced analytics solution squeezes new insights out of a business’s current untapped “exhaust” of existing data, while serving as a launchpad for leveraging the vast potential of dark data*** and new sources of data yet to be created.

What does a truly integrated platform look like?

Now that we have addressed the “why,” let’s turn our attention to the “how” of an integrated platform. First, allow me to describe my view of what having a true integrated platform really means.

A fully integrated data platform (specific to big data and analytics) is one where data from legacy systems (such as ERP, CRM, EDW, HR and payroll) is enriched in value when connected to, or integrated with, new sources of data of all kinds (structured, unstructured, internal and external) and brought together using emerging technology solutions (such as Apache Hadoop and in-memory data engines) in such a way that enterprise-wide advanced analytics is possible.

Failure to create an integrated and optimized data landscape can leave the organization unable to drive truly actionable value from its legacy technology investments and recent technology acquisitions. It really is a story of data integration and optimization. Integrating a Hadoop framework into an organization’s infrastructure is no different from integrating other technologies to create business value. There needs to be a well-considered approach to what questions Hadoop will answer, how ready the organization is for Hadoop, including how the data will be managed, and how Hadoop fits into the overall technology landscape within the organization. Failure to create such an approach to implementing an integrated platform leaves the business with just another data silo, which is difficult to manage and (nearly) incapable of providing true analytics value.

Gone (or should be) are the days of the “one-and-done” pilots and “throwaway” proofs of concept that show some initial promise but never materialize into sustainable value for the organization. If companies are going to make the shift successfully to enterprise-wide, longer-term, more strategic projects, they need to consider how best to integrate their existing (and soon-to-be-acquired) solutions and tools to enable a broader data management vision. Only then can they capitalize on the immense value that data can provide to the business.

Investing in a business data vision

The true value of a fully integrated data platform can only be realized when both the business and IT teams invest in creating a business data vision that guides the creation of this platform. The adage that data is a company’s most valuable asset has always been true, but given the explosion in both the availability of data for business and new technologies to turn that data into insights, a carefully thought out and aligned business data vision must be put in place before embarking on any attempt to create an integrated data platform.

This business data vision must effectively address all aspects of “business as usual” that threaten to hobble the disruptive business capability, from departmental data silos to standardization of KPIs and business metrics to data security and vulnerability. A properly prepared business data vision, backed up with leadership and support from the C-suite, creates the framework for an effective data management road map and enabling technology infrastructure that helps ensure that the projects in the road map are successful.

Much as a building needs a properly designed foundation for support and strength, a well-designed business data vision serves as the foundation upon which to build all components of a successful integrated data platform. Once the vision is in place, the platform’s components can be envisioned, designed and developed.

What are the common components of successful platforms?

In addition to supporting big data technology for business, enterprise architects must plan for platform technologies that promote data sharing, technology efficiency, security and privacy. This should include an in-memory data platform, data virtualization, Hadoop, big data integration, and a semantic layer to maintain the context and business language of data.****

While the particular business needs will dictate the components included in a data platform, each component of a successful integrated data platform serves a specific purpose supported by one or more business and technical imperatives. Common platform components that are present in any successful integrated data platform include:

  • Storage for hot, warm and cold data as defined by the business data vision
  • A data ingestion engine that brings data sources into the platform
  • A data integration engine that links the data together and supports both batch processing and persistent data storage as well as real-time processing and data streaming
  • An analytics accelerator that uses new platform technologies to turn data into information, and information into insights
  • An information delivery platform that delivers this information into the hands of decision-makers or serves as the source for insights for downstream systems
  • An advanced analytics system to leverage the latest breakthrough technologies in the areas of predictive, cognitive and adaptive insights as a catalyst for disruptive business decision-making
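
One way to picture how several of these components hand off to one another is the deliberately simplified sketch below. Each function stands in for one component (ingestion, integration, analytics and information delivery); the function names, the sample ERP and web records and the spend threshold are hypothetical, and a real platform would use dedicated tools for each stage rather than in-process Python.

```python
def ingest(sources):
    """Data ingestion: bring records from every source into the platform."""
    records = []
    for source_name, rows in sources.items():
        for row in rows:
            records.append({"source": source_name, **row})
    return records


def integrate(records):
    """Data integration: link records that share the same customer_id."""
    linked = {}
    for record in records:
        linked.setdefault(record["customer_id"], []).append(record)
    return linked


def analyze(linked):
    """Analytics: turn linked data into information (total spend per customer)."""
    return {cid: sum(r["amount"] for r in rows) for cid, rows in linked.items()}


def deliver(totals, threshold):
    """Information delivery: list customers whose spend exceeds the threshold."""
    return sorted(cid for cid, total in totals.items() if total > threshold)


# Hypothetical sample data from two sources.
sources = {
    "erp": [
        {"customer_id": "C001", "amount": 120.0},
        {"customer_id": "C002", "amount": 40.0},
    ],
    "web": [
        {"customer_id": "C001", "amount": 30.0},
    ],
}

totals = analyze(integrate(ingest(sources)))
report = deliver(totals, threshold=100.0)
```

The sketch covers batch processing only; the streaming side of the integration engine and the storage tiers are omitted for brevity.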

Achieving success via a blended approach

While there is a large variety of tools and software solutions in the marketplace, my experience in managing many large implementations has shown that using an integrated approach, with a well-laid-out data management strategy, can help to ensure the success of the implementation and provide a better long-term impact for the business (increased operational efficiency, improved profitability, reduced cost and a competitive edge in the marketplace).

Hadoop can store massive quantities of data and helps an enterprise put huge data sets to work while trimming storage costs. However, it can also prove somewhat cumbersome to administer and requires a fairly specialized set of programming skills. Hadoop solutions from vendors such as Hortonworks (NASDAQ: HDP) are making great progress in overcoming these limitations, making Hadoop components a compelling part of the integrated data platform. On the other hand, in-memory solutions, such as SAP HANA, run indexing and other tasks in physical memory, thus speeding processing, yet can increase cost.

This integrated big data and analytics platform must also connect to legacy business applications such as ERP, CRM and HR and home in on industry-specific needs and requirements to be successful. The answer lies in a blended approach that strategically places the right tool and technology at the right place to meet business requirements. When combined and integrated across the entire organizational data platform, true, value-added, integrated, end-to-end business intelligence is achieved.

* S. Schlesinger, “Why an integrated platform is the key to real-time insights,” 2014, http://performance.ey.com/wp-content/uploads/downloads/2014/12/EY-Performance-Real-time-business.pdf, accessed September 2015.
** S. Schlesinger, “Why an integrated platform is the key to real-time insights,” 2014, http://performance.ey.com/wp-content/uploads/downloads/2014/12/EY-Performance-Real-time-business.pdf, accessed September 2015.
*** Gartner defines dark data as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes,” http://www.gartner.com/it-glossary/dark-data, accessed September 2015.
**** Q&A: Forrester’s Top Five Questions About Big Data, Forrester Research, 2015.

The article was written by:

  • Scott H. Schlesinger
    Principal, IT Advisory, EY, US

With:

  • Ed Patterson
    Senior Manager, IT Advisory, EY, US
  • Rich Sarcomo
    Senior Manager, IT Advisory, EY, US