
OLAP (Online Analytical Processing) is the main concept behind many Business Intelligence (BI) tools used for data discovery, reporting, what-if analysis, budgeting, forecasting, planning, etc.
Many organizations are looking for ways to offload some of their high-volume, processing-intensive Enterprise Data Warehouse (EDW) data marts and business logic to the Hadoop platform. Business users trying to derive value from Hadoop for big data analytics, however, have run into a number of challenges. This is largely because they find it difficult to change the tools they currently use without disrupting their day-to-day work; they need to keep using their current business intelligence platform and reports. Although many BI vendors can connect their tools to Hadoop, business users remain uninterested because of poor query performance and the knowledge and skills required to access data in Hadoop directly.
As a result, many companies moved the data back into their EDW data mart model and analyzed it there. This put additional stress on the EDW, causing delays, limiting how much data could be moved, and raising storage costs. Many initiatives and solutions are emerging to make the Hadoop platform more business-friendly and more complementary to existing EDW/BI platforms. The goal is to use Hadoop for high data volumes rather than dumping everything into the EDW.
No one doubts the benefits of adopting the Hadoop platform at the enterprise level. But enterprises will realize that value only if data can be easily accessed and democratized, so that the user community can derive value from the platform.
One solution is to bring analytics capabilities to Hadoop instead of moving data into the EDW, by provisioning an OLAP layer directly on top of Big Data. The existing BI stack can then connect to the OLAP layer, significantly improving ease of use and performance. Another benefit of OLAP-based offerings for Hadoop is that an organization can keep using its existing BI stack and preserve its investment in those tools.
Several companies have emerged that build OLAP products sitting between Hadoop and BI tools. BI and OLAP vendors are actively partnering, and these partnerships are maturing and yielding mutual benefits as both explore Big Data opportunities. This accelerates the acceptance of Hadoop as enterprise infrastructure and mitigates some of the challenges posed by the EDW platform. The OLAP layer lets business users build workflows by dragging and dropping data elements, with no programming required.
On the commercial front, companies such as AtScale and Kyvos offer MOLAP, ROLAP and related approaches. MOLAP (multidimensional OLAP) precomputes aggregates into cube structures ahead of time, while ROLAP (relational OLAP) computes them at query time against relational tables. In the open-source world, Apache Kylin is a distributed analytics engine that provides MOLAP on Hadoop and integrates with BI tools. Another option is Druid, an open-source data store designed for OLAP queries on event data. These OLAP products integrate with various BI tools as well as with Microsoft Excel.
Figure: Apache Kylin Architecture (Source: http://kylin.apache.org/)
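To make the MOLAP/ROLAP distinction concrete, here is a minimal in-memory sketch. The fact table, its columns (`region`, `product`, `year`, `revenue`) and all function names are hypothetical, and nothing here reflects the API of Kylin, AtScale, Kyvos or Druid. The ROLAP-style function aggregates raw rows when a query arrives; the MOLAP-style functions precompute every cuboid (one aggregate table per subset of dimensions) so that a query becomes a lookup.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: three dimensions and one measure (revenue).
FACTS = [
    {"region": "EU", "product": "A", "year": 2023, "revenue": 100},
    {"region": "EU", "product": "B", "year": 2023, "revenue": 50},
    {"region": "US", "product": "A", "year": 2023, "revenue": 70},
    {"region": "US", "product": "A", "year": 2024, "revenue": 30},
]
DIMENSIONS = ("region", "product", "year")


def rolap_query(facts, group_by):
    """ROLAP style: scan and aggregate the raw rows at query time."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in group_by)
        totals[key] += row["revenue"]
    return dict(totals)


def build_molap_cube(facts, dimensions):
    """MOLAP style: precompute every cuboid (each subset of dimensions) up front."""
    cube = {}
    for r in range(len(dimensions) + 1):
        for group_by in combinations(dimensions, r):
            cube[group_by] = rolap_query(facts, group_by)
    return cube


def molap_query(cube, group_by, dimensions):
    """Answer a query with a dictionary lookup instead of scanning rows."""
    # Cuboids are keyed in the canonical order of `dimensions`.
    canonical = tuple(d for d in dimensions if d in group_by)
    return cube[canonical]
```

For example, `molap_query(build_molap_cube(FACTS, DIMENSIONS), ("region",), DIMENSIONS)` returns `{("EU",): 150, ("US",): 100}`. The trade-off is visible even at this scale: three dimensions already yield 2³ = 8 cuboids, which is why real MOLAP engines store precomputed aggregates in distributed storage and usually avoid materializing every combination.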
Highlights of OLAP on Hadoop include:
- Presents data simply, as measures and dimensions
- Ability to analyze large amounts of data directly in Hadoop clusters
- Incremental refresh of cubes
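The first and last features above can be illustrated together with a small sketch. It assumes a hypothetical schema (dimensions `region` and `product`, measure `revenue`) and a cube split into one segment per load date, which is one common way to support incremental refresh. When new data arrives, only the segment for that date is aggregated, and queries combine the segments. The class and method names are illustrative only and do not correspond to any product's API.

```python
from collections import defaultdict

# Hypothetical schema: "region" and "product" are dimensions, "revenue" is the measure.
DIMENSIONS = ("region", "product")
MEASURE = "revenue"


def build_segment(rows):
    """Aggregate one batch of rows into a segment keyed by dimension values."""
    segment = defaultdict(int)
    for row in rows:
        segment[tuple(row[d] for d in DIMENSIONS)] += row[MEASURE]
    return dict(segment)


class IncrementalCube:
    """Cube stored as one pre-aggregated segment per load date (hypothetical partition key)."""

    def __init__(self):
        self.segments = {}  # load date -> aggregated segment

    def refresh(self, load_date, new_rows):
        # Only this partition is aggregated; other segments are left untouched.
        # Refreshing an existing date rebuilds (replaces) that segment.
        self.segments[load_date] = build_segment(new_rows)

    def query(self):
        # Sum the measure across all segments to answer a query over every date.
        totals = defaultdict(int)
        for segment in self.segments.values():
            for key, value in segment.items():
                totals[key] += value
        return dict(totals)
```

The design choice is that a daily load costs work proportional to that day's data rather than to the full history; a production engine would also periodically merge small segments to keep query-time combination cheap.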
Hadoop is now accepted as an enterprise platform for storing a variety of data, yet using it for interactive analysis remains challenging. The business community should pay attention to OLAP cubes on Hadoop, understand where this layer fits in the overall Big Data landscape, and recognize the benefits it can bring to organizations.