Big Data Open Source Projects vs Amazon Web Services (AWS)

Data engineers are always watching the latest trends around the “Vs” of Big Data (Volume, Variety, Velocity and others). Most current approaches mean ingesting a higher Volume of enterprise data from a growing Variety of enterprise and cloud source systems, under constant demand for Velocity. Data engineers and data analysts are looking to migrate their data warehouses to the cloud to increase performance and lower costs.

Big Data architects are looking for new, intelligent solutions to govern the “data swamp” and, ultimately, to create robust security models that protect data and manage “data lakes” in full compliance with regulations. The rules of the game always push return on investment to the limit, so organizations usually need to find a balanced technical solution between open source technologies and proprietary/commercial ones.

From an engineering point of view, a big data discussion always starts with the cluster type. At first, most clusters were built “on-premises”, but evolution led to the “public cloud”, and nowadays we get the full benefit of “hybrid” clusters.

Anything related to provisioning, dynamic commissioning and decommissioning is already offered as a service, either Infrastructure as a Service (IaaS) or Platform as a Service (PaaS). Amazon EC2 (Elastic Compute Cloud) provides a fully customizable, integrated and scalable environment, but it also leaves room for open source platforms like Cloudera, Hortonworks and MapR.
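The dynamic commissioning and decommissioning that IaaS and PaaS providers automate can be pictured as a simple control loop that sizes the cluster to its workload. The sketch below is hypothetical: `ScalingPolicy`, `target_node_count` and `scaling_action` are illustrative names, not part of EC2, EMR or any AWS API, and the rule of sizing the cluster to finish all pending tasks in one wave is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical elasticity policy; names and defaults are illustrative only."""
    min_nodes: int = 2
    max_nodes: int = 20
    tasks_per_node: int = 8  # assumed number of tasks one worker node runs in parallel


def target_node_count(pending_tasks: int, policy: ScalingPolicy) -> int:
    """Return how many worker nodes the cluster should have for the pending work."""
    # Nodes needed to run every pending task in a single wave (ceiling division).
    needed = math.ceil(pending_tasks / policy.tasks_per_node)
    # Clamp to the configured bounds: never below the baseline, never above the budget.
    return max(policy.min_nodes, min(policy.max_nodes, needed))


def scaling_action(current_nodes: int, pending_tasks: int, policy: ScalingPolicy) -> str:
    """Describe the commissioning or decommissioning step that reaches the target size."""
    target = target_node_count(pending_tasks, policy)
    if target > current_nodes:
        return f"commission {target - current_nodes} node(s)"
    if target < current_nodes:
        return f"decommission {current_nodes - target} node(s)"
    return "no change"


if __name__ == "__main__":
    policy = ScalingPolicy()
    print(scaling_action(current_nodes=4, pending_tasks=100, policy=policy))  # commission 9 node(s)
    print(scaling_action(current_nodes=10, pending_tasks=3, policy=policy))   # decommission 8 node(s)
```

In a managed service this loop runs on the provider's side; the point of the sketch is only that elasticity reduces to choosing a target size within fixed bounds.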

From a financial point of view, Spot Instances combined with Amazon EMR (Elastic MapReduce) definitely raise the bar in terms of capacity planning. IaaS and PaaS are already mature enough to offer solid support for the transition from capital expenditure (CapEx) to operational expenditure (OpEx). In the same context, a few big questions arise: “Which solution is the most cost-effective? Is the deciding factor licensing costs or support costs?” I would answer that it is a combination of both.
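The CapEx-to-OpEx trade-off can be made concrete with a small cost model: an on-premises cluster is paid for whether it runs or not, while a cloud cluster is billed per node-hour, partly at a discounted spot rate. This is a minimal sketch under stated assumptions; every number below is a hypothetical placeholder, not a real AWS or hardware price, and the field and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    # All fields are hypothetical inputs, not real prices.
    nodes: int                   # worker nodes in the cluster
    hours_per_month: float       # hours the cluster actually runs each month
    on_demand_rate: float        # assumed price per node-hour at on-demand rates
    spot_discount: float         # assumed fraction saved on spot capacity (0..1)
    spot_share: float            # fraction of node-hours served by spot capacity (0..1)
    hardware_capex: float        # assumed up-front cost of equivalent on-premises hardware
    amortization_months: int     # months over which that hardware is written off
    onprem_monthly_opex: float   # assumed monthly power, space and support cost


def blended_rate(c: CostInputs) -> float:
    """Average price per node-hour when part of the capacity runs on spot."""
    spot_rate = c.on_demand_rate * (1 - c.spot_discount)
    return c.spot_share * spot_rate + (1 - c.spot_share) * c.on_demand_rate


def monthly_cloud_cost(c: CostInputs) -> float:
    """OpEx model: pay only for the node-hours actually used."""
    return c.nodes * c.hours_per_month * blended_rate(c)


def monthly_onprem_cost(c: CostInputs) -> float:
    """CapEx model: hardware cost spread over its life, paid whether used or not."""
    return c.hardware_capex / c.amortization_months + c.onprem_monthly_opex


def break_even_hours(c: CostInputs) -> float:
    """Monthly running hours at which the cloud bill equals the on-premises cost."""
    return monthly_onprem_cost(c) / (c.nodes * blended_rate(c))


if __name__ == "__main__":
    inputs = CostInputs(
        nodes=10, hours_per_month=200, on_demand_rate=0.50,
        spot_discount=0.60, spot_share=0.70,
        hardware_capex=36_000, amortization_months=36, onprem_monthly_opex=500,
    )
    print(f"cloud:      {monthly_cloud_cost(inputs):.2f}")              # 580.00
    print(f"on-prem:    {monthly_onprem_cost(inputs):.2f}")             # 1500.00
    print(f"break-even: {break_even_hours(inputs):.1f} hours/month")    # 517.2
```

With these placeholder figures the pay-per-use model wins as long as the cluster runs fewer than about 517 hours a month; the interesting output is the break-even point, which shifts as licensing, support and spot assumptions change.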

From a strategic point of view, any full integration with a big data platform strongly affects “independence of work”. Amazon’s Big Data Platform heavily integrates open source technologies, an area where Amazon is a big contributor, but it also offers innovative services of its own, for example data persistence services such as S3, Glacier and EBS, which sit alongside Hadoop’s HDFS.
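One way these persistence services work together is tiering: hot data stays in S3 (or HDFS on the cluster), while cold data is moved to the Glacier storage class by an S3 lifecycle rule. The sketch below builds such a rule as a plain dictionary in the shape accepted by boto3's `put_bucket_lifecycle_configuration`; the helper name, the `raw/` prefix, the day thresholds and the bucket name are all hypothetical, and the AWS call is left as a comment because it needs credentials.

```python
import json


def archive_lifecycle_config(prefix: str, glacier_after_days: int, expire_after_days: int) -> dict:
    """Build an S3 lifecycle configuration for objects under `prefix`:
    transition them to the GLACIER storage class, then delete them later."""
    # Sanity check: deletion should happen only after the data has been archived.
    if expire_after_days <= glacier_after_days:
        raise ValueError("expire_after_days must be greater than glacier_after_days")
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": glacier_after_days, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


if __name__ == "__main__":
    config = archive_lifecycle_config("raw/", glacier_after_days=90, expire_after_days=365)
    print(json.dumps(config, indent=2))
    # With AWS credentials configured, the same dict could be applied via boto3:
    #   import boto3
    #   boto3.client("s3").put_bucket_lifecycle_configuration(
    #       Bucket="example-data-lake",  # hypothetical bucket name
    #       LifecycleConfiguration=config,
    #   )
```

Keeping the rule as data makes it easy to review and version alongside the rest of the data lake configuration before it is applied.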

In addition to this brief introduction, I am also providing a video presentation from the Ness TechDays virtual conference, which consists of two distinct parts:

  • The first part reviews where enterprise systems end and big data solutions begin.
  • The second part is a comprehensive comparison between Apache open source projects and AWS. This snapshot of the most valuable current technologies in the Big Data ecosystem is meant to shorten the time needed to make architectural decisions.

The comparison covers architectural aspects such as cost model, performance, availability, scalability and elasticity for analytics and data warehousing, outlining the available AWS services and their open source alternatives.
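A comparison along these criteria is often condensed into a weighted decision matrix. The sketch below assumes that approach; the weights, the 1 to 5 scores and the option labels are hypothetical and only show the mechanics, not results from the presentation.

```python
# Hypothetical weights for the comparison criteria; they sum to 1.0.
CRITERIA_WEIGHTS = {
    "cost_model": 0.30,
    "performance": 0.25,
    "availability": 0.20,
    "scalability": 0.15,
    "elasticity": 0.10,
}


def weighted_score(scores: dict[str, float], weights: dict[str, float] = CRITERIA_WEIGHTS) -> float:
    """Combine per-criterion scores (e.g. on a 1-5 scale) into one weighted number."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in weights.items())


def rank_options(options: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (option, score) pairs, best first."""
    scored = [(name, weighted_score(scores)) for name, scores in options.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Illustrative scores only; real values come from your own evaluation.
    options = {
        "managed AWS service": {
            "cost_model": 3, "performance": 4, "availability": 5,
            "scalability": 5, "elasticity": 5,
        },
        "self-managed open source on EC2": {
            "cost_model": 4, "performance": 4, "availability": 3,
            "scalability": 4, "elasticity": 3,
        },
    }
    for name, score in rank_options(options):
        print(f"{score:.2f}  {name}")
```

Changing the weights (for example, favoring cost model over elasticity) can reorder the ranking, which is exactly the kind of trade-off the comparison is meant to surface.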

The final goal of the presentation is to offer a reference for a typical transition of a software solution from “on-premises infrastructure” to “hybrid cloud infrastructure.” View the full “Big Data Open Source Projects vs Amazon Web Services” presentation here.
