Data & Analytics
Spark
April 29, 2020

What Do You Get When You Cross A Data Warehouse With A Data Lake?

by Kent Teague, Managing Consultant – Altis Melbourne

Did you guess Data LakeHouse? And no, I’m not talking about the latest episode of Grand Designs. But just as Kevin McCloud follows design projects from the laying of the foundation through to the finished dream home, we’ll be looking at the foundation of the latest data platform architecture paradigm: the Data LakeHouse.

But before we jump right in and talk about the Data LakeHouse, let’s revisit the definition of the Data Lake as coined by James Dixon, the founder and CTO of Pentaho:

“If you think of a DataMart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.”

So, what is a Data LakeHouse? The Data LakeHouse is essentially a hybrid concept that offers the key features of both a Data Lake and a Data Warehouse. It effectively serves as the middle ground between the two, combining the structured, standardised, and connected data entities found in a traditional Data Warehouse with the low-cost, flexible storage of a Data Lake.

In other words, as Databricks describes it: “They are what you would get if you had to redesign data warehouses in the modern world, now that cheap and highly reliable storage (in the form of object stores) are available”.

In addition, the design of the Data LakeHouse serves to address two opposing criticisms: that a Data Warehouse requires a large amount of upfront effort to cleanse, standardise, and build relationships between entities, whereas a Data Lake requires too little effort in these areas.
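The trade-off above can be made concrete with a small sketch. It is illustrative only: the `SCHEMA` dictionary and the `conforms`, `warehouse_write`, `lake_write` and `lake_read` helpers are hypothetical names invented for this example, not part of any product's API. The warehouse-style path validates records before storing them, while the lake-style path stores anything and only discovers problems when someone reads the data.

```python
import json

# Hypothetical schema for this example: column name -> expected Python type.
SCHEMA = {"order_id": int, "amount": float}


def conforms(record):
    """Return True if the record has exactly the expected columns and types."""
    return set(record) == set(SCHEMA) and all(
        isinstance(record[col], typ) for col, typ in SCHEMA.items()
    )


def warehouse_write(table, record):
    """Schema-on-write: reject bad records up front (more effort at load time)."""
    if not conforms(record):
        raise ValueError(f"record rejected at write time: {record}")
    table.append(record)


def lake_write(files, record):
    """Schema-on-read: store the raw payload as-is (no effort at load time)."""
    files.append(json.dumps(record))


def lake_read(files):
    """Interpret raw payloads at query time; bad records only surface here."""
    good, bad = [], []
    for raw in files:
        record = json.loads(raw)
        (good if conforms(record) else bad).append(record)
    return good, bad


records = [
    {"order_id": 1, "amount": 9.5},
    {"order_id": "two", "amount": "n/a"},  # malformed source record
]

warehouse, lake = [], []
rejected = 0
for r in records:
    lake_write(lake, r)
    try:
        warehouse_write(warehouse, r)
    except ValueError:
        rejected += 1

good, bad = lake_read(lake)
print(len(warehouse), rejected)  # warehouse kept 1 record, rejected 1
print(len(good), len(bad))       # lake stored both; 1 fails on read
```

Neither extreme is wrong in itself. The sketch only shows where the effort, and the failure, lands in each model.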

So, what are some of the real-world benefits a Data LakeHouse implementation offers? Databricks states that the Data LakeHouse offers the following key benefits:

ACID Transaction support
Schema enforcement and governance
Using BI tools directly over source data
Storage is decoupled from compute
Open and standardised storage formats
Support for diverse data types ranging from unstructured to structured data
Support for diverse workloads
End-to-end streaming and real-time reporting
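To make the first two benefits a little more tangible, here is a toy sketch of how a table might combine schema enforcement with all-or-nothing commits. It assumes a hypothetical in-memory `ToyTable` class invented for this post. Real LakeHouse table formats implement these ideas very differently, typically with a transaction log over files in object storage. The point is only that a failed batch leaves readers looking at the previous committed version.

```python
class ToyTable:
    """Hypothetical in-memory table: schema-checked, versioned, atomic appends."""

    def __init__(self, schema):
        self.schema = schema          # column name -> expected Python type
        self.versions = [()]          # version 0 is the empty table

    def _check(self, row):
        if set(row) != set(self.schema):
            raise ValueError(f"unexpected columns: {sorted(row)}")
        for col, typ in self.schema.items():
            if not isinstance(row[col], typ):
                raise ValueError(f"{col!r} should be {typ.__name__}")

    def append(self, rows):
        """Validate the whole batch first, then publish a new version."""
        rows = list(rows)
        for row in rows:
            self._check(row)          # any failure aborts before publishing
        self.versions.append(self.versions[-1] + tuple(rows))
        return len(self.versions) - 1  # the new version number

    def read(self, version=None):
        """Return a committed snapshot (the latest by default)."""
        snapshot = self.versions[-1] if version is None else self.versions[version]
        return list(snapshot)


table = ToyTable({"customer": str, "spend": float})
v1 = table.append([{"customer": "A", "spend": 10.0}])

try:
    # The second row breaks the schema, so neither row is committed.
    table.append([
        {"customer": "B", "spend": 20.0},
        {"customer": "C", "spend": "lots"},
    ])
except ValueError as err:
    print("batch rejected:", err)

print(len(table.read()))   # still 1 row: the failed batch left no trace
print(len(table.read(0)))  # version 0 is the empty table
```

Keeping every committed version around is a simplification that makes snapshot reads trivial in the sketch. It also hints at why BI tools can query such a table directly while writers are still loading data.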

For those interested in reading further about the Data LakeHouse, I recommend starting with the following:

To discuss this topic further, or for an initial review of an existing or planned data platform, get in touch with us.
