What is a lakehouse, exactly?

The term gets used loosely. At its core, a lakehouse is a data architecture that combines the low-cost, flexible storage of a data lake with the structure, performance and governance features of a data warehouse — in a single system.

Traditional architectures forced a trade-off. Data lakes were cheap and flexible but messy: no schema, no transactions, hard to query reliably. Data warehouses were structured and fast but expensive and rigid, and they required moving data out of the lake just to run analytics on it.

The lakehouse collapses that trade-off into one layer: data stored in an open table format (Delta Lake or Apache Iceberg), with ACID transactions, schema enforcement and optimised query performance built on top.
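To make "schema enforcement" concrete, here is a minimal sketch in plain Python. It is not the API of Delta Lake, Iceberg or any real table format: the `Table` class, its `append` method and the column names are all hypothetical. The only point is that a write which doesn't match the declared schema is rejected as a whole, so a bad batch never lands half-written.

```python
from dataclasses import dataclass, field


class SchemaMismatch(Exception):
    """Raised when a batch does not match the table's declared schema."""


@dataclass
class Table:
    # Hypothetical in-memory stand-in for a lakehouse table.
    schema: dict[str, type]
    rows: list[dict] = field(default_factory=list)

    def append(self, batch: list[dict]) -> None:
        # Validate the whole batch before writing anything, so a bad row
        # cannot leave the table partially written (all-or-nothing append).
        for row in batch:
            if set(row) != set(self.schema):
                raise SchemaMismatch(f"unexpected columns: {sorted(row)}")
            for column, expected in self.schema.items():
                if not isinstance(row[column], expected):
                    raise SchemaMismatch(
                        f"{column!r} should be {expected.__name__}"
                    )
        self.rows.extend(batch)


orders = Table(schema={"order_id": int, "amount": float})
orders.append([{"order_id": 1, "amount": 19.99}])

rejected = None
try:
    # The second row has a string amount, so the whole batch is rejected.
    orders.append([
        {"order_id": 2, "amount": 5.00},
        {"order_id": 3, "amount": "n/a"},
    ])
except SchemaMismatch as err:
    rejected = str(err)

row_count = len(orders.rows)  # still only the first, valid batch
```

A plain data lake would have accepted the second batch and left the problem for whoever queried it later.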

Lakehouse vs data warehouse vs data lake

The quickest way to understand a lakehouse is by what came before it. Each earlier pattern solved one problem and created another — the lakehouse keeps the strengths and drops the trade-off:

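|                    | Data lake               | Data warehouse         | Lakehouse              |
|--------------------|-------------------------|------------------------|------------------------|
| Storage cost       | Low                     | High                   | Low                    |
| Structure & schema | None                    | Strict                 | Enforced, flexible     |
| ACID transactions  | No                      | Yes                    | Yes                    |
| BI performance     | Poor                    | Excellent              | Warehouse-grade        |
| Data types         | Any, incl. unstructured | Structured only        | Any                    |
| Storage format     | Open                    | Often proprietary      | Open (Delta / Iceberg) |
| Copies of data     | One, hard to query      | Copied out of the lake | One, analysed in place |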

The lakehouse keeps the lake's cheap, open, any-format storage and adds the warehouse's transactions, schema and speed on top — so a single copy of data serves both the data engineers and the BI team. For the deeper trade-offs once you're inside Microsoft Fabric, see our guide to Fabric architecture decisions.

How it works in Microsoft Fabric

In Microsoft Fabric, the lakehouse is built on OneLake — a single, tenant-wide storage layer that underpins everything. You don't move data between Fabric services; they all read from the same store.

  • Data lands in OneLake in Delta Lake format
  • Power BI connects via Direct Lake mode — no import, no copy, no scheduled refresh
  • PySpark, SQL, and Data Wrangler all work on the same tables
  • Microsoft Purview handles lineage and governance across the whole workspace

The practical effect: a BI developer and a data engineer can work in the same workspace, on the same data, without any integration layer between them.
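One way to picture "they all read from the same store" is that every engine addresses the same table location. The sketch below builds that location as a string, assuming the ABFS-style OneLake URI pattern shown in the docstring; the workspace, lakehouse and table names are hypothetical, and it is worth checking the pattern against the current Fabric documentation before relying on it.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_table_uri(workspace: str, lakehouse: str, table: str) -> str:
    """Build the URI for a Delta table in a Fabric lakehouse.

    Assumes the pattern
    abfss://<workspace>@<host>/<lakehouse>.Lakehouse/Tables/<table>
    """
    return (
        f"abfss://{workspace}@{ONELAKE_HOST}/"
        f"{lakehouse}.Lakehouse/Tables/{table}"
    )


# Hypothetical names, for illustration only.
sales_uri = onelake_table_uri("Finance", "SalesLakehouse", "orders")
```

The data engineer's notebook and the BI developer's Direct Lake model would both point at that one location rather than at separate copies.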

"The lakehouse isn't a new product — it's the removal of an unnecessary boundary between where you store data and where you analyse it."

The open format underneath: Delta and Parquet

What makes a lakehouse a lakehouse — rather than a warehouse with cheaper storage — is the open table format. In Microsoft Fabric that's Delta Lake, sitting on top of Apache Parquet files in OneLake.

  • Parquet is the columnar file format the data physically lives in — compressed, efficient to scan, and readable by virtually every analytics engine.
  • Delta Lake adds a transaction log over those Parquet files, giving you ACID transactions, time travel, schema enforcement and the reliability a warehouse expects.
  • Because the format is open, your data isn't locked in: Databricks, Spark and other Delta-compatible engines can read the same files with no proprietary export, and data held in Snowflake can be mirrored into OneLake alongside them.
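The transaction log idea is easier to see in miniature. The sketch below is a toy, loosely modelled on the add/remove actions a Delta log records; the file names and the `snapshot` helper are made up, and a real log carries far more (metadata, statistics, checkpoints). It shows how replaying the log gives the current set of Parquet files, and how stopping early gives time travel.

```python
import json

# Each commit is a list of JSON actions. File names are hypothetical.
commits = [
    ['{"add": {"path": "part-000.parquet"}}',
     '{"add": {"path": "part-001.parquet"}}'],
    ['{"add": {"path": "part-002.parquet"}}'],
    # A compaction commit: two small files replaced by one larger file.
    ['{"remove": {"path": "part-000.parquet"}}',
     '{"remove": {"path": "part-001.parquet"}}',
     '{"add": {"path": "part-003.parquet"}}'],
]


def snapshot(log: list[list[str]], version: int) -> set[str]:
    """Replay commits 0..version and return the live data files."""
    live: set[str] = set()
    for commit in log[: version + 1]:
        for line in commit:
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


latest = snapshot(commits, version=2)
as_of_v1 = snapshot(commits, version=1)  # "time travel" to an older version
```

Because a reader only ever sees the files named by complete commits, a half-finished write is invisible, which is where the transactional guarantee comes from.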

This is also where lakehouse analytics performance is won or lost: well-maintained Delta tables — compacted files, sensible partitioning, and V-Order in Fabric — are what make querying the lake as fast as a warehouse.
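As a rough illustration of what "compacted files" means, the sketch below plans which small files to merge so that each merged file approaches a target size. It is a simple greedy grouping written for this article, not the logic Fabric or Delta Lake actually use; the `plan_compaction` function, the 128 MB target and the file names are all assumptions.

```python
def plan_compaction(
    file_sizes_mb: dict[str, int], target_mb: int = 128
) -> list[list[str]]:
    """Greedily group small files into batches of at most target_mb."""
    batches: list[list[str]] = []
    current: list[str] = []
    current_size = 0

    def flush() -> None:
        # Merging a single file achieves nothing, so skip lone files.
        if len(current) > 1:
            batches.append(list(current))

    # Smallest first, so the tiniest files are merged together.
    for name, size in sorted(file_sizes_mb.items(), key=lambda kv: kv[1]):
        if size >= target_mb:
            continue  # already large enough, leave it alone
        if current and current_size + size > target_mb:
            flush()
            current.clear()
            current_size = 0
        current.append(name)
        current_size += size
    flush()
    return batches


# Hypothetical file sizes in MB.
files = {"a": 10, "b": 20, "c": 60, "d": 90, "e": 200}
plan = plan_compaction(files)
```

Here the three smallest files are merged into one, while the 90 MB and 200 MB files are left as they are. Fewer, larger files mean fewer reads per query, which is most of the performance gain.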

When it makes sense for mid-market companies

Not every company needs a lakehouse. If you have a single data source and 10 reports, a well-structured Power BI semantic model is probably enough. The lakehouse pattern earns its complexity when:

  • You have multiple source systems that need to be combined (ERP + CRM + flat files)
  • Different teams need different views of the same underlying data
  • You have both operational and analytical queries hitting the same data
  • Data volumes are growing and warehouse storage costs are becoming a line item
  • You need auditability — who changed what, when

Common mistakes in lakehouse implementations

Most failed lakehouses fail for the same reasons.

1. No medallion architecture

Raw data lands in the same layer as certified, cleaned data. Within 6 months, no one knows which tables are safe to use in a report. The fix is a bronze/silver/gold structure from day one — raw ingestion, cleaned and conformed, and business-ready certified data in separate layers.
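Here is the bronze/silver/gold split at toy scale, in plain Python rather than Spark. The rows, column names and cleaning rules are invented for illustration; in Fabric each layer would be a set of Delta tables rather than a list in memory.

```python
from collections import defaultdict

# Bronze: raw rows exactly as they arrived (hypothetical ERP export).
bronze = [
    {"order_id": "1", "customer": " Acme ", "amount": "100.50"},
    {"order_id": "2", "customer": "Globex", "amount": "n/a"},
    {"order_id": "1", "customer": " Acme ", "amount": "100.50"},  # duplicate
    {"order_id": "3", "customer": "acme", "amount": "40"},
]


def to_silver(rows: list[dict]) -> list[dict]:
    """Clean and conform: cast types, normalise names, drop bad and duplicate rows."""
    seen: set[int] = set()
    silver: list[dict] = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # unparseable amounts stay in bronze only
        order_id = int(row["order_id"])
        if order_id in seen:
            continue
        seen.add(order_id)
        silver.append({
            "order_id": order_id,
            "customer": row["customer"].strip().title(),
            "amount": amount,
        })
    return silver


def to_gold(rows: list[dict]) -> dict[str, float]:
    """Business-ready aggregate: revenue per customer."""
    revenue: dict[str, float] = defaultdict(float)
    for row in rows:
        revenue[row["customer"]] += row["amount"]
    return dict(revenue)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Bronze is never modified, so a cleaning rule can be changed and the later layers rebuilt. Reports only ever read from gold, which is what makes it obvious which tables are safe to use.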

2. Too much data, too few consumers

Teams ingest everything "because we might need it later." Storage is cheap but maintenance isn't. Start with the data that answers a known business question.

3. No semantic model

Reports are built directly on raw tables. Every developer writes their own version of "revenue." The semantic model is where you define metrics once, certify them, and prevent the organisation from having 12 different answers to the same question.
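In Power BI the "define it once" part is done with measures in the semantic model. The Python below is only an analogy for that idea, with a hypothetical `metric` registry and an invented business rule for revenue: each metric has exactly one definition, and a second attempt to define it fails loudly.

```python
from typing import Callable

Row = dict
METRICS: dict[str, Callable[[list[Row]], float]] = {}


def metric(name: str):
    """Register a metric definition under a single certified name."""
    def register(fn):
        if name in METRICS:
            raise ValueError(f"metric {name!r} is already defined")
        METRICS[name] = fn
        return fn
    return register


@metric("revenue")
def revenue(rows: list[Row]) -> float:
    # Hypothetical business rule: revenue excludes cancelled orders.
    return sum(r["amount"] for r in rows if r["status"] != "cancelled")


orders = [
    {"amount": 100.0, "status": "paid"},
    {"amount": 50.0, "status": "cancelled"},
    {"amount": 25.0, "status": "paid"},
]

# Every report asks for the metric by name rather than re-deriving it.
sales_report = METRICS["revenue"](orders)
finance_report = METRICS["revenue"](orders)
```

Both reports get the same number because neither of them contains the rule; they only reference it.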

Thinking about moving to Microsoft Fabric? We run a 5-day Migration Audit that maps your current stack, identifies readiness gaps and produces a written roadmap. Get in touch to scope it.

Getting started without over-engineering it

The pragmatic approach for a mid-market company with limited data engineering resource:

  1. Pick one business domain and one source system. Don't boil the ocean.
  2. Set up a Fabric workspace with a simple three-layer lakehouse (bronze, silver, gold).
  3. Build one Dataflow Gen2 (or a simple pipeline) to land raw data in bronze.
  4. Write a simple notebook to clean and conform data into silver.
  5. Build a semantic model on gold and connect Power BI to it via Direct Lake.
  6. Ship one report. Get feedback. Expand from there.

The goal of the first implementation isn't to build a complete platform — it's to prove the pattern works for your organisation, with your data, with your team.

Frequently asked questions

What is lakehouse architecture?
A lakehouse is a data architecture that combines the low-cost, flexible storage of a data lake with the structure, performance and governance of a data warehouse in a single system. Data is stored once in an open format such as Delta Lake or Apache Iceberg, with ACID transactions, schema enforcement and optimised query performance built on top, so you analyse data where it lands rather than copying it into a separate warehouse.

What is the difference between a lakehouse, a data warehouse and a data lake?
A data lake is cheap and flexible but unstructured: no schema, no transactions, hard to query reliably. A data warehouse is structured and fast but expensive and rigid, and requires moving data out of the lake. A lakehouse combines both: open-format lake storage with warehouse-grade transactions, schema and performance on top, so one copy of data serves both data engineering and BI.

What is lakehouse analytics?
Lakehouse analytics means running BI and analytical workloads directly on lakehouse storage rather than on a separate warehouse. In Microsoft Fabric, Power BI reads Delta tables straight from OneLake via Direct Lake, giving near real-time performance with no import, copy or scheduled refresh, so reporting happens on the same governed copy of data the engineers curate.

How does a lakehouse work in Microsoft Fabric?
In Fabric the lakehouse is built on OneLake, a single tenant-wide storage layer. Data lands once in OneLake as Delta Lake tables; Spark, SQL and Power BI all read the same tables; Power BI connects via Direct Lake with no copy; and Microsoft Purview handles lineage and governance across the workspace, removing the boundary between where data is stored and where it is analysed.

When should you use a lakehouse architecture?
A lakehouse earns its complexity when you have multiple source systems to combine, different teams needing different views of the same data, both operational and analytical queries on the same data, growing data volumes, or a need for auditability. For a single source and a handful of reports, a simpler Power BI model is usually enough.

If you're evaluating whether a lakehouse makes sense for your stack, we offer a free 30-minute scoping call. No pitch — we'll tell you honestly what we'd do in your position.