Microsoft Fabric is a SaaS platform built to remove friction. One lake (OneLake), one capacity model, every workload integrated. You can stand up a workspace, land some data and ship a Power BI report in an afternoon. That ease is exactly why teams get into trouble: the platform lets you defer architecture decisions that are cheap to make on day one and painful to unwind on day ninety.
This article walks the decisions that matter, deliberately ordered from hardest to reverse to mostly an operational concern. If you only have an hour to think about your Fabric architecture, spend it at the top of this list.
"Land data once in OneLake, pick the engine per workload, serve BI via Direct Lake, and size capacity to your smoothed load — not your peak."
1. Lakehouse vs Warehouse for your Gold layer
This is the first fork in the road and the hardest to change later. Both a Lakehouse and a Warehouse store data as Delta tables in OneLake, and both expose a T-SQL endpoint — so on the surface they look interchangeable. They are not. They target different engines, users and write patterns.
| Dimension | Lakehouse | Warehouse |
|---|---|---|
| Primary engine | Spark (PySpark, Scala, SQL) + read-only SQL endpoint | T-SQL (MPP engine), full read/write |
| Best for | Data engineering, ETL/ELT, ML, streaming, unstructured data | BI, governed star schemas, T-SQL teams |
| Writes via SQL | SQL endpoint is read-only | Full INSERT / UPDATE / DELETE / MERGE |
| Transactions | Delta ACID, per table | Multi-statement, multi-table T-SQL |
| Data types | Structured + semi/unstructured | Structured, relational |
The trap is treating this as one decision for the whole estate. It isn't — it's a decision per workload. Because everything lives in the same OneLake, the overwhelmingly common pattern is:
- Lakehouse for raw-to-refined engineering — your bronze and silver layers, where Spark does the heavy lifting.
- Lakehouse or Warehouse for the curated gold layer, decided by whether the consuming team is SQL-native and needs full DML and multi-table transactions.
Lean Warehouse when your team is T-SQL-first, you serve governed star schemas, and you need strict schema, relational constraints and high-concurrency reporting — finance, HR and sales models are the classic fit. Lean Lakehouse when you ingest semi-structured or unstructured data, your team is Python/Spark-oriented, or you're doing data science. If any source is streaming or time-series, a third option enters: the Eventhouse (KQL), which is the right tool for true real-time and the wrong tool for governed BI.
2. Medallion layer separation
The medallion pattern — bronze (raw, as-ingested), silver (cleaned, conformed, typed), gold (business-ready entities) — is well understood. The decision that gets skipped is ownership: does each layer get its own Lakehouse, or do you separate layers with folders inside one Lakehouse?
Separate Lakehouses per layer buy you three things that are expensive to retrofit:
- Independent access control — engineers can own bronze/silver without exposing it to report consumers.
- Cleaner lineage — it's obvious which tables are raw and which are certified.
- The ability to reprocess silver without touching gold's consumers.
A single Lakehouse with folder separation is simpler and perfectly fine for a small team. It couples everything together, though, and that coupling hurts at scale. This is decision number two on the list precisely because it shapes team autonomy and access control — and untangling a single shared Lakehouse after a year of growth is a genuine project, not a refactor.
3. Direct Lake vs Import vs DirectQuery
How Power BI reads your data sets the ceiling on both performance and freshness, and it's wired deep into your BI architecture.
| Mode | Behaviour | Choose when |
|---|---|---|
| Direct Lake | Reads Delta/Parquet directly from OneLake, near real-time, import-like speed | Default for Lakehouse-backed models |
| Import | Snapshot in memory, fastest queries, stale until refresh | Complex DAX Direct Lake can't handle |
| DirectQuery | Always live, every visual hits the engine | Warehouse + sub-minute freshness |
Direct Lake is the headline feature: import-mode performance with no import, no copy and no scheduled refresh. A "refresh" just re-points the model at the newest Delta files in seconds. But its speed is a function of table design, not refresh settings — and under certain conditions (unsupported features, very large frames) it silently falls back to DirectQuery, which is much slower. Designing your gold tables to avoid that fallback is the work. Which brings us to the levers that make Direct Lake actually fast:
- Keep files large and few. The "small files problem" is the number-one performance killer. Run
OPTIMIZEto compact many small Parquet files into large row groups. - Partition carefully. Only on low-cardinality columns (rule of thumb: under 100–200 distinct values). Over-partitioning recreates the small-files problem.
- Keep V-Order on for read-heavy gold tables. It's a write-time optimisation that buys up to ~50% more compression and faster scans for the engines behind Power BI — at a small write cost, and it stays 100% open Parquet.
-- Compact small files and cluster for data-skipping
OPTIMIZE gold.sales; -- Fabric V-Orders by default
OPTIMIZE gold.sales ZORDER BY (date_key);
VACUUM gold.sales RETAIN 168 HOURS; -- drop old unreferenced files
4. Where Row-Level Security lives
This one is on the list for a different reason than the others. Get it wrong and you don't have a bug — you have a breach.
The mistake is assuming that RLS defined in the Power BI semantic model protects the underlying data. It does not. Semantic-model RLS is enforced by Power BI and bypassed entirely by Spark and by direct SQL-endpoint access. If a single analyst can connect a notebook or SSMS to the Lakehouse, model-level RLS is decorative.
- RLS in the semantic model — acceptable only if every consumer, without exception, goes through Power BI.
- RLS in the Lakehouse / Warehouse — enforced at the data layer, so every tool respects it. This is the safe default whenever direct access is possible.
- Dynamic RLS via gold table design — generate per-role Delta tables in Spark and mount them as separate datasets when the security model is complex.
Security boundaries are the cheapest thing to design up front and the most expensive thing to discover you got wrong. If you're unsure whether your Fabric access model holds, that's exactly the kind of thing our Migration Audit is built to catch.
5. CI/CD and the connection problem
Source control and deployment feel like a "later" problem. The connection part of it isn't — it bites at the first promotion from Dev to Test.
- Branching model — workspace-per-environment (Dev/Test/Prod) is the durable choice; feature branches in a single workspace are simpler but get messy fast.
- What goes in git — notebooks, pipelines and semantic model definitions, yes. Delta data, no.
- Connections don't travel. Connection GUIDs are not part of item definitions, so they need environment-specific mapping at deploy time. This is the detail that breaks naïve "just deploy the workspace" attempts.
- Automate with a service principal, not a person's account — pipelines tied to an individual break the moment that person changes roles.
6. Capacity sizing and workload isolation
Fabric runs on F SKUs (F2 through F2048), and by default every workload draws from the same pool of Capacity Units. That sharing is the most common cause of mysterious production slowdowns: a heavy Spark job during business hours quietly starves your Power BI refreshes.
The mechanics that make sizing counter-intuitive — and forgiving — are smoothing and bursting:
- Fabric smooths the accounting for consumed CUs across a window — minutes for interactive operations, up to 24 hours for background jobs — so a brief spike doesn't force you onto a bigger SKU.
- Bursting lets a single operation temporarily exceed the SKU baseline to finish faster, then repays it against the quieter minutes that follow.
- The consequence: size to your smoothed average, not your peak. A modest SKU absorbs spiky workloads — but sustained overuse beyond what smoothing can repay triggers throttling: interactive delay, then interactive rejection, then background rejection.
Two practical levers: use workload-management rules to reserve capacity per workload type (so Spark can't starve BI), and decide whether dev and prod genuinely need separate capacities or whether isolation on one is enough. Pause non-production capacities outside business hours — that alone is often the biggest line-item saving. And watch it all in the Fabric Capacity Metrics app, whose 14-day Compute view exposes your heaviest operations and any throttling events before users feel them.
Second-order decisions worth flagging early
These don't reshape the whole architecture, but each one is cheaper to decide now than to retrofit:
- Shortcut vs full copy. A OneLake shortcut is a zero-copy pointer to data that lives elsewhere — no egress, always fresh, but you don't own the lifecycle. Rule of thumb: external team owns the source → shortcut; your domain owns it → ingest.
- Compute tool per job. Spark for complex transforms, Dataflow Gen2 for Power Query–style citizen development, Pipelines for orchestration. The trap is using Dataflow Gen2 for heavy transforms (it strains above ~1 GB) or reaching for Spark to filter a few rows.
- Where business logic lives. Pushing logic down into the gold layer makes it reusable across tools and testable; encoding it in DAX locks it to Power BI. For a deliverable someone else will maintain, lean gold-ward.
- Real-time, even if the answer is "not yet." Bolting streaming on later means re-architecting ingestion. If any source is event- or IoT-based, decide between an Eventstream → Eventhouse path and micro-batching into Delta at design time.
The decision checklist
If you take one table away from this, make it this one — the questions, and which way to lean:
| Question | Lean toward |
|---|---|
| T-SQL DML and multi-table transactions needed? | Warehouse |
| Spark / Python / ML / unstructured data? | Lakehouse |
| Raw + curated in one place (medallion)? | Lakehouse bronze/silver + WH or LH gold |
| Real-time BI without refresh management? | Direct Lake on a well-designed model |
| Any consumer outside Power BI? | RLS at the data layer |
| Broad free-user BI distribution? | Size capacity at F64+ |
| Steady 24/7 production load? | Reserved F SKU (~41% saving) |
| Spiky / dev / experimental load? | Pay-as-you-go, pause when idle |
| Throttling or latency complaints? | Check Capacity Metrics; resize or isolate |
One licensing note that quietly drives the capacity decision: report consumers need Power BI Pro licenses unless the capacity is F64 or larger, at which point free users can consume content. For broad BI distribution, that single threshold often justifies sizing up to F64.
None of these decisions are reversible for free once data, reports and users are sitting on top of them — which is the whole argument for spending an afternoon on them before you spend a quarter undoing them. If you're standing up a Fabric environment, or inheriting one that's already drifting, we offer a free 30-minute scoping call. No pitch — we'll tell you honestly what we'd change in your position.