Medallion Architecture: The Bronze, Silver, Gold Pattern That Actually Works

If you've spent any time in the modern data ecosystem over the past few years, you've likely encountered the medallion architecture. Originally popularized by Databricks but now widely adopted across the industry, this Bronze-Silver-Gold pattern has become the default way data teams structure their data lakes and lakehouses.

But here's the thing: while the concept is simple on the surface, the implementation details matter enormously. I've seen teams struggle with over-engineered medallion architectures that create more problems than they solve, and I've seen teams use it to build remarkably clean, performant data platforms. The difference comes down to understanding not just what medallion architecture is, but why it exists and how to apply it thoughtfully.

What Is Medallion Architecture?

At its core, medallion architecture is a data design pattern that organizes data in a lakehouse or data lake into three progressive layers of data quality:

Bronze Layer: Raw data ingestion and persistence
Silver Layer: Cleaned, validated, and enriched data
Gold Layer: Business-level aggregates and curated datasets

Think of it as a refinement process, much like ore processing in metallurgy (hence the metal-based naming). Each layer adds value by improving data quality, structure, and usability while maintaining a clear lineage from source to consumption.

The Bronze Layer: Your Source of Truth

The Bronze layer is where raw data lands in your lakehouse, unchanged from its source format. This is your historical archive and recovery point.

What Goes in Bronze

Bronze should contain:

Exact copies of source data with minimal or no transformations
Original data types and schema from the source system
Ingestion metadata (timestamps, source identifiers, batch IDs)
Both full snapshots and incremental updates, depending on your ingestion pattern

The key principle here is immutability. Once data lands in Bronze, it shouldn't be modified. If you need to re-ingest, you append new data with updated ingestion timestamps. This gives you the ability to replay your entire pipeline from any point in time—a capability that will save you when (not if) something goes wrong downstream.

Common Bronze Patterns

In practice, Bronze typically uses formats optimized for write performance and schema flexibility:

bronze/
  source_system=salesforce/
    table=accounts/
      date=2024-01-15/
        batch_001.parquet
  source_system=postgres_prod/
    table=orders/
      date=2024-01-15/
        batch_001.parquet

I'm opinionated here: use Parquet or Delta Lake format even in Bronze. Yes, you could use JSON or CSV to preserve the "raw" nature, but you're just creating unnecessary pain. Modern columnar formats handle schema evolution beautifully and give you much better performance when you inevitably need to reprocess Bronze data.

The Silver Layer: Where the Real Work Happens

Silver is where your data engineering discipline shows. This layer transforms Bronze data into cleaned, conformed, and enriched datasets ready for analytics.

What Happens in Silver

Typical Silver transformations include:

Data quality enforcement: Removing duplicates, handling nulls, validating constraints
Schema standardization: Converting to consistent data types, renaming columns to standards
Deduplication: Implementing SCD (Slowly Changing Dimensions) patterns where appropriate
Enrichment: Adding derived columns, parsing nested fields, joining reference data
Filtering: Removing soft-deleted records or test data

Here's a critical distinction: Silver should still be relatively granular. You're cleaning and structuring, not heavily aggregating. If your Bronze has individual order line items, your Silver should too—just with better data quality.

Silver Design Decisions

One question teams always face: should Silver be normalized or denormalized? My take: it depends on your consumption patterns, but lean toward modest denormalization in Silver.

Create domain-aligned tables that make sense for your business (customers, orders, products) rather than trying to perfectly normalize or create one giant wide table. You want Silver to be usable by multiple downstream use cases, not optimized for just one.

-- Example Silver transformation
CREATE OR REPLACE TABLE silver.orders AS
SELECT 
  order_id,
  customer_id,
  CAST(order_date AS DATE) as order_date,
  CAST(order_total AS DECIMAL(10,2)) as order_total,
  UPPER(TRIM(status)) as status,
  -- Add data quality metadata
  CURRENT_TIMESTAMP() as processed_at,
  'silver_v2' as pipeline_version
FROM bronze.raw_orders
WHERE order_id IS NOT NULL  -- Basic quality filter
  AND status != 'TEST'       -- Remove test data
QUALIFY ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY ingestion_time DESC) = 1  -- Dedupe

The Gold Layer: Business-Ready Analytics

Gold is your presentation layer. This is where you create the business-level aggregates, metrics, and curated datasets that directly power dashboards, reports, and ML models.

What Lives in Gold

Gold tables are purpose-built for consumption:

Aggregated metrics: Daily revenue by region, monthly active users, etc.
Wide denormalized tables: Everything a dashboard needs in one place
Feature stores: ML-ready feature sets with proper temporal alignment
Business entities: Customer 360 views, product hierarchies with all relevant attributes

The key principle: optimize for query performance and user experience. Gold tables should be fast and intuitive. If a business analyst needs to join five tables to answer a basic question, you haven't done your job.

Gold Organization

I strongly recommend organizing Gold by business domain or use case:

gold/
  finance/
    revenue_daily
    expenses_monthly
    budget_vs_actual
  marketing/
    campaign_performance
    customer_acquisition_cost
    funnel_metrics
  product/
    feature_usage_weekly
    user_retention_cohorts

This makes it immediately clear to consumers what data is available and who owns it.

Common Pitfalls and How to Avoid Them

Over-Engineering Bronze

Don't try to fix data quality issues in Bronze. It's tempting to "just clean this one field" during ingestion, but resist. Bronze should be a simple, reliable landing zone. Save the complexity for Silver.

Creating Too Many Silver Variants

I've seen Silver layers with tables like orders_silver_v1, orders_silver_cleaned, orders_silver_final. This is a code smell. You should have one authoritative Silver version of each entity. Use proper versioning and migration strategies instead of proliferating tables.

Skipping Layers

Sometimes teams go directly from Bronze to Gold for "simple" use cases. In my experience, this creates technical debt. That "simple" report eventually needs better data quality, and now you're maintaining transformation logic in Gold that should be in Silver, and you can't reuse it across use cases.

Not Having Clear Ownership

Each layer needs clear ownership. Typically: data engineers own Bronze and Silver, while analytics engineers or data analysts own Gold. Without this clarity, you get chaos.

When Medallion Architecture Makes Sense

Medallion architecture shines when you have:

Multiple data sources feeding into analytics
Multiple downstream consumers with different needs
Requirements for data quality and governance
Need for historical data replay and auditing
A team large enough to maintain distinct layers

For very small teams or simple pipelines (one source, one dashboard), medallion might be overkill. A well-designed two-tier system (raw + curated) could suffice. Don't cargo-cult the pattern just because it's trendy.

The Bottom Line

Medallion architecture isn't magic, but it's a solid, battle-tested pattern for organizing data platforms. The Bronze-Silver-Gold progression gives you clear separation of concerns, data quality checkpoints, and flexibility for multiple use cases.

The key to success is understanding that medallion architecture is a guideline, not a rigid specification. Adapt the pattern to your team's size, your data volumes, and your organizational needs. Start simple, add layers of sophistication as they provide clear value, and always prioritize clarity and maintainability over architectural purity.

Your future self—and your teammates—will thank you.