If you're loading data into Snowflake the same way you loaded it into your legacy data warehouse, you're probably spending more money than necessary and dealing with longer refresh cycles than you should. Incremental loading isn't just a nice-to-have optimization—it's a fundamental strategy that separates efficient data pipelines from expensive, slow ones.

In this post, I'll walk you through the most effective incremental loading strategies for Snowflake, explain when to use each approach, and share practical patterns that we've battle-tested across dozens of production environments.

Why Incremental Loading Matters in Snowflake

Let's start with the business case. Full table reloads might seem simpler to implement, but they create three significant problems:

Incremental loading solves these problems by processing only what's changed since the last load. For a table with millions of rows where only thousands change daily, you're looking at potential cost savings of 90% or more.

Strategy 1: Timestamp-Based Incremental Loading

This is the workhorse pattern that most teams should master first. The concept is simple: track when records were last modified and only load records newer than your last successful load.

The Basic Implementation

Here's how this looks in practice:

-- Create a control table to track load watermarks
CREATE TABLE etl_watermarks (
  table_name VARCHAR(255) PRIMARY KEY,
  last_loaded_timestamp TIMESTAMP_NTZ,
  updated_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);

-- Incremental load query
INSERT INTO target_table
SELECT *
FROM source_table
WHERE modified_date > (
  SELECT last_loaded_timestamp 
  FROM etl_watermarks 
  WHERE table_name = 'target_table'
)
AND modified_date <= CURRENT_TIMESTAMP();

-- Update watermark after successful load
UPDATE etl_watermarks
SET last_loaded_timestamp = CURRENT_TIMESTAMP(),
    updated_at = CURRENT_TIMESTAMP()
WHERE table_name = 'target_table';

The key insight here is the upper bound on the timestamp filter. By capping at the start of your load process, you avoid missing records that arrive during processing—a common gotcha that causes data loss.

When to Use Timestamp-Based Loading

This strategy works best when:

The main limitation? It doesn't handle hard deletes well. If a record is deleted from the source, timestamp-based loading won't detect it.

Strategy 2: Change Data Capture (CDC) with Streams

Snowflake Streams are purpose-built for incremental processing, and they're genuinely elegant when used correctly. A Stream creates a change table that tracks inserts, updates, and deletes on a source table.

Implementing Stream-Based Loading

-- Create a stream on your staging table
CREATE STREAM customer_stream ON TABLE customer_stage;

-- Process changes with full DML awareness
MERGE INTO customer_target t
USING customer_stream s
ON t.customer_id = s.customer_id
WHEN MATCHED AND s.METADATA$ACTION = 'DELETE' 
  THEN DELETE
WHEN MATCHED AND s.METADATA$ACTION = 'INSERT' 
  THEN UPDATE SET
    t.name = s.name,
    t.email = s.email,
    t.updated_at = s.updated_at
WHEN NOT MATCHED AND s.METADATA$ACTION = 'INSERT'
  THEN INSERT (customer_id, name, email, updated_at)
    VALUES (s.customer_id, s.name, s.email, s.updated_at);

The beauty of Streams is that they automatically advance their offset after successful consumption. No manual watermark management needed. The METADATA$ACTION and METADATA$ISUPDATE columns tell you exactly what changed.

Stream Best Practices

After working with Streams across multiple production systems, here are the patterns that matter:

Streams shine for internal Snowflake table-to-table pipelines but can't directly track changes in external systems. For that, you need a different approach.

Strategy 3: Hash Diff Pattern for SCD Type 2

When you need to track historical changes and maintain Slowly Changing Dimension (SCD) Type 2 logic, the hash diff pattern is your friend. This approach detects changes by comparing hash values of concatenated columns.

-- Add hash column to your target table
ALTER TABLE customer_target 
ADD COLUMN row_hash VARCHAR(64);

-- Generate hash in staging
CREATE OR REPLACE VIEW customer_stage_hashed AS
SELECT 
  *,
  SHA2_HEX(CONCAT_WS('||', 
    customer_id, 
    name, 
    email, 
    address
  )) as row_hash
FROM customer_stage;

-- Identify changes
WITH changes AS (
  SELECT s.*
  FROM customer_stage_hashed s
  LEFT JOIN customer_target t
    ON s.customer_id = t.customer_id
    AND t.is_current = TRUE
  WHERE t.row_hash IS NULL 
     OR t.row_hash != s.row_hash
)
-- Expire old records and insert new versions
-- (Full SCD Type 2 logic here)

This pattern is computationally more expensive than pure timestamp-based loading because you're comparing hash values across potentially large tables. Use it when you specifically need historical tracking, not as a default incremental strategy.

Strategy 4: Merge with Conditional Logic

The MERGE statement is Snowflake's Swiss Army knife for incremental loading. It handles inserts, updates, and deletes in a single statement, which is both elegant and performant.

Here's a production-ready pattern with conditional update logic:

MERGE INTO orders_target t
USING orders_staging s
ON t.order_id = s.order_id
WHEN MATCHED 
  AND (t.status != s.status 
    OR t.total_amount != s.total_amount
    OR t.updated_at < s.updated_at)
  THEN UPDATE SET
    t.status = s.status,
    t.total_amount = s.total_amount,
    t.updated_at = s.updated_at,
    t.etl_updated_at = CURRENT_TIMESTAMP()
WHEN NOT MATCHED 
  THEN INSERT (order_id, status, total_amount, updated_at, etl_inserted_at)
    VALUES (s.order_id, s.status, s.total_amount, s.updated_at, CURRENT_TIMESTAMP());

The conditional update clause is crucial—it prevents unnecessary writes when data hasn't actually changed, which reduces costs and avoids triggering downstream streams unnecessarily.

Choosing the Right Strategy

After implementing hundreds of data pipelines, here's my opinionated decision framework:

Performance and Cost Optimization Tips

Regardless of which strategy you choose, these optimizations apply universally:

Cluster your tables appropriately: If you're filtering on timestamp columns, cluster on those columns. The metadata pruning will dramatically reduce scan costs.

Size your warehouses correctly: Incremental loads often work fine on smaller warehouses. Don't default to large warehouses out of habit.

Batch appropriately: Loading every 5 minutes isn't always better than hourly. Balance freshness requirements against compute costs.

Monitor query profiles: Use Snowflake's query profile to identify partition pruning efficiency. If you're scanning the full table despite filters, you have a clustering or design issue.

Wrapping Up

Incremental loading isn't a single technique—it's a toolkit that you match to your specific requirements. The timestamp-based approach will serve you well for most scenarios, while Streams and hash diff patterns solve specific challenges in CDC and historical tracking.

The real skill is recognizing which tool to use when, and being willing to refactor as your requirements evolve. A pipeline that starts with full reloads might graduate to timestamp-based incremental loading, and eventually to Stream-based processing as data volumes grow.

The best data engineers I've worked with treat incremental loading as a default mindset, not an optimization you add later. Build it in from the start, and your future self will thank you when those tables hit millions of rows.