Developer Recipe: Create an ETL to Consolidate CRM, Ads and Budget Data for Unified Dashboards

2026-02-11
10 min read

A developer-focused ETL walkthrough to unify CRM events, Google Ads spend and total campaign budgets for accurate, dashboard-ready reporting in 2026.

Stop stitching reports by hand: a developer recipe to unify CRM events, Google Ads spend and campaign budgets

Pain point: Marketing teams struggle with fragmented CRM events, separate ads cost feeds and new budget types (like Google’s total campaign budgets). This recipe shows a repeatable ETL you can deploy today to power unified dashboards that answer “spend vs. outcomes” without manual spreadsheets or engineering bottlenecks.

Why this matters in 2026

Two trends changed the calculus for marketing pipelines in late 2025 and early 2026: the rollout of total campaign budgets across Search and Shopping, and renewed pressure to eliminate data silos so enterprise AI and analytics actually deliver value. Salesforce and other industry research in 2025–26 highlights how poor data management limits AI and decisioning — which makes a lightweight, reliable ETL to centralize advertising and CRM signals essential.

“Total campaign budgets reduce manual daily tweaks, but they increase the need to connect spend to outcomes at the campaign level.”

High-level architecture

This recipe follows a standard, maintainable pattern I’ve used across SaaS and retail clients:

  1. Extract raw data from the CRM, the Google Ads API, and budget metadata into an immutable raw schema.
  2. Load raw data into a cloud warehouse (BigQuery / Snowflake / Redshift / Postgres OLTP for small teams).
  3. Transform with dbt-style models to canonicalize IDs, normalize currencies/timezones, deduplicate, and attribute events to ad clicks/campaigns.
  4. Aggregate into a reporting mart with spend, budgets, and outcomes for dashboards.

Tools I recommend (developer-friendly)

  • Extractors: Airbyte or Fivetran for managed connectors; direct Google Ads API pulls for custom fields (total campaign budgets have been exposed in the Ads API since Jan 2026).
  • Orchestration: Airflow or Dagster for scheduled jobs and complex dependencies.
  • Warehouse: BigQuery (preferred for large ad datasets), Snowflake, or Postgres for small to midsize datasets.
  • Transformation: dbt for modular SQL models + tests.
  • Identity resolution: deterministic matching using canonical_user_id (email hashed, CRM id, cookie_id) and incremental merge ops.
  • Dashboarding: Looker / Tableau / Power BI / Dashbroad consuming the reporting mart.

Step-by-step ETL recipe

1) Extract: map the raw sources

Pull three canonical source groups:

  • CRM events (Salesforce, HubSpot): lead_created, lead_assigned, opportunity_won, revenue, touchpoints. Include user identifiers and timestamps: crm.contact_id, email_hashed, created_at, amount, currency.
  • Google Ads cost & budget: metrics (cost_micros, clicks, impressions) per day and per campaign; campaign metadata including campaign_id, campaign_name, total_campaign_budget (micros), and budget_start_date / budget_end_date for total budgets.
  • Click/ad attribution: final URL, gclid, click_timestamp, landing_page, and optionally server-side conversion IDs (Google Click ID or your own click_id).

Recommended raw tables (naming convention: raw.source_table):

  • raw.crm_events
  • raw.ads_costs
  • raw.ads_campaigns
  • raw.clicks

2) Load into the warehouse with sensible partitioning

Store raw tables as append-only, partitioned by event_date (or ingestion_date). For BigQuery, partition by DATE(event_timestamp) and cluster by campaign_id or hashed user_id for fast joins.
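
For reference, here is a minimal BigQuery DDL sketch for one raw table; the column names and types are assumptions based on the fields listed in step 1, so adapt them to whatever your connector actually lands.

-- raw.crm_events DDL (BigQuery sketch; columns are illustrative)
  create table if not exists raw.crm_events (
    event_id        string,
    contact_id      string,
    email           string,
    gclid           string,
    event_type      string,
    event_timestamp timestamp,
    amount          numeric,
    currency        string
  )
  partition by date(event_timestamp)   -- one partition per event day
  cluster by contact_id;               -- speeds up identity joins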

3) Canonicalize identities and dedupe

Create a staging model that produces a canonical_user_id — deterministic and stable. Typical hierarchical approach:

  1. Use CRM contact_id if available.
  2. If not, use hashed email (SHA256 lowercase), normalized.
  3. Fallback to cookie_id or gclid-based synthetic id.
-- models/stg_crm_events.sql (BigQuery syntax; swap the hashing functions for Postgres)
  with raw as (
    select *
    from raw.crm_events
  ), normalized as (
    select
      event_id,
      lower(trim(email)) as email_norm,
      cast(contact_id as string) as contact_id,
      -- hash as hex text so it can be coalesced with the string contact_id
      to_hex(sha256(lower(trim(email)))) as email_hash,
      gclid, -- carried through (if your CRM captures it) for deterministic attribution later
      event_type,
      event_timestamp,
      amount,
      currency
    from raw
  )
  select
    event_id,
    -- hierarchy: CRM contact_id, then hashed email, then a gclid-based synthetic id
    coalesce(contact_id, email_hash, concat('gclid:', gclid)) as canonical_user_id,
    gclid,
    event_type,
    event_timestamp,
    amount,
    currency
  from normalized;
  

4) Normalize ads cost and budgets

Google Ads reports often use micros and might return daily aggregated cost. Also ingest campaign-level total campaign budget fields which represent total spend allowed over a period.

-- models/stg_ads_costs.sql
  with raw as (
    select *
    from raw.ads_costs
  ), normalized as (
    select
      campaign_id,
      date(day) as cost_date,
      sum(cost_micros)/1e6 as cost_usd, -- assuming USD micros
      sum(clicks) as clicks,
      min(currency) as currency
    from raw
    group by 1,2
  )
  select * from normalized;
  

Campaign metadata with total budgets:

-- models/stg_ads_campaigns.sql
  select
    campaign_id,
    campaign_name,
    total_campaign_budget_micros/1e6 as total_campaign_budget_usd,
    budget_start_date,
    budget_end_date,
    status
  from raw.ads_campaigns;
  

5) Attribution: attribute CRM events to campaigns

Attribution depends on your tracking setup. The most reliable method is deterministic matching using click identifiers (gclid) or server-side click_id. If you only have last-touch via landing page or a session cookie, build a best-effort join and add a confidence field.

-- models/mart_attributed_events.sql
  with events as (
    select *, date(event_timestamp) as event_date
    from models.stg_crm_events
  ), clicks as (
    -- assumes clicks are stitched to a canonical_user_id upstream (cookie_id / CDP)
    select gclid, campaign_id, click_timestamp, canonical_user_id
    from raw.clicks
  ), matches as (
    -- exact match on gclid preferred; else same canonical_user_id within a 30-day lookback window
    select
      e.event_id,
      e.canonical_user_id,
      c.campaign_id,
      e.event_type,
      e.event_timestamp,
      case
        when e.gclid = c.gclid then 'deterministic'
        when c.campaign_id is not null then 'user_window'
        else 'unmatched'
      end as match_confidence
    from events e
    left join clicks c
      on (e.gclid = c.gclid) -- preferred
      or (e.canonical_user_id = c.canonical_user_id
          and c.click_timestamp between timestamp_sub(e.event_timestamp, interval 30 day) and e.event_timestamp)
  )
  select * from matches;
  

6) Aggregate spend to campaign-date and compute pacing vs. total campaign budget

Combine campaign-level budgets and daily cost to compute cumulative spend, pacing %, and remaining budget. This is critical for dashboards that show “campaign X has spent 40% of budget with 60% of time left.”

-- models/mart_campaign_spend.sql (BigQuery example)
  with costs as (
    select
      campaign_id,
      cost_date,
      sum(cost_usd) as cost_usd
    from models.stg_ads_costs
    group by 1,2
  ), budgets as (
    select
      campaign_id,
      total_campaign_budget_usd,
      budget_start_date,
      budget_end_date
    from models.stg_ads_campaigns
  ), campaign_calendar as (
    select
      b.campaign_id,
      date_add(b.budget_start_date, interval x day) as day
    from budgets b,
    unnest(generate_array(0, date_diff(b.budget_end_date, b.budget_start_date, day))) as x
  ), joined as (
    select
      cal.campaign_id,
      cal.day as cost_date,
      coalesce(costs.cost_usd, 0) as cost_usd,
      b.total_campaign_budget_usd,
      b.budget_start_date,
      b.budget_end_date
    from campaign_calendar cal
    left join costs on costs.campaign_id = cal.campaign_id and costs.cost_date = cal.day
    left join budgets b on b.campaign_id = cal.campaign_id
  ), running as (
    select
      campaign_id,
      cost_date,
      cost_usd,
      total_campaign_budget_usd,
      sum(cost_usd) over (partition by campaign_id order by cost_date rows between unbounded preceding and current row) as cumulative_spend_usd,
      -- compute campaign duration and elapsed
      date_diff(cost_date, budget_start_date, day) as days_elapsed,
      date_diff(budget_end_date, budget_start_date, day) + 1 as total_days
    from joined
  )
  select
    campaign_id,
    cost_date,
    cost_usd,
    cumulative_spend_usd,
    total_campaign_budget_usd,
    coalesce(cumulative_spend_usd / nullif(total_campaign_budget_usd,0),0) as pacing_percent,
    days_elapsed,
    total_days,
    case when total_campaign_budget_usd is null then null else (total_campaign_budget_usd - cumulative_spend_usd) end as remaining_budget_usd
  from running;
  

Sample queries for dashboard KPIs

Below are SQL snippets you can drop into your BI tool to build cards quickly.

Total spend vs total budget (campaign)

select
    campaign_id,
    sum(cost_usd) as total_spend_usd,
    max(total_campaign_budget_usd) as total_budget_usd,
    round(sum(cost_usd)/nullif(max(total_campaign_budget_usd),0)*100,2) as pct_of_budget
  from models.mart_campaign_spend
  group by campaign_id;
  

Cost per win (campaign-attributed)

with spend as (
    select campaign_id, sum(cost_usd) as spend_usd
    from models.mart_campaign_spend group by campaign_id
  ), wins as (
    -- pre-aggregate wins per campaign so the join does not fan out daily spend rows
    select campaign_id, count(distinct event_id) as wins
    from models.mart_attributed_events
    where event_type = 'opportunity_won' group by campaign_id
  )
  select
    s.campaign_id,
    s.spend_usd,
    coalesce(w.wins, 0) as wins,
    case when coalesce(w.wins, 0) = 0 then null else s.spend_usd / w.wins end as cost_per_win
  from spend s
  left join wins w on w.campaign_id = s.campaign_id;
  

Implementation considerations and gotchas

1) Timezones & day boundaries

Google Ads often reports in account timezone; CRM timestamps may be UTC. Normalize everything to UTC and produce a derived date in the brand’s reporting timezone for final aggregates.
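
A minimal sketch of that pattern, assuming BigQuery; 'America/New_York' is a placeholder for your brand's reporting timezone.

-- derive a reporting-timezone date while keeping the UTC timestamp intact
  select
    event_id,
    event_timestamp,                                          -- stored in UTC
    date(event_timestamp, 'America/New_York') as report_date  -- brand timezone for final aggregates
  from models.stg_crm_events;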

2) Currency conversions

Campaigns might run in multiple currencies. Normalize cost to a single reporting currency using daily FX rates. Store cost_micros and converted_cost_usd to allow reconciliations.
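
A sketch of the conversion join, assuming a helper table raw.fx_rates (rate_date, currency, usd_rate) that you maintain from your FX provider:

-- convert micros to a single reporting currency with daily FX rates
  select
    c.campaign_id,
    date(c.day) as cost_date,
    c.cost_micros,
    (c.cost_micros / 1e6) * coalesce(fx.usd_rate, 1.0) as converted_cost_usd  -- flag missing rates in QA
  from raw.ads_costs c
  left join raw.fx_rates fx
    on fx.rate_date = date(c.day)
   and fx.currency = c.currency;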

3) Handling partial budgets and updates

Total campaign budgets can be created or changed mid-flight. Treat budgets as slowly changing dimensions: timestamp the budget record and use the budget version applicable to each spend date when computing pacing.
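
A sketch of the version-aware join, assuming a history table raw.ads_campaign_budget_history with valid_from / valid_to columns stamped on each extract:

-- pick the budget version that was in effect on each spend date
  select
    c.campaign_id,
    c.cost_date,
    c.cost_usd,
    b.total_campaign_budget_usd
  from models.stg_ads_costs c
  left join raw.ads_campaign_budget_history b
    on b.campaign_id = c.campaign_id
   and c.cost_date >= date(b.valid_from)
   and c.cost_date <  coalesce(date(b.valid_to), date '9999-12-31');  -- open-ended current version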

4) Incomplete attribution

Not all CRM events will match to a click. Keep a confidence flag and consider multi-touch modeling later. For now, surface unmatched events so marketing can investigate gaps (this is where CDPs and server-side tracking help).
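
One way to surface those gaps is a daily unmatched ratio over the attributed-events mart (a sketch, assuming BigQuery's countif):

-- share of CRM events with no attributed campaign, per day
  select
    date(event_timestamp) as event_date,
    countif(campaign_id is null) / count(*) as unmatched_ratio
  from models.mart_attributed_events
  group by event_date
  order by event_date;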

5) Privacy and aggregated measurement

With iOS/Android and browser privacy updates, you might lose deterministic gclid matching from some sources. Build fallback aggregated pipelines (campaign-date-level aggregation) and enrich via server-side tracking (Conversions API) to maintain signal fidelity. Also review the ethical and legal implications for using aggregated or training data in downstream AI systems.

Testing and validation checklist

  • Reconcile daily spend totals in warehouse vs Google Ads UI (tolerance <1–2%).
  • Validate canonical_user_id de-dup rates against CRM expectations.
  • Confirm budget pacing numbers for several test campaigns with known budgets (e.g., 72-hour promotions) — the SearchEngineLand-reported case studies from Jan 2026 show this is critical for short bursts.
  • Automate data quality tests in dbt for nulls, negative costs, currency mismatches, and unexpected campaign id changes.
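
For the last item, a dbt singular test is often enough to start; this sketch (file name and ref are illustrative) fails the build if any rows are returned:

-- tests/assert_no_negative_costs.sql (dbt singular test: returned rows break the build)
  select campaign_id, cost_date, cost_usd
  from {{ ref('stg_ads_costs') }}
  where cost_usd < 0;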

Operationalizing and scaling

As dataset size grows, follow these practices:

  • Use partitioned and clustered tables to reduce scan costs (BigQuery partition by date + cluster by campaign_id).
  • Push down pre-aggregation to the warehouse (materialized views or incremental dbt models) to serve dashboards directly (see the incremental model sketch after this list).
  • Version control transformations and CI-run tests for dbt models to ensure analytics stability.
  • Instrument monitoring for connector failures (Airbyte/Fivetran alerts) and drift in budget fields.
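
A sketch of the incremental-model pattern referenced above, assuming dbt on BigQuery; the model and ref names are illustrative:

-- models/mart_campaign_spend_daily.sql (incremental dbt model sketch)
  {{ config(
      materialized='incremental',
      partition_by={'field': 'cost_date', 'data_type': 'date'},
      cluster_by=['campaign_id']
  ) }}

  select campaign_id, cost_date, sum(cost_usd) as cost_usd
  from {{ ref('stg_ads_costs') }}
  {% if is_incremental() %}
    -- on incremental runs, reprocess only the trailing few days
    where cost_date >= date_sub(current_date(), interval 3 day)
  {% endif %}
  group by 1, 2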

Real-world example: 72-hour launch campaign

Context: a midmarket retailer used Google’s total campaign budget to run a 72-hour promo in Nov 2025. They needed to know daily pacing and cost-per-order attributed to the campaign. Implementation summary:

  1. Extracted CRM orders (order_id, email_hash, order_value) and Google Ads cost & campaigns hourly.
  2. Canonicalized users and attributed orders via gclid and server-side click_id.
  3. Stored campaign.total_campaign_budget_usd and budget start/end; computed cumulative_spend_usd per day and pacing %.

Result: leadership could see at hour 36 that the campaign had spent 58% of its budget with 50% of time elapsed, enabling a targeted pause/creative swap. The team also measured cost-per-order within 24 hours of campaign start, improving PLA decisions mid-flight.

Looking ahead

  • API evolution: Ads APIs will continue to add budget-level controls and richer metadata (bid strategies, AI recommendations). Keep your extractor layer modular so you can pick up new fields without reworking transformations.
  • AI-driven budgeting: Platforms will recommend or auto-allocate budgets. Capture recommendation metadata to evaluate platform-driven vs manual outcomes.
  • Shift to aggregated signals: Build both deterministic and aggregated models; use both to maintain continuity as deterministic signals degrade.
  • Data ops discipline: Expect tighter audits and governance as enterprises operationalize AI — include lineage, tests, and access controls now.

Wrap-up: practical next steps (developer checklist)

  1. Provision a cloud warehouse and choose your connector (Airbyte/Fivetran). Start with daily extracts for both Ads and CRM.
  2. Implement staging models to canonicalize ids and normalize currencies/timezones.
  3. Ingest campaign budget fields and build the campaign_calendar pattern to compute pacing against total_campaign_budget.
  4. Create dbt tests and BI cards for Spend vs Budget, Cost per Win, and Remaining Budget.
  5. Instrument alerts for connector failures, data drift, and pacing thresholds (e.g., >80% of budget with <50% time elapsed).
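
The pacing alert in step 5 can run as a scheduled query over the spend mart; a minimal sketch using the example thresholds above (pacing_percent is the 0–1 fraction from mart_campaign_spend):

-- campaigns over-pacing: >80% of budget spent with <50% of the flight elapsed
  select campaign_id, cost_date, pacing_percent, days_elapsed, total_days
  from models.mart_campaign_spend
  where pacing_percent > 0.8
    and days_elapsed / nullif(total_days, 0) < 0.5;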

Actionable artifacts you can reuse

  • SQL snippets in this article (staging, spend mart, attribution) — drop them into dbt models and adapt column names.
  • dbt model pattern: stg_* → int_* (integrated) → mart_* (dashboard-ready).
  • Alert patterns: spend_daily_vs_budget, missing_gclid_rate, unmatched_events_ratio.

Closing thoughts

Consolidating CRM events, Google Ads spend, and total campaign budgets into a single ETL pipeline cuts reporting time from hours to minutes and enables confident mid-campaign decisions. In 2026, with platforms offering total campaign budgets and AI-driven spend optimization, having a reliable, auditable pipeline is no longer optional — it’s how marketing teams scale while maintaining trust in their data.

Next step: Clone a starter dbt repo, wire up a single campaign and a CRM table, and validate pacing numbers against the Ads UI for three campaigns. If you want a ready-made starter pack (dbt + Airbyte connectors + dashboard templates), visit Dashbroad or request our ETL starter kit for marketing analytics teams.

Author: Senior Analytics Developer, Dashbroad — practical ETL patterns for marketer-focused dashboards (2026).

