Developer Recipe: Create an ETL to Consolidate CRM, Ads and Budget Data for Unified Dashboards
A developer-focused ETL walkthrough to unify CRM events, Google Ads spend and total campaign budgets for accurate, dashboard-ready reporting in 2026.
Stop stitching reports by hand: a developer recipe to unify CRM events, Google Ads spend and campaign budgets
Pain point: Marketing teams struggle with fragmented CRM events, separate ads cost feeds and new budget types (like Google’s total campaign budgets). This recipe shows a repeatable ETL you can deploy today to power unified dashboards that answer “spend vs. outcomes” without manual spreadsheets or engineering bottlenecks.
Why this matters in 2026
Two trends changed the calculus for marketing pipelines in late 2025 and early 2026: the rollout of total campaign budgets across Search and Shopping, and renewed pressure to eliminate data silos so enterprise AI and analytics actually deliver value. Industry research from Salesforce and others in 2025–26 highlights how poor data management limits AI and decisioning, which makes a lightweight, reliable ETL that centralizes advertising and CRM signals essential.
“Total campaign budgets reduce manual daily tweaks but increase the need to connect spend-to-outcomes at the campaign-level.”
High-level architecture
This recipe follows a standard, maintainable pattern I’ve used across SaaS and retail clients:
- Extract raw data from the CRM, the Google Ads API, and budget metadata into an immutable raw schema.
- Load the raw data into a cloud warehouse (BigQuery / Snowflake / Redshift, or Postgres for small teams).
- Transform with dbt-style models to canonicalize IDs, normalize currencies/timezones, deduplicate, and attribute events to ad clicks/campaigns.
- Aggregate into a reporting mart with spend, budgets, and outcomes for dashboards.
Tools I recommend (developer-friendly)
- Extractors: Airbyte or Fivetran for managed connectors; Google Ads API direct pulls for custom fields (total campaign budgets have been exposed in the Ads API since Jan 2026).
- Orchestration: Airflow or Dagster for scheduled jobs and complex dependencies.
- Warehouse: BigQuery (preferred for large ad datasets), Snowflake, or Postgres for small to midsize datasets.
- Transformation: dbt for modular SQL models + tests.
- Identity resolution: deterministic matching using canonical_user_id (email hashed, CRM id, cookie_id) and incremental merge ops.
- Dashboarding: Looker / Tableau / Power BI / Dashbroad consuming the reporting mart.
Step-by-step ETL recipe
1) Extract: map the raw sources
Pull three canonical source groups:
- CRM events (Salesforce, HubSpot): lead_created, lead_assigned, opportunity_won, revenue, touchpoints. Include user identifiers and timestamps: crm.contact_id, email_hashed, created_at, amount, currency.
- Google Ads cost & budget: metrics (cost_micros, clicks, impressions) per day and per campaign; campaign metadata including campaign_id, campaign_name, and total_campaign_budget (micros) and budget_start_date / budget_end_date for total budgets.
- Click/ad attribution: final URL, gclid, click_timestamp, landing_page, and optionally server-side conversion IDs (Google Click ID or your own click_id).
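For the Google Ads pull, a minimal GAQL sketch of the daily cost report is shown below. The cost and click metrics are standard Ads API report fields, but field names for the total-budget metadata vary by API version, so verify them before relying on this query.
-- GAQL: daily campaign cost pull (cost/click metrics are standard; verify budget field names for your API version)
SELECT
campaign.id,
campaign.name,
segments.date,
metrics.cost_micros,
metrics.clicks,
metrics.impressions
FROM campaign
WHERE segments.date DURING LAST_30_DAYS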
Recommended raw tables (naming convention: raw.source_table):
- raw.crm_events
- raw.ads_costs
- raw.ads_campaigns
- raw.clicks
2) Load into the warehouse with sensible partitioning
Store raw tables as append-only, partitioned by event_date (or ingestion_date). For BigQuery, partition by DATE(event_timestamp) and cluster by campaign_id or hashed user_id for fast joins.
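As a minimal sketch (BigQuery DDL, with a column list abbreviated and assumed from the extract fields above), the daily cost table could be declared like this:
-- raw.ads_costs: append-only, partitioned by day, clustered for fast campaign joins
create table if not exists raw.ads_costs (
campaign_id int64,
day date,
cost_micros int64,
clicks int64,
impressions int64,
currency string,
ingestion_timestamp timestamp
)
partition by day
cluster by campaign_id;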
3) Canonicalize identities and dedupe
Create a staging model that produces a canonical_user_id — deterministic and stable. Typical hierarchical approach:
- Use CRM contact_id if available.
- If not, use hashed email (SHA256 lowercase), normalized.
- Fallback to cookie_id or gclid-based synthetic id.
-- models/stg_crm_events.sql (BigQuery syntax; in Postgres use pgcrypto's digest() for the email hash)
with raw as (
select *
from raw.crm_events
), normalized as (
select
event_id,
lower(trim(email)) as email_norm,
contact_id,
-- sha256() returns BYTES in BigQuery, so hex-encode it to get a joinable string key
to_hex(sha256(lower(trim(email)))) as email_hash,
gclid, -- assumes the click id is captured on the CRM event (e.g. via a hidden form field)
event_type,
event_timestamp,
amount,
currency
from raw
)
select
event_id,
-- identity hierarchy: CRM contact_id, then hashed email, then a gclid-based synthetic id
coalesce(cast(contact_id as string), email_hash, concat('gclid:', gclid)) as canonical_user_id,
gclid,
event_type,
event_timestamp,
amount,
currency
from normalized;
4) Normalize ads cost and budgets
Google Ads reports cost in micros and, when segmented by date, returns daily aggregates. Also ingest the campaign-level total campaign budget fields, which represent the total spend allowed over the budget's period.
-- models/stg_ads_costs.sql
with raw as (
select *
from raw.ads_costs
), normalized as (
select
campaign_id,
date(day) as cost_date,
sum(cost_micros)/1e6 as cost_usd, -- assuming USD micros
sum(clicks) as clicks,
min(currency) as currency
from raw
group by 1,2
)
select * from normalized;
Campaign metadata with total budgets:
-- models/stg_ads_campaigns.sql
select
campaign_id,
campaign_name,
total_campaign_budget_micros/1e6 as total_campaign_budget_usd,
budget_start_date,
budget_end_date,
status
from raw.ads_campaigns;
5) Attribution: attribute CRM events to campaigns
Attribution depends on your tracking setup. The most reliable method is deterministic matching using click identifiers (gclid) or server-side click_id. If you only have last-touch via landing page or a session cookie, build a best-effort join and add a confidence field.
-- models/mart_attributed_events.sql (BigQuery syntax)
with events as (
select *, date(event_timestamp) as event_date
from models.stg_crm_events
), clicks as (
select gclid, campaign_id, click_timestamp, canonical_user_id
from raw.clicks
), matches as (
-- exact match on gclid, or same canonical_user_id within a 30-day lookback window
select
e.event_id,
e.canonical_user_id,
c.campaign_id,
c.click_timestamp,
e.event_type,
e.event_timestamp,
-- confidence flag so dashboards can separate deterministic matches from best-effort ones
case
when e.gclid = c.gclid then 'high_gclid'
when c.campaign_id is not null then 'medium_user_lookback'
else 'unmatched'
end as attribution_confidence
from events e
left join clicks c
on (e.gclid = c.gclid) -- preferred: deterministic click id
or (e.canonical_user_id = c.canonical_user_id and c.click_timestamp between timestamp_sub(e.event_timestamp, interval 30 day) and e.event_timestamp)
)
-- keep one row per event, preferring the gclid match ('high_gclid' sorts first), then the most recent click
select * from matches
where true -- BigQuery requires a WHERE/GROUP BY/HAVING clause alongside QUALIFY
qualify row_number() over (partition by event_id order by attribution_confidence, click_timestamp desc) = 1;
6) Aggregate spend to campaign-date and compute pacing vs. total campaign budget
Combine campaign-level budgets and daily cost to compute cumulative spend, pacing %, and remaining budget. This is critical for dashboards that show “campaign X has spent 40% of budget with 60% of time left.”
-- models/mart_campaign_spend.sql (BigQuery example)
with costs as (
select
campaign_id,
cost_date,
sum(cost_usd) as cost_usd
from models.stg_ads_costs
group by 1,2
), budgets as (
select
campaign_id,
total_campaign_budget_usd,
budget_start_date,
budget_end_date
from models.stg_ads_campaigns
), campaign_calendar as (
select
b.campaign_id,
date_add(b.budget_start_date, interval x day) as day
from budgets b,
unnest(generate_array(0, date_diff(b.budget_end_date, b.budget_start_date, day))) as x
), joined as (
select
cal.campaign_id,
cal.day as cost_date,
coalesce(costs.cost_usd, 0) as cost_usd,
b.total_campaign_budget_usd,
b.budget_start_date,
b.budget_end_date
from campaign_calendar cal
left join costs on costs.campaign_id = cal.campaign_id and costs.cost_date = cal.day
left join budgets b on b.campaign_id = cal.campaign_id
), running as (
select
campaign_id,
cost_date,
cost_usd,
total_campaign_budget_usd,
sum(cost_usd) over (partition by campaign_id order by cost_date rows between unbounded preceding and current row) as cumulative_spend_usd,
-- compute campaign duration and elapsed
date_diff(cost_date, budget_start_date, day) as days_elapsed,
date_diff(budget_end_date, budget_start_date, day) + 1 as total_days
from joined
)
select
campaign_id,
cost_date,
cost_usd,
cumulative_spend_usd,
total_campaign_budget_usd,
coalesce(cumulative_spend_usd / nullif(total_campaign_budget_usd,0),0) as pacing_percent,
days_elapsed,
total_days,
case when total_campaign_budget_usd is null then null else (total_campaign_budget_usd - cumulative_spend_usd) end as remaining_budget_usd
from running;
Sample queries for dashboard KPIs
Below are SQL snippets you can drop into your BI tool to build cards quickly.
Total spend vs total budget (campaign)
select
campaign_id,
sum(cost_usd) as total_spend_usd,
max(total_campaign_budget_usd) as total_budget_usd,
round(sum(cost_usd)/nullif(max(total_campaign_budget_usd),0)*100,2) as pct_of_budget
from models.mart_campaign_spend
group by campaign_id;
Cost per win (campaign-attributed)
-- Pre-aggregate wins per campaign-date so the join does not fan out (and double-count) spend rows
with wins as (
select campaign_id, date(event_timestamp) as event_date, count(distinct event_id) as wins
from models.mart_attributed_events
where event_type = 'opportunity_won'
group by 1, 2
)
select
c.campaign_id,
sum(c.cost_usd) as spend_usd,
sum(coalesce(w.wins, 0)) as wins,
case when sum(coalesce(w.wins, 0)) = 0 then null
else sum(c.cost_usd) / sum(coalesce(w.wins, 0)) end as cost_per_win
from models.mart_campaign_spend c
left join wins w on c.campaign_id = w.campaign_id and c.cost_date = w.event_date
group by c.campaign_id;
Implementation considerations and gotchas
1) Timezones & day boundaries
Google Ads often reports in account timezone; CRM timestamps may be UTC. Normalize everything to UTC and produce a derived date in the brand’s reporting timezone for final aggregates.
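In BigQuery that derivation is a one-liner; the sketch below assumes timestamps are stored in UTC and uses America/New_York purely as an illustrative reporting timezone:
-- derive the reporting-timezone date from a UTC timestamp (timezone name is illustrative)
select
event_id,
event_timestamp, -- stored in UTC
date(event_timestamp, 'America/New_York') as report_date
from models.stg_crm_events;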
2) Currency conversions
Campaigns might run in multiple currencies. Normalize cost to a single reporting currency using daily FX rates. Store cost_micros and converted_cost_usd to allow reconciliations.
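A minimal sketch of the conversion join, assuming a hypothetical raw.fx_rates table keyed by date and currency:
-- raw.fx_rates(rate_date, currency, usd_rate) is an assumed table of daily FX rates
select
c.campaign_id,
c.cost_date,
c.cost_usd as cost_account_currency, -- stg_ads_costs labels this cost_usd, but for non-USD accounts it is the account-currency amount
c.currency,
c.cost_usd * fx.usd_rate as converted_cost_usd
from models.stg_ads_costs c
left join raw.fx_rates fx
on fx.rate_date = c.cost_date and fx.currency = c.currency;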
3) Handling partial budgets and updates
Total campaign budgets can be created or changed mid-flight. Treat budgets as slowly changing dimensions: timestamp the budget record and use the budget version applicable to each spend date when computing pacing.
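A sketch of that budget-version join, assuming a hypothetical raw.ads_budget_history table with valid_from / valid_to columns written by your extractor whenever a budget changes:
-- pick the budget version in effect on each spend date (ads_budget_history is an assumed table)
select
s.campaign_id,
s.cost_date,
s.cost_usd,
b.total_campaign_budget_usd
from models.stg_ads_costs s
left join raw.ads_budget_history b
on b.campaign_id = s.campaign_id
and s.cost_date >= b.valid_from
and s.cost_date < coalesce(b.valid_to, date '9999-12-31');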
4) Incomplete attribution
Not all CRM events will match to a click. Keep a confidence flag and consider multi-touch modeling later. For now, surface unmatched events so marketing can investigate gaps (this is where CDPs and server-side tracking help).
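With the attribution_confidence flag carried in the attribution mart above, surfacing those gaps is a simple filter:
-- unmatched CRM events to hand to marketing for investigation
select event_id, canonical_user_id, event_type, event_timestamp
from models.mart_attributed_events
where attribution_confidence = 'unmatched';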
5) Privacy and aggregated measurement
With iOS/Android and browser privacy updates, you might lose deterministic gclid matching from some sources. Build fallback aggregated pipelines (campaign-date-level aggregation) and enrich via server-side tracking (Conversions API) to maintain signal fidelity. Also review the ethical and legal implications of using aggregated or training data in downstream AI systems.
Testing and validation checklist
- Reconcile daily spend totals in warehouse vs Google Ads UI (tolerance <1–2%).
- Validate canonical_user_id de-dup rates against CRM expectations.
- Confirm budget pacing numbers for several test campaigns with known budgets (e.g., 72-hour promotions) — the SearchEngineLand-reported case studies from Jan 2026 show this is critical for short bursts.
- Automate data quality tests in dbt for nulls, negative costs, currency mismatches, and unexpected campaign id changes.
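As one concrete example of those dbt tests, a singular test for negative costs might look like the sketch below (dbt treats any returned rows as a failure; inside a dbt project you would reference the model with ref() rather than the schema-qualified name):
-- tests/assert_no_negative_costs.sql -- fails if any negative daily cost slips through
select campaign_id, cost_date, cost_usd
from models.stg_ads_costs
where cost_usd < 0;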
Operationalizing and scaling
As dataset size grows, follow these practices:
- Use partitioned and clustered tables to reduce scan costs (BigQuery partition by date + cluster by campaign_id).
- Push down pre-aggregation to the warehouse (materialized views or incremental dbt models) to serve dashboards directly; a sketch follows this list.
- Version control transformations and CI-run tests for dbt models to ensure analytics stability.
- Instrument monitoring for connector failures (Airbyte/Fivetran alerts) and drift in budget fields.
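The pre-aggregation point above can be expressed as an incremental dbt model. This is a minimal sketch with illustrative names, not a drop-in file:
-- models/mart_campaign_spend_daily.sql -- incremental pre-aggregation sketch (names are illustrative)
{{ config(materialized='incremental', unique_key='campaign_date_key', partition_by={'field': 'cost_date', 'data_type': 'date'}) }}
select
concat(cast(campaign_id as string), '-', cast(cost_date as string)) as campaign_date_key,
campaign_id,
cost_date,
sum(cost_usd) as cost_usd
from {{ ref('stg_ads_costs') }}
{% if is_incremental() %}
-- scheduled runs only reprocess a short trailing window
where cost_date >= date_sub(current_date(), interval 3 day)
{% endif %}
group by 1, 2, 3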
Real-world example: 72-hour launch campaign
Context: a midmarket retailer used Google’s total campaign budget to run a 72-hour promo in Nov 2025. They needed to know daily pacing and cost-per-order attributed to the campaign. Implementation summary:
- Extracted CRM orders (order_id, email_hash, order_value) and Google Ads cost & campaigns hourly.
- Canonicalized users and attributed orders via gclid and server-side click_id.
- Stored campaign.total_campaign_budget_usd and budget start/end; computed cumulative_spend_usd per day and pacing %.
Result: leadership could see at hour 36 that the campaign had spent 58% of its budget with 50% of time elapsed, enabling a targeted pause/creative swap. The team also measured cost-per-order within 24 hours of campaign start, improving PLA decisions mid-flight.
Future-proofing: 2026+ trends to plan for
- API evolution: Ads APIs will continue to add budget-level controls and richer meta (bid strategies, AI recommendations). Keep your extractor layer modular so you can pick up new fields without reworking transformations.
- AI-driven budgeting: Platforms will recommend or auto-allocate budgets. Capture recommendation metadata to evaluate platform-driven vs manual outcomes.
- Shift to aggregated signals: Build both deterministic and aggregated models; use both to maintain continuity as deterministic signals degrade.
- Data ops discipline: Expect tighter audits and governance as enterprises operationalize AI — include lineage, tests, and access controls now.
Wrap-up: practical next steps (developer checklist)
- Provision a cloud warehouse and choose your connector (Airbyte/Fivetran). Start with daily extracts for both Ads and CRM.
- Implement staging models to canonicalize ids and normalize currencies/timezones.
- Ingest campaign budget fields and build the campaign_calendar pattern to compute pacing against total_campaign_budget.
- Create dbt tests and BI cards for Spend vs Budget, Cost per Win, and Remaining Budget.
- Instrument alerts for connector failures, data drift, and pacing thresholds (e.g., >80% of budget with <50% time elapsed).
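The pacing alert in the last item maps directly onto the spend mart; a minimal sketch, assuming a daily alert job that fires whenever this query returns rows:
-- campaigns that have spent >80% of budget with less than half of the flight elapsed
select campaign_id, cost_date, pacing_percent, days_elapsed, total_days
from models.mart_campaign_spend
where cost_date = current_date()
and pacing_percent > 0.8
and days_elapsed < 0.5 * total_days;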
Actionable artifacts you can reuse
- SQL snippets in this article (staging, spend mart, attribution) — drop them into dbt models and adapt column names.
- dbt model pattern: stg_* → int_* (integrated) → mart_* (dashboard-ready).
- Alert patterns: spend_daily_vs_budget, missing_gclid_rate, unmatched_events_ratio.
Closing thoughts
Consolidating CRM events, Google Ads spend, and total campaign budgets into a single ETL pipeline cuts reporting time from hours to minutes and enables confident mid-campaign decisions. In 2026, with platforms offering total campaign budgets and AI-driven spend optimization, having a reliable, auditable pipeline is no longer optional — it’s how marketing teams scale while maintaining trust in their data.
Next step: Clone a starter dbt repo, wire up a single campaign and a CRM table, and validate pacing numbers against the Ads UI for three campaigns. If you want a ready-made starter pack (dbt + Airbyte connectors + dashboard templates), visit Dashbroad or request our ETL starter kit for marketing analytics teams.