GA4 to BigQuery: Building a Measurement Pipeline That Scales
GA4’s native reporting interface is fine for checking yesterday’s traffic. It is not fine for answering the questions that actually matter to your business — questions like “what’s the true cost per acquisition when I account for assisted conversions across all channels?” or “how does user lifetime value differ between organic search and paid social cohorts over 90 days?”
If you’re running campaigns across multiple platforms and need measurement you can trust, you need your event-level data in a warehouse where you control the queries, the attribution logic, and the retention window. That warehouse is BigQuery, and the pipeline from GA4 to BigQuery is one of the highest-leverage analytics investments you can make.
Why GA4 Native Reporting Falls Short
GA4 is a capable analytics platform. But it has structural limitations that make it inadequate as your single source of truth for campaign measurement:
Sampling kicks in fast. Any exploration or custom report that exceeds the sampling threshold (10 million events per query on a standard property) will be sampled. Google tells you it’s sampled — there’s a little shield icon — but most people either miss it or don’t understand the implications. Sampled data is estimated data, and estimated data makes for unreliable optimisation decisions.
Data retention caps at 14 months. GA4 retains user-level data for a maximum of 14 months on standard properties, and the default setting is just 2 months. If you want to analyse year-over-year trends, build LTV models, or look at seasonal patterns, you need your data somewhere that doesn’t delete it.
Attribution is a black box. GA4’s data-driven attribution model is genuinely useful, but you can’t inspect it, modify it, or compare it against alternatives. When a client asks “what happens to our numbers if we switch to last-click?”, you can’t answer that question inside GA4.
Cross-platform joins are impossible. GA4 knows about website behaviour. It doesn’t know about your CRM data, your offline conversions, your product margin data, or your cost data from non-Google platforms. Real measurement requires joining these datasets.
The bottom line: GA4 is a great data collection tool. It’s a mediocre analysis tool. BigQuery is where you turn GA4’s raw events into actual business intelligence.
What the BigQuery Export Gives You
When you enable the BigQuery export in GA4, you get a daily (or streaming) export of every event, with every parameter, for every user. No sampling, no aggregation, no retention limits.
The export creates a table per day in the format events_YYYYMMDD, and each row represents a single event with nested fields for event parameters, user properties, device info, geo data, and traffic source.
Here’s a simplified view of the schema:
```sql
SELECT
  event_date,
  event_name,
  event_timestamp,
  user_pseudo_id,
  -- Extract specific event parameters
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') AS engagement_time,
  -- Traffic source
  traffic_source.source,
  traffic_source.medium,
  traffic_source.name AS campaign,
  -- Device
  device.category AS device_category,
  -- Geo
  geo.country,
  geo.city
FROM `project.dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260301' AND '20260324'
```
The nested UNNEST syntax looks intimidating the first time, but it’s the key to the entire pipeline. Every event parameter, every user property, and every item in an e-commerce transaction is stored as a nested array. You extract what you need with UNNEST and ignore the rest.
Daily vs Streaming Export
GA4 offers two export modes:
- Daily export (free): One batch per day, available by the following afternoon. Good enough for most reporting use cases.
- Streaming export (BigQuery charges apply): Events appear in BigQuery within minutes. Required if you’re building real-time dashboards or triggering actions based on events.
For most agencies, the daily export is the right choice. Streaming adds cost and complexity, and very few marketing decisions need to be made within minutes of an event occurring.
Building the Pipeline: Step by Step
Step 1: Enable the BigQuery Link
In GA4, go to Admin > Product Links > BigQuery Links. Select your Google Cloud project, choose daily or streaming, and pick your events. Enable “Include advertising identifiers” if you need them (and your consent implementation supports it).
Pro tip: Use a dedicated GCP project for analytics data, separate from your application infrastructure. This makes billing clearer and access management simpler — especially if you’re sharing query access with clients or analysts.
Step 2: Design Your Intermediate Layer
Raw GA4 export tables are verbose and expensive to query repeatedly. The standard approach is to build an intermediate layer of materialised views or scheduled queries that transform the raw data into analysis-ready tables.
At minimum, you want:
A sessions table:
```sql
-- Materialised view: sessions
SELECT
  user_pseudo_id,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
  MIN(event_timestamp) AS session_start,
  MAX(event_timestamp) AS session_end,
  COUNTIF(event_name = 'page_view') AS pageviews,
  -- Note: traffic_source.* holds first-user attribution, not per-session source
  MAX(traffic_source.source) AS source,
  MAX(traffic_source.medium) AS medium,
  MAX(traffic_source.name) AS campaign,
  MAX(device.category) AS device_category,
  MAX(geo.country) AS country
FROM `project.dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260301' AND '20260324'  -- always bound the scan
GROUP BY user_pseudo_id, session_id
```
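Once that rollup exists, channel-level reporting becomes a cheap aggregate. A sketch, assuming the table above is materialised as `project.dataset.sessions` (the path is a placeholder):

```sql
-- Sessions and pageviews per channel, last 28 days
SELECT
  source,
  medium,
  COUNT(*) AS sessions,
  SUM(pageviews) AS pageviews,
  ROUND(SUM(pageviews) / COUNT(*), 2) AS pages_per_session
FROM `project.dataset.sessions`
-- session_start is event_timestamp in microseconds since epoch
WHERE session_start >= UNIX_MICROS(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 28 DAY))
GROUP BY source, medium
ORDER BY sessions DESC
```

This is the shape of query an analyst runs dozens of times a day — against the rollup it scans megabytes, against the raw export it would scan the full event history.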
A conversions table:
```sql
-- Materialised view: conversions
SELECT
  event_date,
  user_pseudo_id,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
  event_name AS conversion_event,
  (SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value') AS conversion_value,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'currency') AS currency,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'transaction_id') AS transaction_id
FROM `project.dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260301' AND '20260324'  -- always bound the scan
  AND event_name IN ('purchase', 'generate_lead', 'sign_up', 'begin_checkout')
```
A user properties table that captures the latest user-scoped attributes for cohort analysis.
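A hedged sketch of that rollup, taking each user's most recent value for every user-scoped property (dataset path and date range are placeholders):

```sql
-- Latest value of each user property per user
SELECT
  user_pseudo_id,
  up.key AS property_key,
  up.value.string_value AS property_value
FROM `project.dataset.events_*`,
  UNNEST(user_properties) AS up
WHERE _TABLE_SUFFIX BETWEEN '20260301' AND '20260324'
-- Keep only the most recently set value per (user, property)
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY user_pseudo_id, up.key
  ORDER BY up.value.set_timestamp_micros DESC
) = 1
```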
These intermediate tables become the foundation for all downstream reporting. Query them instead of the raw tables — your BigQuery costs will drop dramatically and your queries will run in seconds instead of minutes.
Step 3: Build Your Dashboards
With clean intermediate tables, connect your dashboarding tool of choice:
- Looker Studio (free, integrates natively with BigQuery)
- Looker (enterprise, if you need governed metrics and explore capabilities)
- Custom dashboards via BigQuery API (if you’re building client-facing reporting into your own product)
The key principle: dashboards should query your intermediate layer, never the raw export tables. This keeps costs predictable and load times fast.
Step 4: Schedule and Monitor
Use BigQuery scheduled queries to refresh your intermediate tables daily, timed after the GA4 daily export completes (typically by 14:00 UTC for the previous day’s data).
Set up alerts for:
- Export failures: The GA4 export occasionally fails silently. Monitor for missing daily tables.
- Volume anomalies: If today’s event count is 50% lower than the 7-day average, something is probably broken — a tag was removed, the site went down, or the export stopped.
- Schema changes: GA4 can add new default event parameters in platform updates, which changes your table schema. Your queries should handle this gracefully.
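The first two checks can run as a single scheduled query against the dataset's `__TABLES__` metadata, which is nearly free to scan. A sketch, with the dataset path as a placeholder:

```sql
-- Flag a missing or suspiciously small daily export table
WITH daily AS (
  SELECT
    PARSE_DATE('%Y%m%d', REGEXP_EXTRACT(table_id, r'^events_(\d{8})$')) AS day,
    row_count
  FROM `project.dataset.__TABLES__`
  WHERE REGEXP_CONTAINS(table_id, r'^events_\d{8}$')
)
SELECT
  y.day,
  y.row_count,
  avg7.avg_rows,
  y.row_count < 0.5 * avg7.avg_rows AS volume_alert
FROM daily AS y
CROSS JOIN (
  SELECT AVG(row_count) AS avg_rows
  FROM daily
  WHERE day BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 8 DAY)
                AND DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
) AS avg7
WHERE y.day = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
-- Zero rows returned means yesterday's table is missing entirely
```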
Common Pitfalls (and How to Avoid Them)
Schema Changes That Break Queries
GA4’s export schema evolves. Google adds new fields, deprecates old ones, and occasionally changes nested structures. If your queries reference specific field paths, they can break silently — returning NULL instead of erroring.
Fix: Write defensive queries that check for field existence. Use IFNULL and COALESCE liberally. Pin your queries to specific field paths rather than using SELECT *.
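In practice that means explicit field paths wrapped in null-safe defaults, so a renamed or missing parameter degrades visibly instead of silently. A sketch:

```sql
-- Defensive parameter extraction: explicit paths, explicit fallbacks
SELECT
  event_name,
  IFNULL(
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location'),
    '(not set)'
  ) AS page_location,
  COALESCE(
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec'),
    0
  ) AS engagement_time_msec
FROM `project.dataset.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
```

A sentinel like `'(not set)'` is easy to count in a data-quality dashboard; a silent NULL just disappears from your aggregates.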
Event Naming Inconsistency
This is the single biggest data quality issue in BigQuery pipelines. It’s not a BigQuery problem — it’s a GTM problem that surfaces in BigQuery.
When different developers push events with different naming conventions (add_to_cart vs addToCart vs add-to-cart), you end up with what looks like three different events in BigQuery. Your conversion queries miss two-thirds of the data.
Fix: Enforce a naming convention at the GTM layer (see our GTM audit guide for the full checklist). Use GA4’s recommended event names wherever they apply. If custom events are necessary, document them and validate them with automated checks.
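One way to surface drift that has already landed in BigQuery is to normalise event names and look for names that collapse onto each other. A sketch against the raw export:

```sql
-- Detect naming variants: events that become identical after normalisation
SELECT
  REGEXP_REPLACE(LOWER(event_name), r'[-_\s]', '') AS normalised_name,
  ARRAY_AGG(DISTINCT event_name) AS variants,
  COUNT(DISTINCT event_name) AS variant_count
FROM `project.dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20260301' AND '20260324'
GROUP BY normalised_name
HAVING variant_count > 1
```

`add_to_cart`, `addToCart`, and `add-to-cart` all normalise to the same key, so each returned row is a candidate naming inconsistency to chase back to its GTM container.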
Property Sprawl Across Clients
Agencies managing 20+ clients often have GA4 properties scattered across different Google accounts, different GCP projects, and different BigQuery datasets. Some export to BigQuery, some don’t. Some use streaming, some use daily. There’s no single pane of glass.
Fix: Standardise your setup. Every client property should export to BigQuery using the same project structure: {agency-project}/{client-dataset}/events_*. Use consistent naming. Build a registry of all properties, their export status, and their BigQuery locations.
Cost Management
BigQuery charges for data scanned. A single poorly written query against a year of raw GA4 data for a high-traffic property can cost R500+ in a single execution. Multiply that by analysts running ad-hoc queries, and costs spiral.
Fix: Always use table suffixes to limit date ranges (_TABLE_SUFFIX BETWEEN). Build and query intermediate tables. Set up BigQuery cost controls — custom quotas per user, per project, or per day. Monitor with BigQuery’s INFORMATION_SCHEMA.JOBS view.
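A hedged sketch of the monitoring query, grouping the last week's billed bytes by user (the region qualifier depends on where your dataset lives):

```sql
-- Bytes billed per user over the last 7 days, most expensive first
SELECT
  user_email,
  COUNT(*) AS queries,
  ROUND(SUM(total_bytes_billed) / POW(2, 40), 3) AS tib_billed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
GROUP BY user_email
ORDER BY tib_billed DESC
```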
Pro tip: Partition your intermediate tables by date and cluster by the fields you filter most often (usually user_pseudo_id and event_name). This alone can reduce query costs by 80-90%.
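Applied to the conversions rollup from earlier, the partitioned-and-clustered version looks like this (names are placeholders; note that the raw export stores event_date as a string, so it needs parsing before it can be a partition column):

```sql
-- Rebuild the conversions rollup as a partitioned, clustered table
CREATE OR REPLACE TABLE `project.dataset.conversions`
PARTITION BY event_date
CLUSTER BY user_pseudo_id, conversion_event AS
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS event_date,
  user_pseudo_id,
  event_name AS conversion_event,
  (SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value') AS conversion_value
FROM `project.dataset.events_*`
WHERE event_name IN ('purchase', 'generate_lead', 'sign_up', 'begin_checkout')
```

With this layout, a query filtered to one date and one event name reads only the matching partition and cluster blocks instead of the whole table.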
The Multi-Client Challenge
Everything above gets exponentially harder when you’re managing measurement pipelines for multiple clients. Each client has:
- Their own GA4 properties (sometimes multiple)
- Their own GCP projects (or shared projects with complex permissions)
- Their own event naming conventions (or lack thereof)
- Their own dashboarding requirements
- Their own data retention and privacy requirements
The manual approach — logging into each client’s GA4 property, checking their BigQuery export, running their scheduled queries, monitoring their dashboards — doesn’t scale past about 5 clients before it becomes a full-time job.
This is where automation becomes essential, not optional.
How Alethia Handles Multi-Property GA4
Alethia’s GA4 integration connects directly to the GA4 API via OAuth with comprehensive tool coverage. Instead of logging into each property manually, you can:
- Query any connected property from a single interface — “Show me last week’s conversion rate for Client X’s main property”
- Run cross-property comparisons — “Compare organic traffic trends across all my client properties”
- Detect anomalies — “Flag any property where sessions dropped more than 20% week-over-week”
- Audit configurations — “Check which properties have BigQuery export enabled and which don’t”
The point isn’t to replace BigQuery — it’s to give you a fast way to interrogate your GA4 data without context-switching between a dozen browser tabs. When you need the depth that BigQuery provides, the data is there. When you need a quick answer, the agent gets it in seconds.
Building vs Buying
You have three options for your measurement pipeline:
Build it yourself. If you have SQL skills and a GCP account, you can set up the BigQuery export, write your intermediate queries, connect Looker Studio, and manage it manually. This works for a single property. It becomes painful at scale.
Use a managed tool. Platforms like Supermetrics, Funnel.io, or Fivetran can handle the data pipeline and provide pre-built transformations. They add cost ($200-500/month per client) but save engineering time.
Automate with AI. Use an agent that understands your GA4 data, can write and optimise BigQuery queries, monitor pipeline health, and flag issues before they become client-facing problems. This is the approach we’re building at Alethia — not replacing the pipeline, but making it manageable at agency scale.
The right answer depends on your scale. For a single brand, manual setup is fine. For an agency managing 10+ clients, some form of automation is not a luxury — it’s a survival requirement.
Getting Started
If you’re starting from zero, here’s the minimum viable pipeline:
- Enable BigQuery daily export on your GA4 property (it’s free).
- Write three intermediate queries — sessions, conversions, and a user-level rollup.
- Schedule them to run daily after the export lands.
- Connect Looker Studio to your intermediate tables.
- Set up one alert — a simple check that yesterday’s export table exists and has a reasonable row count.
That gets you 80% of the value. Everything else — cross-platform joins, LTV models, custom attribution — is refinement you can add incrementally.
The most important step is the first one. Every day you’re not exporting to BigQuery is a day of event-level data you can never get back.