Menu

Advanced UTM Attribution in GA4 Using BigQuery

Getting Last-Click Attribution from Google Analytics 4 to Align with Universal Analytics Reporting in BigQuery

Shreya Banker

Data Scientist

Data Analyst enthusiast. More than 7 years of exposure in Data Analysis and Software programming. I am a highly motivated, versatile IT professional with experience in Data Analysis, Visualization and Database Management. I look for the hardest problem to solve and where I can learn and develop the most. I love a challenge and never run from a difficult task. I'm determined to succeed, and I look forward to what life has to offer.

Our client recently encountered significant challenges using the data from Google Analytics 4 (GA4) due to differences in how it attributes user activity to marketing efforts. Initially, they attempted to extract UTM parameters from GA4 data exported to Google BigQuery. However, unlike Universal Analytics (UA), where traffic source data is readily available for each session, GA4’s event-based tracking model led to gaps in capturing UTM parameters, creating inconsistencies in their marketing attribution reports.

Challenges with UTM Attribution in GA4 vs. UA

In Universal Analytics, each session was tied to a single traffic source, and any change in the source triggered a new session. This ensured that every session had reliable UTM information for medium, source, and campaign parameters. However, GA4’s event-based model allows sessions to persist even when the traffic source changes, and the UTM parameters are recorded only at the event level. As a result, if the first event in a session doesn’t capture UTM data, sessions may lack essential source, medium, and campaign details. This discrepancy impacts year-over-year comparisons and the overall interpretation of traffic attribution in BigQuery.

Technical Solution for Missing UTM Data in GA4

To address this issue, Napkyn developed a robust solution leveraging SQL queries in BigQuery to capture and reconstruct missing UTM data:

Event Data Extraction and Aggregation

An initial SQL query extracts key fields like session_id, user_pseudo_id, and session_start, along with available UTM parameters (utm_source, utm_medium, utm_campaign) from the GA4 event stream.

This query isolates sessions where UTM parameters are null, indicating missing traffic source data.

Applying Last Non-Direct Attribution in BigQuery

A second query was implemented using window functions to backfill missing UTM values. By applying the Last Non-Direct Attribution model, we identified the last valid traffic source from prior sessions of the same user and assigned those UTM values to sessions with missing traffic source data.

The solution utilized window functions like LAST_VALUE(), PARTITION BY, and ORDER BY to track traffic source changes across sessions.

Data Processing and Transformation

The query logic was designed to handle various edge cases such as incomplete session data, multi-session users, and long lookback windows (e.g., 30-day or 90-day). We optimized the query’s performance in BigQuery to balance data accuracy with computational efficiency.

Validation and Improved Attribution Reporting

We conducted a thorough validation by comparing the outputs from BigQuery against the client’s GA4 reports. The results showed full alignment, with corrected UTM values now present in sessions where they were initially missing, significantly improving the fidelity of their traffic attribution reports.

Outcomes and Improved Attribution Reporting

The implemented solution delivered a complete and accurate view of the client's traffic sources, correcting previously missing UTM data and enabling reliable marketing attribution analysis in GA4. By leveraging the Last Non-Direct Click attribution model in BigQuery, the client now has a scalable, automated solution that aligns with their existing strategies and facilitates more informed decision-making.

How to Resolve UTM Attribution Discrepancies in GA4 with BigQuery

  1. Extract Event Data: Use SQL to extract sessions and UTM parameters from GA4’s event data stream.

  2. Apply Last Non-Direct Attribution: Use window functions in SQL to backfill missing UTM data based on previous valid traffic sources.

  3. Validate Results: Compare reconstructed data in BigQuery with GA4 reports to ensure alignment and accuracy.

Conclusion

This case illustrates the power of customizing your approach using the raw data available in BigQuery to resolve discrepancies between Universal Analytics and GA4’s data models. The adoption of event-based tracking in GA4 requires tailored solutions to ensure continuity in traffic source attribution, where your business needs it. By implementing custom SQL logic to specify an attribution model, it’s possible to bridge data gaps and drive more accurate marketing insights.

About Napkyn

Napkyn specializes in empowering businesses with tailored analytics solutions, offering expertise in GA4 and BigQuery to transform data into actionable insights. Their custom BigQuery reports are designed to meet unique business needs, providing deep insights into user behavior, campaign performance, and e-commerce trends. By combining advanced analytics with robust reporting, Napkyn helps organizations make data-driven decisions to optimize marketing strategies and drive growth.


Looking to harness the full potential of GA4 and BigQuery for your business? Let Napkyn guide you with custom reporting solutions that deliver real results. Get in touch with us today to start transforming your data into your most valuable business asset!

Advanced UTM Attribution in GA4 Using BigQuery

Getting Last-Click Attribution from Google Analytics 4 to Align with Universal Analytics Reporting in BigQuery

Shreya Banker

Data Scientist

Data Analyst enthusiast. More than 7 years of exposure in Data Analysis and Software programming. I am a highly motivated, versatile IT professional with experience in Data Analysis, Visualization and Database Management. I look for the hardest problem to solve and where I can learn and develop the most. I love a challenge and never run from a difficult task. I'm determined to succeed, and I look forward to what life has to offer.

Our client recently encountered significant challenges using the data from Google Analytics 4 (GA4) due to differences in how it attributes user activity to marketing efforts. Initially, they attempted to extract UTM parameters from GA4 data exported to Google BigQuery. However, unlike Universal Analytics (UA), where traffic source data is readily available for each session, GA4’s event-based tracking model led to gaps in capturing UTM parameters, creating inconsistencies in their marketing attribution reports.

Challenges with UTM Attribution in GA4 vs. UA

In Universal Analytics, each session was tied to a single traffic source, and any change in the source triggered a new session. This ensured that every session had reliable UTM information for medium, source, and campaign parameters. However, GA4’s event-based model allows sessions to persist even when the traffic source changes, and the UTM parameters are recorded only at the event level. As a result, if the first event in a session doesn’t capture UTM data, sessions may lack essential source, medium, and campaign details. This discrepancy impacts year-over-year comparisons and the overall interpretation of traffic attribution in BigQuery.

Technical Solution for Missing UTM Data in GA4

To address this issue, Napkyn developed a robust solution leveraging SQL queries in BigQuery to capture and reconstruct missing UTM data:

Event Data Extraction and Aggregation

An initial SQL query extracts key fields like session_id, user_pseudo_id, and session_start, along with available UTM parameters (utm_source, utm_medium, utm_campaign) from the GA4 event stream.

This query isolates sessions where UTM parameters are null, indicating missing traffic source data.

Applying Last Non-Direct Attribution in BigQuery

A second query was implemented using window functions to backfill missing UTM values. By applying the Last Non-Direct Attribution model, we identified the last valid traffic source from prior sessions of the same user and assigned those UTM values to sessions with missing traffic source data.

The solution utilized window functions like LAST_VALUE(), PARTITION BY, and ORDER BY to track traffic source changes across sessions.

Data Processing and Transformation

The query logic was designed to handle various edge cases such as incomplete session data, multi-session users, and long lookback windows (e.g., 30-day or 90-day). We optimized the query’s performance in BigQuery to balance data accuracy with computational efficiency.

Validation and Improved Attribution Reporting

We conducted a thorough validation by comparing the outputs from BigQuery against the client’s GA4 reports. The results showed full alignment, with corrected UTM values now present in sessions where they were initially missing, significantly improving the fidelity of their traffic attribution reports.

Outcomes and Improved Attribution Reporting

The implemented solution delivered a complete and accurate view of the client's traffic sources, correcting previously missing UTM data and enabling reliable marketing attribution analysis in GA4. By leveraging the Last Non-Direct Click attribution model in BigQuery, the client now has a scalable, automated solution that aligns with their existing strategies and facilitates more informed decision-making.

How to Resolve UTM Attribution Discrepancies in GA4 with BigQuery

  1. Extract Event Data: Use SQL to extract sessions and UTM parameters from GA4’s event data stream.

  2. Apply Last Non-Direct Attribution: Use window functions in SQL to backfill missing UTM data based on previous valid traffic sources.

  3. Validate Results: Compare reconstructed data in BigQuery with GA4 reports to ensure alignment and accuracy.

Conclusion

This case illustrates the power of customizing your approach using the raw data available in BigQuery to resolve discrepancies between Universal Analytics and GA4’s data models. The adoption of event-based tracking in GA4 requires tailored solutions to ensure continuity in traffic source attribution, where your business needs it. By implementing custom SQL logic to specify an attribution model, it’s possible to bridge data gaps and drive more accurate marketing insights.

About Napkyn

Napkyn specializes in empowering businesses with tailored analytics solutions, offering expertise in GA4 and BigQuery to transform data into actionable insights. Their custom BigQuery reports are designed to meet unique business needs, providing deep insights into user behavior, campaign performance, and e-commerce trends. By combining advanced analytics with robust reporting, Napkyn helps organizations make data-driven decisions to optimize marketing strategies and drive growth.


Looking to harness the full potential of GA4 and BigQuery for your business? Let Napkyn guide you with custom reporting solutions that deliver real results. Get in touch with us today to start transforming your data into your most valuable business asset!

Sign Up For Our Newsletter

Napkyn Inc.
204-78 George Street, Ottawa, Ontario, K1N 5W1, Canada

Napkyn US
6 East 32nd Street, 9th Floor, New York, NY 10016, USA

212-247-0800 | info@napkyn.com