Advanced UTM Attribution in GA4 Using BigQuery
Getting Last-Click Attribution from Google Analytics 4 to Align with Universal Analytics Reporting in BigQuery
Shreya Banker
Data Scientist
Data Analyst enthusiast. More than 7 years of exposure in Data Analysis and Software programming. I am a highly motivated, versatile IT professional with experience in Data Analysis, Visualization and Database Management. I look for the hardest problem to solve and where I can learn and develop the most. I love a challenge and never run from a difficult task. I'm determined to succeed, and I look forward to what life has to offer.
Our client recently encountered significant challenges using the data from Google Analytics 4 (GA4) due to differences in how it attributes user activity to marketing efforts. Initially, they attempted to extract UTM parameters from GA4 data exported to Google BigQuery. However, unlike Universal Analytics (UA), where traffic source data is readily available for each session, GA4’s event-based tracking model led to gaps in capturing UTM parameters, creating inconsistencies in their marketing attribution reports.
Challenges with UTM Attribution in GA4 vs. UA
In Universal Analytics, each session was tied to a single traffic source, and any change in the source triggered a new session. This ensured that every session had reliable UTM information for medium, source, and campaign parameters. However, GA4’s event-based model allows sessions to persist even when the traffic source changes, and the UTM parameters are recorded only at the event level. As a result, if the first event in a session doesn’t capture UTM data, sessions may lack essential source, medium, and campaign details. This discrepancy impacts year-over-year comparisons and the overall interpretation of traffic attribution in BigQuery.
Technical Solution for Missing UTM Data in GA4
To address this issue, Napkyn developed a robust solution leveraging SQL queries in BigQuery to capture and reconstruct missing UTM data:
Event Data Extraction and Aggregation
An initial SQL query extracts key fields like session_id, user_pseudo_id, and session_start, along with available UTM parameters (utm_source, utm_medium, utm_campaign) from the GA4 event stream.
This query isolates sessions where UTM parameters are null, indicating missing traffic source data.
Applying Last Non-Direct Attribution in BigQuery
A second query was implemented using window functions to backfill missing UTM values. By applying the Last Non-Direct Attribution model, we identified the last valid traffic source from prior sessions of the same user and assigned those UTM values to sessions with missing traffic source data.
The solution utilized window functions like LAST_VALUE()
, PARTITION BY
, and ORDER BY
to track traffic source changes across sessions.
Data Processing and Transformation
The query logic was designed to handle various edge cases such as incomplete session data, multi-session users, and long lookback windows (e.g., 30-day or 90-day). We optimized the query’s performance in BigQuery to balance data accuracy with computational efficiency.
Validation and Improved Attribution Reporting
We conducted a thorough validation by comparing the outputs from BigQuery against the client’s GA4 reports. The results showed full alignment, with corrected UTM values now present in sessions where they were initially missing, significantly improving the fidelity of their traffic attribution reports.
Outcomes and Improved Attribution Reporting
The implemented solution delivered a complete and accurate view of the client's traffic sources, correcting previously missing UTM data and enabling reliable marketing attribution analysis in GA4. By leveraging the Last Non-Direct Click attribution model in BigQuery, the client now has a scalable, automated solution that aligns with their existing strategies and facilitates more informed decision-making.
How to Resolve UTM Attribution Discrepancies in GA4 with BigQuery
Extract Event Data: Use SQL to extract sessions and UTM parameters from GA4’s event data stream.
Apply Last Non-Direct Attribution: Use window functions in SQL to backfill missing UTM data based on previous valid traffic sources.
Validate Results: Compare reconstructed data in BigQuery with GA4 reports to ensure alignment and accuracy.
Conclusion
This case illustrates the power of customizing your approach using the raw data available in BigQuery to resolve discrepancies between Universal Analytics and GA4’s data models. The adoption of event-based tracking in GA4 requires tailored solutions to ensure continuity in traffic source attribution, where your business needs it. By implementing custom SQL logic to specify an attribution model, it’s possible to bridge data gaps and drive more accurate marketing insights.
About Napkyn
Napkyn specializes in empowering businesses with tailored analytics solutions, offering expertise in GA4 and BigQuery to transform data into actionable insights. Their custom BigQuery reports are designed to meet unique business needs, providing deep insights into user behavior, campaign performance, and e-commerce trends. By combining advanced analytics with robust reporting, Napkyn helps organizations make data-driven decisions to optimize marketing strategies and drive growth.
Looking to harness the full potential of GA4 and BigQuery for your business? Let Napkyn guide you with custom reporting solutions that deliver real results. Get in touch with us today to start transforming your data into your most valuable business asset!
More Insights
Google Tag Manager Tag Diagnostics: Troubleshooting and Optimization Guide
Ricardo Cristofolini
Senior Implementation Specialist, Data Solutions
Dec 11, 2024
Read More
November 2024 GA4 & GMP Updates
Napkyn
Dec 4, 2024
Read More
Server-Side Google Tag Manager: Multi-Region vs. Single-Region Deployment
Ketul Dave
Implementation Specialist
Nov 27, 2024
Read More
More Insights
Sign Up For Our Newsletter
Napkyn Inc.
204-78 George Street, Ottawa, Ontario, K1N 5W1, Canada
Napkyn US
6 East 32nd Street, 9th Floor, New York, NY 10016, USA
212-247-0800 | info@napkyn.com
Advanced UTM Attribution in GA4 Using BigQuery
Getting Last-Click Attribution from Google Analytics 4 to Align with Universal Analytics Reporting in BigQuery
Shreya Banker
Data Scientist
Data Analyst enthusiast. More than 7 years of exposure in Data Analysis and Software programming. I am a highly motivated, versatile IT professional with experience in Data Analysis, Visualization and Database Management. I look for the hardest problem to solve and where I can learn and develop the most. I love a challenge and never run from a difficult task. I'm determined to succeed, and I look forward to what life has to offer.
Our client recently encountered significant challenges using the data from Google Analytics 4 (GA4) due to differences in how it attributes user activity to marketing efforts. Initially, they attempted to extract UTM parameters from GA4 data exported to Google BigQuery. However, unlike Universal Analytics (UA), where traffic source data is readily available for each session, GA4’s event-based tracking model led to gaps in capturing UTM parameters, creating inconsistencies in their marketing attribution reports.
Challenges with UTM Attribution in GA4 vs. UA
In Universal Analytics, each session was tied to a single traffic source, and any change in the source triggered a new session. This ensured that every session had reliable UTM information for medium, source, and campaign parameters. However, GA4’s event-based model allows sessions to persist even when the traffic source changes, and the UTM parameters are recorded only at the event level. As a result, if the first event in a session doesn’t capture UTM data, sessions may lack essential source, medium, and campaign details. This discrepancy impacts year-over-year comparisons and the overall interpretation of traffic attribution in BigQuery.
Technical Solution for Missing UTM Data in GA4
To address this issue, Napkyn developed a robust solution leveraging SQL queries in BigQuery to capture and reconstruct missing UTM data:
Event Data Extraction and Aggregation
An initial SQL query extracts key fields like session_id, user_pseudo_id, and session_start, along with available UTM parameters (utm_source, utm_medium, utm_campaign) from the GA4 event stream.
This query isolates sessions where UTM parameters are null, indicating missing traffic source data.
Applying Last Non-Direct Attribution in BigQuery
A second query was implemented using window functions to backfill missing UTM values. By applying the Last Non-Direct Attribution model, we identified the last valid traffic source from prior sessions of the same user and assigned those UTM values to sessions with missing traffic source data.
The solution utilized window functions like LAST_VALUE()
, PARTITION BY
, and ORDER BY
to track traffic source changes across sessions.
Data Processing and Transformation
The query logic was designed to handle various edge cases such as incomplete session data, multi-session users, and long lookback windows (e.g., 30-day or 90-day). We optimized the query’s performance in BigQuery to balance data accuracy with computational efficiency.
Validation and Improved Attribution Reporting
We conducted a thorough validation by comparing the outputs from BigQuery against the client’s GA4 reports. The results showed full alignment, with corrected UTM values now present in sessions where they were initially missing, significantly improving the fidelity of their traffic attribution reports.
Outcomes and Improved Attribution Reporting
The implemented solution delivered a complete and accurate view of the client's traffic sources, correcting previously missing UTM data and enabling reliable marketing attribution analysis in GA4. By leveraging the Last Non-Direct Click attribution model in BigQuery, the client now has a scalable, automated solution that aligns with their existing strategies and facilitates more informed decision-making.
How to Resolve UTM Attribution Discrepancies in GA4 with BigQuery
Extract Event Data: Use SQL to extract sessions and UTM parameters from GA4’s event data stream.
Apply Last Non-Direct Attribution: Use window functions in SQL to backfill missing UTM data based on previous valid traffic sources.
Validate Results: Compare reconstructed data in BigQuery with GA4 reports to ensure alignment and accuracy.
Conclusion
This case illustrates the power of customizing your approach using the raw data available in BigQuery to resolve discrepancies between Universal Analytics and GA4’s data models. The adoption of event-based tracking in GA4 requires tailored solutions to ensure continuity in traffic source attribution, where your business needs it. By implementing custom SQL logic to specify an attribution model, it’s possible to bridge data gaps and drive more accurate marketing insights.
About Napkyn
Napkyn specializes in empowering businesses with tailored analytics solutions, offering expertise in GA4 and BigQuery to transform data into actionable insights. Their custom BigQuery reports are designed to meet unique business needs, providing deep insights into user behavior, campaign performance, and e-commerce trends. By combining advanced analytics with robust reporting, Napkyn helps organizations make data-driven decisions to optimize marketing strategies and drive growth.
Looking to harness the full potential of GA4 and BigQuery for your business? Let Napkyn guide you with custom reporting solutions that deliver real results. Get in touch with us today to start transforming your data into your most valuable business asset!
More Insights
Google Tag Manager Tag Diagnostics: Troubleshooting and Optimization Guide
Ricardo Cristofolini
Senior Implementation Specialist, Data Solutions
Dec 11, 2024
Read More
November 2024 GA4 & GMP Updates
Napkyn
Dec 4, 2024
Read More
Server-Side Google Tag Manager: Multi-Region vs. Single-Region Deployment
Ketul Dave
Implementation Specialist
Nov 27, 2024
Read More
More Insights