HubSpot original source is sometimes incorrectly attributed to SSO provider
Our web application uses SSO (with choices including Google sign-in and GitHub sign-in) for new user sign-up, and uses the HubSpot Forms API to create a new contact in HubSpot when a user signs up for our app. Sometimes (but not always), the original source on the HubSpot contact is incorrectly attributed to the SSO provider, e.g., a referral from accounts.google.com.
More details on the standard flow we expect:
1. A user visits www.oursite.com, which is running the HubSpot tracking code.
2. They click a link to sign up, which happens on the subdomain app.oursite.com. This page also runs the HubSpot tracking code, and per the docs, we do not need to enable cross-domain tracking to track users across multiple subdomains. I have confirmed that the hubspotutk cookie value is the same on both the www and app subdomains.
3. This page on app.oursite.com immediately redirects the user to an Auth0-controlled flow in which they can select Google, GitHub, or several other methods for authenticating.
4. After authenticating, the SSO provider (e.g., accounts.google.com) will redirect back to app.oursite.com where the user can finish the sign-up flow. At the very end of the flow, when an account is created in our database, we also submit a hidden form via the HubSpot API in order to create a contact.
Looking at the contacts actually created through this flow, many of them have their original source set correctly, to values such as "Direct traffic," "Organic Social," and "Organic search."
Let's pick as one example a contact whose original source is "Organic Social" and the drill-down 1 is "Reddit." Based on the docs, this value either came from utm_medium or from the referrer. We log UTM parameters, and our logging shows that no UTM parameters were set for this user. Therefore, I must assume that HubSpot correctly identified that the referrer was reddit.com. This agrees with our own logging, which shows that the path this user took was reddit.com -> www -> app -> sign up -> back to app. This seems correct to me.
We also have several contacts created through this flow whose original source is "Direct traffic" with a drill-down indicating that the referrer was our own site (in some cases a www page, in some cases a page on the app subdomain), and contacts whose source is "Referrals" with a drill-down indicating that the referrer is a third party site. This also seems correct to me, and shows that HubSpot is sometimes using the referrer value from steps 1-3 when calculating original source.
However, there are additionally several contacts created through this flow whose original source is "Referrals" with a drill-down of "accounts.google.com." There is a part in our flow where the user (if they SSO with Google) will have a referrer of accounts.google.com; this is in step 4 of the flow I described above. What I do not understand is why sometimes HubSpot ignores the Google SSO referrer and categorizes contacts as "Direct traffic," and other times uses this referrer to categorize contacts as "Referrals."
How does HubSpot decide which referrer to use? There are 3 referrers in this flow (the referrer for when the user first visits www in step 1, the referrer for when the user first visits our app subdomain in step 2, and the referrer when the user finishes authenticating in step 4), and HubSpot seems to use each of these 3 somewhat randomly/indiscriminately. Why is this inconsistent, and is there a way to control which of these referrers we use?
Alternatively, is there a way to potentially exclude accounts.google.com as a valid referrer? (Exclude traffic from IP addresses or referrers comes close to doing this, but won't work for us because we do still want form submissions originating from this domain to be recorded, we just want them recorded with a different referrer value.)
I am seeing a lot of missing Original Sources that are attributed to Zapier (Zapier transfers data to HubSpot AFTER the user lands on the website) instead of the actual and true Original Source. Could this be the reason? Instead of seeing Direct Traffic, Paid Search, etc., I am seeing INTEGRATION (Zapier). Is this because the cookies are not captured and, as a result, HubSpot treats the subsequent contact creation by Zapier as the Original Source?
For these users with incorrect source information, can you view their activity logs and see what their first page views are? Based on the flow you describe, they should have some page views prior to the pages seen after authenticating. Is that true?
If /page-a is pre-auth and /page-b is post-auth, I would expect to see page views for /page-a on all contacts that have /page-b. Do these incorrect contacts only have /page-b? HubSpot should be using the earliest session tied to the user's cookie. It will also go back and update the original source if an older cookie gets tied to the contact. Is it possible the user is visiting /page-b on a different device? Is there some other way they might be directed to the auth flow?
Thanks for suggesting that I view the activity logs. I should have thought of this earlier 🙂
The contacts who are incorrectly attributed to the SSO provider have no activity before the form submission. The contacts who are correctly attributed do have activity (page views) before the form submission. At the time the form is submitted, the user's current referrer is the SSO provider, so if that's the first event being tracked, it makes sense that HubSpot is attributing the visit to the SSO provider.
This means we now have a slightly different (and hopefully simpler) problem to solve: why are the page views in the first few steps of the signup flow (before the form submission) only sometimes getting tracked?
Our intent was to track all of the pages in this flow. To give a little more context on how we've set this up (in case we're doing something incorrectly), we've added HubSpot tracking code via Google Tag Manager, specifically, this tag from the community template gallery. The tag is triggered on all page views, plus (because we're a single page app) all history changes. We are not explicitly calling any of the API methods from the tracking API. In particular, we never call identify (the Forms API submission implicitly identifies the user), and we do not call setPath nor trackPageView (the fact that we're seeing page views correctly being tracked for the majority of our contacts suggests we don't have to call these methods explicitly).
Is there anything that sounds off about this setup?
I also have a couple questions (whose answers I cannot find in HubSpot documentation) that would help me debug on our end:
Have you seen issues like this before from adblocker/content blocker extensions? When I enable uBlock Origin locally (as one example), the tag that's supposed to inject the HubSpot tracking code does not run, and even if it did run (I tested this part by disabling my content blocker while the page loaded, then enabled it again), the requests to track.hubspot.com are blocked. What's the expected behavior around attribution if a user with a content blocker submits a HubSpot form via the Forms API? NOTE: While content blockers may be part of the problem, they are definitely not the only culprit. I have been able to reproduce this issue (no page views or activities logged before the form submission, such that the contact is attributed to the SSO provider) even when all content blockers are disabled.
Should we be calling setPath and trackPageView manually?
From what I can tell (by inspecting network requests), I believe that the data about a user's page views is not sent to HubSpot until the form submission occurs. Is this correct? If so, where is the activity stored before it's sent to HubSpot's servers? Is there somewhere (eg, a cookie or localstorage) that I can inspect to better understand why activity is being dropped when I reproduce this issue locally?
Lots to unpack.
Should we be calling setPath and trackPageView manually?
Depends on your setup. This is only needed if you have something like a SPA. If users are moving between URLs in a normal website, page views are tracked automatically and you can see those page views in the contact's activity log. They are tracked automatically in this case because a trackPageView is sent when the tracking code loads. For a SPA, you would need to call setPath (so HubSpot knows the page the user is on) and then trackPageView to tell HubSpot the user is on a new page.
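As a rough illustration of the SPA case, here is a minimal sketch of calling setPath followed by trackPageView on a route change. The function name trackSpaPageView and the router wiring in the comments are hypothetical; only the two queued commands come from the tracking code API, and the queue is passed in as an argument so the snippet can also be run outside a browser.

```javascript
// Sketch: report a SPA route change to the HubSpot tracking code.
// `hsq` is the tracking queue (normally window._hsq); it is a parameter here
// so the function can also be exercised with a plain array.
function trackSpaPageView(hsq, path) {
  // setPath updates the path that will be used for the next page view.
  hsq.push(['setPath', path]);
  // trackPageView records a page view for the path set above.
  hsq.push(['trackPageView']);
  return hsq;
}

// Hypothetical browser wiring, called from your router's navigation hook:
// window._hsq = window._hsq || [];
// trackSpaPageView(window._hsq, window.location.pathname + window.location.search);
```

If a tag manager is already firing the tracking tag on history changes, calling this as well could double-count page views, so it is worth checking the contact's activity log after trying it.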
When is page view data sent to HubSpot?
If you inspect your network requests from the browser you should see some calls to track.hubspot.com. I think most of these (but possibly not all of them) are the requests from the _hsq.push() method, which is part of the tracking code. You can test this by opening your browser console and pasting in _hsq.push(['trackPageView']); and then looking at the new network request that shows up in the logs there.
That being said, technically those tracking items are tied to the browser cookie, not a contact specifically. Once HubSpot can tie a cookie to a contact, the tracking data will appear on the contact. For example, a contact visits the site and browses over several sessions. All of that is tracked against that cookie. When the user submits a form days later from that same browser, ALL the tracking data is added to the contact, and that contact is associated with that browser cookie so future traffic is logged correctly. If the contact later uses a different browser/device (new cookie), none of that traffic will get tracked against the contact until they fill out a form (with the same email) or visit a page from an email link (which ties the two together).
There should be a js file loaded in your browser network log that looks like https://js.hs-analytics.net/analytics/1670419500000/[your-portal-id].js. If you look at it, it will show what is tracked by default when the tracking code loads, and there you can see that the page view is tracked.
What gets blocked from an Ad Blocker?
I wouldn't think an ad blocker would interfere with this. I had AdBlockPlus and it doesn't prevent any of the tracking I'm talking about in my tests. I got 0 items blocked on my page with the tracking code installed.
Why are the first pages in the signup flow only sometimes being tracked?
Can you describe your signup flow? I'll describe a common one that might break things and sort of sounds like yours. User visits website from their desktop. User registers (not using a HubSpot form so nothing is created in HubSpot). User receives an email to confirm email or registration. User opens email on phone and clicks link to visit the auth page. User completes registration and some data is sent to HubSpot. In this case, depending on a number of factors, the initial desktop visit isn't tied to the contact that was created by HubSpot until the desktop cookie is tied to that contact via another form submission or an email click.
If that, or something similar, aligns with your setup, it feels like you might want to use the identify API call or the form submission API (including the cookie) to create the contact during an early registration step, so you can tie that early traffic to that email in case the user changes devices for the later steps. Again, assuming there is something like an email confirmation in between. Maybe those third-party auth providers send them an email. I don't know, but that is an example scenario, changing devices, that might screw things up.
If it all happens in a single session on a single device, I wouldn't expect to see what you are seeing, but I'd need to inspect the whole flow to really know for sure.
Thanks for the thorough response. I'm just going to respond to a couple of the points above:
The js file you suggested that I look at is minified, so there is no easy way for me to read it and understand what it is tracking by default. I think I got enough out of your above answer despite this.
The issue we're seeing is definitely not due to users switching between multiple devices (eg, a phone and a computer). In all cases I've investigated, the signup flow takes place entirely in a single browser on a single computer, and the user never leaves the flow to check their email inbox. That's a good guess, but not what's affecting us.
I am now 100% convinced that adblockers/content blockers are responsible for a good chunk of our contacts affected by this bug (though still not all of them). A few notes:
Every adblock extension is different. Thanks for testing one, but that doesn't mean none of them can cause issues. If you try testing again with uBlock Origin (another extremely popular blocker), what do you see?
Some adblockers only block ads. Others are general-purpose content blockers that also block tracking pixels, tracking javascript, and other ways users can be tracked across the web. It's specifically adblockers with a content blocker filter list that I think are at fault. (Again, uBlock Origin with its default settings is a content blocker.)
hs-analytics.net is on one of the filter lists used by uBlock Origin (see here). As one easy demonstration of the fact that this interferes with Hubspot tracking, if I load the main Hubspot marketing page (https://www.hubspot.com/), I see a javascript file is fetched from https://js.hs-analytics.net/analytics/1670904300000/53.js. If I enable uBlock Origin and load the same page, the javascript file is now blocked.
Separately from our Hubspot tracking, we have other types of logging and tracking that also get blocked when a user has a content blocker installed. By looking at our other logs for the same set of users, I am able to determine with high confidence whether a given user of our site has a content blocker enabled.
For users who have a content blocker enabled (based on my correlating with other logs), when I view their form submission in Hubspot, I see a yellow warning in the submission details: "No cookie was found for this submission. Learn more." Additionally, I see no page views before the form was submitted, and the original source in Hubspot is incorrectly attributed to the SSO provider.
Putting all this together, my strong hypothesis is that, for users with a content blocker installed, the Hubspot tracking javascript is blocked, therefore no hubspotutk cookie is set. However, the content blocker does not block the Hubspot form submission. Therefore, a contact is created but with none of the page views, history, or attribution that should be tied to it. In this case, the referrer at the time of form submission is used, instead of the referrer at the time the user first visited our site (and ran the Hubspot tracking script). Due to our signup flow redirecting to an SSO provider and then back to our site before the Hubspot form is submitted, this referrer is our SSO provider.
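To make the hypothesis concrete, here is a minimal sketch of how a Forms API submission body might be assembled with and without the hubspotutk cookie. The helper names readHubspotUtk and buildSubmission are hypothetical, and the body shape (fields plus a context object carrying hutk and pageUri) reflects my understanding of the v3 submission format rather than our exact production code.

```javascript
// Pull the hubspotutk value out of a cookie string such as document.cookie.
// Returns null when the cookie is absent (e.g. the tracking script was blocked).
function readHubspotUtk(cookieString) {
  for (const part of cookieString.split(';')) {
    const [name, ...rest] = part.trim().split('=');
    if (name === 'hubspotutk') return rest.join('=') || null;
  }
  return null;
}

// Build the JSON body for a form submission.
// `fields` is a plain object of form field names to values.
function buildSubmission(fields, cookieString, pageUri) {
  const context = { pageUri };
  const hutk = readHubspotUtk(cookieString);
  // Only include hutk when a cookie exists. Without it the submission cannot
  // be tied to any earlier page views.
  if (hutk) context.hutk = hutk;
  return {
    fields: Object.entries(fields).map(([name, value]) => ({ name, value })),
    context,
  };
}
```

For a visitor with a content blocker, document.cookie contains no hubspotutk, so the body goes out with only pageUri in its context, which lines up with the "No cookie was found for this submission" warning.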
Again, this doesn't explain 100% of the bad data we're seeing, but it explains most, and I think would be a good problem to tackle first. Once we solve this, we can see what's remaining and tackle that next.
So I will ask again: What's the expected behavior around attribution if a user with a content blocker submits a HubSpot form via the Forms API?
And is there a way to override the referrer that's sent to Hubspot, or to exclude certain referrers (without excluding the traffic referred by them), so that the SSO provider is not incorrectly used for attribution in these cases?
I think if you add the domain of your SSO provider to your site domains, it will stop counting them as referrers. This is found under Settings -> Tracking Code -> Advanced Tracking. Without the tracking code though, I think all of this will come through as Direct Traffic. I'm unsure though, but it's worth quickly checking.
Thanks, adding the additional site domain works for now. (As you expected, these contacts are now classified as Direct Traffic instead of a referral from our SSO provider.)
It's not ideal that we're losing attribution information from previous page views -- both 100% of the time when uBlock Origin is installed, and occasionally even when it's not -- but this is great incremental progress.
Well, you could possibly get around that, but before that I'd probably try to determine whether it is worth it or not. I mean, blockers will block, so it's sort of an arms race there, and one could argue that if people want their privacy or want to block cookies, that's their prerogative and should be respected.
One thing you could test is to see if passing in the page URI with utm parameters in the form submission changes the source of the contact. If it does, you could store the real source when the visitor initially lands on your site. It would need to be stored in a session cookie (client side) or a session variable (server side). Later when the user submits the form, if they don't have a HubSpot cookie you could append utm parameters on the page URI with what you know about their initial page view.
All untested, but I feel like it could work. Again, if that is worth it is up to you. If it's a small-ish subset of users, and the above method works, you could also tag them with some other campaign/utm/etc so you could at least segment them all to some group and they don't skew your real direct traffic. Your trends would all still be good, just exact numbers off, and trends are generally more important anyway.
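The idea above of storing the real source at landing and appending it to the page URI at submission time could be sketched roughly like this. The function name pageUriWithFirstTouch and the shape of the firstTouch object are hypothetical, and whether HubSpot honors UTM parameters on the submitted page URI is exactly the thing that would need testing.

```javascript
// Append stored first-touch UTM values to the page URI sent with a form
// submission, but only when the visitor has no HubSpot cookie.
// `firstTouch` is whatever was saved at landing time,
// e.g. { utm_source: 'reddit', utm_medium: 'social' }.
function pageUriWithFirstTouch(pageUri, firstTouch, hasHubspotCookie) {
  // With a cookie, HubSpot already has the visitor's history; leave the URI alone.
  if (hasHubspotCookie) return pageUri;
  const url = new URL(pageUri);
  for (const [key, value] of Object.entries(firstTouch)) {
    // Never overwrite a parameter that is genuinely present on the URL.
    if (value && !url.searchParams.has(key)) url.searchParams.set(key, value);
  }
  return url.toString();
}
```

Skipping parameters that are already on the URL keeps real campaign tags authoritative and uses the stored values only as a fallback.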
We already manually pass along UTM parameters by re-injecting them into the page URL. As a result, Hubspot correctly attributes the original source of any visitor with UTM tags, even if the tracking code itself fails. This SSO provider referrer bug is only affecting our non-UTM tagged traffic. However, due to this bug, we are losing attribution for these visitors about the actual/true referrer that took a visitor to our app.
FWIW, I just set utm_referrer to https://paypal.com and on my record it came through like so:
So if you can capture the referral information on your own, you could pass it via a UTM and get that actual referral traffic data. This data should be available in Javascript via document.referrer if you are storing this data client side or in most server side web languages.
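A minimal client-side sketch of that approach, assuming the first external referrer is saved once per session and later attached as utm_referrer. The function names, the first_referrer storage key, and the ignoredHosts list are all made up for illustration. In a browser you would pass window.sessionStorage and document.referrer.

```javascript
// Remember the first referrer seen in this session, skipping hosts that
// should never count (your own site and the SSO provider).
// `storage` is anything with getItem/setItem, e.g. window.sessionStorage.
function rememberFirstReferrer(storage, referrer, ignoredHosts) {
  const saved = storage.getItem('first_referrer');
  if (saved !== null) return saved; // already captured earlier in the session
  if (!referrer) return null; // direct visit, nothing to store
  let host;
  try {
    host = new URL(referrer).hostname;
  } catch {
    return null; // malformed referrer string
  }
  if (ignoredHosts.includes(host)) return null;
  storage.setItem('first_referrer', referrer);
  return referrer;
}

// Attach the stored referrer to the page URI used for the form submission.
function withUtmReferrer(pageUri, referrer) {
  if (!referrer) return pageUri;
  const url = new URL(pageUri);
  url.searchParams.set('utm_referrer', referrer);
  return url.toString();
}
```

Listing accounts.google.com (and any other SSO hosts) in ignoredHosts matters for visitors who arrive with no referrer at all. Otherwise the redirect back from the SSO provider would be stored as their first referrer.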
Hi, @jhurwitz 👋 Thank you for including all those details. Hey, @alyssamwilie @Anton @LeeBartelme, do you have any experience or thoughts you can share about the challenges @jhurwitz is facing? Thank you! Jaycee