breichenbach
HubSpot Product Team

Parameterization and non-indexed URLs for HubSpot Content

If you’re an SEO just looking for a list of parameters that you can work with in Search Console to improve your parameter handling, jump down to the “Documented Parameters” section. If you’re a marketer seeing odd URLs in your Google Search Console account and wondering whether you should address them, start here with this first section.

 

Parameterization: What Is It and What Do These URLs Look Like?

In their most basic form, parameters represent data points that interact with or define page content in a specific way. 

One of the most common examples is a webpage where you sell shirts. Let’s say that you have www.clothingstore.com/shirt. On this page, you have the main listing of a shirt that you like. But what happens if there are size, fabric, and color variations? As you toggle various selectors off and on, your URL may start to pick up a parameter for each selection (purely for illustration, something like www.clothingstore.com/shirt?size=medium&color=blue).

In each of these cases, you’re on the same page, but the parameters define specific characteristics of the page that you’re on. They’re separated from the base URL by a question mark, and then they follow a structure of FIELD=VALUE, separated by “&” if you have more than one. 
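To make the FIELD=VALUE structure concrete, here is a minimal Python sketch that splits a URL into its base and its parameters. The shirt URL and its size, fabric, and color fields are made up for illustration; they are assumptions, not real parameters from any site.

```python
from urllib.parse import urlsplit, parse_qsl


def split_parameters(url):
    """Return (base_url, [(field, value), ...]) for a URL."""
    parts = urlsplit(url)
    # Everything before the question mark is the base URL.
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # Everything after it is a series of FIELD=VALUE pairs joined by "&".
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    return base, pairs


# Hypothetical variant of the shirt page; the fields are illustrative only.
example = "https://www.clothingstore.com/shirt?size=medium&fabric=cotton&color=blue"
base, pairs = split_parameters(example)

print(base)
for field, value in pairs:
    print(f"{field} = {value}")
```

However many parameters you add, the base URL stays the same, which is why all of these variants count as the same page.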

These parameters often work alongside JavaScript and other frontend frameworks to make content on the page dynamic, or to pass information from one page to another. For example, sometimes parameters are used to pass information into a form. Other times, parameters capture information about your campaigns, referral sources, or other data points that you want to track.

Pretty straightforward, right? At this point, you may be thinking that this is a non-issue and could be solved by setting up canonical URLs. You’d be kind of right. 

 

When Can This Be a Problem?

The problem that we run into is that not all parameterization is as straightforward as the size of a shirt or the referral source of a tracking URL you created for a marketing campaign. As bots (crawlers) explore content, they’re going to interact with on-page assets differently than a human user would. This opens a bit of a can of worms. Currently, many of the assets that we load client-side rather than server-side have the potential to generate numerous parameter variants. So do the usual suspects that you would anticipate, such as filters and other dynamic elements that you may have implemented and intentionally accounted for.

As a result, quite a few users may see that they have a significant number of pages flagged in Google Search Console as “Alternate page with proper canonical tag” or “Discovered - currently not indexed.” Within these categories, you may be seeing your page URLs with extensive parameter variants. Some are going to be pretty straightforward, such as www.example.com?hs_lang=de (which would suggest that Googlebot hit a language-specific redirect to the German variant of www.example.com), but others may be a bit more obscure, such as https://www.example.com/articles/post_slug?utm_source=hs_email&utm_medium=email&_hsenc=asldkfjas...

With these more complex URLs, you’re probably wondering what they actually mean: why would Google find an email link on your site (utm_source=hs_email), and what the heck does “_hsenc” mean? (It’s a tracking parameter for link clicks.)
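If you want to see which page a flagged URL actually points back to, one option is to strip the tracking parameters and look at what is left. The sketch below is a rough illustration, not a HubSpot tool: the set of tracking parameters is an assumption built from the examples above, and the _hsenc value is a placeholder rather than a real token.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed set of tracking parameters, taken from the examples in this post.
# Extend it with whatever shows up in your own Search Console reports.
TRACKING_PARAMS = {"utm_source", "utm_medium", "_hsenc"}


def strip_tracking(url, tracking=TRACKING_PARAMS):
    """Return the URL with tracking parameters removed."""
    parts = urlsplit(url)
    kept = [
        (field, value)
        for field, value in parse_qsl(parts.query, keep_blank_values=True)
        if field not in tracking
    ]
    # Rebuild the URL from its parts; the fragment is dropped.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# PLACEHOLDER stands in for the real (much longer) _hsenc value.
flagged = (
    "https://www.example.com/articles/post_slug"
    "?utm_source=hs_email&utm_medium=email&_hsenc=PLACEHOLDER"
)
print(strip_tracking(flagged))
print(strip_tracking("https://www.example.com/?hs_lang=de"))
```

Parameters that are not in the set, such as hs_lang, are left alone, since they may change what the page actually displays.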

This is where clicking, following, and parameterization get dicey and can cause Googlebot and other crawlers to crawl your content inefficiently. Many URLs like these are going to be dynamically generated based upon session and user agent data, and are tied to client-side rendering elements.

 

So what do you need to do with these parameters?

I’m going to give you the most clichéd answer an SEO can give you: it depends.

To make a recommendation that is meaningful, I would have to spend a decent amount of time in the weeds with your site and your marketing goals to get a clearer understanding of what needs to be done, what could be done, and what probably shouldn’t be prioritized. 

So, what I will say is that if you do have someone who is in the weeds with your site and your SEO, they’re probably the most qualified person to make a recommendation about what to do moving forward. For example, if you have a smaller site and see a few parameters causing the most URL variants, you could probably just adjust parameter handling for those few parameters or use your robots.txt file to disallow crawling of URLs that contain those parameters. If you have a bigger site where this is causing an extensive issue, then working with a dedicated SEO or project manager to tackle the issue would be your best bet.
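For the smaller-site case, a quick way to find the parameters causing the most variants is to count them across the URLs flagged in Search Console. The sketch below does that and then drafts robots.txt rules for the top offenders. Treat it as a starting point under a few assumptions: the URL list is invented, the function names are hypothetical, and the rules rely on the `*` wildcard that Googlebot supports in robots.txt. Review any generated rules with whoever owns your SEO before publishing them.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def count_parameters(urls):
    """Count how many URLs each parameter name appears in."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # Use a set so a parameter repeated within one URL counts once.
        fields = {field for field, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(fields)
    return counts


def robots_rules(fields):
    """Draft robots.txt rules that disallow crawling URLs with these parameters."""
    lines = ["User-agent: *"]
    for field in fields:
        # One rule for the parameter in first position, one for later positions.
        lines.append(f"Disallow: /*?{field}=")
        lines.append(f"Disallow: /*&{field}=")
    return "\n".join(lines)


# Invented sample of flagged URLs; replace with your own export.
urls = [
    "https://www.example.com/articles/a?utm_source=hs_email&_hsenc=x",
    "https://www.example.com/articles/b?_hsenc=y",
    "https://www.example.com/articles/c?utm_source=hs_email&_hsenc=z",
    "https://www.example.com/?hs_lang=de",
]

counts = count_parameters(urls)
top = [field for field, _ in counts.most_common(2)]
rules = robots_rules(top)
print(counts)
print(rules)
```

Limiting the rules to the top few parameters keeps the robots.txt file short and avoids blocking parameters, such as hs_lang, that may lead to content you do want crawled.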

Moving forward

At this time, no immediate changes are planned on HubSpot’s end. However, we are investigating this issue and looking into specific examples, and we will evaluate possible scalable solutions moving forward.

Documented Parameters

If you need a list of parameters that we have identified, please refer to the spreadsheet linked here. Please make a copy for yourself to work with. Note that this list is not exhaustive, and there may be functionality on your site that is not accounted for here. 

https://docs.google.com/spreadsheets/d/1x_8Y_u5NfDr2LhC_7s-7PvBVcAuoEujpwjfgOycmFf4/edit?usp=sharing

TiphaineCuisset
Community Manager

Thank you for sharing @breichenbach !
