APIs & Integrations

terrance
Participant

HubSpot Links Crawler 2.0 user agent hitting protected pages

We are getting hit by a crawler with the user agent "HubSpot Links Crawler 2.0 http://www.hubspot.com/" on a gated page. The page is the first one a user sees after signing in to the application, so any unauthenticated request to it returns a 401. The crawler is most likely visiting that page because we make an identify() call there to associate an unknown user with an email address. The HubSpot requests are flooding our error tracking software. How can we turn this off or prevent it from happening?
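Until the crawler itself can be stopped, one possible stopgap is to keep its 401s out of the error tracker. The following is only a minimal sketch under assumptions: `should_report` is a hypothetical helper, not part of HubSpot or any error tracking product, and it would have to be wired into whatever pre-send hook the tracker in use provides.

```python
from typing import Optional

# Substring of the user agent quoted in the post above.
CRAWLER_UA_MARKER = "hubspot links crawler"


def should_report(status_code: int, user_agent: Optional[str]) -> bool:
    """Hypothetical filter: drop 401s caused by the HubSpot links crawler.

    Every other error, including non-401 errors triggered by the crawler,
    is still reported.
    """
    is_crawler = CRAWLER_UA_MARKER in (user_agent or "").lower()
    return not (status_code == 401 and is_crawler)


if __name__ == "__main__":
    crawler = "HubSpot Links Crawler 2.0 http://www.hubspot.com/"
    print(should_report(401, crawler))
    print(should_report(401, "Mozilla/5.0"))
```

Matching on a substring rather than the full string keeps the filter working if the version number in the user agent changes, at the cost of also silencing any other client that sends the same marker.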

terrance
Participant

@Derek_Gervais I PM'd you. That data is kind of sensitive.

Derek_Gervais
HubSpot Alumni

Hi @terrance,

This might be appropriate for Support, but I'm happy to help out here as well. Can you give me the page URL for the restricted page that the crawler is hitting? I'd like to dig in with the team to see if we're perhaps having trouble accessing the robots.txt.

terrance
Participant

Derek, has anyone ever looked into this?

Derek_Gervais
HubSpot Alumni

Hi @terrance,

The HubSpot links crawlers respect robots.txt files; are you able to implement a robots.txt file for the page(s) you're referring to?

terrance
Participant

@Derek_Gervais Yes, a robots.txt has been in place since day 1, because the page path accesses an authenticated account. Here is what it looks like:

# See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
#
# To ban all spiders from the entire site uncomment the next two lines:
User-agent: *
Disallow: /
User-agent: AdsBot-Google
Disallow: /
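
One way to sanity-check how a standards-following parser reads these rules is Python's standard `urllib.robotparser`. This is only a minimal sketch: the path `/account` is a hypothetical stand-in for the gated page, and the result says nothing about how HubSpot's crawler actually fetches or interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The active (non-comment) rules from the robots.txt quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
User-agent: AdsBot-Google
Disallow: /
"""

CRAWLER_UA = "HubSpot Links Crawler 2.0 http://www.hubspot.com/"
GATED_PATH = "/account"  # hypothetical path, not taken from the thread


def crawler_may_fetch(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if the given rules allow user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print(crawler_may_fetch(ROBOTS_TXT, CRAWLER_UA, GATED_PATH))
```

If the check reports that the crawler may not fetch the path, the rules themselves are not the problem, which would point toward the crawler failing to retrieve or honor the file.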
terrance
Participant

@Derek_Gervais Is this something better asked of Support rather than on the forums?
