APIs & Integrations

terrance
Participant

HubSpot Links Crawler 2.0 user agent hitting protected pages

We are getting hit by a crawler with the user agent "HubSpot Links Crawler 2.0 http://www.hubspot.com/" on a gated page. This is the first page a user sees after signing in to the application, so any unauthenticated request to it triggers a 401. The crawler is most likely hitting that page because we make an identify() call to associate an unknown user with an email address. The HubSpot requests are flooding our error tracking software. How can we turn this off or prevent it from happening?
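One possible stopgap for the error-tracking noise, independent of whatever is fixed on the HubSpot side, is to drop these specific events before they are reported. The following is only a minimal sketch: `should_report` and `HUBSPOT_CRAWLER_MARKER` are hypothetical names, not part of any HubSpot or error-tracker API, and how the check is wired into a given tracker is left as an assumption.

```python
from typing import Optional

# Hypothetical constant: substring of the user agent quoted in the post above.
HUBSPOT_CRAWLER_MARKER = "HubSpot Links Crawler"


def should_report(status_code: int, user_agent: Optional[str]) -> bool:
    """Return False only for 401 responses caused by the HubSpot links crawler.

    All other errors, and 401s from any other client, are still reported.
    """
    is_crawler = HUBSPOT_CRAWLER_MARKER.lower() in (user_agent or "").lower()
    return not (status_code == 401 and is_crawler)
```

The check is deliberately narrow (status 401 plus this one user agent) so that genuine authentication failures from real users still show up in the tracker.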

6 replies

terrance
Participant

@Derek_Gervais I PM'd you. That data is kind of sensitive.

Derek_Gervais
Former HubSpot employee

Hi @terrance,

This might be appropriate for Support, but I'm happy to help out here as well. Can you give me the page URL for the restricted page that the crawler is hitting? I'd like to dig in with the team to see if we're perhaps having trouble accessing the robots.txt.

terrance
Participant

Derek, has anyone ever looked into this?

Derek_Gervais
Former HubSpot employee

Hi @terrance,

The HubSpot links crawlers respect robots.txt files; are you able to implement a robots.txt file for the page(s) you're referring to?

terrance
Participant

@Derek_Gervais Yes, it has been in place since day 1, because the page path leads into an authenticated account. Here is what it looks like:

# See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
#
# To ban all spiders from the entire site uncomment the next two lines:
User-agent: *
Disallow: /
User-agent: AdsBot-Google
Disallow: /
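
As a rough local sanity check, the rules above can be fed to Python's standard `urllib.robotparser` to see whether a parser that follows the usual matching rules would consider the crawler's user agent blocked. This is only a sketch: it says nothing about how HubSpot's crawler actually fetches or interprets the file, and `https://app.example.com/dashboard` is a hypothetical stand-in for the gated page.

```python
from urllib.robotparser import RobotFileParser

# The directives from the robots.txt shown above, copied verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
User-agent: AdsBot-Google
Disallow: /
"""

# User agent string as quoted in the original post.
CRAWLER_UA = "HubSpot Links Crawler 2.0 http://www.hubspot.com/"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URL standing in for the gated page.
allowed = parser.can_fetch(CRAWLER_UA, "https://app.example.com/dashboard")
print("crawler allowed:", allowed)
```

If this reports the crawler as disallowed, the rules themselves are probably fine, and the question becomes whether the crawler can reach the robots.txt file on that host at all.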

terrance
Participant

@Derek_Gervais Would this be better directed to Support than asked on the forums?
