Every company has some documents that are gated and need to be excluded from search engine crawlers. The current workaround is convoluted: adding every single URL to the robots.txt file.
It would be great to have a checkbox on each document to specify whether we want to disallow crawling for that document.
You're right, this is more difficult than it really should be. In practice it's fairly simple to set up – you just need to set up two directories.
For files:
The trick is to create a specific folder within the file system in which no contents should be indexed, then disallow that folder in the robots.txt file, using the wildcard (asterisk "*") so the rule applies to all crawlers. Any file that lives in that folder will be hidden from search.
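A minimal sketch of what that robots.txt entry could look like, assuming the folder is named `/no-index/` (the folder name is illustrative – use whatever you created in your file manager):

```
# Applies to all crawlers (the "*" wildcard)
User-agent: *
# Block everything under the dedicated folder
Disallow: /no-index/
```

Any path beginning with `/no-index/` is then off-limits to well-behaved crawlers.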
For pages, same idea:
Pick a directory name you would like to exclude from search via the robots.txt file (it can be named whatever you like on your HubSpot domain). To hide a page, custom-set the URL of any landing page so it sits in that directory; it will then not show up on search engines.
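As a sketch, assuming you picked `/private/` as the directory name, the robots.txt rule and a matching landing-page URL would look like this (both names are hypothetical):

```
User-agent: *
# Any landing page whose URL you set under /private/ is excluded,
# e.g. https://www.example.com/private/internal-offer
Disallow: /private/
```

Note that robots.txt only asks crawlers not to index the pages; it does not prevent someone with the direct link from viewing them.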
If this is too complex, the HubSpot support folks should be able to guide you through it.
Let's say I'm not using HS for my website but I am using HS to house my files. Then my files live under cdn.hubspot.com. In this case, how can I set up a robots.txt file to de-index specific files? Or can't I? @aarongarcia1