Blog, Website & Page Publishing

hidanya
Participant

Google Search Console: Sitemap Could Not Be Read


Hello,

 

I am trying to submit my HubSpot blog and landing page sitemap https://info.magaya.com/sitemap.xml to Google Search Console, and I'm getting the following errors:

- Couldn't Fetch

- General HTTP Error

 

Our main site is magaya.com (WordPress), and the blog and landing pages are on a subdomain hosted on HubSpot.

 

We have seen a slight decline in search volume since moving the blog to HubSpot, so I'm trying to get to the bottom of things.

 

Please advise.

 

Thanks!

9 Replies
BNaskar
Member

Google Search Console: Sitemap Could Not Be Read


Same problem here with https://flizmovies.fun
When I submit https://flizmovies.fun/sitemap.xml,
Google says the sitemap could not be read. What do I do now?
Help, please.

WJunior
Member

Google Search Console: Sitemap Could Not Be Read


A few months ago I also had the same issue with my website.

I was constantly getting the following text:

“We were unable to read your Sitemap. It may contain an entry we are unable to recognize. Please validate your Sitemap before resubmitting.”

I tried a few things (suggested by someone on WordPress), like:

- Updated SEOPress to 3.3.5

- Improved the image and video descriptions and details.

- Cleared the cache.

and it worked.
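
If you want to validate a sitemap yourself before resubmitting, a rough sketch like this (Python standard library only; the URL is a placeholder, not a real sitemap) checks that the file is well-formed XML with the expected root element:

    import urllib.request
    import xml.etree.ElementTree as ET

    SITEMAP_URL = "https://www.example.com/sitemap.xml"  # placeholder - use your own

    with urllib.request.urlopen(SITEMAP_URL, timeout=10) as resp:
        data = resp.read()

    root = ET.fromstring(data)  # raises ParseError if the XML is malformed
    ns = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
    if root.tag not in (ns + "urlset", ns + "sitemapindex"):
        print("Unexpected root element:", root.tag)
    else:
        print(len(list(root.iter(ns + "loc"))), "URL entries found")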

CCamacho0
Member

Google Search Console: Sitemap Could Not Be Read


@WJunior wrote:

A few months ago I also had the same issue with my website.

I was constantly getting the following text:

“We were unable to read your Sitemap. It may contain an entry we are unable to recognize. Please validate your Sitemap before resubmitting.”

I tried a few things (suggested by someone on WordPress), like:

- Updated SEOPress to 3.3.5

- Improved the image and video descriptions and details.

- Cleared the cache.

and it worked.


Can you help me too?

hidanya
Participant

Google Search Console: Sitemap Could Not Be Read


Thanks so much for the detailed response!

 

To answer your questions,

Question 1 - It's the entire XML sitemap file that Google cannot fetch.

Question 2 - We 301'd post by post.

 

Thanks for the point about the grey-on-grey. I'll look into that for sure!

JonPayne
Solution
Authority | Diamond Partner

Google Search Console: Sitemap Could Not Be Read


Hmmm - very odd. Nice job on the 301s 🙂

I've looked at that sitemap from all angles and it's 200 OK all the way, so I'm leaning towards something glitchy happening.
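
If you want to double-check this yourself, here's a minimal sketch (assuming the third-party requests library is installed) that fetches the sitemap and reports what an ordinary HTTP client sees:

    import requests  # pip install requests

    # To approximate Googlebot, you could also pass
    # headers={"User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"}
    resp = requests.get("https://info.magaya.com/sitemap.xml", timeout=10)

    print(resp.status_code)                  # expect 200
    print(resp.headers.get("Content-Type"))  # expect an XML content type
    print(resp.text[:200])                   # the start of the sitemap itself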

 

Before we get to fixing it for Google: if you haven't already, set up the main domain and the info subdomain in Bing Webmaster Tools (https://www.bing.com/toolbox/webmaster). Submit the sitemaps there and see if Bing also has problems fetching the info. subdomain one.

 

You still need to fix the problem for Google, but at least if Bing can fetch it, you know you did a good job and you're not going mad.

 

To begin the fix for Google - this is what I'd do.

 

1. Signpost the new sitemap on the main domain.

Add this line:

Sitemap: https://info.magaya.com/sitemap.xml  

to the robots.txt file of your main domain: https://www.magaya.com/robots.txt

 

It's OK to reference two sitemaps in one robots.txt, and this signpost means that if the Google glitch fixes itself while you're not looking, Google will just crawl the new sitemap in the usual way without you having to resubmit it manually.
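
For illustration only, the main domain's robots.txt might end up looking something like this. The User-agent and Disallow lines are made-up placeholders (I don't know what's in your actual file), and I'm assuming the main site's own sitemap lives at /sitemap.xml - the only line you'd actually be adding is the last one:

    User-agent: *
    Disallow: /wp-admin/

    Sitemap: https://www.magaya.com/sitemap.xml
    Sitemap: https://info.magaya.com/sitemap.xml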

 

2. Remove odd formatting from the subdomain robots.txt + signpost again

Save a copy of https://info.magaya.com/robots.txt somewhere safe, then click to edit it. Follow these instructions if you've not done that in HubSpot before:

https://knowledge.hubspot.com/cos-general/customize-your-robots-txt-file 

 

Delete these lines:

            Disallow: /sample-*
            Disallow: /blog/sample-*

The tab indentation in front of them might be making Googlebot do strange things.

 

Add this line: Sitemap: https://info.magaya.com/sitemap.xml  
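
After both edits, the subdomain's robots.txt might look roughly like this (a sketch only - keep whatever User-agent rules your file already has):

    # existing crawler rules stay as they are
    User-agent: *
    Disallow:

    # the new signpost line
    Sitemap: https://info.magaya.com/sitemap.xml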

 

Check GSC in a couple of days to make sure it's not suddenly crawling loads of sample blog / sample landing / sample thank-you pages. Or better, delete any sample blogs etc., then you don't need those two Disallow lines anyway.
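
If you'd rather check from the sitemap side first, a short sketch like this (Python standard library only) lists any entries that look like sample pages:

    import urllib.request
    import xml.etree.ElementTree as ET

    with urllib.request.urlopen("https://info.magaya.com/sitemap.xml", timeout=10) as resp:
        root = ET.fromstring(resp.read())

    ns = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
    sample_urls = [el.text for el in root.iter(ns + "loc") if "sample" in (el.text or "")]
    print(len(sample_urls), "sample URLs in the sitemap")
    for url in sample_urls:
        print(url)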

 

Give it a couple of days and resubmit that sitemap if Google hasn't already crawled it.

 

If this doesn't work, hit me up at jon@noisylittlemonkey.com and I'll get someone more geeky to look at it. IT SHOULD BE EASY! 🙂

hidanya
Participant

Google Search Console: Sitemap Could Not Be Read


Hi Jon,

I'm not sure what happened, but right after I imported the account into Bing Webmaster Tools, I went back to look at GSC, and the sitemap had been read correctly. I didn't change anything on my end other than giving Bing a try. I do have a support ticket in with HubSpot, so maybe they changed something... but they haven't replied yet. In any case, it looks like I'm in the clear. Hoping this helps bring the traffic up at least a bit. I will have my dev look at that footer issue you mentioned.

Thanks again! I truly appreciate it.

Danya

JonPayne
Authority | Diamond Partner

Google Search Console: Sitemap Could Not Be Read


HA! Love this! It's tempting to think that Bing WMT and GSC are somehow in cahoots... A terrifying prospect indeed!

JonPayne
Authority | Diamond Partner

Google Search Console: Sitemap Could Not Be Read


Hi!

 

It's worrying when you move and lose traffic! I feel your pain 🙂 Let's see if we can get to the bottom of this.

 

Question 1 - your "Couldn't Fetch" and "General HTTP Error" in Google Search Console - is this for the XML sitemap file itself, or just for certain URLs within it?

 

Question 2 - When you moved the blog to HubSpot, did you redirect the old WordPress URLs? If so, how?

 

Often during a migration for a client, I'll see a drop in organic search traffic that usually lasts at most a few weeks, so I'm not surprised there's a drop-off - it's just good to make sure you've done everything you can to mitigate it. I'd make sure you're following Google's migration guidelines: https://developers.google.com/search/docs/advanced/crawling/site-move-with-url-changes

 

Some other essentials that I do when things look like they might be going a bit wobbly:

 

Publish a load of content on the new HubSpot blog. Easier said than done, I know, but it will send healthy signals to Google that the website is behaving as before and still sharing the same quality advice. If you used to publish a blog post once a week before the migration, do three a week for a month. If you used to publish one a month, publish one a week.

 

Go back through your old content and manually change any internal links from the old URLs to the new URLs. This can be a royal pain in the **bleep**, but it's well worth doing.
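
If you have an HTML export of the pages to search through, a rough sketch like this (the folder and domain here are placeholders, not your real setup) can at least find the links that still point at the old URLs:

    import re
    from pathlib import Path

    # placeholders - adjust to wherever the old posts lived and where your export sits
    OLD_DOMAIN = "www.magaya.com/blog"
    EXPORT_DIR = Path("exported_pages")

    pattern = re.compile(r'href="([^"]*' + re.escape(OLD_DOMAIN) + r'[^"]*)"')

    for page in EXPORT_DIR.glob("**/*.html"):
        for link in pattern.findall(page.read_text(encoding="utf-8")):
            print(f"{page}: {link}")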

 

If you have hefty authoritative links pointing at your old WordPress URLs, I'd ask the owner/admin of the linking website to change the link on their site to the new URL on HubSpot.

 

Let me know the answers to Qs above and I'll be back later today to help more 🙂

JonPayne
Authority | Diamond Partner

Google Search Console: Sitemap Could Not Be Read


And another thing, @hidanya - this almost certainly isn't the problem, but it's worth addressing as soon as your team has the resource. When JavaScript is turned off in the browser, the footer of the blog page template (it seems to be this bit: Magaya_May2020_Theme/templates/partials/footer.html) shows the titles of the footer menu items in grey on grey, which to Google may look like hidden text - and Google REALLY doesn't like hidden text, because awful SEOs use it to try and game the system. Particularly in footers! An example of what I'm seeing is attached. To test it yourself in Chrome, install https://chrome.google.com/webstore/detail/toggle-javascript/cidlcjdalomndpeagkjpnefhljffbnlo and browse the site with JavaScript turned off (your main resources page is a bit of a problem here too).

[Attachment: hidden.png]
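
A quick way to approximate the no-JavaScript view without the extension: requests doesn't execute JavaScript, so the raw HTML it returns is roughly what a first-pass crawler sees. A sketch, assuming the template renders a standard <footer> element (that's a guess on my part):

    import requests  # pip install requests

    html = requests.get("https://info.magaya.com", timeout=10).text

    # crude slice of the footer markup; assumes a <footer> element exists
    start = html.find("<footer")
    end = html.find("</footer>")
    if start != -1 and end != -1:
        print(html[start:end + len("</footer>")])
    else:
        print("No <footer> element found - inspect the raw HTML manually")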