Creating an XML sitemap is one of the most fundamental tasks of any SEO strategy.
It is also one of those aspects that often gets overlooked, particularly when it comes to using sitemaps to solve technical SEO issues.
In this guide, you’ll learn everything you need to know about sitemaps, from the foundational basics every SEO should know to the process of creating and submitting sitemaps to Google, as well as using sitemaps to further your site’s search visibility.
A sitemap is nothing more than a simple XML file containing a list of your website’s most important content (including video and images).
Notice that I didn’t say ALL the content. That’s because, contrary to common belief, a sitemap does not need to list every single page on the site. It should, however, include every piece of content that you want to show up in the search results.
This is an important distinction to remember. Following this rule will help you create a sitemap that works for your SEO benefit, rather than against it.
The simplest answer is that Google uses it to discover and access the most important pages on the site.
Search engines typically find and crawl pages on a site by following internal links.
Google’s Gary Illyes has confirmed that to be true somewhat recently in a tweet.
However, not all pages will be interlinked well enough for Googlebot to find them.
Many landing pages, for instance, may exist as independent entities, not interlinked with any other content.
Other pages might be buried too deep within the site’s architecture for the bot to reach them within the available crawl budget.
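To see why link-based discovery misses some pages, here is a minimal sketch of a crawler following internal links. The site structure is hypothetical, and the `max_depth` limit is only a rough stand-in for crawl budget, not how Googlebot actually allocates it.

```python
from collections import deque

def discover(links, start, max_depth):
    """Return the set of pages reachable from `start` within `max_depth` clicks."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # stop following links beyond the depth limit
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen

# Hypothetical site: "/landing" has no inbound links,
# and "/deep" sits four clicks away from the home page.
links = {
    "/": ["/blog", "/products"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/blog/post-1/comments"],
    "/blog/post-1/comments": ["/deep"],
    "/landing": ["/products"],
}

all_pages = set(links) | {t for targets in links.values() for t in targets}
found = discover(links, "/", max_depth=3)
missed = sorted(all_pages - found)
print(missed)  # ['/deep', '/landing']
```

Both missed pages are exactly the kind of content a sitemap can surface: one is an orphaned landing page, the other is buried too deep.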
A sitemap gives you the ability to tell Google which pages to access and prioritize in the crawl.
But that will work only if you follow the rule above.
Listing every page, including those you don’t want to see in the SERPs (and have most likely blocked from being crawled in the robots.txt file anyway), only clutters the roadmap.
Keeping the sitemap clean and listing only the assets you want to rank will make it much easier for Google to use.
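One way to apply this rule is to filter your page list before building the sitemap. The sketch below assumes you already have a list of pages with a `noindex` flag (a hypothetical data shape) and uses Python's standard `urllib.robotparser` to drop anything disallowed in robots.txt. The URLs and rules are placeholders.

```python
from urllib.robotparser import RobotFileParser

def sitemap_candidates(pages, robots_txt, user_agent="*"):
    """Keep only pages that are indexable and not disallowed in robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        page["url"]
        for page in pages
        if not page["noindex"] and parser.can_fetch(user_agent, page["url"])
    ]

# Hypothetical robots.txt and page inventory.
robots_txt = """User-agent: *
Disallow: /cart/
"""
pages = [
    {"url": "https://www.example.com/", "noindex": False},
    {"url": "https://www.example.com/products/widget", "noindex": False},
    {"url": "https://www.example.com/cart/checkout", "noindex": False},
    {"url": "https://www.example.com/thank-you", "noindex": True},
]

selected = sitemap_candidates(pages, robots_txt)
print(selected)
```

Only the home page and the product page survive the filter; the checkout and thank-you pages stay out of the sitemap.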
For one, a sitemap will help you monitor the most important content on the site.
Sitemaps that list only the pages in a specific section of the site, such as product pages, let you run highly targeted site audits and uncover problems at a granular level.
You can use sitemaps to evaluate the health of specific content types or sections of the site, and spot issues that you might otherwise miss in a full site crawl.
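As a rough illustration of a sitemap-scoped audit, the sketch below reads the URLs from a section sitemap and flags those that did not return a 200 status. The sitemap content and the `crawl_status` lookup are made up for the example; a real audit would fetch each URL or rely on a crawler's results.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def urls_in_sitemap(xml_text):
    """Return every <loc> value listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]

def audit(urls, status_by_url):
    """Return the URLs whose recorded status code is not 200."""
    return [url for url in urls if status_by_url.get(url) != 200]

# Hypothetical product-section sitemap.
product_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/products/widget</loc></url>
  <url><loc>https://www.example.com/products/gadget</loc></url>
</urlset>"""

# Hypothetical crawl results standing in for real HTTP checks.
crawl_status = {
    "https://www.example.com/products/widget": 200,
    "https://www.example.com/products/gadget": 404,
}

problems = audit(urls_in_sitemap(product_sitemap), crawl_status)
print(problems)  # ['https://www.example.com/products/gadget']
```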
An example of a crawler set up to audit a specific sitemap only.
A sitemap stores its data in XML format. A typical sitemap file looks like this:
Note that, in spite of all the code, you can clearly see specific page URLs and associated data, like when the page was last modified.
The <url> section contains all that information.
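If you want to see that structure for yourself, here is a minimal sketch that builds a small sitemap with Python's standard library and prints it. The URLs and dates are placeholders, and `build_sitemap` is a name made up for this example. `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

SITEMAP_XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_XMLNS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")  # one <url> section per page
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    ET.indent(urlset)  # pretty-print; Python 3.9+
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Placeholder pages for illustration only.
sitemap_xml = build_sitemap([
    ("https://www.example.com/", "2024-01-15"),
    ("https://www.example.com/products/widget", "2024-02-01"),
])
print(sitemap_xml)
```

Each page gets its own `<url>` section, with `<loc>` holding the page address and `<lastmod>` the date it was last modified.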
Sitemaps have some limitations, though:
An important note about the sitemap’s size limits.
As you’ve seen from the limitations above, a sitemap cannot contain more than 50,000 URLs. Although this is enough for a typical site, it’s far too few for a typical enterprise website.
What to do, then?
As there is no restriction as to how many sitemaps you can create, the simplest and most practical solution is to have more than a single sitemap. It is also one of the most effective ways to monitor individual site sections, as you’ve seen above too.
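Splitting a long URL list to stay under the 50,000-URL limit takes only a few lines. This is a minimal sketch; the function name and the generated product URLs are placeholders.

```python
MAX_URLS_PER_SITEMAP = 50_000

def split_into_sitemaps(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a URL list into chunks that each fit in one sitemap."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]

# Hypothetical enterprise catalog with 120,000 product pages.
urls = [f"https://www.example.com/products/{n}" for n in range(120_000)]
chunks = split_into_sitemaps(urls)
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20000]
```

In practice you would probably split by site section first, so that each sitemap doubles as an audit target, and only then chunk any section that is still too large.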
When creating multiple sitemaps, make sure to also create a sitemap index: a single file listing all the sitemaps on the site.
You can submit just the sitemap index file to Google, and the search engine will use it to access all the individual sitemaps you’ve created.
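A sitemap index uses the same XML conventions as a regular sitemap, with `<sitemapindex>` and `<sitemap>` elements in place of `<urlset>` and `<url>`. Here is a minimal sketch of generating one; the sitemap file names are hypothetical, and `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

SITEMAP_XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap_index(sitemap_urls):
    """Build a sitemap index that points to each individual sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_XMLNS)
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = sitemap_url
    ET.indent(index)  # pretty-print; Python 3.9+
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Hypothetical per-section sitemaps.
index_xml = build_sitemap_index([
    "https://www.example.com/sitemap-products.xml",
    "https://www.example.com/sitemap-blog.xml",
])
print(index_xml)
```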
The process will largely depend on the technology used on your site. In most cases, however, you can create a sitemap directly from your Content Management System.
Most CMSs, like WordPress or Joomla, offer dedicated plugins that create the sitemap and update it every time a page is created or updated.
(XML Sitemaps option in a WordPress plugin)
More advanced plugins will also automatically create individual sitemaps for each website section.
If your CMS doesn’t offer that functionality, you can use a crawler to create the sitemap as well.
A crawler collects all the information about pages on the site as it goes through them. That information is stored in a database, and the crawler can output it in the sitemap’s XML format.
Here’s what the option looks like in seoClarity’s crawler, Clarity Audits:
With Clarity Audits, you can crawl the entire site and build a sitemap from that information. Or you can set it up to access specific sections only to create individual sitemaps.
But the crawler has a few other tricks up its sleeve.
Google can find the sitemap on its own. However, it is good practice to submit it to Google directly and let it process the data.
You do this in Google Search Console.
Below the box, you will see all your submitted sitemaps, with information about when Google last accessed each one and the status of the file.
With the sitemap created and submitted, all that is left to do is to monitor it for any potential errors in Google Search Console. Also, if you’ve created the sitemap manually and not through the CMS, make sure to regenerate it and upload it to the server regularly so that any new pages are included.
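If you maintain the sitemap manually, a quick comparison between the pages currently on the site and the URLs in the sitemap will tell you when it has gone stale. This is only a sketch: both lists are hypothetical and would normally come from your CMS or a crawl and from the sitemap file itself.

```python
def missing_from_sitemap(site_urls, sitemap_urls):
    """Return pages that exist on the site but are absent from the sitemap."""
    return sorted(set(site_urls) - set(sitemap_urls))

# Hypothetical data: the new post has not made it into the sitemap yet.
site_urls = [
    "https://www.example.com/",
    "https://www.example.com/blog/old-post",
    "https://www.example.com/blog/new-post",
]
sitemap_urls = [
    "https://www.example.com/",
    "https://www.example.com/blog/old-post",
]

stale = missing_from_sitemap(site_urls, sitemap_urls)
print(stale)  # ['https://www.example.com/blog/new-post']
```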