Crawl budget refers to the number of URLs a search engine's spider will crawl during one session. A website's crawl budget is influenced by the crawl rate that Googlebot has determined is optimal for the site, and by crawl demand, which is driven by the website's popularity and the need to keep Google's SERPs fresh for a particular search query.
What is Crawl Rate?
Crawl Rate refers to how many requests per second Googlebot makes to a site when crawling it.
How Do You Optimize a Crawl Budget?
The goal of crawl budget optimization is to increase the frequency with which spiders crawl pages and pass information to the algorithms responsible for evaluating content quality and indexing content. The better a site's crawl budget, the faster its updates are likely to be reflected in the search engine.
Here are some best practices for optimizing crawl budget:
Block Unimportant Pages and Ensure Important Pages Are Crawlable
Direct Googlebot away from your unimportant pages. One way is to add a 'noindex' tag to the head section of a page, which keeps that page out of the index; another is to disallow the page in robots.txt so it is not crawled at all. It is equally important to make sure that important pages remain crawlable.
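As a minimal sketch (the /print-versions/ path and the page are hypothetical placeholders), a robots.txt disallow rule and a noindex meta tag look like this:

    # robots.txt - keep crawlers out of a low-value section
    User-agent: *
    Disallow: /print-versions/

    <!-- in the <head> of a page that should stay out of the index -->
    <meta name="robots" content="noindex">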
Avoid Redirect Chains
Each redirected page uses up a portion of the site's crawl budget, so long redirect chains should be avoided. For example, if /old-page redirects to /interim-page, which in turn redirects to /new-page, the first redirect should point straight at /new-page. Bots may also stop following redirects altogether if there is an unreasonable number of them in a row.
Fix Error Pages
Besides being bad for user experience, 404 and 410 errors waste crawl budget. Site owners should therefore find and fix these error pages.
Focus on Using HTML
Although Google is able to crawl Flash, XML, and JavaScript, bots from other search engines may struggle with these formats. Where possible, website owners should serve content as HTML instead.
Keep Your Sitemap Up to Date
An up-to-date XML sitemap helps bots understand the site's structure and where its internal links point. The sitemap should contain only canonical URLs and should be consistent with the robots.txt file: robots.txt tells bots which pages not to crawl, while sitemap links tell bots which pages to crawl. If the two conflict, bots are asked to crawl pages they are blocked from reaching.
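A minimal sketch of a consistent pair, assuming a hypothetical site at https://www.example.com (the /admin/ path, product URL, and date are placeholders). robots.txt blocks the low-value section and points to the sitemap, and the sitemap lists only canonical URLs that robots.txt allows:

    # robots.txt
    User-agent: *
    Disallow: /admin/
    Sitemap: https://www.example.com/sitemap.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- sitemap.xml -->
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://www.example.com/products/blue-widget</loc>
        <lastmod>2024-01-15</lastmod>
      </url>
    </urlset>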
Use Hreflang Tags
Hreflang tags help bots understand localized versions of pages, including language- and region-specific content. They can be implemented through HTTP headers, HTML tags in the head, or the XML sitemap.
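A minimal sketch of the HTML-tag approach, assuming hypothetical English and German versions of the same page on example.com; each localized version should carry the full, reciprocal set of tags in its head:

    <link rel="alternate" hreflang="en" href="https://www.example.com/en/page/" />
    <link rel="alternate" hreflang="de" href="https://www.example.com/de/page/" />
    <link rel="alternate" hreflang="x-default" href="https://www.example.com/" />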
Deal With URL Parameters
Many CMSs generate dynamic, parameterized URLs. These extra URLs not only use up a site's crawl budget, they can also cause duplicate content issues, since URLs such as example.com/shoes?sort=price and example.com/shoes?sort=rating may serve essentially the same page. To deal with this, the parameters should be added to the site's Google Search Console account.
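A complementary approach, not mentioned above, is to point each parameterized URL at its canonical version with a rel="canonical" link tag. A minimal sketch, assuming the hypothetical URLs from the example:

    <!-- in the <head> of https://www.example.com/shoes?sort=price -->
    <link rel="canonical" href="https://www.example.com/shoes" />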
What is a Crawl Stats Report?
The Crawl Stats report provides information about Googlebot's activity on a site. The report covers the last 90 days of data and takes all content types into consideration.
How Do I Increase Google Crawl Rate?
There are times when Googlebot crawls a site less frequently than the site owner would like. A number of issues can cause this.
Low Authority
When a site lacks high-quality backlinks, Google interprets this to mean the site is unimportant. As a result, Google will not crawl it frequently. This is fairly common for new sites.
The way to improve the situation is to build links to the site. When a site acquires a high volume of quality links, its authority grows, which signals Google to crawl the site more often.
Site Errors
If a site has many errors, Google will crawl that site slowly. This can easily be dealt with by fixing those errors, which can be found in the Google Search Console Coverage report.
Slow Loading Site
When a site loads slowly, Google might slow its crawl rate for that site. The faster a site loads, the more pages Googlebot can crawl in the same amount of time. To increase site speed, use tools like GTmetrix (gtmetrix.com) or Google PageSpeed Insights.
What Does Google See When it Crawls My Site?
Google first looks at your robots.txt file, which tells it which pages it can crawl and which it cannot. Once it has 'seen' robots.txt, Googlebot crawls the page's title tag, then the meta description, and then moves on to the images. Although Google is not able to see a site's images, it does crawl the image alt attributes. Finally, Google crawls the actual content on the page.
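A minimal sketch of the elements Googlebot reads, using a hypothetical example.com page (all paths, titles, and text are placeholders):

    # robots.txt
    User-agent: *
    Disallow: /private/

    <!-- a simplified page -->
    <!DOCTYPE html>
    <html lang="en">
      <head>
        <title>Blue Widgets - Example Store</title>
        <meta name="description" content="Hand-made blue widgets with free shipping.">
      </head>
      <body>
        <img src="blue-widget.jpg" alt="A hand-made blue widget">
        <p>The page's actual content goes here.</p>
      </body>
    </html>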