Friday, April 2, 2021

Technical SEO: Indexing & Crawl

Website owners cannot take crawling for granted, since it is a costly process for search engines. Google therefore assigns each site a crawl budget, especially in the case of large portals.


Several factors impact the crawl budget:

  • Age of the domain 
  • Backlink profile, which reflects the authority and trustworthiness of the domain 
  • The size of the domain 
  • Content freshness 
  • Accessibility
  • Proper HTTP status codes


If the above factors are favorable, regular crawls are expected, and new updates are indexed quickly. Search engines do not like to waste crucial resources crawling your site inefficiently, so Core Web Vitals should be within the recommended limits. They try to achieve more with less effort and to conserve the data-center resources allotted per domain. Crawl activity is judged by the number of requests made in twenty-four hours; site updates and new content should therefore reflect quickly on the SERPs. Keep tabs on indexing and crawling for each site in GSC, or Google Search Console.
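Crawl activity can also be cross-checked against your own server logs. The sketch below, for a Node.js environment, is only an illustration: it assumes an Apache/Nginx combined-format access log at access.log (the path and format are assumptions, not from the original post) and tallies Googlebot requests per day.

// count-googlebot.ts — tally Googlebot requests per day from a server access log.
// Assumes a combined-format log; adjust the matching for your own server.
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

async function countGooglebotHits(logPath: string): Promise<void> {
  const perDay = new Map<string, number>();
  const lines = createInterface({ input: createReadStream(logPath) });

  for await (const line of lines) {
    if (!line.includes("Googlebot")) continue;            // crude user-agent check
    const match = line.match(/\[(\d{2}\/\w{3}\/\d{4})/);  // e.g. [02/Apr/2021:10:15:32
    if (!match) continue;
    perDay.set(match[1], (perDay.get(match[1]) ?? 0) + 1);
  }

  for (const [day, hits] of perDay) {
    console.log(`${day}: ${hits} Googlebot requests`);
  }
}

countGooglebotHits("access.log").catch(console.error); // assumed log path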


As a contingency measure, important URLs should be given priority in the site architecture so that they are reached early in a crawl.


You can use various tools to understand how search engines crawl a web page. Search engines use many crawlers to fetch your pages and index various attributes that reflect on website ranking and user experience. 


Tools like SEMrush Site Audit, DeepCrawl, and Screaming Frog help you analyze crawling and the problems associated with it. But GSC is unavoidable if you wish to understand how a client's website is crawled. These tools emulate Googlebot to furnish information about the site crawl.
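To get a rough feel for what such a crawler receives, you can request a page yourself with a Googlebot user-agent string. The sketch below, for Node.js 18+, is only an approximation of the emulation these tools perform (the URL is a placeholder); real crawlers also render JavaScript and honor robots rules.

// fetch-as-googlebot.ts — request a page with a Googlebot user-agent string
// to inspect the status code and HTML a crawler would be served.
const target = "https://www.example.com/"; // placeholder URL

async function fetchAsGooglebot(url: string): Promise<void> {
  const response = await fetch(url, {
    headers: {
      "User-Agent":
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    },
    redirect: "follow",
  });

  console.log("Status:", response.status);
  console.log("X-Robots-Tag header:", response.headers.get("x-robots-tag"));
  const html = await response.text();
  console.log("First 200 characters of HTML:", html.slice(0, 200));
}

fetchAsGooglebot(target).catch(console.error);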


Robots.txt Exclusion Protocol 


This exclusion protocol, a robots.txt file placed in the root directory, keeps certain pages, scripts, and PDFs away from the search engine crawl. Not all search engines follow the protocol, but Google, Bing, and Yahoo do. It is also a bit risky: wrong rules, syntax errors, and other issues in the file can prevent the site from being crawled properly.
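For illustration only, a minimal robots.txt could look like the lines below; the blocked paths and sitemap URL are placeholders, not recommendations for any particular site.

User-agent: *
Disallow: /scripts/
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml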


If you want to make sure that your robots.txt file is free of dangerous issues, there are several checks available in the SEMrush Site Audit. With these checks, you can find out:

  • Whether your robots.txt file has format errors
  • If there are any issues with blocked internal resources in robots.txt
  • Whether Sitemap.xml is indicated in robots.txt
  • If the robots.txt file exists

If no exclusion is required, do not use this protocol at all.


Exclusion Using Meta Tags


This is a preferred method to exclude pages from the Google crawl or from other search engines. The robots meta tag directive is placed in the <head> section like all other meta tags.

<meta name="robots" content="noindex" />

To target a specific crawler, such as Google's AdsBot or Googlebot News, name it in the directive:

<meta name="AdsBot-Google" content="noindex" />
<meta name="googlebot" content="noindex" />
<meta name="googlebot-news" content="noindex" />

To set different directives for multiple crawlers, use the snippets below:

<meta name="googlebot" content="noindex">
<meta name="googlebot-news" content="nosnippet">


The exclusion can be set in the HTTP response headers as well, using the X-Robots-Tag header.
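The header carries the same directives as the meta tag, which is useful for PDFs and other non-HTML files. As a sketch, a response that should be neither indexed nor followed would include a header line such as:

X-Robots-Tag: noindex, nofollow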


Another advanced technique in SEO is the PRG (Post/Redirect/Get) pattern, often used for faceted navigation. The filter controls are styled with CSS to look like ordinary links, but JavaScript submits them as a POST form to a controller, which redirects to a clean URL. Because crawlers do not submit forms, only the clean redirected URLs are exposed to the crawl.   
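Below is a minimal sketch of the server side of this pattern, using Node.js with Express; the /filter route, the color field, and the /shoes/:color URL are all hypothetical names for illustration.

// prg-controller.ts — Post/Redirect/Get: the form POSTs a filter choice,
// and the controller answers with a 303 redirect to a clean, crawlable URL.
import express from "express";

const app = express();
app.use(express.urlencoded({ extended: false })); // parse POSTed form fields

// Hypothetical route: a faceted-navigation form posts its selection here.
app.post("/filter", (req, res) => {
  const color = String(req.body.color ?? "all");
  // Redirect (303 See Other) to a clean URL instead of a parameter-heavy one.
  res.redirect(303, `/shoes/${encodeURIComponent(color)}`);
});

// The clean URL is an ordinary GET page that crawlers can index.
app.get("/shoes/:color", (req, res) => {
  res.send(`Shoes filtered by colour: ${req.params.color}`);
});

app.listen(3000, () => console.log("PRG demo listening on port 3000"));

The 303 redirect tells browsers and crawlers to fetch the clean URL with GET, so only the tidy address appears in the crawl.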


==================



Uday provides a search engine optimization service for digital marketing. In addition, he provides website content and content for authoritative links. 

He can be contacted at: 

pateluday90@hotmail.com

09755089323      


