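Robots.txt files tell search engine crawlers which parts of your website they may crawl. For e-commerce sites and news portals, a correctly configured robots.txt keeps crawlers focused on the pages that matter for SEO, such as products, articles, and categories, instead of carts, admin screens, and duplicate URLs. It is not a security measure, though: the file is publicly readable and only asks well-behaved crawlers to stay away. Here are practical examples tailored to these types of websites.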

Robots.txt for E-commerce Websites

E-commerce websites often have numerous product pages, category pages, and account areas that hold customer data. A proper robots.txt setup keeps crawlers away from duplicate content, admin pages, and the checkout process.

Basic E-commerce Robots.txt Example

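This configuration allows search engines to crawl product and category pages but blocks the admin area, checkout, cart, and user account pages. The Allow lines are optional, since any URL that is not disallowed may be crawled by default, but they make the intent explicit.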

User-agent: *
Disallow: /admin/
Disallow: /checkout/
Disallow: /cart/
Disallow: /user/
Allow: /product/
Allow: /category/
Sitemap: https://www.example.com/sitemap.xml
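To sanity-check rules like these before deploying them, Python's standard-library urllib.robotparser module can evaluate them locally. This is a minimal sketch, not a model of how any particular search engine behaves: robotparser uses plain prefix matching and applies the first rule that matches, which is adequate here because none of the paths overlap. The helper names and the product and checkout URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The basic e-commerce rules from above, unchanged.
ECOMMERCE_ROBOTS = """\
User-agent: *
Disallow: /admin/
Disallow: /checkout/
Disallow: /cart/
Disallow: /user/
Allow: /product/
Allow: /category/
Sitemap: https://www.example.com/sitemap.xml
"""


def crawl_report(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> dict[str, bool]:
    """Map each URL to True (crawlable) or False (blocked) for the given agent."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(agent, url) for url in urls}


def sitemaps(robots_txt: str) -> list[str]:
    """Return the Sitemap URLs declared in the file (site_maps() returns None if there are none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    sample = [
        "https://www.example.com/product/blue-mug",
        "https://www.example.com/category/kitchen",
        "https://www.example.com/checkout/step-1",
        "https://www.example.com/cart/",
    ]
    for url, allowed in crawl_report(ECOMMERCE_ROBOTS, sample).items():
        print("allowed" if allowed else "blocked", url)
    print(sitemaps(ECOMMERCE_ROBOTS))
```

Running a list of must-crawl and must-block URLs through a check like this makes it harder for a typo in a Disallow line to slip through unnoticed.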

Blocking Duplicate Content

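To avoid duplicate content issues, block URLs that differ only by session IDs or tracking parameters. The * wildcard matches any sequence of characters; Google and Bing support it and RFC 9309 standardizes it, but not every crawler honors it. A pattern such as /*?sessionid= only matches when the parameter comes first in the query string, so add a second rule with & for parameters that can appear later.

User-agent: *
Disallow: /*?sessionid=
Disallow: /*&sessionid=
Disallow: /*?ref=
Disallow: /*&ref=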
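Python's urllib.robotparser treats "*" as a literal character, so it cannot check wildcard rules like these. The sketch below translates a robots.txt path pattern into a regular expression following the RFC 9309 conventions: "*" matches any sequence of characters, and a trailing "$" anchors the end of the URL. It is a simplification that skips percent-encoding normalization and the longest-match precedence real crawlers apply between competing Allow and Disallow rules. The function names and sample URLs are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' means the URL must end
    there. Every other character is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern: str, path_and_query: str) -> bool:
    """True if the rule pattern matches the URL path plus query string.

    re.match anchors at the start only, which gives robots.txt prefix semantics.
    """
    return pattern_to_regex(pattern).match(path_and_query) is not None


if __name__ == "__main__":
    # Hypothetical product URLs carrying a session parameter.
    first = "/product/42?sessionid=abc"
    later = "/product/42?color=red&sessionid=abc"
    print(matches("/*?sessionid=", first))  # True: sessionid follows '?'
    print(matches("/*?sessionid=", later))  # False: '?' is followed by color=
    print(matches("/*&sessionid=", later))  # True: the '&' variant catches it
```

The middle case is the one to watch: a parameter that a shop appends after other filters escapes a rule written only with "?".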

Robots.txt for News Portals

News websites publish a large volume of content every day, so it is important to block non-essential pages and focus search engine crawling on news articles and category pages.

Basic News Portal Robots.txt Example

This setup allows crawling of news articles and categories while blocking admin pages, login pages, user pages, and internal search results.

User-agent: *
Disallow: /admin/
Disallow: /login/
Disallow: /search/
Disallow: /user/
Allow: /news/
Allow: /category/
Sitemap: https://www.newsportal.com/sitemap.xml
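Rules match as path prefixes, so a trailing slash matters. The sketch below uses Python's urllib.robotparser to show that Disallow: /search/ blocks /search/election but not /search?q=election. Whether that gap matters depends on how your CMS builds search URLs; the article and search URLs here are hypothetical. If search results live on /search with a query string, a rule such as Disallow: /search covers both forms, at the cost of also matching any other path that begins with /search.

```python
from urllib.robotparser import RobotFileParser

# The basic news portal rules from above, unchanged.
NEWS_ROBOTS = """\
User-agent: *
Disallow: /admin/
Disallow: /login/
Disallow: /search/
Disallow: /user/
Allow: /news/
Allow: /category/
Sitemap: https://www.newsportal.com/sitemap.xml
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Evaluate one URL against the rules without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical URLs on the example domain.
    print(is_crawlable(NEWS_ROBOTS, "https://www.newsportal.com/news/city-council-vote"))  # True
    print(is_crawlable(NEWS_ROBOTS, "https://www.newsportal.com/search/election"))         # False
    print(is_crawlable(NEWS_ROBOTS, "https://www.newsportal.com/search?q=election"))       # True: not under /search/
```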

Blocking Non-Content Pages

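Prevent search engines from crawling low-value pages such as tag listings, author archives, feeds, and comment pages. These lines belong in the same User-agent: * group as the basic rules above; rules that appear before any User-agent line are not part of a group, and parsers that follow the standard ignore them.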

Disallow: /tag/
Disallow: /author/
Disallow: /feed/
Disallow: /comments/
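Disallow lines that are not preceded by a User-agent line do not belong to any group. Python's urllib.robotparser, for one, discards them, which makes the effect easy to see in a small sketch. The helper name and the tag URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The non-content rules on their own, with no User-agent line.
ORPHANED = """\
Disallow: /tag/
Disallow: /author/
Disallow: /feed/
Disallow: /comments/
"""

# The same rules inside a User-agent group.
GROUPED = "User-agent: *\n" + ORPHANED


def is_blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """True if the parsed rules forbid the agent from fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://www.newsportal.com/tag/politics"  # hypothetical tag page
    print(is_blocked(ORPHANED, url))  # False: the rules never joined a group
    print(is_blocked(GROUPED, url))   # True
```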

Additional Tips for Robots.txt Optimization

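Review your robots.txt file whenever your site structure changes, for example when URL paths are renamed or new sections launch. In Google Search Console, the robots.txt report shows which version of the file Google fetched and any parsing problems, and the URL Inspection tool shows whether a specific URL is blocked, so you can confirm that important pages remain crawlable.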
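Alongside Search Console, a quick local check can catch an edit that accidentally blocks an important section. The sketch below compares two versions of a robots.txt file over a list of URLs you care about and reports any whose crawlability changed. It relies on Python's urllib.robotparser, so it shares that module's limits (prefix matching only, no wildcard support); the function names, the sample rules, and the URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def _load(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawl_changes(old_txt: str, new_txt: str, urls: list[str],
                  agent: str = "Googlebot") -> dict[str, tuple[bool, bool]]:
    """Return {url: (allowed_before, allowed_after)} for URLs whose status changed."""
    old, new = _load(old_txt), _load(new_txt)
    changes = {}
    for url in urls:
        before, after = old.can_fetch(agent, url), new.can_fetch(agent, url)
        if before != after:
            changes[url] = (before, after)
    return changes


if __name__ == "__main__":
    old = "User-agent: *\nDisallow: /admin/\n"
    # A hypothetical edit that accidentally blocks every URL under /news.
    new = "User-agent: *\nDisallow: /admin/\nDisallow: /news\n"
    important = [
        "https://www.newsportal.com/news/budget-vote",
        "https://www.newsportal.com/category/politics",
    ]
    print(crawl_changes(old, new, important))
```

An empty result means the edit left every listed URL as it was; anything else deserves a second look before the new file goes live.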

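Remember that robots.txt controls crawling, not indexing: a blocked URL can still appear in search results, without its content, if other sites link to it. To keep a page out of search results, add a noindex robots meta tag (or an X-Robots-Tag HTTP header) and leave the page crawlable, because a crawler that is blocked from fetching the page never sees the noindex directive. Genuinely sensitive pages need authentication; robots.txt is publicly readable and offers no protection.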