The Most Complete Guide to Optimizing Robots.txt for WordPress

Any webmaster who has dabbled in SEO should know the robots protocol (also called the crawler protocol, crawler rules, or the robots exclusion protocol), usually in the form of a robots.txt file placed in the site root. Its job is to tell search engines which pages may be crawled and which may not, which in turn improves how your site is indexed and how ranking weight is distributed.

So you can imagine how disappointed a search engine will be if your site has no robots.txt. Every time a crawler requests the file and fails to find it, a 404 error lands in the server log, adding needless load to the server, so the importance of robots.txt should not be underestimated. That raises two questions: why does a crawler request robots.txt in the first place, and what goes inside it? Let's take a look.

When you set up a site, WordPress automatically generates a virtual robots.txt file. By default, entering http://your-domain/robots.txt in a browser displays the following:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/

Of course, the file may not exist at all. If it doesn't, simply create a robots.txt file yourself and place it in your WordPress root directory.
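
You can check what a crawler actually sees by requesting the file yourself. Below is a minimal sketch using only the Python standard library; example.com is a placeholder for your own domain:

import urllib.error
import urllib.request

# Placeholder URL; substitute your own domain.
url = "https://example.com/robots.txt"

try:
    with urllib.request.urlopen(url, timeout=10) as resp:
        print(resp.status)  # 200 means the file (real or virtual) exists
        print(resp.read().decode("utf-8", errors="replace"))
except urllib.error.HTTPError as e:
    # This 404 is exactly the error that ends up in the server log.
    print("robots.txt missing:", e.code)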

Now for the heart of the matter: how to customize your own robots.txt. Without further ado, here are some worked examples; trim them to fit your own needs. Note that lines starting with # are comments, and the bare Disallow blocks in the later sections are meant to be appended under the User-agent: * group from the core block.

# WordPress core

Sitemap: https://your-domain/sitemap.xml

User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /readme.html
Disallow: /license.txt
Disallow: /xmlrpc.php
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /*?*
Disallow: /*?
Disallow: /*~*
Disallow: /*~
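
Before customizing further, it is worth checking that a block like this behaves the way you expect. The sketch below uses Python's built-in urllib.robotparser. Two caveats: the standard-library parser applies rules in file order and does not implement the * and $ wildcard extensions, so this test sticks to the literal-path rules:

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /xmlrpc.php
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# The Allow line comes first, so admin-ajax.php stays fetchable.
print(rp.can_fetch("*", "https://example.com/wp-admin/admin-ajax.php"))  # True
print(rp.can_fetch("*", "https://example.com/wp-admin/options.php"))     # False
print(rp.can_fetch("*", "https://example.com/xmlrpc.php"))               # False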

There are many ways to generate a sitemap: most general-purpose SEO plugins can do it, and there are dedicated sitemap plugins as well. We recommend Yoast SEO or XML Sitemap & Google News.

# Tell search engines to crawl your site

Major international search engines

User-agent: Googlebot-Image
Allow: /wp-content/uploads/
User-agent: Mediapartners-Google
Allow: /
User-agent: AdsBot-Google
Allow: /
User-agent: AdsBot-Google-Mobile
Allow: /
User-agent: Bingbot
Allow: /
User-agent: Msnbot
Allow: /
User-agent: msnbot-media
Allow: /wp-content/uploads/
User-agent: Applebot
Allow: /
User-agent: Yandex
Allow: /
User-agent: YandexImages
Allow: /wp-content/uploads/
User-agent: Slurp
Allow: /
User-agent: DuckDuckBot
Allow: /
User-agent: Qwantify
Allow: /

Major Chinese search engines

User-agent: Baiduspider
Allow: /
User-agent: Baiduspider/2.0
Allow: /
User-agent: Baiduspider-video
Allow: /
User-agent: Baiduspider-image
Allow: /
User-agent: Sogou spider
Allow: /
User-agent: Sogou web spider
Allow: /
User-agent: Sosospider
Allow: /
User-agent: Sosospider+
Allow: /
User-agent: Sosospider/2.0
Allow: /
User-agent: yodao
Allow: /
User-agent: youdao
Allow: /
User-agent: YoudaoBot
Allow: /
User-agent: YoudaoBot/1.0
Allow: /
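
One subtlety worth knowing before you copy these groups: under the robots exclusion standard, a crawler obeys only the single group that best matches its user agent, and it does not merge that group with the rules under User-agent: *. A dedicated group therefore replaces your general rules for that bot, so repeat any Disallow lines you still want applied. A minimal demonstration with the standard-library parser:

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /wp-admin/

User-agent: Googlebot-Image
Allow: /wp-content/uploads/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot-Image matches its own group and ignores the * group entirely,
# so /wp-admin/ is NOT blocked for it unless the rule is repeated there.
print(rp.can_fetch("Googlebot-Image", "https://example.com/wp-admin/"))  # True
print(rp.can_fetch("SomeOtherBot", "https://example.com/wp-admin/"))     # False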

# Spam backlink blocker

Disallow: /feed/
Disallow: /feed/$
Disallow: /comments/feed
Disallow: /trackback/
Disallow: */?author=*
Disallow: */author/*
Disallow: /author*
Disallow: /author/
Disallow: */comments$
Disallow: */feed
Disallow: */feed$
Disallow: */trackback
Disallow: */trackback$
Disallow: /?feed=
Disallow: /wp-comments
Disallow: /wp-feed
Disallow: /wp-trackback
Disallow: /*?replytocom=
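
Several of the rules above rely on two pattern extensions that Google, Bing, and most major engines support: * matches any run of characters, and a trailing $ anchors the match at the end of the URL; everything else is a plain prefix match. Since Python's built-in parser ignores these extensions, here is a tiny hypothetical matcher you can use to test such patterns locally:

import re

def robots_rule_matches(rule: str, path: str) -> bool:
    """Google-style robots.txt matching: '*' is a wildcard and a
    trailing '$' anchors the end; otherwise it is a prefix match."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    if anchored:
        pattern += "$"
    return re.match(pattern, path) is not None

# Two rules from the blocks above:
print(robots_rule_matches("*/trackback$", "/my-post/trackback"))   # True
print(robots_rule_matches("*/trackback$", "/my-post/trackback/"))  # False: '$' requires the URL to end here
print(robots_rule_matches("/*?*", "/page/?author=1"))              # True: any URL with a query string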

# WooCommerce crawl performance

These rules keep crawlers out of the cart, checkout, and account pages and away from the endless sort and filter URL variations that waste crawl budget:

Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /*?orderby=price
Disallow: /*?orderby=rating
Disallow: /*?orderby=date
Disallow: /*?orderby=price-desc
Disallow: /*?orderby=popularity
Disallow: /*?filter
Disallow: /*add-to-cart=*
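
With the robots_rule_matches sketch from the previous section you can check how these sort rules behave. Because a rule without $ is a prefix match, the price rule already covers price-desc, which makes the separate price-desc line redundant (harmless, but worth knowing):

print(robots_rule_matches("/*?orderby=price", "/shop/?orderby=price"))       # True
print(robots_rule_matches("/*?orderby=price", "/shop/?orderby=price-desc"))  # True: prefix match, no '$'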

# Keep internal search results out of the index

These rules stop your on-site search result pages from being indexed. One caution: ?p= is also WordPress's default plain-permalink parameter, so only keep that rule if you use pretty permalinks:

Disallow: /search/
Disallow: *?s=*
Disallow: *?p=*
Disallow: *&p=*
Disallow: *&preview=*
Disallow: /search

# Block malicious bots

The bad-bot list below was compiled by webmasters abroad. Apply Allow or Disallow as appropriate for your own situation; everything here defaults to Disallow.

User-agent: DotBot
Disallow: /
User-agent: GiftGhostBot
Disallow: /
User-agent: Seznam
Disallow: /
User-agent: PaperLiBot
Disallow: /
User-agent: Genieo
Disallow: /
User-agent: Dataprovider/6.101
Disallow: /
User-agent: DataproviderSiteExplorer
Disallow: /
User-agent: Dazoobot/1.0
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: DomainStatsBot/1.0
Disallow: /
User-agent: DotBot/1.1
Disallow: /
User-agent: dubaiindex
Disallow: /
User-agent: eCommerceBot
Disallow: /
User-agent: ExpertSearchSpider
Disallow: /
User-agent: Feedbin
Disallow: /
User-agent: Fetch/2.0a
Disallow: /
User-agent: FFbot/1.0
Disallow: /
User-agent: focusbot/1.1
Disallow: /
User-agent: HuaweiSymantecSpider
Disallow: /
User-agent: HuaweiSymantecSpider/1.0
Disallow: /
User-agent: JobdiggerSpider
Disallow: /
User-agent: LemurWebCrawler
Disallow: /
User-agent: LipperheyLinkExplorer
Disallow: /
User-agent: LSSRocketCrawler/1.0
Disallow: /
User-agent: LYT.SRv1.5
Disallow: /
User-agent: MiaDev/0.0.1
Disallow: /
User-agent: Najdi.si/3.1
Disallow: /
User-agent: BountiiBot
Disallow: /
User-agent: Experibot_v1
Disallow: /
User-agent: bixocrawler
Disallow: /
User-agent: bixocrawler TestCrawler
Disallow: /
User-agent: Crawler4j
Disallow: /
User-agent: Crowsnest/0.5
Disallow: /
User-agent: CukBot
Disallow: /
User-agent: Dataprovider/6.92
Disallow: /
User-agent: DBLBot/1.0
Disallow: /
User-agent: Diffbot/0.1
Disallow: /
User-agent: Digg Deeper/v1
Disallow: /
User-agent: discobot/1.0
Disallow: /
User-agent: discobot/1.1
Disallow: /
User-agent: discobot/2.0
Disallow: /
User-agent: discoverybot/2.0
Disallow: /
User-agent: Dlvr.it/1.0
Disallow: /
User-agent: drupact/0.7
Disallow: /
User-agent: Ezooms/1.0
Disallow: /
User-agent: fastbot crawler beta 2.0
Disallow: /
User-agent: fastbot crawler beta 4.0
Disallow: /
User-agent: feedly social
Disallow: /
User-agent: Feedly/1.0
Disallow: /
User-agent: FeedlyBot/1.0
Disallow: /
User-agent: Feedspot
Disallow: /
User-agent: Feedspotbot/1.0
Disallow: /
User-agent: Clickagy Intelligence Bot v2
Disallow: /
User-agent: classbot
Disallow: /
User-agent: CISPA Vulnerability Notification
Disallow: /
User-agent: CirrusExplorer/1.1
Disallow: /
User-agent: Checksem/Nutch-1.10
Disallow: /
User-agent: CatchBot/5.0
Disallow: /
User-agent: CatchBot/3.0
Disallow: /
User-agent: CatchBot/2.0
Disallow: /
User-agent: CatchBot/1.0
Disallow: /
User-agent: CamontSpider/1.0
Disallow: /
User-agent: Buzzbot/1.0
Disallow: /
User-agent: Buzzbot
Disallow: /
User-agent: BusinessSeek.biz_Spider
Disallow: /
User-agent: BUbiNG
Disallow: /
User-agent: 008/0.85
Disallow: /
User-agent: 008/0.83
Disallow: /
User-agent: 008/0.71
Disallow: /
User-agent: ^Nail
Disallow: /
User-agent: FyberSpider/1.3
Disallow: /
User-agent: findlinks/1.1.6-beta5
Disallow: /
User-agent: g2reader-bot/1.0
Disallow: /
User-agent: findlinks/1.1.6-beta6
Disallow: /
User-agent: findlinks/2.0
Disallow: /
User-agent: findlinks/2.0.1
Disallow: /
User-agent: findlinks/2.0.2
Disallow: /
User-agent: findlinks/2.0.4
Disallow: /
User-agent: findlinks/2.0.5
Disallow: /
User-agent: findlinks/2.0.9
Disallow: /
User-agent: findlinks/2.1
Disallow: /
User-agent: findlinks/2.1.5
Disallow: /
User-agent: findlinks/2.1.3
Disallow: /
User-agent: findlinks/2.2
Disallow: /
User-agent: findlinks/2.5
Disallow: /
User-agent: findlinks/2.6
Disallow: /
User-agent: findlinks/1.0
Disallow: /
User-agent: findlinks/1.1.3-beta8
Disallow: /
User-agent: findlinks/1.1.3-beta9
Disallow: /
User-agent: findlinks/1.1.4-beta7
Disallow: /
User-agent: findlinks/1.1.6-beta1
Disallow: /
User-agent: findlinks/1.1.6-beta1 Yacy
Disallow: /
User-agent: findlinks/1.1.6-beta2
Disallow: /
User-agent: findlinks/1.1.6-beta3
Disallow: /
User-agent: findlinks/1.1.6-beta4
Disallow: /
User-agent: bixo
Disallow: /
User-agent: bixolabs/1.0
Disallow: /
User-agent: Crawlera/1.10.2
Disallow: /
User-agent: Dataprovider Site Explorer
Disallow: /

# Backlink protection

The bots below are mostly SEO backlink crawlers (Ahrefs's AhrefsBot, Majestic's MJ12bot, Semrush's SemrushBot, Moz's rogerbot, and so on). Blocking them keeps competitors from mining your backlink profile through those tools; switch any entry to Allow if you use the service yourself. Everything here defaults to Disallow.

User-agent: AhrefsBot
Disallow: /
User-agent: Alexibot
Disallow: /
User-agent: MJ12bot
Disallow: /
User-agent: SurveyBot
Disallow: /
User-agent: Xenu's
Disallow: /
User-agent: Xenu's Link Sleuth 1.1c
Disallow: /
User-agent: rogerbot
Disallow: /
User-agent: SemrushBot
Disallow: /
User-agent: SemrushBot-SA
Disallow: /
User-agent: SemrushBot-BA
Disallow: /
User-agent: SemrushBot-SI
Disallow: /
User-agent: SemrushBot-SWA
Disallow: /
User-agent: SemrushBot-CT
Disallow: /
User-agent: SemrushBot-BM
Disallow: /
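
Once your finished file is live, you can spot-check any bot against it. A minimal sketch, again with a placeholder domain:

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # placeholder domain
rp.read()  # fetches and parses the live file

for agent in ("AhrefsBot", "SemrushBot", "Googlebot"):
    print(agent, rp.can_fetch(agent, "https://example.com/"))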

Summary

The above are just the basic patterns. There are many more, but for a new WordPress webmaster these are more than enough. If you have new ideas or approaches, please leave a comment at the end of the article so that more newcomers can get their own robots.txt up and running with ease.

This article was first published on Ie主题99839.

Source: https://ietheme.com/wordpress-robots.html

