Files
bridget/layouts/robots.txt
Sped0n 88da65ef67 feat(robots.txt): add robots.txt file to block unwanted bots and allow all other bots to crawl the site
Adds a robots.txt template to the layouts directory. The file blocks the following crawlers: MJ12bot, AhrefsBot, BLEXBot, SISTRIX Crawler, sistrix, 007ac9, 007ac9 Crawler, UptimeRobot/2.0, Ezooms Robot, Perl LWP, netEstate NE Crawler (+http://www.website-datenbank.de/), WiseGuys Robot, Turnitin Robot, Heritrix, pricepi (matched via its pimonster user agent), SurdotlyBot, and ZoominfoBot. All other bots are allowed to crawl the site. The file also includes a Sitemap directive pointing to the sitemap.xml file.
2023-11-03 10:11:02 +08:00


# Block MJ12bot
User-agent: MJ12bot
Disallow: /
# Block AhrefsBot
User-agent: AhrefsBot
Disallow: /
# Block BLEXBot
User-agent: BLEXBot
Disallow: /
# Block SISTRIX
User-agent: SISTRIX Crawler
Disallow: /
User-agent: sistrix
Disallow: /
User-agent: 007ac9
Disallow: /
User-agent: 007ac9 Crawler
Disallow: /
# Block Uptime robot
User-agent: UptimeRobot/2.0
Disallow: /
# Block Ezooms Robot
User-agent: Ezooms Robot
Disallow: /
# Block Perl LWP
User-agent: Perl LWP
Disallow: /
# Block netEstate NE Crawler (+http://www.website-datenbank.de/)
User-agent: netEstate NE Crawler (+http://www.website-datenbank.de/)
Disallow: /
# Block WiseGuys Robot
User-agent: WiseGuys Robot
Disallow: /
# Block Turnitin Robot
User-agent: Turnitin Robot
Disallow: /
# Block Heritrix
User-agent: Heritrix
Disallow: /
# Block pricepi (its crawler identifies as pimonster)
User-agent: pimonster
Disallow: /
# Block SurdotlyBot
User-agent: SurdotlyBot
Disallow: /
# Block ZoominfoBot
User-agent: ZoominfoBot
Disallow: /
# Allow all other bots
User-agent: *
Allow: /
Sitemap: {{ "/sitemap.xml" | absURL }}
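The blocking rules can be sanity-checked with Python's standard-library robots.txt parser. This is a minimal sketch: RULES abridges the file above to three representative groups, and the example.com URLs are hypothetical stand-ins for the deployed site (at build time Hugo renders the Sitemap template into an absolute URL).

```python
from urllib.robotparser import RobotFileParser

# Abridged copy of the rules above: two blocked crawlers plus the
# catch-all group that allows everyone else.
RULES = """\
User-agent: MJ12bot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Named crawlers are denied everywhere on the site.
print(parser.can_fetch("MJ12bot", "https://example.com/posts/"))    # False
print(parser.can_fetch("AhrefsBot", "https://example.com/"))        # False
# Any other user agent falls through to the wildcard group.
print(parser.can_fetch("Googlebot", "https://example.com/posts/"))  # True
```

Note that `RobotFileParser` matches a group's user-agent token as a case-insensitive substring of the requesting agent, so a group like `SISTRIX Crawler` behaves as expected there, but stricter parsers (e.g. Google's) match only single product tokens.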