From 88da65ef67fcccdc6953552fa7bf616353faebe1 Mon Sep 17 00:00:00 2001
From: Sped0n
Date: Fri, 3 Nov 2023 10:11:02 +0800
Subject: [PATCH] feat(robots.txt): add robots.txt file to block unwanted bots
 and allow all other bots to crawl the site

The robots.txt file is added to the layouts directory. This file
includes rules to block specific bots from crawling the site. The
following bots are blocked: MJ12bot, AhrefsBot, BLEXBot, SISTRIX
Crawler, sistrix, 007ac9, 007ac9 Crawler, UptimeRobot/2.0, Ezooms
Robot, Perl LWP, netEstate NE Crawler
(+http://www.website-datenbank.de/), WiseGuys Robot, Turnitin Robot,
Heritrix, pricepi, SurdotlyBot, and ZoominfoBot. All other bots are
allowed to crawl the site. The file also includes a sitemap directive
to point to the sitemap.xml file.
---
 layouts/robots.txt | 61 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)
 create mode 100644 layouts/robots.txt

diff --git a/layouts/robots.txt b/layouts/robots.txt
new file mode 100644
index 0000000..206e15f
--- /dev/null
+++ b/layouts/robots.txt
@@ -0,0 +1,61 @@
+User-agent: MJ12bot
+Disallow: /
+
+User-agent: AhrefsBot
+Disallow: /
+
+User-agent: BLEXBot
+Disallow: /
+
+# Block SISTRIX
+User-agent: SISTRIX Crawler
+Disallow: /
+User-agent: sistrix
+Disallow: /
+User-agent: 007ac9
+Disallow: /
+User-agent: 007ac9 Crawler
+Disallow: /
+
+# Block Uptime robot
+User-agent: UptimeRobot/2.0
+Disallow: /
+
+# Block Ezooms Robot
+User-agent: Ezooms Robot
+Disallow: /
+
+# Block Perl LWP
+User-agent: Perl LWP
+Disallow: /
+
+# Block netEstate NE Crawler (+http://www.website-datenbank.de/)
+User-agent: netEstate NE Crawler (+http://www.website-datenbank.de/)
+Disallow: /
+
+# Block WiseGuys Robot
+User-agent: WiseGuys Robot
+Disallow: /
+
+# Block Turnitin Robot
+User-agent: Turnitin Robot
+Disallow: /
+
+# Block Heritrix
+User-agent: Heritrix
+Disallow: /
+
+# Block pricepi
+User-agent: pimonster
+Disallow: /
+
+User-agent: SurdotlyBot
+Disallow: /
+
+User-agent: ZoominfoBot
+Disallow: /
+
+User-agent: *
+Allow: /
+
+Sitemap: {{ "/sitemap.xml" | absURL }}
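
Review note: one way to sanity-check rules like these before deploying is to feed them to a robots.txt parser. The following is a minimal sketch, assuming Python's standard-library urllib.robotparser; it is not part of the patch. The RULES excerpt, the is_allowed helper, and the example.org URL are hypothetical names chosen for illustration. The Sitemap line is left out because it appears to be a Hugo template expression that only becomes a real URL once the site is built.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt of the rendered robots.txt: two of the blocked bots
# plus the catch-all group. The templated Sitemap line is omitted on purpose.
RULES = """\
User-agent: MJ12bot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.org/posts/"  # placeholder URL, not the real site
    for agent in ("MJ12bot", "AhrefsBot", "Googlebot"):
        print(agent, is_allowed(RULES, agent, url))
```

With this parser, the two named bots should be refused and any other agent should fall through to the catch-all group. Other crawlers may match User-agent tokens differently, so this checks the intent of the rules rather than the behavior of any particular bot.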