• DaGeek247@fedia.io
    arrow-up
    19
    ·
    3 months ago

    My robots.txt has been respected by every bot that visited it in the past three months. I know this because I wrote a page that IP bans anything that visits it, and I also put it as a disallowed path in the robots.txt file.

    I’ve only gotten like, 20 visits in the past three months though, so, very small sample size.
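
A minimal sketch of the kind of trap described above, assuming the ban list lives in memory and the trap sits at a made-up path. Every name here (`TRAP_PATH`, `banned_ips`, `handle_request`) is hypothetical, not the poster's actual setup; a real deployment would more likely ban at the firewall or web-server level.

```python
# Hypothetical honeypot sketch: names and paths are illustrative assumptions.
TRAP_PATH = "/trap-page"  # assumed path; the original post does not name it

# robots.txt tells every crawler to stay away from the trap.
ROBOTS_TXT = "User-agent: *\nDisallow: " + TRAP_PATH + "\n"

banned_ips = set()  # in-memory ban list, lost on restart


def handle_request(ip, path):
    """Return (status_code, body) for a request from `ip` to `path`."""
    if ip in banned_ips:
        return 403, "banned"
    if path == "/robots.txt":
        # Reading robots.txt is always fine and never triggers a ban.
        return 200, ROBOTS_TXT
    if path == TRAP_PATH:
        # Only a client that ignored robots.txt should end up here.
        banned_ips.add(ip)
        return 403, "banned"
    return 200, "ok"
```

A well-behaved crawler fetches `/robots.txt`, sees the `Disallow` line, and never requests the trap, so it is never banned. Anything that requests the trap anyway is refused on every later request.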

    • mozz@mbin.grits.dev
      arrow-up
      11
      ·
      3 months ago

      I know this because I wrote a page that IP bans anything that visits it, and I also put it as a disallowed path in the robots.txt file.

      This is fuckin GENIUS

      • Moonrise2473@feddit.it
        arrow-up
        5
        ·
        3 months ago

        only if you don’t want any visits except from yourself, because this removes your site from any search engine

        You should write a “Disallow: /juicy-content” rule and then block anything that tries to access that page (only bad bots would follow that path).
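
To see why a rule like this does not hide the rest of the site from search engines, here is a rough check using Python's standard `urllib.robotparser`. The helper `may_fetch` and the agent name `ExampleBot` are made up for illustration; the `/juicy-content` path is the one from the comment above.

```python
from urllib.robotparser import RobotFileParser

# Parse the rule directly from text instead of fetching it from a server.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /juicy-content",
])


def may_fetch(path, agent="ExampleBot"):
    """Hypothetical helper: would a compliant crawler request this path?"""
    return rules.can_fetch(agent, path)
```

Under this rule a compliant crawler may still fetch `/` and every other page; only `/juicy-content` is off limits, so only clients that ignore robots.txt ever reach the trap.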

          • Moonrise2473@feddit.it
            arrow-up
            1
            ·
            3 months ago

            Oops. As a non-native English speaker, I misunderstood what he meant. I wrongly thought that he had set the server to ban everything that requested robots.txt.

        • mozz@mbin.grits.dev
          arrow-up
          4
          ·
          3 months ago

          You need to reread what was described, more carefully. Imagine, for example, that by “a page” the person means a page called /juicy-content or something.

    • thingsiplay@beehaw.org
      arrow-up
      2
      ·
      edit-2
      3 months ago

      Interesting way of testing this. Another would be to query the search engines with site:your.domain added (Edit: typo corrected. Of course without the leading - as in -site:, otherwise you will exclude your site rather than limit the results to it.) to show results from your site only. Not an exhaustive check, but another tool to test this behavior.