Questions tagged [robots.txt]

Robots.txt is a text file used by website owners to give instructions about their site to web robots. It tells robots which parts of the site are open to crawling and which are closed. This convention is called the Robots Exclusion Protocol.

0 votes
0 answers
9 views

I'm trying to figure out why this website is not being indexed [duplicate]

Is this robots.txt file the issue, and if it is, how should it be written and why? User-agent: * Disallow:
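
For context, a Disallow directive with an empty value blocks nothing, so the file quoted above allows all compliant crawlers to fetch every URL; any indexing problem would have to come from elsewhere (for example noindex tags, canonical tags, or the site simply not being discovered yet). A minimal sketch of the two variants:

    # Allows everything: an empty Disallow blocks nothing
    User-agent: *
    Disallow:

    # By contrast, this blocks crawling of the whole site
    User-agent: *
    Disallow: /
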
2 votes
1 answer
334 views

Why is Google indexing my robots.txt file?

Google Search is indexing my robots.txt file. I know this because when I search site:example.com on Google, my robots.txt shows in the list of results. I don't want my robots.txt to show in Google. ...
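
A Disallow rule cannot fix this, because robots.txt controls crawling rather than indexing. One commonly suggested approach, sketched here on the assumption of an Apache server with mod_headers enabled, is to serve robots.txt with an X-Robots-Tag: noindex response header:

    # .htaccess sketch (assumes Apache with mod_headers); adapt for other servers
    <Files "robots.txt">
        Header set X-Robots-Tag "noindex"
    </Files>
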
1 vote
1 answer
38 views

Getting Robots.txt Not Found 404 Error on Google Search Console

I have created a robots.txt in my root directory, and when I navigate to my website address, e.g. https://example.ca/robots.txt, I am able to see the rules: User-agent: * Disallow: /captcha/ Disallow: /...
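
When Search Console reports a 404 for a file that loads fine in a browser, a useful first check is the exact status code and any redirect chain the crawler might see. A sketch using curl (the Googlebot user-agent string is only there for comparison, and example.ca stands in for the real host):

    # -I shows headers only, -L follows redirects, -A sets the user agent
    curl -sIL -A "Googlebot" https://example.ca/robots.txt
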
2 votes
1 answer
29 views

Googlebot is crawling unpublished pages from Adobe AEM and getting 404 errors

My company is currently using Adobe AEM as the CMS for our company website. We are currently having issues with 4xx errors, specifically 404s, because unpublished pages are still being crawled. What ...
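
If the unpublished URLs happen to share a recognisable path prefix, one stop-gap is to disallow that prefix so Googlebot stops requesting them; the prefix below is purely hypothetical and would need to match how AEM actually exposes those URLs:

    # Hypothetical sketch; only useful if unpublished pages share a common prefix
    User-agent: *
    Disallow: /content/unpublished/
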
2 votes
2 answers
27 views

Disallow top level of directory, but not subdirectories in robots.txt

I have a directory that I don't want Google to index at the top level. I currently do the following: Disallow: /profiles/ This stops Google from indexing https://example.com/profiles/, but it also ...
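
For reference, major crawlers such as Googlebot and Bingbot support a $ end-of-URL anchor (an extension, not part of the original robots.txt standard), which makes it possible to block only the directory index while leaving deeper paths crawlable. A sketch:

    User-agent: *
    # Matches exactly /profiles/ because $ anchors the rule at the end of the URL;
    # deeper URLs such as /profiles/alice remain crawlable
    Disallow: /profiles/$
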
3 votes
1 answer
283 views

Redirect Errors in Google Search Console when trying to index site

I've built a small site for a local trader; it is not a complicated site in any way. In fact it is 100% static. I have a few years' experience and in that time have never encountered an error with ...
1 vote
1 answer
31 views

Only allow crawling of pages in sitemap

My idea is to use robots.txt to disallow everything and put all allowed pages into the sitemap. User-agent: * Disallow: / Sitemap: https://www.example.com/sitemap.xml Will this work as expected?
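
One caveat: a sitemap does not override robots.txt, so with Disallow: / a compliant crawler will not fetch the sitemap-listed pages either (Google may still index blocked URLs it discovers through links, just without their content). If the goal is to have only one known section crawled, an Allow rule alongside the blanket block comes closer; the /public/ prefix here is hypothetical:

    User-agent: *
    # Hypothetical: open up one section, block everything else
    Allow: /public/
    Disallow: /

    Sitemap: https://www.example.com/sitemap.xml
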
3 votes
3 answers
44 views

Why would a human read robots.txt

As a part of our new website I've written code in a page filter to do some basic bot / crawler detection by looking at the user agent. That way when a bot is detected we serve up some generic user ...
0 votes
1 answer
149 views

My website's image does not work with Twitter cards (unclear issues with robots.txt)

Problem: I am trying to properly set up metadata for a Twitter summary card on my website. I have everything in place, but the card image is not showing. I use Grav, but I guess that does not matter. ...
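
For what it is worth, Twitter's card crawler identifies itself as Twitterbot and respects robots.txt, so the page and the card image URL must both be crawlable for that user agent. A minimal sketch that lets Twitterbot fetch everything even if other bots are restricted:

    # Empty Disallow: Twitterbot may fetch the page and the card image
    User-agent: Twitterbot
    Disallow:
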
3 votes
1 answer
74 views

How to allow bots only on the main page

I'm trying to create a robots.txt file that allows bots to access ONLY the home page and no other page. Below is the content of the robots.txt file based on the research I have done. Is this correct? ...
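
For comparison, the pattern usually suggested for this combines the Allow directive with the $ end-of-URL anchor, both of which are extensions honoured by Google and Bing rather than part of the original standard. A sketch:

    User-agent: *
    # Allow exactly the homepage ...
    Allow: /$
    # ... and block everything else
    Disallow: /
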
8 votes
1 answer
699 views

Are robots.txt rules "starts with" or "contains" rules? (clarification of confusing Google documentation)

Searching for robots.txt pattern-matching rules (including the unofficial Google rules), I found this Google dev page where it says the /fish rule matches any path that starts with /fish (...) it doesn'...
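
In short, robots.txt path rules are prefix ("starts with") rules, not "contains" rules, unless a * wildcard is used. A small sketch of what a /fish rule does and does not match:

    User-agent: *
    # Prefix match: blocks /fish, /fishing and /fish/salmon.html, but not /catfish
    Disallow: /fish
    # A leading wildcard would be needed for a "contains"-style match:
    # Disallow: /*fish
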
4 votes
1 answer
83 views

Is there a way to allow ad bots to crawl a website that has "noindex,nofollow" on it?

Due to duplicated content, there is a "noindex,nofollow" on the website, but in order to run PPC ads, there needs to be a way for the ad bots to crawl the website. Can this be done without ...
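
One relevant detail: the noindex,nofollow meta tag does not block crawling, and Google's ads crawler (AdsBot-Google) ignores the generic User-agent: * group in robots.txt, obeying only groups that name it explicitly. So the landing pages stay crawlable for the ads bot unless robots.txt singles it out; a sketch that makes this explicit:

    # Explicitly allow Google's ads crawler to fetch everything
    User-agent: AdsBot-Google
    Disallow:
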
2 votes
2 answers
89 views

Should I exclude the assets folder from search engine crawlers in Angular?

I'm working with Angular and had added the assets folder to my robots.txt file as a path to exclude from search engine crawlers. But I now get the following error while testing the site health ...
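
A caution that may explain the health warning: blocking /assets/ also blocks the images, fonts and JSON files that live there, and Google flags pages whose rendering resources are disallowed. A sketch of the rule in question, assuming the default Angular assets path:

    User-agent: *
    # Blocks everything under /assets/, including resources crawlers may need to render pages
    Disallow: /assets/
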
0 votes
0 answers
11 views

Sitemap in robots.txt is not appearing in Google Search Console, no URLs detected in sitemap [duplicate]

At the bottom of robots.txt I have Sitemap: https://domain2.com/sitemap.xml. In the Google Search Console for my property at domain1 I see a false HTML sitemap with errors, no trace of my actual ...
1 vote
0 answers
28 views

Disallow crawling of all URLs on a domain, except for specific URLs with parameters

I have a domain like example.com that serves an SPA, and everything under it is blocked. We've disallowed everything like so: User-agent: * Disallow: / Now this SPA uses parameters ...
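
Google and Bing support * wildcards and the Allow directive, and the most specific (longest) matching rule wins, so selected parameterised URLs can be opened up while everything else stays blocked. A sketch in which the page parameter name is purely hypothetical:

    User-agent: *
    # Hypothetical: allow URLs whose query string begins with page=
    Allow: /*?page=
    Disallow: /
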
