Questions tagged [web-crawlers]

A computer program that accesses web pages for various purposes (to scrape content, to provide search engines with information about your site, etc.)

2 votes
1 answer
114 views

Will redirecting to a mailto URL prevent spam bots from harvesting the email address?

On https://www.heartinternet.uk/blog/15-ways-to-hide-your-email-address/, under the subheading "Replacing with a PHP script", a method to hide email addresses is presented that I like very much. On the ...
0 votes
1 answer
41 views

Do web crawlers visit multiple subpages with no internal links?

I run a small business. For the sake of discussion, let's say I have 4 pages: Page A (Home), Page B, Page C, Page D. Important: NONE of these pages have internal links to each other. Imagine just 4 tabs at the top ...
1 vote
0 answers
15 views

Exposing content from client-side headless CMS API to web crawlers

This is something I've been struggling to wrap my head around for the past few days. I tried to keep the rambling to a minimum, but concise questions are at the bottom. As I understand it, utilizing JS ...
1 vote
1 answer
22 views

Only allow crawling of pages in sitemap

My idea is to use robots.txt to disallow everything and put all allowed pages into the sitemaps. User-agent: * Disallow: / Sitemap: https://www.example.com/sitemap.xml Does this work as expected?
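A minimal sketch of how a standards-following parser reads the rules quoted in this question, using Python's standard-library `urllib.robotparser`. The page URL `page-in-sitemap` is a hypothetical example, and real crawlers may differ in details. It suggests that a `Sitemap:` line only advertises URLs, while `Disallow: /` still blocks fetching them.

```python
from urllib.robotparser import RobotFileParser

# The rules from the question, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
Sitemap: https://www.example.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical page that would be listed in the sitemap.
listed_page = "https://www.example.com/page-in-sitemap"

# Whether a crawler obeying these rules may fetch the listed page.
print(parser.can_fetch("*", listed_page))

# The sitemap URLs the parser found (requires Python 3.8+).
print(parser.site_maps())
```

Under this parser, the sitemap is discovered but the listed page is still reported as not fetchable, so the setup would not act as an allow-list.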
0 votes
0 answers
11 views

Do Baidu and/or Yandex publish the current IP ranges used by their site indexing bots?

I need to identify bots operated by search engine operators to index sites. I've looked at an existing question on this matter and the answers given there are too simplistic. The User-Agent header is ...
1 vote
1 answer
33 views

Google Indexing Tons of Non-Existent URLs

A client that I am working for has come to me with a problem. Their Google Adwords account has been disabled because there is a bunch of traffic to crazy non-existent pages on their website. Here are ...
0 votes
1 answer
28 views

Ratio of real visitors to search engine views

I know it is highly website-dependent, but are there any statistics, studies, or reports on the ratio of real visitors a website gets to visits made by search engine crawlers? Services like Alexa reports ...
8 votes
1 answer
698 views

Are 'robots.txt' rules "starts with" or "contains" rules (clarification on confusing Google documentation)

Searching for robots.txt pattern matching rules (including the non-official Google rules), I found this Google dev page where it says the /fish rule matches any path that starts with /fish (...) it doesn'...
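A small sketch of the "starts with" reading, again using Python's standard-library `urllib.robotparser` rather than Google's own parser, so it only illustrates plain prefix matching. The host `www.example.com`, the helper `is_blocked`, and the sample paths are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# A single rule group with the /fish rule discussed in the question.
parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /fish"])


def is_blocked(path: str) -> bool:
    """Return True if the rule set disallows fetching this path."""
    return not parser.can_fetch("*", "https://www.example.com" + path)


# Illustrative paths: some begin with /fish, some merely contain "fish".
sample_paths = [
    "/fish",
    "/fish.html",
    "/fish/salmon.html",
    "/fishheads",
    "/catfish",
    "/Fish.asp",
]

for path in sample_paths:
    print(f"{path}: blocked={is_blocked(path)}")
```

With this parser, paths beginning with `/fish` are blocked, while `/catfish` (contains but does not start with the rule) and `/Fish.asp` (different case) are not.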
0 votes
0 answers
13 views

Google crawling a link that is changed dynamically

I have a question about how Google crawls. I have a pagination link on the site, like this: www.domeinnaam.com/category?page=2 I suppose that Google, when crawling, is going to follow this link. So far ...
0 votes
0 answers
11 views

How to get multiple pages indexed [duplicate]

I have a website with approximately 700k web pages, but only 2k of them are indexed. I have tried increasing the crawl limit to 2 requests per second because that is what Google Search Console allows for ...
1 vote
2 answers
79 views

Can a Web Crawler follow links from <Link> tag in React?

I am building a frontend website and I am trying to make sure that the website can be crawled by search engines so that it can appear in search results. Currently, most of my links look like this in ...
2 votes
1 answer
43 views

How to use a subdomain for a custom CMS of the main domain website?

Should I use a subdomain for a custom CMS of the main domain website? Like cms.domain.com. As it will be a CMS, I will disallow search crawlers from indexing it. Will this hurt SEO for the main domain?
4 votes
1 answer
67 views

Is there a way to allow ad bots to crawl a website that has "noindex,nofollow" on it?

Due to duplicated content, there is a "noindex,nofollow" on the website, but in order to run PPC ads, there needs to be a way for the ad bots to crawl the website. Can this be done without ...
2 votes
2 answers
42 views

Should I exclude the assets folder from search engine crawlers in Angular?

I'm working with Angular and added the assets folder to my robots.txt file as an element to exclude from search engine crawlers. But I now get the following error while testing the site health ...
1 vote
0 answers
22 views

Disallow crawling of all links to a domain, except only specific URLs with parameters

I have a domain like example.com, under which everything is blocked and which links to an SPA. We've disabled everything like so: User-agent: * Disallow: / Now this SPA uses parameters ...