Hello guys, it’s me, Park. Today I want to follow up on my last article and talk more about the differences between Google and Baidu.
Today’s topic is Baiduspider and Googlebot, and I want to discuss it in two sections. The first is IP.
As we all know, every crawler has its own IPs. For example, Googlebot’s IPs normally start with “66.xxx.xxx.xxx”.
With Baidu, it’s different. Its IPs usually start with “123”, “220”, or “111”. On some special occasions, they may start with “181” or “180”.
In Baidu, different IP ranges mean different functions.
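To make that mapping concrete, here is a minimal sketch that labels an IP by its first number. The function name `classify_baidu_ip` is hypothetical, the prefixes are only the ones mentioned in this article (not an official Baidu list), and the labels are the nicknames used in the sections below. Matching the first number alone does not prove a visitor is really Baidu, so treat the result as a hint and check the user agent too.

```python
# Prefixes observed in this article; NOT an official Baidu list.
BAIDU_PREFIX_ROLES = {
    "123": "scanner",
    "220": "vertical crawler",
    "111": "render",
}
# Prefixes the article says appear only on special occasions.
BAIDU_SPECIAL_PREFIXES = {"180", "181"}


def classify_baidu_ip(ip: str) -> str:
    """Return a rough role label based on the first octet of an IPv4 address."""
    first_octet = ip.split(".", 1)[0]
    if first_octet in BAIDU_PREFIX_ROLES:
        return BAIDU_PREFIX_ROLES[first_octet]
    if first_octet in BAIDU_SPECIAL_PREFIXES:
        return "special"
    return "unknown"


if __name__ == "__main__":
    # Placeholder addresses, used only to exercise each branch.
    for ip in ("123.0.0.1", "220.0.0.1", "111.0.0.1", "180.0.0.1", "66.0.0.1"):
        print(ip, "->", classify_baidu_ip(ip))
```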
Baidu IP “123.xxx.xxx.xxx” Spider: I like to call it the scanner
Why scanner? Because most of its job is scanning the whole website and putting all the new URLs into a crawling list. It then passes the list to the “220” spiders.
My website database (I have most of my followers’ website data, just for data analysis) shows, via server logs, that the scanner fetches less data than the “220” spiders. As a matter of fact, a “220” spider will come to your website right after a scanner has crawled it.
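If you want to check this pattern on your own site, here is a rough sketch. It assumes your server writes Apache/Nginx "combined" style log lines, and it uses the response size as a stand-in for "how much data the spider got". The names `parse_line` and `summarize` are hypothetical, and this is not the exact analysis behind my numbers.

```python
import re
from datetime import datetime

# Assumed format: IP - - [time] "request" status bytes ...
LOG_RE = re.compile(
    r'^(\d+\.\d+\.\d+\.\d+) \S+ \S+ \[([^\]]+)\] "[^"]*" \d{3} (\d+|-)'
)


def parse_line(line):
    """Return (ip, timestamp, bytes_sent), or None if the line does not match."""
    m = LOG_RE.match(line)
    if not m:
        return None
    ip, ts, size = m.groups()
    when = datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z")
    return ip, when, 0 if size == "-" else int(size)


def summarize(lines):
    """Sum bytes per prefix and measure how soon a "220" hit follows a "123" hit."""
    bytes_by_prefix = {"123": 0, "220": 0}
    follow_up_delays = []  # seconds between a scanner hit and the next "220" hit
    last_scanner_visit = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ip, when, size = parsed
        prefix = ip.split(".", 1)[0]
        if prefix not in bytes_by_prefix:
            continue  # not one of the two spider types compared here
        bytes_by_prefix[prefix] += size
        if prefix == "123":
            last_scanner_visit = when
        elif last_scanner_visit is not None:
            delay = (when - last_scanner_visit).total_seconds()
            follow_up_delays.append(delay)
            last_scanner_visit = None
    return bytes_by_prefix, follow_up_delays
```

Feed it the lines of your access log in time order. Short delays in the second result would match the "220 comes right after the scanner" pattern.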
Baidu IP “220.xxx.xxx.xxx” Spider: Vertical Crawler
Vertical, yes, you heard me right. It means that the “220” spiders go much deeper than scanners. They fetch the full page content, including words, phrases, image code, etc. Before a page is indexed by Baidu, we always find that the Vertical Crawler has already visited that exact URL.
Baidu IP “111.xxx.xxx.xxx” Spider: Render
Render Spider (the official name, not one I made up) only crawls js/css/font files, which affect user experience directly. It’s important: make sure the crawling path is clear (robots.txt) and that your server doesn’t block the Render spiders’ IPs.
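A quick way to check the robots.txt part is Python's standard `urllib.robotparser`. The sketch below is only an illustration: the helper name `blocked_assets`, the example URLs, and the `Baiduspider` user-agent token are my assumptions, and the standard parser only does simple prefix matching, so wildcard rules may behave differently from how Baidu reads them.

```python
from urllib.robotparser import RobotFileParser


def blocked_assets(robots_txt, asset_urls, user_agent="Baiduspider"):
    """Return the asset URLs that robots.txt disallows for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in asset_urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # Hypothetical robots.txt that accidentally blocks the JS folder.
    robots_txt = "User-agent: *\nDisallow: /static/js/\n"
    assets = [
        "https://example.com/static/js/app.js",
        "https://example.com/static/css/site.css",
    ]
    for url in blocked_assets(robots_txt, assets):
        print("Blocked for the render spider:", url)
```

If the list comes back non-empty, the render spider probably can't fetch those files, so loosen the matching Disallow rule. This only covers robots.txt; IP blocks on your server or firewall have to be checked separately.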
I’m not sure I explained the 3 types of spiders clearly. I hope you guys can give me some feedback in the comment section, thanks.