Great! Thought I would pass on that I have traces of the following:

Google WebCache (crawler-detect issue, closed)

jaybizzle commented on July 26, 2024

Google WebCache

from crawler-detect.

Comments (4)

JayBizzle commented on July 26, 2024

Hi,

Thanks for the info.

How do you know it is a bot, and what user-agent is it supplying? This script (currently) only works off the provided user-agent.

Thanks


timint commented on July 26, 2024

Thanks for replying. I have looked into this further, and it seems the user-agent is pretty much that of a regular browser.

Here are some user-agents, all from hosts under the *.bc.googleusercontent.com domain:

Mozilla/5.0 (Linux; Android 6.0.1; GCE x86 phone Build/MMB29W.MZC79) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2694.0 S
Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2672.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2690.0 Safari/537.36
Mozilla/5.0 (X11; CrOS x86_64 14.4.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2697.0 Safari/537.36

It's possible these are regular users on some sort of proxy tunnel through Google.
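To illustrate why a user-agent-only check cannot flag these requests, here is a minimal Python sketch. The pattern list is a deliberately tiny, hypothetical stand-in and is not the list crawler-detect actually uses; the truncated Android string above is left out.

```python
import re

# Hypothetical, deliberately tiny pattern list for illustration only.
# It is NOT the pattern list that crawler-detect ships with.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def looks_like_bot(user_agent: str) -> bool:
    """Return True if the user-agent contains one of the bot keywords."""
    return BOT_PATTERN.search(user_agent) is not None


# Complete user-agent strings quoted in the comment above.
SAMPLE_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/51.0.2672.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/51.0.2690.0 Safari/537.36",
    "Mozilla/5.0 (X11; CrOS x86_64 14.4.0) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/51.0.2697.0 Safari/537.36",
]

if __name__ == "__main__":
    for agent in SAMPLE_AGENTS:
        print(looks_like_bot(agent), agent)
```

None of the sample strings contains a bot keyword, so a check of this kind treats them as ordinary browsers regardless of where the request came from.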

I also found these, with similar user agent strings:

google-proxy-66-249-93-101.google.com
google-proxy-66-102-6-75.google.com
google-proxy-66-102-6-78.google.com

My mistake for registering a feature request if this is invalid. I think I need to look into this further. It never occurred to me that Google could tunnel users.
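The two hostname shapes mentioned in this comment can be told apart with plain string handling. The following Python sketch assumes the `google-proxy-a-b-c-d.google.com` names embed an IPv4 address, which is inferred only from the three examples above; the function names are hypothetical.

```python
import ipaddress
import re
from typing import Optional

# Assumed shape, inferred from the three hostnames quoted above:
# google-proxy-<a>-<b>-<c>-<d>.google.com
PROXY_HOST = re.compile(
    r"^google-proxy-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.google\.com$"
)


def proxy_ip_from_hostname(hostname: str) -> Optional[str]:
    """Extract the IPv4 address embedded in a google-proxy hostname.

    Returns None if the hostname does not have the assumed shape or the
    four numbers do not form a valid IPv4 address.
    """
    match = PROXY_HOST.match(hostname.lower().rstrip("."))
    if match is None:
        return None
    candidate = ".".join(match.groups())
    try:
        return str(ipaddress.IPv4Address(candidate))
    except ipaddress.AddressValueError:
        return None


def is_google_user_content(hostname: str) -> bool:
    """True for hosts under the *.bc.googleusercontent.com domain."""
    return hostname.lower().rstrip(".").endswith(".bc.googleusercontent.com")


if __name__ == "__main__":
    for host in (
        "google-proxy-66-249-93-101.google.com",
        "google-proxy-66-102-6-75.google.com",
        "google-proxy-66-102-6-78.google.com",
    ):
        print(host, "->", proxy_ip_from_hostname(host))
```

This only classifies a hostname that has already been resolved; it says nothing about whether the traffic behind it is a bot or a real visitor.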


timint commented on July 26, 2024

Some forum posts suggest the Google Proxy traffic comes from Google's Data Compression service:
https://developer.chrome.com/multidevice/data-compression

And some believe the Google User Content hosts are some sort of CDN or image cache.

I still have no idea if these should be considered bots or real visitors. :(


JayBizzle commented on July 26, 2024

Thanks for looking into this further.

I think for now it would be too risky to block. Also, it is out of the remit of this library, as we currently rely solely on the user-agent.

Expanding the library to do any kind of reverse DNS lookup would, I feel, slow things down too much.

I'm closing this for now, but feel free to add any additional info as you find it.

Thanks again! ❤️
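For readers wondering what the reverse DNS lookup discussed above would involve, here is a rough Python sketch of a forward-confirmed reverse DNS check. It is not part of crawler-detect; the function names and the suffix list are assumptions made for illustration, and the resolver functions are injectable so the logic can be exercised without network access.

```python
import socket
from typing import Callable, List, Optional, Sequence


def reverse_lookup(ip: str) -> str:
    """Resolve an IP address to its primary hostname (PTR lookup)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Resolve a hostname to its IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def confirmed_hostname(
    ip: str,
    allowed_suffixes: Sequence[str],
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> Optional[str]:
    """Return the hostname for `ip` if it passes both checks, else None.

    1. The reverse lookup must yield a name ending in an allowed suffix.
    2. The forward lookup of that name must include the original IP.
    """
    try:
        hostname = reverse(ip).lower().rstrip(".")
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return None
    if not hostname.endswith(tuple(allowed_suffixes)):
        return None
    try:
        addresses = forward(hostname)
    except OSError:
        return None
    return hostname if ip in addresses else None


if __name__ == "__main__":
    # Performs real DNS queries; the suffix list is an illustrative assumption.
    print(confirmed_hostname("66.249.93.101", (".google.com",)))
```

Each uncached check costs two DNS queries (one reverse, one forward) per request, which is the overhead the closing comment is concerned about; a real integration would likely need caching and timeouts.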

