cyveillance.com from 63.148.99.229 - bad, bad spider
There are rules and recommendations out there on how to conduct business on the web. If you’re a spider, automatically downloading content, analyzing it and using it for YOUR business, you had better behave like a good spider, and not like the one from cyveillance.com.
Cyveillance.com (see here: http://www.cyveillance.com/) came by this morning and downloaded all the textual content from my server. And it did that in a way that I don’t like:
- It did not identify itself as a spider in the User-Agent header
- It did not access the robots.txt file to check whether it was ok to spider my site (and thus stepped into areas a spider should not have stepped into)
- It downloaded the whole lot in just a few minutes without giving my server time to breathe and other users a slice of the bandwidth
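
For contrast, here is a minimal sketch of what the three points above could look like from the spider's side, using Python's standard `urllib.robotparser`. The user-agent string, the sample robots.txt and the default delay are made-up values for illustration, and `Crawl-delay` is a widely used but non-standard robots.txt extension, so treat this as one possible approach rather than a reference implementation.

```python
from urllib.request import Request
from urllib.robotparser import RobotFileParser

# Hypothetical values for illustration; none of them come from a real spider.
USER_AGENT = "ExampleSpider/1.0 (+http://example.com/spider.html)"
DEFAULT_DELAY = 10.0  # seconds to wait between requests if robots.txt sets none

# A sample robots.txt as a site owner might publish it (assumed content).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 30
"""


def load_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def plan_fetch(parser, url):
    """Return the number of seconds to wait before fetching url, or None if disallowed."""
    if not parser.can_fetch(USER_AGENT, url):
        return None
    delay = parser.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else DEFAULT_DELAY


def build_request(url):
    """Build (but do not send) a request that identifies the spider honestly."""
    return Request(url, headers={"User-Agent": USER_AGENT})


rules = load_rules(ROBOTS_TXT)
for url in ("http://example.com/blog/index.html",
            "http://example.com/private/secret.html"):
    print(url, "->", plan_fetch(rules, url))
```

A real spider would download `/robots.txt` from the site first (for example with `RobotFileParser.set_url()` and `read()`) and then sleep for the planned delay between requests.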
So, all in all they violated every single rule of good spider-behaviour. Here are the first few lines from the access-log file (in reverse chronological order):
| host | time | request | status | bytes | useragent | referer |
|---|---|---|---|---|---|---|
| | 04:13:31 | GET /blog/archives/000090.html HTTP/1.0 | 200 | 11570 | Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0) | - |
| | 04:13:26 | GET /copyright.html HTTP/1.0 | 200 | 4870 | Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0) | - |
| | 04:13:21 | GET /spam.html HTTP/1.0 | 200 | 5108 | Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0) | - |
| | 04:13:16 | GET /trips.html HTTP/1.0 | 200 | 5225 | Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0) | - |
| | 04:13:12 | GET /photo.html HTTP/1.0 | 200 | 5225 | Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0) | - |
| | 04:13:07 | GET /blog/index.html HTTP/1.0 | 200 | 52823 | Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0) | - |
| | 04:13:03 | GET / HTTP/1.0 | 200 | 943 | Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0) | - |
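
To put a number on the pacing, the gaps between the requests in the excerpt above can be computed with a few lines of Python. This is only a sketch: the timestamps are copied by hand from the log lines, the function name is my own, and it assumes all requests fall on the same day (no wrap-around at midnight).

```python
from datetime import datetime

# Request times copied from the log excerpt above (newest first).
TIMES = ["04:13:31", "04:13:26", "04:13:21", "04:13:16",
         "04:13:12", "04:13:07", "04:13:03"]


def request_gaps(times):
    """Return the seconds between consecutive requests, oldest gap first."""
    stamps = sorted(datetime.strptime(t, "%H:%M:%S") for t in times)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


gaps = request_gaps(TIMES)
print("gaps in seconds:", gaps)
print("average gap:", sum(gaps) / len(gaps))
```

In this excerpt each page is requested four to five seconds after the previous one.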
I contacted them this morning and asked them to explain their spidering behaviour. Until I get a response, I’ll block access from 63.148.99.229. And if I don’t hear from them, I guess we can block the whole subnet 63.148.99.224/27. I’ll update this entry as I get more information.
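
For anyone wondering what the difference between blocking the single address and blocking the /27 means in practice, here is a small sketch using Python's standard `ipaddress` module. The function name and the `whole_subnet` switch are made up for illustration; an actual block would normally live in the web server or firewall configuration rather than in a script like this.

```python
import ipaddress

# The address and subnet named in the post.
BLOCKED_HOST = ipaddress.ip_address("63.148.99.229")
BLOCKED_NET = ipaddress.ip_network("63.148.99.224/27")


def is_blocked(client_ip, whole_subnet=False):
    """Check a client address against the single host or the whole /27."""
    addr = ipaddress.ip_address(client_ip)
    if whole_subnet:
        return addr in BLOCKED_NET
    return addr == BLOCKED_HOST


# The /27 covers 32 addresses, from 63.148.99.224 up to 63.148.99.255.
print(BLOCKED_NET.num_addresses, BLOCKED_NET[0], BLOCKED_NET[-1])
print(is_blocked("63.148.99.240"))                     # single-host rule only
print(is_blocked("63.148.99.240", whole_subnet=True))  # subnet rule
```

With the single-host rule only 63.148.99.229 is turned away; with the subnet rule, any of its 31 neighbours in the same block is rejected as well.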
Uh, this spider thingamajiggy came by my site also, but I don’t quite understand why or how, seeing as I just “released” my site without any meta tags. Anyway, all in all they bookmarked 105 pages. I don’t know what spiders are or what they do and why they do it, but I think it would be wise if I did the same as you and banned that IP. Thank you for posting this. Also, could you contact me with any updates you get or info on this matter? Thank you so much.
Laura
I see this thread is a bit aged but I’m curious as to whether or not you ever got an answer out of cyveillance.com.
I noticed them in my access log this evening, and when I went back through this year’s stats, I found they’ve been to my site 5 times.
I’ve asked them to state the purpose of their visits, and until they do so, their domain and IP block will remain on the “no christmas card” list.