English Amiga Board

English Amiga Board (https://eab.abime.net/index.php)
-   project.EAB (https://eab.abime.net/forumdisplay.php?f=14)
-   -   Spiders / web crawlers (https://eab.abime.net/showthread.php?t=95619)

DamienD 27 December 2018 00:30

Spiders / web crawlers
 
This is really a question for RCK...

Often, when looking at Currently Active Users, I notice that there are many spiders / web crawlers constantly on EAB, each with multiple instances :sad

For example: tonight "BoardReader Spider" had 21 connections into my collection thread alone.

I hope this doesn't slow things down on EAB or cause extra bandwidth costs?

Maybe it's not even possible, but can spiders / web crawlers be limited to, say, 3 connections maximum?
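For what it's worth, a cap like that is usually enforced at the web-server level. Here is a minimal sketch assuming an nginx front end (the thread doesn't say what EAB actually runs, and the backend name is made up for illustration); nginx's `limit_conn` module can cap simultaneous connections per client address:

```nginx
http {
    # 10 MB shared zone keyed on the client IP address
    limit_conn_zone $binary_remote_addr zone=peraddr:10m;

    server {
        listen 80;
        server_name eab.abime.net;

        location / {
            limit_conn peraddr 3;          # allow at most 3 concurrent connections per IP
            limit_conn_status 503;         # further requests get 503 instead of the default 503/close
            proxy_pass http://vbulletin_backend;   # hypothetical upstream name
        }
    }
}
```

Note this limits per IP, so a crawler spread across many IPs (as described later in this thread) would need a User-Agent-based rule instead.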

DamienD 27 December 2018 14:42

Today, just now there are 28 BoardReader spiders on EAB :sad

Here's an overview from their website:

Quote:

Overview

BoardReader was developed to address the shortcomings of current search engine technology to accurately find and display information contained on the Web's forums and message boards. Founded in May 2000 by engineers and students from the University of Michigan, Boardreader uses proprietary software that allows users to search multiple message boards simultaneously, allowing users to share information in a truly global sense.


Our Focus

Boardreader is focused on creating the largest repository of searchable information for our users. We also strive to increase traffic and exposure for the individual forum and message board.

By creating an interface that corresponds with multiple boards simultaneously, users can find answers to their questions from others who share similar interests. Our goal is to allow our users to search the "human to human" discussions that exist on the Internet.


Our Technology

Our technology is based on innovative, scalable software created to quickly and accurately search information contained on message boards. Message boards are part of the Internet known as the 'Invisible Web' and pose many problems to traditional search engine spiders. The dynamic content is usually very deep and hard to search. In addition, many of these sites change their locations, servers, or URLs almost daily, presenting special searching challenges.

Special retrieval and indexing algorithms as well as unique topic relevance ordering rules are but a few parts of what is needed to allow you to view what we affectionately call the 'human experience'. We sincerely appreciate your comments and hope you enjoy our service.
Here's some more info about BoardReader and blocking them:

... http://www.webhostingtalk.com/showthread.php?t=1176901
... https://community.centminmod.com/thr...in-robots.550/

I love this comment from someone:

Quote:

Blocking them is trivial. The most irritating part about this is the crawl rate -- 20-50 requests per second from multiple IPs? was the bot designed by a complete tool or is this how they intended it?
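The linked threads describe the two usual ways of blocking a bot like this: a robots.txt rule (honoured only by well-behaved crawlers) and server-side enforcement for crawlers that ignore robots.txt. A hedged sketch, again assuming an nginx front end; the "BoardReader" substring matches what this thread reports, but the exact User-Agent string should be verified against real access logs:

```nginx
# robots.txt entry (only helps if the crawler actually obeys it):
#
#   User-agent: BoardReader
#   Disallow: /
#
# Server-side enforcement (goes in the http{} context):
map $http_user_agent $blocked_ua {
    default          0;
    "~*BoardReader"  1;   # case-insensitive substring match; confirm against logs
}

server {
    listen 80;
    server_name eab.abime.net;

    if ($blocked_ua) {
        return 403;       # refuse the crawler's requests outright
    }
}
```

A `map` plus `return 403` keeps the block cheap: the request is rejected before it ever reaches the forum software or the database.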

RCK 28 December 2018 16:22

Hello Damien,

If those spiders are making too many simultaneous connections and impacting performance, then yes, I will limit their parallel connections. But for now the new server is fine and doesn't go over 15% CPU, so it's not a problem :)
Bandwidth is fine too.

DamienD 28 December 2018 16:57

No problem RCK; just thought I'd mention ;)

DamienD 06 January 2019 16:55

I don't know why, but this spider loves to read my thread... currently 21 connections:


http://i67.tinypic.com/34oq7ts.jpg



Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2024, vBulletin Solutions Inc.
