What is a spider trap?
A spambot is an automated program that attackers use to send spam, whether as spam emails or as automated posts to forums and social networking sites.
Spambots also crawl websites for malicious purposes and needlessly waste a website's bandwidth. Websites therefore use spider traps as a countermeasure against such spambots.
How does a spider trap work?
Spambots typically request many webpages from the same web server within a short duration. To counter this behavior, a spider trap catches a spambot and makes it crawl in an infinite loop.
Several common techniques are used to make spambots run in an infinite loop. To name a few:
- Sometimes, a cyclic directory structure is used, for example /path/to/directory/again/path/to/directory. A spambot that starts crawling such a website keeps following the cycle and ends up in an infinite loop.
- Some websites serve an unbounded number of dynamically generated pages, such as algorithmically generated poetry or a calendar that extends indefinitely; a sketch of this idea appears after this list.
- Webpages are sometimes filled with an enormous number of characters so that a lexical analyzer that tries to parse them ends up crashing.
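The unbounded-page idea can be sketched in a few lines of server code. The snippet below is a minimal illustration rather than a production trap: it assumes a hypothetical /trap/ path prefix and uses only Python's standard http.server, with every trap page linking to a freshly generated, deeper URL, so an impolite crawler that follows every link never reaches an end.

```python
# Minimal spider-trap sketch using only the Python standard library.
# Assumption: the trap lives under a hypothetical /trap/ path prefix.
from http.server import BaseHTTPRequestHandler, HTTPServer

class TrapHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/trap/"):
            # Count how deep the crawler already is and link one level deeper,
            # so there is always another "new" page to fetch.
            depth = self.path.rstrip("/").count("/")
            body = (f"<html><body><p>Trap page at depth {depth}.</p>"
                    f'<a href="{self.path.rstrip("/")}/{depth + 1}/">next</a>'
                    f"</body></html>")
        else:
            # Ordinary pages contain one link that leads into the trap.
            body = ('<html><body><p>Welcome.</p>'
                    '<a href="/trap/1/">more</a></body></html>')
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), TrapHandler).serve_forever()
```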
Disadvantages of using spider traps
Not all web crawlers are spambots. Polite web crawlers often crawl websites for indexing purposes, so a website cannot simply trap every crawler it encounters; it needs to differentiate between spambots and legitimate web crawlers.
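One simple way to make that distinction is request rate: only clients that send many requests to the server within a short window are treated as spambots and routed into the trap. The sketch below is an illustration under assumptions of my own; the 30-second window, the 20-request threshold, and the function name are made up for this example.

```python
# Rate-based heuristic for deciding which clients to send into the trap.
# The window length and threshold below are illustrative values only.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 30
MAX_REQUESTS_PER_WINDOW = 20

_recent_requests = defaultdict(deque)  # client IP -> timestamps of recent requests

def looks_like_spambot(client_ip: str) -> bool:
    """Return True if this client exceeded the request-rate threshold."""
    now = time.time()
    history = _recent_requests[client_ip]
    history.append(now)
    # Drop timestamps that have fallen out of the sliding window.
    while history and now - history[0] > WINDOW_SECONDS:
        history.popleft()
    return len(history) > MAX_REQUESTS_PER_WINDOW
```

A server could call looks_like_spambot() on every incoming request and serve trap pages only to clients that trip the threshold, leaving slower, polite crawlers unaffected.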
How to prevent legitimate web crawlers from falling into a spider trap?
Polite web crawlers alternate their requests between different hosts and do not request webpages from the same server more than once within a short time frame, so spider traps usually do not affect them much. Moreover, a website that uses spider traps can publish a robots.txt file containing enough information for legitimate web crawlers to avoid the trap.
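For instance, the site could disallow the trap directory in robots.txt, and a well-behaved crawler would check that file before fetching any URL. The snippet below illustrates this with Python's standard urllib.robotparser; the robots.txt content and the /trap/ path are the same hypothetical names used earlier, not anything prescribed by the standard.

```python
# How a polite crawler avoids a spider trap by honoring robots.txt.
# The robots.txt content and the /trap/ path are illustrative assumptions.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler checks each URL before fetching it.
print(parser.can_fetch("MyCrawler", "https://example.com/index.html"))  # True
print(parser.can_fetch("MyCrawler", "https://example.com/trap/1/"))     # False
```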