7
Feb 11
The *Biggest* Scraper In the World Revealed
There is a company that has been doing content scraping on a truly unimaginable scale, to the point that it can be regarded as a completely different business model from most others that do this. Like many of the sites you’ve seen out there, it spiders the web and copies other people’s content; subsections of that content are then “mashed” together and presented to end users, essentially as auto-generated web pages.
The difference between this company and the myriad other scrapers out there lies in the “mashing up” process: rather than mashing together random pieces of content, it assembles only pieces that are truly relevant to each other. This is very difficult Ph.D.-level stuff, and it is the key to the whole process. As a result, the mash-ups it presents look very natural, the content pieces are related to each other, and these auto-generated web pages can (shockingly) sometimes even be useful to the end user. Another neat trick: this company has an automated system that creates these mash-ups based on what end users are demanding, which is a great systematic way to create profitable web pages.
The really onerous thing, though, is that in many cases this company also hosts copies of the original content the mash-ups are based on, so end users can access it without ever leaving the company’s website. Pretty over-the-top; you’d think that sort of behavior would attract lawsuits!
This approach has been so successful that this company has even branched out into scraping content that ISN’T EVEN AVAILABLE ELECTRONICALLY. I know this sounds crazy, but it’s actually scanning in offline content and adding it to its database.
Needless to say, this company is making huge amounts of money from advertising, thanks to the enormous traffic that all this auto-scraped content attracts.
The name of the company is “Google”, and this completely different business model is called a “search engine”.
*rolls his eyes*