A Russian website, AnalyzeThis, analyzed the results of the world's most popular search engines, looking at metrics such as how much spam appears in the search results and how well the engines filter out adult content. Among other things, the website examined how well the search engines distinguish original content from its duplicates across the Internet. Given how important fresh and unique content is for SEO, it is understandable why this issue is of great interest to every website owner and SEO expert aiming for higher rankings.
Since the content they create and post online will be copied to other web pages, legitimately or otherwise, often within a few days or even hours of first being published, it is essential for content creators that the search engines do a reasonable job of favoring the original authors. However, much to our surprise, the results of this analysis were, simply put, less than inspiring. According to the survey done at the end of 2011, Google, the most popular search engine in the world, gets it right only about 57% of the time, and even this is a significant improvement over the year before, when the figure was under 10%. What's even worse, Google is the best of all. For comparison, Bing is hovering at about 7%.
We may need to take these numbers with a dose of skepticism, since the website does not provide detailed information about how it obtains and processes this data. But even if the tests aren't perfect, they still point to the possibility that Google isn't either. Even if we question the techniques AnalyzeThis uses to process such complex data, with so many content creators and so many different works on the Internet, a large share of original content can still be mislabeled by Google as duplicate.
What is Google's Problem?
Google wants to provide its users with a variety of relevant search results to choose from, not with links that all have the same content. If many pages on the Internet have the same content, Google has to decide which one is the original and which ones are the duplicates. The way Google and other search engines make this decision leaves room for the mistakes that spammers are counting on.
To assess which content is the original, Google uses a variety of factors, such as the website's authority, the number of inbound links, the age of the page and the site, and many other metrics that aren't always accurate. This is why Google makes mistakes from time to time, punishing the wrong websites for duplicate content and lowering their rankings. The recent Panda and Penguin updates are making it harder for spammers to trick the algorithms and manipulate the search results, but some still manage to rank well with duplicate content stolen from other websites, while claiming that they took it from an open source or that it was uploaded by one of their users.
This further highlights how important it is for website owners and content creators to be vigilant and protect their writing, instead of leaving everything for Google to handle. The duplicate content checker PlagSpotter helps you monitor your website's content so that you can detect right away when someone is trying to steal it and take the necessary action.
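For readers curious about what a basic duplicate check might look like, here is a minimal sketch in Python. It is not how Google or PlagSpotter work (neither publishes its method here); it simply compares two texts by their overlapping word sequences ("shingles") and reports the Jaccard similarity, a common textbook approach. The function names, the shingle size of 5 words, the 0.5 threshold, and the example.com URLs are all assumptions chosen for illustration.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word sequences (shingles) found in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Text shorter than k words: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(text_a, text_b, k=5):
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    a, b = shingles(text_a, k), shingles(text_b, k)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def likely_copies(original, candidates, threshold=0.5, k=5):
    """Return (url, score) pairs for candidate pages that resemble the original.

    `candidates` maps a page URL to its extracted text. The threshold is an
    illustrative guess, not a calibrated value.
    """
    scored = [(url, jaccard_similarity(original, text, k))
              for url, text in candidates.items()]
    hits = [(url, score) for url, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    my_article = "Search engines often struggle to tell original content from copies."
    pages = {
        # Hypothetical URLs and texts, for demonstration only.
        "https://example.com/scraped": "Search engines often struggle to tell original content from copies.",
        "https://example.com/other": "A completely unrelated page about cooking pasta at home tonight.",
    }
    for url, score in likely_copies(my_article, pages):
        print(f"{url}: {score:.2f}")
```

A check like this only tells you that two texts overlap. It says nothing about who published first, which is exactly the part search engines find hard.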