Wild Query Strings and Duplicate Content

Recently, Robin Rozhon wrote an interesting post about duplicate content and how small changes can reduce the number of indexed pages on a site, increasing the traffic received by its landing pages and, thus, the revenue the site obtains.

Rozhon’s thesis is that when the same content is duplicated across several indexed URLs, you lose control over the crawling of your site, relying on Google’s judgment and spreading your potential visitors across several URLs that compete with each other for the same content. In the post linked above, the author explains how they cut their indexed URLs by 80% (from 500,000 pages to only 100,000). Before the change, only 8.55% of indexed URLs generated at least one session in a month; after the deindexing, 49.7% of indexed URLs generated organic traffic.

At Safecont, we fully agree: duplicate content is something to avoid. The main sources of duplicate content are query parameters used to generate content views, to power faceted navigation, or to track users. The last one is a really bad idea. There are better ways to track users, but if you are forced to use tracking parameters, you must keep them out of the index, because each pass of the crawler will generate useless URLs that duplicate the main landing pages of your site (a minimal clean-up sketch follows the screenshot below).

Source: https://rozhon.com/blog/crawling-indexing-technical-seo-basics-that-drive-revenue/
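Returning to tracking parameters: to make the goal concrete, here is a minimal Python sketch of the normalisation you are aiming for. It strips a set of tracking parameters from a URL, producing the clean version that should be the one indexed. The parameter list here is a hypothetical example; replace it with the tracking parameters your site actually uses.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical set of tracking parameters; adjust to your own site.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign"}

def clean_url(url: str) -> str:
    """Strip tracking parameters, returning the URL that should be indexed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

print(clean_url("https://example.com/jackets.html?utm_source=mail&page=2"))
# -> https://example.com/jackets.html?page=2
```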

Regarding content views, you should ask yourself: does this parameter change the content seen by the user? If the answer is no, you should avoid indexing it. If the answer is yes, index it only if the change is noticeable. For example, a parameter used to sort a listing by price should not be indexed, since the sorted view shows exactly the same items as the default one.
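For such sort-only views, one common option (a hedged illustration; the URL is hypothetical) is a robots meta tag on the parameterised page, telling crawlers not to index it while still following its links:

```html
<!-- On a sorted view such as /jackets.html?ord=price -->
<meta name="robots" content="noindex, follow">
```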

Facets pose the same problem. If you combine several facets with several filters, you can generate thousands of URLs that add no content to your site. In general, you should use facets only for individual category pages and rely on non-indexable filters to refine a search further. For example, you could generate a landing page for “adidas jackets” but not for “adidas jackets under $200”.
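To see how quickly this explodes, here is a minimal sketch with hypothetical facet counts for a single category page. If each facet can be left unset or set to one of its values, the number of distinct filter combinations, and therefore of crawlable URLs, is the product of (values + 1) across all facets:

```python
from math import prod

# Hypothetical facets on one category page and their number of values.
facet_values = {"brand": 20, "size": 8, "colour": 12, "price_band": 5}

# Each facet is either unset or set to one value, so the number of distinct
# filter combinations (i.e. crawlable URLs) is the product of (n + 1).
urls = prod(n + 1 for n in facet_values.values())
print(urls)  # 21 * 9 * 13 * 6 = 14742 URLs from a single category page
```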

The usual way to find URL parameters is to look for them in your site’s code, which is not very efficient. At Safecont we propose another method: our tool crawls your site on the same terms as Google’s or Bing’s crawlers and compares all the different URLs. Let’s see an example. We recently crawled 100,000 URLs from www.kukuxumusu.com, a well-known Spanish ecommerce site. The initial Safecont score is 62.8.

Source: safecont.com
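If you want a rough do-it-yourself approximation of that comparison before running a full analysis, a minimal sketch (the function name and input are hypothetical) is to group crawled URLs by path: any group with more than one member differs only in its query string and is a duplication candidate. Note that this only catches parameter-driven duplicates, not near-duplicate content living on different paths:

```python
from collections import defaultdict
from urllib.parse import urlsplit

def duplicate_groups(urls):
    """Group crawled URLs by scheme://host/path; groups with more than one
    member differ only in their query string."""
    groups = defaultdict(list)
    for url in urls:
        p = urlsplit(url)
        groups[(p.scheme, p.netloc, p.path)].append(url)
    return {path: members for path, members in groups.items() if len(members) > 1}
```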

If we look further, we see that the main problem is that 100% of the URLs have similarity problems:

Source: safecont.com

And if we look at the most duplicated URLs, we see this:

Source: safecont.com

In the first 10 pages we can see a parameter used to track the origin of users, “___from_store”; two parameters used to sort listings, “dir” and “ord”; and a filter, “talla”. None of them should be indexed, whether you use the “robots” meta tag, the robots.txt file, or the canonical tag. With these changes, those 10 URLs would be reduced to only 3 (or 2, if you consider that the paging parameter should not be indexed either). The same process can be applied across the whole site, reducing the number of URLs and concentrating organic traffic on the most important ones.
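As a hedged illustration of the robots.txt route, rules like the following (using the wildcard syntax that Google and Bing support) would block crawling of URLs carrying those four parameters; adapt and test the exact patterns for your own site:

```
User-agent: *
# Tracking parameter
Disallow: /*?___from_store=
Disallow: /*&___from_store=
# Sorting parameters
Disallow: /*?dir=
Disallow: /*&dir=
Disallow: /*?ord=
Disallow: /*&ord=
# Filter parameter
Disallow: /*?talla=
Disallow: /*&talla=
```

Keep in mind that robots.txt only prevents crawling: it does not consolidate ranking signals, and a blocked page can never show the crawler a canonical tag. For parameters that produce duplicates, a rel=canonical pointing at the clean URL is therefore often the safer choice.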

If we focus on just one URL of the listing, we can see how large this problem is:

Duplicates for http://www.kukuxumusu.com/en/kids-baby/ni-a/girl-footwear/flip-flops.html. Source: safecont.com
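As a hedged sketch of the canonical-tag route for this case, every parameterised variant of that listing would declare the clean URL as canonical in its <head>:

```html
<link rel="canonical"
      href="http://www.kukuxumusu.com/en/kids-baby/ni-a/girl-footwear/flip-flops.html">
```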

Almost all of them are URLs using parameters that could be deindexed in exactly this way. As you can see, internal duplication is a big problem with an easy solution. If you think you have it, contact us and we will help you solve it.
