Auto Links Optimizer Tool: Improve your SEO and Interlinking architecture

Celebrating three years since Safecont was born, we have launched a new breakthrough feature: “Links Optimizer”. It is the first SEO tool that tells you, in a completely automatic way, how to optimize the architecture and interlinking of your website so that you get the maximum benefit from your link juice.
Optimize your architecture 100% with Links Optimizer. Which architecture should you choose? That will no longer be a problem.

As there is a great variety of website types, architecture types and business/product decisions, the complexity of developing a tool for this task is enormous. But there is no doubt that it was something SEOs needed: an analytical, objective way to develop an SEO architecture and an interlinking system between the pages of a domain.

New feature: CRAWL STATS a free SEO crawler with Safecont

We are pleased to announce more news, and it does not stop here.
The launch of “CRAWL STATS” is now official: a crawler is available, completely free of charge, in every Safecont account.

A new tab appears in your user dashboard. Once you refresh or launch an analysis, you will have crawling information and data, with listings for each type of URL and other metrics. Until you refresh your analysis, the data will not appear.
On the main screen of the tab you will see “Crawled stats”, a summary of the state of the domain with a pie chart and some very interesting information:

Unique indexable pages

Non-indexable pages

Pages that give a code other than 200 (301, 302, 404, 500, …)


Crawlable pages over the limit you set for the analysis (you may have launched an analysis on 5,000 URLs, but our crawler has found more pages)

And a great novelty, our Crawl Score: a metric we created that evaluates how hard bots find it to crawl a domain and discover all of its indexable URLs, condensed into a single score in which each depth level is weighted according to its importance. Further down the same page you will find the “Crawled URLs per level” chart, with information for each depth level.
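Safecont has not published the exact formula behind the Crawl Score, but the idea of a single level-weighted metric can be sketched as follows. The decay weights and the per-level crawl ratio below are illustrative assumptions, not the real implementation:

```python
# Illustrative sketch of a level-weighted crawl score (NOT Safecont's formula).
# crawled_per_level[d] / found_per_level[d] is the fraction of discovered URLs
# at depth d that the bot actually crawled; shallower levels weigh more.

def crawl_score(crawled_per_level, found_per_level):
    """Weighted average of per-level crawl ratios, between 0.0 and 1.0."""
    score, total_weight = 0.0, 0.0
    for depth, found in enumerate(found_per_level):
        if found == 0:
            continue
        weight = 1.0 / (depth + 1)      # assumed: importance decays with depth
        ratio = crawled_per_level[depth] / found
        score += weight * ratio
        total_weight += weight
    return score / total_weight if total_weight else 0.0

# Example: everything crawled at levels 0-1, only half of level 2.
print(round(crawl_score([1, 10, 50], [1, 10, 100]), 3))  # 0.909
```

A domain where bots reach every indexable URL at every level would score 1.0; missing pages at shallow levels hurts the score more than missing them deep in the structure.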

All this information is downloadable as a CSV that contains many more details to work with: each URL’s index/follow status, where the URL was found, where it points, its status code, its PageRisk, similarity, PageRank, hub value, authority value, the semantic cluster it belongs to, and much more.
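As a rough illustration, here is how you might filter that export with Python’s standard library. The column names used here (`url`, `status_code`, `pagerisk`) are assumptions, so check the header row of your own CSV:

```python
# Sketch: filtering the Crawl Stats CSV export (column names are assumed).
import csv
import io

# In practice you would use open("crawl_stats_export.csv");
# a small inline sample stands in for the real file here.
sample = """url,status_code,pagerisk
https://example.com/,200,0.10
https://example.com/old,301,0.00
https://example.com/thin,200,0.85
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# URLs answering something other than 200 (redirects, errors, ...)
non_200 = [r["url"] for r in rows if r["status_code"] != "200"]

# Indexable pages with a worryingly high PageRisk
risky = [r["url"] for r in rows
         if r["status_code"] == "200" and float(r["pagerisk"]) > 0.5]

print(non_200, risky)
```

The same pattern extends to any other column in the export, such as grouping URLs by their semantic cluster.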

Below them you will find three boxes with more information:

Non-indexed URLs: with information on redirections, noindex and other circumstances that can make a page non-indexable. Clicking on any of them takes you to a detail page with the full list of URLs involved and other relevant data.

Non-200 status URLs: where you will find the pages that return a status code other than 200 (redirects such as 301 and 302, or errors such as 404 and 500).

Wild Query Strings and Duplicate Content

Recently, Robin Rozhon wrote an interesting post about duplicate content and how small changes can reduce the number of indexed pages on a site, increasing the traffic received by landing pages and, in turn, the revenue obtained by that site.
Rozhon’s thesis is that when the same content is duplicated across several indexed URLs, you lose control over the crawling of your site, relying on Google’s judgment and spreading your potential visitors across several URLs that have to compete with one another for the same content. In the post linked above, the author explains how they cut their indexed URLs by 80% (from 500,000 pages to only 100,000). Before the change, only 8.55% of indexed URLs generated at least one session per month; after the deindexing, 49.7% of indexed URLs generated organic traffic (in absolute terms, roughly 42,750 pages with traffic before versus 49,700 after, from one fifth as many indexed pages).
At Safecont, we totally agree: duplicate content is something to avoid. The main sources of duplicate content are query parameters, used to generate content views, for faceted navigation, or to track users. The last one is a really bad idea. There are better ways to track users, but if you are obliged to use tracking parameters, you must keep them out of the index, because each crawler pass will generate useless URLs that duplicate the main landing pages of your site.
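For the tracking-parameter case, a common mitigation alongside noindex is to normalize URLs before comparing or linking to them. A minimal sketch with Python’s urllib; the parameter list is illustrative, so extend it with your own tracking keys:

```python
# Sketch: normalizing URLs by dropping known tracking parameters, so that
# duplicate detection (or canonical-tag generation) sees one URL per page.
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative list of common tracking keys; adapt to your own site.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}

def canonicalize(url):
    """Return the URL with tracking parameters and fragment removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))

print(canonicalize("https://example.com/jackets?utm_source=news&color=red"))
# https://example.com/jackets?color=red
```

Two URLs that differ only in tracking parameters now map to the same canonical string, so they count as one page instead of duplicating each other.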
Regarding content views, you should ask yourself: does this parameter change the content seen by the user? If the answer is no, you should keep it out of the index. If the answer is yes, index it only if the change is noticeable. For example, a parameter used to sort a listing by price should not be indexed.
Facets pose the same problem. If you combine several facets with several filters, you can generate thousands of URLs that add no content to your site. In general, you should use facets only for individual category pages and use non-indexable filters to refine a search. For example, you could generate a landing page for “adidas jackets” but not for “adidas jackets under $200”.
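That rule of thumb can be expressed as a simple whitelist check. The facet names, filter names and one-facet limit below are illustrative assumptions for a hypothetical shop, not a universal recipe:

```python
# Sketch: deciding whether a faceted URL deserves indexing.
from urllib.parse import parse_qsl, urlsplit

INDEXABLE_FACETS = {"brand", "category"}       # facets that earn a landing page
FILTER_PARAMS = {"price_max", "sort", "page"}  # refinements: never indexed

def should_index(url):
    """True if the URL uses at most one whitelisted facet and no filters."""
    params = dict(parse_qsl(urlsplit(url).query))
    if FILTER_PARAMS & params.keys():
        return False
    # Allow at most one indexable facet to avoid a combinatorial URL explosion.
    return len(params) <= 1 and set(params) <= INDEXABLE_FACETS

print(should_index("https://shop.example/jackets?brand=adidas"))                # True
print(should_index("https://shop.example/jackets?brand=adidas&price_max=200"))  # False
```

In practice this kind of rule would feed your robots meta tags or canonical logic, keeping the facet landing pages indexable and the filtered variants out of the index.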
The usual way to find URL parameters is to look for them in the site’s code.

Semantics for your SEO

This week we talk about one of the parts of Safecont that can help improve the quality of a page: the “SEMANTIC” tab. In this part of the tool we summarize the semantic information of a domain in several ways.
First, we have TF-IDF again. Where on previous occasions we looked at the TF-IDF of each individual URL of a site, in the SEMANTIC tab we focus on the overall TF-IDF of the domain: the TF-IDF of the domain’s most frequent words, calculated as the average of the TF-IDF values across all the URLs of the site. This chart can tell us which words are important in our domain and give us an overview of how we use them. As in the article about the individual TF-IDF of each URL, we can look for words whose TF-IDF is too high (values well above the average). Since this chart shows domain averages, a high value here implies generally high values across the domain: words we are overusing on each of the pages where they appear.
The other way to see the TF-IDF of the site is the chart in the same tab that shows the relationship between each word’s TF-IDF value and the number of URLs in which it appears.
In this chart, each of the domain’s most-used words is represented as a point. The horizontal axis shows the number of URLs in which that word appears, and the vertical axis shows the average TF-IDF value the word has across all the pages where it is used. Hovering the mouse over one of these points shows which word it refers to and its values. Likewise, you can zoom at will. This chart can be used for several things:

Searching for words with a TF-IDF close to 0, which are therefore used across the whole site
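To make the two axes of that chart concrete, here is a minimal computation of average TF-IDF per word over a toy set of pages. The exact TF-IDF variant Safecont uses is not documented, so this sketch assumes the classic tf · log(N/df) form:

```python
# Average TF-IDF per word across a toy "domain" of three pages.
import math
from collections import Counter

pages = [
    "surf jackets surf boards",   # stand-ins for real page texts
    "surf boards wax",
    "jackets coats wax",
]
docs = [p.split() for p in pages]
N = len(docs)
df = Counter(w for d in docs for w in set(d))   # URLs containing each word

def tfidf(word, doc):
    tf = doc.count(word) / len(doc)
    return tf * math.log(N / df[word])

# The two axes of the chart: URL count and average TF-IDF per word.
for word in sorted(df):
    scores = [tfidf(word, d) for d in docs if word in d]
    print(word, df[word], round(sum(scores) / len(scores), 3))
```

Note that a word appearing in every URL gets idf = log(N/N) = 0, which is exactly why the words with TF-IDF close to 0 in the chart are the ones used across the whole site.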

Ranking pages: Hubs and Authorities

This week we are still talking about web architecture and how we can work on it with Safecont.
One of the least-known parts of our tool is the listing of a domain’s pages by their Hub or Authority scores. Although we do not usually mention them much in our videos, these scores also serve to measure the importance of the pages of a domain and to improve the architecture of a site, as an alternative to the typical PageRank algorithm.

While PageRank sorts pages by the probability that a random surfer visits them, the HITS (Hyperlink-Induced Topic Search) algorithm is based on the idea that there are two types of pages on the Internet:

Hub-type pages are those that, although they do not provide much information on a topic themselves, link to the pages that do.
Authority-type pages are those that contribute content on a topic to a website and are therefore linked to by many Hub pages related to that topic.

It is worth emphasizing that the two scores (Hub and Authority) are not mutually exclusive. The main page of a site usually has a high Authority score (it is linked from the whole site) and a high Hub score (it links to many pages with high Authority scores). Let’s see how we can use these scores to improve the structure of a site.
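The HITS update rule itself is short enough to sketch in full: each page’s Authority score is the sum of the Hub scores of the pages linking to it, its Hub score is the sum of the Authority scores of the pages it links to, and both vectors are normalized each round until they converge. The tiny link graph below is made up for illustration:

```python
# Power iteration for HITS on a toy link graph (made-up pages).
links = {                     # page -> pages it links to
    "home":    ["cat1", "cat2", "product"],
    "cat1":    ["product", "home"],
    "cat2":    ["product", "home"],
    "product": ["home"],
}

pages = list(links)
hub = {p: 1.0 for p in pages}
auth = {p: 1.0 for p in pages}

for _ in range(50):
    # Authority: sum of Hub scores of in-linking pages, then normalize.
    auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
    norm = sum(v * v for v in auth.values()) ** 0.5
    auth = {p: v / norm for p, v in auth.items()}
    # Hub: sum of Authority scores of out-linked pages, then normalize.
    hub = {p: sum(auth[q] for q in links[p]) for p in pages}
    norm = sum(v * v for v in hub.values()) ** 0.5
    hub = {p: v / norm for p, v in hub.items()}

print({p: round(auth[p], 2) for p in pages})
print({p: round(hub[p], 2) for p in pages})
```

On this graph, “product” converges to the highest Authority while the category pages score highest as Hubs; those two orderings are what the lists in the Architecture tab expose for a real domain.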
We have placed the page listings by Hub or Authority score in the “Architecture” tab of our tool. In that section you will find two links to the lists of URLs ordered by their weight as a Hub and as an Authority. Let’s look at some examples:
This website is the online store of a well-known surfing fashion brand. If we look at its list of Authorities we see the following:
As you can see, the root has a high Authority weight, which is logical because it is linked from most pages of the site. However, we notice something curious: its Hub score is very low.
Normally it would have a score close to 1.0, because the usual thing in an e-commerce site is that this page