Ranking pages: Hubs and Authorities

This week we are still talking about web architecture, and how we can work on it through Safecont.

One of the least known parts of our tool is the listing of a domain’s pages by Hub or Authority score. Although we rarely mention them in our videos, these scores also measure the importance of the pages of a domain, and they can help improve the architecture of a site as an alternative to the typical PageRank algorithm.

Explanation of how the relationship between hubs and authorities works

While PageRank sorts pages by the probability that a user browsing at random visits them, the HITS (Hyperlink-Induced Topic Search) algorithm is based on the idea that there are two types of pages on the Internet:

  • Hub pages are those that, although they do not provide much information on a topic themselves, link to the pages that do.
  • Authority pages are those that provide content on a topic and are therefore linked from many Hub pages related to that topic.
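
The mutual definition above can be sketched in a few lines of code. This is a minimal illustration of the textbook HITS iteration, not the way Safecont computes its scores: the function names, the toy link graph, the number of iterations, and the choice to rescale so that the top score is 1.0 are all assumptions made for the example.

```python
def scale_to_max(scores):
    """Rescale scores so the highest one is 1.0 (an assumed convention)."""
    top = max(scores.values()) or 1.0  # avoid dividing by zero
    return {page: value / top for page, value in scores.items()}


def hits(links, iterations=50):
    """Return (hub, auth) score dicts for a link graph.

    links maps each URL to the list of URLs it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    if not pages:
        return {}, {}
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority update: add up the Hub scores of the pages linking in.
        auth = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub update: add up the Auth scores of the pages linked to.
        hub = {page: sum(auth[t] for t in links.get(page, ())) for page in pages}
        auth = scale_to_max(auth)
        hub = scale_to_max(hub)
    return hub, auth


# Hypothetical toy site: a root, two sections and one product page in each.
links = {
    "/": ["/men", "/women"],
    "/men": ["/", "/men/shirts"],
    "/women": ["/", "/women/dresses"],
    "/men/shirts": ["/"],
    "/women/dresses": ["/"],
}
hub, auth = hits(links)
for page in sorted(auth, key=auth.get, reverse=True):
    print(f"{page:16} auth={auth[page]:.3f} hub={hub[page]:.3f}")
```

In this toy graph the root ends up with the top Auth score because every other page links to it, while its Hub score stays low because it links to only two pages.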

It is worth emphasizing that the two scores (Hub and Authority) are not mutually exclusive. The main page of a site usually has a high Authority score (it is linked from the whole site) and a high Hub score (it links to many pages with high Authority scores). Let’s see how we can use these scores to improve the structure of our site.

We have placed the page listings by their Hub or Authority score in the “Architecture” tab of our tool. In that section you can find two links to the lists of URLs ordered by their weight as Hub and as Authority. Let’s see some examples:

Example: eu.billabong.com

This website is the online store of a surf fashion brand. If we look at its list of Auths we see the following:

Auths for Billabong, by Safecont

As you can see, the root has a high Auth weight. This is logical because it is linked from most pages of the site. However, we see something curious: its Hub score is very low.

Normally it would have a score close to 1.0, because on an e-commerce site the root usually links to most sections of the site.

The main pages of those sections will have high Auth scores, and therefore the root should have a high Hub score. The fact that it is so low indicates a structural problem.

If we look at how the levels are structured, we will quickly see the problem:

Billabong levels by Safecont

The site’s root links to very few pages, only 8, and its two main links go to the men’s and women’s clothing sections. If this page included links to listings of its main products, as well as quick access to all sections of the site, the site would be better structured and customers would find what they are looking for more quickly.
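
To make the idea of levels concrete, here is a minimal sketch that assigns each page a level with a breadth-first search from the root and counts how many URLs fall in each one. It assumes that a level is the minimum number of clicks from the root and that the root itself is level 0; we do not claim this is exactly how Safecont numbers its levels, and the function names and the toy link graph are hypothetical.

```python
from collections import Counter, deque


def page_levels(links, root="/"):
    """Map each reachable URL to its click depth from the root (root = 0)."""
    levels = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in levels:  # first visit is the shortest path
                levels[target] = levels[page] + 1
                queue.append(target)
    return levels


def pages_per_level(levels):
    """Count how many URLs sit at each level, ordered by level."""
    return dict(sorted(Counter(levels.values()).items()))


# Hypothetical toy site whose root links to only two sections.
links = {
    "/": ["/men", "/women"],
    "/men": ["/", "/men/shirts"],
    "/women": ["/", "/women/dresses"],
    "/men/shirts": ["/"],
    "/women/dresses": ["/"],
}
levels = page_levels(links)
print(pages_per_level(levels))
```

A root that links to few pages shows up here as a small count at level 1, with the rest of the site pushed to deeper levels.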

Another example: searchengineland.com

Authorities by Safecont for searchengineland.com

This time we can see that most pages with high Auth are those in the first levels. If we compare this with the level graph of the site, we will see that in this case the number of URLs in the second level is much higher.

Levels for searchengineland.com by Safecont

On the other hand, the pages with high Hub tend to be at deeper levels: they usually receive few links while linking in turn to a large number of pages on the site. We can see this for the same site:

Hubs for searchengineland by Safecont

As we can see, most of the pages with a high Hub score sit at much deeper levels and correspond to blog entries that receive few links from the rest of the site but link out to many pages. Perhaps the most anomalous thing is that they sit at levels beyond the fourth or fifth.

We can also see two things. The first URL in the list, at the second level, has the highest score. This page is probably one of the main pages of the site for redistributing links. Since it also has a high Auth score, it is well linked from the rest of the site, so there is nothing anomalous about it.

The second URL in the list is at depth level 200. It probably has a single inbound link and, in turn, links to many of the site’s Auth pages. In this case it is of little concern, because the page looks like a list of old entries that might have to be de-indexed or replaced by an efficient content search system.

The existence of this page does not contribute anything to users or search engines. It simply makes search engine bots waste time reaching this level, only to find a list of useful URLs that they have already visited at shallower levels.

When building a site, we must bear in mind that URLs with a high Hub score are useful because they offer interesting links to both bots and users. If we place these URLs at shallower levels, closer to the root, the site will be crawled more quickly and efficiently.
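
One way to act on this is to list the pages that combine a high Hub score with a deep level, since those are the candidates to move closer to the root or to review. The sketch below assumes you already have the Hub scores and the levels as dictionaries keyed by URL; the function name, the thresholds, and the sample data are hypothetical and only illustrate the check.

```python
def deep_hubs(hub, levels, min_hub=0.5, max_level=3):
    """Return (url, hub_score, level) for strong hubs deeper than max_level.

    Both thresholds are illustrative assumptions, not Safecont defaults.
    URLs without a known level are skipped.
    """
    flagged = [
        (page, score, levels[page])
        for page, score in hub.items()
        if page in levels and score >= min_hub and levels[page] > max_level
    ]
    # Deepest pages first, then the strongest hubs.
    return sorted(flagged, key=lambda item: (-item[2], -item[1]))


# Hypothetical sample data for illustration only.
hub_scores = {"/": 0.9, "/archive/old-entries": 0.8, "/blog/post": 0.1}
page_depths = {"/": 0, "/archive/old-entries": 200, "/blog/post": 2}

for url, score, level in deep_hubs(hub_scores, page_depths):
    print(f"{url} hub={score} level={level}")
```

With the sample data, only the archive page is reported: the root is a strong hub but sits at the top, and the blog post is too weak a hub to matter.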

As we can see, building an efficient architecture for a website is a complicated task. If you think you have an architectural problem, do not hesitate to use the Safecont tools, which will help you with it.

