This week we look at a part of Safecont that can help improve the quality of a page: the “SEMANTIC” tab. This part of the tool summarizes the semantic information of a domain in several ways.
First, we have TFIDF again. On previous occasions we looked at the TFIDF of each URL of a site; in the SEMANTIC tab we focus on the general TFIDF of a domain. One view shows the general TFIDF of the most frequent words of the domain, calculated as the average of the TFIDFs across all the URLs of the domain. This graphic tells us which words are important in our domain and gives us an overview of how we use them. As in the article about the individual TFIDF of each URL, we can look for words whose TFIDF is too high (values well above the average). Because this graphic shows domain-wide averages, a high value here implies generally high values across the domain: these are words that we are probably overusing on each of the pages where they appear.
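To make the idea of a domain-wide TFIDF concrete, here is a minimal sketch in Python. It is not Safecont's implementation: the TFIDF formula (term frequency times the logarithm of inverse document frequency), the choice to count a word as 0 on pages where it does not appear, and the helper names `tfidf_per_url`, `domain_tfidf` and `flag_high` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_per_url(pages):
    """pages maps url -> list of tokens; returns url -> {word: tfidf}.

    Assumed formula: (count / page length) * log(N / document frequency).
    """
    n = len(pages)
    df = Counter()
    for tokens in pages.values():
        df.update(set(tokens))  # each page counts once per word
    scores = {}
    for url, tokens in pages.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[url] = {
            w: (c / total) * math.log(n / df[w]) for w, c in counts.items()
        }
    return scores


def domain_tfidf(scores):
    """Average each word's TFIDF over all URLs (assumption: 0 where absent)."""
    n = len(scores)
    totals = Counter()
    for word_scores in scores.values():
        for w, s in word_scores.items():
            totals[w] += s
    return {w: t / n for w, t in totals.items()}


def flag_high(domain, factor=2.0):
    """Words whose domain average exceeds `factor` times the mean average.

    `factor` is a hypothetical threshold, not a Safecont setting.
    """
    mean = sum(domain.values()) / len(domain)
    return sorted(w for w, v in domain.items() if v > factor * mean)


pages = {
    "/a": ["shoes", "red", "shoes"],
    "/b": ["shoes", "blue"],
    "/c": ["hats", "blue"],
}
domain = domain_tfidf(tfidf_per_url(pages))
flagged = flag_high(domain, factor=1.2)
```

What counts as "well above the average" depends on the site, so the `factor` value is something to tune rather than a fixed rule.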
The other way to view the domain's TFIDF is the graph in the same tab that relates the TFIDF value of each word to the number of URLs in which it appears.
In this graphic each of the most used words in the domain is represented as a point. The horizontal axis shows the number of URLs in which the word appears, and the vertical axis shows the average TFIDF value of the word across the pages in which it is used. If we hover the mouse over one of these points, we see which word it refers to and its values. You can also zoom freely. This graphic can be used for several things:
You can see both examples clearly in the figure above.
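The coordinates of each point in this graphic can be sketched as follows. This is only an illustration under stated assumptions: it presumes the per-URL TFIDF scores are already available as a dictionary, and the function name `scatter_points` and the sample values are hypothetical.

```python
from collections import defaultdict


def scatter_points(scores):
    """scores maps url -> {word: tfidf}.

    Returns word -> (number of URLs containing the word,
                     average TFIDF over those URLs only).
    """
    per_word = defaultdict(list)
    for word_scores in scores.values():
        for word, value in word_scores.items():
            per_word[word].append(value)
    return {w: (len(v), sum(v) / len(v)) for w, v in per_word.items()}


# Hypothetical per-URL scores, made up for the example.
scores = {
    "/a": {"shoes": 0.4, "red": 0.2},
    "/b": {"shoes": 0.2},
}
points = scatter_points(scores)
x, y = points["shoes"]  # x: URLs containing the word, y: average TFIDF
```

Note that the average here is taken only over the pages where the word is used, which matches the description of the vertical axis.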
The other information shown in this tab is a visualization of the semantic clusters detected in the domain. A semantic cluster is a group of pages that deal with a similar topic. These clusters have a set of characteristics that lets us represent them as points in a three-dimensional space, separated by a certain distance. The closer two points are in the graph, the more similar the topics they deal with. The following figure shows an example of this graph.
This graphic can help us find groups of topics that may be important on our site but that we have not considered. It also lets us find groups of pages that are out of the ordinary in terms of the topics covered. If we see a cluster in this graph that is very far from the others, the content of that cluster most likely has little to do with the topics covered in the rest of the domain.
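As a rough sketch of how a distant cluster could be spotted numerically, the snippet below treats each cluster as a point in three dimensions and measures its average distance to the others. The Euclidean distance, the coordinates, the cluster names and the function names are all assumptions for illustration; the article does not specify how Safecont computes cluster positions or distances.

```python
import math


def mean_distance_to_others(points):
    """points maps cluster name -> (x, y, z); needs at least two clusters.

    Returns cluster name -> average Euclidean distance to every other cluster.
    """
    result = {}
    for name, p in points.items():
        others = [math.dist(p, q) for other, q in points.items() if other != name]
        result[name] = sum(others) / len(others)
    return result


def most_isolated(points):
    """Name of the cluster with the largest average distance to the rest."""
    distances = mean_distance_to_others(points)
    return max(distances, key=distances.get)


# Hypothetical cluster coordinates, made up for the example.
clusters = {
    "shoes": (0.0, 0.0, 0.0),
    "boots": (1.0, 0.0, 0.0),
    "sandals": (0.0, 1.0, 0.0),
    "recipes": (10.0, 10.0, 10.0),
}
outlier = most_isolated(clusters)
```

A cluster singled out this way is only a candidate for review: whether its content really falls outside the domain's topics still has to be checked by looking at its pages.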
If you think that a semantic analysis will help you improve your pages, do not hesitate to use the Safecont tools.
Carlos Pérez Miguel