FAQ – Frequently asked questions

What is Safecont and what is it used for?

Safecont is a tool for analyzing website content and architecture. It uses machine learning technology to detect a website’s main issues in order to avoid ranking problems or penalties.

By training artificial intelligence algorithms, we can detect low-quality content that may lead to penalties or other issues.

Why use Safecont?

Safecont detects, analyzes and classifies website issues into danger levels, using systems that detect patterns previous technologies could never identify. This allows users to take faster and more precise action on their websites (Where is the error? What is it? How dangerous is it?).

What does Safecont essentially detect?

  • Internal content issues.
  • External content issues.
  • Thin content issues.
  • Web architecture issues.

Why does Safecont use machine learning technology?

The number of factors involved in detecting quality content and achieving organic rankings is increasing every day. Additionally, the relationships between these factors are becoming more and more complex. As such, we need cutting-edge technology to detect what is almost impossible to find with the naked eye. For example, in order to analyze 100,000 URLs we carry out roughly 10^15 operations, that is, about 1,000,000,000,000,000 mathematical calculations. Obviously, carrying out that many calculations using conventional technology is extremely complicated.
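As a rough, purely illustrative back-of-envelope (Safecont’s actual pipeline is not public, and the cost per page pair below is an assumption), comparing every page against every other page already produces numbers of this order:

```python
# Purely illustrative: the per-pair cost is a hypothetical assumption,
# not Safecont's real pipeline.
n_urls = 100_000              # pages to analyze
ops_per_pair = 100_000        # assumed elementary operations per page pair

pairs = n_urls ** 2           # naive all-against-all comparison
total_ops = pairs * ops_per_pair

print(f"{pairs:.0e} page pairs")      # 1e+10
print(f"{total_ops:.0e} operations")  # 1e+15
```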

After billions of mathematical calculations, machine learning technology enables us to generate error detection patterns that would be impossible to detect otherwise. In this way, error detection and resolution are simplified.

How are PandaRisk and PageRisk calculated?

Our algorithms have been trained on hundreds of thousands of URLs that experienced a significant change in traffic after being penalized by search engines because of their content or other variables. This process provides us with a URL score (PageRisk), which shows the URL’s risk of being penalized, as well as an overall score for the entire domain (PandaRisk). This overall score not only considers the individual scores of the domain’s URLs but also the number of pages with low-quality content.
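The exact model is proprietary, but as a minimal sketch of the idea (the weighting and threshold below are assumptions, not Safecont’s formula), a domain score could blend the average URL risk with the share of high-risk pages:

```python
# Hypothetical aggregation sketch: Safecont's real PandaRisk model is not
# public, so the threshold and weighting below are illustrative assumptions.

def panda_risk(page_risks, low_quality_threshold=70, weight=0.5):
    """Combine per-URL PageRisk scores (0-100) into a single domain score."""
    if not page_risks:
        return 0.0
    avg_risk = sum(page_risks) / len(page_risks)
    low_quality_share = sum(r >= low_quality_threshold for r in page_risks) / len(page_risks)
    # Blend the average URL risk with the proportion of high-risk pages.
    return weight * avg_risk + (1 - weight) * 100 * low_quality_share

print(panda_risk([10, 20, 85, 90, 95]))  # 60.0 with this example data
```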

What is PandaRisk?

Based on the results from our machine learning algorithms, domains are given a score indicating their content-based penalty risk. If the PandaRisk value is close to 0 (green), the domain is safe; if it is close to 100 (red), it is at maximum risk.

PandaRisk includes many more variables besides the usual internal or external similarity and duplication, or thin content.

What is PageRisk?

It is similar to PandaRisk but at URL level; that is, the score reflects the penalty risk for an individual web page. If the PageRisk value is close to 0 (green), the page is safe; if it is close to 100 (red), it is at maximum risk.

When is a domain, group of pages, or particular URL really in danger?

Values over 40 are significant, and values over 70 are dangerous. It is therefore advisable to act on the highest-risk pages or clusters (by PageRisk or ClusterRisk) first.
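A minimal triage sketch using these thresholds (the scores below are hypothetical example data; in practice they would come from a Safecont export):

```python
# Minimal triage sketch using the thresholds mentioned above (40 and 70).
# The page scores are hypothetical example data.

def risk_band(score):
    if score >= 70:
        return "high"      # act on these pages/clusters first
    if score >= 40:
        return "medium"
    return "low"

pages = {"/home": 12, "/tag/shoes": 83, "/archive/2017": 55}

for url, score in sorted(pages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{risk_band(score):>6}  {score:>3}  {url}")
```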

Are there any success stories that attest that Safecont really works?

Yes, many. Before the software was launched, Safecont was tested on millions of URLs and hundreds of websites.

Furthermore, the algorithms not only confirm patterns that are obvious once they have been pointed out, they also uncover patterns that cannot be seen with the naked eye.

What does Similarity mean?

A website’s pages/URLs may share some content with other pages on the same site. This can be duplicate or very similar content (which also carries high risk). In other words, similarity is a broader and more complex concept than the commonly used term duplication.
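As an illustration of “similar but not identical” content (Safecont’s actual similarity metric is not public; Jaccard similarity over 3-word shingles is used here purely as a stand-in):

```python
# Stand-in illustration only: Safecont's real similarity metric is not public.

def shingles(text, k=3):
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

page_a = "blue running shoes for men with free shipping and easy returns"
page_b = "blue running shoes for women with free shipping and easy returns"

print(round(jaccard(page_a, page_a), 2))  # 1.0 -> exact duplicate
print(round(jaccard(page_a, page_b), 2))  # 0.5 -> partly shared, not a full duplicate
```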

What is LevelStrength?

A score for the relevance of different page depth levels, comparable to PageRank.

The maximum relevance of a level is 100 and minimum is 0. Ideally, all depth levels would be 100 or close, with a difference of 20 points, at most, between levels. That is, depth level 1 (corresponding to the home page) has a value of 100 and depth level 2 should have a value between 80 and 100, and so on.
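A quick way to check the “at most 20 points between consecutive levels” guideline (the LevelStrength values below are hypothetical example data):

```python
# Check the gap between consecutive depth levels; values are example data.
level_strength = {1: 100, 2: 85, 3: 60, 4: 55}  # depth level -> score (0-100)

for depth in sorted(level_strength)[1:]:
    drop = level_strength[depth - 1] - level_strength[depth]
    status = "OK" if drop <= 20 else "too steep"
    print(f"level {depth - 1} -> {depth}: drop of {drop} points ({status})")
```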

What is PageStrength?

Relevance scores at page level, comparable to PageRank. The maximum relevance level is 100 and minimum 0.

What is LinkStrength?

Relevance scores at link level, comparable to PageRank. Links with a value of 100 are more relevant than those with a value close to 0.

What is a Cluster?

Pages/URLs grouped according to certain patterns they have in common. Different groupings can be made based on the recurrent and dangerous issues detected. Furthermore, these groups can be separated and ordered by their risk levels, making it easier to focus on issues the website may be experiencing. For example, Safecont shows the most dangerous cluster of pages/URLs within a domain and assigns them a risk value. This makes work easier by highlighting dangerous pages, allowing users to focus on them and take the necessary action.

What is ClusterRisk?

URLs grouped according to their penalty risk. They can be grouped by PageRisk, External Duplicate, Similarity or Thin Content.

What is a PageRisk Cluster?

The average score of a group of URLs. URLs are grouped into percentiles based solely on their PageRisk.
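A hypothetical sketch of the idea (the exact bucketing Safecont uses is not public; quartiles and the scores below are assumptions for illustration):

```python
# Hypothetical percentile grouping by PageRisk; quartiles are assumed here.
import statistics

page_risk = {
    "/a": 12, "/b": 25, "/c": 33, "/d": 48,
    "/e": 51, "/f": 64, "/g": 78, "/h": 92,
}

# Cut points that split the scores into quartiles.
q1, q2, q3 = statistics.quantiles(page_risk.values(), n=4)

def bucket(score):
    if score <= q1: return "0-25th percentile"
    if score <= q2: return "25-50th percentile"
    if score <= q3: return "50-75th percentile"
    return "75-100th percentile"

clusters = {}
for url, score in page_risk.items():
    clusters.setdefault(bucket(score), []).append(score)

for name, scores in sorted(clusters.items()):
    print(f"{name}: average PageRisk {statistics.mean(scores):.1f}")
```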

What is an External Duplicate Cluster?

URLs grouped according to their external duplication percentage (content duplicated on other domains analyzed).

What is a Thin Content Cluster?

URLs grouped according to their thin content ratio.

What is a Semantic Cluster?

Semantic grouping of URLs. Each semantic cluster has its own ClusterRisk.

How long does the analysis take?

It depends on the number of pages making up a domain. Calculation time grows exponentially with the number of pages: anywhere between thousands and hundreds of thousands of mathematical calculations are carried out for each score. Approximate timeframes are listed under “How long will it take for my analysis to be available?” below.

Can I download lists?

Yes, from every Safecont section that shows lists of URLs (Similarity, External Duplicate, Thin Content, Pages, Most powerful pages, Most common anchors and Most powerful anchors). The download button (CSV format) is to the right of each list name.

Are canonicals taken into account?

We take them into account when crawling the domain, but they are excluded from the analysis, just as the main search engines’ crawlers exclude them.

They still need to be crawled to ensure a sound site analysis, which is why they are deducted from your credit total.

What happens with nofollow links?

As the name suggests, they are not followed: we detect them, but the crawler does not follow them. That is, they are found but not analyzed.
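A minimal sketch of this “found but not followed” behavior, using only the Python standard library (this is not Safecont’s actual crawler; it just illustrates the idea):

```python
# Illustrative only: not Safecont's actual crawler.
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.to_follow, self.nofollow = [], []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # Record every link, but queue for crawling only the followable ones.
        if "nofollow" in (attrs.get("rel") or "").lower():
            self.nofollow.append(href)
        else:
            self.to_follow.append(href)

page = '<a href="/about">About</a> <a href="/login" rel="nofollow">Login</a>'
parser = LinkCollector()
parser.feed(page)
print("crawl next:", parser.to_follow)    # ['/about']
print("detected only:", parser.nofollow)  # ['/login']
```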

What is thin content and how is it determined?

Pages within a domain with scarce and low-quality content are categorized as thin content. Based on the characteristics of each domain, various weightings are applied to determine whether you have thin content issues.

What is ThinRatio?

Each page is given this score based on how thin its content is. The closer the ThinRatio is to 100, the more likely the page is to have or cause issues.
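As a deliberately simplified illustration (Safecont’s real weightings are not public; the word-count target below is a made-up assumption), a thin-content score could grow as a page falls short of a target length:

```python
# Simplified illustration only: Safecont's real ThinRatio weightings are not
# public. The target word count is a hypothetical assumption.

def thin_ratio(word_count, target_words=800):
    """Return a 0-100 score; higher means thinner content."""
    if word_count >= target_words:
        return 0.0
    return round(100 * (1 - word_count / target_words), 1)

for words in (1200, 400, 80):
    print(f"{words} words -> ThinRatio {thin_ratio(words)}")
```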

What are internal inbound links?

Links to a page from other pages within the same domain.

What are internal outbound links?

Links from a page to other pages within the same domain.
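A small link-graph sketch counting internal inbound and outbound links per page (the link data is hypothetical example input):

```python
# Counting internal inbound and outbound links; the link data is example input.
from collections import defaultdict

# source page -> pages it links to (all within the same domain)
internal_links = {
    "/": ["/products", "/blog"],
    "/products": ["/", "/blog"],
    "/blog": ["/"],
}

outbound = {page: len(targets) for page, targets in internal_links.items()}
inbound = defaultdict(int)
for targets in internal_links.values():
    for target in targets:
        inbound[target] += 1

for page in internal_links:
    print(f"{page}: {inbound[page]} inbound, {outbound[page]} outbound")
```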

Can I download reports?

Yes. You’ll see a PDF icon above the rhinoceros in the upper right corner of the home page; just click it and the download should start automatically.

How long will it take for my analysis to be available?

Two factors determine how long it takes for domain data to become available: crawling and calculation.

Crawling time depends on the size of the site: the more pages it has, the longer the crawl takes. Sometimes the crawl has to be relaunched several times because we send a high number of requests and our crawler may be blocked. For this reason, it is advisable to add our crawler to your whitelist so that everything runs much faster.

Calculation time also depends on the size of the website, but in this case it grows exponentially. That is, analyzing a website with 200,000 URLs doesn’t take twice the time required for 100,000 URLs; it takes much more.

Although these times are consistently improving, here is an approximate estimate:

  • 10,000 URLs: approximately half an hour
  • 50,000 URLs: approximately two hours
  • 100,000 URLs: approximately one day