In my last article, we analyzed our backlinks using data from Ahrefs.

This time around, we’re including the competitor backlinks in our analysis, using the same Ahrefs data source for comparison.

Like last time, we defined the value of a site’s backlinks for SEO as a product of quality and quantity.

Quality is domain authority (or Ahrefs’ equivalent, domain rating) and quantity is the number of referring domains.

Again, we’ll evaluate the link quality with the available data before evaluating the quantity.

Time to code.

import os
import re
import time
import random
import pandas as pd
import numpy as np
import datetime as dt
from datetime import timedelta
from plotnine import *
import matplotlib.pyplot as plt
from pandas.api.types import is_string_dtype
from pandas.api.types import is_numeric_dtype
import uritools

pd.set_option('display.max_colwidth', None)
%matplotlib inline
root_domain = 'johnsankey.co.uk'
hostdomain = 'www.johnsankey.co.uk'
full_domain = 'https://www.johnsankey.co.uk'
target_name = 'John Sankey'

Data Import & Cleaning

We set up the file directories to read multiple Ahrefs exported data files in one folder, which is much faster, less boring, and more efficient than reading each file individually.

Especially when you have more than 10 of them!

The listdir() function from the os module allows us to list all files in a subdirectory.

ahrefs_filenames = os.listdir(ahrefs_path)
# Drop the macOS metadata file if it is present
if '.DS_Store' in ahrefs_filenames:
    ahrefs_filenames.remove('.DS_Store')

The file names are now held in the ahrefs_filenames list.
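As a hedged alternative to removing '.DS_Store' by name, the sketch below keeps only files ending in .csv, so any other stray file in the folder is ignored too. The helper name list_csv_files and the demo file names are made up for illustration; they are not part of the original workflow.

```python
import os
import tempfile


def list_csv_files(folder):
    """Return the sorted names of the .csv files in folder, ignoring everything else."""
    return sorted(f for f in os.listdir(folder) if f.lower().endswith('.csv'))


# Demo on a throwaway folder with made-up file names
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ['site-a.csv', 'site-b.csv', '.DS_Store']:
        open(os.path.join(demo_dir, name), 'w').close()
    demo_files = list_csv_files(demo_dir)

print(demo_files)
```

Filtering by extension also avoids the error that list.remove() raises when '.DS_Store' is not in the folder, for example on Windows or Linux.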


With the files listed, we’ll now read each one individually using a for loop, and add these to a dataframe.

While reading in the file, we’ll use some string manipulation to create a new column with the site name of the data we’re importing.

ahrefs_df_lst = list()
ahrefs_colnames = list()

for filename in ahrefs_filenames:
    df = pd.read_csv(os.path.join(ahrefs_path, filename))
    # Derive the site name from the file name
    df['site'] = filename
    df['site'] = df['site'].str.replace('www.', '', regex = False)
    df['site'] = df['site'].str.replace('.csv', '', regex = False)
    df['site'] = df['site'].str.replace('-.+', '', regex = True)
    # Collect each file's dataframe so they can be concatenated below
    ahrefs_df_lst.append(df)

ahrefs_df_raw = pd.concat(ahrefs_df_lst)

Ahrefs dofollow raw data. Image from Ahrefs, May 2022
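To see what the three string replacements do, here is a small sketch that applies them to a pandas Series of file names. The file names are hypothetical; real Ahrefs exports may be named differently.

```python
import pandas as pd

# Hypothetical export file names, for illustration only
filenames = pd.Series([
    'www.example.co.uk-refdomains-subdomains.csv',
    'another-site.com.csv',
])

sites = (
    filenames
    .str.replace('www.', '', regex=False)   # drop the www. prefix
    .str.replace('.csv', '', regex=False)   # drop the file extension
    .str.replace('-.+', '', regex=True)     # drop everything from the first hyphen onward
)

print(sites.tolist())
```

Note that the last replacement cuts at the first hyphen, so a domain that itself contains a hyphen (the second example) gets truncated. If any competitor domain is hyphenated, the pattern would need adjusting.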

Now we have the raw data from each site in a single dataframe. The next step is to tidy up the column names and make them a bit friendlier to work with.

Although the repetition could be eliminated with a custom function or a list comprehension, it’s good practice and easier for beginner SEO Pythonistas to see what’s happening step by step. As they say, “repetition is the mother of mastery,” so get practicing!

competitor_ahrefs_cleancols = ahrefs_df_raw
competitor_ahrefs_cleancols.columns = [col.lower() for col in competitor_ahrefs_cleancols.columns]
competitor_ahrefs_cleancols.columns = [col.replace(' ','_') for col in competitor_ahrefs_cleancols.columns]
competitor_ahrefs_cleancols.columns = [col.replace('.','_') for col in competitor_ahrefs_cleancols.columns]
competitor_ahrefs_cleancols.columns = [col.replace('__','_') for col in competitor_ahrefs_cleancols.columns]
competitor_ahrefs_cleancols.columns = [col.replace('(','') for col in competitor_ahrefs_cleancols.columns]
competitor_ahrefs_cleancols.columns = [col.replace(')','') for col in competitor_ahrefs_cleancols.columns]
competitor_ahrefs_cleancols.columns = [col.replace('%','') for col in competitor_ahrefs_cleancols.columns]
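For readers who want the shorter version once the steps are clear, the same replacements can be folded into one custom function. This is a minimal sketch; the function name clean_colname and the example column names are illustrative assumptions.

```python
def clean_colname(col):
    """Lower-case a column name and apply the same replacements, in the same order, as the step-by-step version."""
    col = col.lower()
    for old, new in [(' ', '_'), ('.', '_'), ('__', '_'), ('(', ''), (')', ''), ('%', '')]:
        col = col.replace(old, new)
    return col


# Illustrative column names
example_cols = ['Dofollow ref. domains', 'First seen', 'DR']
cleaned = [clean_colname(c) for c in example_cols]
print(cleaned)
```

On the real dataframe, this would be applied with a single list comprehension over the columns, in place of the seven separate assignments.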

The count column and having a single value column (‘project’) are useful for groupby and aggregation operations.

competitor_ahrefs_cleancols['rd_count'] = 1
competitor_ahrefs_cleancols['project'] = target_name

Ahrefs competitor data. Image from Ahrefs, May 2022

The columns are cleaned up, so now we’ll clean up the row data.

competitor_ahrefs_clean_dtypes = competitor_ahrefs_cleancols

For referring domains, we’re replacing hyphens with zero and setting the data type as an integer (i.e., whole number).

This will be repeated for linked domains, also.

# referring domains
competitor_ahrefs_clean_dtypes['dofollow_ref_domains'] = np.where(competitor_ahrefs_clean_dtypes['dofollow_ref_domains'] == '-',
                                                           0, competitor_ahrefs_clean_dtypes['dofollow_ref_domains'])
competitor_ahrefs_clean_dtypes['dofollow_ref_domains'] = competitor_ahrefs_clean_dtypes['dofollow_ref_domains'].astype(int)

# linked_domains
competitor_ahrefs_clean_dtypes['dofollow_linked_domains'] = np.where(competitor_ahrefs_clean_dtypes['dofollow_linked_domains'] == '-',
                                                           0, competitor_ahrefs_clean_dtypes['dofollow_linked_domains'])
competitor_ahrefs_clean_dtypes['dofollow_linked_domains'] = competitor_ahrefs_clean_dtypes['dofollow_linked_domains'].astype(int)
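The hyphen-to-zero conversion is easy to check on a toy column before running it on the full export. The values below are made up; the sketch assumes the column arrives as text because of the '-' placeholders.

```python
import numpy as np
import pandas as pd

# Toy column mixing numeric strings and the '-' placeholder
toy = pd.DataFrame({'dofollow_ref_domains': ['12', '-', '3']})

# Swap the placeholder for zero, then cast the whole column to integers
toy['dofollow_ref_domains'] = np.where(toy['dofollow_ref_domains'] == '-',
                                       0, toy['dofollow_ref_domains'])
toy['dofollow_ref_domains'] = toy['dofollow_ref_domains'].astype(int)

print(toy['dofollow_ref_domains'].tolist())
```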


First seen gives us a date point at which links were found, which we can use for time series plotting and deriving the link age.

We’ll convert to date format using the to_datetime function.

# first_seen
competitor_ahrefs_clean_dtypes['first_seen'] = pd.to_datetime(competitor_ahrefs_clean_dtypes['first_seen'], 
                                                              format="%d/%m/%Y %H:%M")
competitor_ahrefs_clean_dtypes['first_seen'] = competitor_ahrefs_clean_dtypes['first_seen'].dt.normalize()
competitor_ahrefs_clean_dtypes['month_year'] = competitor_ahrefs_clean_dtypes['first_seen'].dt.to_period('M')
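A quick sketch of what those three steps produce, using two made-up timestamps in the day/month/year format the code expects:

```python
import pandas as pd

# Made-up 'first seen' strings in day/month/year hour:minute format
first_seen = pd.Series(['03/05/2022 14:25', '17/11/2021 09:00'])

parsed = pd.to_datetime(first_seen, format='%d/%m/%Y %H:%M')
dates = parsed.dt.normalize()       # keep the date, reset the time to midnight
months = dates.dt.to_period('M')    # calendar month, handy for grouping

print(dates.dt.strftime('%Y-%m-%d').tolist())
print(months.astype(str).tolist())
```

If an export uses a different date layout, the format string would need to change to match.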

To calculate the link_age, we’ll simply deduct the first seen date from today’s date and convert the difference into a number of days.

# link age in days
competitor_ahrefs_clean_dtypes['link_age'] = pd.Timestamp.now() - competitor_ahrefs_clean_dtypes['first_seen']
# Dividing a timedelta by one day gives the age as a number of days
competitor_ahrefs_clean_dtypes['link_age'] = (competitor_ahrefs_clean_dtypes['link_age'] / pd.Timedelta(days=1)).round(0)
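To make the link age arithmetic concrete, the sketch below uses a fixed reference date instead of today’s date so the result is reproducible. The dates and the as_of variable are illustrative assumptions.

```python
import pandas as pd

first_seen = pd.Series(pd.to_datetime(['2022-05-03', '2021-11-17']))

# Fixed reference date instead of "now", so the output does not change from day to day
as_of = pd.Timestamp('2022-05-10')

# Subtracting gives timedeltas; dividing by one day turns them into a number of days
link_age = ((as_of - first_seen) / pd.Timedelta(days=1)).round(0)

print(link_age.tolist())
```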

The target column helps us distinguish the “client” site from competitors, which is useful for visualization later.

competitor_ahrefs_clean_dtypes['target'] = np.where(competitor_ahrefs_clean_dtypes['site'].str.contains('johns'),
                                                                                            1, 0)
competitor_ahrefs_clean_dtypes['target'] = competitor_ahrefs_clean_dtypes['target'].astype('category')

Ahrefs clean data types. Image from Ahrefs, May 2022

Now that the data is cleaned up, both in terms of column titles and row values, we’re ready to set forth and start analyzing.

Link Quality

We start with link quality, for which we’ll accept Domain Rating (DR) as the measure.

Let’s start by inspecting the distributive properties of DR by plotting their distribution using the geom_boxplot function.
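The target flag can be checked on a two-row toy frame. The competitor domain here is a placeholder, not one of the sites in the analysis.

```python
import numpy as np
import pandas as pd

# One client row and one placeholder competitor row
toy = pd.DataFrame({'site': ['johnsankey.co.uk', 'example.com']})

# Flag rows whose site name contains the substring 'johns'
toy['target'] = np.where(toy['site'].str.contains('johns'), 1, 0)
toy['target'] = toy['target'].astype('category')

print(toy)
```

Because str.contains does a substring match, any competitor whose domain also contains 'johns' would be flagged as the client, so a longer or exact match may be safer in practice.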

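# Assumption: the analysis dataframe is the cleaned dataframe from the previous step
competitor_ahrefs_analysis = competitor_ahrefs_clean_dtypes

comp_dr_dist_box_plt = (
    ggplot(competitor_ahrefs_analysis.loc[competitor_ahrefs_analysis['dr'] > 0],
           aes(x = 'reorder(site, dr)', y = 'dr', colour = 'target')) +
    geom_boxplot(alpha = 0.6) +
    scale_y_continuous() +
    theme(legend_position = 'none',
          axis_text_x = element_text(rotation=90, hjust=1))
)

# The output file name is missing from the source; plot_filename is a placeholder to set yourself
comp_dr_dist_box_plt.save(filename = plot_filename,
                          height=5, width=10, units="in", dpi=1000)

Competition distribution types. Image from Ahrefs, May 2022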

The plot compares the sites’ statistical properties side by side, most notably the interquartile range, which shows where most referring domains fall in terms of domain rating.

We also see that John Sankey has the fourth-highest median domain rating, so its link quality compares well against the other sites.

William Garvey has the most diverse range of DR compared with other domains, indicating ever so slightly more relaxed criteria for link acquisition. Who knows.
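The medians and interquartile ranges read off the boxplot can also be computed as numbers, which makes it easier to rank sites. The sketch below uses toy DR values for two placeholder sites; on the real data, the same groupby would run on the site and dr columns.

```python
import pandas as pd

# Toy DR values for two placeholder sites
toy = pd.DataFrame({
    'site': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
    'dr':   [10, 20, 30, 40, 50, 60, 70, 80],
})

# Quartiles per site, one row per site and one column per quartile
summary = toy.groupby('site')['dr'].quantile([0.25, 0.5, 0.75]).unstack()
summary.columns = ['q1', 'median', 'q3']
summary['iqr'] = summary['q3'] - summary['q1']

# Highest median DR first
summary = summary.sort_values('median', ascending=False)
print(summary)
```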

Link Volumes

That’s quality. What about the volume of links from referring domains?

To tackle that, we’ll compute a running sum of referring domains using the groupby function.

competitor_count_cumsum_df = competitor_ahrefs_analysis

competitor_count_cumsum_df = competitor_count_cumsum_df.groupby(['site', 'month_year'])['rd_count'].sum().reset_index()

The expanding function allows the calculation window to grow with the number of rows, which is how we achieve our running sum.

competitor_count_cumsum_df['count_runsum'] = competitor_count_cumsum_df['rd_count'].expanding().sum()

Ahrefs cumulative sum data. Image from Ahrefs, May 2022

The result is a data frame with the site, month_year and count_runsum (the running sum), which is in the perfect format to feed the graph.
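One thing worth checking: expanding().sum() on the whole column accumulates across every row in the frame, so each site’s running sum carries over the totals of the sites listed before it. If the intent is a running sum that restarts for each site, a grouped cumulative sum is one way to get it. The sketch below compares the two on toy data; the column names runsum_all_rows and runsum_per_site are made up for the comparison.

```python
import pandas as pd

# Toy monthly counts for two placeholder sites
toy = pd.DataFrame({
    'site': ['a', 'a', 'a', 'b', 'b'],
    'month_year': pd.PeriodIndex(['2021-01', '2021-02', '2021-03', '2021-01', '2021-02'], freq='M'),
    'rd_count': [2, 1, 4, 5, 3],
})
toy = toy.sort_values(['site', 'month_year'])

# Running sum over every row, regardless of site
toy['runsum_all_rows'] = toy['rd_count'].expanding().sum()

# Running sum that restarts for each site
toy['runsum_per_site'] = toy.groupby('site')['rd_count'].cumsum()

print(toy)
```

In the toy output, site b’s first month shows 12 in the all-rows column but 5 in the per-site column, which is the kind of difference that would change how a site’s starting position looks on the chart.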

competitor_count_cumsum_plt = (
    ggplot(competitor_count_cumsum_df, aes(x = 'month_year', y = 'count_runsum',
                                           group = 'site', colour = 'site')) +
    geom_line(alpha = 0.6, size = 2) +
    labs(y = 'Running Sum of Referring Domains', x = 'Month Year') +
    scale_y_continuous() +
    scale_x_date() +
    theme(legend_position = 'right',
          axis_text_x = element_text(rotation=90, hjust=1))
)

# The output file name is missing from the source; plot_filename is a placeholder to set yourself
competitor_count_cumsum_plt.save(filename = plot_filename,
                                 height=5, width=10, units="in", dpi=1000)

Competitor graph. Image from Ahrefs, May 2022

The plot shows the number of referring domains for each site since 2014.

I find the different starting positions for each site, when they start acquiring links, quite interesting.

For example, William Garvey started with over 5,000 domains. I’d love to know who their PR agency is!

We can also see the rate of growth. For example, although Hadley Rose started link acquisition in 2018, things really took off around mid-2021.

More, More, And More

You can always do more scientific analysis.

For example, one immediate and natural extension of the above would be to combine both the quality (DR) and the quantity (volume) for a more holistic view of how the sites compare in terms of offsite SEO.

Other extensions would be to model the qualities of those referring domains for both your own and your competitor sites to see which link features (such as the number of words or relevance of the linking content) could explain the difference in visibility between you and your competitors.

This model extension would be an application of machine learning techniques.
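As a rough sketch of that combination, one option is to multiply a per-site quality measure by a per-site quantity measure, in line with the “product of quality and quantity” definition at the start. The choice of median DR for quality, the referring domain count for quantity, and the column name link_value are all assumptions made for illustration, and the data is a toy example.

```python
import pandas as pd

# Toy referring-domain rows for two placeholder sites
toy = pd.DataFrame({
    'site': ['a', 'a', 'a', 'b', 'b'],
    'dr': [10, 30, 50, 70, 90],
    'rd_count': [1, 1, 1, 1, 1],
})

# Quality: median DR per site. Quantity: number of referring domains per site.
site_value = toy.groupby('site').agg(
    median_dr=('dr', 'median'),
    ref_domains=('rd_count', 'sum'),
)

# One possible combined score: quality multiplied by quantity
site_value['link_value'] = site_value['median_dr'] * site_value['ref_domains']

print(site_value.sort_values('link_value', ascending=False))
```

Here site b scores higher despite having fewer referring domains, because its links come from much higher-rated domains.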

Featured Image: F8 studio/Shutterstock

