Skip to the content.

Top YouTubers in UK for 2024

Introduction

The dataset provides insights into the top YouTubers in the UK for 2024, focusing on three key metrics: Total Videos, Total Views, and Total Subscribers.

Data Source

The dataset is sourced from Kaggle, but it required significant cleaning and preprocessing.

Tools & Technologies

Methodology

  1. Design: Defined the project objectives, identified relevant metrics, and planned the data workflow.
  2. Development: Processed and cleaned the raw data, then loaded it into SQL for analysis.
  3. Testing: Verified data accuracy and ensured that the key metrics aligned with the project goals.
  4. Analysis: Performed detailed analysis and created visualizations to present insights into the top UK YouTubers of 2024.

SQL Query Process

In this project, I used SQL to clean and query the data effectively. Here’s a breakdown of the steps:

Step 1: Loading the Data

First, I uploaded the raw dataset into an SQL database to make querying easier. This dataset included information about YouTubers such as total videos, total views, total subscribers, and more.

Step 2: Clean the Data

/*
  Select the required columns: channel name, total subscribers, total views, and total videos.
  Extract the channel name from the 'NOMBRE' column by removing the part after the '@' symbol.
*/

-- 1. Select the necessary columns and clean the channel name
SELECT
    SUBSTRING(NOMBRE, 1, CHARINDEX('@', NOMBRE) - 1) AS channel_name,  -- Extract channel name before '@'
    total_subscribers,
    total_views,
    total_videos
FROM
    top_uk_youtubers_2024;


## Step 2: Clean the Data

```sql
/*
  Select the required columns: channel name, total subscribers, total views, and total videos.
  Extract the channel name from the 'NOMBRE' column by removing the part after the '@' symbol.
*/

-- 1. Select the necessary columns and clean the channel name
SELECT
    SUBSTRING(NOMBRE, 1, CHARINDEX('@', NOMBRE) - 1) AS channel_name,  -- Extract channel name before '@'
    total_subscribers,
    total_views,
    total_videos
FROM
    top_uk_youtubers_2024;

step 3: Create The SQL View

  
CREATE VIEW view_Top_youtubers_2024 As
select CAST (SUBSTRING (NOMBRE, 1, CHARINDEX('@', NOMBRE) -1 ) AS varchar(100)) as channel_name,
    total_subscribers,
		total_views,
		total_videos
from youtube_data_from_python

Stpe 4: Data Quality Test

/*

Data quality tests

  1. The data needs to be 100 records of Youtube channels (row count test) 2.The data needs 4 fields (column count test)
  2. The channek name column must be string format, and the other columns must be numerical data types (data type check)
  3. Each record must be unique in the dataset (duplicate count check)

Row count - 100 (pass) Column count - 4

Data types Channel_name = VARCHER total_subscribers = INTGER total_views = INTGER total_video = INTGER

Duplicate count = 0

*/

1. 
/*
# Count the total number of records (or rows) are in the SQL view
*/

SELECT
    COUNT(*) AS no_of_rows
FROM
    view_uk_youtubers_2024;

2. /*
# Count the total number of columns (or fields) are in the SQL view
*/


SELECT
    COUNT(*) AS column_count
FROM
    INFORMATION_SCHEMA.COLUMNS
WHERE
    TABLE_NAME = 'view_uk_youtubers_2024'
3. /*
# Check the data types of each column from the view by checking the INFORMATION SCHEMA view
*/
SELECT
    COLUMN_NAME,
    DATA_TYPE
FROM
    INFORMATION_SCHEMA.COLUMNS
WHERE
    TABLE_NAME = 'view_uk_youtubers_2024';
4. /*
# 1. Check for duplicate rows in the view
# 2. Group by the channel name
# 3. Filter for groups with more than one row
*/

-- 1.
SELECT
    channel_name,
    COUNT(*) AS duplicate_count
FROM
    view_uk_youtubers_2024

-- 2.
GROUP BY
    channel_name

-- 3.
HAVING
    COUNT(*) > 1;

Result

Dashboard

DAX Measures

// Total Subscribers (M)
Total Subscribers (M) = 
VAR million = 1000000
VAR sumOfSubscribers = SUM(view_uk_youtubers_2024[total_subscribers])
VAR totalSubscribers = DIVIDE(sumOfSubscribers, million)
RETURN totalSubscribers

// Total Views (B)
Total Views (B) = 
VAR billion = 1000000000
VAR sumOfTotalViews = SUM(view_uk_youtubers_2024[total_views])
VAR totalViews = ROUND(sumOfTotalViews / billion, 2)
RETURN totalViews
// Total Videos
Total Videos = 
VAR totalVideos = SUM(view_uk_youtubers_2024[total_videos])
RETURN totalVideos

// Average Views Per Video (M)
Average Views per Video (M) = 
VAR sumOfTotalViews = SUM(view_uk_youtubers_2024[total_views])
VAR sumOfTotalVideos = SUM(view_uk_youtubers_2024[total_videos])
VAR avgViewsPerVideo = DIVIDE(sumOfTotalViews, sumOfTotalVideos, BLANK())
VAR finalAvgViewsPerVideo = DIVIDE(avgViewsPerVideo, 1000000, BLANK())
RETURN finalAvgViewsPerVideo
// Subscriber Engagement Rate
Subscriber Engagement Rate = 
VAR sumOfTotalSubscribers = SUM(view_uk_youtubers_2024[total_subscribers])
VAR sumOfTotalVideos = SUM(view_uk_youtubers_2024[total_videos])
VAR subscriberEngRate = DIVIDE(sumOfTotalSubscribers, sumOfTotalVideos, BLANK())
RETURN subscriberEngRate 
// Views Per Subscriber
Views Per Subscriber = 
VAR sumOfTotalViews = SUM(view_uk_youtubers_2024[total_views])
VAR sumOfTotalSubscribers = SUM(view_uk_youtubers_2024[total_subscribers])
VAR viewsPerSubscriber = DIVIDE(sumOfTotalViews, sumOfTotalSubscribers, BLANK())
RETURN viewsPerSubscriber