This project demonstrates the end-to-end process of extracting data from YouTube using the YouTube API and analyzing it with Python in Jupyter Notebook. By gathering video metrics and related data, we can explore various insights, such as content trends, engagement metrics, and audience behavior. This project highlights my skills in API data extraction, data analysis, and data visualization to deliver meaningful insights.
- Step 1: Create a Project in Google Cloud Console
- Step 2: Enable the YouTube Data API v3
- Step 3: Create an API Key
- Step 4: Store Your API Key in the Project
With the rise of video content, YouTube has become a valuable data source for understanding online engagement patterns. This repository covers gathering and analyzing YouTube data with Python. Whether you're a data analyst, developer, or curious enthusiast, this project walks you through retrieving data from the YouTube API and analyzing it.
What to expect
- Accessing YouTube’s API to pull in data such as video details, comments, likes, views, and more.
- Data Processing in Python to clean and organize the data for meaningful analysis.
- Data Visualization & Insights using libraries like pandas, matplotlib, and seaborn.
Prerequisites
- Basic knowledge of Python
- Some familiarity with API requests
- Extract YouTube Data: Use the YouTube Data API to collect relevant information such as view counts, likes, comments, and video details.
- Data Wrangling and Data Cleaning: Process and clean the data to ensure consistency and usability.
- Perform Data Analysis: Analyze metrics like viewer engagement, subscriber count, number of videos uploaded, and trending topics.
- Visualize Insights: Present data findings through visualizations to convey key insights clearly.
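For the extraction step, one practical detail is that a single channel request only accepts a limited number of IDs. The sketch below splits a long ID list into batches. It assumes a cap of 50 IDs per request; `chunk_ids`, `MAX_IDS_PER_REQUEST`, and the example IDs are hypothetical names introduced here for illustration.

```python
from typing import Iterator, List

# Assumption: at most 50 IDs are accepted per channels().list call.
MAX_IDS_PER_REQUEST = 50


def chunk_ids(channel_ids: List[str], size: int = MAX_IDS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield successive slices of channel_ids, each with at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(channel_ids), size):
        yield channel_ids[start:start + size]


# Made-up IDs, used only to show the batching behaviour.
example_ids = [f"UC_example_{n}" for n in range(120)]
batches = list(chunk_ids(example_ids))
print([len(batch) for batch in batches])  # [50, 50, 20]
```

Each batch could then be passed to the fetch function separately and the results concatenated.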
- Python
- Jupyter Notebook
- YouTube Data API v3
- pandas
- numpy
- matplotlib
- seaborn
- requests
- google-auth
- YouTube Data API v3
- Setting up the YouTube API access and authentication.
- Using API calls to gather data, including video statistics, channel information, and comments.
- Cleaning and organizing the extracted data.
Conducting analyses such as:
- Top-performing videos and channels
- View and engagement patterns over time
- Audience demographics and regional insights
- Visualizing data using charts to highlight trends and findings.
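The cleaning and analysis steps listed above can be sketched with pandas. This is a minimal illustration, not the notebook's exact code: the two sample records are made up, and `views_per_video` is a hypothetical metric chosen for the example. It relies on the fact that the API returns counts as strings, so they need converting before any arithmetic.

```python
import pandas as pd

# Made-up records in the shape of a channel statistics result;
# the counts arrive as strings, not numbers.
sample = [
    {"channel_title": "Channel A", "created_date": "2015-03-01T12:00:00Z",
     "subscribers": "1000", "total_videos": "50", "total_views": "200000"},
    {"channel_title": "Channel B", "created_date": "2019-07-15T08:30:00Z",
     "subscribers": "5000", "total_videos": "20", "total_views": "150000"},
]

df = pd.DataFrame(sample)

# Cleaning: convert count columns to numbers and the creation date to a datetime.
count_cols = ["subscribers", "total_videos", "total_views"]
df[count_cols] = df[count_cols].apply(pd.to_numeric)
df["created_date"] = pd.to_datetime(df["created_date"])

# Analysis: a simple per-channel metric (hypothetical, for illustration).
df["views_per_video"] = df["total_views"] / df["total_videos"]

# Rank channels from highest to lowest views per video.
top = df.sort_values("views_per_video", ascending=False).reset_index(drop=True)
print(top[["channel_title", "views_per_video"]])
```

From here, a bar chart such as `sns.barplot(data=top, x="channel_title", y="views_per_video")` is one way to visualize the ranking.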
- Make sure you are signed in to your Google account.
- Go to the Google Cloud Console.

- If you don't already have a project, click on Select Project in the top navigation bar, then create New Project.

- Note: I had already created a project named YouTube API, as shown. In that case, click the project drop-down list and create a new project from there.
- In your Google Cloud Console, go to the APIs & Services dashboard.

- Click on + ENABLE APIS AND SERVICES.

- In the search bar, type "YouTube Data API v3" and select it from the results.

- Click Enable to activate the API for your project.

- After enabling the API, go back to the APIs & Services dashboard.

- Click on Credentials in the left sidebar.

- Select + CREATE CREDENTIALS and choose API Key.

- Copy the generated API key.
- Note: I deleted this API key after creating it.
- In the root directory of the project, create a .env file.
- Open the .env file and add your API key like so:
YOUTUBE_API_KEY=YOUR_API_KEY
Note: The .env file is included in .gitignore to keep your API key private. Do not share or commit this file in a public repository.
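Because a missing or misnamed variable otherwise only surfaces as a failed API call, it can help to fail fast when the key is not available. The sketch below is one way to do that; `require_env` is a hypothetical helper, not part of this project, and it assumes the .env file has already been loaded into the environment.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file and load it before calling the API."
        )
    return value
```

A call such as `require_env("YOUTUBE_API_KEY")` would then either return the key or stop with a clear message.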
- Import the Python libraries needed for this project
import os
from googleapiclient.discovery import build
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
plt.style.use('ggplot')
- Write a Python function to fetch data from YouTube in JSON format
def channel_data(api_key, channel_id):
    """Return a list of dicts with basic statistics for each channel ID."""
    all_data = []
    youtube = build('youtube', 'v3', developerKey=api_key)
    # Request snippet, contentDetails, and statistics for all channels in one call
    request = youtube.channels().list(
        part='snippet,contentDetails,statistics',
        id=','.join(channel_id)
    )
    response = request.execute()
    # 'items' may be missing when no channel matches, so default to an empty list
    for item in response.get('items', []):
        data = dict(
            channel_title=item['snippet']['title'],
            created_date=item['snippet']['publishedAt'],
            subscribers=item['statistics']['subscriberCount'],
            total_videos=item['statistics']['videoCount'],
            total_views=item['statistics']['viewCount'],
            playlist_id=item['contentDetails']['relatedPlaylists']['uploads']
        )
        all_data.append(data)
    return all_data
- Define your API key and channel IDs
- Import the library needed to read the API key saved in the .env file
from dotenv import load_dotenv
# Load the .env file
load_dotenv()
# Access the API key (the name must match the variable defined in .env)
API_KEY = os.getenv("YOUTUBE_API_KEY")
- To find a channel ID, you can use Tune Pocket, which returns the channel ID when you type in the channel name.
CHANNEL_ID = ['UChQXn6sL9ENIpA74qqPG1HA',
'UCaWu4TkcsWcZbw0Pg26OltQ',
'UC6fVFxrbf0HDRW3B2mdWFGA',
'UCE3KVkSH1GwUtAAMcVcJ3QQ',
'UCFBoqaPTCtGJi8kr7pV33Tg',
'UCJ7F5LT-7h8Hfplf6BTTiXg',
'UC7h4tUtdH0L06sDZVmBMc4Q',
'UC5h4-WH0LAV4CWs380yM33A',
'UCx1WDOZzmyIa1MlK1W3RdOg',
'UCgSP5G3RmKJl72aA2lBV_Jw',
'UCVfZr3RQTqRgYQkA-eXAxiA',
'UCPUMDSDu_WC8LVzWjiyVgNQ']
- Call channel_data() to fetch the statistics for these channels and inspect the returned data
channel_stat = channel_data(API_KEY, CHANNEL_ID)
channel_stat
- Automate Data Collection: Schedule regular data updates using cron jobs or serverless functions.
- Expand Dataset: Include additional social media APIs to gather broader context.
- Integrate Machine Learning: Develop models to predict engagement or trend patterns based on historical data.
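As a starting point for the automation idea, a scheduled job needs somewhere to accumulate results between runs. The sketch below appends each run's records to a CSV file with a UTC timestamp. `append_snapshot` and the file layout are hypothetical, and the function assumes every record has the same keys, as the dicts returned by channel_data() do.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List


def append_snapshot(rows: List[Dict[str, str]], csv_path: Path) -> int:
    """Append rows to csv_path with a UTC timestamp column; return the number of rows written."""
    if not rows:
        return 0
    fetched_at = datetime.now(timezone.utc).isoformat()
    fieldnames = ["fetched_at"] + list(rows[0].keys())
    # Write the header only when the file is new or empty.
    write_header = not csv_path.exists() or csv_path.stat().st_size == 0
    with csv_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        for row in rows:
            writer.writerow({"fetched_at": fetched_at, **row})
    return len(rows)
```

A script run by cron or a serverless function could call channel_data() and pass the result to `append_snapshot`, building up the historical data that trend analysis or a prediction model would need.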


