allan-pg/Data-Scraping-Analysis-with-YouTube-API-in-Python

Data Scraping & Analysis with YouTube API in Python

Project Overview

This project demonstrates the end-to-end process of extracting data from YouTube using the YouTube API and analyzing it with Python in Jupyter Notebook. By gathering video metrics and related data, we can explore various insights, such as content trends, engagement metrics, and audience behavior. This project highlights my skills in API data extraction, data analysis, and data visualization to deliver meaningful insights.

Table of Contents

  1. Introduction
  2. Project Objectives
  3. Technologies & Tools
  4. Project Workflow
  5. Set up your YouTube API Key
  6. Data Extraction

Introduction

With the rise of video content, YouTube has become a valuable data source for understanding online engagement patterns. In this repository I gather and analyze data from YouTube using Python. Whether you're a data analyst, developer, or curious enthusiast, this project walks you through pulling data from the YouTube Data API and analyzing it.

What to expect

  1. Accessing YouTube’s API to pull in data such as video details, comments, likes, views, and more.
  2. Data Processing in Python to clean and organize the data for meaningful analysis.
  3. Data Visualization & Insights using libraries like pandas, matplotlib, and seaborn.

Prerequisites

  • Basic knowledge of Python
  • Some familiarity with API requests

Project Objectives

  • Extract YouTube Data: Use the YouTube Data API to collect relevant information such as view counts, likes, comments, and video details.
  • Data Wrangling and Data Cleaning: Process and clean the data to ensure consistency and usability.
  • Perform Data Analysis: Analyze metrics like viewer engagement, subscriber count, number of videos uploaded, and trending topics.
  • Visualize Insights: Present data findings through visualizations to convey key insights clearly.

Technologies & Tools

Languages

  • Python

Tools

  • Jupyter Notebook
  • YouTube Data API v3

Libraries

  • pandas
  • numpy
  • matplotlib
  • seaborn
  • requests
  • google-auth
  • google-api-python-client
  • python-dotenv

API

  • YouTube Data API v3

Project Workflow

API Setup

  • Setting up the YouTube API access and authentication.

Data Extraction

  • Using API calls to gather data, including video statistics, channel information, and comments.
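Gathering video-level data usually starts from a channel's uploads playlist. The sketch below shows one way to page through a playlist with the API's `playlistItems().list` call and its `nextPageToken` field. The function name `fetch_video_ids` and the `page_size` default are my own illustrative choices, and `youtube` is assumed to be a client object created with `googleapiclient.discovery.build`.

```python
def fetch_video_ids(youtube, playlist_id, page_size=50):
    """Collect the video IDs in a playlist, following nextPageToken.

    Assumption: `youtube` is a client returned by
    build('youtube', 'v3', developerKey=...). The function name and
    defaults are hypothetical, not part of this repository.
    """
    video_ids = []
    page_token = None  # None requests the first page
    while True:
        response = youtube.playlistItems().list(
            part='contentDetails',
            playlistId=playlist_id,
            maxResults=page_size,
            pageToken=page_token,
        ).execute()

        # Each item carries the video ID under contentDetails
        for item in response.get('items', []):
            video_ids.append(item['contentDetails']['videoId'])

        # Stop when the API no longer returns a token for a next page
        page_token = response.get('nextPageToken')
        if not page_token:
            break
    return video_ids
```

Taking the client as a parameter keeps the function easy to try out with a stand-in object before spending API quota.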

Data Wrangling

  • Cleaning and organizing the extracted data.

Data Analysis

Conducting analyses such as:

  • Top-performing videos and channels
  • View and engagement patterns over time
  • Audience demographics and regional insights
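As a rough illustration of the "top-performing channels" idea, the sketch below ranks channels by average views per video. It assumes each channel has already been reduced to a dictionary with numeric `total_views` and `total_videos` fields. The function name `rank_channels` and the sample rows are hypothetical placeholders, not real results.

```python
def rank_channels(rows, top_n=3):
    """Return the top_n channels sorted by average views per video."""
    ranked = []
    for row in rows:
        videos = row["total_videos"]
        # Guard against channels with no uploads to avoid dividing by zero
        views_per_video = row["total_views"] / videos if videos else 0.0
        ranked.append({**row, "views_per_video": views_per_video})
    ranked.sort(key=lambda r: r["views_per_video"], reverse=True)
    return ranked[:top_n]


# Made-up sample data, for illustration only
sample = [
    {"channel_title": "Channel A", "total_views": 1_000_000, "total_videos": 200},
    {"channel_title": "Channel B", "total_views": 300_000, "total_videos": 20},
    {"channel_title": "Channel C", "total_views": 50_000, "total_videos": 0},
]

for row in rank_channels(sample):
    print(row["channel_title"], round(row["views_per_video"]))
```

Views per video is only one possible measure of performance; total views or subscriber count could be swapped in by changing the sort key.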

Visualization

  • Visualizing data using charts to highlight trends and findings.

Set up your YouTube API Key

Step 1: Create a Project in Google Cloud Console

  1. Go to the Google Cloud Console.
  2. If you don't already have a project, click Select Project in the top navigation bar, then choose New Project.
    • Note: I already had a project named YouTube API, so I opened the drop-down list and created a new project from there.
  3. Give your project a name, then click Create.

Step 2: Enable the YouTube Data API v3

  1. In your Google Cloud Console, go to the APIs & Services dashboard.
  2. Click on + ENABLE APIS AND SERVICES.
  3. In the search bar, type "YouTube Data API v3" and select it from the results.
  4. Click Enable to activate the API for your project.

Step 3: Create an API Key

  1. After enabling the API, go back to the APIs & Services dashboard.
  2. Click on Credentials in the left sidebar.
  3. Select + CREATE CREDENTIALS and choose API Key.
  4. Copy the generated API key.
    • Note: I deleted the API key used in this walkthrough after creating it.

Step 4: Store Your API Key in the Project

  1. In the root directory of the project, create a .env file.
  2. Open the .env file and add your API key like so:
YOUTUBE_API_KEY=YOUR_API_KEY

Note: The .env file is included in .gitignore to keep your API key private. Do not share or commit this file in a public repository.
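A missing or misnamed key is easy to overlook, so it can help to fail early with a clear message. The sketch below reads the key from the environment and raises if it is empty. The helper name `get_api_key` is hypothetical, and it assumes the variable has already been loaded into the environment (for example with `load_dotenv()` from python-dotenv).

```python
import os


def get_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the API key stored in env_var, or raise if it is missing."""
    # Strip stray whitespace that can sneak in around the value
    key = os.getenv(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Add it to your .env file and load it "
            "before calling the API."
        )
    return key
```

Calling `get_api_key()` once at the top of the notebook surfaces configuration problems before any API request is made.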

Data Extraction

  • Import the Python libraries needed for this project
import os
from googleapiclient.discovery import build
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
plt.style.use('ggplot')
  • Write a Python function that fetches channel data from YouTube (the API responds with JSON) and returns it as a list of dictionaries
def channel_data(api_key, channel_id):
    """Return one dictionary of basic statistics per channel ID in channel_id."""
    youtube = build('youtube', 'v3', developerKey=api_key)

    # Request the snippet, contentDetails, and statistics parts for all
    # channels in a single call; the IDs are sent as a comma-separated string
    request = youtube.channels().list(
        part='snippet,contentDetails,statistics',
        id=','.join(channel_id)
    )
    response = request.execute()

    all_data = []
    for item in response['items']:
        data = dict(channel_title=item['snippet']['title'],
                    created_date=item['snippet']['publishedAt'],
                    subscribers=item['statistics']['subscriberCount'],
                    total_videos=item['statistics']['videoCount'],
                    total_views=item['statistics']['viewCount'],
                    # The uploads playlist lists every video on the channel
                    playlist_id=item['contentDetails']['relatedPlaylists']['uploads']
                   )
        all_data.append(data)
    return all_data
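The function above joins every channel ID into a single request. My understanding is that the API caps the number of IDs per call at 50, so treat that figure as an assumption and check the API reference. For longer lists, one option is to split the IDs into batches first. The helper names `chunked` and `build_id_params` below are hypothetical.

```python
def chunked(ids, size=50):
    """Yield successive lists of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_id_params(channel_ids, size=50):
    """Return one comma-separated id string per batch of channel IDs."""
    return [','.join(chunk) for chunk in chunked(channel_ids, size)]
```

Each string returned by `build_id_params` could then be passed as the `id` argument of a separate `channels().list` request, with the results appended to one list.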
  • Define your API key and channel IDs
  • Load the API key saved in the .env file (this uses the python-dotenv package)
from dotenv import load_dotenv

# Load the .env file
load_dotenv()

# Access the API key under the same name used in the .env file
api_key = os.getenv("YOUTUBE_API_KEY")
  • To find a channel ID, you can use Tune Pocket, which looks up the ID when you type in the name of the channel.
CHANNEL_ID = ['UChQXn6sL9ENIpA74qqPG1HA',
              'UCaWu4TkcsWcZbw0Pg26OltQ',
              'UC6fVFxrbf0HDRW3B2mdWFGA',
              'UCE3KVkSH1GwUtAAMcVcJ3QQ',
              'UCFBoqaPTCtGJi8kr7pV33Tg',
              'UCJ7F5LT-7h8Hfplf6BTTiXg',
              'UC7h4tUtdH0L06sDZVmBMc4Q',
              'UC5h4-WH0LAV4CWs380yM33A',
              'UCx1WDOZzmyIa1MlK1W3RdOg',
              'UCgSP5G3RmKJl72aA2lBV_Jw',
              'UCVfZr3RQTqRgYQkA-eXAxiA',
              'UCPUMDSDu_WC8LVzWjiyVgNQ']
  • Call channel_data() to fetch the statistics and inspect the returned list of dictionaries
channel_stat = channel_data(api_key, CHANNEL_ID)
channel_stat
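The API returns the counts as strings and the creation date as an ISO 8601 timestamp, so a small cleaning step is useful before any analysis. The sketch below converts one row of the shape produced by `channel_data()`. The helper name `clean_channel_row` and the sample row are hypothetical, and the timestamp format is assumed to match the example shown.

```python
from datetime import datetime

NUMERIC_FIELDS = ("subscribers", "total_videos", "total_views")


def clean_channel_row(row):
    """Return a copy of row with integer counts and a datetime created_date."""
    cleaned = dict(row)
    for field in NUMERIC_FIELDS:
        value = cleaned.get(field)
        # Keep missing values as None instead of failing on int(None)
        cleaned[field] = int(value) if value is not None else None
    # Replace the trailing 'Z' with an explicit UTC offset so that
    # datetime.fromisoformat accepts it on older Python versions
    cleaned["created_date"] = datetime.fromisoformat(
        cleaned["created_date"].replace("Z", "+00:00")
    )
    return cleaned


# Made-up row, for illustration only
example = {
    "channel_title": "Example Channel",
    "created_date": "2015-02-24T18:25:43Z",
    "subscribers": "1000",
    "total_videos": "10",
    "total_views": "50000",
    "playlist_id": "UU_example",
}
print(clean_channel_row(example))
```

Applying it to every row, for example `[clean_channel_row(r) for r in channel_stat]`, gives a list that can be passed directly to `pd.DataFrame`.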

Future Improvements

  • Automate Data Collection: Schedule regular data updates using cron jobs or serverless functions.
  • Expand Dataset: Include additional social media APIs to gather broader context.
  • Integrate Machine Learning: Develop models to predict engagement or trend patterns based on historical data.
