This project analyzes customer churn in a telecommunications company. The dataset contains customer demographics, service usage, and contract details to identify patterns associated with churn.
The goal is to:
- Perform Exploratory Data Analysis (EDA) to identify trends.
- Handle data preprocessing and feature engineering.
- Use visualizations for insights.
- Optionally, apply machine learning models to predict churn.
The dataset includes:
- Customer ID: Unique identifier.
- Demographics: Gender, senior citizen status, partner, and dependents.
- Service Information: Internet service, online security, streaming TV, etc.
- Contract Details: Contract type, paperless billing, payment method.
- Churn Label: Whether the customer left the service (Yes or No).
Data Cleaning Steps:
- Handled missing values.
- Converted categorical variables.
- Engineered new features for analysis.
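The cleaning steps above might look roughly like the following. This is a minimal sketch on made-up toy data, not the notebook's actual code: the column names `TotalCharges` and `Partner`, the blank-string missing value, and the median fill are all assumptions chosen for illustration.

```python
import pandas as pd

# Toy data, made up for illustration; column names are assumptions.
raw = pd.DataFrame({
    "TotalCharges": ["29.85", " ", "108.15", "1889.50"],
    "Partner": ["Yes", "No", "No", "Yes"],
    "Churn": ["No", "Yes", "Yes", "No"],
})


def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: numeric charges, no missing values, 0/1 flags."""
    out = frame.copy()
    # Strings that cannot be parsed (such as blanks) become NaN.
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce")
    # Fill the missing numeric values with the column median.
    out["TotalCharges"] = out["TotalCharges"].fillna(out["TotalCharges"].median())
    # Convert Yes/No columns to 1/0.
    for col in ["Partner", "Churn"]:
        out[col] = out[col].map({"Yes": 1, "No": 0})
    return out


cleaned = clean(raw)
print(cleaned)
```

Filling with the median is only one option; dropping the affected rows is equally reasonable when few values are missing.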
git clone https://github.com/rizz1406/Customer-Churn-Analysis.git
cd Customer-Churn-Analysis

Ensure you have Python 3.x installed, then install the required libraries:
pip install pandas numpy matplotlib seaborn scikit-learn

jupyter notebook
Open Telco Customer Churn.ipynb and execute all cells.

import pandas as pd
df = pd.read_csv("Customer Churn.csv")
df.info() # Dataset structure (info() prints directly and returns None)
print(df.describe()) # Statistical summary
print(df.isnull().sum()) # Check missing values

import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(6,4))
sns.countplot(x='Churn', data=df, palette='coolwarm')
plt.title("Customer Churn Distribution")
plt.show()
Insight: This plot shows the proportion of customers who churned versus those who stayed.

plt.figure(figsize=(10,6))
# numeric_only=True skips text columns, which would raise an error in pandas 2.x
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='Blues')
plt.title("Feature Correlation Heatmap")
plt.show()
Insight: Identifies relationships between the numeric variables.
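Because the churn label is stored as text, it does not appear in a numeric correlation matrix. One way to include it, sketched here on made-up toy data, is to add a 0/1 flag first. The `tenure` column and the `ChurnFlag` name are assumptions for illustration.

```python
import pandas as pd

# Toy data, made up for illustration; 'tenure' is an assumed column name.
toy = pd.DataFrame({
    "tenure": [1, 2, 40, 60, 3, 50],
    "MonthlyCharges": [90.0, 85.0, 40.0, 30.0, 95.0, 45.0],
    "Churn": ["Yes", "Yes", "No", "No", "Yes", "No"],
})

# Hypothetical helper column: 1 if the customer churned, else 0.
toy["ChurnFlag"] = (toy["Churn"] == "Yes").astype(int)

# Correlation of each numeric feature with the churn flag.
corr_with_churn = toy.corr(numeric_only=True)["ChurnFlag"].drop("ChurnFlag")
print(corr_with_churn.sort_values())
```

A correlation with a binary flag only captures linear association, so it is best read as a quick screen rather than a ranking of feature importance.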
Some feature transformations:
- Encoding categorical variables (Yes/No, Male/Female → 0/1).
- Creating new aggregated features.
- Removing redundant columns.
Example transformation:
df['SeniorCitizen'] = df['SeniorCitizen'].map({0: 'No', 1: 'Yes'})
# Encode into a separate frame so the original df stays available for plotting
df_encoded = pd.get_dummies(df, drop_first=True) # Convert categorical to numerical

from sklearn.model_selection import train_test_split
# With drop_first=True, the Yes/No 'Churn' column is encoded as 'Churn_Yes'
X = df_encoded.drop(columns=['Churn_Yes'])
y = df_encoded['Churn_Yes']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")
Insight: This gives a baseline model to predict churn.
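When churners are a minority, accuracy alone can make a baseline look better than it is. The sketch below uses made-up labels (not results from this dataset) to show a trivial predictor that never flags churn yet still scores 80% accuracy, which is why recall and the confusion matrix are worth checking alongside it.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Made-up labels for illustration: 1 = churned, 0 = stayed.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# A trivial "model" that always predicts "stayed".
y_always_stay = [0] * len(y_true)

acc = accuracy_score(y_true, y_always_stay)
rec = recall_score(y_true, y_always_stay)  # share of churners that were caught
cm = confusion_matrix(y_true, y_always_stay)  # rows: actual, columns: predicted

print(f"Accuracy: {acc:.2f}")
print(f"Recall on churners: {rec:.2f}")
print(cm)
```

The same three calls can be applied to `y_test` and `y_pred` from the model above.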
plt.figure(figsize=(8,5))
sns.countplot(x='Contract', hue='Churn', data=df)
plt.title("Churn Rate by Contract Type")
plt.show()
Insight: Customers with month-to-month contracts have a higher churn rate.
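A count plot shows raw counts, which can be hard to compare when contract groups differ in size. Computing the churn rate per group makes the comparison direct. This is a minimal sketch on made-up toy data; the contract labels "One year" and "Two year" are assumptions.

```python
import pandas as pd

# Toy data, made up for illustration.
toy = pd.DataFrame({
    "Contract": ["Month-to-month"] * 4 + ["One year"] * 2 + ["Two year"] * 2,
    "Churn": ["Yes", "Yes", "No", "Yes", "No", "Yes", "No", "No"],
})

# Share of customers who churned within each contract type.
churn_rate = (
    toy["Churn"].eq("Yes")
    .groupby(toy["Contract"])
    .mean()
    .sort_values(ascending=False)
)
print(churn_rate)
```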
plt.figure(figsize=(8,5))
sns.boxplot(x="Churn", y="MonthlyCharges", data=df)
plt.title("Monthly Charges vs Churn")
plt.show()
Insight: Higher monthly charges are associated with increased churn.

- Customers with month-to-month contracts are more likely to churn.
- Senior citizens have a slightly higher churn rate.
- Paperless billing customers churn more frequently.
- Long-term contract customers are more loyal.
Business Recommendation: Offer incentives for long-term contracts to reduce churn.
- Improve feature selection for better model accuracy.
- Implement hyperparameter tuning for the ML model.
- Deploy the model via Flask or Streamlit.
- Feel free to contribute by submitting pull requests.
- Licensed under the MIT License.


