Machine-Learning-with-Python/Tasks at master · sumathi16/Machine-Learning-with-Python

Market_Fact
- Read the Market Fact data into market_df
- Display the top five rows
- Display the last three rows
- Check the total number of entries you have
- Check the number of features
- Print the feature names (column names)
- Print the row indices of the dataframe
- Check the dtype of every column
- Describe the numerical columns: min, max, count, 25%, 50%, 75%, std
- Describe the categorical columns: count, top, freq
- Count the missing values column-wise
- Count the total missing values in the DataFrame
- Check the output for the info function
- Access columns 3, 5, and 7 using iloc, using loc, and without using either of them
- How many unique products are there?
- How many unique customers are there?
- What is the maximum sales value?
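
The Market_Fact tasks above can be sketched roughly as follows. This is a minimal sketch, not a reference solution: the task list gives neither the file name nor the column names, so a small hand-made DataFrame stands in for the `pd.read_csv` call, and every column name and value in it (`Prod_id`, `Cust_id`, `Sales`, and so on) is an assumption made for illustration. "Columns 3, 5, and 7" is read here as 1-based positions.

```python
import numpy as np
import pandas as pd

# Stand-in for market_df = pd.read_csv(...). The real file name and column
# names are not given in the task list, so everything below is hypothetical.
market_df = pd.DataFrame({
    "Ord_id": ["Ord_1", "Ord_2", "Ord_3", "Ord_4", "Ord_5", "Ord_6"],
    "Prod_id": ["Prod_1", "Prod_2", "Prod_1", "Prod_3", "Prod_2", "Prod_1"],
    "Ship_id": ["SHP_1", "SHP_2", "SHP_3", "SHP_4", "SHP_5", "SHP_6"],
    "Cust_id": ["Cust_1", "Cust_1", "Cust_2", "Cust_3", "Cust_3", "Cust_4"],
    "Sales": [120.5, 80.0, np.nan, 310.25, 45.0, 99.9],
    "Discount": [0.05, 0.0, 0.1, 0.02, 0.0, 0.07],
    "Order_Quantity": [3, 1, 7, 2, 5, 4],
})

print(market_df.head())       # top five rows (head defaults to 5)
print(market_df.tail(3))      # last three rows

n_rows, n_cols = market_df.shape    # number of entries, number of features
print(market_df.columns.tolist())   # feature (column) names
print(market_df.index)              # row indices
print(market_df.dtypes)             # dtype of every column

print(market_df.describe())                      # numerical summary
print(market_df.describe(exclude=[np.number]))   # categorical summary

missing_per_column = market_df.isna().sum()      # missing values per column
total_missing = int(missing_per_column.sum())    # missing values overall
market_df.info()

# Columns 3, 5 and 7, read as 1-based positions (0-based 2, 4, 6).
positions = [2, 4, 6]
by_iloc = market_df.iloc[:, positions]
by_loc = market_df.loc[:, ["Ship_id", "Sales", "Order_Quantity"]]
by_plain = market_df[market_df.columns[positions]]

n_products = market_df["Prod_id"].nunique()
n_customers = market_df["Cust_id"].nunique()
max_sales = market_df["Sales"].max()   # NaN values are skipped by default
```

With the real file, only the DataFrame construction changes; the inspection calls stay the same once the column names are adjusted.
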
Prod_dimen.csv
- Read the Prod_dimen data into prod_df
- Display the top two rows
- Display the last three rows
- Select one row randomly
- Check the total number of entries you have
- Check the number of features
- Print the feature names (column names)
- Check the dtype of every column
- Merge market_df and prod_df to create a new dataframe, comb_data
- Check the number of rows and columns in comb_data
- In comb_data, check the value counts of the "Product_Category" and "Product_Sub_Category" columns
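
The merge and value-count steps above might look like the sketch below. The join key `Prod_id` is an assumption (the task list does not name the shared column), and the two small DataFrames are invented stand-ins for the real files; only `Product_Category` and `Product_Sub_Category` come from the task text.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files; values are invented.
market_df = pd.DataFrame({
    "Prod_id": ["Prod_1", "Prod_2", "Prod_1", "Prod_3", "Prod_2", "Prod_1"],
    "Sales": [120.5, 80.0, 60.0, 310.25, 45.0, 99.9],
})
prod_df = pd.DataFrame({
    "Prod_id": ["Prod_1", "Prod_2", "Prod_3"],
    "Product_Category": ["OFFICE SUPPLIES", "TECHNOLOGY", "FURNITURE"],
    "Product_Sub_Category": ["PAPER", "TELEPHONES", "TABLES"],
})

print(prod_df.sample(n=1))   # one randomly selected row

# Many fact rows map to one product row; validate raises if prod_df
# unexpectedly contains duplicate keys, which would inflate the row count.
comb_data = pd.merge(
    market_df, prod_df, on="Prod_id", how="inner", validate="many_to_one"
)

print(comb_data.shape)       # (rows, columns) after the merge

category_counts = comb_data["Product_Category"].value_counts()
sub_category_counts = comb_data["Product_Sub_Category"].value_counts()
print(category_counts)
print(sub_category_counts)
```

An inner join is used here as one reasonable choice; if some fact rows have no matching product, `how="left"` would keep them with missing category values instead of dropping them.
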
Orders_dimen.csv
- Read the Orders_dimen data into ord_df
- Display the top two rows
- Display the last three rows
- Select one row randomly
- Check the total number of entries you have
- Check the number of features
- Print the feature names (column names)
- Check the dtype of every column
- Check the type of the "Order_Date" column; if it is object, convert it to datetime
- Create two new columns, Order_Year and Order_Month, with the year and month values from Order_Date
- Merge comb_data and ord_df and write the result into comb_data
- Check the number of rows and columns in comb_data
- In comb_data, check the value counts of "Order_Priority"
- In comb_data, check the value counts of "Order_Year"
- In which year do customers have the highest number of orders?
- In comb_data, check the value counts of "Order_Month"
- In which month do customers have the highest number of orders?
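
One way to approach the date tasks above is sketched here. The join key `Ord_id`, the day-month-year date format, and all values are assumptions; check the real file before fixing a `format` string. The conversion is guarded so it only runs when the column is not already a datetime type.

```python
import pandas as pd

# Hypothetical stand-ins; only Order_Date and Order_Priority are named in the tasks.
ord_df = pd.DataFrame({
    "Ord_id": ["Ord_1", "Ord_2", "Ord_3", "Ord_4", "Ord_5"],
    "Order_Date": ["27-07-2010", "15-01-2011", "03-01-2011",
                   "20-12-2010", "09-01-2011"],
    "Order_Priority": ["HIGH", "LOW", "HIGH", "MEDIUM", "LOW"],
})
comb_data = pd.DataFrame({
    "Ord_id": ["Ord_1", "Ord_2", "Ord_2", "Ord_3", "Ord_4", "Ord_5"],
    "Sales": [120.5, 80.0, 60.0, 310.25, 45.0, 99.9],
})

# Convert only when the column was read as text rather than datetime.
if not pd.api.types.is_datetime64_any_dtype(ord_df["Order_Date"]):
    ord_df["Order_Date"] = pd.to_datetime(ord_df["Order_Date"], format="%d-%m-%Y")

# The .dt accessor exposes the date components of a datetime column.
ord_df["Order_Year"] = ord_df["Order_Date"].dt.year
ord_df["Order_Month"] = ord_df["Order_Date"].dt.month

comb_data = comb_data.merge(ord_df, on="Ord_id")
print(comb_data.shape)
print(comb_data["Order_Priority"].value_counts())

year_counts = comb_data["Order_Year"].value_counts()
month_counts = comb_data["Order_Month"].value_counts()

# idxmax returns the index label (the year or month) with the largest count.
busiest_year = year_counts.idxmax()
busiest_month = month_counts.idxmax()
print(busiest_year, busiest_month)
```

The same pattern applies to "Ship_Date" in the shipping data, with the column names swapped.
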
cust_dimen.csv
- Read the cust_dimen data into cust_df
- Display the top two rows
- Display the last three rows
- Select one row randomly
- Check the total number of entries you have
- Check the number of features
- Print the feature names (column names)
- Check the dtype of every column
- Merge comb_data and cust_df and write the result into comb_data
- Check the number of rows and columns in comb_data
- In comb_data, check the value counts of "Customer_Segment"
shipping_dimen.csv
- Read the shipping_dimen data into ship_df
- Display the top two rows
- Display the last three rows
- Select one row randomly
- Check the total number of entries you have
- Check the number of features
- Print the feature names (column names)
- Check the dtype of every column
- Check the type of the "Ship_Date" column; if it is object, convert it to datetime
- Create two new columns, Ship_Year and Ship_Month, with the year and month values from Ship_Date
- Merge comb_data and ship_df and write the result into comb_data
- Check the number of rows and columns in comb_data
- In comb_data, check the value counts of "Ship_Year"
- In which year do customers have the highest number of shipments?
- In comb_data, check the value counts of "Ship_Month"
- In which month do customers have the highest number of shipments?
- Create a new column "Duration", which is the difference between Ship_Date and Order_Date
- Remove the column "Unnamed:0"
- Get the categorical column names
- Get the numerical column names
- Replace the missing values in numerical columns with their mean
- Draw the histogram of any one numerical column
    - Place the title of the graph as "Histogram"
    - On each bin print the count in that bin in the graph
- Draw the boxplot of any one column
- Create a subplot of (2,2).
    - (2,2,1) is a histogram
    - (2,2,2) is a boxplot
- Create a subplot grid, calculating the number of graphs required from the number of numerical columns.
    - The number of rows in the subplot equals the number of numerical columns
    - Each row should contain two graphs:
        - the first is a histogram and
        - the second is a boxplot
    - Use figure size in plt.figure for proper graph size.
- Create a bar plot for a categorical column
    - use the legend
    - xlabel
    - ylabel
- Create a subplot of bar plots for the categorical columns
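
The clean-up and plotting tasks above could be sketched as follows. All data is invented, and `Ship_Mode`, `Sales`, and `Profit` are hypothetical column names. Two choices here are assumptions rather than requirements of the task list: Duration is stored as whole days so that it counts as a numerical column, and the index column is spelled "Unnamed: 0" (with a space), which is how pandas names an unnamed CSV column.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the merged comb_data.
comb_data = pd.DataFrame({
    "Unnamed: 0": range(6),
    "Order_Date": pd.to_datetime(["2010-07-27", "2011-01-15", "2011-01-03",
                                  "2010-12-20", "2011-01-09", "2011-02-01"]),
    "Ship_Date": pd.to_datetime(["2010-07-29", "2011-01-16", "2011-01-08",
                                 "2010-12-21", "2011-01-09", "2011-02-04"]),
    "Ship_Mode": ["AIR", "TRUCK", "AIR", "AIR", "TRUCK", "AIR"],
    "Sales": [120.5, 80.0, np.nan, 310.25, 45.0, 99.9],
    "Profit": [10.0, np.nan, -5.0, 40.0, 2.5, 7.5],
})

# Duration in whole days between ordering and shipping.
comb_data["Duration"] = (comb_data["Ship_Date"] - comb_data["Order_Date"]).dt.days
comb_data = comb_data.drop(columns="Unnamed: 0")

num_cols = comb_data.select_dtypes(include="number").columns.tolist()
cat_cols = comb_data.select_dtypes(exclude=["number", "datetime"]).columns.tolist()

# Replace missing numerical values with the mean of their own column.
comb_data[num_cols] = comb_data[num_cols].fillna(comb_data[num_cols].mean())

# One row per numerical column: histogram on the left, boxplot on the right.
fig, axes = plt.subplots(nrows=len(num_cols), ncols=2,
                         figsize=(10, 3 * len(num_cols)), squeeze=False)
for row, col in enumerate(num_cols):
    hist_ax, box_ax = axes[row, 0], axes[row, 1]
    counts, edges, _ = hist_ax.hist(comb_data[col], bins=5)
    hist_ax.set_title(f"Histogram of {col}")
    # Print the count above each bin, centred on the bin.
    for count, left, right in zip(counts, edges[:-1], edges[1:]):
        hist_ax.annotate(str(int(count)), xy=((left + right) / 2, count),
                         ha="center", va="bottom")
    box_ax.boxplot(comb_data[col])
    box_ax.set_title(f"Boxplot of {col}")
fig.tight_layout()

# Bar plot of one categorical column with legend and axis labels.
mode_counts = comb_data["Ship_Mode"].value_counts()
bar_fig, bar_ax = plt.subplots()
bar_ax.bar(mode_counts.index.tolist(), mode_counts.values, label="Shipments")
bar_ax.set_xlabel("Ship_Mode")
bar_ax.set_ylabel("Count")
bar_ax.legend()
```

In a notebook, drop the `matplotlib.use("Agg")` line so the figures render inline; in a script, `fig.savefig(...)` or `plt.show()` with an interactive backend displays them.
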
Understand the data
We will be using customer churn data from the telecom industry for these exercises. The data file is called Orange_Telecom_Churn_Data.csv.
- Read the data into df
- Display the top five rows
- Display the last three rows
- Check the total number of entries you have
- Check the number of features
- Print the feature names (column names)
- Print the row indices of the dataframe
- Check the dtype of every column
- Describe the numerical columns: min, max, count, 25%, 50%, 75%, std
- Describe the categorical columns: count, top, freq
- Count the missing values column-wise
- Count the total missing values in the DataFrame
- If there are any missing values, replace them
- Check the output for the info function
- Access columns 3, 5, and 7 using iloc, using loc, and without using either of them
- How many unique states are there?
- How many customers have churned?
- How many have not churned?
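
The churn-specific questions above can be answered along these lines. This is a sketch under assumptions: the column names `state`, `total_day_minutes`, and `churned` are guesses at what the file contains, `churned` is assumed to be boolean, the rows are invented, and filling with the median is only one possible replacement strategy since the task does not name one.

```python
import numpy as np
import pandas as pd

# Stand-in for df = pd.read_csv("Orange_Telecom_Churn_Data.csv");
# column names and values here are hypothetical.
df = pd.DataFrame({
    "state": ["KS", "OH", "NJ", "OH", "KS"],
    "total_day_minutes": [265.1, np.nan, 243.4, 299.4, 166.7],
    "churned": [False, False, True, False, True],
})

# Replace missing values; the median is an arbitrary but common choice.
fill_value = df["total_day_minutes"].median()
df["total_day_minutes"] = df["total_day_minutes"].fillna(fill_value)

n_states = df["state"].nunique()

# Summing a boolean column counts its True values.
n_churned = int(df["churned"].sum())
n_not_churned = int((~df["churned"]).sum())
print(n_states, n_churned, n_not_churned)
```

If the real column holds strings such as "yes"/"no" instead of booleans, `df["churned"].value_counts()` gives both counts without the boolean assumption.
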