Market_Fact
- Read the Market Fact data into market_df
- Display the top five rows
- Display the last three rows
- Check the total number of entries you have
- Check the number of features
- Print the feature names (column names)
- Print the row indices of the DataFrame
- Check the dtype of each column
- Describe the numerical columns: count, min, max, std, 25%, 50%, 75%
- Describe the categorical columns: count, top, freq
- Count the missing values in each column
- Count the total number of missing values in the DataFrame
- Check the output of the info() function
- Access columns 3, 5 and 7 using iloc, using loc, and without using either of them
- How many unique products are there?
- How many unique customers are there?
- What is the maximum sales value?
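The Market_Fact steps above map onto a handful of pandas calls. The sketch below runs them on a tiny made-up CSV so it is self-contained; the column names (Ord_id, Prod_id, Ship_id, Cust_id, Sales, Discount, Order_Quantity, Profit) and all values are assumptions for illustration, and "columns 3, 5 and 7" is read as 1-based positions.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Market Fact file: column names and values
# are assumptions for illustration, not the real dataset.
csv_text = """Ord_id,Prod_id,Ship_id,Cust_id,Sales,Discount,Order_Quantity,Profit
Ord_1,Prod_1,SHP_1,Cust_1,136.81,0.01,23,-30.51
Ord_2,Prod_2,SHP_2,Cust_2,42.27,0.01,13,4.56
Ord_3,Prod_1,SHP_3,Cust_3,,0.06,26,1148.90
Ord_4,Prod_3,SHP_4,Cust_1,4701.69,0.00,43,1148.90
"""
# With the real file, pass its path to pd.read_csv instead of a StringIO buffer.
market_df = pd.read_csv(io.StringIO(csv_text))

print(market_df.head())                   # top five rows
print(market_df.tail(3))                  # last three rows
n_entries, n_features = market_df.shape   # (rows, columns)
print(market_df.columns.tolist())         # feature (column) names
print(market_df.index)                    # row indices
print(market_df.dtypes)                   # dtype of every column

print(market_df.describe())                    # numerical: count, mean, std, min, quartiles, max
print(market_df.describe(exclude=["number"]))  # categorical: count, unique, top, freq

missing_per_column = market_df.isnull().sum()  # missing values per column
total_missing = int(missing_per_column.sum())  # missing values in the whole frame
market_df.info()

# "Columns 3, 5 and 7" read as 1-based, i.e. 0-based positions 2, 4, 6.
positions = [2, 4, 6]
names = market_df.columns[positions]
by_iloc = market_df.iloc[:, positions]   # position-based selection
by_loc = market_df.loc[:, names]         # label-based selection
by_brackets = market_df[names]           # plain [] with a list of labels

n_products = market_df["Prod_id"].nunique()
n_customers = market_df["Cust_id"].nunique()
max_sales = market_df["Sales"].max()     # NaN is skipped by default
```

`describe(exclude=["number"])` is used instead of `include="object"` so the categorical summary also picks up text columns stored with a string dtype.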
Prod_dimen.csv
- Read the Prod_dimen data into prod_df
- Display the top two rows
- Display the last three rows
- Select one row randomly
- Check the total number of entries you have
- Check the number of features
- Print the feature names (column names)
- Check the dtype of each column
- Merge market_df and prod_df to create a new DataFrame, comb_data
- Check the number of rows and columns in comb_data
- In comb_data, check the value counts of the "Product_Category" and "Product_Sub_Category" columns
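A minimal sketch of the Prod_dimen steps, assuming Prod_id is the key shared by market_df and prod_df; both toy frames and their category labels are hypothetical.

```python
import io

import pandas as pd

# Hypothetical toy frames: the Prod_id join key and the category labels are
# assumptions for illustration, not values from the real files.
market_df = pd.DataFrame({
    "Ord_id": ["Ord_1", "Ord_2", "Ord_3", "Ord_4"],
    "Prod_id": ["Prod_1", "Prod_2", "Prod_1", "Prod_3"],
    "Sales": [136.81, 42.27, 80.00, 4701.69],
})
prod_csv = """Product_Category,Product_Sub_Category,Prod_id
Office Supplies,Binders,Prod_1
Technology,Telephones,Prod_2
Furniture,Chairs,Prod_3
"""
prod_df = pd.read_csv(io.StringIO(prod_csv))

print(prod_df.head(2))                              # top two rows
print(prod_df.tail(3))                              # last three rows
random_row = prod_df.sample(n=1, random_state=0)    # one random row (seeded, so repeatable)
n_entries, n_features = prod_df.shape
print(prod_df.columns.tolist(), prod_df.dtypes, sep="\n")

# A left merge keeps every market_df row and attaches the matching product columns.
comb_data = market_df.merge(prod_df, on="Prod_id", how="left")
print(comb_data.shape)                              # (rows, columns)

category_counts = comb_data["Product_Category"].value_counts()
sub_category_counts = comb_data["Product_Sub_Category"].value_counts()
print(category_counts, sub_category_counts, sep="\n")
```

`how="left"` is a design choice for this sketch: it guarantees comb_data keeps one row per sales record even if a product is missing from prod_df.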
Orders_dimen.csv
- Read the Orders_dimen data into ord_df
- Display the top two rows
- Display the last three rows
- Select one row randomly
- Check the total number of entries you have
- Check the number of features
- Print the feature names (column names)
- Check the dtype of each column
- Check the type of the "Order_Date" column; if it is object, convert it to datetime
- Create two new columns, Order_Year and Order_Month, holding the year and month values from Order_Date
- Merge comb_data and ord_df and write the result into comb_data
- Check the number of rows and columns in comb_data
- In comb_data, check the value counts of "Order_Priority"
- In comb_data, check the value counts of "Order_Year"
- In which year did customers place the highest number of orders?
- In comb_data, check the value counts of "Order_Month"
- In which month did customers place the highest number of orders?
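The date conversion, year/month extraction, merge and "busiest year/month" questions can be sketched as follows. Ord_id as the join key and the dd-mm-yyyy date format are assumptions; inspect the real Order_Date strings before picking a `format`.

```python
import pandas as pd

# Hypothetical toy frames: Ord_id as the join key and the dd-mm-yyyy date
# format are assumptions for illustration.
comb_data = pd.DataFrame({
    "Ord_id": ["Ord_1", "Ord_2", "Ord_3", "Ord_4"],
    "Sales": [136.81, 42.27, 80.00, 4701.69],
})
ord_df = pd.DataFrame({
    "Ord_id": ["Ord_1", "Ord_2", "Ord_3", "Ord_4"],
    "Order_Date": ["27-07-2010", "02-07-2009", "15-03-2010", "20-07-2010"],
    "Order_Priority": ["NOT SPECIFIED", "HIGH", "HIGH", "LOW"],
})

# Dates read from CSV arrive as text (object dtype, or a string dtype in newer
# pandas), so convert whenever the column is not already datetime.
if not pd.api.types.is_datetime64_any_dtype(ord_df["Order_Date"]):
    ord_df["Order_Date"] = pd.to_datetime(ord_df["Order_Date"], format="%d-%m-%Y")

ord_df["Order_Year"] = ord_df["Order_Date"].dt.year
ord_df["Order_Month"] = ord_df["Order_Date"].dt.month

comb_data = comb_data.merge(ord_df, on="Ord_id", how="left")
print(comb_data.shape)

priority_counts = comb_data["Order_Priority"].value_counts()
year_counts = comb_data["Order_Year"].value_counts()
month_counts = comb_data["Order_Month"].value_counts()
busiest_year = year_counts.idxmax()     # year with the most orders
busiest_month = month_counts.idxmax()   # month number (1-12) with the most orders
print(priority_counts, year_counts, month_counts, sep="\n")
```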
cust_dimen.csv
- Read the cust_dimen data into cust_df
- Display the top two rows
- Display the last three rows
- Select one row randomly
- Check the total number of entries you have
- Check the number of features
- Print the feature names (column names)
- Check the dtype of each column
- Merge comb_data and cust_df and write the result into comb_data
- Check the number of rows and columns in comb_data
- In comb_data, check the value counts of "Customer_Segment"
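A sketch of the customer merge, assuming Cust_id is the shared key; the frames and segment labels are hypothetical. It adds pandas' `validate="many_to_one"` check, which raises a `MergeError` if a Cust_id appears more than once in cust_df, a situation that would otherwise silently multiply rows in comb_data.

```python
import pandas as pd

# Hypothetical toy frames: Cust_id as the join key and the segment labels are
# assumptions for illustration.
comb_data = pd.DataFrame({
    "Ord_id": ["Ord_1", "Ord_2", "Ord_3", "Ord_4"],
    "Cust_id": ["Cust_1", "Cust_2", "Cust_3", "Cust_1"],
})
cust_df = pd.DataFrame({
    "Cust_id": ["Cust_1", "Cust_2", "Cust_3"],
    "Customer_Name": ["A", "B", "C"],
    "Customer_Segment": ["CORPORATE", "CONSUMER", "CORPORATE"],
})

rows_before = len(comb_data)
# Each order row should match at most one customer row.
comb_data = comb_data.merge(cust_df, on="Cust_id", how="left", validate="many_to_one")
print(comb_data.shape)                       # row count should be unchanged

segment_counts = comb_data["Customer_Segment"].value_counts()
print(segment_counts)
```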
shipping_dimen.csv
- Read the shipping_dimen data into ship_df
- Display the top two rows
- Display the last three rows
- Select one row randomly
- Check the total number of entries you have
- Check the number of features
- Print the feature names (column names)
- Check the dtype of each column
- Check the type of the "Ship_Date" column; if it is object, convert it to datetime
- Create two new columns, Ship_Year and Ship_Month, holding the year and month values from Ship_Date
- Merge comb_data and ship_df and write the result into comb_data
- Check the number of rows and columns in comb_data
- In comb_data, check the value counts of "Ship_Year"
- In which year did customers have the highest number of shipments?
- In comb_data, check the value counts of "Ship_Month"
- In which month did customers have the highest number of shipments?
- Create a new column "Duration" holding the difference between Ship_Date and Order_Date
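A sketch of the shipping steps, including the Duration column. Ship_id as the join key, the Ship_Mode values and the dd-mm-yyyy format are assumptions; Order_Date is taken as already converted to datetime by the previous section.

```python
import pandas as pd

# Hypothetical toy frames: Ship_id as the join key, the Ship_Mode values and
# the dd-mm-yyyy format are assumptions for illustration.
comb_data = pd.DataFrame({
    "Ord_id": ["Ord_1", "Ord_2", "Ord_3"],
    "Ship_id": ["SHP_1", "SHP_2", "SHP_3"],
    "Order_Date": pd.to_datetime(["2010-07-27", "2009-07-02", "2010-03-15"]),
})
ship_df = pd.DataFrame({
    "Ship_id": ["SHP_1", "SHP_2", "SHP_3"],
    "Ship_Mode": ["REGULAR AIR", "EXPRESS AIR", "REGULAR AIR"],
    "Ship_Date": ["28-07-2010", "03-07-2009", "20-03-2010"],
})

if not pd.api.types.is_datetime64_any_dtype(ship_df["Ship_Date"]):
    ship_df["Ship_Date"] = pd.to_datetime(ship_df["Ship_Date"], format="%d-%m-%Y")
ship_df["Ship_Year"] = ship_df["Ship_Date"].dt.year
ship_df["Ship_Month"] = ship_df["Ship_Date"].dt.month

comb_data = comb_data.merge(ship_df, on="Ship_id", how="left")
print(comb_data.shape)

busiest_ship_year = comb_data["Ship_Year"].value_counts().idxmax()
busiest_ship_month = comb_data["Ship_Month"].value_counts().idxmax()

# Subtracting two datetime columns gives a timedelta column;
# .dt.days extracts the whole number of days.
comb_data["Duration"] = comb_data["Ship_Date"] - comb_data["Order_Date"]
duration_days = comb_data["Duration"].dt.days
print(duration_days.tolist())
```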
03-02-2020
- Remove the column "Unnamed: 0"
- Get the categorical column names
- Get the numerical column names
- Replace the missing values in the numerical columns with their mean
- Draw a histogram of any one numerical column
- Set the title of the graph to "Histogram"
- Print the count of each bin on top of that bin in the graph
- Draw a boxplot of any one column
- Create a (2,2) subplot grid:
  - (2,2,1) is a histogram
  - (2,2,2) is a boxplot
- Create subplots, calculating the number of graphs required from the number of numerical columns:
  - The number of rows in the subplot grid equals the number of numerical columns
  - Each row contains two graphs: first a histogram, then a boxplot
- Use the figsize argument of plt.figure to get a proper graph size
- Create a barplot for a categorical column, with:
  - a legend
  - an xlabel
  - a ylabel
- Create subplots for the categorical column barplots
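The cleaning and plotting tasks above can be sketched with pandas and matplotlib's pyplot interface. The toy frame is hypothetical; its "Unnamed: 0" column mimics the leftover index column pandas creates when a CSV saved with its index is read back. The Agg backend is selected only so the sketch runs without a display; drop that line in a notebook.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy frame for illustration.
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3, 4],
    "Sales": [136.81, 42.27, None, 4701.69, 80.00],
    "Profit": [-30.51, 4.56, 1148.90, None, 12.00],
    "Product_Category": ["Office Supplies", "Technology", "Furniture",
                         "Office Supplies", "Technology"],
})

df = df.drop(columns="Unnamed: 0")
cat_cols = df.select_dtypes(exclude="number").columns.tolist()
num_cols = df.select_dtypes(include="number").columns.tolist()
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())   # column-wise mean

# (2,2) grid: histogram with per-bin counts in (2,2,1), boxplot in (2,2,2).
plt.figure(figsize=(10, 8))
plt.subplot(2, 2, 1)
hist_counts, edges, _ = plt.hist(df["Sales"], bins=4)
for count, left, right in zip(hist_counts, edges[:-1], edges[1:]):
    # Write each bin's count just above the centre of the bar.
    plt.text((left + right) / 2, count, str(int(count)), ha="center", va="bottom")
plt.title("Histogram")
plt.subplot(2, 2, 2)
plt.boxplot(df["Sales"].to_numpy())
plt.title("Boxplot")

# One row per numerical column: histogram on the left, boxplot on the right.
n_rows = len(num_cols)
plt.figure(figsize=(10, 4 * n_rows))
for i, col in enumerate(num_cols):
    plt.subplot(n_rows, 2, 2 * i + 1)
    plt.hist(df[col])
    plt.title(f"{col} histogram")
    plt.subplot(n_rows, 2, 2 * i + 2)
    plt.boxplot(df[col].to_numpy())
    plt.title(f"{col} boxplot")
plt.tight_layout()

# Barplot of one categorical column with a legend and axis labels.
counts = df["Product_Category"].value_counts()
plt.figure(figsize=(6, 4))
plt.bar(counts.index.tolist(), counts.values, label="rows per category")
plt.xlabel("Product_Category")
plt.ylabel("Count")
plt.legend()

# One barplot subplot per categorical column.
plt.figure(figsize=(6 * len(cat_cols), 4))
for i, col in enumerate(cat_cols):
    plt.subplot(1, len(cat_cols), i + 1)
    vc = df[col].value_counts()
    plt.bar(vc.index.tolist(), vc.values)
    plt.title(col)

plt.close("all")
```

Computing `n_rows` from `len(num_cols)` is what lets the grid grow with the data instead of being hard-coded.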
07-02-2020
Understand the data
We will be using customer churn data from the telecom industry for these exercises. The data file is called Orange_Telecom_Churn_Data.csv.
- Read the data into df
- Display the top five rows
- Display the last three rows
- Check the total number of entries you have
- Check the number of features
- Print the feature names (column names)
- Print the row indices of the DataFrame
- Check the dtype of each column
- Describe the numerical columns: count, min, max, std, 25%, 50%, 75%
- Describe the categorical columns: count, top, freq
- Count the missing values in each column
- Count the total number of missing values in the DataFrame
- If there are any missing values, replace them
- Check the output of the info() function
- Access columns 3, 5 and 7 using iloc, using loc, and without using either of them
- How many unique states are there?
- How many customers have churned?
- How many have not churned?
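A sketch of the missing-value and churn questions on a few made-up rows. The column names (state, account_length, intl_plan, total_day_minutes, churned) are assumptions about Orange_Telecom_Churn_Data.csv, and filling numerical columns with the mean and the rest with the most frequent value is just one simple policy, not the required one.

```python
import io

import pandas as pd

# Hypothetical rows; the column names are assumptions about the real file.
csv_text = """state,account_length,intl_plan,total_day_minutes,churned
KS,128,no,265.1,False
OH,107,no,,False
NJ,137,yes,243.4,True
OH,84,,299.4,False
"""
# With the real file, pass its path to pd.read_csv instead.
df = pd.read_csv(io.StringIO(csv_text))

print(df.isnull().sum())          # missing values per column
print(df.isnull().sum().sum())    # total missing values

# Numerical columns get their mean; the remaining columns get their mode.
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())
for col in df.columns.difference(num_cols):
    if df[col].isnull().any():
        df[col] = df[col].fillna(df[col].mode()[0])

n_states = df["state"].nunique()
churn_counts = df["churned"].value_counts()   # counts of True / False
n_churned = int(df["churned"].sum())          # True counts as 1
n_not_churned = int((~df["churned"]).sum())
print(n_states, churn_counts, sep="\n")
```

This relies on read_csv parsing the True/False text into a boolean column; if the real churn column uses other labels (for example yes/no), compare against those labels instead.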