
Assignment Questions
Data Analytics
Assignment 1
Please go to the following link:
Instruction
Using the data from the databases on the right side, write SQL queries and generate the results. Record each SQL query and take a screenshot of its results (if the table is too big, screenshot only the top rows).
Ex #1: Find all customers who are from Germany.
Ex #2: Find the names of products whose prices are between 15 and 22.
Ex #3: Find the average quantity of each order.
Ex #4: Find all the orders that include Boston Crab Meat.
Ex #5: Count the number of orders that include Boston Crab Meat.
Ex #6: Find the average spending of each order.
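The query patterns behind these exercises (a row filter, a range filter, a join, and a per-order aggregate) can be tried locally before using the course database. The sketch below uses Python's standard-library sqlite3 module. Every table name, column name, and row in it (Customers, Products, OrderDetails, Country, Price, Quantity, and so on) is a hypothetical stand-in, not the actual course schema, so check the real tables before adapting any query.

```python
import sqlite3

# In-memory toy database. Every table, column, and row below is a
# hypothetical stand-in for the course database, not its real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, Country TEXT);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT, Price REAL);
CREATE TABLE OrderDetails (OrderID INTEGER, ProductID INTEGER, Quantity INTEGER);

INSERT INTO Customers VALUES (1, 'Anna', 'Germany'), (2, 'Bob', 'USA'), (3, 'Carl', 'Germany');
INSERT INTO Products VALUES (1, 'Boston Crab Meat', 18.4), (2, 'Tofu', 23.25), (3, 'Chai', 18.0);
INSERT INTO OrderDetails VALUES (100, 1, 10), (100, 2, 5), (101, 3, 4), (102, 1, 2);
""")

# Filter rows on one column (the pattern behind Ex #1).
german_customers = conn.execute(
    "SELECT CustomerName FROM Customers WHERE Country = ? ORDER BY CustomerID",
    ("Germany",),
).fetchall()

# Range filter with BETWEEN, which includes both end values (Ex #2).
mid_priced = conn.execute(
    "SELECT ProductName FROM Products WHERE Price BETWEEN 15 AND 22 ORDER BY ProductID"
).fetchall()

# Join order lines to products, then count distinct orders (Ex #4 and #5).
crab_order_count = conn.execute(
    """
    SELECT COUNT(DISTINCT od.OrderID)
    FROM OrderDetails AS od
    JOIN Products AS p ON p.ProductID = od.ProductID
    WHERE p.ProductName = 'Boston Crab Meat'
    """
).fetchone()[0]

# Aggregate per order with GROUP BY (the pattern behind Ex #3 and #6).
spend_per_order = conn.execute(
    """
    SELECT od.OrderID, SUM(od.Quantity * p.Price) AS Spend
    FROM OrderDetails AS od
    JOIN Products AS p ON p.ProductID = od.ProductID
    GROUP BY od.OrderID
    ORDER BY od.OrderID
    """
).fetchall()

print(german_customers, mid_priced, crab_order_count, spend_per_order)
```

Counting distinct OrderID values rather than rows avoids double counting if the same product could appear on more than one line of an order.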
Do rename your file to Name_AssignmentType_Date
(Eg. Eunice_A1_2Apr)
(Eg. Ben_A3_4Apr)
**Files without name or assignment number will not be submitted
Assignment 2
Instruction
Dataset used in the course: df_dataset = pd.read_csv('http://rcs.bu.edu/examples/python/data_analysis/Salaries.csv') [KHT1]
Website for your practice and self-learning: https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python
Python Assignment
This ecommerce dataset records the attributes of customers including their basic demographic information, spending on the platform and response to the marketing campaigns.
AcceptedCmp1 - 1 if customer accepted the offer in the 1st campaign, 0 otherwise
AcceptedCmp2 - 1 if customer accepted the offer in the 2nd campaign, 0 otherwise
AcceptedCmp3 - 1 if customer accepted the offer in the 3rd campaign, 0 otherwise
AcceptedCmp4 - 1 if customer accepted the offer in the 4th campaign, 0 otherwise
AcceptedCmp5 - 1 if customer accepted the offer in the 5th campaign, 0 otherwise
Response (target) - 1 if customer accepted the offer in the last campaign, 0 otherwise
Complain - 1 if customer complained in the last 2 years
DtCustomer - date of customer’s enrolment with the company
Education - customer’s level of education
Marital - customer’s marital status
Kidhome - number of small children in customer’s household
Teenhome - number of teenagers in customer’s household
Income - customer’s yearly household income
MntA - amount spent on A products in the last 2 years
MntB - amount spent on B products in the last 2 years
MntC - amount spent on C products in the last 2 years
MntD - amount spent on D products in the last 2 years
MntE - amount spent on E products in the last 2 years
MntF - amount spent on F products in the last 2 years
NumDealsPurchases - number of purchases made with discount
NumCatalogPurchases - number of purchases made using catalogue
NumStorePurchases - number of purchases made directly in stores
NumWebPurchases - number of purchases made through company’s web site
NumWebVisitsMonth - number of visits to company’s web site in the last month
Recency - number of days since the last purchase
Task
- Create your Python Jupyter notebook.
- Import the relevant Python packages (e.g. pandas).
- Read in the dataset as a pandas dataframe.
- Understand the dataset.
- Use what you have learned to answer the following questions:
  - Display the first 20 rows of the dataframe.
  - Sort the dataframe by Year_birth in ascending order.
  - Filter the dataframe to customers who were born after 1985.
  - Count the number of customers who were born after 1985.
  - What is the average income for customers at each education level?
  - For those who accepted / rejected the offer (Response), what is their average spending in categories A, B, C, D, E and F?
[KHT1] Website not found, but we do have the Excel file.
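The pandas operations these questions call for can be sketched on a small made-up dataframe. The rows below are invented for illustration, and only the column names (Year_birth, Education, Income, Response, MntA, MntB) follow the assignment text. The real file may spell them differently, so treat this as a pattern under those assumptions rather than a finished answer.

```python
import pandas as pd

# Toy stand-in for the course dataset: the rows are made up, and only the
# column names follow the assignment text.
df = pd.DataFrame({
    "Year_birth": [1980, 1990, 1986, 1975, 1988],
    "Education": ["PhD", "Master", "PhD", "Master", "PhD"],
    "Income": [60000, 40000, 80000, 50000, 70000],
    "Response": [1, 0, 0, 1, 1],
    "MntA": [100, 20, 30, 200, 50],
    "MntB": [10, 5, 0, 40, 15],
})

first_rows = df.head(20)                        # first 20 rows (fewer if the frame is shorter)
by_birth = df.sort_values("Year_birth")         # ascending order is the default
born_after_1985 = df[df["Year_birth"] > 1985]   # boolean-mask filter
n_born_after_1985 = len(born_after_1985)        # row count of the filtered frame

# Mean income for each education level.
income_by_education = df.groupby("Education")["Income"].mean()

# Mean spending per category, split by Response (1 = accepted, 0 = rejected).
spend_by_response = df.groupby("Response")[["MntA", "MntB"]].mean()

print(n_born_after_1985)
print(income_by_education)
print(spend_by_response)
```

On the full dataset, the last groupby would list all six spending columns (MntA through MntF) instead of the two used here.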
Assignment 3
Using the eCommerce dataset (ecommerce_marketing_campaign.csv), create the following visualizations using seaborn. You can use the “height” argument (named “size” in seaborn versions before 0.9) to change the size of the figure:
- Read in the data as a pandas dataframe and create a new column named “Age”, filled in by calculating each customer’s age from their birth year.
- Create a regression plot to show the relationship between “Age” and “Income” for customers with different “Education”.
- Create a histogram that shows the response rate to marketing campaigns (“Response”) for people with different “Marital_Status”, broken down by “Education”.
- Create a density plot to show the distribution of “Income” for customers with different “Education” levels.
- Open exploration:
Our goal is to find out what kinds of customers are more likely to respond to our promotion (which customers might have “Response”==1).
Explore the features, try to find insights, and visualize them.
Sample plots: Regression | Histogram | Density
Do rename your file to Name_AssignmentType_Date
(Eg. Eunice_A1_2Apr)
(Eg. Ben_A3_4Apr)
*Files without name or assignment number will not be submitted
Assignment 4
Task: Use the eCommerce dataset to understand what kinds of customers and behaviors have a higher response rate to the marketing promotions.
For each step, write a markdown cell describing the step, and add comments to record your logic.
- Create a Jupyter notebook and name it: Machine learning assignment-’your name’.
- Import all necessary packages.
- Read in the eCommerce dataset as a pandas dataframe.
- Do the exploratory data analysis (you can just use your visualizations from Assignment 3).
- Build a machine learning model using random forest (remember data preprocessing and the train/test split).
- Test model performance on the test dataset: show accuracy and AUC, and plot the ROC curve.
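The modelling and evaluation steps can be sketched with scikit-learn as follows. The data here is synthetic (generated with make_classification) so that the block runs on its own; it stands in for the preprocessed eCommerce features and the “Response” target. The split ratio, number of trees, and random seeds are illustrative assumptions. On the real dataset, categorical columns would first need encoding and missing values would need handling.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed features (X) and Response target (y).
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           random_state=0)

# Hold out 25% of the rows for testing; stratify keeps the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)                # hard 0/1 predictions for accuracy
y_score = model.predict_proba(X_test)[:, 1]   # probability of class 1 for AUC/ROC

accuracy = accuracy_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_score)

# ROC curve: true positive rate against false positive rate.
fpr, tpr, _ = roc_curve(y_test, y_score)
fig, ax = plt.subplots()
ax.plot(fpr, tpr, label=f"Random forest (AUC = {auc:.3f})")
ax.plot([0, 1], [0, 1], linestyle="--", label="Chance")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.legend()

print(f"accuracy={accuracy:.3f}, AUC={auc:.3f}")
```

AUC is computed from predicted probabilities rather than hard predictions, because it measures how well the model ranks positives above negatives across all thresholds.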
Additional Task: Write an automatic program for parameter tuning. Use a for loop to change the random forest max_depth parameter from 3 to 15 and print the AUC for each max_depth. Plot a figure where the x-axis is max_depth and the y-axis is AUC.
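A minimal sketch of such a tuning loop is shown below, again on synthetic data from make_classification as a stand-in for the real features and target. The sample size, number of trees, and seeds are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed features and Response target.
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

depths = list(range(3, 16))  # max_depth values 3 through 15 inclusive
aucs = []
for depth in depths:
    model = RandomForestClassifier(n_estimators=50, max_depth=depth,
                                   random_state=0)
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    aucs.append(auc)
    print(f"max_depth={depth}: AUC={auc:.3f}")

# AUC as a function of tree depth.
fig, ax = plt.subplots()
ax.plot(depths, aucs, marker="o")
ax.set_xlabel("max_depth")
ax.set_ylabel("AUC")
```

Picking the depth with the best test-set AUC tends to give an optimistic estimate of performance; a separate validation split or cross-validation is a common alternative when the result will be reported.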
Do rename your file to Name_AssignmentType_Date
(Eg. Eunice_A1_2Apr)
(Eg. Ben_A3_4Apr)
*Files without name or assignment number will not be submitted
Assignment 5
Task: Imagine you are presenting your findings to the marketing stakeholders.
They want to know:
- What kinds of customers are more likely to be converted? They need actionable insights to guide them.
- How is the machine learning algorithm different from traditional analysis?
- How can the machine learning algorithm help to better target and convert customers?
- They want suggestions on how to improve marketing efficiency.
Use slides to communicate your findings to the stakeholders. Support your insights with solid data analysis results and keep the presentation logical.
Please submit Assignment 5 under the Assignment 4 tab, but label it as Assignment 5.