Assignment Questions
Data Analytics

Assignment 1

Please go to the following link:

Instruction

Using the data from the databases on the right side, write SQL queries and generate the results. Write down each SQL query and take a screenshot of the results it generates (if the table is too big, screenshot only the top rows).

Ex #1: Find all customers that are from Germany.

Ex #2: Find the names of products whose prices are between 15 and 22.

Ex #3: Find the average quantity of each order.

Ex #4: Find all the orders that include Boston Crab Meat.

Ex #5: Count the number of orders that include Boston Crab Meat.

Ex #6: Find the average spending of each order.
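
The exercises above combine a few query patterns: range filters (BETWEEN), per-group aggregates (AVG with GROUP BY), and joins with COUNT. As a rough illustration only, the sketch below runs queries of that style through Python's built-in sqlite3 module against a tiny in-memory database. The table names, column names, and rows are invented for this example and are not the assignment's database, so the queries would need adapting to the real tables.

```python
import sqlite3

# Hypothetical toy schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, item_name TEXT, price REAL);
    CREATE TABLE order_lines (order_id INTEGER, item_id INTEGER, quantity INTEGER);
    INSERT INTO items VALUES (1, 'Tea', 10.0), (2, 'Jam', 18.0), (3, 'Honey', 25.0);
    INSERT INTO order_lines VALUES (100, 1, 2), (100, 2, 4), (101, 2, 1), (102, 3, 5);
""")

# Range filter: item names priced between 15 and 22 (BETWEEN is inclusive).
mid_priced = conn.execute(
    "SELECT item_name FROM items WHERE price BETWEEN 15 AND 22"
).fetchall()

# Per-group aggregate: average quantity within each order.
avg_qty = conn.execute(
    "SELECT order_id, AVG(quantity) FROM order_lines "
    "GROUP BY order_id ORDER BY order_id"
).fetchall()

# Join plus count: number of distinct orders that contain a named item.
jam_orders = conn.execute(
    """
    SELECT COUNT(DISTINCT ol.order_id)
    FROM order_lines AS ol
    JOIN items AS i ON i.item_id = ol.item_id
    WHERE i.item_name = ?
    """,
    ("Jam",),
).fetchone()[0]

conn.close()

print(mid_priced)
print(avg_qty)
print(jam_orders)
```

The SQL strings are the part that carries over to the exercises; the Python wrapper is only a convenient way to try them out locally.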

Upload

Rename your file to Name_AssignmentType_Date
(e.g. Eunice_A1_2Apr)
(e.g. Ben_A3_4Apr)
**Files without a name or assignment number will not be submitted


Assignment 2

Rename your file to Name_AssignmentType_Date
(e.g. Eunice_A1_2Apr)
(e.g. Ben_A3_4Apr)
**Files without a name or assignment number will not be submitted

Upload

Instruction

Dataset used in the course: df_dataset = pd.read_csv('http://rcs.bu.edu/examples/python/data_analysis/Salaries.csv') [KHT1]

Website for practice and self-learning: https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python

 

Python Assignment

This ecommerce dataset records the attributes of customers including their basic demographic information, spending on the platform and response to the marketing campaigns.

AcceptedCmp1 - 1 if customer accepted the offer in the 1st campaign, 0 otherwise
AcceptedCmp2 - 1 if customer accepted the offer in the 2nd campaign, 0 otherwise
AcceptedCmp3 - 1 if customer accepted the offer in the 3rd campaign, 0 otherwise
AcceptedCmp4 - 1 if customer accepted the offer in the 4th campaign, 0 otherwise
AcceptedCmp5 - 1 if customer accepted the offer in the 5th campaign, 0 otherwise
Response (target) - 1 if customer accepted the offer in the last campaign, 0 otherwise
Complain - 1 if customer complained in the last 2 years
DtCustomer - date of customer’s enrolment with the company
Education - customer’s level of education
Marital - customer’s marital status
Kidhome - number of small children in customer’s household
Teenhome - number of teenagers in customer’s household
Income - customer’s yearly household income
MntA - amount spent on A products in the last 2 years
MntB - amount spent on B products in the last 2 years
MntC - amount spent on C products in the last 2 years
MntD - amount spent on D products in the last 2 years
MntE - amount spent on E products in the last 2 years
MntF - amount spent on F products in the last 2 years
NumDealsPurchases - number of purchases made with discount
NumCatalogPurchases - number of purchases made using catalogue
NumStorePurchases - number of purchases made directly in stores
NumWebPurchases - number of purchases made through company’s website
NumWebVisitsMonth - number of visits to company’s website in the last month
Recency - number of days since the last purchase

Task

  1. Create your Python Jupyter notebook.

  2. Import the relevant Python packages (e.g. pandas).

  3. Read in the dataset as a pandas dataframe.

  4. Understand the dataset.

  5. Use what you have learned to answer the following questions:

  6. Display the first 20 rows of the dataframe.

  7. Sort the dataframe by Year_birth in ascending order.

  8. Filter the dataframe to customers who were born after 1985.

  9. Count the number of customers who were born after 1985.

  10. What is the average income for customers at each education level?

  11. For those who accepted / rejected the offer (Response), what is their average spending in categories A, B, C, D, E and F?
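
The questions in the task list map onto a handful of pandas operations: `head`, `sort_values`, boolean filtering, `len`, and `groupby(...).mean()`. The sketch below shows one possible way to chain them. It uses a small hand-made dataframe in place of the real file, so all values are invented for illustration; only the column names (Year_birth, Education, Income, Response, MntA, MntB) are taken from this page, and the real dataset has more spending columns.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real dataset (values invented).
df = pd.DataFrame({
    "Year_birth": [1990, 1975, 1988, 1960, 1986],
    "Education": ["Graduation", "PhD", "Graduation", "Master", "PhD"],
    "Income": [40000, 80000, 50000, 60000, 70000],
    "Response": [1, 0, 0, 1, 0],
    "MntA": [10, 200, 30, 150, 50],
    "MntB": [5, 20, 15, 40, 10],
})

# First 20 rows (returns every row when the frame is shorter than 20).
first_rows = df.head(20)

# Sort by birth year, oldest customers first.
sorted_df = df.sort_values("Year_birth", ascending=True)

# Keep only customers born after 1985, then count them.
born_after_1985 = df[df["Year_birth"] > 1985]
n_born_after_1985 = len(born_after_1985)

# Average income per education level.
income_by_education = df.groupby("Education")["Income"].mean()

# Average spending per category, split by whether the offer was accepted.
spend_by_response = df.groupby("Response")[["MntA", "MntB"]].mean()

print(n_born_after_1985)
print(income_by_education)
print(spend_by_response)
```

With the real file, the spending column list would presumably be extended to MntA through MntF.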

[KHT1] Website not found, but we do have the Excel file.

Assignment 3

Using the eCommerce dataset: ecommerce_marketing_campaign.csv

Try to create the visualizations using seaborn. You can use the “size” argument (renamed “height” in seaborn 0.9 and later) to change the size of the figure.

  1. Read in the data as a pandas dataframe, create a new column named “Age”, and fill it in by calculating each customer’s age from the birth year.

  2. Create a regression plot to show the relationship between “Age” and “Income” for customers with different “Education”.

  3. Create a histogram that shows the response rate to marketing campaigns (“Response”) for people with different “Marital_Status”, broken down by “Education”.

  4. Create a density plot to show the distribution of “Income” for customers with different “Education” levels.

  5. Open exploration:

Our goal is to find out what kind of customers are more likely to respond to our promotion (what kind of customer might have “Response”==1).

Explore the features, try to find insights, and visualize them.

Sample plots: Regression (1.png), Histogram (2.png), Density (3.png)

Rename your file to Name_AssignmentType_Date
(e.g. Eunice_A1_2Apr)
(e.g. Ben_A3_4Apr)
*Files without a name or assignment number will not be submitted

Upload

Assignment 4

Task: Use the eCommerce dataset to understand what kinds of customers and behaviors have a higher response to the marketing promotions.

For each step, please write markdown to describe the step and add comments to record your logic.

  1. Create a Jupyter notebook and name it: Machine learning assignment-’your name’.

  2. Import all necessary packages.

  3. Read in the eCommerce dataset as a pandas dataframe.

  4. Do the exploratory data analysis (you can just use your visualizations from Assignment 3).

  5. Build a machine learning model using random forest (remember data preprocessing and the train/test split).

  6. Test model performance on the test dataset: show accuracy and AUC, and plot the ROC curve.

Additional Task: Write an automatic program for parameter tuning. Use a for loop to change the random forest max_depth parameter from 3 to 15 and print the AUC for each max_depth. Plot a figure where the x-axis is max_depth and the y-axis is AUC.
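
The modelling steps and the additional task could be sketched roughly as follows with scikit-learn. This is only an outline under stated assumptions: it trains on synthetic data from `make_classification` instead of the eCommerce dataset, skips the preprocessing the real data would need, and picks `n_estimators=50`, `test_size=0.3`, and `random_state=0` arbitrarily.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside a Jupyter notebook
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed features (X) and the Response target (y).
X, y = make_classification(n_samples=600, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Baseline random forest, evaluated on the held-out test set.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1
accuracy = accuracy_score(y_test, model.predict(X_test))
auc = roc_auc_score(y_test, proba)
print(f"accuracy={accuracy:.3f}, AUC={auc:.3f}")

# ROC curve with the chance diagonal for reference.
fpr, tpr, _ = roc_curve(y_test, proba)
fig_roc, ax_roc = plt.subplots()
ax_roc.plot(fpr, tpr, label=f"AUC = {auc:.3f}")
ax_roc.plot([0, 1], [0, 1], linestyle="--")
ax_roc.set_xlabel("False positive rate")
ax_roc.set_ylabel("True positive rate")
ax_roc.legend()

# Additional task: vary max_depth from 3 to 15 and record the test AUC for each.
depths = list(range(3, 16))
auc_by_depth = []
for depth in depths:
    rf = RandomForestClassifier(n_estimators=50, max_depth=depth, random_state=0)
    rf.fit(X_train, y_train)
    depth_auc = roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1])
    auc_by_depth.append(depth_auc)
    print(f"max_depth={depth}: AUC={depth_auc:.3f}")

# AUC as a function of max_depth.
fig_depth, ax_depth = plt.subplots()
ax_depth.plot(depths, auc_by_depth, marker="o")
ax_depth.set_xlabel("max_depth")
ax_depth.set_ylabel("AUC")
```

The loop scores each depth on the test set, as the task describes. In practice a separate validation set or cross-validation is often preferred for choosing parameters, so that the test score stays an unbiased estimate.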

Rename your file to Name_AssignmentType_Date
(e.g. Eunice_A1_2Apr)
(e.g. Ben_A3_4Apr)
*Files without a name or assignment number will not be submitted

Upload

Assignment 5

Task: Imagine you are presenting your findings to the marketing stakeholders.

They want to know:

 

  1. What kind of customers are more likely to be converted? They need actionable insights to guide them.

  2. How is the machine learning algorithm different from traditional analysis?

  3. How can the machine learning algorithm help to better target and convert customers?

  4. They want suggestions on how to improve marketing efficiency.

Use slides to communicate your findings to the stakeholders. Support the insights with solid data analysis results and make the presentation logical.

Please submit your Assignment 5 under the Assignment 4 tab, but label it as Assignment 5.
