WPC 300 Assignment | Assignment 4...
WPC 300 (Hybrid Course)
Maximum points: 25
Student Name: [Mary Alfansha]    Class Day & Time: [Monday & 9.15 AM]
This assignment builds on the ETL In-Class Assignment using the Excel workbook “ETL Exercise.xlsx.” If you haven’t finished that exercise yet, complete it before starting this assignment.
- Save “ETL Exercise.xlsx” as YourFullName_ETLExercise.xlsx. Save this document as YourFullName_AssignmentETL.docx.
- Finish all tasks below, then submit:
  - the answers to the questions in a new blank Word document
  - the final version of your YourFullName_ETLExercise.xlsx workbook.
The In-Class Exercise involved a scenario where you brought together two different data sets from two sources. Each data set contained a group of orders by a group of customers, and those customers did not overlap (no customer was in both data sets).
For this assignment, you’ll be building on that data set by adding new fields to the “Full Set” worksheet. Instead of adding new rows, this time you’ll be adding new columns. The data will come from the “Source 3” worksheet (also in the workbook).
Your submission will be graded based on two factors:
- The correctness of the answers to the questions.
- The accuracy of the “Full Set” worksheet in the “ETL Exercise.xlsx” workbook.
Data preparation for answering assignment questions.
Part 0: Complete the lab for Week 5 [5 points]
Part 1: Add the “Credit Line” information from “Source 3” to “Full Set”
We are changing the original credit line policy to establish a minimum credit line of $2,000. Even if a customer had an original credit line of $0, it is now $2,000. Create a new column called “New Credit Line” in the Source 3 worksheet to reflect this change. Use the VLOOKUP() function to bring this updated information into the “Full Set” worksheet. You will notice that even if you do it correctly, there will be some errors (“#N/A” values).
Question: Which customer(s) lack usable data when you apply the VLOOKUP() function? Explain what is causing this problem. [5 points]
After answering the question above, make the necessary change to the Source 3 worksheet to correct the issue so that usable Credit Line data appears for all the customers.
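The logic of Part 1 can be sketched outside Excel to make the two steps (apply the minimum, then look up by customer) easier to see. The following is a minimal Python illustration, not the Excel formula itself. The customer IDs, amounts, and the helper names new_credit_line and vlookup are hypothetical and do not come from the workbook. The mistyped key only shows how an exact-match lookup fails in general; the actual cause in your workbook may differ.

```python
MIN_CREDIT_LINE = 2000  # minimum credit line stated in the assignment


def new_credit_line(original):
    """Raise any credit line below the minimum up to the minimum."""
    return max(original, MIN_CREDIT_LINE)


def vlookup(key, table, default="#N/A"):
    """Exact-match lookup, similar in spirit to VLOOKUP(key, range, col, FALSE)."""
    return table.get(key, default)


# Hypothetical Source 3 data: customer ID -> original credit line.
# The key "C0O3" (letter O instead of zero) is a deliberately mistyped ID.
source3 = {"C001": 0, "C002": 5000, "C0O3": 1500}

# Transform step: build the "New Credit Line" column.
new_lines = {cust: new_credit_line(amount) for cust, amount in source3.items()}

# Load step: bring the new column into the (hypothetical) Full Set customer list.
full_set_ids = ["C001", "C002", "C003"]
looked_up = [vlookup(cust, new_lines) for cust in full_set_ids]

print(looked_up)  # "C003" has no exact match, so its slot holds "#N/A"
```

An exact-match lookup returns a value only when the key in “Full Set” is identical to a key in the lookup table, which is why the fix belongs in the source data rather than in the formula.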
Part 2: Add the “Missed Payments” information from “Source 3” to “Full Set”
We have observed an anomaly in the “Missed Payments” field. In the Source 3 worksheet, a customer with no missed payments currently has the value “NONE” in that field. We want to add a new column called “Missed Payments 2” that converts the string value “NONE” to the numeric value 0. Use the IF() function to do this. Once you have done the transformation, use the VLOOKUP() function to bring the data in the “Missed Payments 2” column into the “Full Set” worksheet.
Question: Write the data transformation rule for the missed payment field (list the syntax of the IF() function and explain the criteria you used to transform the data). [5 points]
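As a rough illustration of the kind of rule Part 2 asks for, the conditional can be written as a small Python function. This is a sketch under assumptions, not the Excel IF() syntax you are asked to list. The function name missed_payments_2 and the sample values are hypothetical, and trimming spaces and ignoring letter case are extra choices made here that the assignment does not require.

```python
def missed_payments_2(value):
    """Return 0 for the text "NONE"; otherwise return the value as an integer."""
    # Assumption: treat " none " or "None" the same as "NONE".
    if isinstance(value, str) and value.strip().upper() == "NONE":
        return 0
    return int(value)


# Hypothetical "Missed Payments" values mixing text and numbers.
raw = ["NONE", 2, "3", "none"]
cleaned = [missed_payments_2(v) for v in raw]

print(cleaned)
```

After the transformation every value is numeric, so the column can be summed or averaged without special handling for text.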
Part 3: Add the “Country” information from “Source 3” to “Full Set”
We have observed a lack of consistency in the “Country” field. Notice that the string “United States” is represented in several different ways. Choose one of those representations and transform the remaining “Country” data so that the value for the United States is consistent across all customers. Create a “Country 2” column and use it to hold the transformed data. Use an IF() statement to transform the data. Then use the VLOOKUP() function to bring the data into the “Full Set” worksheet.
Question: Write the data transformation rule for the country field (list the syntax of the IF() function and explain the criteria you used to transform the data). [5 points]
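The standardization idea in Part 3 can be sketched in Python as a mapping from known variants to one chosen spelling. This is only an illustration. The variant list, the chosen spelling, and the function name country_2 are assumptions, and the spellings that actually appear in the Source 3 worksheet may be different.

```python
CANONICAL = "United States"  # the one representation chosen for this sketch

# Hypothetical variants, stored in lowercase for comparison.
US_VARIANTS = {"us", "usa", "u.s.", "u.s.a.", "united states",
               "united states of america"}


def country_2(value):
    """Map any known United States variant to CANONICAL; keep other countries."""
    trimmed = value.strip()
    if trimmed.lower() in US_VARIANTS:
        return CANONICAL
    return trimmed


# Hypothetical "Country" values.
raw = ["USA", "U.S.", "Canada", "United States"]
standardized = [country_2(v) for v in raw]

print(standardized)
```

Values that are not in the variant list pass through unchanged, so the rule only touches the rows it is meant to fix.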
Part 4: Applying ETL
Question: Using this assignment, briefly illustrate how ETL plays a critical role in analytics. [5 points]