TDM 20200: Project 13 — 2023
Motivation: Data wrangling tasks can vary between projects. Examples include joining multiple data sources, removing data that is irrelevant to the project, handling outliers, etc. Although we’ve practiced some of these skills, it is always worth it to spend some extra time to master tidying up our data.
Context: We will continue to gain familiarity with the tidyverse suite of packages (including ggplot), and data wrangling tasks.
Scope: R, tidyverse, ggplot
Dataset(s)
The following questions will use the following dataset(s):
- /anvil/projects/tdm/data/consumer_complaints/complaints.csv
Questions
Question 1
Just like in Project 12, we will start by loading the tidyverse package and reading (using the read_csv function) the processed data complaints.csv into a tibble called dat. Make sure to also load the lubridate package.
In Project 12, we stated that we want to improve consumer satisfaction, and that to achieve this we will provide consumers with data-driven tips and plots to help them make informed decisions.
We started by providing some information on wait time, timely_response, and its association with how the complaint was submitted (submitted_via).
Let’s continue our exploration of this dataset to provide valuable information to clients. Create a new column called day_sent_to_company that contains the day of the week the complaint was sent to the company (date_sent_to_company). To create the new column, use some of your code from Project 12 that changes date_sent_to_company to the correct format, pipes (%>%), and the appropriate function from lubridate.
Some students asked whether or not you need to use the pipes (%>%). Some people like them, some don’t. You can draw your own conclusions, but I’d give all common methods a shot to see what you prefer.
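To make the comparison concrete, here is a minimal sketch of the same two steps written once with nested calls and once with pipes. The tibble `toy` and its column `x` are made up for this illustration and have nothing to do with the complaints data.

```r
library(dplyr)  # part of the tidyverse; library(tidyverse) would also work

# Toy data, invented for this illustration
toy <- tibble(x = c(4, 9, 16))

# Nested calls: read from the inside out
nested <- summarise(mutate(toy, root = sqrt(x)), total = sum(root))

# The same steps with pipes: read from top to bottom
piped <- toy %>%
  mutate(root = sqrt(x)) %>%
  summarise(total = sum(root))

# Both styles produce the same tibble
identical(nested, piped)
```

Neither style changes the result; the choice is about which one you find easier to read and debug.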
Also, try to keep using the pipes (%>%) for the entire project (again, you don’t need to, but we’d encourage you to try). Your code for Question 1 should look something like this:
dat <- dat %>%
  insert_correct_function_here(
    insert_code_to_change_weekday_complaint_sent_to_proper_format_here,
    insert_code_to_get_day_of_the_week_here
  )
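As a rough illustration of that pattern, the sketch below fills it in on a tiny made-up tibble rather than on dat. It assumes the dates are stored as month/day/year text, which may not match the real file; use whatever conversion you wrote in Project 12. The choice of `mutate()`, `mdy()`, and `wday()` here is one possible combination, not necessarily the intended solution.

```r
library(dplyr)      # part of the tidyverse
library(lubridate)

# Toy stand-in for the complaints data.
# ASSUMPTION: dates are month/day/year strings; the real column may differ.
toy <- tibble(date_sent_to_company = c("04/03/2023", "04/08/2023", "04/09/2023"))

toy <- toy %>%
  mutate(
    # Parse the text dates into Date objects
    date_sent_to_company = mdy(date_sent_to_company),
    # Extract the day of the week as a labelled, ordered factor
    day_sent_to_company = wday(date_sent_to_company, label = TRUE, abbr = FALSE)
  )

toy
```

Because `mutate()` evaluates its arguments in order, the second expression already sees the parsed dates produced by the first.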
You may want to use the argument |
- Code used to solve this problem.
- Output from running the code.
Question 2
Before we continue with all of the fun, let’s do some sanity checks on the column we just created. Sanity checks are an important part of any data science project, and should be performed regularly.
Use your critical thinking to perform at least two sanity checks on the new column day_sent_to_company. The idea is to take a quick look at the data from this column and check whether it makes sense. Sometimes we know the exact values we should get, and that helps us be even more certain. Other times the sanity checks are not as foolproof, and are just ways to get a feel for the data and make sure nothing looks wrong right away.
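To give a flavor of what such checks can look like, here is a small sketch on made-up dates (including one missing value). The three checks shown are only examples, and the toy tibble is an assumption standing in for dat; you are free to come up with different checks.

```r
library(dplyr)      # part of the tidyverse
library(lubridate)

# Toy data, invented for this illustration, with one missing date
toy <- tibble(
  date_sent_to_company = as.Date(c("2023-04-03", "2023-04-08", "2023-04-09", NA))
) %>%
  mutate(day_sent_to_company = wday(date_sent_to_company, label = TRUE, abbr = FALSE))

# Check 1: look at which weekday values appear and how often
toy %>% count(day_sent_to_company)

# Check 2: a weekday should be missing exactly when the date is missing
na_mismatch <- sum(is.na(toy$day_sent_to_company) != is.na(toy$date_sent_to_company))

# Check 3: compare against an independent computation from base R.
# format(x, "%u") gives the ISO weekday, 1 (Monday) to 7 (Sunday);
# wday(x, week_start = 1) uses the same numbering.
iso_from_base <- as.integer(format(toy$date_sent_to_company, "%u"))
iso_from_lubridate <- wday(toy$date_sent_to_company, week_start = 1)
all_agree <- all(iso_from_base == iso_from_lubridate, na.rm = TRUE)

na_mismatch
all_agree
```

The first check is the "get a feel for the data" kind, while the third is closer to the "we know the exact values we should get" kind.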
- Code used to solve this problem.
- Output from running the code.
- 1-2 sentences explaining what you used to perform the sanity checks, and what your results were. Do you feel comfortable moving forward with this new column?
Question 3
Using your code from Questions 1 and 2, create another new column called day_received that is the day of the week the complaint was received. Use sanity checks to double-check that everything seems to be in order.
Let’s use these new columns to make some additional recommendations to our consumers! Using at least one of the columns day_received and day_sent_to_company, together with the rest of the data, examine whether the consumer disputed the result (consumer_disputed), and create a tip or a plot to help consumers make decisions.
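One possible shape for this analysis is sketched below on made-up data. It assumes that consumer_disputed holds the text values "Yes" and "No"; the real column may contain other values that you need to handle first. The toy tibble, the two weekdays, and the axis labels are all invented for the illustration.

```r
library(dplyr)    # part of the tidyverse
library(ggplot2)

# Toy data, invented for this illustration.
# ASSUMPTION: consumer_disputed is coded as "Yes" / "No".
toy <- tibble(
  day_received = factor(
    c("Monday", "Monday", "Monday", "Friday", "Friday", "Friday", "Friday"),
    levels = c("Monday", "Friday")
  ),
  consumer_disputed = c("Yes", "No", "No", "Yes", "Yes", "No", "No")
)

# Proportion of disputed complaints for each day
dispute_rates <- toy %>%
  group_by(day_received) %>%
  summarise(n = n(), prop_disputed = mean(consumer_disputed == "Yes"))

# Stacked bars scaled to proportions, with a legend for the dispute status
p <- ggplot(toy, aes(x = day_received, fill = consumer_disputed)) +
  geom_bar(position = "fill") +
  labs(
    x = "Day complaint was received",
    y = "Proportion of complaints",
    fill = "Consumer disputed?"
  )

dispute_rates
p
```

Keeping the group sizes (`n`) next to the proportions helps you judge whether a difference between days rests on enough complaints to be worth turning into a tip.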
Note that the column |
- Code used to solve this problem.
- Output from running the code.
- Recommendation for the consumer in the form of a chart with a legend and/or a tip.
Question 4
Looking at the columns we have in the dataset, come up with a question whose answer can be used to help consumers make decisions. It is OK if the answer to your question doesn’t provide the most insightful information. For instance, finding out that two variables are not correlated can still be valuable information!
Use your skills to answer the question. Turn your answer into a "tip" with an accompanying plot.
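As one hedged example of checking whether two categorical columns are related, the sketch below cross-tabulates them and runs a chi-squared test of independence. The column names submitted_via and timely_response come from the dataset, but the values ("Web", "Phone", "Yes", "No") and the counts are invented for this illustration, and a chi-squared test is only one of several reasonable ways to approach such a question.

```r
library(dplyr)  # part of the tidyverse

# Toy data, invented for this illustration: 100 complaints per channel.
# ASSUMPTION: the value labels below are placeholders, not the real coding.
toy <- tibble(
  submitted_via = rep(c("Web", "Phone"), each = 100),
  timely_response = c(
    rep(c("Yes", "No"), times = c(90, 10)),  # Web
    rep(c("Yes", "No"), times = c(88, 12))   # Phone
  )
)

# Cross-tabulate the two columns
tab <- table(toy$submitted_via, toy$timely_response)

# Chi-squared test of independence on the 2 x 2 table
test <- chisq.test(tab)

tab
test$p.value
```

A large p-value here would mean the toy data give no evidence that the response timeliness depends on the channel, which is itself something you could report to consumers.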
- Code used to solve this problem.
- Output from running the code.
- 1-2 sentences with your question.
- Answer to your question.
- Recommendation to consumer via tip and plot.
Please double-check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure that what you think you submitted is what you actually submitted. In addition, please review our submission guidelines before submitting your project.