Stack 4 - Week 3 - Core 002 - Hypothesis Testing with Insurance Data

For this assignment, we will be working with the US Health Insurance Dataset from Kaggle.

We have been asked to use our hypothesis testing skills to answer the following questions:

- Q1. Do smokers have higher insurance charges than non-smokers?
- Q2. Are men more likely to smoke than women?
- Q3. Do different regions have different charges, on average?

For each question, make sure to:

  1. State your Null Hypothesis and Alternative Hypothesis

  2. Select the correct test according to the data type and number of samples

  3. Test the assumptions of your selected test.

  4. Execute the selected test, or the alternative test (if you do not meet the assumptions)

  5. Interpret your p-value and reject or fail to reject your null hypothesis 

  6. Show a supporting visualization that helps display the result
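As an illustration of steps 1 to 5 for Q2, where both variables are categorical, one reasonable choice is a chi-square test of independence. The sketch below is a minimal standard-library version, not the assignment's official solution. The counts are hypothetical and are not taken from the insurance dataset, and the function name `chi_square_2x2` is made up for this example. For a 2x2 table (1 degree of freedom) the p-value can be computed with `math.erfc`.

```python
import math


def chi_square_2x2(table):
    """Chi-square test of independence for a 2x2 table (no continuity correction).

    Returns (statistic, p_value).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count under the null hypothesis of independence.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed_count - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: rows = (male, female), columns = (smoker, non-smoker).
observed = [[60, 140], [40, 160]]
stat, p = chi_square_2x2(observed)

alpha = 0.05
decision = "reject" if p < alpha else "fail to reject"
print(f"chi2 = {stat:.3f}, p = {p:.4f} -> {decision} the null hypothesis")
```

Library routines may apply a continuity correction to 2x2 tables, so their statistic can differ slightly from this uncorrected version.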

My Submission

Stack 4 - Week 3 - Practice 001 - CDF to Calculate Probabilities

We will use the human height data set:

Use the normal cumulative distribution function with the mean and standard deviation of female height to calculate the probability that a female's height is:

1. between 55.0 and 56.0 inches

2. less than 5 feet (60 in)

3. greater than the mean (hint: do you know this answer intuitively?)
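The three probabilities above can be sketched with the standard library's `statistics.NormalDist`. The mean and standard deviation below are placeholder assumptions, not values from the height data set; substitute the real female-height parameters.

```python
from statistics import NormalDist

# Assumed placeholder parameters in inches; replace with the data set's values.
female_mean = 64.0
female_std = 2.5
heights = NormalDist(mu=female_mean, sigma=female_std)

# 1. P(55 <= height <= 56) is the difference of two CDF values.
p_between = heights.cdf(56.0) - heights.cdf(55.0)

# 2. P(height < 60) is the CDF evaluated at 60 inches.
p_below_60 = heights.cdf(60.0)

# 3. P(height > mean) is the complement of the CDF at the mean.
p_above_mean = 1 - heights.cdf(female_mean)

print(f"between 55 and 56 in: {p_between:.6f}")
print(f"below 60 in:          {p_below_60:.6f}")
print(f"above the mean:       {p_above_mean:.6f}")
```

Because the normal distribution is symmetric about its mean, the third probability is 0.5 whatever parameters are used.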

Save your code to your GitHub repository and submit the link.

My Submission

Stack 4 - Week 3 - Core 001 - Describing Distributions (Core)

In this assignment, you will be analyzing & visualizing several features in the Medical Dataset.

  • The features to analyze: 

    • VitD_levels

    • Doc_visits

    • TotalCharge

For each feature listed:

  1. Plot a histogram with a kde (kernel density estimate)

    1. Add a line for the mean (red)

    2. Add a line for the median (green)

    3. Add a line for +1 std from the mean (black)

    4. Add a line for -1 std from the mean (black)

    5. Highlight the range between -1 and +1 std (yellow)

  2. Answer the following questions:

    • Is it Discrete or Continuous?

    • Does it have a skew? If so, which direction (+/-)

    • What type of kurtosis does it display? (Mesokurtic, Leptokurtic, Platykurtic)
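The numbers behind the plot lines and the skew and kurtosis questions can be sketched with the standard library alone. This is a minimal illustration, not the assignment's solution: the sample values are hypothetical, the helper names `describe` and `kurtosis_type` are made up here, and the 0.5 tolerance for calling a distribution mesokurtic is an arbitrary assumption.

```python
from statistics import fmean, median, pstdev


def kurtosis_type(excess, tolerance=0.5):
    """Label excess kurtosis; the tolerance is an assumed cut-off."""
    if excess > tolerance:
        return "Leptokurtic"
    if excess < -tolerance:
        return "Platykurtic"
    return "Mesokurtic"


def describe(values):
    """Summary statistics based on population moments (values must not be constant)."""
    mean = fmean(values)
    std = pstdev(values)
    z = [(v - mean) / std for v in values]
    skew = fmean([s ** 3 for s in z])
    excess_kurtosis = fmean([s ** 4 for s in z]) - 3
    return {
        "mean": mean,                 # red line
        "median": median(values),     # green line
        "minus_1_std": mean - std,    # black line, lower edge of the band
        "plus_1_std": mean + std,     # black line, upper edge of the band
        "skew": skew,                 # > 0 is positive (right) skew
        "excess_kurtosis": excess_kurtosis,
        "kurtosis_type": kurtosis_type(excess_kurtosis),
    }


# Hypothetical values standing in for one feature of the dataset.
sample = [3, 4, 4, 5, 5, 5, 5, 6, 6, 7, 9]
for name, value in describe(sample).items():
    print(name, value)
```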

My Submission

Stack 4 - Week 3 - Project 3 - Part 1 (Core)

The project assignment is at the beginning of this week because you already have all of the background to complete project part 3 based on the first two weeks of the course!

Business Problem

For this project, you have been hired to produce a MySQL database on Movies from a subset of IMDB's publicly available dataset. Ultimately, you will use this database to analyze what makes a movie successful, and will provide recommendations to the stakeholder on how to make a successful movie.

Over the course of this project, you will:

  • Part 1: Download several files from IMDB’s movie data set and filter out the subset of movies requested by the stakeholder.

  • Part 2: Use an API to extract box office revenue and profit data to add to your IMDB data and perform exploratory data analysis.

  • Part 3: Construct and export a MySQL database using your data.

  • Part 4: Apply hypothesis testing to explore what makes a movie successful.

  • Part 5 (Optional): Produce a Linear Regression model to predict movie performance.

My submission

Full Assignment PDF

Stack 4 - Week 2 - Interview Questions

This assignment is optional! If you choose to complete it, your task is to submit a document (word document, Google doc, markdown file, text file, etc.) with your answers to each of the questions below. 

These questions can typically be answered in a couple of sentences, but I encourage you to think about how you would answer these out loud - or even practice answering them out loud - since that is how you would answer them in an actual interview.

It might also be a good idea to keep these somewhere to review after graduation when you are job hunting! Creating flashcards with potential technical interview questions is a good strategy.

Data Enrichment Week 2 Questions

  1. What are the key steps of the ETL process?

  2. Why is ETL testing important and how can it be done?

  3. Why is data warehousing important?

  4. How is data analyzed in ETL?

  5. What is data profiling in ETL?

  6. What is the difference between an initial load and an incremental load in ETL?

  7. What is the role of ETL in the data mining process?

  8. What are the Enterprise Software ETL Tools?

  9. What are the Cloud-Based ETL Tools?

  10. What is an API?

  11. What are the Limits of API Usage?

  12. Who can use a Web API?

  13. What is API documentation?

  14. How often are the APIs changed and, more importantly, deprecated?

  15. What is REST?

  16. What is a “Resource” in REST?

  17. What are API calls?

Week 2 Data Enrichment Interview Questions Solutions

Answers

Stack 4 - Week 2 - Core 002 - Applying Advanced Transformations

The Data

You will be working with a heavily modified version of the Superheroes dataset from Kaggle.

The dataset includes two CSV files:

The Task

Your task is two-fold:

I. Clean the files and combine them into one final DataFrame.

  • This dataframe should have the following columns:

    • Hero (Just the name of the Hero)

    • Publisher

    • Gender

    • Eye color

    • Race

    • Hair color

    • Height (numeric)

    • Skin color

    • Alignment

    • Weight (numeric)

    • Plus, one-hot-encoded columns for every power that appears in the dataset. E.g.:

      • Agility

      • Flight

      • Superspeed

      • etc.

Hint: There is a space in "100 kg" or "52.5 cm"
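One way to use the hint is to split each measurement on the space and keep the numeric part. The helper name `parse_measurement` is hypothetical, and the sketch assumes every value follows the "number, space, unit" pattern shown in the hint.

```python
def parse_measurement(text):
    """Return the numeric part of a string such as '100 kg' or '52.5 cm'."""
    value, _unit = text.split(" ", 1)
    return float(value)


print(parse_measurement("100 kg"))
print(parse_measurement("52.5 cm"))
```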

II. Use your combined DataFrame to answer the following questions.

  1. Compare the average weight of superheroes who have Super Speed to those who do not.

  2. What is the average height of heroes for each publisher?
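Both questions are group-wise averages. The sketch below shows the idea with plain dictionaries. The rows are invented for illustration, the "Super Speed" column name is an assumption about how the one-hot column is labelled, and `mean_by_group` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import fmean


def mean_by_group(rows, group_key, value_key):
    """Average `value_key` for each distinct value of `group_key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: fmean(values) for group, values in groups.items()}


# Invented rows standing in for the cleaned, combined data.
heroes = [
    {"Hero": "Hero A", "Publisher": "Publisher X", "Height": 180.0, "Weight": 80.0, "Super Speed": True},
    {"Hero": "Hero B", "Publisher": "Publisher X", "Height": 170.0, "Weight": 70.0, "Super Speed": False},
    {"Hero": "Hero C", "Publisher": "Publisher Y", "Height": 190.0, "Weight": 100.0, "Super Speed": True},
    {"Hero": "Hero D", "Publisher": "Publisher Y", "Height": 160.0, "Weight": 60.0, "Super Speed": False},
]

weight_by_speed = mean_by_group(heroes, "Super Speed", "Weight")
height_by_publisher = mean_by_group(heroes, "Publisher", "Height")
print(weight_by_speed)
print(height_by_publisher)
```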

(Source)

Submit your notebook or a link to a GitHub repository with your work.

Complete Project

My Submission

Stack 4 - Week 2 - Project 3 - Part 2 - Extraction from TMDB

Business Problem

For this project, you have been hired to produce a MySQL database on Movies from a subset of IMDB's publicly available dataset. Ultimately, you will use this database to analyze what makes a movie successful, and will provide recommendations to the stakeholder on how to make a successful movie.

Over the course of this project, you will:

  • Part 1: Download several files from IMDB’s movie data set and filter out the subset of movies requested by the stakeholder.

  • Part 2: Use an API to extract box office revenue and profit data to add to your IMDB data and perform exploratory data analysis.

  • Part 3: Construct and export a MySQL database using your data.

  • Part 4: Apply hypothesis testing to explore what makes a movie successful.

  • Part 5 (Optional): Produce a Linear Regression model to predict movie performance.

Complete Assignment page


Stack 4 - Week 1 - Core 001 - Efficient Yelp API Calls

For this assignment, you will be working with the Yelp API.

As before, you will use the Yelp API to search your favorite city for a cuisine type of your choice.

Extract all of the results from your search and compile them into one dataframe using a for loop, as shown in the lesson "Code for Efficient API Extraction".
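The lesson's code is not reproduced here, so the following is only a generic sketch of the loop: compute how many pages are needed, then request each page by its offset. No real Yelp request is made; `fake_fetch` is a stand-in for the API call, and `collect_all` and the page size of 20 are assumptions for illustration.

```python
import math


def collect_all(fetch_page, total, page_size=20):
    """Call `fetch_page` once per page and gather every result in one list."""
    results = []
    n_pages = math.ceil(total / page_size)
    for page in range(n_pages):
        offset = page * page_size
        results.extend(fetch_page(offset=offset, limit=page_size))
    return results


# Stand-in data and fetch function; a real version would query the API instead.
fake_businesses = [{"id": i} for i in range(45)]


def fake_fetch(offset, limit):
    return fake_businesses[offset:offset + limit]


all_results = collect_all(fake_fetch, total=len(fake_businesses))
print(len(all_results))
```

The resulting list of dictionaries can then be turned into a single dataframe in one step, which avoids concatenating a new frame on every iteration.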

Save your notebook, commit the change to your repository and submit the repository URL for this assignment.

My GitHub Submission

Stack 4 - Week 2 - Practice 002 - Using the Yelp API

For this assignment you will practice with the Yelp API and the concept of pagination.

  • Use your API credentials to access the Yelp API

  • You can choose the location and search term (food), but it must return more than 20 results so you can practice pagination!

  • Save the businesses as a records-oriented JSON file. (df.to_json(orient='records'))

  • Obtain ONLY the first two pages of results

  • Concatenate the results into one data frame
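The "first two pages, then combine, then save as records" flow can be sketched without pandas. A list of dictionaries written with `json.dump` has the same shape as records-oriented JSON: one object per row. The search results below are invented stand-ins for API responses, and the page size of 20 is an assumption.

```python
import json
import os
import tempfile

page_size = 20

# Invented results standing in for what the search would return.
search_results = [{"id": f"biz-{i}", "name": f"Business {i}"} for i in range(53)]

# Take ONLY the first two pages (offsets 0 and 20), then combine them.
pages = [search_results[offset:offset + page_size] for offset in (0, page_size)]
businesses = [record for page in pages for record in page]

# Save as a records-oriented JSON file: a list with one object per business.
path = os.path.join(tempfile.mkdtemp(), "businesses.json")
with open(path, "w") as f:
    json.dump(businesses, f)

with open(path) as f:
    reloaded = json.load(f)

print(len(reloaded))
```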

Here is a sample solution for you to explore.

Stack 4 - Week 2 - Practice 001 - Create and Save your Yelp API Key

While this assignment is optional, you can use the notebook & repository created in this practice assignment as a starting point for your first Core assignment.

  • Follow the instructions from the previous 2 lessons to get your Yelp Fusion API credentials. 

  • Save them locally in a yelp-api.json file located in a ".secret/" folder inside your user folder. 

  • Create a new repository with GitHub desktop. 

  • Load in your API credentials using the JSON module and display what keys are in the dictionary.

    •   DO NOT DISPLAY THE VALUES OF THIS DICTIONARY!!!

  • Save, commit, and push your work to GitHub.
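Loading the credentials and showing only the key names might look like the sketch below. So that it runs anywhere, it first writes placeholder credentials to a temporary ".secret" folder. The key names "client-id" and "api-key" are assumptions about the file's layout, not something this page specifies.

```python
import json
import os
import tempfile

# Demo setup only: write placeholder credentials to a temporary .secret folder.
secret_dir = os.path.join(tempfile.mkdtemp(), ".secret")
os.makedirs(secret_dir)
cred_path = os.path.join(secret_dir, "yelp-api.json")
with open(cred_path, "w") as f:
    json.dump({"client-id": "PLACEHOLDER", "api-key": "PLACEHOLDER"}, f)

# Load the credentials with the json module.
with open(cred_path) as f:
    credentials = json.load(f)

# Display only the key names, never the values.
key_names = sorted(credentials.keys())
print(key_names)
```

With a real file, the path would instead be something like `os.path.expanduser("~/.secret/yelp-api.json")`, and the setup step would be dropped.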

Stack 4 - Week 1 - Core 002 - Books

Assignment:

Consider the following "flat" file that a start-up has just started using for its first customers: Client's Original File. They quickly realized that saving this information in .csv format will not meet their needs as they grow. First, consider how you would design a relational database to meet their needs. Be sure to consider conventions of normalization and what information should be separated.

Read More

Stack 4 - Week 1 - Core 001 - Queries: Sakila

Welcome to another Core assignment! Some students like to explore the assignments before they're finished reading through the lessons, and that's okay! It can be good for your brain to have a preview of what your future challenges might be. However, before you begin this assignment, it's important that you've first:

  • Completed the preceding lesson modules

  • Taken the knowledge checks to confirm your understanding

  • Viewed lecture material related to the assignment topics

  • Completed and submitted your practice assignments

Read More

Stack 4 - Week 1 - Project 3 - Part 1 - Business Problem(IMDB)

Welcome to another Core assignment! Some students like to explore the assignments before they're finished reading through the lessons, and that's okay! It can be good for your brain to have a preview of what your future challenges might be. However, before you begin this assignment, it's important that you've first:

  • Completed the preceding lesson modules

  • Taken the knowledge checks to confirm your understanding

  • Viewed lecture material related to the assignment topics

  • Completed and submitted your practice assignments

Read More