Intro to MySQL

 
 

Click image for full page PDF

Learning Objectives:

  • Describe how using MySQL differs from spreadsheets

  • Explain the importance of SQL in the work of a data professional

MySQL

Many students are familiar with storing data in a spreadsheet such as Excel or Google Sheets. However, as the size and complexity of data grow, the spreadsheet as a tool for data storage begins to show a number of limitations. These include:

  • Limited amount of data storage

  • Must work on an area of sheet that you can ‘see’

  • Difficult to share

  • Easy to create a ‘mess’ with formulas referenced across many sheets

  • Limited security of sensitive information

From a data management standpoint, spreadsheets (and raw text files) are not adequate. This shortcoming demonstrates the usefulness of a centrally managed system. Such a system is known as a database.

Databases are typically deployed as servers in a client/server setting. Many applications (clients) interact with it for various reasons.

Most databases exist, not for analytics, but to provide support to applications and processes. The data is used to ‘keep state’ in applications. This means that the data is the memory of an organization.

However, with analytics and data science, the database represents an opportunity to extract insight while maintaining integrity. In most cases, the data analyst/data scientist does not set up the database. It already exists as part of the organization and/or product. However, it is critical for all data professionals to understand the fundamentals of how a database works and to interact with databases in order to answer questions of interest.

There are several different types of databases. The most common types include;

  • Relational Databases: Highly structured sets of flat tables that are "related" to each other through the use of keys

  • NoSQL Databases: Flexible databases with no set schema or relationships

  • Big Data Databases: Large scale storage systems that operate across a set of servers or ‘cluster’

For each of these types of databases, the most common way to interact with them is through the use of a ‘client’ using the Structured Query Language (SQL). Almost all production databases will use SQL as the language of choice when dealing with a database. Sometimes databases have their own unique language. However, it is almost always a variation on traditional SQL. This makes the language a key skill for anyone in the data workforce. Often business value can be created just by being an expert at SQL.

Our course will focus on relational databases, and specifically employ MySQL server. We will also take advantage of the MySQL Workbench. To get started, you will need to follow


Installation:

 
 

MySQL Workbench

Click image for full page PDF

Learning Objectives:

  1. Acquire an introduction to MySQL Workbench

  2. Document first queries in MySQL workbench

What is MySQL Workbench?

MySQL Workbench is just a Graphical User Interface (GUI) for us to interact with MySQL, one of the most popular relational databases in the world - through SQL commands. It is not necessary to use a GUI. We could interact with our MySQL databases through the terminal, but the GUI makes it much easier to see our data and create new databases.

Why MySQL Workbench?

We use MySQL Workbench when we are interacting with our databases because it allow us to directly make SQL queries. It also has a great interface where we can map out our tables and establish relationships between them.

Click on (or stay clicked on) the database symbol at the top left, and open your Local instance. You will need to put in your password, which was "root" based on our installation instructions.

Summary

This lesson gave you an opportunity to explore MySQL Workbench, a GUI for interacting with MySQL databases. You loaded an existing .sql file, the "world" database to begin you exploration of the layout and even to make a few queries. We will continue to work with the "world" database in the next few lessons.

(Full Page PDF)


 

Click image for full page PDF

 

Reverse Engineering

Learning Objectives:

  • Explain what reverse engineering is and how to do it in MySQL Workbench

  • Explore an ERD to understand how a database is designed

Reverse Engineering

This lesson uses the "world" database that you loaded into MySQL Workbench in the previous lesson. We are going to create a visual representation of the database, known as an ERD (Entity Relationship Diagram). We are doing this early in the week because the ERD can help you understand how a database is constructed which makes querying it easier.

What is an Entity Relationship Diagram?

An Entity Relationship Diagram is a visual representation of the information in the database and how it is related. It is a "snapshot" of the database architecture that can help you understand how the information you are working with is related. An entity is essentially anything that can have information stored about it: It can be a person, thing, concept or event. In an ERD, each entity is has its own table with entity as the title. The information stored for that entity is called an attribute. The attributes are included under the title in each table. Note that in a table of data, each attribute would be a column.

There are several variations of ERDs, but we are going to focus on the type we can create in MySQL Workbench.

Summary:

This lesson demonstrated the ease with which MySQL Workbench can "reverse engineer " an existing database. The result is a visual representation of the database known as an ERD. It shows each table, the information in it, and the relationships between tables. We will explore ERDs in more detail, but you may find that having this visual available to you early will help you understand the database and aid you in developing your queries.


Click image for full page PDF

SQL Basic Queries

Learning Objectives:

  • Apply basic SQL queries using: SELECT, FROM, LIMIT, ORDER BY, DISTINCT, WHERE, GROUP BY, AS, HAVING

  • Apply SQL Aggregate Functions

SQL Basic Queries

You have had some practice with basic SQL queries using SQL Alchemy within Python from week 12 of the course. We are going to review the basics of SQL queries and give you a chance to practice these in the MySQL Workbench interface.

For additional practice, we recommend trying the tutorials and quizzes found here.

Summary

This lesson reviewed many of the SQL keywords that you will need for basic queries. Continue practicing by making up questions and designing queries to address them. Verify the output to see if it is what you expected.


Click image for full page PDF

Intro to Joins

Learning Objectives:

  • Identify foreign keys in an ERD

  • Apply INNER JOINS to obtain information from multiple tables

  • Use aliasing with table names

What is a Join?

Joins combine columns from one or more tables into a single output table.

There are several types of joins that often can cause confusion especially when first starting out with SQL queries. Before exploring the different types of joins, it can be easiest to become familiar with the most common type: INNER JOIN. In fact, INNER JOIN is so common that simply typing the keyword JOIN will refer to an INNER JOIN.

It can be helpful to learn the applications of joins by working with an existing database and seeing which type of questions require a join. For this lesson, we will continue to explore the "world" database that you should have loaded into MySQL Workbench in an earlier lesson.

Summary:

This lesson introduced the concept of joins by practicing several inner joins. In order to make a join, there must be a common column in each table. A foreign key in one table references the primary key in another. We specify the table name with the column using table.column. It is common to alias table names with the first letter or two of the table name. The nature of databases makes joins an essential SQL command. You will get more comfortable with joins as you continue to practice.


More SQL Joins

Learning Objectives:

  • Differentiate between different type of joins: Inner Join, left join, right join and full outer join

More SQL Joins

You have practiced Inner joins (JOIN). This lesson will continue to develop your conceptual understanding of different joins.

Summary

This lesson demonstrated the difference in results that you would get from applying inner, left, right, and full joins. When there are missing values in your database, it is especially important to consider what you want your results to include. MySQL does not have a FULL OUTER JOIN, but it can be achieved by using UNION with a left and right join. It is helpful to practice all of these joins on a simple database such as the one here so that you can recognize the difference in outputs.

 

Click image for full page PDF


Click for youtube video

SQL Subqueries

Learning Objectives:

  • Describe what a subquery is

  • Explore examples of subqueries in different places

Subqueries

Subqueries are also known as inner queries or nested queries. A subquery is used to perform the query in multiple steps. The subquery is placed in parenthesis and is completed first. The subquery must be able to run on its own. Subqueries can occur anywhere within the main query. To help you understand how subqueries are used, we will look at at example of a subquery occurring in three different places:

  • WHERE

  • SELECT

  • FROM

Summary

This lesson introduces the concept of subqueries by providing examples of how a subquery might be used in different clauses of the main query. It is important to realize that there are usually multiple ways to achieve the output you are seeking, and subqueries are another tool to add to your growing SQL toolbox.

Full page PDF link


Querying MySQL with Python

Objectives:

  • Use Python to query a MySQL database

You have already had some experience using SQLAlchemy to execute SQL queries in a sqlite file within Python. The ability to transition between SQL and Python is an important skill for data scientists. This lesson shows you how to connect to your local MySQL server with pymysql.

Do NOT use Colab!

You will need to work on your local machine to connect to your MySQL server.

Reminder: you should create a new GitHub repository and jupyter notebook for each assignment (except for your Project 3 Parts 1-4).


Summary

This lesson taught you how to connect your local python to your MySQL database to perform SQL queries.

Congratulations! You are now able to query your local MySQL databases with Python!

Click image for full page PDF


 

Click image for full page PDF

Introduction to Database Design

Objectives:

  • What is a relational database

  • What is important to remember about database design?

  • Designing with primary and foreign keys

What is a database?

Databases are merely collections of organized information that can easily be accessed, managed, and updated. As a data scientist, you need to be familiar with how databases are built and understand the rules by which it stores data!

Summary

This lesson provided an overview for understanding relational database design. We use multiple tables that are connected by foreign keys. This ultimately saves storage space and allows us only to query what we need from a given database. As we have already seen, an ERD can help us understand an existing database. It is also a great place to start when designing a database! We will go into more details related to database design in the next few lessons.


Normalization

Objectives:

  • Explain what is meant by database normalization and why it is important

  • Recognize and apply 3 forms of normalization

What is Normalization?

Database normalization is simply a convention for splitting large tables of data into smaller separate tables with the primary goal being to not repeat data.

Why is this so important? Let's say that you wear a watch so you can check the time, because it's very important for you to know what the current time is. Would wearing eight watches make it easier? No way! Now we have eight conflicting accounts of what the proper time is. Worse yet, if we ever want to update the time, we'd have to do it for every watch independently. That's not very efficient!

Summary

This lesson introduced three forms of normalization that you should be aware of when designing a database. You should use these conventions as a guide for designing your ERDs. Always remember, however, that they are common conventions, and not absolute rules. It is possible to take normalization to an extreme. For example, a simple address field. One state can have many cities, one city can have many streets, one street can have many buildings, one building can have many apartments, one apartment can have many residents... and so on. Yikes, that can get really crazy really quickly! Normalization conventions help databases save storage space and eliminate data repetition. It also allows one piece of information to be updated without disrupting the rest of the data. Consider these conventions for normalization when you are using or designing a database.

 

Click image for full page PDF


 

Click image for full page PDF

Database Relationships

Objectives:

  • What are different types of relationship in the context of databases?

  • What is a join table and how is it used?

Relationships allow information to be stored in different tables, yet recombined when needed (by using joins). The two tables will be linked by a foreign key in one table matching the primary key of another.

In this lesson, we will explore three types of relationships:

  • One to One

  • One to many (or Many to One)

  • Many to Many

Summary

You should be aware of the different types of relationships that are commonly found in a relationship database. It is best to "talk out" the relationship to confirm if it is one to one, one to many, or many to many. Understanding how the data is related will help you structure your database or will help you when querying it. One to one and one to many relationships rely on foreign keys, while a many to many relationship uses a separate join table as the "glue" between the two tables.


Conventions and Data Types

Objectives:

  • Familiarity with database conventions

  • Familiarity with the different types of data that can be stored in MySQL and when to use each

Click image for full page PDF

Conventions

We will be following a set of conventions to create our database. We don't have to follow these conventions, but we recommend our students to follow them when setting up a database in this course.  Note that we will be using many databases that do NOT follow these conventions. Still, it is good to be aware of some best practices.  

  1. make the table name plural and ALL lowercase - make it plural (ex. users, leads, sites, clients, chapters, courses, modules)

  2. use "id" as the primary key - name it id (also make it auto-incremented).

  3. name foreign keys with singular_table_name_id when referencing to a primary key in another table name it [singular name of the table you're referring to]_id (ex. user_id, lead_id, site_id, client_id, chapter_id, course_id, module_id).

  4. use created_at and updated_at as columns in your tables

While these are some best practices, most of the SQL assignments in this course will not follow these conventions.  Similarly, down the line, you may find yourself working with a company that has set up their database with different conventions. 

Summary

This lesson introduced some best practices for database conventions. It also listed many of the variable types that you will encounter when using MySQL. This information can serve as a reference when creating a database.

Simple Data Types

The following are the data types that you will be using 95% of the time. Although there are quite a few other data types that you can use, focus on these for now.

  • VARCHAR(number of characters)

    • Used to store non-numeric values that can be up to 255 characters. It is called a VARCHAR because it can store a variable number of characters and will only use the space required for each record that is stored in the database. VARCHAR should be used for values with different character lengths like an email, first_name, or last_name.

  • CHAR(number of characters)

    • Also used to store non-numeric values, however, it will use up all space for the set number of characters regardless of what value is added. For instance, if I set CHAR(15), and I try to store the value "Coding", it will use up the equivalent of 15 characters even though "Coding" is only 6 characters long. Char is good to use for things that will always be a given number of characters. Char would work well for something like a state_abbreviation.

  • INT

    • Used to store integers.

    • The columns that you will find mostly using the INT are things like a unique identifier for each table. The majority of rows in a table will not exceed 2.1 billion records. INT is good to use for most normal number values like a phone_number or a zip_code.

    • unsigned (positive numbers only) - can store numerical values from 0 up to 4294967295

    • signed (positive and negative numbers) - can store numerical values from -2147483648 up to 2147483647

  • BIGINT

    • BIGINT would be used for columns that would need to store huge numbers. In most cases, you wouldn't need BIGINT, but if you wanted to store something like a Facebook id when using Facebook's API, since they have over a billion users the id will need to be a data type of BIGINT.

    • unsigned (again positive numbers only) - can store numerical values from 0 up to 18446744073709551615

    • signed (positive and negative numbers) - can store numerical values from 9223372036854775807 to -9223372036854775808.

  • TINYINT

    • TINYINT would be good to use for numbers that will be relatively small. A good example of something that would use a TINYINT is user level identifier (0 - inactive user, 1 - active user, 9 - admin).

    • unsigned - can store numerical values from 0 up to 255

    • signed - can store numerical values from -128 up to 127

  • FLOAT

    • Used to store floating point numbers (numbers that need to have decimal places). An example column for this would be like an item_cost.

  • TEXT

    • Used to store a large amount of text, like a description, message, or comment. Use this for any text that VARCHAR() is too small to handle.

  • DATETIME

    • used to store a date and time in the format YYYY-MM-DD hh:mm:ss


Entity Relationship Diagrams(ERD) in MySQL Workbench

Click image for full page PDF

Objectives:

  • Review the terminology associated with ERDs

  • Create an ERD in MySQL Workbench

We introduced ERDs in the "Reverse Engineering" lesson. In this lesson we will create the ERD "from scratch" instead of by reverse engineering it from an existing database.

What is an Entity Relationship Diagram?

An Entity Relationship Diagram is a visual representation of the information in the database and how it is related. It is a "snapshot" of the database architecture that can help you understand how the information you are working with is related.

There are several variations of ERDs, but we are going to focus on the type we can create in MySQL Workbench.

Summary

This lesson went over the vocabulary associated with ERDs such as entity, attribute, and primary and foreign keys. You also created an ERD in MySQL Workbench with a one to one relationship. As with any new tool, it can take a bit of practice to get used to, but you will find that the Workbench is a relatively easy way to produce professional looking ERDs. You will get a chance to practice in the next assignment. And (spoiler alert!): you will be able to convert any ERD you make in MySQL Workbench into a database in the process of forward engineering!


Forward Engineering

Learning Objectives:

  • Create a database by forward engineering an ERD in my SQL workbench

What is Forward Engineering?

The process of designing the ERD first and then creating the database is called "Forward Engineering." MySQL workbench provides us with an easy way to actually create a database easily from the ERD we created.

By using the "Forward Engineering" tool, MySQL workbench will create a SQL script that creates the tables from the information provided in the diagram.

Summary


This lesson showed you how to produce a database from an ERD that you created in MySQL workbench. This process is called "Forward Engineering" because typically you would create the ERD in the early stages of the database design process and then create the database based on the model. (Note that creating a model from an existing database is known as "reverse engineering"). MySQL Workbench allows for these models to be converted into actual databases without the designer needing to write the SQL syntax required to create the database.

Click image for full page PDF


CRUD

Click image for full page PDF

Learning Objectives: Explore SQL commands for the create, update, and delete aspects of CRUD

As a data professional, you should be able to comfortably interact with a database as well as understand how to modify a database. 
Database creation and management falls under the acronym CRUD which stands for 

  • CREATE

  • READ

  • UPDATE

  • DELETE

    The queries you have been executing to extract information from the database all fall under the "Read" category.  This lesson will explore the other 3 steps.  

Summary

This lesson introduced the concept of CRUD which you will hear in the data world. It is an acronym of  all things you may do with a database: create, read, update, and delete.  This lesson shows some of the most common commands you will use. When you have additional specific tasks, you will want to continue to research SQL commands.


Click image for full page PDF

Database Admin with Python

Objectives:

  • Use Python to create a MySQL database

  • Save a pandas data frame as a table in a MySQL database

You have already had some experience using SQLAlchemy to execute SQL queries within Python. The ability to transition between SQL and Python is an important skill for data scientists.  This lesson shows you how to create a MySQL database using python, and then how to add pandas data frames to the database. When you are finished, you will be able to open your database in MySQL workbench.


Exporting MySQL Databases

To share a MySQL Database you've created, you can export the entire database to a .sql file, just like the sakila and world databases you installed earlier this week! Visual step by step instructions here.


Optional Reading Materials