- Get link
- X
- Other Apps
An Introductory SQL Tutorial: How to Write Simple Queries
Ever heard of the computer language called SQL? You may have heard about it in the context of data analysis, but never really thought it would apply to you as a marketer. Or, you may have thought to yourself, "That's for the really advanced data users. I could never do that."
Well, you couldn't be more wrong! The most successful marketers are data-driven, and one of the most important parts of being data-driven is being able to collect data from databases quickly. SQL happens to be one of the best and most popular tools out there for doing just that.
SQL stands for Structured Query Language, and it's used when companies have a ton of data that they want to manipulate in an easy and quick way. If your company already stores data in a database, you may need to learn SQL to access the data. But not to worry -- you're in the right place to get started!
Before we begin, make sure that you have a database management application that will allow you to pull data from your database. Some options include MySQL Workbench or Sequel Pro. Start by downloading one of these options, and then talk to your company about how to connect to your database. The option that you choose will depend on your product's backend, so check with your product team to make sure you select the correct one.
Let's jump right in.
Why Use SQL?
The beauty of SQL is that anyone working at a company that stores data in a relational database can use it. (And chances are, yours does.)
If you work for a software company and want to pull usage data on your customers, you can do that using SQL. If you work for an ecommerce company that has data about customer purchases, you can use SQL to find out which customers are purchasing which products. Of course, these are just a few of many, many examples.
Think about it this way: Have you ever opened a very large data set in Excel, only for your computer to freeze or even shut down? SQL allows you to access only certain parts of your data at a time so you don't have to download the data into a CSV, manipulate it, and possibly overload Excel. In other words, SQL takes care of the data analysis that you may be used to doing in Excel. (If you want to dig into this aspect of SQL a bit more, here is a blog post to get you started.)
How to Write Simple SQL Queries
Understand the hierarchy of your database
Before you get started, it's important to become accustomed to your database and its hierarchy. If you have multiple databases of data, you'll need to zero in on the location of the data you want to work with.
For example, let's pretend we're working with multiple databases about people in the United States. Type in the query "SHOW DATABASES;". Our results may show that you have a couple of databases for different locations, including one for New England.
Within your database, you'll have different tables containing the data you want to work with. Using the same example above, let's say we want to find out which information is contained in one of the databases. If we use the query "SHOW TABLES in NewEngland;", we'll find that we have tables for each state in New England: people_connecticut, people_maine, people_massachusetts, people_newhampshire, people_rhodeisland, and people_vermont.
Finally, you need to find out which fields are in the tables. Fields are the specific pieces of data that you can pull from your database. For example, if you want to pull someone's address, the field name may not just be "address" -- it may be separated into address_city, address_state, address_zip. In order to figure this out, use the query "Describe people_massachusetts;". That will provide a list of all of the data that you can pull using SQL.
Let's do a quick review of the hierarchy using our New England example:
- Our database is: NewEngland.
- Our tables within that database are: people_connecticut, people_maine, people_massachusetts, people_newhampshire, people_rhodeisland, and people_vermont.
- Our fields within the people_massachusetts table include: address_city, address_state, address_zip, hair_color, first_name, and last_name.
Now, to learn how to write a simple SQL query, let's use the following example:
Who are the people who have red hair in Massachusetts and were born in 2003 organized in alphabetical order?
SELECT
SELECT chooses the fields that you want displayed in your chart. This is the specific piece of information that you want to pull from your database. In the example above, we want to find the people who fit the rest of the criteria.
Here is our SQL query:
SELECT
first_name,
last_name
first_name,
last_name
FROM
FROM pinpoints the table that you want to pull the data from. In the earlier section, we found that there were six tables for each of the six states in New England: people_connecticut, people_maine, people_massachusetts, people_newhampshire, people_rhodeisland, and people_vermont. Because we're looking for people in Massachusetts specifically, we'll pull data from that specific table.
Here is our SQL query:
SELECT
first_name,
last_name
FROM
people_massachusetts
first_name,
last_name
FROM
people_massachusetts
WHERE
WHERE allows you to filter your query to be more specific. In our example, we want to filter our query to include only people with red hair who were born in 2003. Let's start with the red hair filter.
Here is our SQL query:
SELECT
first_name,
last_name
FROM
people_massachusetts
WHERE
hair_color = "red"
first_name,
last_name
FROM
people_massachusetts
WHERE
hair_color = "red"
hair_color could have been part of your initial SELECT statement if you'd wanted to look at all of the people in Massachusetts along with their specific hair color. But if you want to filter to see only people with red hair, you can do so in the WHERE statement.
AND
AND allows you to add additional criteria to your WHERE statement. Remember, we want to filter by people who had red hair in addition to people who were born in 2003. Since our WHERE statement is taken up by the red hair criteria, how can we filter by a specific year of birth as well?
That's where the AND statement comes in. In this case, the AND statement is a date property -- but it doesn't necessary have to be. (Note: Be to check the format of your dates with your product team to make sure it is in the correct format.)
Here is our SQL query:
SELECT
first_name,
last_name
FROM
people_massachusetts
WHERE
hair_color = "red"
AND
birth_date BETWEEN '2003-01-01' AND '2003-12-31'
first_name,
last_name
FROM
people_massachusetts
WHERE
hair_color = "red"
AND
birth_date BETWEEN '2003-01-01' AND '2003-12-31'
ORDER BY
When you create SQL queries, you shouldn't have to export the data to Excel. The calculation and organization should be done within the query. That's where the "ORDER BY" and "GROUP BY" functions come in. First, we'll look at our SQL queries with the ORDER BY and then GROUP BY functions, respectively. Then, we'll take a brief look at the difference between the two.
Your ORDER BY clause will allow you to sort by any of the fields that you have specified in the SELECT statement. In this case, let's order by last name.
Here is our SQL query:
SELECT
first_name,
last_name
FROM
people_massachusetts
WHERE
hair_color = "red"
AND
birth_date BETWEEN '2003-01-01' AND '2003-12-31'
ORDER BY
last_name
;
first_name,
last_name
FROM
people_massachusetts
WHERE
hair_color = "red"
AND
birth_date BETWEEN '2003-01-01' AND '2003-12-31'
ORDER BY
last_name
;
GROUP BY
"GROUP BY" is similar to "ORDER BY," but it will aggregate data that has similarities. For example, if you have any duplicates in your data, iyou can use "GROUP BY" to count the number of duplicates in your fields.
Here is your SQL query:
SELECT
first_name,
last_name
FROM
people_massachusetts
WHERE
hair_color = "red"
AND
birth_date BETWEEN '2003-01-01' AND '2003-12-31'
GROUP BY
last_name
;
first_name,
last_name
FROM
people_massachusetts
WHERE
hair_color = "red"
AND
birth_date BETWEEN '2003-01-01' AND '2003-12-31'
GROUP BY
last_name
;
ORDER BY VS. GROUP BY
To clearly show you the difference between an "ORDER BY" statement and a "GROUP BY" statement, let's step outside our Massachusetts example briefly to look at a very simple dataset. Below is a list of four employees' ID numbers and names.
If we were to use an ORDER BY statement on this list, the names of the employees would get sorted in alphabetical order. The results would look like this:
If we were to use a GROUP BY statement, the employees would be counted based on the number of times they appeared in the initial table. Note that Peter appeared twice in the initial table. The results would look like this:
With me so far? Okay. Let's return to the SQL query we've been creating about red-haired people in Massachusetts who were born in 2003.
LIMIT
Depending on the amount of data you have in your database, it may take a long time to run the queries. It can be frustrating if you find yourself waiting a long time to run a query that you didn't really want to begin with. If you want to test our query, the LIMIT function is a great one to use because it allows you to limit the number of results you get.
For example, if we suspect there are millions of people who have red hair in Massachusetts, we may want to test out our query using LIMIT before we run it in full to make sure we're getting the information we want. Let's say, for instance, we only want to see the first 100 people.
Here is our SQL query:
SELECT
first_name,
last_name
FROM
people_massachusetts
WHERE
hair_color = "red"
AND
birth_date BETWEEN '2003-01-01' AND '2003-12-31'
ORDER BY
last_name
LIMIT
100
;
first_name,
last_name
FROM
people_massachusetts
WHERE
hair_color = "red"
AND
birth_date BETWEEN '2003-01-01' AND '2003-12-31'
ORDER BY
last_name
LIMIT
100
;
That's it for the basics!
Feeling good? Here are a few other ways to take your SQL queries up a notch.
Bonus: Advanced SQL Tips
Now that you have mastered how to create a SQL query, let's walk through some other tricks that you can use to take it up a notch, starting with the asterisk.
*
When you add an asterisk to one of your SQL queries, it tells the query that you want to include all the columns of data in your results. In the example we've been using, we've only had two column names: first_name and last_name. But let's say we had 15 columns' worth of data that we want to see in our results -- it would be kind of a pain to type out all 15 column names in the SELECT statement. Instead, if you replace the names of those columns with an asterisk, the query will know to pull all of the columns in to the results.
Here's what the SQL query would look like:
SELECT
*
FROM
people_massachusetts
WHERE
hair_color = "red"
AND
birth_date BETWEEN '2003-01-01' AND '2003-12-31'
ORDER BY
last_name
LIMIT
100
;
*
FROM
people_massachusetts
WHERE
hair_color = "red"
AND
birth_date BETWEEN '2003-01-01' AND '2003-12-31'
ORDER BY
last_name
LIMIT
100
;
LAST 30 DAYS
Once I started using SQL regularly, I found that one of my go-to queries involved trying to find which people took an action or fulfilled a certain set of criteria within the last 30 days. Since this type of query was so useful for me, I wanted to share that capability with you.
Let's pretend today is December 1, 2014. You could create these parameters by making the birth_date span between November 1, 2014 and November 30, 2014. That SQL query would look like this:
SELECT
first_name,
last_name
FROM
people_massachusetts
WHERE
hair_color = "red"
AND
birth_date BETWEEN '2014-11-01' AND '2014-11-30'
ORDER BY
last_name
LIMIT
100
;
first_name,
last_name
FROM
people_massachusetts
WHERE
hair_color = "red"
AND
birth_date BETWEEN '2014-11-01' AND '2014-11-30'
ORDER BY
last_name
LIMIT
100
;
But that would require thinking about which dates cover the last 30 days, and it would mean you'd have to constantly update this query. Instead, to make the dates automatically span the last 30 days no matter which day it is, you can type this under AND: birth_date >= (DATE_SUB(CURDATE(),INTERVAL 30.
(Note: You'll want to double-check this syntax with your product team because it may differ based on the software you use to pull your SQL queries.)
Your SQL query would therefore look like this:
SELECT
first_name,
last_name
FROM
people_massachusetts
WHERE
hair_color = "red"
AND
birth_date >= (DATE_SUB(CURDATE(),INTERVAL 30))
ORDER BY
last_name
LIMIT
100
;
first_name,
last_name
FROM
people_massachusetts
WHERE
hair_color = "red"
AND
birth_date >= (DATE_SUB(CURDATE(),INTERVAL 30))
ORDER BY
last_name
LIMIT
100
;
COUNT
In some cases, you may want to count the number of times that a criterium of a field appears. For example, let's say you want to count the number of times the different hair colors appear for the people you are tallying up from Massachusetts. In this case, COUNT will come in handy so you don't have to manually add up the number of people who have different hair colors or export that information to Excel.
Here's what that SQL query would look like:
SELECT
hair_color,
COUNT(hair_color)
FROM
people_massachusetts
AND
birth_date BETWEEN '2003-01-01' AND '2003-12-31'
GROUP BY
hair_color
;
hair_color,
COUNT(hair_color)
FROM
people_massachusetts
AND
birth_date BETWEEN '2003-01-01' AND '2003-12-31'
GROUP BY
hair_color
;
JOIN
There may be a time where you need to access information from two different tables in one SQL query. In SQL, you can use a JOIN clause to do this. (For those of you familiar with Excel formulas, this is similar to how you would use the VLOOKUP formula when you need to combine information from two different sheets in Excel.)
For example, let's say we have one table that has data of all Massachusetts residents' user IDs and their birthdates. Let's say we also have an entirely separate table that has data of all Masachusetts residents' user IDs and their hair color. If we want to figure out the hair color of Massachusetts residents born in the year 2003, we'd need to access information from both tables and combine them. This works because both tables share a matching column: the Massachusetts residents' user IDs.
Because we're calling out fields from two different tables, our SELECT statement is also going to change slightly. Instead of just listing out the fields we want to include in our results, we'll need to specify which table they're coming from. (Note: The asterisk function may come in handy here so your query includes both tables in your results.)
To specify a field from a specific table, all we'd have to do is combine the name of the table with the name of the field. For example, our SELECT statement would say "table.field" -- with the period separating the table name and the field name.
Let's take a look at what this looks like in action.
We're assuming a few things in this case:
- The Massachusetts birthdate table includes the following fields: first_name, last_name, user_id, birthdate
- The Massachusetts hair color table includes the following fields: user_id, hair_color
Your SQL query would therefore look like:
SELECT
birthdate_massachusetts.first_name,
birthdate_massachusetts.last_name
FROM
birthdate_massachusetts JOIN haircolor_massachusetts USING (user_id)
WHERE
hair_color = "red"
AND
birth_date BETWEEN '2003-01-01' AND '2003-12-31'
ORDER BY
last_name
;
birthdate_massachusetts.first_name,
birthdate_massachusetts.last_name
FROM
birthdate_massachusetts JOIN haircolor_massachusetts USING (user_id)
WHERE
hair_color = "red"
AND
birth_date BETWEEN '2003-01-01' AND '2003-12-31'
ORDER BY
last_name
;
This query would join the two tables using the field "user_id" which appears in both the birthdate_massachusetts table and the haircolor_massachusetts table. You would then be able to see a table of people born in 2003 who have red hair.
Congratulations: You're ready to get started with your own SQL queries! While there's a lot more you can do with SQL, I hope you found this overviewof the basics helpful so you can get your handsdirty. With a strong foundation of the basics, you'll be able to navigate SQL better and work toward some of the more complex examples.
- Get link
- X
- Other Apps
Comments
Post a Comment