Notes

Intermediate SQL

Edit on GitHub

Databases
7 minutes

In database management an aggregate function is a function where the values of multiple rows are grouped together to form a single value of more significant meaning or measurement such as a set, a bag or a list. Wikipedia

Common Aggregate Functions

  • count() returns total number of rows that match our search
  • sum() returns added sum of values for a group of rows
  • avg() returns calculated average value for a group of rows
  • max() returns largest value in a group of rows
  • min() returns smallest value in a group of rows
  • sum(), average(), max(), min() only work if the values are numbers (numeric values)
  • count() doesn’t count null values
  • count( * ) will count all the rows, regardless of the null values
 1SELECT count( * ) FROM movies; # count all rows in movies database
 2
 3SELECT count( title ) FROM movies; # count all rows in title column, which are not `null`
 4
 5SELECT sum( cost ) FROM movies; # product cost for all movies
 6
 7SELECT avg( tickets ) FROM movies;  # average ticket cost
 8
 9SELECT max( tickets ) FROM movies; # most popular movie, by determining the max amount of tickets sold
10
11SELECT min( tickets ) FROM movies; # least popular movie, by determining the max amount of tickets sold
12
13SELECT max( tickets ), min( tickets ) FROM movies; # multiple functions = multiple values

Grouping results

  • GROUP BY groups results by column values
1# SELECT ___, aggregateFunction( ___ ) FROM ___ GROUP BY ___
2# SELECT columnName, aggregateFunction( columnName ) FROM tableName GROUP BY columnName
3
4SELECT genre, sum(cost)
5FROM movies
6GROUP BY genre # returns lists of all genres & sum total of costs for each of them

Grouping with conditions

  • HAVING is the keyword for inclusion, show results only if they have met the condition
  • HAVNIG works in conjunction with an aggregate function
 1# SELECT ___, aggregateFunction( ___ ) 
 2# FROM ___ 
 3# GROUP BY ___ 
 4# HAVING aggregate_functino(___) operator value;
 5
 6# SELECT columnName, aggregateFunction( columnName ) FROM tableName GROUP BY columnName HAVING aggregate_functino(columnName) operator value;
 7
 8SELECT genre, sum(cost)
 9FROM movies
10GROUP BY genre 
11HAVING count(*) > 1 # returns lists of all genres (grouped) & sum total of costs for each of them, only if there are more than one row
  • You can add a WHERE clause before GROUP BY to further refine your query

Constraints

Constraints are meant to avoid addition of bad or unwanted data. They provide additional validation

  • NOT NULL prevents NULL values
  • UNIQUE Ensures column values are unique (prevents duplicates)
  • PRIMARY KEY unique identifiers (both UNIQUE and NOT NULL). Can only be assigned once per table (i.e. you can not have more than one primary key column).
  • PRIMARY KEY can only be defined on only one column (once per table) while UNIQUE and NOT NULL can be used on multiple columns

Column constraints syntax

 1CREATE TABLE Promotions
 2(
 3id int PRIMARY KEY, # name must be NOT NULL, must be UNIQUE
 4name varchar(150) NOT NULL, # name can not be null
 5category varchar(15)
 6);
 7
 8
 9CREATE TABLE Promotions
10(
11id int PRIMARY KEY,
12name varchar(150) NOT NULL UNIQUE, # name can't be null, or the same as another value already in column
13category varchar(15)
14);
ERROR: duplicate key value violates unique constraint "promotions_name_key"
DETAIL:  Key (name)=(Matinee) already exists.

Table constraints syntax

You can define your constraints (i.e. assign it a name). Makes it easy to edit them later. Also, you get the constraint name in the error message instead of the key value

 1CREATE TABLE Promotions
 2(
 3id int PRIMARY KEY,
 4name varchar(150) NOT NULL, 
 5category varchar(15)
 6
 7CONSTRAINT unique_name UNIQUE (name) # unique_name is what we assigned for our custom constraint name
 8);
 9
10CREATE TABLE Promotions
11(
12id int PRIMARY KEY,
13name varchar(150) NOT NULL, 
14category varchar(15)
15
16CONSTRAINT unique_name UNIQUE (name, category) # constraint on multiple columns
17);
ERROR: duplicate key value violates unique constraint "unique_name"
DETAIL:  Key (name)=(Matinee) already exists.

Value Constraints

With value constraints you create references between two tables.

Naming convention for these is to singularize the table you are referencing, followed by an underscore and the column name (e.g. country_id, movie_id, customer_id)

Foreign Key

  • A foreign key is a column in one table that references the primary key column of another table
  • A movie_id column is a foreign key referencing the id primary key column in the Movies table
1# find out movie id
2SELECT id
3FROM movies
4WHERE title = "Gone With The Wind"; # assume it returns 2
5
6# use that movie id to find out promotions
7SELECT name, category
8FROM Promotions
9WHERE movie_id = 2;
  • To avoid overriding referenced columns (e.g. adding values to a foreign key column will add the value, but it wouldn’t give you any referenced data because there is no reference, you just added the value directly in this table, not the referenced one), you use a REFERENCES constraint
  • You can only reference (i.e. create a constraint) an existing table
1movie_id int REFERENCES movies(id)
 1# table being referenced must be created first (should be existing)
 2CREATE TABLE Movies
 3(
 4  id int PRIMARY KEY,
 5  title varchar(120) NOT NULL UNIQUE
 6);
 7
 8
 9CREATE TABLE Promotions
10(
11  id int PRIMARY KEY,
12  movie_id int REFERENCES movies(id),
13  name varchar(50),
14  category varchar(15)
15);
16
17
18CREATE TABLE Promotions # same as above
19(
20  id int PRIMARY KEY,
21  movie_id int REFERENCES movies, # primary key column is used by default
22  name varchar(50),
23  category varchar(15)
24);
25
26
27CREATE TABLE Promotions # same as above, different syntax
28(
29  id int PRIMARY KEY,
30  movie_id int,
31  name varchar(50),
32  category varchar(15),
33  FOREIGN KEY (movie_id) REFERENCES movies
34);

Once you have a foreign key constraint in place, it’ll give you an error if you try to reference a foreign key value that doesn’t exist

1INSERT INTO Promotions (id, movie_id, name, category)
2VALUES (4, 999, 'Fake Promotion, 'Hoax');
ERROR: insert or update on table "promotions" violates foreign key constraint "promotions_movie_id fkey"
DETAIL: Key (movie_id)=(999) is not present in table "movies".

Orphan records

(Child) records with a foreign key reference to a (parent) record that has been deleted

  • Orphan records are records with foreign key references where the foreign records have since been deleted. For example, you were referencing movie_id 11 in your Promotions table, but the movie has been deleted.
  • This could happen when you delete the referenced row or just drop the referenced table
  • Orphan records = bad data
  • The solution is a foreign key constraint
1DELETE FROM Movies WHERE id = 6;
ERROR: update or delete on table "movies" violates foreign key constraint "promotions_movie_id_fkey" on table "promotions"
DETAIL: Key (id)=(6) is still being referenced from table "promotions". 
1DROP TABLE Movies;
ERROR: can not drop table movies because other objects depend on it
DETAIL: constraint promotions_movie_id_fkey on table promotions depends on table movies

If you delete both the Movie and Promotions, you’ll get no errors. Basically, you’ll delete all referenced/associated records first and then delete the thing you wanted to delete

1DELETE FROM Promotions WHERE movie_id = 6;
2DELETE FROM Movies WHERE id = 6;

Same goes for dropping tables. First, drop any tables that are making references to the table you want to delete, and then delete the actual table

1DROP TABLE Promotions;
2DROP TABLE Movies;

Validating input values

For example, let’s say you want to make sure people are not entering negative values

1CREATE TABLE Movies
2(
3  id int PRIMARY KEY,
4  title varchar(120) NOT NULL UNIQUE,
5  genre varchar(120),
6  duration int CHECK (duration > 0) # validate input for this column
7);
1INSERT INTO Movies (id, title, genre, duration)
2VALUES (7, 'Intouchables', 'Drama', -10);
ERROR: new row for relation "movies" violates check constraint "movies_duration_check"
DETAIL: Failing row contains (7, 'Intouchables', 'Drama', -10 )