Database Concepts ABCs

Typically a database is positioned at the bottom of a technology stack. This in no way indicates importance. In actual fact, the database is one of the most important components in the architecture of a system. Here we have shown the database to the right of the diagram and client interacting with it through some kind of server. The client can interact directly with the database but this is under special circumstances - such as performing admin activities.

 

When working with data in relational databases there are some key concepts to understand.

  • Data is organised and stored in tables.

  • Databases hold collections of data stored in tables.

  • Data is accessed using a structured query language (SQL)

The relational model works of the principle of taking the complex world and reducing it to sets (tables), describing the structure (schema) of those sets (tables) using a Data Definition Language (DDL), discovering and describing (part of the schema) what relationships should exist between those sets (tables), enforcing certain constraints (part of the schema) on those relationships (referential integrity), and accessing the data in those sets through a language (SQL).

Relationships that are allowed to be captured in the database are -

one-to-one: e.g. one husband and one wife
many-to-one: e.g. many students and one school
one-to-many: e.g. one customer and many bank accounts
many-to-many: e.g. many students and many teachers

We would model these relationships using an Entity Relationship Diagram (ERD) or a Unified Modelling Language (UML) Class Diagram.

Designing the database

So consider the example where an individual travels to many parts of the world, and we would like to track the countries that they visit. We could construct a table with the following characteristics

 

And if we populate it with some data

first name

last name

destination

city

destination

country

reason

for visit

number

of visits

first name

last name

destination

city

destination

country

reason

for visit

number

of visits

Sel

RTR

Buzios

Brasil

Sun

2

Sel

RTR

Jo/burg

South Africa

Work, Sun

4

Sel

RTR



Nepal

Mountains

2

Sel

RTR

Rio De Janeiro

Brasil

Lifestyle, Sun

2

Sel

RTR

empumalanga

South Africa

Safari

2

Sel

RTR

Flogaman

Jamaica

Family, Sun, Lifestyle

10



It's clear to see from this table that there are quite a few repeated elements such as the first name, last name, destination country, and reason for visit.  We can easily improve on this model by splitting the table into two tables

 

In our example above, we can easily link the two tables together because there is only one traveller.  But what if we were to increase the number of travellers and destinations to the following sets

Traveller Table

first name

last name

first name

last name

Sel

RTR

Sam

Wright

Carol

Anne

Destination Table

destination

city

destination

country

reason

for visit

number

of visits

destination

city

destination

country

reason

for visit

number

of visits

Buzios

Brasil

Sun

2

Jo/burg

South Africa

Work, Sun

4

Tokyo

Japan

Manga comics

2



Nepal

Mountains

2

Mauritius

India

Chilling in the sun

3

Rio De Janeiro

Brasil

Lifestyle, Sun

2

Bridgetown

Barbados

Sun

3

Puerto Banús

Spain

Lifestyle

5

Empumalanga

South Africa

Safari

2

Flogaman

Jamaica

Family, Sun, Lifestyle

10

It's now clear that we cannot easily (actually it's impossible without intimate knowledge of each traveller) associate the destination rows with the travellers.  We need some way of identifying which destination row belongs to which traveller. 

If we create a unique identifier for each row in each table, then we can uniquely identify each row  (using the traveller names would be inadequate as there may be two or more travellers with the same name).  

Our tables now look like this

 

These new fields can be used as the tables' primary keys. A primary key is a value that can be indexed (allows for faster searches) and is unique. All tables should have a primary key column. If one is not specified all the columns collectively become the primary key (expensive for searches).

And populated with the updated IDs

Traveller Table

traveller

id

first name

last name

traveller

id

first name

last name

1

Sel

RTR

2

Sam

Wright

3

Carol

Anne

Destination Table

destination

id

destination

city

destination

country

reason

for visit

number

of visits

destination

id

destination

city

destination

country

reason

for visit

number

of visits

1

Buzios

Brasil

Sun

2

2

Jo/burg

South Africa

Work, Sun

4

3

Tokyo

Japan

Manga comics

2

4



Nepal

Mountains

2

5

Port Luis

Mauritius

Chilling in the sun

3

6

Rio De Janeiro

Brasil

Lifestyle, Sun

2

7

Bridgetown

Barbados

Sun

3

8

Puerto Banús

Spain

Lifestyle

5

9

Empumalanga

South Africa

Safari

2

10

Flogaman

Jamaica

Family, Sun, Lifestyle

10

The IDs in both cases must be unique.  You might be asking yourself: do we need the destination id?  We will come back to this.

Now that we have unique IDs we can use them to link rows in both tables together. We could add another column into the traveller table called destination id, that holds the unique destination from the destination table.  This wouldn't work because there are many destinations to a single traveller. We would have an individual row for each unique destination that a traveller visited.  This would take us back to our original problem of repeating entries (first name and last name) but this time in the traveller table.  It would be better to add a column called traveller id to the destination table.  The only repeated element here would be the traveller id.

Destination Table (updated with traveller id)

destination

id

traveller

id

destination

city

destination

country

reason

for visit

number

of visits

destination

id

traveller

id

destination

city

destination

country

reason

for visit

number

of visits

1

1

Buzios

Brasil

Sun

2

2

1

Jo/burg

South Africa

Work, Sun

4

3

2

Tokyo

Japan

Manga comics

2

4

1



Nepal

Mountains

2

5

2

Port Luis

Mauritius

Chilling in the sun

3

6

1

Rio De Janeiro

Brasil

Lifestyle, Sun

2

7

3

Bridgetown

Barbados

Sun

3

8

3

Puerto Banús

Spain

Lifestyle

5

9

1

Empumalanga

South Africa

Safari

2

10

1

Flogaman

Jamaica

Family, Sun, Lifestyle

10

We now have enough information to be able to link the rows on either table to each other.  

The idea of adding data from one table into another table for lookup (linking) purposes is known as a foreign key entry (destination.traveller id is the primary key from traveller table).

Do we still need the destination id?  If you removed destination id, which column would you use as the primary key?  The only column that offers any uniqueness is the destination id, but what it holds isn't very useful.  It would be better to construct the primary key column from data in the other columns such as city and country (you could use international country codes and then the first three letters for each city), so the destination table would be populated as follows (using ISO 3166 country codes https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes )

Destination Table (updated with traveller id)

destination

id

traveller

id

destination

city

destination

country

reason

for visit

number

of visits

destination

id

traveller

id

destination

city

destination

country

reason

for visit

number

of visits

BRABUZ

1

Buzios

Brasil

Sun

2

ZAFJOH

1

Jo/burg

South Africa

Work, Sun

4

JPNTOK

2

Tokyo

Japan

Manga comics

2

NPLXXX

1



Nepal

Mountains

2

MUSPOR

2

Port Luis

Mauritius

Chilling in the sun

3

BRARIO

1

Rio De Janeiro

Brasil

Lifestyle, Sun

2

BRBBRI

3

Bridgetown

Barbados

Sun

3

ESPPUE

3

Puerto Banús

Spain

Lifestyle

5

ZAFEMP

1

Mpumalanga

South Africa

Safari

2

JAMFLO

1

Flogaman

Jamaica

Family, Sun, Lifestyle

10

Now we specify the destination id as the primary key and index on it.  Also, we would write some algorithms to identify the full country name from the data in the destination id column.

Creating the database (DDL)

The CREATE TABLE command is used to tell the DBMS that a table should be created, the full syntax is as follows

CREATE TABLE [IF NOT EXISTS] traveller( <column name> <type> [NOT NULL] [AUTO_INCREMENT] [PRIMARY KEY] [,... other columns] [DEFAULT <value>] [CONSTRAINT <constraint expression>] );

So our traveler table could be defined as follows

CREATE TABLE IF NOT EXISTS traveller( traveller_id int NOT NULL AUTO_INCREMENT PRIMARY KEY, first_name varchar(40) NOT NULL, last_name varchar(40) NOT NULL);

And the destination table is as follows

CREATE TABLE IF NOT EXISTS destination ( destination_id char(8) NOT NULL, traveller_id int, destination_city varchar(100), destination_country varchar(50) NOT NULL, reason_for_visit varchar(100) NOT NULL, number_of_visits int, CONSTRAINT CH_visits CHECK (number_of_visits > 0), PRIMARY KEY(destination_id), FOREIGN KEY(traveller_id) REFERENCES traveller(traveller_id));

In this example, we have used a CONSTRAINT just to show an example of its usage.  We could have and should have used the NOT NULL constraint as shown on the destination_country.

Notice the last two entries, the PRIMARY KEY, and FOREIGN KEY.  The PRIMARY KEY statement is an alternative syntax for declaring the primary key.  The FOREIGN KEY entry is how you link a column in one table to another table.

Inserting data in the database (SQL)

Now that you have your schema, it's time to populate it with some data.

The most general syntax is - to insert a value into every column



The most common syntax is - to insert a value into selected columns



The usual syntax when a field has an auto-increment constraint - insert to selected fields

Querying the data

Once you have data in your database, you can use the query elements of SQL to find data.  The SELECT command has many forms, we simply use it to query one and two tables.

Its most basic form is the following - this will read all data from all columns



You can restrict the columns returned using



We can restrict the amount of data returned using this SELECT ... WHERE form

The WHERE clause is used to constrain the result set that is returned.  You can see this if you run the query above that contains any of the WHERE clauses.

On many occasions your query needs to bring together results from across two or more tables, this is known as performing a JOIN.  Think of it this way - we began our discussion with a single table and then split it out into two tables, we now use SQL to JOIN the two tables back into a single RESULT SET.

A SQL join is an instruction to combine data from two sets of data (i.e. two tables).  So in our example, above we can bring the two tables back together as a single result set using the following query

Unfortunately, when you run this query it will duplicate the traveller_id because the select * selects all columns from both tables.  Let's narrow this by altering the query