Search
SQLite

SQLite

SQLite

SQL: structured query language for relational databases

  • SQLite implements a simple SQL dialect
  • SQLite uses data files
    • No need to setup a database server
  • Widely used and supported, e.g. on mobile phones

SQLite Interactive

SQLite Interactive

  1. Download and install SQLite
  2. Go to your Terminal or Command Line, and enter:
sqlite3
  1. Start using SQLite interactive with
    • dot commands
    • SQL statements

Here is the interface on the Mac terminal / command line:

sqlite

Enter .help (a dot command) on SQLite prompt will list all supported commands.

sqlite> .help

Load SQL Magic

%load_ext sql
The sql extension is already loaded. To reload it, use:
  %reload_ext sql

Database college.db

Suppose we are to manage the following data in a college:

  1. Degree programs with ID, program name, and description
  2. Students in the programs, with: ID, name, program information, and class year.

Load SQLite Database

%sql sqlite:///college.db

Let's open -- create if it does not exist -- a database file called college.db:

sqlite> .open college.db

Screenshot on Mac terminal:

sqlite

We can now create the two tables using SQL statements with the SQLite3 interactive.

Please note that %%sql symbols, for example:

%%sql
select * from my_table;

in this tutorial is for SQL magic in the Jupyter Notebook and should be excluded from your actual SQL statement.

programs table

Create the programs table structure:

%%sql
drop table if exists programs;
create table programs (
    program_id varchar(10) not null primary key, 
    program_name varchar(20), 
    program_desc varchar(500)
);
 * sqlite:///college.db
Done.
Done.
[]

Now add (insert) program data:

%%sql
insert into programs (program_id, program_name, program_desc)
            values   ('IS', "Information Systems", "Information systems with a business angle");
insert into programs values ('DS', "Data Science", "New data science program");
insert into programs values 
        ('CS', "Computer Science", "Traditional computer science"),
        ('CST', "Computing & Security Technology", "CST with a new focus on security");
 * sqlite:///college.db
1 rows affected.
1 rows affected.
2 rows affected.
[]

Now let's take a look at data in the table:

%%sql
select * from programs;
 * sqlite:///college.db
Done.
program_id program_name program_desc
IS Information Systems Information systems with a business angle
DS Data Science New data science program
CS Computer Science Traditional computer science
CST Computing & Security Technology CST with a new focus on security

students table

Create the students table:

%%sql
drop table if exists students;
create table students (
    student_id integer primary key, 
    student_name varchar(100), 
    program_id varchar(10),
    class_year int
);
 * sqlite:///college.db
Done.
Done.
[]

For the students tables, we have preppared a csv file with students records, which can be downloaded from: students.csv.

Suppose data file students.csv is in the same directory as the college.db:

csv
1   John Smith,        IS,  2020
2   Albert Einstein,   DS   2021
3   George Washington, CS,  2019
4   Donald Trump,      CST, 2020
5   Barack Obama,      IS,  2021
.   ...                ..   ....

You can use the following dot commands to import the CSV data:

sqlite> .mode csv
sqlite> .import students.csv students

Now check out the data:

%%sql
select * from students limit 5;
 * sqlite:///college.db
Done.
student_id student_name program_id class_year
1 John Smith IS 2020
2 Albert Einstein DS 2021
3 George Washington CS 2019
4 Donald Trump CST 2020
5 Barack Obama IS 2021
%%sql
select count(*) from students;
 * sqlite:///college.db
Done.
count(*)
50

Select and Filter

Now that we have the data, we can go ahead to search the data and run statistics.

For example, if you are interested in students in the DS program:

%%sql
select * from students where program_id='DS';
 * sqlite:///college.db
Done.
student_id student_name program_id class_year
2 Albert Einstein DS 2021
6 Mark Twain DS 2020
8 Kanye West DS 2040
13 SpongeBob DS 2023
14 Jackob Ziv DS 2020
39 Spiderman DS 2030
40 Harambe Noll DS 2020
43 Johnny Bravo DS 2022

Or, you would like to know who have graduated before 2019:

%%sql
select * from students where class_year<2019;
 * sqlite:///college.db
Done.
student_id student_name program_id class_year
9 Felix Monika CST 2012
10 Noble Pearl IS 2015
11 Linus Torvalds CS 2016
18 Josh Brolin CST 2018
23 Matt Murdoc IT 2018
28 Keith Kogane CS 2018
31 Claude Giroux CST 2018
35 Bruce Char CST 2018
49 John Fry CS 2010

Group Aggregates

Perhaps, you only want to know statistics in each program:

%%sql
select program_id, 
       count(*) as student_count, 
       min(class_year) as first_class, 
       max(class_year) as last_class
from students 
group by program_id
order by student_count DESC;
 * sqlite:///college.db
Done.
program_id student_count first_class last_class
IS 14 2015 2023
CST 13 2012 2022
CS 11 2010 2031
DS 8 2020 2040
IT 4 2018 2020

Update Student Data

Now that you see from the statistics some data values are unrealistic such as a class year $>2030$. Perhaps we can reset all these values to $2030$ if they are greater than that:

%%sql
update students set class_year = 2030 where class_year > 2030;

We can run the statistics again to see if data have been updated:

%%sql
select program_id, 
       count(*) as student_count, 
       min(class_year) as first_class, 
       max(class_year) as last_class
from students 
group by program_id
order by student_count DESC;
 * sqlite:///college.db
Done.
program_id student_count first_class last_class
IS 14 2015 2023
CST 13 2012 2022
CS 11 2010 2030
DS 8 2020 2030
IT 4 2018 2020

Join Tables

Suppose you want to retrieve student data with their corresponding program information (e.g. program name):

%%sql
select student_name, program_name, class_year
from students s join programs p on s.program_id = p.program_id
limit 10;
 * sqlite:///college.db
Done.
student_name program_name class_year
John Smith Information Systems 2020
Albert Einstein Data Science 2021
George Washington Computer Science 2019
Donald Trump Computing & Security Technology 2020
Barack Obama Information Systems 2021
Mark Twain Data Science 2020
Joey Binzinger Information Systems 2023
Kanye West Data Science 2030
Felix Monika Computing & Security Technology 2012
Noble Pearl Information Systems 2015

We can rerun the previous statistics based on the program names, instead of program IDs:

%%sql
select program_name, 
       count(*) as student_count, 
       min(class_year) as first_class, 
       max(class_year) as last_class
from students s join programs p on s.program_id = p.program_id
group by program_name
order by student_count DESC;
 * sqlite:///college.db
Done.
program_name student_count first_class last_class
Information Systems 14 2015 2023
Computing & Security Technology 13 2012 2022
Computer Science 11 2010 2030
Data Science 8 2020 2030

References

This is not the most efficient method to join and run the group by clause. But for now, it is good for the small dataset here.