SQLite Interactive
- SQLite has a SQL interactive tool
- Please download from: http://www.sqlite.org/download.html
- Guide to SQLite interpreter: http://sqlite.org/cli.html
Here is the interface on the Mac terminal / command line:
Enter .help
(a dot command) on SQLite prompt will list all supported commands.
sqlite> .help
%load_ext sql
%sql sqlite:///college.db
Let's open -- create if it does not exist -- a database file called college.db
:
sqlite> .open college.db
Screenshot on Mac terminal:
We can now create the two tables using SQL statements with the SQLite3 interactive.
Please note that %%sql
symbols, for example:
%%sql
select * from my_table;
in this tutorial is for SQL magic in the Jupyter Notebook and should be excluded from your actual SQL statement.
%%sql
drop table if exists programs;
create table programs (
program_id varchar(10) not null primary key,
program_name varchar(20),
program_desc varchar(500)
);
Now add (insert) program data:
%%sql
insert into programs (program_id, program_name, program_desc)
values ('IS', "Information Systems", "Information systems with a business angle");
insert into programs values ('DS', "Data Science", "New data science program");
insert into programs values
('CS', "Computer Science", "Traditional computer science"),
('CST', "Computing & Security Technology", "CST with a new focus on security");
Now let's take a look at data in the table:
%%sql
select * from programs;
%%sql
drop table if exists students;
create table students (
student_id integer primary key,
student_name varchar(100),
program_id varchar(10),
class_year int
);
For the students tables, we have preppared a csv file with students records, which can be downloaded from: students.csv.
Suppose data file students.csv
is in the same directory as the college.db:
csv
1 John Smith, IS, 2020
2 Albert Einstein, DS 2021
3 George Washington, CS, 2019
4 Donald Trump, CST, 2020
5 Barack Obama, IS, 2021
. ... .. ....
You can use the following dot commands to import the CSV data:
sqlite> .mode csv
sqlite> .import students.csv students
Now check out the data:
%%sql
select * from students limit 5;
%%sql
select count(*) from students;
For example, if you are interested in students in the DS program:
%%sql
select * from students where program_id='DS';
Or, you would like to know who have graduated before 2019:
%%sql
select * from students where class_year<2019;
%%sql
select program_id,
count(*) as student_count,
min(class_year) as first_class,
max(class_year) as last_class
from students
group by program_id
order by student_count DESC;
%%sql
update students set class_year = 2030 where class_year > 2030;
We can run the statistics again to see if data have been updated:
%%sql
select program_id,
count(*) as student_count,
min(class_year) as first_class,
max(class_year) as last_class
from students
group by program_id
order by student_count DESC;
%%sql
select student_name, program_name, class_year
from students s join programs p on s.program_id = p.program_id
limit 10;
We can rerun the previous statistics based on the program names, instead of program IDs:
%%sql
select program_name,
count(*) as student_count,
min(class_year) as first_class,
max(class_year) as last_class
from students s join programs p on s.program_id = p.program_id
group by program_name
order by student_count DESC;
References
- Chapter 3 Managing Data, section Managing Data using SQL, of LauraCasselland AlanGauld (2014). Python Projects.https://ebookcentral-proquest-com.ezproxy2.library.drexel.edu/lib/drexel-ebooks/detail.action?docID=1875898
- SQLite operations in Python:http://sebastianraschka.com/Articles/2014_sqlite_in_python_tutorial.html
This is not the most efficient method to join and run the group by
clause. But for now, it is good for the small dataset here.