This assignment is to load data from CSV, populate them in an SQLite database, and run in-database analytics with SQL.
Please create a Jupyter Notebook for this assignment.
A. Data
The data is provided as CCSubset.csv, which is a subset of the UCI Credit Card data set. You only need to use this provided subset for the assignment.
Please open and review the data file CCSubset.csv to determine the following when creating your table structure:
- The name of the table: please use a name with NO Space in it.
- How many fields (columns) do you have and what are they?
- What are their field types?
The full dataset is available at: https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset.
import sqlite3
Create a new database, connect to it, and create a table structure corresponding to data fields in the CSV data source.
- Use
sqlite3.connect(“DatabaseFileName”)
to connect the database and once you establish the connection, use its.cursor()
method to maintain the cursor. For example:
my_connection = sqlite3.connect("mydatabase.db")
my_cursor = my_connection.cursor()
Use the cursor’s
.execute()
method to run SQL statements;Use
CREATE TABLE
statement as a parameter for theexecute()
method to create a new table.
Syntax of the create table
statement in SQL:
CREATE TABLE table1 (
field1 int primary key,
field2 varchar(100),
field3 numeric
)
C. Load Data into SQLite (2 points)
Read data from the CSV file, use a loop to read each CSV line (data instance), and insert it into the SQLite table:
Include
import csv
so you can use CSV-related functions;Please use the following program structure to read CSV data:
with open('students.csv') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
Inside the loop:
You can use
row['CSVColumnName']
to read each column value;Execute a SQL statement using
INSERT INTO
, combined with data from CSV, for example, within the for loop:
Example:
sql = "INSERT INTO table1 (field1, field2, field3)" \
" VALUES({0}, '{1}', {2})".format(row["column1"], row["column2"], row["column3"])
my_cursor.execute(sql)
Change (and add) names such as field1/2/3/...
and column1/2/3/...
to the correct ones in database and CSV respectively.
D. In-database Query and Analytics with SQLite Data (6 points)
Outside the previous structure, create code in Python with related SQL statements to do the following:
Update
the data so that current marriage=2 (single) and marriage=3 values (others) are merged/updated to 2 (single).Remove all data records with
negative BILL_AMT
values usingDELETE
.Select and show the first 10 records in the database table, using
SELECT … LIMIT…
.Select and show all records with a
BILL_AMT
amount> 500,000
;Compute the total number of records
count(*)
, average (avg
) AGE,min
LIMIT_BAL,max
LIMIT_BAL in the data.Count
the # records, average AGE,min
LIMIT_BAL,max
LIMIT_BAL, usingGROUP BY
, for each group ofDefaultNext
(0 and 1 levels).Count
the # records,average
AGE,min
LIMIT_BAL,max
LIMIT_BAL for eachMarriage
group (1, 2), again usingGROUP BY
.Count the # records in each combination of
Marriage
groups andDefaultNext
levels, usingGROUP BY Marriage, DefaultNext
.
For Items 1 and 2 above, you only need to call the cursor’s
.execute()
method with proper SQL statements to execute.
For Items 3 and after, when you execute the SQL statement, you need to print the retrieved results using a loop such as:
print("Say a few words about what you are printing here")
rows = my_cursor.fetchall()
for row in rows
print(row[0], row[1], ...)