This assignment is to load data from CSV, populate them in an SQLite database, and run in-database analytics with SQL.
Please create a Jupyter Notebook for this assignment.
A. Data
The data is provided as CCSubset.csv, which is a subset of the UCI Credit Card data set. You only need to use this provided subset for the assignment.
Please open and review the data file CCSubset.csv to determine the following when creating your table structure:
- The name of the table: please use a name with NO Space in it.
- How many fields (columns) do you have and what are they?
- What are their field types?
The full dataset is available at: https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset.
import sqlite3
Create a new database, connect to it, and create a table structure corresponding to data fields in the CSV data source.
- Use
sqlite3.connect(“DatabaseFileName”)to connect the database and once you establish the connection, use its.cursor()method to maintain the cursor. For example:
my_connection = sqlite3.connect("mydatabase.db")
my_cursor = my_connection.cursor()
Use the cursor’s
.execute()method to run SQL statements;Use
CREATE TABLEstatement as a parameter for theexecute()method to create a new table.
Syntax of the create table statement in SQL:
CREATE TABLE table1 (
field1 int primary key,
field2 varchar(100),
field3 numeric
)
C. Load Data into SQLite (2 points)
Read data from the CSV file, use a loop to read each CSV line (data instance), and insert it into the SQLite table:
Include
import csvso you can use CSV-related functions;Please use the following program structure to read CSV data:
with open('students.csv') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
Inside the loop:
You can use
row['CSVColumnName']to read each column value;Execute a SQL statement using
INSERT INTO, combined with data from CSV, for example, within the for loop:
Example:
sql = "INSERT INTO table1 (field1, field2, field3)" \
" VALUES({0}, '{1}', {2})".format(row["column1"], row["column2"], row["column3"])
my_cursor.execute(sql)
Change (and add) names such as field1/2/3/... and column1/2/3/... to the correct ones in database and CSV respectively.
D. In-database Query and Analytics with SQLite Data (6 points)
Outside the previous structure, create code in Python with related SQL statements to do the following:
Updatethe data so that current marriage=2 (single) and marriage=3 values (others) are merged/updated to 2 (single).Remove all data records with
negative BILL_AMTvalues usingDELETE.Select and show the first 10 records in the database table, using
SELECT … LIMIT….Select and show all records with a
BILL_AMTamount> 500,000;Compute the total number of records
count(*), average (avg) AGE,minLIMIT_BAL,maxLIMIT_BAL in the data.Countthe # records, average AGE,minLIMIT_BAL,maxLIMIT_BAL, usingGROUP BY, for each group ofDefaultNext(0 and 1 levels).Countthe # records,averageAGE,minLIMIT_BAL,maxLIMIT_BAL for eachMarriagegroup (1, 2), again usingGROUP BY.Count the # records in each combination of
Marriagegroups andDefaultNextlevels, usingGROUP BY Marriage, DefaultNext.
For Items 1 and 2 above, you only need to call the cursor’s
.execute() method with proper SQL statements to execute.
For Items 3 and after, when you execute the SQL statement, you need to print the retrieved results using a loop such as:
print("Say a few words about what you are printing here")
rows = my_cursor.fetchall()
for row in rows
print(row[0], row[1], ...)