Working with JSON

Outline

  • JSON overview
  • JSON syntax
  • JSON Processing in Python

JSON

JSON stands for JavaScript Object Notation.

JSON is text with a special notation (syntax)

  • A lightweight format used for data interchange
  • A popular data format for web application APIs
  • An alternative to formats such as XML and YAML, and to XML-based protocols such as SOAP
  • Supported by different programming languages

JSON Example

{"teammembers":[
    {
        "name":"Agnes",
        "title":"Vice President of Accounting",
        "bio":"With over 14 years of public accounting... "
    },
    {
        "name":"Wilbur",
        "title":"Founder and CEO",
        "bio":"While Wilbur is the founder and CEO ... "
    }
]}

Simple interpretation: a list of team members, where each member is a data object with name, title, and bio fields. Each field and its value are given in the key:value format.

JSON Syntax

Name (key) Value Pair

"first_name":"John"
  • Name (key) can be any string, written in double quotes
  • Value can be one of these types:
    • number, string, boolean, object, array, null
  • A string value is written in double quotes ""
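To make the value types concrete, here is a minimal sketch that parses one object containing every JSON value type and prints the Python type each one becomes. Apart from "first_name", the field names and values are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical document with one field per JSON value type.
text = '''
{
    "first_name": "John",
    "age": 33,
    "height_m": 1.8,
    "is_student": true,
    "address": {"city": "Philadelphia"},
    "brands": ["Apple", "Google"],
    "middle_name": null
}
'''

# json.loads parses a JSON string into Python objects.
record = json.loads(text)

# Show each value together with the Python type it was converted to.
for key, value in record.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```

With the standard json module, JSON strings become str, numbers become int or float, true/false become bool, objects become dict, arrays become list, and null becomes None.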

JSON Object

{
    "first_name":"John", 
    "last_name":"Paul"
}
  • A collection of name-value pairs
  • In a set of curly braces

JSON List (Array)

A list of values:

{
    "brands":["Apple", "Google", "Microsoft"]
}
  • In a set of square brackets [ ]

A list of data objects:

{
    "brands":[
        {"name":"Apple", "website":"https://apple.com"},
        {"name":"Google", "website":"https://google.com"},
        {"name":"Microsoft", "website":"https://microsoft.com"}
    ]
}

Again, each data object contains name-value pairs in a set of { }.
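As a rough illustration of how this structure looks once parsed, the sketch below loads the brands example from a string and reads values out of it. The variable names are assumptions made for this example.

```python
import json

# The "list of data objects" example, held in a string.
text = '''
{
    "brands": [
        {"name": "Apple", "website": "https://apple.com"},
        {"name": "Google", "website": "https://google.com"},
        {"name": "Microsoft", "website": "https://microsoft.com"}
    ]
}
'''

data = json.loads(text)

# The JSON array becomes a Python list; each JSON object becomes a dict.
brands = data["brands"]
first_brand = brands[0]

# Collect one field from every object in the list.
names = [brand["name"] for brand in brands]

print(first_brand["website"])
print(names)
```

The outer { } becomes a dictionary with a single key, so the list is reached through data["brands"] before any individual brand can be indexed.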

Summary of JSON (JavaScript Object Notation)

Single data object:

{
    "name":"Fenders Acoustic Guitar", 
    "category":"Music Instrument", 
    "price":199.99
}
  • { } enclose one data object
  • Contains "key":value pairs
    • A string value in quotes, e.g. "name":"John" (JSON has no date type, so dates are also written as quoted strings)
    • A numeric value without quotes, e.g. "age":33
  • Comma , to separate multiple key-value pairs

Data object list:

{
    "product":[
        {
            "name":"Fenders Acoustic Guitar",
            "category":"Music Instrument",
            "price":199.99
        },
        {
            "name":"Steinway Piano",
            "category":"Music Instrument",
            "price":28500.00
        }
    ]
}
  • [ ] enclose a list of data objects
  • , to separate the objects
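One practical consequence of leaving numbers unquoted is that they arrive in Python as numbers, ready for comparison or arithmetic. The following sketch, using the product list above, picks the most expensive product; the variable names are assumptions for this example.

```python
import json

# The "data object list" example, held in a string.
text = '''
{
    "product": [
        {
            "name": "Fenders Acoustic Guitar",
            "category": "Music Instrument",
            "price": 199.99
        },
        {
            "name": "Steinway Piano",
            "category": "Music Instrument",
            "price": 28500.00
        }
    ]
}
'''

products = json.loads(text)["product"]

# Unquoted numeric values are parsed as numbers, not strings,
# so they can be compared directly.
most_expensive = max(products, key=lambda product: product["price"])

print(type(products[0]["price"]).__name__)
print(most_expensive["name"])
```

Had the prices been written in quotes, they would be parsed as strings and would need an explicit conversion before any numeric comparison.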

Processing JSON with Python

The json module in Python:

  1. Loads data in the JSON format
  2. Parses JSON arrays and objects
  3. Gives access to specific data elements in JSON
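The three steps can be sketched as follows. To stay self-contained, this example first writes a small stand-in file with two student records into a temporary directory (an assumption made for illustration); in practice the file would already exist.

```python
import json
import tempfile
from pathlib import Path

# Two sample records used as stand-in file content.
sample = [
    {"id": 1, "name": "John Smith", "program": "IS", "class_year": 2020},
    {"id": 2, "name": "Albert Einstein", "program": "DS", "class_year": 2021},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create the stand-in JSON file.
    path = Path(tmp_dir) / "students.json"
    path.write_text(json.dumps(sample), encoding="utf-8")

    # Steps 1 and 2: load the file and parse its arrays and objects.
    with open(path, encoding="utf-8") as f:
        students = json.load(f)

# Step 3: access a specific data element.
first_name = students[0]["name"]

print(type(students).__name__, len(students))
print(first_name)
```

json.load reads from an open file object, while json.loads parses a string that is already in memory; both return the same Python structures.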

Example data in students.json:

[
    {
        "id":1,
        "name":"John Smith",
        "program":"IS",
        "class_year":2020
    },
    {
        "id":2,
        "name":"Albert Einstein",
        "program":"DS",
        "class_year":2021
    },
    ...
]

Let's first try loading the data:

import json
from pprint import pprint

# read and parse the JSON file
with open("data/students.json", encoding="utf-8") as f:
    data = json.load(f)

# print to check that the data have been loaded
pprint(data)

Output:

[{'class_year': 2020, 'id': 1, 'name': 'John Smith', 'program': 'IS'},
 {'class_year': 2021, 'id': 2, 'name': 'Albert Einstein', 'program': 'DS'},
 {'class_year': 2019, 'id': 3, 'name': 'George Washington', 'program': 'CS'},
 {'class_year': 2020, 'id': 4, 'name': 'Donald Trump', 'program': 'CST'},
 {'class_year': 2021, 'id': 5, 'name': 'Barack Obama', 'program': 'IS'},
 {'class_year': 2020, 'id': 6, 'name': 'Mark Twain', 'program': 'DS'},
 {'class_year': 2023, 'id': 7, 'name': 'Joey Binzinger', 'program': 'IS'},
 {'class_year': 2040, 'id': 8, 'name': 'Kanye West', 'program': 'DS'},
 {'class_year': 2012, 'id': 9, 'name': 'Felix Monika', 'program': 'CST'},
 {'class_year': 2015, 'id': 10, 'name': 'Noble Pearl', 'program': 'IS'},
 {'class_year': 2016, 'id': 11, 'name': 'Linus Torvalds', 'program': 'CS'},
 {'class_year': 2019, 'id': 12, 'name': 'Fonzi Zern', 'program': 'CST'},
 {'class_year': 2023, 'id': 13, 'name': 'SpongeBob', 'program': 'DS'},
 {'class_year': 2020, 'id': 14, 'name': 'Jackob Ziv', 'program': 'DS'},
 {'class_year': 2022, 'id': 15, 'name': 'Notafake person', 'program': 'IS'},
 {'class_year': 2020, 'id': 16, 'name': 'Ryan Bragg', 'program': 'IS'},
 {'class_year': 2022, 'id': 17, 'name': 'DefNotFake', 'program': 'CST'},
 {'class_year': 2018, 'id': 18, 'name': 'Josh Brolin', 'program': 'CST'},
 {'class_year': 2021, 'id': 19, 'name': 'Ryan Reynolds', 'program': 'IS'},
 {'class_year': 2020, 'id': 20, 'name': 'Bob Dole', 'program': 'CS'},
 {'class_year': 2021, 'id': 21, 'name': 'Camroon Mann', 'program': 'IS'},
 {'class_year': 2031, 'id': 22, 'name': 'John wick', 'program': 'CS'},
 {'class_year': 2018, 'id': 23, 'name': 'Matt Murdoc', 'program': 'IT'},
 {'class_year': 2019, 'id': 24, 'name': 'Bruce Wayne', 'program': 'CST'},
 {'class_year': 2020, 'id': 25, 'name': 'Lance Mcclain', 'program': 'CST'},
 {'class_year': 2019, 'id': 26, 'name': 'Jordan Fisher', 'program': 'IT'},
 {'class_year': 2020, 'id': 27, 'name': 'Eric Forman', 'program': 'CS'},
 {'class_year': 2018, 'id': 28, 'name': 'Keith Kogane', 'program': 'CS'},
 {'class_year': 2022, 'id': 29, 'name': 'Notabot', 'program': 'CST'},
 {'class_year': 2023, 'id': 30, 'name': 'Ian Mckellan', 'program': 'CS'},
 {'class_year': 2018, 'id': 31, 'name': 'Claude Giroux', 'program': 'CST'},
 {'class_year': 2022, 'id': 32, 'name': 'Deadpool ', 'program': 'IS'},
 {'class_year': 2020, 'id': 33, 'name': 'Leslie Knope', 'program': 'IT'},
 {'class_year': 2019, 'id': 34, 'name': 'Roger Federer', 'program': 'CST'},
 {'class_year': 2018, 'id': 35, 'name': 'Bruce Char', 'program': 'CST'},
 {'class_year': 2022, 'id': 36, 'name': 'Rafael Nadal', 'program': 'CS'},
 {'class_year': 2022, 'id': 37, 'name': 'ItoldUnotabot', 'program': 'IS'},
 {'class_year': 2021, 'id': 38, 'name': 'FrDLstTymnotabot', 'program': 'IS'},
 {'class_year': 2030, 'id': 39, 'name': 'Spiderman', 'program': 'DS'},
 {'class_year': 2020, 'id': 40, 'name': 'Harambe Noll', 'program': 'DS'},
 {'class_year': 2021, 'id': 41, 'name': 'Prince Farquad', 'program': 'CS'},
 {'class_year': 2021, 'id': 42, 'name': 'Bam Margera', 'program': 'CST'},
 {'class_year': 2022, 'id': 43, 'name': 'Johnny Bravo', 'program': 'DS'},
 {'class_year': 2022, 'id': 44, 'name': 'Zippy', 'program': 'IS'},
 {'class_year': 2021, 'id': 45, 'name': 'Skippy', 'program': 'IS'},
 {'class_year': 2019, 'id': 46, 'name': 'Claudio Bravo', 'program': 'IT'},
 {'class_year': 2021, 'id': 47, 'name': 'Tony Hawk', 'program': 'CST'},
 {'class_year': 2019, 'id': 48, 'name': 'Jeff Bezos', 'program': 'CS'},
 {'class_year': 2010, 'id': 49, 'name': 'John Fry', 'program': 'CS'},
 {'class_year': 2020, 'id': 50, 'name': 'Julius Erving', 'program': 'IS'}]

Now that the data have been loaded and parsed into Python objects (a list of dictionaries), we can access the whole data set, individual records, or specific attributes within a record.

For example, to list only the name and class_year of each student in the DS (data science) program:

import json

# read and parse the JSON file
with open("data/students.json", encoding="utf-8") as f:
    students = json.load(f)

for student in students:
    # filter on "program"
    if student["program"] == "DS":
        # show "name" and "class_year"
        print(student["name"], student["class_year"])

Output:

Albert Einstein 2021
Mark Twain 2020
Kanye West 2040
SpongeBob 2023
Jackob Ziv 2020
Spiderman 2030
Harambe Noll 2020
Johnny Bravo 2022
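
The same filter can be wrapped in a small reusable function. This is only a sketch: the function name students_in_program is hypothetical, the sort by class year is an added choice, and three records from the data set stand in for the full file.

```python
import json

# Stand-in for the parsed students.json file (three records from the data set).
students = json.loads('''
[
    {"id": 1, "name": "John Smith", "program": "IS", "class_year": 2020},
    {"id": 2, "name": "Albert Einstein", "program": "DS", "class_year": 2021},
    {"id": 6, "name": "Mark Twain", "program": "DS", "class_year": 2020}
]
''')


def students_in_program(records, program):
    """Return (name, class_year) pairs for one program, earliest class year first."""
    matches = [
        (record["name"], record["class_year"])
        for record in records
        # .get avoids a KeyError if a record has no "program" field
        if record.get("program") == program
    ]
    return sorted(matches, key=lambda pair: pair[1])


ds_students = students_in_program(students, "DS")
for name, class_year in ds_students:
    print(name, class_year)
```

Passing a different program code, such as "IS", reuses the same logic without editing the loop.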