Yelp dataset: converting from JSON to CSV

This is a complementary explanation of how to convert the Yelp dataset (https://www.yelp.com/dataset_challenge) from JSON format into CSV file(s) so that it can be used in data analytics. The topic is covered in a Coursera class that I’ve worked on – Social Media Data Analytics (https://www.coursera.org/learn/social-media-data-analytics/home/welcome).

Yelp JSON File

Suppose you are converting the business data in JSON format (e.g., yelp_academic_dataset_business.json, which you downloaded from the Yelp site).

Each line of the file is one JSON object describing one business, with the following structure:

business

{
    'type': 'business',
    'business_id': (encrypted business id),
    'name': (business name),
    'neighborhoods': [(hood names)],
    'full_address': (localized address),
    'city': (city),
    'state': (state),
    'latitude': latitude,
    'longitude': longitude,
    'stars': (star rating, rounded to half-stars),
    'review_count': review count,
    'categories': [(localized category names)],
    'open': True / False (corresponds to closed, not business hours),
    'hours': {
        (day_of_week): {
            'open': (HH:MM),
            'close': (HH:MM)
        },
        ...
    },
    'attributes': {
        (attribute_name): (attribute_value),
        ...
    },
}
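Once a line is parsed with json.loads, top-level values such as name and open map directly to Python values, while hours and attributes remain nested dictionaries. The following minimal sketch parses one made-up record in this layout; the sample values, the "Monday" day name, and the "Wi-Fi" attribute are hypothetical and not taken from the real dataset.

```python
import json

# A made-up record in the layout described above (all values are hypothetical).
sample_line = (
    '{"type": "business", "business_id": "abc123", "name": "Example Cafe", '
    '"categories": ["Cafes", "Breakfast & Brunch"], "open": true, '
    '"hours": {"Monday": {"open": "08:00", "close": "17:00"}}, '
    '"attributes": {"Wi-Fi": "free"}}'
)

record = json.loads(sample_line.strip())

# Top-level scalar fields become plain Python values.
name = record["name"]        # a str
is_open = record["open"]     # JSON true becomes Python True

# List and dictionary fields stay nested; index into them explicitly.
categories = record["categories"]
monday_open = record["hours"]["Monday"]["open"]

# .get() with a default avoids a KeyError if a record lacks a key
# (assumed possible; check against your copy of the data).
wifi = record.get("attributes", {}).get("Wi-Fi", "unknown")

print(name, is_open, categories, monday_open, wifi)
```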

 

Python Script

The Python script for the conversion (e.g., json2csv_business.py) is as follows:

#!/usr/bin/env python3
# json2csv_business.py: convert the Yelp business file (one JSON object per line) to CSV.

import sys
import json
import csv

# Input file from the first argument; the output defaults to "<input>.csv".
ifilename = sys.argv[1]
ofilename = sys.argv[2] if len(sys.argv) > 2 else ifilename + ".csv"

# Columns to export, in this order.
fields = ["business_id", "name", "full_address", "hours", "open",
          "categories", "city", "state", "review_count", "stars"]

json_no = 0
with open(ifilename, encoding="utf-8") as in_file, \
        open(ofilename, "w", newline="", encoding="utf-8") as out_file:
    root = csv.writer(out_file)
    root.writerow(fields)
    # LOAD DATA: each non-empty line holds one business record.
    for l in in_file:
        l = l.strip()
        if not l:
            continue
        record = json.loads(l)
        root.writerow([record[f] for f in fields])
        json_no += 1

print('Finished {0} lines'.format(json_no))

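As the code shows, the script exports only a subset of the business fields: business_id, name, full_address, hours, open, categories, city, state, review_count, and stars. It parses each line of the JSON file as one record and writes the values of those fields as one CSV row; other fields such as neighborhoods, latitude, longitude, and attributes are skipped. Because csv.writer converts non-string values with str(), the nested hours and categories values are stored as Python text, e.g. ['Cafes', 'Bars'].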
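If you would rather store the nested hours and categories values as JSON text, which other tools can parse back, one option is sketched below. It is an alternative, not the original script: the helper names to_cell and convert_business are hypothetical, and it uses csv.DictWriter so that unlisted keys are ignored and missing keys become empty cells instead of raising a KeyError.

```python
import csv
import json

# Columns exported by the original script.
FIELDS = ["business_id", "name", "full_address", "hours", "open",
          "categories", "city", "state", "review_count", "stars"]


def to_cell(value):
    """Encode lists and dicts as JSON text; other values are left to csv's str()."""
    if isinstance(value, (list, dict)):
        return json.dumps(value, ensure_ascii=False)
    return value


def convert_business(in_path, out_path, fields=FIELDS):
    """Hypothetical helper: write selected fields of a JSON-lines file to CSV."""
    count = 0
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        # extrasaction="ignore" drops keys not in fields (e.g. latitude);
        # keys missing from a record are written as restval, which defaults to "".
        writer = csv.DictWriter(dst, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            writer.writerow({key: to_cell(value) for key, value in record.items()})
            count += 1
    return count
```

Called as convert_business("yelp_academic_dataset_business.json", "business.csv"), it returns the number of records written.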

 

Converting

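Now you can convert the JSON file into a CSV file by running the following command in a terminal (or from your Python IDE):

>> python json2csv_business.py yelp_academic_dataset_business.json

Because no output filename is given as a second argument, the script writes the result to yelp_academic_dataset_business.json.csv.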
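To check the converted file, you can read it back with csv.DictReader. Since the script writes hours and categories with str(), ast.literal_eval can turn those cells back into a dictionary and a list. This is a minimal sketch: load_business_csv is a hypothetical helper, and it assumes the column names written by the script above.

```python
import ast
import csv


def load_business_csv(path):
    """Hypothetical helper: read the converted CSV and restore typed/nested columns."""
    rows = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # These cells hold Python literals such as "['Cafes', 'Bars']".
            row["categories"] = ast.literal_eval(row["categories"])
            row["hours"] = ast.literal_eval(row["hours"])
            # Everything else comes back as text; convert the numeric/boolean columns.
            row["open"] = row["open"] == "True"
            row["review_count"] = int(row["review_count"])
            row["stars"] = float(row["stars"])
            rows.append(row)
    return rows
```

For example, len(load_business_csv("yelp_academic_dataset_business.json.csv")) should match the line count the script prints when it finishes.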

 
