data-sourcing-challenge's Introduction

Columbia AI Module 6 Challenge: Data Sourcing

Requirements

Part 1: Access the New York Times API (35 points)

Part 2: Access The Movie Database API (40 points)

Note: The Movie Database was responding abnormally slowly. No service outages were posted, so it may have been receiving more traffic than its infrastructure could support. Fetching all 200 titles in a single notebook cell proved difficult: every 20 titles took about 30 minutes to resolve, so a single cell would have taken about 5 hours. The solution was to dump the NYT reviews into a JSON file (reviews.json) so that I would not run up against the NYT rate limit, then split the requests to The Movie Database into batches in a separate notebook, get_movie.ipynb. Once the complete movie list had accumulated in memory, I dumped it into a JSON file (movies.json) as well.
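The workaround in the note can be sketched roughly as follows. This is a minimal illustration, not the code from get_movie.ipynb: the helper names save_json, load_json, and chunk are hypothetical, and the batch size of 20 is taken only from the timing mentioned above.

```python
import json
from pathlib import Path


def save_json(records, path):
    """Write a list of records to a JSON file so they need not be fetched again."""
    Path(path).write_text(json.dumps(records, indent=4), encoding="utf-8")


def load_json(path):
    """Read previously saved records back from a JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical usage: 200 placeholder titles split into batches of 20,
# so each batch can be requested in its own notebook cell.
titles = [f"Title {n}" for n in range(200)]
batches = chunk(titles, 20)
```

Caching the reviews on disk means a slow or failed run against The Movie Database does not force another round of NYT requests, and each batch can be retried on its own.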

Preparation (4 points):

An empty list called tmdb_movies_list is created (1 point).
A variable called request_counter is created and assigned the value of 1 (1 point).
A for loop is created to loop through the titles list (2 points).

Inside the titles for loop (12 points):

request_counter is incremented by 1 (1 point).
time.sleep(1) is called when request_counter reaches a multiple of 50 (3 points).
A GET request that sends the title to The Movie Database search is performed, and the JSON results are retrieved (4 points).
A try-except clause is used (3 points).
The except clause prints out a statement if a movie is not found (1 point).
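The loop requirements above might look something like the sketch below. It is an illustration under assumptions, not the notebook's code: the function name find_movie_ids and the injected fetch_json callable are hypothetical (in a notebook, fetch_json could wrap requests.get(url, params=params).json()), and the search URL and the query and api_key parameters are assumed to match TMDB's v3 search endpoint.

```python
import time

# Assumed TMDB v3 search endpoint; check it against the TMDB documentation.
SEARCH_URL = "https://api.themoviedb.org/3/search/movie"


def find_movie_ids(titles, fetch_json, api_key, sleep=time.sleep):
    """Return {title: movie_id} for every title the search finds.

    fetch_json(url, params) is a hypothetical callable that performs the
    GET request and returns the parsed JSON response.
    """
    found = {}
    request_counter = 1
    for title in titles:
        # Pause for one second whenever the counter hits a multiple of 50.
        if request_counter % 50 == 0:
            sleep(1)
        request_counter += 1

        results = fetch_json(SEARCH_URL, {"query": title, "api_key": api_key})
        try:
            # The first search result is treated as the match.
            found[title] = results["results"][0]["id"]
        except (KeyError, IndexError):
            print(f"{title} not found.")
    return found
```

Passing the request function and the sleep function in as arguments keeps the loop testable without network access; the rubric itself only asks for a plain for loop in a notebook cell.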

Inside the try clause (20 points):

The movie ID is collected from the first result and saved as a variable (2 points).
A GET request is made using the movie query URL and movie ID to retrieve the full movie details in JSON format (4 points).
The genre names are extracted from the results into a list called genres (2 points).
The spoken_languages' English names are extracted from the results into a list called spoken_languages (2 points).
The production_countries' names are extracted from the results into a list called production_countries (2 points).
A dictionary is created with the specified 15 fields (4 points).
The results dictionary is appended to the tmdb_movies_list list (3 points).
A message is printed with the name of the movie to indicate that the title was found (1 point).
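The extraction steps in the try clause can be illustrated on a hand-written response. This is a sketch under assumptions: the key names (genres, spoken_languages with english_name, production_countries) are assumed to match TMDB's movie details response, the sample values are made up for the example, and only a few of the 15 required fields are shown because the full list is not given here.

```python
def summarize_movie(details):
    """Build a trimmed dictionary from a TMDB-style movie details response."""
    genres = [genre["name"] for genre in details["genres"]]
    spoken_languages = [lang["english_name"] for lang in details["spoken_languages"]]
    production_countries = [country["name"] for country in details["production_countries"]]
    return {
        "title": details["title"],
        "genres": genres,
        "spoken_languages": spoken_languages,
        "production_countries": production_countries,
    }


# Made-up sample data shaped like a details response (not real API output).
sample_details = {
    "title": "Example Movie",
    "genres": [{"id": 1, "name": "Drama"}, {"id": 2, "name": "Romance"}],
    "spoken_languages": [{"english_name": "English", "iso_639_1": "en", "name": "English"}],
    "production_countries": [{"iso_3166_1": "US", "name": "United States of America"}],
}

tmdb_movies_list = []
tmdb_movies_list.append(summarize_movie(sample_details))
print(f"Found {sample_details['title']}")
```

If any of these keys is missing from a response, the resulting KeyError would be caught by the surrounding try-except clause and the title reported as not found.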

Actions after the results are collected (4 points):

The first five results are previewed using json.dumps with the argument indent=4 (2 points).
The results are converted to a DataFrame called tmdb_df with pd.DataFrame() (2 points).
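The preview step can be sketched with the standard library alone. The helper name preview and the placeholder records are hypothetical; the DataFrame conversion is left out so the example has no third-party dependency.

```python
import json


def preview(records, count=5):
    """Return the first `count` records as an indented JSON string."""
    return json.dumps(records[:count], indent=4)


# Placeholder records standing in for tmdb_movies_list.
example_records = [{"title": f"Movie {n}"} for n in range(8)]
print(preview(example_records))
```

In the notebook, the same list would then be passed to pd.DataFrame() to produce tmdb_df.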

Part 3: Merge and Clean the Data for Export (25 points)

housker / data-sourcing-challenge
