Extract Van Gogh Paintings Code Challenge

The goal is to extract a list of Van Gogh paintings from the attached Google search results page.

Instructions

This is already fully supported on SerpApi (see the relevant test, HTML file, sample JSON, and expected array). Try to come up with your own solution and your own test. Extract the painting name, the extensions array (date), and the Google link into an array.

Fork this repository and make a PR when ready. Do not use more than 4 hours of your time.

Ruby is the suggested programming language, but feel free to use whatever you like.

Parse the HTML result page (html file) in this repository directly. No extra HTTP requests should be needed for anything.

Also add to your array the painting thumbnails present in the result page file (not the ones that require extra requests).
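A minimal sketch of one entry of that array, in Ruby. The key names (name, extensions, link, image), the helper build_painting, and the assumption that carousel links in the saved page are relative /search?... paths are illustrative only and not taken from the reference solution. The image value stays nil when the page carries no inline thumbnail for an item.

```ruby
require "json"

# Hypothetical helper: builds one carousel entry as a plain Hash.
# The key names (name, extensions, link, image) are assumed, not confirmed.
def build_painting(name:, href:, date: nil, image: nil)
  {
    "name" => name.strip,
    # extensions is an array so more metadata could be appended later;
    # in this sketch it only ever holds the date, when one is present.
    "extensions" => date ? [date.strip] : [],
    # Assumption: hrefs in the saved page are relative, so prefix the Google host
    # unless the link is already absolute.
    "link" => href.start_with?("http") ? href : "https://www.google.com#{href}",
    # nil when no inline thumbnail exists for this item.
    "image" => image
  }
end

paintings = [
  build_painting(name: "The Starry Night", date: "1889", href: "/search?q=The+Starry+Night"),
  build_painting(name: "The Yellow House", date: "1888", href: "/search?q=The+Yellow+House")
]
puts JSON.pretty_generate(paintings)
```

Keeping extensions as an array rather than a single date string leaves room for further metadata without changing the shape of each entry.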

Test against 2 other similar result pages (pages that contain the same kind of carousel; they don't necessarily have to be paintings).


CHALLENGE MET

Implementation

A fast and lightweight implementation with three dependencies: Nokogiri, Thor and ActiveSupport (the last one is only used for message formatting).

  • No HTTP calls: the documents to parse are already captured in files/
  • No JS execution required: thumbnail image sources are extracted with a regex. Otherwise something like Selenium or capybara-webkit would have been needed, which is heavy and cumbersome (think of having to install Xvfb, Qt, PhantomJS)
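The regex approach to thumbnails could look roughly like the sketch below. It assumes the thumbnails are inlined as base64 data URIs assigned in scripts of the form var s='data:image/...';var ii=['kximg0']; and that "=" may appear escaped as \x3d inside those strings. Both are assumptions about the captured markup, and extract_thumbnails is a hypothetical name. The result maps each image id to its data URI, so it can be joined with the carousel's img elements through their id attribute.

```ruby
# Hypothetical pattern: assumes each inline thumbnail is assigned in a script as
#   var s='data:image/jpeg;base64,...';var ii=['kximg0'];
# The exact markup differs between page captures, so adjust it to your files.
THUMBNAIL_RE = /var s='(data:image\/[^']+)';var ii=\[([^\]]+)\]/

# Returns a Hash of image id => data URI, built without executing any JavaScript.
def extract_thumbnails(html)
  html.scan(THUMBNAIL_RE).each_with_object({}) do |(src, ids), map|
    # Restore "=" (base64 padding) when it is escaped as \x3d in the JS string.
    data_uri = src.gsub('\x3d', "=")
    # One data URI can be shared by several ids, e.g. ['kximg0','kximg1'].
    ids.scan(/'([^']+)'/).flatten.each { |id| map[id] = data_uri }
  end
end
```

Keeping this as a plain string scan is what avoids a headless browser: the thumbnails are read from the script source as text, never run.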

RUNNING

I use the Thor gem to provide the command-line interface.

cd into the directory and run:

thor serpapi:search_google_image "Van Gogh paintings"

As instructed, I added two other artwork pages, for Pablo Picasso and Claude Monet. You can run them the same way:

thor serpapi:search_google_image "Pablo Picasso paintings"

OR

thor serpapi:search_google_image "Claude Monet paintings"

Notes

  • The search keyword is matched against the names of the files used for the data extraction (see files/), so any other keyword will not hit the right page and will return an empty result
  • As of July 2020, Google has slightly updated the HTML structure of the artworks carousel. To account for this, I had to adapt the code with some switches to support both the Van Gogh page, which I believe was captured in 2019, and the Picasso/Monet pages, which I captured on 15 July 2020.
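The keyword-to-file matching in the first note could look roughly like this. The slug naming scheme (for example files/van-gogh-paintings.html) and the helpers page_path_for and load_page are hypothetical; the real file names in files/ may differ.

```ruby
# Hypothetical helper: turns a search keyword into the path of a captured page.
# The naming scheme ("van-gogh-paintings.html") is an assumption for illustration.
def page_path_for(keyword, dir: "files")
  slug = keyword.downcase.strip.gsub(/[^a-z0-9]+/, "-").gsub(/\A-|-\z/, "")
  File.join(dir, "#{slug}.html")
end

# Returns the page's HTML, or nil when no capture matches the keyword,
# so the caller can return an empty result instead of raising.
def load_page(keyword, dir: "files")
  path = page_path_for(keyword, dir: dir)
  File.exist?(path) ? File.read(path) : nil
end
```

Returning nil for an unknown keyword keeps the "empty result" behaviour explicit, rather than letting a missing file raise an exception.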

TESTING

Rake and Minitest are used for the unit tests of the GoogleImageSearch class.

I covered only the Van Gogh case, using the provided files/expected-array.json.

To run the tests, run rake test or just rake.
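A Rakefile along these lines is one way to make both rake test and a bare rake run the Minitest suite. The test/ directory and the *_test.rb naming pattern are assumptions about the repository layout.

```ruby
# Rakefile (sketch): the test directory and file pattern are assumptions.
require "rake/testtask"

# Defines a `test` task that loads the matching files, which use Minitest.
Rake::TestTask.new(:test) do |t|
  t.libs << "test"
  t.pattern = "test/**/*_test.rb"
end

# Makes a bare `rake` run the tests as well.
task default: :test
```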

THE TEST FAILS! As I noted in a comment in the test scenario, the provided files/expected-array.json is wrong starting at the 9th image: it expects the image to be nil, while the provided document shows an image for "The Yellow House (1888)".
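Rather than only failing a whole-array equality check, a hypothetical helper like first_mismatch can point at the first entry where the expected and actual arrays disagree on one key. It assumes both arrays hold hashes with string keys, such as the output of JSON.parse.

```ruby
# Hypothetical helper: returns the zero-based index of the first entry whose
# value for `key` differs, or where one array runs out; nil if they agree.
def first_mismatch(expected, actual, key: "image")
  expected.zip(actual).each_with_index do |(exp, act), i|
    # zip pads a shorter `actual` with nil, which counts as a mismatch.
    return i if exp.nil? || act.nil? || exp[key] != act[key]
  end
  # Extra trailing entries in `actual` start at index expected.size.
  actual.size > expected.size ? expected.size : nil
end
```

Because the index is zero-based, a disagreement starting at the 9th image is reported as 8.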

Contributors

hartator, franckyu