
image-downloader's Introduction

#Specification

You are required to build a tool with a command line interface that downloads all images from a given HTML page, saves them to local disk, and transcodes them to different sizes and formats.

Processed images can be later used on a mobile site.

##Minimal functionality

  1. The downloader tool has to accept at least two parameters: the URL of a source page and the location of an output directory on local disk
  2. It has to download all images defined with the `<img>` tag on this page to the specified directory
  3. The tool should be optimized to download only new or modified images. Images that have not changed since the tool was last run and that still exist in the specified directory should be skipped
  4. Once an image is downloaded, it should be resized to three different widths: 100px, 220px and 320px, preserving the width/height ratio
  5. Each of the three image sizes should be saved in at least 2 formats: png and jpg
  6. Since the tool can be used to parse a large number of different HTML pages, the application should take performance into account
  7. The tool can skip very small images (width or height <= 10 pixels), since they are most likely unusable on a mobile site.
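
Items 4, 5 and 7 can be illustrated with a minimal sketch that uses only the Java standard library. The class name `ResizeSketch`, its method names and the output file naming (`baseName_width.format`) are assumptions made for this example. The actual solution uses Scalr for resizing, so this is a stand-in rather than the project's code.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of the project: stdlib-only resize and save.
final class ResizeSketch {
    static final int[] TARGET_WIDTHS = {100, 220, 320};
    static final String[] FORMATS = {"png", "jpg"};
    static final int MIN_SIDE = 10;

    // Item 7: an image is skipped when its width or height is <= 10 pixels.
    static boolean isTooSmall(int width, int height) {
        return width <= MIN_SIDE || height <= MIN_SIDE;
    }

    // Item 4: height that preserves the width/height ratio (never below 1 pixel).
    static int scaledHeight(int width, int height, int targetWidth) {
        return Math.max(1, Math.round((float) height * targetWidth / width));
    }

    static BufferedImage resize(BufferedImage source, int targetWidth) {
        int targetHeight = scaledHeight(source.getWidth(), source.getHeight(), targetWidth);
        // TYPE_INT_RGB has no alpha channel, so the result can be written as JPEG.
        // Transparent areas of the source end up black in this sketch.
        BufferedImage scaled =
                new BufferedImage(targetWidth, targetHeight, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = scaled.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(source, 0, 0, targetWidth, targetHeight, null);
        g.dispose();
        return scaled;
    }

    // Items 4 and 5: writes every width/format combination and returns
    // the number of files written (0 when the image is too small).
    static int writeVariants(BufferedImage source, File outputDir, String baseName)
            throws IOException {
        if (isTooSmall(source.getWidth(), source.getHeight())) {
            return 0;
        }
        int written = 0;
        for (int width : TARGET_WIDTHS) {
            BufferedImage scaled = resize(source, width);
            for (String format : FORMATS) {
                File target = new File(outputDir, baseName + "_" + width + "." + format);
                if (ImageIO.write(scaled, format, target)) {
                    written++;
                }
            }
        }
        return written;
    }
}
```

With this sketch, a 640x480 source resized to a width of 320 gets a height of 240, and one accepted image yields up to six output files (three widths in two formats).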

#Solution

##Requirements

##Usage:

To build the application, navigate to the root directory and run 'mvn clean install'.

To run (using an external test website): java -jar target/image-downloader.jar http://adambarnes5000.weebly.com downloads

This will create a new folder 'downloads', populated with pictures of a dog, a cat and a monkey in their original size and format, plus copies resized to widths of 100, 220 and 320 pixels in both jpg and png formats.

##Design Summary

The application is built using Spring and Spring Integration. This decision was made to take advantage of components that Spring Integration offers out of the box, notably Splitter, Filter and Aggregator, and of its simple support for multithreading. A splitter turns a webpage into a list of image URLs, and each image becomes a new message payload. The image payloads are then placed on a channel and dispatched using a task executor (configured with a pool of 20 threads). The messages run through two filters: one removes previously downloaded, unchanged images, and the other removes images of insufficient size. The remaining images are resized, converted to the various formats and saved to disk. Finally, all messages (both successful and filtered) are aggregated, and a number indicating how many images were downloaded is returned to the main class.
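
The flow described above (split, dispatch on a thread pool, filter twice, process, aggregate) can be sketched with plain `java.util.concurrent`. This is not the project's Spring Integration configuration. The class `PipelineSketch`, the `ImageRef` payload and its fields are hypothetical names chosen for illustration, and the download and resize step is reduced to a comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical stand-in for the Spring Integration flow, stdlib only.
final class PipelineSketch {

    // Hypothetical payload: one image found on the page by the "splitter".
    static final class ImageRef {
        final String url;
        final int width;
        final int height;
        final boolean unchangedOnDisk;

        ImageRef(String url, int width, int height, boolean unchangedOnDisk) {
            this.url = url;
            this.width = width;
            this.height = height;
            this.unchangedOnDisk = unchangedOnDisk;
        }
    }

    // Filter 1: drop images already downloaded and unchanged.
    static final Predicate<ImageRef> IS_NEW_OR_MODIFIED = img -> !img.unchangedOnDisk;
    // Filter 2: drop images of insufficient size.
    static final Predicate<ImageRef> IS_BIG_ENOUGH = img -> img.width > 10 && img.height > 10;

    // Dispatches each image to a fixed pool, then aggregates all results
    // (processed and filtered) into the number of images downloaded.
    static int countDownloaded(List<ImageRef> images, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (ImageRef image : images) {
                results.add(pool.submit(() -> {
                    if (!IS_NEW_OR_MODIFIED.test(image) || !IS_BIG_ENOUGH.test(image)) {
                        return false; // filtered out, still counted by the aggregator
                    }
                    // Download, resize and save would happen here.
                    return true;
                }));
            }
            int downloaded = 0;
            for (Future<Boolean> result : results) {
                try {
                    if (result.get()) {
                        downloaded++;
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return downloaded;
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `PipelineSketch.countDownloaded(images, 20)` mirrors the pool of 20 threads mentioned above. The result counts only the images that passed both filters, although every submitted image contributes a result to the aggregation step.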

Another possible approach would have been to implement everything from scratch. That would have meant much more boilerplate code, particularly around multithreading. With Spring Integration this was not needed, which left me free to concentrate on the business logic.

Besides Spring the following libraries were also used:

  • Scalr - Used for resizing images
  • Jsoup - For parsing web pages
  • Spock - Groovy unit and integration tests
