
flickr-archivr's Introduction

Flickr Archivr

This project downloads your entire Flickr archive and turns it into a static website.

You can browse the website directly from your filesystem or you can upload the photos to a web server to make it public. The photos are stored on disk organized by date so you can even browse the files just by navigating the folders on your computer.

Getting Started

Install PHP and the curl and simplexml extensions. Install Composer.
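
On Debian or Ubuntu, for example, these can typically be installed with something like the command below (package names vary by distribution, and Composer can also be installed from getcomposer.org):

sudo apt install php-cli php-curl php-xml composer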

Install the project dependencies by running

composer install

Copy .env.example to .env

Start by creating an app for the Flickr API. https://www.flickr.com/services/apps/create/apply/

Copy the API Key and Secret to the .env file.
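
The variable names below are illustrative only; use the exact names that appear in .env.example:

# hypothetical names for illustration, check .env.example for the real ones
FLICKR_API_KEY=
FLICKR_API_SECRET=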

Log in to get your access token:

php scripts/login.php

After you successfully authorize the app, the script prints two lines to add to your .env file:

FLICKR_ACCESS_TOKEN=
FLICKR_ACCESS_TOKEN_SECRET=

Downloading your Flickr Archive

Choose a folder to download everything into. Put the full path to the folder in the .env file, and make sure to include a trailing slash.

Make sure you have enough disk space in the chosen location! For reference, Flickr says I have 120 GB of photos, and once this script downloaded all the different resolutions, it took 255 GB on disk.

STORAGE_PATH=/path/to/photos/

Then start the download:

php scripts/download.php

Flickr sometimes returns random 500 server errors, so you can instead run the wrapper script, which retries when an error is encountered.

./scripts/download.sh

This will also download your album and people metadata.

Folder Structure

Photos are downloaded into a folder structure like the one below, based on the date the photo was taken (if available), or otherwise the date the photo was uploaded.

2022/
    /08/
       /12/
          /XXXXXXXXXXXXXX/
                         /info/photo.json
                         /info/sizes.json
                         /info/exif.json
                         /info/comments.json
                         /sizes/XXXXXXXXXXXXXX_k.jpg
                         /sizes/XXXXXXXXXXXXXX_b.jpg
                         /sizes/....
                         XXXXXXXXXXXXXX.jpg
          /XXXXXXXXXXXXXX/
                         /info/photo.json
                         /info/sizes.json
                         /info/exif.json
                         /info/comments.json
                         /sizes/XXXXXXXXXXXXXX_k.jpg
                         /sizes/XXXXXXXXXXXXXX_b.jpg
                         /sizes/....
                         XXXXXXXXXXXXXX.jpg
       /13/
          /XXXXXXXXXXXXXX/
                         /info/photo.json
                         /info/sizes.json
                         /info/exif.json
                         /info/comments.json
                         /sizes/XXXXXXXXXXXXXX_k.jpg
                         /sizes/XXXXXXXXXXXXXX_b.jpg
                         /sizes/....
                         XXXXXXXXXXXXXX.jpg

Each photo gets its own folder at YEAR/MONTH/DAY/PHOTO_ID/. Inside the folder are:

  • The original photo, stored as PHOTO_ID.jpg
  • A sizes/ folder with every other size that Flickr provides, as sizes/PHOTO_ID_SIZE.jpg
  • An info/ folder with JSON files containing:
    • photo.json - The photo info, including title, description, dates, tags, etc.
    • exif.json - The complete EXIF data
    • sizes.json - Info about all the available sizes of the photo
    • comments.json - All the comments on the photo, if present
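
Because the layout is predictable, you can also work with the archive directly from your own scripts. Below is a minimal PHP sketch (not part of this project) that loads the metadata for one photo folder; the exact keys inside photo.json mirror the Flickr API response, so inspect your own files to see which fields are available.

<?php
// Read the metadata for one archived photo, given the layout described above:
// STORAGE_PATH/YEAR/MONTH/DAY/PHOTO_ID/
$storagePath = '/path/to/photos/';                          // same value as STORAGE_PATH
$photoFolder = $storagePath.'2022/08/12/XXXXXXXXXXXXXX/';   // replace with a real date and photo ID

// info/photo.json holds the title, description, dates, tags, etc.
$info = json_decode(file_get_contents($photoFolder.'info/photo.json'), true);
print_r($info);

// The original file sits next to the info/ and sizes/ folders as PHOTO_ID.jpg
foreach (glob($photoFolder.'*.jpg') as $original) {
    echo "Original file: $original\n";
}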

Download Albums

Albums (formerly known as photosets) can be downloaded with the command below.

php scripts/photosets.php

This creates a new folder with a subfolder for each album:

albums/
      /XXXXXXXXX/album.json
                /photos.json
      /XXXXXXXXX/album.json
                /photos.json    

The file album.json has the album metadata such as name and modified date. The file photos.json contains a list of all the photos in the album.
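
As a quick illustration, the PHP sketch below (again, not part of this project) enumerates the downloaded albums. The keys inside album.json mirror the Flickr API's album (photoset) response, so print one out to see what is available in your own archive.

<?php
// List every downloaded album. Each subfolder of albums/ is one album,
// named by its Flickr album ID, containing album.json and photos.json.
$albumsPath = '/path/to/photos/albums/';    // adjust to your STORAGE_PATH

foreach (glob($albumsPath.'*', GLOB_ONLYDIR) as $albumFolder) {
    $album = json_decode(file_get_contents($albumFolder.'/album.json'), true);
    echo basename($albumFolder)."\n";       // the album ID
    print_r($album);                        // inspect the metadata keys yourself
}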

Download People

If you've tagged people in your photos, you can download metadata about them so their names and links appear in your archive.

php scripts/downloadpeople.php

Build Indexes

The indexes are used for various purposes when building the web pages to browse the photos.

Because photos are stored in folders by date, the photo index lets other parts of the system find a photo on disk by its photo ID alone. Build it with:

php scripts/indexphotos.php
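
The index format itself is an implementation detail of indexphotos.php, but conceptually it is a photo-ID-to-folder lookup built by walking the date folders. The sketch below illustrates that idea only; it is not the project's actual code or index file format.

<?php
// Conceptual sketch: walk STORAGE_PATH/YEAR/MONTH/DAY/PHOTO_ID and record where
// each photo ID lives on disk.
$storagePath = '/path/to/photos/';
$index = [];

foreach (glob($storagePath.'*/*/*/*', GLOB_ONLYDIR) as $photoFolder) {
    $parts = explode('/', substr($photoFolder, strlen($storagePath)));
    // Only keep folders that actually look like YEAR/MONTH/DAY/PHOTO_ID
    if (count($parts) == 4 && ctype_digit($parts[0]) && ctype_digit($parts[3])) {
        $index[$parts[3]] = implode('/', $parts);
    }
}

file_put_contents('photo-index.example.json', json_encode($index, JSON_PRETTY_PRINT));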

To build an index of all the people and which photos they appear in, run the command below.

php scripts/indexpeople.php

To build an index of all tags (used to create the tag pages), run the command below.

php scripts/indextags.php

Note: You can build all indexes with the bash script included:

./scripts/index.sh

This just runs the three PHP scripts sequentially.

Build the Site

After everything is downloaded, build the static website:

./scripts/build.sh

Now you can browse your website by opening the storage folder in a browser! If the folder is on your local disk, just open the index.html file. If you've run this on a remote server, you can configure your web server to serve that folder, or run the built-in PHP server:

php -S 0.0.0.0:8080 -t photos

Replace photos with the path you configured as STORAGE_PATH, where your photos were downloaded.


flickr-archivr's Issues

Download only original size?

Thanks for putting this together, I've been looking for something like this for a while!

This line in the README intrigued me:

Make sure you have enough disk space in the chosen location! For reference, Flickr says I have 120 GB of photos, and once this script downloaded all the different resolutions, it took 255 GB on disk.

I see that we download all the sizes of a photo here:

foreach($sizes as $size) {
    if($size['label'] == 'Video Player')
        continue;
    $filename = $folder.'/'.sizeToFilename($info['id'], $size);
    if($skip_download_if_exists) {
        if(file_exists($filename)) {
            continue;
        }
    }
    echo "Downloading ".$size['source']." to $filename\n";
    $fp = fopen($filename, 'w+');
    $ch = curl_init($size['source']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_exec($ch);
    fclose($fp);
    $finalURL = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
}

Would you accept a patch that adds an option to only download the original (maximum) size? It's not clear to me why we want the other sizes anyway, especially if the cost in disk space is so high.
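
For context, one way such an option could look is sketched below, placed before the foreach loop above and reusing its variables. The ONLY_ORIGINAL_SIZE flag is hypothetical, not an existing project setting.

// Hypothetical option, not an existing setting: keep only the entry Flickr
// labels 'Original', falling back to all sizes if it is absent (originals can
// be withheld depending on the account's download settings).
if(getenv('ONLY_ORIGINAL_SIZE') === 'true') {
    $originalOnly = array_filter($sizes, function($size) {
        return $size['label'] == 'Original';
    });
    if(count($originalOnly))
        $sizes = $originalOnly;
}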

progress.json prevents updates from happening

progress.json records the last page that was processed, which is useful for a one-shot download that may span multiple process invocations. However, it means that trying to do an incremental update (get new photos since the last archival run) fails, because the last page always contains the oldest photos.

This isn't really your fault; as far as I can tell, flickr.people.getPhotos is most-recent-first, with no parameter to change that.

For now I'm just deleting progress.json every time I start a new incremental update. I'm not really sure if there's a great fix here on this project's side - feel free to close it as wontfix if you want 😃
