Takeout metadata merger

Google Takeout is the only way to pull down original copies of photos from Google Photos in bulk, without manually downloading images in a web browser.

Unfortunately, Google Photos stores photos and photo metadata separately. This means the Takeout archive contains JSON files for image capture time, location, and other data; that metadata is not stored in the EXIF section of the downloaded images. Additionally, for large libraries the Takeout process splits data across archives, so it is possible to have an asset 'foo/image.png' in archive1.tgz and its metadata 'foo/image.png.json' in archive2.tgz.
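For reference, reading one of these sidecar files from Python looks roughly like the sketch below. The field names ('photoTakenTime', 'geoData') are assumptions based on commonly seen Takeout exports and are not guaranteed to match every export.

```python
import json

# Hypothetical sketch of reading a Takeout metadata sidecar. Field names like
# 'photoTakenTime' and 'geoData' are assumptions based on typical Takeout
# exports and may differ between exports.
with open('foo/image.png.json') as f:
    meta = json.load(f)

timestamp = int(meta['photoTakenTime']['timestamp'])  # Unix epoch seconds
latitude = meta['geoData']['latitude']
longitude = meta['geoData']['longitude']
```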

In order to load these images into a digital asset manager like Lightroom or similar tools, we need to parse the custom Takeout format and store the metadata in the downloaded images.
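As a rough illustration of that embedding step (not this project's actual code), the capture time parsed from the sidecar can be written into an image's EXIF with pyexiv2; the tag chosen and the timestamp conversion below are assumptions made for the sketch.

```python
from datetime import datetime, timezone
import pyexiv2

timestamp = 1577934245  # e.g. parsed from the image's Takeout JSON sidecar

# Sketch only: convert the Unix epoch capture time to EXIF's date format
# and write it into the image in place.
taken = datetime.fromtimestamp(timestamp, tz=timezone.utc)
with pyexiv2.Image('foo/image.jpg') as img:
    img.modify_exif({'Exif.Photo.DateTimeOriginal': taken.strftime('%Y:%m:%d %H:%M:%S')})
```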

Assumptions & Limitations

  • we process takeout archives only in the form of gzipped tarballs.
  • we assume Google's metadata format is relatively consistent between images and videos
  • we can ignore metadata JSON files if they don't have a matching image or video file in the same directory
  • we use the pyexiv2 library to place metadata into files for cross-platform compatibility between *nix and Windows. Unfortunately, the underlying exiv2 library does not support video files, so for those file types (and a few others) we write XMP sidecars instead of writing the metadata into the files themselves (see the sketch after this list).
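For file types exiv2 cannot modify, a sidecar might be produced roughly as below. The sidecar naming, the xmp:CreateDate property, and the packet layout are illustrative assumptions, not this project's exact output.

```python
from pathlib import Path

# Minimal, hand-rolled XMP sidecar for files exiv2 can't write to (e.g. videos).
# The property set here (xmp:CreateDate) and the '.xmp' naming convention are
# assumptions for illustration.
SIDECAR_TEMPLATE = """<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:xmp="http://ns.adobe.com/xap/1.0/"
    xmp:CreateDate="{create_date}"/>
 </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>"""

def write_sidecar(media_path: str, create_date: str) -> Path:
    sidecar = Path(media_path).with_suffix('.xmp')
    sidecar.write_text(SIDECAR_TEMPLATE.format(create_date=create_date), encoding='utf-8')
    return sidecar

write_sidecar('foo/video.mp4', '2020-01-02T03:04:05Z')
```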

Goals

  • Take standard image and video metadata, like creation time, that Google Photos stores separately and embed that data into the files. For everything else, we export an XMP sidecar.
  • Perform as much, if not all, processing in memory by streaming the archive contents
  • Only extract the base content files and their associated metadata
  • Extract each media file exactly once. This is achieved by hashing the content and storing the hash in an in-memory dictionary. The dictionary is persisted between application runs by saving to a gzipped JSON file. This should probably be replaced with something more robust like sqlite, etc.
  • Extract each media file into a 'YYYY/MM' folder structure. Takeout archives contain folders for each album, duplicating images for every album they appear in. Besides deduplicating files, we also collapse the export into a more standardized, date-based folder structure. A sketch of the streaming, deduplication, and layout flow follows this list.
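Putting the streaming, deduplication, and YYYY/MM goals together, a minimal sketch might look like the following. The index file name, the use of sha256, and taking the date from the tar member's mtime (rather than the Takeout capture time) are assumptions made purely for illustration.

```python
import gzip
import hashlib
import json
import tarfile
from datetime import datetime, timezone
from pathlib import Path

INDEX_PATH = Path('seen_hashes.json.gz')  # hypothetical name for the persisted index

def load_index() -> dict:
    # Content hashes seen on previous runs, persisted as gzipped JSON.
    if INDEX_PATH.exists():
        with gzip.open(INDEX_PATH, 'rt') as f:
            return json.load(f)
    return {}

def save_index(index: dict) -> None:
    with gzip.open(INDEX_PATH, 'wt') as f:
        json.dump(index, f)

def process_archive(archive_path: str, out_root: str, index: dict) -> None:
    # 'r|gz' streams the gzipped tarball so members are processed in memory
    # as they are read, without unpacking the whole archive to disk first.
    with tarfile.open(archive_path, mode='r|gz') as tar:
        for member in tar:
            if not member.isfile() or member.name.endswith('.json'):
                continue
            data = tar.extractfile(member).read()
            digest = hashlib.sha256(data).hexdigest()
            if digest in index:
                continue  # already extracted from another album or archive
            # This sketch uses the tar member's mtime for the YYYY/MM layout;
            # the real tool would use the capture time from the Takeout JSON.
            taken = datetime.fromtimestamp(member.mtime, tz=timezone.utc)
            dest = Path(out_root) / taken.strftime('%Y/%m') / Path(member.name).name
            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.write_bytes(data)
            index[digest] = str(dest)

index = load_index()
process_archive('archive1.tgz', 'output', index)
save_index(index)
```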

Usage

Ensure that you're on a platform and Python interpreter version supported by pyexiv2, and then:

  • git clone https://github.com/msh9/g_photos_takeout_metadata_merger.git
  • preferably, set up your virtual environment of choice
  • pip install -r requirements.txt
  • if you like tests, python -m unittest discover tests
  • from the checkout, python photo_metadata_merger/photo_metadata_merger.py

ToDos

This works for my purposes now, so these items are unlikely to be addressed.

  • Refactor the setup class functions in test_content.py
  • Refactor the main execution function in photo_metadata_merger.py
  • Update the storage classes to use something more robust than a periodically saved, gzipped JSON file.
  • Setup as an installable package / make it easier to use
