living-with-machines / mapreader

A computer vision pipeline for exploring and analyzing images at scale

Home Page: https://mapreader.readthedocs.io/en/latest/

License: Other

Python 0.72% SCSS 0.01% Jupyter Notebook 99.27% TeX 0.01%
hut23 hut23-96 computer-vision deep-learning machine-learning pytorch article digital-humanities maps spatial-data

mapreader's Introduction


MapReader

A computer vision pipeline for exploring and analyzing images at scale

What is MapReader?

Annotated Map with Prediction Outputs

MapReader is an end-to-end computer vision (CV) pipeline for exploring and analyzing images at scale.

MapReader was developed in the Living with Machines project to analyze large collections of historical maps but is a generalizable computer vision pipeline which can be applied to any images in a wide variety of domains.

Overview

MapReader is a groundbreaking interdisciplinary tool that emerged from a specific set of geospatial historical research questions. It was inspired by methods in biomedical imaging and geographic information science, which were adapted for use by historians, for example in our Journal of Victorian Culture and Geospatial Humanities 2022 SIGSPATIAL workshop papers. The success of the tool subsequently generated interest from plant phenotype researchers working with large image datasets, and so MapReader is an example of cross-pollination between the humanities and the sciences made possible by reproducible data science.

MapReader pipeline

The MapReader pipeline consists of a linear sequence of tasks which, together, can be used to train a computer vision (CV) classifier to recognize visual features within maps and identify patches containing these features across entire map collections.

See our About MapReader page to learn more.

Documentation

The MapReader documentation can be found at https://mapreader.readthedocs.io/en/latest/index.html.

New users should refer to the Installation instructions and Input guidance for help with the initial set up of MapReader.

All users should refer to our User Guide for guidance on how to use MapReader. This contains end-to-end instructions on how to use the MapReader pipeline, plus a number of worked examples illustrating use cases such as:

  • Geospatial images (i.e. maps)
  • Non-geospatial images

Developers and contributors may also want to refer to the API documentation and Contribution guide for guidance on how to contribute to the MapReader package.

Join our Slack workspace! Please fill out this form to receive an invitation to the Slack workspace.

What is included in this repo?

The MapReader package provides a set of tools to:

  • Download images/maps and metadata stored on web-servers (e.g. tileservers which can be used to retrieve maps from OpenStreetMap (OSM), the National Library of Scotland (NLS), or elsewhere).
  • Load images/maps and metadata stored locally.
  • Pre-process images/maps:
    • patchify (create patches from a parent image),
    • resample (use image transformations to alter pixel-dimensions/resolution/orientation/etc.),
    • remove borders outside the neatline,
    • reproject between coordinate reference systems (CRS).
  • Annotate images/maps (or their patches) using an interactive annotation tool.
  • Train or fine-tune Computer Vision (CV) models and use these to predict labels (i.e. model inference) on large sets of images/maps.

Various plotting and analysis functionalities are also included (based on packages such as matplotlib, cartopy, Google Earth, and kepler.gl).

How to cite MapReader

If you use MapReader in your work, please cite both the MapReader repo and our SIGSPATIAL paper:

  • Kasra Hosseini, Daniel C. S. Wilson, Kaspar Beelen, and Katherine McDonough. 2022. MapReader: a computer vision pipeline for the semantic exploration of maps at scale. In Proceedings of the 6th ACM SIGSPATIAL International Workshop on Geospatial Humanities (GeoHumanities '22). Association for Computing Machinery, New York, NY, USA, 8–19. https://doi.org/10.1145/3557919.3565812
  • Kasra Hosseini, Rosie Wood, Andy Smith, Katie McDonough, Daniel C.S. Wilson, Christina Last, Kalle Westerling, and Evangeline Mae Corcoran. “Living-with-machines/mapreader: End of Lwm”. Zenodo, July 27, 2023. https://doi.org/10.5281/zenodo.8189653.

Acknowledgements

This work was supported by Living with Machines (AHRC grant AH/S01179X/1) and The Alan Turing Institute (EPSRC grant EP/N510129/1).

Living with Machines, funded by the UK Research and Innovation (UKRI) Strategic Priority Fund, is a multidisciplinary collaboration delivered by the Arts and Humanities Research Council (AHRC), with The Alan Turing Institute, the British Library and the Universities of Cambridge, East Anglia, Exeter, and Queen Mary University of London.

Maps above reproduced with the permission of the National Library of Scotland https://maps.nls.uk/index.html

Contributors

  • Katie McDonough 🔬 🤔 📖 📋 📆 👀 📢
  • Daniel C.S. Wilson 🔬 🤔 📢 📖 📋
  • Kasra Hosseini 💻 🤔 🔬 👀 📢
  • Rosie Wood 💻 📖 🤔 📢 👀 🚧 🔬
  • Kalle Westerling 💻 📖 🚧 👀 📢
  • Chris Fleet 🔣
  • Kaspar Beelen 🤔 👀 🔬
  • Andy Smith 💻 📖 🧑‍🏫 👀

mapreader's Issues

classifier.batch_info() always prints set name "train" regardless of set being shown

Describe the bug
classifier.batch_info() and classifier.show_sample() always print the set name "train", regardless of which set is being shown.

To Reproduce
Steps to reproduce the behavior:

  1. Go through the steps to load data into the classifier:
    from mapreader import classifier
    my_classifier = classifier()
    my_classifier.add2dataloader(your_dataset)
  2. Run the batch_info or show_sample command and specify set_name:
    my_classifier.batch_info(set_name="val")

This outputs dataset: 'train' even when set_name is not "train".

Expected behavior
Should output dataset: set_name
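
The behavior reported here is what you would see if the set name in the printed message were hard-coded rather than taken from the argument. A minimal sketch of that pattern and its fix follows. `ToyClassifier` and its `dataloaders` attribute are hypothetical stand-ins, not MapReader's implementation; only the `batch_info` and `set_name` names come from the report.

```python
class ToyClassifier:
    """Hypothetical stand-in for a classifier holding named datasets."""

    def __init__(self):
        # Maps a set name (e.g. "train", "val") to a list of batches.
        self.dataloaders = {}

    def batch_info_buggy(self, set_name="train"):
        # Bug pattern: the set name in the message is hard-coded.
        n_batches = len(self.dataloaders[set_name])
        return f"dataset: 'train', batches: {n_batches}"

    def batch_info(self, set_name="train"):
        # Fix: report the set that was actually requested.
        n_batches = len(self.dataloaders[set_name])
        return f"dataset: '{set_name}', batches: {n_batches}"


clf = ToyClassifier()
clf.dataloaders = {"train": [1, 2, 3], "val": [4]}
print(clf.batch_info_buggy(set_name="val"))
print(clf.batch_info(set_name="val"))
```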

Screenshots
e.g.
image

require 'set_name' argument when loading datasets using classifier.add2dataloader()

Describe the bug
In the 'train' subpackage:
- datasets are loaded into the classifier using .add2dataloader(), which currently does not require the 'set_name' argument
- the default is set_name=None
- later code uses 'set_name' as an argument and raises an error if it is None, so it should be required when loading data into the dataloader.

Expected behavior
To stop errors further down the line, set_name should be a required argument of .add2dataloader().
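
One possible way to enforce this in Python is a keyword-only parameter with no default, plus an early validation check. The sketch below uses a hypothetical `ToyClassifier`; only `add2dataloader` and `set_name` are taken from the issue, and the real method may take other arguments.

```python
class ToyClassifier:
    """Hypothetical stand-in; not MapReader's actual class."""

    def __init__(self):
        self.dataloaders = {}

    def add2dataloader(self, dataset, *, set_name):
        # `set_name` is keyword-only and has no default, so omitting it
        # raises TypeError at the call site instead of failing later.
        if not isinstance(set_name, str) or not set_name:
            raise ValueError("set_name must be a non-empty string, e.g. 'train'")
        self.dataloaders[set_name] = dataset


clf = ToyClassifier()
clf.add2dataloader([1, 2, 3], set_name="train")

try:
    clf.add2dataloader([4, 5])  # set_name omitted
except TypeError as err:
    print(f"rejected early: {err}")
```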

commandline script fails

How to reproduce

  1. Install MapReader using either of the two methods in the README.md
  2. At the shell enter the command
$ mapreader

Expected outcome

A welcome note, or usage instructions or similar.

Actual outcome

A ModuleNotFoundError:

Traceback (most recent call last):
  File "/Users/a.smith/anaconda3/envs/mapreader_dev/bin/mapreader", line 33, in <module>
    sys.exit(load_entry_point('mapreader', 'console_scripts', 'mapreader')())
  File "/Users/a.smith/anaconda3/envs/mapreader_dev/bin/mapreader", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/Users/a.smith/anaconda3/envs/mapreader_dev/lib/python3.8/importlib/metadata.py", line 77, in load
    module = import_module(match.group('module'))
  File "/Users/a.smith/anaconda3/envs/mapreader_dev/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'mapreader.mapreader'

Required Actions

  • Fix entry point in ./setup.py
  • Uncomment entry point details in ./conda/meta.yaml
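
The traceback shows the console script resolving to a module, `mapreader.mapreader`, that cannot be imported. As a rough sketch, a target string in the standard `module:attribute` entry-point form can be checked before release with the standard library. `entry_point_module_exists` is a hypothetical helper, and the correct target for MapReader is not stated in this issue, so none is assumed here.

```python
import importlib.util


def entry_point_module_exists(target):
    """Return True if the module part of a 'module:attribute' target is importable."""
    module_name = target.split(":", 1)[0].strip()
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of the module is itself missing.
        return False


# Standard-library targets used purely for illustration.
print(entry_point_module_exists("json.tool:main"))
print(entry_point_module_exists("json.missing_module:main"))
```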

Testing `MapReader`

Hi All 👋🏼

I will be testing the MapReader install and running the demo notebooks to execute the analysis. I will document my process here.

  • Installation
    • Install git clone git@github.com:Living-with-machines/MapReader.git
    • git branch -> * dev
    • git pull origin dev
    • poetry install
    • poetry shell
      • this command was not included in the README.md (unlike conda activate ...)
  • Notebooks code execution
    • 001_retrieve_patchify_plot.ipynb
    • 002_annotation.ipynb
    • 003_train_classifier.ipynb
    • 004_inference.ipynb

Publish as a conda package

Is your feature request related to a problem? Please describe.
Some of the dependencies are much easier to install using conda than using pip. We also recommend that users install MapReader in a conda environment. Mixing conda installs and pip installs is known to be problematic.

Describe the solution you'd like
Publish MapReader as a conda package, eliminating the need to mix conda installs and pip installs.

Plant phenotyping example

Summary

These todos were collected while testing the notebooks on plant images.

TODO

  • Work with images stored in non-flat file directories
    • e.g., we can create unique IDs from the path: plantID_data_filename
  • In annotation, we assume that all parent images are stored in one directory. Can we make this a bit more flexible, e.g., "/path/to/*/*.png"?
  • Count the number of pixels in parent images (x and y), add as metadata
  • Add a warning/error when the number of pixels in x or y directions are different from what we expect.
  • Colorbars, transparency for out of the range patches?
  • normalize_mean and normalize_std in 003_train_classifier need to be set based on the plant images.
  • Data augmentation and transformations for the plant images, e.g., the intensities or brightness, mimic the changes in light/brightness.
  • Multiple views, compare different side-views, they should be the same scale (for each side view), transfer learning
  • Change over time, which features emerge first
  • Chain of classifiers? The first classifier would decide plant/no and then classify the flower/greenspace, then the shape of the features. Classify green features into different groups.
  • torchvision and MapReader installation issue
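
For the first TODO item, a unique ID can be derived from a nested path by joining the components below a chosen root directory. This is only a sketch: `unique_id_from_path` and the example paths are hypothetical, and the underscore-joined format follows the `plantID_data_filename` pattern suggested above.

```python
from pathlib import PurePosixPath


def unique_id_from_path(path, root):
    """Join the path components below `root` with underscores."""
    relative = PurePosixPath(path).relative_to(PurePosixPath(root))
    return "_".join(relative.parts)


# Example with a made-up directory layout: root/plantID/date/filename
image_id = unique_id_from_path(
    "/data/plants/plant01/2021-05-01/side.png", "/data/plants"
)
print(image_id)  # plant01_2021-05-01_side.png
```

For the second item, `pathlib.Path(root).glob("*/*.png")` would collect images one directory level below `root`, which matches the "/path/to/*/*.png" pattern mentioned in the list.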

Ensure all functions/methods/etc. have doc strings

Is your feature request related to a problem? Please describe.
Some functions/methods/etc. do not have doc strings, which makes it difficult to know which arguments to specify when using them.

Describe the solution you'd like
a. some have specific options that you can use; these should be listed in the doc string
b. some are wrappers of other functions which do have doc strings; can these be pulled through?
c. some have no doc string at all, so one will need to be written from scratch

Additional context
Although this is largely an aesthetic problem, fixing it improves the usability of the code, and the doc strings will be pulled through into the API docs.
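
To find the gaps in case (c), one option is to scan a module with the standard `inspect` module. The sketch below is an assumption about how such an audit might look, not an existing MapReader tool; `missing_docstrings` and the `demo` module are hypothetical.

```python
import inspect
import types


def missing_docstrings(module):
    """Return sorted names of public functions/classes in `module` lacking a docstring."""
    missing = []
    for name, obj in inspect.getmembers(module):
        if name.startswith("_"):
            continue
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue
        # Skip objects that were only imported into the module.
        if getattr(obj, "__module__", None) != module.__name__:
            continue
        if not (obj.__doc__ or "").strip():
            missing.append(name)
    return sorted(missing)


# Build a throwaway module to demonstrate the check.
demo = types.ModuleType("demo")


def documented():
    """Has a docstring."""


def undocumented():
    pass


for func in (documented, undocumented):
    func.__module__ = "demo"  # pretend the function was defined in `demo`
    setattr(demo, func.__name__, func)

print(missing_docstrings(demo))  # ['undocumented']
```

For case (b), decorating a wrapper with `functools.wraps(wrapped_function)` copies the wrapped function's doc string onto the wrapper.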

Enable use with geotiffs

Describe the solution you'd like
Enable geotiffs and their metadata (coordinates) to be used in mapreader

Additional context
Currently geotiffs can be used as image files (my_files=loader('path/to/geo.tif') works) but require separate metadata file (my_files.add_metadata('path/to/metadata.csv')) to load in coords etc.
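
GeoTIFFs usually carry an affine transform that maps pixel indices to coordinates, which is the information the separate metadata file currently supplies. The sketch below shows how bounding coordinates follow from such a transform, assuming the six-value GDAL ordering. Reading the transform from a file would need a library such as GDAL or rasterio and is not shown; `bounds_from_geotransform` is a hypothetical helper, not part of MapReader.

```python
def bounds_from_geotransform(gt, width, height):
    """Return (xmin, ymin, xmax, ymax) for an image of `width` x `height` pixels.

    `gt` follows the GDAL ordering:
    (x_origin, pixel_width, row_rotation, y_origin, column_rotation, pixel_height)
    """

    def to_coords(col, row):
        x = gt[0] + col * gt[1] + row * gt[2]
        y = gt[3] + col * gt[4] + row * gt[5]
        return x, y

    # Evaluate all four corners so rotated images are handled as well.
    corners = [to_coords(c, r) for c in (0, width) for r in (0, height)]
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


# Made-up north-up example: 4 x 2 pixels, 0.5 units per pixel.
gt = (-4.0, 0.5, 0.0, 56.0, 0.0, -0.5)
print(bounds_from_geotransform(gt, width=4, height=2))  # (-4.0, 55.0, -2.0, 56.0)
```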

Getting started with Maps Tutorial in Binder

Opening an issue to capture our work creating a "maps-specific quickstart" in binder. This will document a use case, research question and code implementation for a MapReader pipeline looking at historical maps! 🗺

Here are the resources you will need to get involved:

  • we are capturing all the content for the binder-deployed notebook in this hackMD
  • we are working on deployable version on binder here

Current tasks

  • @ChristinaLast to add code to predict buildings to hackMD with "quickstart" tutorial for maps.
    • @ChristinaLast to get pre-labelled building annotations, or dedicate time to labelling images for model training
  • @kmcdono2 to capture explanations suitable for a non-technical historian to run the quickstart notebook

You may also find the new and improved README draft useful!

submit annotation module for DHTech code review

The DHTech ADHO SIG is testing out a code review process for DH projects. We are considering submitting part of the MapReader code.

@kasra-hosseini suggested mapImages, annotation tool, or generally the install setup as options for a partial code review.

Submitting the code review requires preparing answers to questions documented here.

Tasks

  • confirm which part of code to review
  • prepare answers for submission (HackMD note)
  • make any updates to repo in preparation for submission (TBD)

Model inference in one step

Summary

Currently, we first need to patchify an image and then do the model inference (in two separate steps). In this issue, we plan to have a method that does both steps, i.e.,

# example interface
my_classifier.inference(path2image, **kwds for the slice method, including patch size, ...)
my_classifier.plot()

TODO

  • Refer to https://github.com/alan-turing-institute/mapreader-plant-scivision. Here, we have a function/method called "predict" that does model inference on an image. Under the hood, it slices an image into patches, does model inference on the patches, then plots the results and returns the predicted labels.
  • It would be interesting to have a similar function/method in MapReader.
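
The combined step could be structured roughly as below: one function that patchifies an image and applies a classifier to every patch. This is a toy sketch on nested lists with a stand-in classifier, not MapReader's or mapreader-plant-scivision's code. The names `patchify`, `inference`, and `is_bright` are illustrative, and plotting is omitted.

```python
def patchify(image, patch_size):
    """Yield ((top, left), patch) for non-overlapping square patches of a 2D list."""
    height, width = len(image), len(image[0])
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [row[left:left + patch_size] for row in image[top:top + patch_size]]
            yield (top, left), patch


def inference(image, classify, patch_size):
    """Patchify `image` and classify each patch in a single call."""
    return {origin: classify(patch) for origin, patch in patchify(image, patch_size)}


def is_bright(patch):
    # Toy classifier: label a patch by its mean pixel value.
    pixels = [value for row in patch for value in row]
    return "bright" if sum(pixels) / len(pixels) > 0.5 else "dark"


image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
labels = inference(image, is_bright, patch_size=2)
print(labels)
```

Returning a mapping from patch origin to label keeps the result easy to plot or join back onto the parent image later.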

Some dependencies appear to require conda install

How to reproduce

Follow the instructions for Method 1, from the README.md:

conda create -n mr_py38 python=3.8
conda activate mr_py38
pip install mapreader 

or
Follow the instructions for Method 2, from the README.md:

conda create -n mr_py38 python=3.8
conda activate mr_py38
git clone https://github.com/Living-with-machines/MapReader.git 
cd /path/to/MapReader
pip install -v -e .

Expected outcome

MapReader should install without error.

Actual outcome

In some cases, MapReader fails to install. Some dependencies (e.g. scikit-image) do not install reliably via pip. This can be resolved by deleting and recreating the conda environment and installing those packages using conda, e.g.:

conda create -n mr_py38 python=3.8
conda activate mr_py38
conda install scikit-image==0.18.3
pip install mapreader 

example error message:

Collecting scikit-image<0.19.0,>=0.18.3
  Using cached scikit-image-0.18.3.tar.gz (29.2 MB)
  Installing build dependencies ... error
  error: subprocess-exited-with-error
  
  × pip subprocess to install build dependencies did not run successfully.
  │ exit code: 1
  ╰─> [3680 lines of output]
      Ignoring numpy: markers 'python_version == "3.6" and platform_machine == "aarch64"' don't match your environment
      Ignoring numpy: markers 'python_version == "3.7" and platform_machine == "aarch64"' don't match your environment
      Ignoring numpy: markers 'python_version == "3.8" and platform_machine == "aarch64"' don't match your environment
      Ignoring numpy: markers 'python_version == "3.6" and platform_machine != "aarch64" and platform_python_implementation != "PyPy"' don't match your environment
      Ignoring numpy: markers 'python_version == "3.7" and platform_machine != "aarch64" and platform_python_implementation != "PyPy"' don't match your environment
      Ignoring numpy: markers 'python_version == "3.9" and platform_python_implementation != "PyPy"' don't match your environment
      Ignoring numpy: markers 'python_version == "3.6" and platform_python_implementation == "PyPy"' don't match your environment
      Ignoring numpy: markers 'python_version == "3.7" and platform_python_implementation == "PyPy"' don't match your environment
      Ignoring numpy: markers 'python_version >= "3.10"' don't match your environment
      Ignoring numpy: markers 'python_version >= "3.8" and platform_python_implementation == "PyPy"' don't match your environment
      Collecting wheel
        Using cached wheel-0.38.4-py3-none-any.whl (36 kB)
      Collecting setuptools<=51.0.0
        Using cached setuptools-51.0.0-py3-none-any.whl (785 kB)
      Collecting Cython>=0.29.18
        Using cached Cython-0.29.33-py2.py3-none-any.whl (987 kB)
      Collecting numpy==1.17.3
        Using cached numpy-1.17.3.zip (6.4 MB)
        Preparing metadata (setup.py): started
        Preparing metadata (setup.py): finished with status 'done'
      Building wheels for collected packages: numpy
        Building wheel for numpy (setup.py): started
        Building wheel for numpy (setup.py): finished with status 'error'
        error: subprocess-exited-with-error
      
        × python setup.py bdist_wheel did not run successfully.
        │ exit code: 1
        ╰─> [3301 lines of output]
            Running from numpy source directory.
            blas_opt_info:
            blas_mkl_info:
            customize UnixCCompiler
              libraries mkl_rt not found in ['/Users/rwood/miniconda3/envs/test/lib', '/usr/local/lib', '/usr/lib']
              NOT AVAILABLE
 
...

        note: This error originates from a subprocess, and is likely not a problem with pip.
      error: legacy-install-failure
      
      × Encountered error while trying to install package.
      ╰─> numpy
      
      note: This is an issue with the package mentioned above, not pip.
      hint: See above for output from the failure.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Bugs in 001_retrieve_patchify_plot.ipynb

MapReader/examples/geospatial/classification_one_inch_maps_001/001_retrieve_patchify_plot.ipynb

  • Requires Cartopy to be installed; otherwise you get empty mapsheets
  • Still throws the error even after Cartopy is installed in a prior cell

Linting

We are now using black for all MapReader/mapreader and MapReader/tests dirs. Also, we check them in CI (see Quality Assurance section).

Rename all 'slice' functions/methods to 'patchify' to align with documentation/streamline vocab

Describe the solution you'd like
Option A - change names of all 'slice' functions/methods to 'patchify' or 'patch' words
Option B - change all docs and doc strings that use the word 'patchify' to 'slice' so that it is clear what is what.

Additional context
Affected files (from grep -r -l "slice" mapreader):

./slicers/slicers.py
./slicers/__pycache__/__init__.cpython-38.pyc
./slicers/__pycache__/slicers.cpython-38.pyc
./utils/compute_and_save_stats.py
./utils/slice_parallel.py
./annotate/utils.py
./loader/__pycache__/images.cpython-38.pyc
./loader/images.py
./train/datasets.py
./train/__pycache__/datasets.cpython-38.pyc
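
If Option A is chosen, the old names could be kept for a transition period as thin aliases that warn on use, so that existing notebooks keep working. A minimal sketch follows. `deprecated_alias`, `patchify_all`, and `slice_all` are hypothetical names chosen for illustration, not MapReader's actual functions.

```python
import functools
import warnings


def deprecated_alias(new_func, old_name):
    """Return a wrapper that forwards to `new_func` and warns about `old_name`."""

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name}() is deprecated; use {new_func.__name__}() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_func(*args, **kwargs)

    return wrapper


def patchify_all(paths, patch_size=100):
    # Placeholder body for illustration only.
    return [(path, patch_size) for path in paths]


# Old name kept as a thin alias during the transition.
slice_all = deprecated_alias(patchify_all, "slice_all")
```

Calling `slice_all(...)` then behaves like `patchify_all(...)` while emitting a `DeprecationWarning` that points at the caller's line.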

:bug: some errors in `binder` deployment.

Tasks

  • Fix 'great_circle' is not defined
  • Fix simplekml needs to be installed to create KML outputs!

Associated tracebacks

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/tmp/ipykernel_60/1620428857.py in <module>
      3 
      4 xmin, xmax, ymin, ymax, myimg_shape, size_in_m = \
----> 5         mymaps.calc_pixel_width_height(all_maps[0])

/srv/conda/envs/notebook/lib/python3.7/site-packages/mapreader/loader/images.py in calc_pixel_width_height(self, parent_id, calc_size_in_m)
    349 
    350         elif calc_size_in_m in ['gc', 'great-circle']:
--> 351             bottom = great_circle((ymin, xmin), (ymin, xmax)).meters
    352             right = great_circle((ymin, xmax), (ymax, xmax)).meters
    353             top = great_circle((ymax, xmax), (ymax, xmin)).meters

NameError: name 'great_circle' is not defined
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.7/site-packages/mapreader/loader/images.py in _createKML(self, path2kml, value, coords, counter)
    817         try:
--> 818             import simplekml
    819         except:

ModuleNotFoundError: No module named 'simplekml'

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
/tmp/ipykernel_60/28836796.py in <module>
      4             save_kml_dir="./kml_tutorial",
      5             figsize=(20, 20),
----> 6             image_width_resolution=600)

/srv/conda/envs/notebook/lib/python3.7/site-packages/mapreader/loader/images.py in show(self, image_ids, value, plot_parent, border, border_color, vmin, vmax, colorbar, alpha, discrete_colorbar, tree_level, grid_plot, plot_histogram, save_kml_dir, image_width_resolution, kml_dpi_image, **kwds)
    675                                     value=one_image_id,
    676                                     coords=self.images["parent"][one_image_id]["coord"],
--> 677                                     counter=-1)
    678                 else:
    679                     plt.title(one_image_id)

/srv/conda/envs/notebook/lib/python3.7/site-packages/mapreader/loader/images.py in _createKML(self, path2kml, value, coords, counter)
    818             import simplekml
    819         except:
--> 820             raise ImportError("[ERROR] simplekml needs to be installed to create KML outputs!")
    821 
    822         (lon_min, lon_max, lat_min, lat_max) = coords

ImportError: [ERROR] simplekml needs to be installed to create KML outputs!
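
Two patterns are visible in these tracebacks. The `NameError` suggests that `great_circle` is used in `images.py` without being imported there. `geopy.distance.great_circle` exposes a `.meters` attribute that matches the usage shown, but that this is the intended import is an assumption. The `simplekml` failure goes through a bare `except:`, which hides the original cause. A hedged sketch of a helper for optional dependencies that catches only `ImportError` and chains the cause (`require_optional` is a hypothetical name):

```python
import importlib


def require_optional(module_name, feature):
    """Import an optional dependency or raise an ImportError naming the feature."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Catch only ImportError (not a bare `except`) and keep the original cause.
        raise ImportError(
            f"{module_name} needs to be installed to {feature}"
        ) from err


json_module = require_optional("json", "parse JSON")  # stdlib module, always present
try:
    require_optional("surely_missing_module_xyz", "create KML outputs")
except ImportError as err:
    print(err)
```

For the binder deployment itself, the missing packages would also need to be listed in the environment that binder builds from.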
