living-with-machines / mapreader



A computer vision pipeline for exploring and analyzing images at scale

Home Page: https://mapreader.readthedocs.io/en/latest/

License: Other

Python 0.72% SCSS 0.01% Jupyter Notebook 99.27% TeX 0.01%

hut23 hut23-96 computer-vision deep-learning machine-learning pytorch article digital-humanities maps spatial-data

mapreader's Introduction

MapReader

A computer vision pipeline for exploring and analyzing images at scale

What is MapReader?

MapReader is an end-to-end computer vision (CV) pipeline for exploring and analyzing images at scale.

MapReader was developed in the Living with Machines project to analyze large collections of historical maps, but it is a generalizable computer vision pipeline that can be applied to images in a wide variety of domains.

Overview

MapReader is a groundbreaking interdisciplinary tool that emerged from a specific set of geospatial historical research questions. It was inspired by methods in biomedical imaging and geographic information science, which were adapted for use by historians, for example in our Journal of Victorian Culture and Geospatial Humanities 2022 SIGSPATIAL workshop papers. The success of the tool subsequently generated interest from plant phenotype researchers working with large image datasets, and so MapReader is an example of cross-pollination between the humanities and the sciences made possible by reproducible data science.

MapReader pipeline

The MapReader pipeline consists of a linear sequence of tasks which, together, can be used to train a computer vision (CV) classifier to recognize visual features within maps and identify patches containing these features across entire map collections.

See our About MapReader page to learn more.

Documentation

The MapReader documentation can be found at https://mapreader.readthedocs.io/en/latest/index.html.

New users should refer to the Installation instructions and Input guidance for help with the initial set up of MapReader.

All users should refer to our User Guide for guidance on how to use MapReader. This contains end-to-end instructions on how to use the MapReader pipeline, plus a number of worked examples illustrating use cases such as:

Geospatial images (i.e. maps)
Non-geospatial images

Developers and contributors may also want to refer to the API documentation and Contribution guide for guidance on how to contribute to the MapReader package.

Join our Slack workspace! Please fill out this form to receive an invitation to the Slack workspace.

What is included in this repo?

The MapReader package provides a set of tools to:

Download images/maps and metadata stored on web-servers (e.g. tileservers which can be used to retrieve maps from OpenStreetMap (OSM), the National Library of Scotland (NLS), or elsewhere).
Load images/maps and metadata stored locally.
Pre-process images/maps:
- patchify (create patches from a parent image),
- resample (use image transformations to alter pixel-dimensions/resolution/orientation/etc.),
- remove borders outside the neatline,
- reproject between coordinate reference systems (CRS).
Annotate images/maps (or their patches) using an interactive annotation tool.
Train or fine-tune Computer Vision (CV) models and use these to predict labels (i.e. model inference) on large sets of images/maps.
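To make the "patchify" step above concrete, here is a minimal sketch that computes the pixel boxes of square patches tiling a parent image. It assumes non-overlapping square patches and clips edge patches to the image border; `patch_boxes` is a hypothetical helper, not MapReader's API, and MapReader's own handling of edge patches may differ.

```python
from __future__ import annotations


def patch_boxes(width: int, height: int, patch_size: int) -> list[tuple[int, int, int, int]]:
    """Return (min_x, min_y, max_x, max_y) pixel boxes that tile a width x height image.

    Hypothetical helper: patches are square and non-overlapping, and patches on
    the right/bottom edges are clipped to the image border.
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    boxes = []
    for min_y in range(0, height, patch_size):  # rows of patches, top to bottom
        for min_x in range(0, width, patch_size):  # columns, left to right
            max_x = min(min_x + patch_size, width)
            max_y = min(min_y + patch_size, height)
            boxes.append((min_x, min_y, max_x, max_y))
    return boxes


# A 250 x 100 pixel image cut into 100-pixel patches gives 3 columns x 1 row,
# the last patch being only 50 pixels wide.
print(patch_boxes(250, 100, 100))
```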

Various plotting and analysis functionalities are also included (based on packages such as matplotlib, cartopy, Google Earth, and kepler.gl).

How to cite MapReader

If you use MapReader in your work, please cite both the MapReader repo and our SIGSPATIAL paper:

Kasra Hosseini, Daniel C. S. Wilson, Kaspar Beelen, and Katherine McDonough. 2022. MapReader: a computer vision pipeline for the semantic exploration of maps at scale. In Proceedings of the 6th ACM SIGSPATIAL International Workshop on Geospatial Humanities (GeoHumanities '22). Association for Computing Machinery, New York, NY, USA, 8–19. https://doi.org/10.1145/3557919.3565812
Kasra Hosseini, Rosie Wood, Andy Smith, Katie McDonough, Daniel C.S. Wilson, Christina Last, Kalle Westerling, and Evangeline Mae Corcoran. “Living-with-machines/mapreader: End of Lwm”. Zenodo, July 27, 2023. https://doi.org/10.5281/zenodo.8189653.

Acknowledgements

This work was supported by Living with Machines (AHRC grant AH/S01179X/1) and The Alan Turing Institute (EPSRC grant EP/N510129/1).

Living with Machines, funded by the UK Research and Innovation (UKRI) Strategic Priority Fund, is a multidisciplinary collaboration delivered by the Arts and Humanities Research Council (AHRC), with The Alan Turing Institute, the British Library and the Universities of Cambridge, East Anglia, Exeter, and Queen Mary University of London.

Maps above reproduced with the permission of the National Library of Scotland https://maps.nls.uk/index.html

Contributors

Katie McDonough 🔬 🤔 📖 📋 📆 👀 📢 ✅
Daniel C.S. Wilson 🔬 🤔 📢 📖 📋
Kasra Hosseini 💻 🤔 🔬 👀 📢
Rosie Wood 💻 📖 🤔 📢 ✅ 👀 🚧 🔬
Kalle Westerling 💻 📖 🚧 👀 📢
Chris Fleet 🔣
Kaspar Beelen 🤔 👀 🔬
Andy Smith 💻 📖 🧑‍🏫 👀

mapreader's People

Forkers

evangeline-corcoran dcsw2 npedrazzini lakillo jl106jml oshistory f-macfarlane95

mapreader's Issues

`loadAnnotations`: check the consistency between annotations and available patches.

Refer to #2 and specifically #2 (comment).

TODO:

A function in the loadAnnotations class to check if all annotations have locally stored patches.
If specified by the user, keep only those that have patches.
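The check described in the TODO could look roughly like the sketch below. It assumes each annotation is a dict whose `"image_id"` is the patch file name inside a patch directory; `check_annotations` and that layout are hypothetical, not the current `loadAnnotations` interface.

```python
from __future__ import annotations

import os


def check_annotations(
    annotations: list[dict], patch_dir: str, keep_only_existing: bool = False
) -> tuple[list[dict], list[dict]]:
    """Split annotations into those with and without a locally stored patch.

    Hypothetical helper: annotation["image_id"] is assumed to be a file name in patch_dir.
    Returns (kept, missing); kept drops missing annotations only if keep_only_existing.
    """
    present, missing = [], []
    for ann in annotations:
        path = os.path.join(patch_dir, ann["image_id"])
        (present if os.path.isfile(path) else missing).append(ann)
    if missing:
        print(f"{len(missing)} of {len(annotations)} annotations have no local patch")
    kept = present if keep_only_existing else list(annotations)
    return kept, missing
```

Returning the missing annotations as well as the kept ones lets the user inspect what would be dropped before opting in to filtering.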

[analysis] unsupervised clustering of results (by label)

We discussed this in the past. Would it be possible to enable this as a 2nd step, e.g. to cluster the railspace patches/building patches with an eye towards analyzing how they vary?
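One way to frame this second step is to group patches by predicted label and cluster each group separately on some per-patch feature vector (e.g. mean RGB values or CNN embeddings). The sketch below uses a plain k-means written with the standard library only; the `"label"`/`"features"` keys are hypothetical, and in practice a library implementation such as scikit-learn's `KMeans` would be more appropriate.

```python
from __future__ import annotations

import math
from collections import defaultdict


def kmeans(points: list[tuple[float, ...]], k: int, n_iter: int = 20) -> list[int]:
    """Plain k-means (Lloyd's algorithm); returns a cluster index per point.

    Centroids are initialised from the first k points, so results are deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment


def cluster_by_label(patches: list[dict], k: int) -> dict[str, list[int]]:
    """Run k-means separately on the feature vectors of each predicted label."""
    groups = defaultdict(list)
    for patch in patches:
        groups[patch["label"]].append(patch["features"])
    return {label: kmeans(feats, min(k, len(feats))) for label, feats in groups.items()}
```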

classifier.batch_info() always prints set name "train" regardless of set being shown

Describe the bug
classifier.batch_info() and classifier.show_sample() always print the set name "train", regardless of which set is being shown

To Reproduce
Steps to reproduce the behavior:

Go through the steps to load data into the classifier:
from mapreader import classifier
my_classifier = classifier()
my_classifier.add2dataloader(your_dataset)
run batch_info or show_sample command and specify set_name
my_classifier.batch_info(set_name="val")

Will output dataset: 'train' even if set_name is not train.

Expected behavior
Should output dataset: set_name
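The expected behaviour amounts to echoing the requested `set_name` rather than a hard-coded string. A minimal sketch, using a hypothetical `ToyClassifier` that only mimics the relevant methods (it is not MapReader's classifier):

```python
class ToyClassifier:
    """Hypothetical stand-in for the MapReader classifier, reduced to the reported bug."""

    def __init__(self):
        self.dataloaders = {}

    def add2dataloader(self, dataset, set_name):
        self.dataloaders[set_name] = dataset

    def batch_info(self, set_name="train"):
        dataset = self.dataloaders[set_name]
        # Fix: report the set that was actually requested instead of a hard-coded "train".
        info = {"dataset": set_name, "n_samples": len(dataset)}
        print(f"dataset: {info['dataset']}, #samples: {info['n_samples']}")
        return info
```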


Decide on visualisation tool within demo tutorials

@andrewphilipsmith to explore visualisation methods for inline visualisations of tutorial output

Batch size in the annotation interface

Currently, the batch size is set to 100 (in the annotation tool): https://github.com/Living-with-machines/MapReader/blob/main/mapreader/annotate/utils.py#L155

Add this option to prepare_annotation (https://github.com/Living-with-machines/MapReader/blob/main/mapreader/annotate/utils.py#L239) so that the user can change the batch size if needed.
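Exposing the batch size could be as simple as threading a parameter through to whatever splits the patch list. A minimal sketch with a hypothetical `iter_batches` helper that keeps the current value of 100 as its default:

```python
from __future__ import annotations

from collections.abc import Iterator, Sequence


def iter_batches(patch_ids: Sequence[str], batch_size: int = 100) -> Iterator[list[str]]:
    """Yield successive batches of patch IDs for annotation.

    Hypothetical helper: batch_size defaults to 100 (the current hard-coded value)
    but could be passed through from a prepare_annotation argument.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(patch_ids), batch_size):
        yield list(patch_ids[start:start + batch_size])
```

Because this is a generator, the `ValueError` is raised on first iteration rather than at call time.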

require 'set_name' argument when loading datasets using classifier.add2dataloader()

Describe the bug
In 'train' subpackage:
- Datasets are loaded into the classifier using .add2dataloader(), which currently does not require a 'set_name' argument.
- The default is set_name=None.
- Later code uses 'set_name' as an argument and raises an error if it is None, so it should be required when loading data into the dataloader.

Expected behavior
To stop errors further down the line, set_name should be a required argument of .add2dataloader()
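A sketch of one way to enforce this: make `set_name` keyword-only with no default, so a missing value fails immediately with a `TypeError`, and reject `None`/empty values explicitly. `ClassifierSketch` is hypothetical and only models the dataloader registration.

```python
class ClassifierSketch:
    """Hypothetical stand-in: only dataloader registration is modelled."""

    def __init__(self):
        self.dataloaders = {}

    def add2dataloader(self, dataset, *, set_name: str):
        # set_name is keyword-only with no default: omitting it raises TypeError here,
        # instead of failing later when downstream code receives set_name=None.
        if not isinstance(set_name, str) or not set_name:
            raise ValueError(f"set_name must be a non-empty string, got {set_name!r}")
        self.dataloaders[set_name] = dataset
```

Keyword-only is a design choice: it makes calls self-documenting (`set_name="val"`), at the cost of breaking any existing calls that pass the name positionally.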

Choose a tool to simplify diffs on .ipynb files.

Consider

and others

Build into workflow using pre-commit/CI as appropriate.

Test on both ubuntu-latest and windows-latest
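Most of the noise in notebook diffs comes from cell outputs and execution counts. The sketch below shows the core of what output-stripping tools (such as nbstripout) do on the nbformat JSON; it is an illustration only, not a recommendation of a specific tool, and `strip_outputs` is a hypothetical name.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of an .ipynb notebook dict with outputs and execution counts cleared.

    Code cells keep their source; only the parts that change on every run are reset.
    """
    stripped = json.loads(json.dumps(notebook))  # deep copy of plain JSON data
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped
```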

commandline script fails

How to reproduce

Install MapReader using either of the two methods in the README.md
At the shell enter the command

$ mapreader

Expected outcome

A welcome note, or usage instructions or similar.

Actual outcome

A ModuleNotFoundError:

Traceback (most recent call last):
  File "/Users/a.smith/anaconda3/envs/mapreader_dev/bin/mapreader", line 33, in <module>
    sys.exit(load_entry_point('mapreader', 'console_scripts', 'mapreader')())
  File "/Users/a.smith/anaconda3/envs/mapreader_dev/bin/mapreader", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/Users/a.smith/anaconda3/envs/mapreader_dev/lib/python3.8/importlib/metadata.py", line 77, in load
    module = import_module(match.group('module'))
  File "/Users/a.smith/anaconda3/envs/mapreader_dev/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'mapreader.mapreader'

Required Actions

Fix entry point in ./setup.py
Uncomment entry point details in ./conda/meta.yaml
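The traceback shows the console script pointing at a module `mapreader.mapreader`, which does not exist. One possible fix, sketched here under assumptions, is a small CLI module (hypothetically `mapreader/cli.py`) with a `main()` function, referenced from `setup.py` via `entry_points={"console_scripts": ["mapreader = mapreader.cli:main"]}`. The module path and messages are illustrative, not the package's actual layout.

```python
import argparse


def main(argv=None) -> int:
    """Hypothetical console entry point: print a welcome note and usage information."""
    parser = argparse.ArgumentParser(
        prog="mapreader",
        description="MapReader: a computer vision pipeline for exploring and analyzing images at scale.",
    )
    parser.parse_args(argv)  # handles -h/--help; argv=None reads sys.argv[1:]
    print("Welcome to MapReader!")
    parser.print_usage()
    return 0
```

When installed as a console script, the generated wrapper calls `main()` and uses its return value as the exit status.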

Testing `MapReader`

Hi All 👋🏼

I will be testing the MapReader install and running the demo notebooks to execute the analysis. I will document my process here.

Review installation, prepare pypi

PR (WIP): #44

Add tests for geospatial inputs

Tests for downloading images from tileserver.
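Tests for tileserver downloads can avoid network access by testing the pure logic separately, e.g. the conversion from coordinates to XYZ ("slippy map") tile indices and the filled-in tile URL. The sketch below uses the standard OpenStreetMap tile formula; the function names and URL template are hypothetical and not taken from MapReader's downloader.

```python
from __future__ import annotations

import math


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Convert WGS84 lon/lat to XYZ tile indices (x from west, y from north)."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon=180 or extreme latitudes stay on the tile grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_url(template: str, lon: float, lat: float, zoom: int) -> str:
    """Fill a "{z}/{x}/{y}" URL template for the tile containing lon/lat."""
    x, y = lonlat_to_tile(lon, lat, zoom)
    return template.format(z=zoom, x=x, y=y)
```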

Publish as a conda package

Is your feature request related to a problem? Please describe.
Some of the dependencies are much easier to install using conda than using pip. We also recommend that users install MapReader in a conda environment. Mixing conda installs and pip installs is known to be problematic.

Describe the solution you'd like
Publish MapReader as a conda package, eliminating the need to mix conda installs and pip installs.

Deploying `MapReader` through `binder`

@ChristinaLast and @andrewphilipsmith to walk through binder deployment
- adding requirements.txt with no hashed libraries for binderhub deployment

Update documentation to create ipykernel for local notebook start

separate out a "local build" option and a "quickstart - launch through jupyter" option in README.md
check ipykernel is a dependency in the package build
include ipykernel setup in conda/pip setup.

Plant phenotyping example

Summary

These todos were collected while testing the notebooks on plant images.

TODO

make MapReader a conda package (install via conda)

Currently MapReader is installed via pip but requires some additional packages installed via conda.
Would be good to streamline this by making MapReader a conda package.

Add tests

Copyright and re-use terms

We can share the following links:

OS one-inch 2nd edition layer: https://mapseries-tilesets.s3.amazonaws.com/1inch_2nd_ed/index.html
OS six-inch 1st edition layer for Scotland at: https://mapseries-tilesets.s3.amazonaws.com/os/6inchfirst/index.html

Consistent logging and error handling across all steps in the pipeline
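A common convention for consistent logging is one module-level logger per module (configured once by the application) plus specific exceptions that carry context, rather than ad hoc print statements. The sketch below illustrates that pattern with a hypothetical `load_metadata` step; it is not MapReader code.

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)  # one logger per module; handlers are set by the app


def load_metadata(rows: list[dict]) -> list[dict]:
    """Hypothetical pipeline step: validate rows, log progress, raise clear errors."""
    logger.info("Loading metadata for %d images", len(rows))
    for i, row in enumerate(rows):
        if "image_id" not in row:
            # A specific exception type with context is easier to handle than a printed message.
            raise ValueError(f"Row {i} is missing the 'image_id' column")
    logger.debug("All %d rows validated", len(rows))
    return rows
```

Users (or notebooks) would then choose verbosity once, e.g. with `logging.basicConfig(level=logging.INFO)`.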

Publish on PyPI using GitHub Actions

For v0.3.3, we published MapReader on PyPI manually using twine.

British railspace and buildings as predicted by a MapReader

Ensure all functions/methods/etc. have doc strings

Is your feature request related to a problem? Please describe.
Some functions/methods/etc. do not have doc strings associated with them, which makes it difficult to know which arguments to specify when using them.

Describe the solution you'd like
a. some have specific options that you can use - these should be listed in doc string
b. some are wrappers of other functions which do have doc strings - can they be pulled through/over
c. some have no doc string at all so will need completely writing

Additional context
Although this is partly an aesthetic problem, it matters for the usability of the code, and the doc strings will be pulled through into the API docs, so they are worth updating/adding.
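For cases (a) and (b), a sketch under assumptions: options are listed in a numpydoc-style "Parameters" section (the style choice is an assumption), and a thin wrapper pulls its doc string from the wrapped function with `functools.wraps`. `show` and `show_sample` here are hypothetical stand-ins, not MapReader's signatures.

```python
import functools


def show(image_ids, figsize=(10, 10)):
    """Plot images.

    Parameters
    ----------
    image_ids : list of str
        IDs of the images to plot.
    figsize : tuple of int, optional
        Figure size in inches, by default (10, 10).
    """
    return list(image_ids)  # plotting omitted in this sketch


# Copy only __doc__ (keeping the wrapper's own name); wraps also sets __wrapped__,
# so inspect.signature(show_sample) reports show's parameters.
@functools.wraps(show, assigned=("__doc__",))
def show_sample(*args, **kwargs):
    return show(*args, **kwargs)
```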

Enable use with geotiffs

Describe the solution you'd like
Enable geotiffs and their metadata (coordinates) to be used in mapreader

Additional context
Currently geotiffs can be used as image files (my_files=loader('path/to/geo.tif') works) but require a separate metadata file (my_files.add_metadata('path/to/metadata.csv')) to load in coordinates etc.
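The coordinates a geotiff carries are typically an affine geotransform plus a CRS. As an illustration, the sketch below derives bounds from a GDAL-style 6-tuple geotransform (as returned by GDAL's `GetGeoTransform()`; rasterio exposes bounds directly as `dataset.bounds`). Reading the file itself is not shown, rotated rasters are not handled, and the helper name is hypothetical.

```python
from __future__ import annotations


def bounds_from_geotransform(
    gt: tuple[float, float, float, float, float, float], width: int, height: int
) -> tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) for a north-up raster.

    gt follows the GDAL convention (x_origin, pixel_width, row_rotation,
    y_origin, column_rotation, pixel_height); pixel_height is negative for north-up images.
    """
    x0, px_w, rot_x, y0, rot_y, px_h = gt
    if rot_x or rot_y:
        raise ValueError("rotated geotransforms are not supported in this sketch")
    x1 = x0 + width * px_w
    y1 = y0 + height * px_h
    return min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1)
```

The resulting bounds are in the raster's CRS, so the CRS would need to be stored alongside them.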

MapReader on PyPI

MapReader v0.3.3 is on PyPI now: https://pypi.org/project/mapreader/

`annotation` function in `002_annotation.ipynb` freezes on same patch after 99 annotations

Testing annotation function (works really well!) up until 99 annotations, at which point it freezes on the same patch.

Context images, revisit

Summary

In MapReader, we have the functionality to create model ensembles, i.e.:

We have tested this on some examples but not on our current labelled data. The idea here is to test this functionality on train/validation sets.

TODO

Review: https://github.com/Living-with-machines/MapReader/blob/main/mapreader/train/classifier_context.py
Check the notebook (not on GitHub)
How to set context size, NN architectures and ... hyperparams.
Create a tutorial

Interface with Label Studio (https://labelstud.io/)

Link: https://labelstud.io/

Plant images: performance (initial results)

Initial results of training different types of CV models on ~1500 labelled patches. The validation set has ~500 patches.

`max_mean_pixel`, annotation interface

In whole-plant images (side views), the white frame has very high mean pixel RGB values. We have now added max_mean_pixel to the annotation interface.
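A filter of this kind can be sketched as: compute each patch's mean pixel value and keep only patches at or below `max_mean_pixel`. The sketch assumes 8-bit RGB values scaled to 0-1 and patches stored as nested lists; the helper names are hypothetical, not the annotation interface's code.

```python
from __future__ import annotations

from statistics import fmean


def mean_pixel(patch: list[list[tuple[int, int, int]]]) -> float:
    """Mean of all RGB values in a patch, scaled to 0-1 (assumes 8-bit channels)."""
    values = [channel for row in patch for pixel in row for channel in pixel]
    return fmean(values) / 255.0


def filter_by_max_mean(patches: dict[str, list], max_mean_pixel: float) -> list[str]:
    """Keep IDs of patches whose mean pixel value does not exceed max_mean_pixel."""
    return [pid for pid, patch in patches.items() if mean_pixel(patch) <= max_mean_pixel]
```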

Check MapReader installation + test notebooks

Installation: https://github.com/Living-with-machines/MapReader#installation
Notebooks: https://github.com/Living-with-machines/MapReader#use-cases

Getting started with Maps Tutorial in Binder

Opening an issue to capture our work creating a "maps-specific quickstart" in binder. This will document a use case, a research question, and a code implementation for a MapReader pipeline looking at historical maps! 🗺

Here are the resources you will need to get involved:

we are capturing all the content for the binder-deployed notebook in this hackMD
we are working on deployable version on binder here

Current tasks

@ChristinaLast to add code to predict buildings to hackMD with the "quickstart" tutorial for maps.
- @ChristinaLast to get pre-labelled building annotations, or dedicate time to labelling images for model training
@kmcdono2 to capture explanations suitable for non-technical historians to run the quickstart notebook

You may also find the new and improved README draft useful!

Updates/discussions

Add `min_std_pixel` and `max_std_pixel` to `prepare_annotation`

So that we can filter out black patches more easily. We have trained some MapReader models using ~6K annotated patches (the plant phenotyping project), and now we need to extend the dataset, particularly for non-black patches.
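The idea behind a standard-deviation filter is that uniform patches (e.g. all black) have near-zero pixel variation. A minimal sketch, assuming 8-bit RGB values scaled to 0-1 and hypothetical helper names (the real `prepare_annotation` arguments may be computed differently):

```python
from __future__ import annotations

from statistics import pstdev


def std_pixel(patch: list[list[tuple[int, int, int]]]) -> float:
    """Population std of all RGB values in a patch, scaled to 0-1 (assumes 8-bit channels)."""
    values = [channel for row in patch for pixel in row for channel in pixel]
    return pstdev(values) / 255.0


def filter_by_std(
    patches: dict[str, list], min_std_pixel: float = 0.0, max_std_pixel: float = 1.0
) -> list[str]:
    """Keep IDs of patches whose pixel std lies in [min_std_pixel, max_std_pixel]."""
    return [pid for pid, p in patches.items() if min_std_pixel <= std_pixel(p) <= max_std_pixel]
```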

patch bounding box as part of output file

Request to include bounding box coordinates for each patch in MapReader output (e.g. in addition to patch centroid).
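If a parent map's geographic bounds and pixel size are known, a patch's bounding box follows from linear interpolation of its pixel box. The sketch assumes a north-up image with linear scaling between pixels and degrees (reasonable for small unprojected sheets, not exact for projected ones); `patch_bbox` is hypothetical.

```python
def patch_bbox(parent_bounds, parent_size, pixel_box):
    """Convert a patch's pixel box to (min_lon, min_lat, max_lon, max_lat).

    parent_bounds: (min_lon, min_lat, max_lon, max_lat) of the parent image.
    parent_size:   (width, height) of the parent image in pixels.
    pixel_box:     (min_x, min_y, max_x, max_y) of the patch; pixel y grows downwards.
    Assumes a north-up image with linear scaling between pixels and degrees.
    """
    min_lon, min_lat, max_lon, max_lat = parent_bounds
    width, height = parent_size
    min_x, min_y, max_x, max_y = pixel_box
    lon_per_px = (max_lon - min_lon) / width
    lat_per_px = (max_lat - min_lat) / height
    return (
        min_lon + min_x * lon_per_px,
        max_lat - max_y * lat_per_px,  # bottom edge of the patch
        min_lon + max_x * lon_per_px,
        max_lat - min_y * lat_per_px,  # top edge of the patch
    )
```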

Multilabel classification (annotation)

🐛 `LoadAnnotations` not returning annotation interface

When using a local notebook to run through the annotation section of the quick_start notebook, I am unable to see the LoadAnnotations object returned in order to generate new labels! See screenshot below:

submit annotation module for DHTech code review

The DHTech ADHO SIG is testing out a code review process for DH projects. We are considering submitting part of the MapReader code.

@kasra-hosseini suggested mapImages, annotation tool, or generally the install setup as options for a partial code review.

Submitting the code review requires preparing answers to questions documented here.

Tasks

confirm which part of code to review
prepare answers for submission (HackMD note)
make any updates to repo in preparation for submission (TBD)

Satellite images (some references)

I just had a talk with one of the REG members on https://github.com/urbangrammarai and they are using this tool to download satellite images: https://github.com/urbangrammarai/gee_pipeline/.
The other option is: https://planetarycomputer.microsoft.com/

Model inference in one step

Summary

Currently, we first need to patchify an image and then do the model inference (in two separate steps). In this issue, we plan to have a method that does both steps, i.e.,

# example interface
my_classifier.inference(path2image, **slice_kwargs)  # kwargs for the slice method, including patch size, ...
my_classifier.plot()

TODO

Refer to https://github.com/alan-turing-institute/mapreader-plant-scivision. Here, we have a function/method called "predict" that does model inference on an image. Under the hood, it slices an image into patches, does model inference on the patches and then plots the results (and returns the predicted labels).
It would be interesting to have a similar function/method in MapReader.
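The two steps can be sketched as a single function that patchifies an image and applies a per-patch predictor. Here images are toy grayscale nested lists and `predict` is any callable returning a label; `inference` and `patchify` are hypothetical names, and plotting is omitted.

```python
from typing import Callable, List

Image = List[List[int]]  # toy grayscale image: rows of pixel values


def patchify(image: Image, patch_size: int) -> List[List[Image]]:
    """Cut an image into a grid of non-overlapping square patches (edges clipped)."""
    grid = []
    for top in range(0, len(image), patch_size):
        rows = image[top:top + patch_size]
        grid.append([[row[left:left + patch_size] for row in rows]
                     for left in range(0, len(image[0]), patch_size)])
    return grid


def inference(image: Image, predict: Callable[[Image], str], patch_size: int) -> List[List[str]]:
    """Hypothetical one-step inference: patchify, then predict one label per patch."""
    return [[predict(patch) for patch in row] for row in patchify(image, patch_size)]
```

Returning labels in the same grid layout as the patches makes the later plotting step a direct overlay.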

Some dependances appear to require conda install

How to reproduce

Follow the instructions for Method 1 from the README.md:

conda create -n mr_py38 python=3.8
conda activate mr_py38
pip install mapreader

or
Follow the instructions for Method 2 from the README.md:

conda create -n mr_py38 python=3.8
conda activate mr_py38
git clone https://github.com/Living-with-machines/MapReader.git 
cd /path/to/MapReader
pip install -v -e .

Expected outcome

MapReader should install without error.

Actual outcome

In some cases, MapReader fails to install. Some dependencies (e.g. "scikit-image") do not install reliably via pip. This can be resolved by deleting and recreating the conda environment and installing the packages using conda, e.g.:

conda create -n mr_py38 python=3.8
conda activate mr_py38
conda install scikit-image==0.18.3
pip install mapreader

example error message:

Collecting scikit-image<0.19.0,>=0.18.3
  Using cached scikit-image-0.18.3.tar.gz (29.2 MB)
  Installing build dependencies ... error
  error: subprocess-exited-with-error
  
  × pip subprocess to install build dependencies did not run successfully.
  │ exit code: 1
  ╰─> [3680 lines of output]
      Ignoring numpy: markers 'python_version == "3.6" and platform_machine == "aarch64"' don't match your environment
      Ignoring numpy: markers 'python_version == "3.7" and platform_machine == "aarch64"' don't match your environment
      Ignoring numpy: markers 'python_version == "3.8" and platform_machine == "aarch64"' don't match your environment
      Ignoring numpy: markers 'python_version == "3.6" and platform_machine != "aarch64" and platform_python_implementation != "PyPy"' don't match your environment
      Ignoring numpy: markers 'python_version == "3.7" and platform_machine != "aarch64" and platform_python_implementation != "PyPy"' don't match your environment
      Ignoring numpy: markers 'python_version == "3.9" and platform_python_implementation != "PyPy"' don't match your environment
      Ignoring numpy: markers 'python_version == "3.6" and platform_python_implementation == "PyPy"' don't match your environment
      Ignoring numpy: markers 'python_version == "3.7" and platform_python_implementation == "PyPy"' don't match your environment
      Ignoring numpy: markers 'python_version >= "3.10"' don't match your environment
      Ignoring numpy: markers 'python_version >= "3.8" and platform_python_implementation == "PyPy"' don't match your environment
      Collecting wheel
        Using cached wheel-0.38.4-py3-none-any.whl (36 kB)
      Collecting setuptools<=51.0.0
        Using cached setuptools-51.0.0-py3-none-any.whl (785 kB)
      Collecting Cython>=0.29.18
        Using cached Cython-0.29.33-py2.py3-none-any.whl (987 kB)
      Collecting numpy==1.17.3
        Using cached numpy-1.17.3.zip (6.4 MB)
        Preparing metadata (setup.py): started
        Preparing metadata (setup.py): finished with status 'done'
      Building wheels for collected packages: numpy
        Building wheel for numpy (setup.py): started
        Building wheel for numpy (setup.py): finished with status 'error'
        error: subprocess-exited-with-error
      
        × python setup.py bdist_wheel did not run successfully.
        │ exit code: 1
        ╰─> [3301 lines of output]
            Running from numpy source directory.
            blas_opt_info:
            blas_mkl_info:
            customize UnixCCompiler
              libraries mkl_rt not found in ['/Users/rwood/miniconda3/envs/test/lib', '/usr/local/lib', '/usr/lib']
              NOT AVAILABLE
 
...

        note: This error originates from a subprocess, and is likely not a problem with pip.
      error: legacy-install-failure
      
      × Encountered error while trying to install package.
      ╰─> numpy
      
      note: This is an issue with the package mentioned above, not pip.
      hint: See above for output from the failure.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Plant images: transforms.Normalize(mean, std)

We computed mean and std of RGB channels on a sample dataset with ~70K patches. However, the unnormalized batches seem more reasonable. We need to further investigate this.
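For reference, `transforms.Normalize(mean, std)` computes `(x - mean) / std` per channel, and torchvision's `ToTensor` scales 8-bit pixels to [0, 1], so the statistics need to be computed on that same scale. The sketch below computes per-channel mean and std with running sums (so ~70K patches can be streamed) and applies the normalization; it is illustrative only, and for very large datasets Welford's algorithm is numerically safer than sums of squares.

```python
from __future__ import annotations

import math
from typing import Iterable, List, Tuple

Pixel = Tuple[float, float, float]  # RGB values scaled to [0, 1]


def channel_stats(patches: Iterable[List[Pixel]]) -> Tuple[List[float], List[float]]:
    """Per-channel mean and population std over all pixels of all patches."""
    n = 0
    sums = [0.0, 0.0, 0.0]
    sq_sums = [0.0, 0.0, 0.0]
    for patch in patches:  # patches can be streamed one at a time
        for pixel in patch:
            n += 1
            for c in range(3):
                sums[c] += pixel[c]
                sq_sums[c] += pixel[c] ** 2
    mean = [s / n for s in sums]
    std = [math.sqrt(max(sq / n - m ** 2, 0.0)) for sq, m in zip(sq_sums, mean)]
    return mean, std


def normalize(pixel: Pixel, mean: List[float], std: List[float]) -> List[float]:
    """Per-channel (x - mean) / std, as transforms.Normalize does."""
    return [(x - m) / s for x, m, s in zip(pixel, mean, std)]
```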

Restructuring `MapReader` `README.md`

@kmcdono2 to add comments on updated demo material and structure and README.md

Bugs in 001_retrieve_patchify_plot.ipynb

MapReader/examples/geospatial/classification_one_inch_maps_001/001_retrieve_patchify_plot.ipynb

Requires Cartopy to be installed else you get empty mapsheets
Still throws the error even after you install it in a prior cell
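We have not confirmed the cause, but one common pattern that produces this behaviour is checking for an optional dependency only once, when the package is first imported, so installing it later in the same kernel does not help (restarting the kernel is the reliable workaround). A sketch of a call-time check, with a hypothetical `require` helper:

```python
import importlib
import importlib.util


def require(module_name: str, purpose: str):
    """Import an optional dependency at call time, with a clear error if it is missing.

    importlib.invalidate_caches() lets the import system see packages installed
    after the interpreter started (e.g. via "!pip install" in a notebook cell).
    """
    importlib.invalidate_caches()
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(f"{module_name} needs to be installed to {purpose}")
    return importlib.import_module(module_name)
```

For example, a plotting function could call `ccrs = require("cartopy.crs", "plot mapsheets")` inside its body instead of relying on a module-level import.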

Linting

We are now using black for the MapReader/mapreader and MapReader/tests directories. We also check them in CI (see Quality Assurance section).

Rename all 'slice' functions/methods to 'patchify' to align with documentation/streamline vocab

Describe the solution you'd like
Option A - change names of all 'slice' functions/methods to 'patchify' or 'patch' words
Option B - change all docs and doc strings that use the word 'patchify' to 'slice' so that it is clear what is what.

Additional context
Affected files (from grep -r -l "slice" mapreader):

./slicers/slicers.py
./slicers/__pycache__/__init__.cpython-38.pyc
./slicers/__pycache__/slicers.cpython-38.pyc
./utils/compute_and_save_stats.py
./utils/slice_parallel.py
./annotate/utils.py
./loader/__pycache__/images.cpython-38.pyc
./loader/images.py
./train/datasets.py
./train/__pycache__/datasets.cpython-38.pyc
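Only the `.py` files above need editing; the `.pyc` files are compiled caches that Python regenerates. If Option A is chosen, one way to avoid breaking existing user code is to keep the old names as deprecated aliases for a transition period. A sketch with hypothetical names (`patchify`, `slice_image`), not MapReader's actual functions:

```python
import functools
import warnings


def patchify(image_path: str, patch_size: int = 100) -> str:
    """Hypothetical renamed function (formerly a 'slice' function)."""
    return f"patchified {image_path} into {patch_size}px patches"


def _deprecated_alias(new_func, old_name: str):
    """Expose new_func under its old name, emitting a DeprecationWarning on each call."""
    @functools.wraps(new_func)
    def alias(*args, **kwargs):
        warnings.warn(
            f"'{old_name}' is deprecated; use '{new_func.__name__}' instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        return new_func(*args, **kwargs)

    alias.__name__ = old_name
    return alias


slice_image = _deprecated_alias(patchify, "slice_image")
```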

Example for Maps
Example for Plant images

For the latter, we have a link now:

https://mybinder.org/v2/gh/Living-with-machines/MapReader/main?labpath=examples%2Fquick_start%2Fquick_start.ipynb

I just tested it, and it works, but I also want to add all the cells/code of that notebook to CI.

Related issue: #28
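Tools such as `jupyter nbconvert --to notebook --execute` can run a notebook directly in CI. As a lighter-weight alternative, the sketch below concatenates a notebook's code cells into a plain script that CI can run with `python`, commenting out IPython magics and shell escapes. It relies only on the nbformat v4 JSON layout; `notebook_to_script` is a hypothetical helper.

```python
import json


def notebook_to_script(nb_path: str) -> str:
    """Concatenate the code cells of an .ipynb file into one Python script.

    Cell sources may be a list of lines or a single string (both valid in nbformat v4).
    Lines starting with '%' or '!' (magics, shell escapes) are commented out.
    """
    with open(nb_path, encoding="utf-8") as f:
        nb = json.load(f)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell["source"]
        text = "".join(source) if isinstance(source, list) else source
        lines = [("# " + line) if line.lstrip().startswith(("%", "!")) else line
                 for line in text.splitlines()]
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"
```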

:bug: some errors in `binder` deployment.

Tasks

Fix 'great_circle' is not defined
Fix simplekml needs to be installed to create KML outputs!

Associated tracebacks

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/tmp/ipykernel_60/1620428857.py in <module>
      3 
      4 xmin, xmax, ymin, ymax, myimg_shape, size_in_m = \
----> 5         mymaps.calc_pixel_width_height(all_maps[0])

/srv/conda/envs/notebook/lib/python3.7/site-packages/mapreader/loader/images.py in calc_pixel_width_height(self, parent_id, calc_size_in_m)
    349 
    350         elif calc_size_in_m in ['gc', 'great-circle']:
--> 351             bottom = great_circle((ymin, xmin), (ymin, xmax)).meters
    352             right = great_circle((ymin, xmax), (ymax, xmax)).meters
    353             top = great_circle((ymax, xmax), (ymax, xmin)).meters

NameError: name 'great_circle' is not defined

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.7/site-packages/mapreader/loader/images.py in _createKML(self, path2kml, value, coords, counter)
    817         try:
--> 818             import simplekml
    819         except:

ModuleNotFoundError: No module named 'simplekml'

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
/tmp/ipykernel_60/28836796.py in <module>
      4             save_kml_dir="./kml_tutorial",
      5             figsize=(20, 20),
----> 6             image_width_resolution=600)

/srv/conda/envs/notebook/lib/python3.7/site-packages/mapreader/loader/images.py in show(self, image_ids, value, plot_parent, border, border_color, vmin, vmax, colorbar, alpha, discrete_colorbar, tree_level, grid_plot, plot_histogram, save_kml_dir, image_width_resolution, kml_dpi_image, **kwds)
    675                                     value=one_image_id,
    676                                     coords=self.images["parent"][one_image_id]["coord"],
--> 677                                     counter=-1)
    678                 else:
    679                     plt.title(one_image_id)

/srv/conda/envs/notebook/lib/python3.7/site-packages/mapreader/loader/images.py in _createKML(self, path2kml, value, coords, counter)
    818             import simplekml
    819         except:
--> 820             raise ImportError("[ERROR] simplekml needs to be installed to create KML outputs!")
    821 
    822         (lon_min, lon_max, lat_min, lat_max) = coords

ImportError: [ERROR] simplekml needs to be installed to create KML outputs!
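The `NameError` suggests that `great_circle` (most likely `geopy.distance.great_circle`, judging by the `.meters` attribute) is not imported in `images.py`, so restoring that import would be the direct fix. For reference, the sketch below shows what the `bottom`/`right`/`top` edge calculation in the traceback computes, using a plain haversine formula on a sphere with a mean Earth radius of about 6371.009 km. It is not MapReader's code and the helper names are hypothetical.

```python
from __future__ import annotations

import math

EARTH_RADIUS_M = 6_371_009.0  # mean Earth radius in metres (spherical approximation)


def great_circle_m(p1: tuple[float, float], p2: tuple[float, float]) -> float:
    """Haversine distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    a = math.sin((lat2 - lat1) / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def sheet_size_m(xmin: float, xmax: float, ymin: float, ymax: float):
    """Edge lengths (bottom, right, top, left) of a lon/lat bounding box, in metres."""
    bottom = great_circle_m((ymin, xmin), (ymin, xmax))
    right = great_circle_m((ymin, xmax), (ymax, xmax))
    top = great_circle_m((ymax, xmax), (ymax, xmin))
    left = great_circle_m((ymax, xmin), (ymin, xmin))
    return bottom, right, top, left
```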
