
tommoral / dicodile


Experiments for "Distributed Convolutional Dictionary Learning (DiCoDiLe): Pattern Discovery in Large Images and Signals"

Home Page: https://tommoral.github.io/dicodile/

License: BSD 3-Clause "New" or "Revised" License

Python 99.95% Shell 0.03% TeX 0.02%
convolutional-dictionary-learning distributed-computing signals pattern-discovery

dicodile's Introduction

This package is still under development. If you have any trouble running this code, please open an issue on GitHub.

DiCoDiLe

Package to run the experiments for the preprint Distributed Convolutional Dictionary Learning (DiCoDiLe): Pattern Discovery in Large Images and Signals.

Installation

All the tests should work with Python >= 3.6. This package depends on the Python libraries numpy, matplotlib, scipy, mpi4py, and joblib. The package can be installed with the following command, run from the root of the package:

pip install -e .

Or using the conda environment:

conda env create -f dicodile_env.yml

To build the doc use:

pip install -e .[doc]
cd docs
make html

To run the tests:

pip install -e .[test]
pytest .

Usage

All experiments use mpi4py and will try to spawn workers depending on the parameters set in each experiment. If you need a hostfile to indicate to MPI where to spawn the new workers, set the environment variable MPI_HOSTFILE=/path/to/the/hostfile; it will be detected automatically in all the experiments. Note that for each experiment you should provide enough workers to allow the script to run.

All figures can be generated using the scripts in benchmarks. Each script generates and saves the data needed to reproduce a figure. The figure can then be plotted by re-running the same script with the --plot argument. The figures are saved as PDF in the benchmarks_results folder. The computations are cached with joblib to be robust to failures.
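
For reference, a minimal sketch of the joblib caching pattern described above; the function name and cache location are illustrative, not the benchmarks' actual code.

from joblib import Memory

# Cache results on disk so that a re-run after a failure skips the
# computations that already completed.
memory = Memory("benchmarks_results/.cache", verbose=0)


@memory.cache
def run_benchmark(n_workers, signal_support):
    # ... expensive distributed computation ...
    return {"n_workers": n_workers, "signal_support": signal_support}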

Note

Open MPI tries to use all network interfaces that are up. This might cause the program to hang due to virtual network interfaces that cannot actually be used to communicate between MPI processes. For more info, see the Open MPI FAQ.

In case your program hangs, you can launch computation with the mpirun command:

  • either specifying usable interfaces using the --mca btl_tcp_if_include parameter:
$ mpirun -np 1 \
     --mca btl_tcp_if_include wlp2s0 \
     --hostfile hostfile \
     python -m mpi4py examples/plot_mandrill.py
  • or excluding the virtual interfaces using the --mca btl_tcp_if_exclude parameter:
$ mpirun -np 1 \
     --mca btl_tcp_if_exclude docker0 \
     --hostfile hostfile \
     python -m mpi4py examples/plot_mandrill.py

Alternatively, you can restrict the interfaces used by setting the environment variables OMPI_MCA_btl_tcp_if_include or OMPI_MCA_btl_tcp_if_exclude:

$ export OMPI_MCA_btl_tcp_if_include="wlp2s0"

$ export OMPI_MCA_btl_tcp_if_exclude="docker0"
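
These variables can also be set from Python, provided this happens before MPI is initialized; a sketch assuming Open MPI, which reads OMPI_MCA_* variables at MPI_Init:

import os

# Must be set before the first mpi4py import, since importing mpi4py.MPI
# triggers MPI_Init, which is when Open MPI reads its MCA parameters.
os.environ["OMPI_MCA_btl_tcp_if_exclude"] = "docker0"

from mpi4py import MPI  # noqa: E402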

dicodile's People

Contributors

agramfort, cedricallain, hndgzkn, rprimet, tommoral


dicodile's Issues

inconsistent test behavior

Inconsistent behavior in test_stopping_criterion.

on the main branch (799f3fe) (version with reusable workers)

  • When no network interface is set:

    • runs for both 1d and 2d values
    • if the 1d value is commented out, the test with the 2d value hangs
  • When virtual network interfaces are excluded or a single network interface is set:

    • runs for both 1d and 2d values

on #38 (removes reusable workers)

  • When no network interface is set:

    • runs for the 1d value, but fails for the 2d value (with the 1d value not commented out)
  • When virtual network interfaces are excluded or a single network interface is set:

    • runs for both 1d and 2d values

We think that removing the reusable workers reveals a hidden bug. However, it is still unclear why virtual network interfaces prevent it from working when the test MPI workers run on the local machine.

Implement get_max_error_patch in dicodile

This step is necessary for greedy dictionary learning to work efficiently with dicodile from alphacsc. The core idea is to find the patch of shape (n_channels, *atoms_support) with the largest reconstruction error relative to the current reconstruction.

The distributed algorithm would be as follows (a sketch is given after the list):

  • In each worker, find the largest-error patch using something similar to get_max_error_dict from alphacsc, but adapted to 2D convolutions.
  • Return the value of the max error as well as the associated patch to the main process.
  • Select and return the patch extracted from the worker with the worst reconstruction error.
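
A hedged sketch of this distributed selection, assuming each worker holds a local residual error of shape (n_channels, *local_support) and communicates with a main process at rank 0 over comm. The helper names are hypothetical, and the sketch ignores patches that straddle worker boundaries, which the real implementation would need to handle via the segment overlaps.

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_max_error_patch(error, atom_support):
    """Return (max_error, patch) for the worst patch in the local residual."""
    # Sum the squared residual over channels to get a per-pixel error map.
    err2 = (error ** 2).sum(axis=0)
    # Score every candidate patch position with the error it covers.
    windows = sliding_window_view(err2, atom_support)
    scores = windows.sum(axis=(-2, -1))
    i, j = np.unravel_index(scores.argmax(), scores.shape)
    h, w = atom_support
    return scores[i, j], error[:, i:i + h, j:j + w]


def get_max_error_patch(comm, error, atom_support):
    """Gather one candidate per worker and keep the worst-reconstructed one."""
    candidate = local_max_error_patch(error, atom_support)
    candidates = comm.gather(candidate, root=0)
    if comm.rank == 0:
        # Select the patch from the worker with the largest error.
        return max(candidates, key=lambda c: c[0])[1]
    return None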

Enable data caching for gh-pages build?

We enabled data caching for the unittest workflow, in order to limit re-downloading the example datasets.

We could also perform the same optimization on the gh-pages workflow.

windowing the init dictionary may degrade encoding quality and performance

During the drafting of the gait example, it appeared that windowing the init dictionary may lead to a worse reconstruction.

This issue is a reminder that we may want to investigate that :-)

A few related plots (images omitted): an example of signal reconstruction with initial windowing; the init dictionary associated with that reconstruction; the same reconstruction without init windowing; and the (unwindowed) init dictionary.

From a quantitative standpoint:

  • the cross-correlation figure for the LAX signal is 0.45 (with windowing) vs 0.91 without windowing
  • computation time for the multichannel gait example went from 8s (no windowing in init dict) to 27s (windowing) on my laptop

Add python 3.9 to unit tests matrix

We are currently running unit tests on Python 3.8 because numba does not support Python 3.9.

It looks like support has landed, so we could require numba >= 0.53.1 and add Python 3.9 to the test matrix?

Unable to install with `conda env create -f dicodile_env.yml`

Hi!

Thanks for sharing your code! I am trying to install your package in an isolated conda environment, using

conda env create -f dicodile_env.yml

However, I am getting this error:

Ran pip subprocess with arguments:
['/home/cmendoza/local/bin/anaconda/anaconda3/envs/dicodile/bin/python', '-m', 'pip', 'install', '-U', '-r', '/home/cmendoza/dicodile/condaenv.daadydz8.requirements.txt']
Pip subprocess output:
Collecting git+https://github.com/dask/dask-jobqueue.git (from -r /home/cmendoza/dicodile/condaenv.daadydz8.requirements.txt (line 1))
  Cloning https://github.com/dask/dask-jobqueue.git to /tmp/pip-req-build-e5f3dopd
  Resolved https://github.com/dask/dask-jobqueue.git to commit bad0e7f6ce578397e06a2fcfe5fb7f2e405fb358
Obtaining file:///home/cmendoza/dicodile (from -r /home/cmendoza/dicodile/condaenv.daadydz8.requirements.txt (line 3))
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'error'

Pip subprocess error:
  Running command git clone -q https://github.com/dask/dask-jobqueue.git /tmp/pip-req-build-e5f3dopd
  ERROR: Command errored out with exit status 1:
   command: /home/cmendoza/local/bin/anaconda/anaconda3/envs/dicodile/bin/python /home/cmendoza/local/bin/anaconda/anaconda3/envs/dicodile/lib/python3.6/site-packages/pip/_vendor/pep517/in_process/_in_process.py get_requires_for_build_wheel /tmp/tmphof_g2w8
       cwd: /home/cmendoza/dicodile
  Complete output (18 lines): 
  Traceback (most recent call last):
    File "/home/cmendoza/local/bin/anaconda/anaconda3/envs/dicodile/lib/python3.6/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 349, in <module>
      main()
    File "/home/cmendoza/local/bin/anaconda/anaconda3/envs/dicodile/lib/python3.6/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 331, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/home/cmendoza/local/bin/anaconda/anaconda3/envs/dicodile/lib/python3.6/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 117, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/tmp/pip-build-env-mzif3rqf/overlay/lib/python3.6/site-packages/setuptools/build_meta.py", line 155, in get_requires_for_build_wheel
      config_settings, requirements=['wheel'])
    File "/tmp/pip-build-env-mzif3rqf/overlay/lib/python3.6/site-packages/setuptools/build_meta.py", line 135, in _get_build_requires
      self.run_setup()
    File "/tmp/pip-build-env-mzif3rqf/overlay/lib/python3.6/site-packages/setuptools/build_meta.py", line 150, in run_setup
      exec(compile(code, __file__, 'exec'), locals())
    File "setup.py", line 6, in <module>
      import toml  # noqa: F401
  ModuleNotFoundError: No module named 'toml'
  ----------------------------------------
WARNING: Discarding file:///home/cmendoza/dicodile. Command errored out with exit status 1: /home/cmendoza/local/bin/anaconda/anaconda3/envs/dicodile/bin/python /home/cmendoza/local/bin/anaconda/anaconda3/envs/dicodile/lib/python3.6/site-packages/pip/_vendor/pep517/in_process/_in_process.py get_requires_for_build_wheel /tmp/tmphof_g2w8 Check the logs for full command output.
ERROR: Command errored out with exit status 1: /home/cmendoza/local/bin/anaconda/anaconda3/envs/dicodile/bin/python /home/cmendoza/local/bin/anaconda/anaconda3/envs/dicodile/lib/python3.6/site-packages/pip/_vendor/pep517/in_process/_in_process.py get_requires_for_build_wheel /tmp/tmphof_g2w8 Check the logs for full command output.

CondaEnvException: Pip failed

If I print the system path variable just before importing setuptools in setup.py, I get

 ['/home/cmendoza/local/bin/anaconda/anaconda3/envs/dicodile/lib/python3.6/site-packages/pip/_vendor/pep517/in_process', '/tmp/pip-build-env-mzif3rqf/site', '/home/cmendoza/local/bin/anaconda/anaconda3/envs/dicodile/lib/python36.zip', '/home/cmendoza/local/bin/anaconda/anaconda3/envs/dicodile/lib/python3.6', '/home/cmendoza/local/bin/anaconda/anaconda3/envs/dicodile/lib/python3.6/lib-dynload', '/tmp/pip-build-env-mzif3rqf/overlay/lib/python3.6/site-packages', '/tmp/pip-build-env-mzif3rqf/normal/lib/python3.6/site-packages']

So, conda creates the environment and installs most of the packages, but the installation of the dicodile package itself with pip fails. I tried to look for hints on the Internet but couldn't find anything. Could you please provide some hints to solve this issue?

My system:
Ubuntu 18.04.
conda 4.8.2

Failure when computing the distributed beta?

Running test_dicodile_greedy (see PR #51) with n_workers=2 yields the following error:

    def main_check_beta(comm, workers_segments):
        """Check that beta computed in overlapping workers is identical.
    
        This check is performed only for workers overlapping with the first one.
        """
        global_test_points = get_global_test_points(workers_segments)
        for i_probe, pt_global in enumerate(global_test_points):
            sum_beta = np.empty(1, 'd')
            value = []
            for i_worker in range(workers_segments.effective_n_seg):
    
                pt = workers_segments.get_local_coordinate(i_worker, pt_global)
                if workers_segments.is_contained_coordinate(i_worker, pt):
                    comm.Recv([sum_beta, MPI.DOUBLE], source=i_worker,
                              tag=constants.TAG_ROOT + i_probe)
                    value.append(sum_beta[0])
            if len(value) > 1:
                # print("hello", pt_global)
>               assert np.allclose(value[1:], value[0]), value
E               AssertionError: [0.05110520535035923, -0.0016083304582741459]

ENH Implement LGCD with rank1 dictionary

It would be nice to implement LGCD for rank-1 dictionaries in dicodile so it can be easily integrated in alphacsc. I would go with the following steps:

  • First, add a rank1 parameter in dicodile and update set_D and the other communication functions that communicate D so that it is transmitted in rank-1 form. For this first step, the workers should store a full-rank D = uv^T (i.e. forget about the rank-1 structure inside the worker by computing D from its rank-1 form \sum_k u[k]v[k]^T; a sketch of this reconstruction is given after the list). Also store the rank-1 form as a new attribute uv in each worker.
  • Then, track all the steps that make use of D in the workers and update them to use uv when rank1 is set.
    All these steps should already be implemented in sub-functions used in https://github.com/alphacsc/alphacsc/blob/master/alphacsc/utils/coordinate_descent.py, as each worker runs an algorithm similar to this one. In particular, I think you need to update the computation of DtD, the reconstruction of X, and _init_beta.
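
As an illustration of the first step, a minimal sketch of the rank-1 reconstruction, assuming alphacsc's convention where uv stacks u and v along the last axis; this is not dicodile's actual set_D.

import numpy as np


def get_full_rank_D(uv, n_channels):
    """Reconstruct D_k = u_k v_k^T from the rank-1 storage uv.

    Assumes uv has shape (n_atoms, n_channels + n_times_atom), with
    uv[:, :n_channels] holding the spatial maps u and uv[:, n_channels:]
    the temporal patterns v.
    """
    u = uv[:, :n_channels]
    v = uv[:, n_channels:]
    # Outer product per atom: (n_atoms, n_channels, n_times_atom)
    return u[:, :, None] * v[:, None, :]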

Nothing about installing open mpi in the doc

Hello,

When running pip install -e ., an error is raised if Open MPI is not installed. Maybe something should be said in the doc about the procedure to install it (at least a link).

Thanks!

test releases on TestPyPI seem to be numbered incorrectly

The CI is currently set up with a job that will

  • make a release on TestPyPI for all tags and pushes to main
  • make a release on PyPI for all tags

Release numbers are handled by setuptools_scm. Unfortunately, it seems that there is a hiccup in the numbering (a configuration issue?).

dicodile 0.1 has been released, so the current TestPyPI versions should use 0.2.devN version numbers, but they use 0.1.devN version numbers instead.

For instance, the last push to main resulted in 0.1.dev77 being pushed to TestPyPI instead of 0.2.dev77.

(A screenshot of the TestPyPI release history shows that 0.1.dev77 was published after 0.1.)

Unit tests fail with mpich

The Mandrill example runs without problems; however, the unit tests fail with mpich.

When tests are run with:

$ pytest

Output is:

dicodile/tests/test_dicodile.py::test_dicodile [mpiexec@hande] match_arg (utils/args/args.c:160): unrecognized argument pmi_args
[mpiexec@hande] HYDU_parse_array (utils/args/args.c:175): argument matching returned error
[mpiexec@hande] parse_args (ui/mpich/utils.c:1603): error parsing input array
[mpiexec@hande] HYD_uii_mpx_get_parameters (ui/mpich/utils.c:1655): unable to parse user arguments
[mpiexec@hande] main (ui/mpich/mpiexec.c:128): error parsing parameters

This might be due to the missing singleton feature in mpich.

When tests are run with:

$ mpirun -np 1 pytest
  • dicodile/tests/test_dicodile.py runs without problems but hangs after running the tests and cannot stop the spawned processes.
  • dicodile/update_z/tests/test_dicod.py hangs at the first iteration.

When the tests are run with the mpirun command using the openmpi implementation, all tests run without problems, but it also hangs after running all the tests, leaving all processes spawned by the last test alive.

The problem with mpich seems to affect only the tests; for example, examples/plot_mandrill.py runs without problems with mpich.

Problem spawning processes on ubuntu-18.04 with openmpi 2.1.1

Unit tests fail on ubuntu 18.04 with openmpi 2.1.1 after renaming dicodile.py to _dicodile.py and exposing the dicodile function in __init__.py as

from ._dicodile import dicodile

__all__ = ['dicodile']

While running the test:

dicodile/update_z/tests/test_dicod.py::test_stopping_criterion[6-signal_support0-atom_support0]

It returns:

0 Exception
[hande-VirtualBox:04908] [[59073,0],0] ORTE_ERROR_LOG: Not found in file orted/pmix/pmix_server_dyn.c at line 87
1 Exception
[hande-VirtualBox:04908] [[59073,0],0] ORTE_ERROR_LOG: Not found in file orted/pmix/pmix_server_dyn.c at line 87
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 6 slots
that were requested by the application:
  /home/hande/dev/dicodile/env/bin/python

Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
2 Exception
[hande-VirtualBox:04908] 1 more process has sent help message help-orte-rmaps-base.txt / orte-rmaps-base:alloc-error
[hande-VirtualBox:04908] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
3 Exception
[hande-VirtualBox:04908] 1 more process has sent help message help-orte-rmaps-base.txt / orte-rmaps-base:alloc-error
4 Exception
[hande-VirtualBox:04908] 1 more process has sent help message help-orte-rmaps-base.txt / orte-rmaps-base:alloc-error
5 Exception
[hande-VirtualBox:04932] OPAL ERROR: Timeout in file base/pmix_base_fns.c at line 195
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_dpm_dyn_init() failed
  --> Returned "Timeout" (-15) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)

Each exception occurs at the line

print(i, "Exception")

while trying to spawn workers at the line

comm = MPI.COMM_SELF.Spawn(
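
(For context, a minimal self-contained sketch of this spawn pattern; the worker script name is illustrative, not dicodile's exact call.)

import sys
from mpi4py import MPI

n_workers = 6
# Spawn fresh Python interpreters, each executing the worker script; the
# spawned processes occupy hostfile slots until they terminate.
comm = MPI.COMM_SELF.Spawn(
    sys.executable, args=["main_worker.py"], maxprocs=n_workers)
# ... exchange messages with the workers through `comm` ...
comm.Disconnect()  # release the intercommunicator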

The code spawns the specified number of processes (6 in this case). The processes start executing the specified main_worker.py script. However, execution stops at

from dicodile.utils import constants

where it tries to import from the dicodile package.

I've tried adding lines before the import line; all of them run, but then it fails silently at the import.

For nb_workers = [1, 2], the code runs without problems.

For nb_workers = 6, it raises an exception while spawning processes.

I thought the code was not able to access hostfile_test; however, I realized that the loop starting at the line

for i in range(10):

keeps running and spawns the specified number of processes in each iteration. It complains about an insufficient number of slots at the iteration where the number of slots in hostfile_test would be exceeded.

For example, in the case above, hostfile_test specifies 16 slots. In the first iteration, it spawns 6 processes and then raises an exception; however, those processes continue to run. In the second iteration it starts 6 more processes, 12 in total. In the third iteration, as it has 3 slots left, it complains that there are not enough slots.

I tried the same with 20 slots, and it complained in the 4th iteration, after initializing 18 processes in the first 3.

A similar problem occurs while running the plot_mandrill.py example with 16 slots in the hostfile, using the command:
mpirun -np 1 --hostfile hostfile python -m mpi4py examples/plot_mandrill.py

Replace is False and data exists, so doing nothing. Use replace=True to re-download the data.
[DEBUG:DICODILE] Lambda_max = 11.274413430904202
0 Exception
[hande-VirtualBox:05655] [[58362,0],0] ORTE_ERROR_LOG: Not found in file orted/pmix/pmix_server_dyn.c at line 87
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 9 slots
that were requested by the application:
  /home/hande/dev/dicodile/env/bin/python

Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
1 Exception
[hande-VirtualBox:05655] 1 more process has sent help message help-orte-rmaps-base.txt / orte-rmaps-base:alloc-error
[hande-VirtualBox:05655] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
2 Exception
[hande-VirtualBox:05655] 1 more process has sent help message help-orte-rmaps-base.txt / orte-rmaps-base:alloc-error
3 Exception
[hande-VirtualBox:05655] 1 more process has sent help message help-orte-rmaps-base.txt / orte-rmaps-base:alloc-error
4 Exception
[hande-VirtualBox:05655] 1 more process has sent help message help-orte-rmaps-base.txt / orte-rmaps-base:alloc-error
5 Exception
[hande-VirtualBox:05655] 1 more process has sent help message help-orte-rmaps-base.txt / orte-rmaps-base:alloc-error
6 Exception
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_dpm_dyn_init() failed
  --> Returned "Timeout" (-15) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[hande-VirtualBox:05664] OPAL ERROR: Timeout in file base/pmix_base_fns.c at line 195
