seung-lab / igneous
Scalable Neuroglancer compatible Downsampling, Meshing, Skeletonizing, Contrast Normalization, Transfers and more.
License: GNU General Public License v3.0
In CloudVolume there is a compress argument that allows output of brotli compressed files. However, I do not see such an argument in Igneous, and I always get gzip compressed files. Is there a way to get brotli compressed files?
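For what it's worth, the create_downsampling_tasks listing further down this page documents compress as accepting None, 'gzip', and 'br', so a hedged sketch of requesting brotli output might look like this (the path and other arguments are illustrative):
import igneous.task_creation as tc

tasks = tc.create_downsampling_tasks(
    'gs://bucket/dataset/layer',  # illustrative path
    mip=0,
    num_mips=1,
    compress='br',  # brotli-compressed output chunks
)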
Hi, igneous team
I see that Igneous can downsample a dataset:
tasks = create_downsampling_tasks(
layer_path, # e.g. 'gs://bucket/dataset/layer'
mip=0, # Start downsampling from this mip level (writes to next level up)
fill_missing=False, # Ignore missing chunks and fill them with black
axis='z',
num_mips=5, # number of downsamples to produce. Downloaded shape is chunk_size * 2^num_mips
chunk_size=None, # manually set chunk size of next scales, overrides preserve_chunk_size
preserve_chunk_size=True, # use existing chunk size, don't halve to get more downsamples
sparse=False, # for sparse segmentation, allow inflation of pixels against background
bounds=None, # mip 0 bounding box to downsample
encoding=None # e.g. 'raw', 'compressed_segmentation', etc
delete_black_uploads=False, # issue a delete instead of uploading files containing all background
background_color=0, # Designates the background color
compress='gzip', # None, 'gzip', and 'br' (brotli) are options
)
I see it just downsamples the dataset 2x2x1: if the dataset's size is 256x256x256, the result is 128x128x256.
So how do I get a 2x2x2 downsample, so that a 256x256x256 dataset produces a 128x128x128 result?
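A hedged sketch of the likely answer, assuming the factor argument of create_downsampling_tasks (it also appears in scripts later on this page); the path is illustrative:
import igneous.task_creation as tc

tasks = tc.create_downsampling_tasks(
    'gs://bucket/dataset/layer',  # illustrative path
    mip=0,
    num_mips=1,
    factor=(2, 2, 2),  # downsample x, y, and z together: 256x256x256 -> 128x128x128
)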
I just started thinking about the sharded format, and my initial test with downsampling showed that igneous supports reading the sharded format -- yay! The downsampled output is unsharded, which is likely just fine (sharding is most important for mip=0).
Is it technically feasible to automatically determine sharding specs for higher mips and write out sharded format? For example, one could keep the same #minishard/chunk and #minishard/shard.
Now that Igneous has a usable CLI, it would be nice to be able to install it as a PyPI package. This is possible thanks to the fact that we moved all the heavy compiled code into other packages already.
I feel the organization of the Igneous code could use some work. Also, because I waited so long, someone else took the igneous PyPI name...
There is a syntax warning I encountered:
igneous/igneous/chunks.py:35: SyntaxWarning: "is not" with a literal. Did you mean "!="?
if (shape is None or dtype is None) and encoding is not 'npz':
Doesn't break anything at the moment but looks like a simple fix.
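The presumed fix is to compare the string with != instead of identity:
if (shape is None or dtype is None) and encoding != 'npz':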
Hi, I am trying to use Docker, but I get some errors. I am probably not using it correctly; could you give me a hint?
~/workspace/zfish_analysis: docker run -it -v /secrets:/secrets -v /import:/import -e "LEASE_SECONDS=3000" seunglab/igneous:master
Deprecation Warning: /root/.cloudvolume/secrets/google-secret.json is now preferred to /secrets/google-secret.json.
Deprecation Warning: /root/.cloudvolume/secrets/aws-secret.json is now preferred to /secrets/aws-secret.json.
Deprecation Warning: /root/.cloudvolume/secrets/boss-secret.json is now preferred to /secrets/boss-secret.json.
Pulling from pull-queue://pull-queue
raised name 'leaseSecs' is not defined
Traceback (most recent call last):
File "/igneous/igneous/task_execution.py", line 70, in execute
task = tq.lease(tag=tag, seconds=int(LEASE_SECONDS))
File "/usr/local/lib/python3.4/site-packages/taskqueue/taskqueue.py", line 155, in lease
tag=tag,
File "/usr/local/lib/python3.4/site-packages/taskqueue/google_queue_api.py", line 110, in lease
'leaseSecs': leaseSecs,
NameError: name 'leaseSecs' is not defined
undefined task
on host 4d57a4214084
Traceback (most recent call last):
File "/igneous/igneous/task_execution.py", line 89, in <module>
command()
File "/usr/local/lib/python3.4/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.4/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.4/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.4/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/igneous/igneous/task_execution.py", line 33, in command
execute(tag, queue, server, qurl)
File "/igneous/igneous/task_execution.py", line 70, in execute
task = tq.lease(tag=tag, seconds=int(LEASE_SECONDS))
File "/usr/local/lib/python3.4/site-packages/taskqueue/taskqueue.py", line 155, in lease
tag=tag,
File "/usr/local/lib/python3.4/site-packages/taskqueue/google_queue_api.py", line 110, in lease
'leaseSecs': leaseSecs,
NameError: name 'leaseSecs' is not defined
In zebrafish, the meshing takes a few days. Sometimes there are tasks left over and I have to rerun the whole thing to finish.
Actually, we only need to mesh the reconstructed cells, i.e. sparse meshing.
It would be nice to have this capability.
I installed the latest version of igneous and tried to use the CLI for a simple job like downsampling. After following the instructions on the README page to create tasks and execute them, the log output stops refreshing at some point, showing something like "INFO FunctionTask 23a62836-1dbc-4350-9872-5dc8f6d06a96 succesfully executed in 18.43 sec.", and never returns to the shell; I can only exit the process with ctrl+c.
But I also tried using the Python interface as a simple script to do the same job, and it finished successfully.
I did my test on "CentOS Stream 8" and installed the package using pip. Could you tell me if this is normal, or what I should do to use the CLI correctly? Thanks a million.
I noticed that if the version of networkx is not specified in requirements.txt, an incompatible version will be installed. This can be fixed by specifying networkx==2.1.
Is there a way to create multi-resolution meshes using igneous? The instructions in README.md seem to generate legacy single-resolution meshes only.
Just noting this issue here.
Exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/pip/_internal/req/req_install.py", line 339, in check_if_exists
self.satisfied_by = pkg_resources.get_distribution(str(no_marker))
File "/usr/local/lib/python3.5/dist-packages/pip/_vendor/pkg_resources/__init__.py", line 476, in get_distribution
dist = get_provider(dist)
File "/usr/local/lib/python3.5/dist-packages/pip/_vendor/pkg_resources/__init__.py", line 352, in get_provider
return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
File "/usr/local/lib/python3.5/dist-packages/pip/_vendor/pkg_resources/__init__.py", line 895, in require
needed = self.resolve(parse_requirements(requirements))
File "/usr/local/lib/python3.5/dist-packages/pip/_vendor/pkg_resources/__init__.py", line 786, in resolve
raise VersionConflict(dist, req).with_context(dependent_req)
pip._vendor.pkg_resources.ContextualVersionConflict: (intern 0.9.9 (/src/intern), Requirement.parse('intern>=0.9.10'), {'cloud-volume'})
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/pip/_internal/cli/base_command.py", line 143, in main
status = self.run(options, args)
File "/usr/local/lib/python3.5/dist-packages/pip/_internal/commands/install.py", line 318, in run
resolver.resolve(requirement_set)
File "/usr/local/lib/python3.5/dist-packages/pip/_internal/resolve.py", line 102, in resolve
self._resolve_one(requirement_set, req)
File "/usr/local/lib/python3.5/dist-packages/pip/_internal/resolve.py", line 256, in _resolve_one
abstract_dist = self._get_abstract_dist_for(req_to_install)
File "/usr/local/lib/python3.5/dist-packages/pip/_internal/resolve.py", line 193, in _get_abstract_dist_for
req, self.require_hashes, self.use_user_site, self.finder,
File "/usr/local/lib/python3.5/dist-packages/pip/_internal/operations/prepare.py", line 329, in prepare_editable_requirement
req.check_if_exists(use_user_site)
File "/usr/local/lib/python3.5/dist-packages/pip/_internal/req/req_install.py", line 350, in check_if_exists
self.req.name
File "/usr/local/lib/python3.5/dist-packages/pip/_vendor/pkg_resources/__init__.py", line 476, in get_distribution
dist = get_provider(dist)
File "/usr/local/lib/python3.5/dist-packages/pip/_vendor/pkg_resources/__init__.py", line 352, in get_provider
return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
File "/usr/local/lib/python3.5/dist-packages/pip/_vendor/pkg_resources/__init__.py", line 895, in require
needed = self.resolve(parse_requirements(requirements))
File "/usr/local/lib/python3.5/dist-packages/pip/_vendor/pkg_resources/__init__.py", line 786, in resolve
raise VersionConflict(dist, req).with_context(dependent_req)
pip._vendor.pkg_resources.ContextualVersionConflict: (intern 0.9.9 (/src/intern), Requirement.parse('intern>=0.9.10'), {'cloud-volume'})
All the cool kids have a CLI for their task system.
Examples:
igneous downsample gs://bucket/test/image --mip 2 --sparse --queue sqs://my-sqs-queue
igneous xfer gs://bucket/test/image s3://bucket/test/image --mip 0 --queue /my/queue/dir
igneous delete gs://bucket/test/image --queue /my/queue/dir
igneous mesh create gs://bucket/test/segmentation --mip 3 --shape 511,511,511 --queue sqs://my-sqs-queue
igneous mesh merge gs://bucket/test/segmentation --magnitude 0 --queue sqs://my-sqs-queue
igneous skeleton create gs://bucket/test/segmentation --mip 3 --shape 511,511,511 --sharded
igneous skeleton merge gs://bucket/test/segmentation --mip 3 --shape 511,511,511
igneous execute sqs://my-sqs-queue --parallel 2
To play nicely with certain batch job systems, task_execution should run for only a limited amount of time. We can always start new instances.
I use Igneous to extract skeletons, but I am confused about what to do when my files are not in Google Cloud but on my local system.
Should I make an info file for each file? And does the function create_skeletonizing_tasks process all the files at the same time? I was confused because my files are on the local system and I did not know the expected file structure. Could you give me some examples of how to use local files?
I think the cloudpath reads all the files from Google Cloud, but how do I read local files? This confuses me the most.
The file structure is like this:
├── seg_results
│ ├── 0
│ │ ├── 10
│ │ ├── 11
│ │ └── 12
│ ├── 1
│ │ ├── 10
│ │ ├── 11
│ │ ├── 12
│ │ └── 8
│ ├── 10
│ │ ├── 10
│ │ ├── 11
│ │ ├── 12
│ │ ├── 13
│ ├── 100
│ │ ├── 10
│ │ ├── 11
│ │ ├── 12
│ │ ├── 13
I use the following code to process the files:
cloudpath = 'file:///mnt/f/seg_results/0/10/'
mip = 0
# First Pass: Generate Skeletons
tasks = tc.create_skeletonizing_tasks(
cloudpath,
mip, # Which resolution to skeletionize at (near isotropic is often good)
shape=Vec(512, 512, 512), # size of individual skeletonizing tasks (not necessary to be chunk aligned)
sharded=False, # Generate (true) concatenated .frag files (False) single skeleton fragments
spatial_index=False, # Generate a spatial index so skeletons can be queried by bounding box
#info=None, # provide a cloudvolume info file if necessary (usually not)
info = CloudVolume.create_new_info(
num_channels = 1,
layer_type = 'segmentation',
data_type = 'uint64', # Channel images might be 'uint8'
encoding = 'raw', # raw, jpeg, compressed_segmentation, fpzip, kempressed
resolution = [4, 4, 40], # Voxel scaling, units are in nanometers
voxel_offset = [0, 0, 0], # x,y,z offset in voxels from the origin
mesh = 'mesh',
# Pick a convenient size for your underlying chunk representation
# Powers of two are recommended, doesn't need to cover image exactly
chunk_size = [ 128, 128, 64 ], # units are voxels
#volume_size = [ 250000, 250000, 25000 ], # e.g. a cubic millimeter dataset
volume_size = [125, 1250, 1250], # e.g. a cubic millimeter dataset
),
fill_missing=False, # Use zeros if part of the image is missing instead of raising an error
# see Kimimaro's documentation for the below parameters
teasar_params={'scale':10, 'const': 10},
object_ids=None, # Only skeletonize these ids
mask_ids=None, # Mask out these ids
fix_branching=True, # (True) higher quality branches at speed cost
fix_borders=True, # (True) Enable easy stitching of 1 voxel overlapping tasks
dust_threshold=1000, # Don't skeletonize below this physical distance
progress=False, # Show a progress bar
parallel=1, # Number of parallel processes to use (more useful locally)
)
But from this cloudpath I can only read a portion of the files. Should I change the cloudpath or the info to access all of the files?
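A hedged guess at a fix: Igneous reads local data through CloudVolume's file:// protocol, and the cloudpath should point at the root of a single precomputed layer (the directory containing the info file), which then covers the whole volume rather than one subfolder. The path below is purely illustrative:
cloudpath = 'file:///mnt/f/seg_results/'  # illustrative: directory that holds the "info" file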
Hi,
I'm trying to set up igneous on macOS Catalina and having issues with both the pre-built Docker image and manual installation. Do you suggest any workaround for setting up igneous on Catalina?
$ docker run seunglab/igneous
/usr/local/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)
Traceback (most recent call last):
File "/igneous/igneous/task_execution.py", line 12, in <module>
from igneous import logger
File "/igneous/igneous/logger.py", line 13, in <module>
google_credentials_path, project=PROJECT_NAME)
File "/usr/local/lib/python3.7/site-packages/google/cloud/client.py", line 74, in from_service_account_json
with io.open(json_credentials_path, "r", encoding="utf-8") as json_fi:
FileNotFoundError: [Errno 2] No such file or directory: '/root/.cloudvolume/secrets/google-secret.json'
Also, kimimaro seems to fail during build when running pip install -r requirements.txt.
Thank you,
manoaman
I am running our consensus building routine.
This is the command I am using:
sudo docker run -v /secrets:/secrets seunglab/
igneous:master bash -c ' export PIPELINE_USER_QUEUE=zfish; export QUEUE_TYPE=sqs; export SQS_UR
L=https://sqs.us-east-1.amazonaws.com/098703261575/zfish; export LEASE_SECONDS=600; alias pytho
n=python3; export LC_ALL=C.UTF-8; export LANG=C.UTF-8; python /igneous/igneous/task_execution.py
'
It used to work well, but I am getting an error now.
Pulling from sqs://https://sqs.us-east-1.amazonaws.com/098703261575/zfish
HyperSquareConsensusTask(src_path='gs://neuroglancer/zfish_v1/segmentation2',dest_path='gs://neuroglancer/zfish_v1/consensus-20181125',ew_volume_id=28652,consensus_map_path='gs://neuroglancer/zfish_v1/consensus-20181125/zfish_consensus.all.json',shape=[896, 896, 112],offset=[72960, 29952, 16848])
Deprecation Warning: /root/.cloudvolume/secrets/google-secret.json is now preferred to /secrets/google-secret.json.
raised a bytes-like object is required, not 'NoneType'
Traceback (most recent call last):
File "/igneous/igneous/task_execution.py", line 74, in execute
task.execute()
File "/igneous/igneous/tasks/tasks.py", line 646, in execute
consensus = cache(self, self.consensus_map_path).decode('utf8')
File "/igneous/igneous/tasks/tasks.py", line 92, in cache
f.write(filestr)
TypeError: a bytes-like object is required, not 'NoneType'
HyperSquareConsensusTask(src_path='gs://neuroglancer/zfish_v1/segmentation2',dest_path='gs://neuroglancer/zfish_v1/consensus-20181125',ew_volume_id=28652,consensus_map_path='gs://neuroglancer/zfish_v1/consensus-20181125/zfish_consensus.all.json',shape=[896, 896, 112],offset=[72960, 29952, 16848])
on host 98906f76805c
Traceback (most recent call last):
File "/igneous/igneous/task_execution.py", line 94, in <module>
command()
File "/usr/local/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/igneous/igneous/task_execution.py", line 34, in command
execute(tag, queue, server, qurl, loop)
File "/igneous/igneous/task_execution.py", line 74, in execute
task.execute()
File "/igneous/igneous/tasks/tasks.py", line 646, in execute
consensus = cache(self, self.consensus_map_path).decode('utf8')
File "/igneous/igneous/tasks/tasks.py", line 92, in cache
f.write(filestr)
TypeError: a bytes-like object is required, not 'NoneType'
Hi all, nice library!
Recently I have been using your library to extract neuron skeletons, but I am curious why the skeleton I extracted is broken and discontinuous, as shown below.
I converted the skeleton to an swc file and displayed it using Vaa3D.
In the red box the skeleton is broken, but the actual neurons are intact. Why does this happen? How do I solve it?
create_mesh_tasks now supports specifying a mesh_dir to override the setting in the info file (#27). But create_mesh_manifest_tasks still only uses the info file, which makes this less useful for non-ChunkedGraph datasets.
On Ran's agglomeration test:
raised value too large to convert to unsigned int
Traceback (most recent call last):
File "/igneous/igneous/task_execution.py", line 60, in execute
task.execute()
File "/igneous/igneous/tasks.py", line 219, in execute
self._compute_meshes()
File "/igneous/igneous/tasks.py", line 224, in _compute_meshes
self._mesher.mesh(data.flatten(), *data.shape[:3])
File "_mesher.pyx", line 34, in _mesher.Mesher.mesh
File "stringsource", line 48, in vector.from_py.__pyx_convert_vector_from_py_unsigned_int
OverflowError: value too large to convert to unsigned int
MeshTask(shape=[512, 512, 512],offset=[2635, 2000, 522],layer_path='gs://neuroglancer/ranl/flyem_agglomeration_test',mip=3,simplification_factor=100,max_simplification_error=40)
on host igneous-1996647607-q94q2
Necessary for Phase II.
You asked, you demanded, you got it.
For some use cases with data that are mostly black, it makes sense to pick non-zero values over zeroed values when downsampling. For data that consist of single points, this can be important. The existing downsampling code treats zero like any other value.
However, in my underground alchemic algorithms laboratory, I developed a new variant that can handle this case. We can integrate it as an option into igneous.
https://github.com/william-silversmith/countless/blob/master/python/countless2d.py#L78
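Possibly the integration point is the sparse flag already documented in the create_downsampling_tasks listing near the top of this page; a hedged sketch of invoking it, assuming it maps onto this prefer-non-zero behavior (path illustrative):
import igneous.task_creation as tc

tasks = tc.create_downsampling_tasks(
    'gs://bucket/dataset/layer',  # illustrative path
    mip=0,
    num_mips=3,
    sparse=True,  # prefer non-background values when downsampling sparse labels
)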
version.py requires the environment variable TRAVIS_BRANCH. However, in many cases this variable is not set and users need to define it manually (and fill it with nonsense) to make igneous work.
https://github.com/seung-lab/igneous/blob/master/version.py#L11
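A hypothetical workaround until version.py no longer requires the variable, assuming it only needs to exist with any placeholder value before igneous is imported (setting it in the shell before installation works the same way):
import os
os.environ.setdefault('TRAVIS_BRANCH', 'master')  # placeholder; the value does not matter
import igneous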
This way we don't have to download everything to check what labels exist.
It would be good to add this as a CLI option too. We should automatically do this for downsampled images or transfers, but older datasets may need upgrades so it should be a separate task too.
Hi all,
thanks for this cool library.
I am generating skeletons from a local precomputed segmentation layer. It worked after changing the function name to create_skeletonizing_tasks(). (The docs say create_skeletonization_tasks().)
However, in Neuroglancer, the skeletons are not recognized / displayed. The info file points to the correct skeletons folder.
Any pointers on what I am doing wrong?
Cheers,
Chris
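For reference, a hedged sketch of the two pieces of metadata Neuroglancer looks for, based on the info files quoted elsewhere on this page; the directory name 'skeletons' and the path are illustrative:
from cloudvolume import CloudVolume

vol = CloudVolume('file:///path/to/segmentation')  # illustrative local layer path
vol.info['skeletons'] = 'skeletons'                # segmentation info names the skeleton subdirectory
vol.commit_info()

# The skeleton subdirectory needs its own info, roughly:
# { "@type": "neuroglancer_skeletons",
#   "transform": [1,0,0,0, 0,1,0,0, 0,0,1,0],
#   "vertex_attributes": [] }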
I have a volume representing a stack of images read out from a microscope. There are 687 2D images, each of which has dimension 2160 x 2560, so the resulting volume has dimension [2160,2560,687] in x,y,z. At the full resolution this volume loads too slowly in Neuroglancer to be usable, so I have turned to downsampling. A successful downsampling scheme I have used is the following:
mip = 0, factor = [2,2,1], resulting dimension = [1080,1280,687]
mip = 1, factor = [2,2,1], resulting dimension = [540,640,687]
mip = 2, factor = [2,2,1], resulting dimension = [270,320,687]
mip = 3, factor = [2,2,1], resulting dimension = [135,160,687]
I am using chunk_size=[128,128,64] for all mip levels.
The code structure I am using to do the downsampling is:
mips = [0,1,2,3]
for mip in mips:
cv = CloudVolume(rechunked_cloudpath, mip)
chunks = calculate_chunks(downsample, mip) # uses the scheme mentioned above
factors = calculate_factors(downsample, mip) # uses the scheme mentioned above
tasks = tc.create_downsampling_tasks(cv.layer_cloudpath,
mip=mip,
num_mips=1,
factor=factors,
preserve_chunk_size=False,
compress=True,
chunk_size=chunks)
tq.insert(tasks)
tq.execute()
This works fine but I would like to downsample in z as well. Let's say I adopt the following downsampling scheme instead:
mip = 0, factor = [2,2,1], resulting dimension = [1080,1280,687]
mip = 1, factor = [2,2,1], resulting dimension = [540,640,687]
mip = 2, factor = [2,2,3], resulting dimension = [270,320,229]
mip = 3, factor = [2,2,1], resulting dimension = [135,160,229]
The only difference is the factor=[2,2,3] in the mip=2 downsample, where I chose 3 because 687 is divisible by 3. The Python code fails on this downsample level and gives me the following error:
Alignment Check:
Mip: 3
Chunk Size: [128 128 64]
Volume Offset: [0 0 0]
Received: Bbox([0, 0, 192],[128, 128, 229], dtype=int32)
Nearest Aligned: Bbox([0, 0, 192],[128, 128, 256], dtype=int32)
I don't understand what the problem is. It seems like there are some left over pixels in the z dimension so there is an incomplete chunk. But shouldn't this happen always unless your image size happens to be divisible by the chunk size? That's almost never the case. I didn't get this error when the z dimension was 687 for example. What is meant by "Aligned" in the error message?
I can strip off a z plane at the end to make the z dimension an even number but I'd ideally like to understand what is causing this issue.
Thanks,
Austin
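As a rough illustration of what the Alignment Check above is comparing (not a fix), assuming cloud-volume's Bbox helper: a write bounding box is "aligned" when its corners land on chunk boundaries relative to the volume offset, so the received box is expanded to the nearest chunk-aligned box:
from cloudvolume import Bbox
from cloudvolume.lib import Vec

chunk_size = Vec(128, 128, 64)
received = Bbox((0, 0, 192), (128, 128, 229))
aligned = received.expand_to_chunk_size(chunk_size, offset=Vec(0, 0, 0))
# aligned == Bbox([0, 0, 192], [128, 128, 256]), the "Nearest Aligned" box in the error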
Hi,
I've been using cloud-volume and igneous to display segmentation and meshes, served from a school server (e.g. link). My problem is that when I have millions of objects, the *.gz files generated by igneous in a single folder become really slow to display. E.g., each single mesh description file takes a long time to access (link).
Wonder if there is any way to solve this issue?
Thanks again for the amazing tool and support!
Donglai
I need to use some functions from igneous; it would be easier to set up Travis tests if igneous were a registered package.
The QUEUE_NAME is not correct when we use SQS? I always get taskQueueName: pull_queue even though I am using SQS. This is not important, but it would be nice to fix it.
Hi,
We are running this pipeline on a 20.04 Ubuntu host, and when it runs it begins to consume RAM and continues until all available system memory has been used up, at which point the process terminates. We have tried adding more RAM and CPU to the machine, but the outcome is the same and the RAM eventually gets used up by Python processes that are each using multiple GB. Any idea why this might be happening? Appreciate the help.
Are the recent GitHub code and Docker Hub versions compatible? I was browsing the commits of the last few months and don't notice anything that looks like a breaking change.
I am updating dependencies in my project, and ran into a conflict with oauth2client being pinned in the older version of igneous I was using. My client code isn't fancy -- it's submitting transfer tasks with downsampling to an AWS queue. My igneous cluster deploys an older igneous docker image, which reads from the queue.
Previously I had pinned:
github commit: 2e1db31f60331420f72c958cedb7932a84fe6ef (2020/09/02)
dockerhub sha256: b359fce8e5b3e5061d6b4800fd61cd2b9b9c8c10e2b5f87f11984d1a3ee7cdfb (cannot locate)
Currently, the most recent versions are:
github commit: 823c9b1 (2021/05/15)
dockerhub sha256: 896dd8db6d7d3bf53bbe12623653eb6d4eb485fcb632ef5c426caa65b25d3ad3 (4 months ago)
I don't know how to tell at which commit the Docker Hub image was built. Are there any incompatibilities to be aware of if I jump to the newest versions?
Thanks in advance!
The travis config seems misconfigured and causes the most recent PR to be pushed to Docker's seunglab/igneous:master 😱
Hi, nice library. Recently I used this library to extract skeletons.
I downloaded the Google segmentation for the FAFB dataset to my local computer,
but when I use the following code, something goes wrong:
cloudpath1 = 'file:///mnt/d/braindata/google_segmentation/google_256.0x256.0x320.0/'
mip = 0
# First Pass: Generate Skeletons
tasks1 = tc.create_skeletonizing_tasks(
cloudpath1,
mip, # Which resolution to skeletionize at (near isotropic is often good)
shape=Vec(64, 64, 32), # size of individual skeletonizing tasks (not necessary to be chunk aligned)
sharded=False, # Generate (true) concatenated .frag files (False) single skeleton fragments
spatial_index=False, # Generate a spatial index so skeletons can be queried by bounding box
info=None, # provide a cloudvolume info file if necessary (usually not)
fill_missing=False, # Use zeros if part of the image is missing instead of raising an error
# see Kimimaro's documentation for the below parameters
teasar_params={'scale':10, 'const': 50},
object_ids=None, # Only skeletonize these ids
mask_ids=None, # Mask out these ids
fix_branching=True, # (True) higher quality branches at speed cost
fix_borders=True, # (True) Enable easy stitching of 1 voxel overlapping tasks
dust_threshold=1000, # Don't skeletonize below this physical distance
progress=False, # Show a progress bar
parallel=1, # Number of parallel processes to use (more useful locally)
)
tq = MockTaskQueue()
tq.insert_all(tasks1)
The output is as follows:
Connected Components Error: Label 34856 cannot be mapped to union-find array of length 34856.
I saw that the Google segmentation results are encoded in hex; should I change them to decimal? I also changed the info file to the following:
{
"@type": "neuroglancer_multiscale_volume",
"data_type": "uint64",
"mesh": "mesh",
"num_channels": 1,
"scales": [
{
"chunk_sizes": [
[
64,
64,
64
]
],
"compressed_segmentation_block_size": [
8,
8,
8
],
"encoding": "compressed_segmentation",
"key": "512.0x512.0x640.0",
"resolution": [
512,
512,
640
],
"sharding": {
"@type": "neuroglancer_uint64_sharded_v1",
"data_encoding": "gzip",
"hash": "identity",
"minishard_bits": 4,
"minishard_index_encoding": "gzip",
"preshift_bits": 9,
"shard_bits": 0
},
"size": [
1944,
1048,
442
],
"voxel_offset": [
0,
0,
0
]
}
],
"skeletons": "skeletons_mip_0",
"type": "segmentation"
}
Hi,
I am trying to serve precomputed skeleton data in a Neuroglancer viewer instance. I have followed the steps mentioned in the precomputed skeleton guidelines and taken inspiration from Igneous' SkeletonTask.
Here is what the info file looks like:
{
"@type": "neuroglancer_skeletons",
"transform": [
1,
0,
0,
0,
0,
1,
0,
0,
0,
0,
1,
0
],
"vertex_attributes": [],
"sharding": None,
"spatial_index": None
}
In my case, there are no vertex attributes that have to be specified. The encoded binary data is served through a <segment-id> endpoint like this:
data = [
np.uint32(num_vertices), # N
np.uint32(num_edges), # M
vertex_positions, # (N,3) float32 array
edges # (M,2) uint32 array
]
encoded_skeleton = b''.join([array.tobytes('C') for array in data])
Both the info and binary data are served through the endpoint "[HOST]:[PORT]/skeletons/". The info file is fetched alright, but the encoded skeleton endpoint is never hit. I have followed the same approach for the meshes and they are rendered fine! Just to be sure that the skeletons are correct, neuroglancer.skeleton.SkeletonSource works well with my skeletons.
Thanks,
Hashir
This is likely something that can be mentioned in the documentation somewhere, and I'm happy to do it. I'm thinking ahead to a situation with a k8s pod auto-scaling on top of an auto-scaling node pool with pre-emptible instances.
If I use pre-emptible nodes, which are much cheaper, will I run into any issues dropping messages as nodes come and go?
Would it work to put a horizontal pod scaler, like the following, in the deployment yaml? If so, what is a decent target CPU, based on prior experience?
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: igneous
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: igneous
  minReplicas: 1
  maxReplicas: 320
  targetCPUUtilizationPercentage: 80
The prefix strategy may not be necessary when there are few enough meshes. A simple directory scan would be enough. Maybe this could be an option or another task creation function?
I tried to downsample the volume 2x2x2 with a very simple Python script:
from taskqueue import LocalTaskQueue
import igneous.task_creation as tc

src_layer_path = 'file://output'
dest_layer_path = 'file://output2'

with LocalTaskQueue(parallel=8) as tq:
    tasks = tc.create_transfer_tasks(
        src_layer_path, dest_layer_path,
        chunk_size=(64,64,16), skip_downsamples=True, compress='gzip'
    )
    tq.insert_all(tasks)

    tasks = tc.create_downsampling_tasks(
        dest_layer_path, factor=(2,2,2), compress='gzip', num_mips=1
    )
    tq.insert_all(tasks)

print("Done!")
However, after the tasks finished I have a 4_4_4 folder (mip 0) and an 8_8_4 folder (mip 1), although the contents of the 8_8_4 folder are indeed 2x2x2 downsampled, so just the folder name is wrong. In addition, the generated info file is also incorrect:
{
"data_type": "uint8",
"num_channels": 1,
"scales": [
{
"chunk_sizes": [
[
64,
64,
16
]
],
"encoding": "raw",
"key": "4_4_4",
"resolution": [
4,
4,
4
],
"size": [
412,
914,
800
],
"voxel_offset": [
0,
0,
0
]
},
{
"chunk_sizes": [
[
64,
64,
16
]
],
"encoding": "raw",
"key": "8_8_4",
"resolution": [
8,
8,
4
],
"size": [
206,
457,
800
],
"voxel_offset": [
0,
0,
0
]
},
{
"chunk_sizes": [
[
64,
64,
16
]
],
"encoding": "raw",
"key": "16_16_4",
"resolution": [
16,
16,
4
],
"size": [
103,
229,
800
],
"voxel_offset": [
0,
0,
0
]
},
{
"chunk_sizes": [
[
64,
64,
16
]
],
"encoding": "raw",
"key": "8_8_8",
"resolution": [
8,
8,
8
],
"size": [
206,
457,
400
],
"voxel_offset": [
0,
0,
0
]
}
],
"type": "image"
}
It contains several scales that do not exist. After manually correcting the folder name to 8_8_8 and fixing the info file, the volume appears to be correct and can be visualized with Neuroglancer. All of my packages (Igneous, CloudVolume, etc.) are up-to-date as of right now.
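For what it's worth, a hedged sketch of patching the metadata in place rather than hand-editing the file, assuming only the downsampled scale's key/resolution need fixing and the extra scales should be dropped; commit_info() rewrites the info file:
from cloudvolume import CloudVolume

vol = CloudVolume('file://output2')
vol.info['scales'] = [s for s in vol.info['scales'] if s['key'] in ('4_4_4', '8_8_8')]
for s in vol.info['scales']:
    if s['key'] == '8_8_8':
        s['resolution'] = [8, 8, 8]  # match the renamed 8_8_8 directory
vol.commit_info()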
Also contrast normalization tasks.
We could conceivably add a third pass to the skeleton construction by using much higher mip levels to find the somata with a large enough context that it's nbd. We would then find a way to merge the resulting good skeletons with the mess that results from many smaller fields of view. @jabae had some thoughts about this.
Fixing the pinky100 skeletonization would be a great target.
\"/app/src/igneous/igneous/tasks/skeletonization.py\", line 18, in <module>"}
{"source":"unknown","time":"--/app/T::uwsgi.ini.000Z","severity":"error","message":" import kimimaro"}
{"source":"unknown","time":"--/app/T::uwsgi.ini.000Z","severity":"error","message":" File \"/usr/local/lib/python3.6/site-packages/kimimaro/__init__.py\", line 19, in <module>"}
{"source":"unknown","time":"--/app/T::uwsgi.ini.000Z","severity":"error","message":" from .postprocess import postprocess, join_close_components"}
{"source":"unknown","time":"--/app/T::uwsgi.ini.000Z","severity":"error","message":" File \"/usr/local/lib/python3.6/site-packages/kimimaro/postprocess.py\", line 36, in <module>"}
{"source":"unknown","time":"--/app/T::uwsgi.ini.000Z","severity":"error","message":" from cloudvolume import Skeleton, Bbox"}
{"source":"unknown","time":"--/app/T::uwsgi.ini.000Z","severity":"error","message":"ImportError: cannot import name 'Skeleton'"}
{"source":"unknown","time":"-```
Sometimes I would like to change parameters in the middle of a run. For example, I would like to switch from 2-core machines to 4-core machines and need to change the network file parameter, but the parameter was hard-coded in the tasks.
It would be nice to have this capability. It is not urgent though.
I have one application that installs/imports igneous just to get at the igneous.task_creation.create_transfer_tasks method. A separate cluster, based on the igneous Docker image, is responsible for handling tasks from several applications and performing transfer/downsampling for viewers.
In your README.md, you mention being amenable to breaking out pieces of code as separate libraries. Would the task handling code be a candidate for being a separate library?
I'd like to shed the dependencies on libraries like kimimaro/tinybrain/zmesh.
e.g. [4,4,0.4] breaks downsampling
We are planning to manually segment some data, and to reduce the amount of time for reconstruction we are going to segment a downsampled volume. The workflow I'm currently thinking is:
Raw EM data (4x4x4 nm) -> downsample to 16x16x16 nm -> segment at 16x16x16 nm -> upscale segmentation to 4x4x4 nm w/ nearest neighbor -> generate meshes at 4x4x4 nm -> combine upscaled segmentation and meshes with raw EM data and visualize in Neuroglancer
I'm hoping to use Igneous for many of these steps, and I have a few questions:
I saw that a new option, factor, is being added to the downsampling function. Does that mean I can directly downsample from 4x4x4 to 16x16x16 for the raw EM data?
I don't think I saw an upscaling function in Igneous, could you confirm that there isn't such a function?
Is it possible to directly generate meshes with 16x16x16 segmentation data and display them properly in the 4x4x4 dataset? (instead of upscaling and then generating meshes at 4x4x4)
Thank you very much for your help!
Hi Will, I've had trouble installing igneous's prerequisites – I'm sorry to bother you with something that has its roots in non-Seung-lab code, but Tommy recommended I post an issue here and thought you might be interested in it and able to help. Specifically, pip install deflate fails on Ubuntu 16.04.4 and CentOS Linux release 7.7.1908 (but succeeds on macOS 10.13.6 High Sierra). The full output is below. (Here I show it as resulting from pip install deflate, but it's the same thing when trying to pip install -r requirements.txt from a cloned igneous folder.)
Collecting deflate
Downloading deflate-0.1.0.tar.gz (140 kB)
|████████████████████████████████| 140 kB 6.2 MB/s
Building wheels for collected packages: deflate
Building wheel for deflate (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /home/jtm23/.virtualenvs/igneous/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-29a0i252/deflate/setup.py'"'"'; __file__='"'"'/tmp/pip-install-29a0i252/deflate/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-tzd1qja3
cwd: /tmp/pip-install-29a0i252/deflate/
Complete output (46 lines):
running bdist_wheel
running build
running build_ext
CC lib/deflate_decompress.o
CC lib/utils.o
CC lib/arm/cpu_features.o
CC lib/x86/cpu_features.o
CC lib/deflate_compress.o
CC lib/adler32.o
CC lib/zlib_decompress.o
CC lib/zlib_compress.o
CC lib/crc32.o
CC lib/gzip_decompress.o
CC lib/gzip_compress.o
AR libdeflate.a
CC lib/deflate_decompress.shlib.o
CC lib/utils.shlib.o
CC lib/arm/cpu_features.shlib.o
CC lib/x86/cpu_features.shlib.o
CC lib/deflate_compress.shlib.o
CC lib/adler32.shlib.o
CC lib/zlib_decompress.shlib.o
CC lib/zlib_compress.shlib.o
CC lib/crc32.shlib.o
CC lib/gzip_decompress.shlib.o
CC lib/gzip_compress.shlib.o
CCLD libdeflate.so.0
LN libdeflate.so
GEN programs/config.h
CC programs/gzip.o
CC programs/prog_util.o
CC programs/tgetopt.o
CCLD gzip
LN gunzip
building 'deflate' extension
creating build
creating build/temp.linux-x86_64-3.7
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -I/n/app/libffi/3.2.1/lib/libffi-3.2.1/include -I/n/app/libffi/3.2.1/lib/libffi-3.2.1/include -fPIC -I/n/app/python/3.7.4/include/python3.7m -c deflate.c -o build/temp.linux-x86_64-3.7/deflate.o
creating build/lib.linux-x86_64-3.7
gcc -pthread -shared -L/n/app/libffi/3.2.1/lib64 -L/n/app/libffi/3.2.1/lib64 build/temp.linux-x86_64-3.7/deflate.o libdeflate/libdeflate.a -L/n/app/python/3.7.4/lib -lpython3.7m -o build/lib.linux-x86_64-3.7/deflate.cpython-37m-x86_64-linux-gnu.so
/usr/bin/ld: libdeflate/libdeflate.a(deflate_decompress.o): relocation R_X86_64_32 against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: libdeflate/libdeflate.a(deflate_compress.o): relocation R_X86_64_32 against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: libdeflate/libdeflate.a(crc32.o): relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Failed building wheel for deflate
Running setup.py clean for deflate
Failed to build deflate
Installing collected packages: deflate
Running setup.py install for deflate ... error
ERROR: Command errored out with exit status 1:
command: /home/jtm23/.virtualenvs/igneous/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-29a0i252/deflate/setup.py'"'"'; __file__='"'"'/tmp/pip-install-29a0i252/deflate/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-gyn3qb3j/install-record.txt --single-version-externally-managed --compile --install-headers /home/jtm23/.virtualenvs/igneous/include/site/python3.7/deflate
cwd: /tmp/pip-install-29a0i252/deflate/
Complete output (15 lines):
running install
running build
running build_ext
building 'deflate' extension
creating build
creating build/temp.linux-x86_64-3.7
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -I/n/app/libffi/3.2.1/lib/libffi-3.2.1/include -I/n/app/libffi/3.2.1/lib/libffi-3.2.1/include -fPIC -I/n/app/python/3.7.4/include/python3.7m -c deflate.c -o build/temp.linux-x86_64-3.7/deflate.o
creating build/lib.linux-x86_64-3.7
gcc -pthread -shared -L/n/app/libffi/3.2.1/lib64 -L/n/app/libffi/3.2.1/lib64 build/temp.linux-x86_64-3.7/deflate.o libdeflate/libdeflate.a -L/n/app/python/3.7.4/lib -lpython3.7m -o build/lib.linux-x86_64-3.7/deflate.cpython-37m-x86_64-linux-gnu.so
/usr/bin/ld: libdeflate/libdeflate.a(deflate_decompress.o): relocation R_X86_64_32 against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: libdeflate/libdeflate.a(deflate_compress.o): relocation R_X86_64_32 against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: libdeflate/libdeflate.a(crc32.o): relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /home/jtm23/.virtualenvs/igneous/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-29a0i252/deflate/setup.py'"'"'; __file__='"'"'/tmp/pip-install-29a0i252/deflate/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-gyn3qb3j/install-record.txt --single-version-externally-managed --compile --install-headers /home/jtm23/.virtualenvs/igneous/include/site/python3.7/deflate Check the logs for full command output.
Looks like deflate tries to build libdeflate during installation. I tried cloning libdeflate and make-ing it, and that succeeded just fine. So it seems like an issue with how the deflate package tries to link the libdeflate C libraries. I don't know much about C/C++ or compilers so I'm a bit useless here, but I tried adding both extra_compile_args=["-fPIC"] and extra_link_args=["-fPIC"] to deflate/setup.py and then running python setup.py install on that updated script. That succeeded in inserting an -fPIC argument into the gcc command that gets run during install, but it didn't solve the problem and the same error message was output. That's as far as I got. Do you have any idea how to get past this install issue? I would love to start running igneous tasks but this is holding me up.
In playing around with the sharded format, I had a unit test fail when igneous downsampled a certain volume shape. I tracked it down: CloudVolume fails reading a tensorstore-generated volume.
The following demonstrates the issue on a scaled down volume.
import numpy as np
import cloudvolume
import tensorstore as ts
# Base array with ratios 4:4:1.
arr = np.arange(128, dtype=np.uint16).reshape((8, 8, 2, 1))
chunk_size = (2,2,2)
# Single shard with single minishard
sharding = {
"@type": "neuroglancer_uint64_sharded_v1",
"preshift_bits": 4,
"hash": "identity",
"minishard_bits": 0,
"shard_bits": 0,
"minishard_index_encoding": "gzip",
"data_encoding": "gzip",
}
# tensorstore written in sharded format
spec = {
"driver": "neuroglancer_precomputed",
"kvstore": {
"driver": "file",
"path": "/tmp/tensorstore",
},
"multiscale_metadata": {
"type": "image",
"data_type": arr.dtype.name,
"num_channels": arr.shape[3],
},
"scale_metadata": {
"size": arr.shape[:3],
"chunk_size": chunk_size,
"resolution": (1,1,1),
"encoding": "raw",
"sharding": sharding,
},
"create": True,
"delete_existing": True,
}
dataset_future = ts.open(spec)
ds = dataset_future.result()
ds[:] = arr
vol = cloudvolume.CloudVolume("file:///tmp/tensorstore")
vol[()]
EmptyVolumeException Traceback (most recent call last)
<ipython-input-6-b02c9221f2ec> in <module>
47
48 vol = cloudvolume.CloudVolume("file:///tmp/tensorstore")
---> 49 vol[()]
~/.local/share/virtualenvs/starmap-T47byR32/lib/python3.8/site-packages/cloudvolume/frontends/precomputed.py in __getitem__(self, slices)
527 requested_bbox = Bbox.from_slices(slices)
528
--> 529 img = self.download(requested_bbox, self.mip)
530 return img[::steps.x, ::steps.y, ::steps.z, channel_slice]
531
~/.local/share/virtualenvs/starmap-T47byR32/lib/python3.8/site-packages/cloudvolume/frontends/precomputed.py in download(self, bbox, mip, parallel, segids, preserve_zeros, agglomerate, timestamp, stop_layer, renumber)
575 parallel = self.parallel
576
--> 577 tup = self.image.download(bbox, mip, parallel=parallel, renumber=bool(renumber))
578 if renumber:
579 img, remap = tup
~/.local/share/virtualenvs/starmap-T47byR32/lib/python3.8/site-packages/cloudvolume/datasource/precomputed/image/__init__.py in download(self, bbox, mip, parallel, location, retain, use_shared_memory, use_file, order, renumber)
149
150 spec = sharding.ShardingSpecification.from_dict(scale['sharding'])
--> 151 return rx.download_sharded(
152 bbox, mip,
153 self.meta, self.cache, spec,
~/.local/share/virtualenvs/starmap-T47byR32/lib/python3.8/site-packages/cloudvolume/datasource/precomputed/image/rx.py in download_sharded(requested_bbox, mip, meta, cache, spec, compress, progress, fill_missing, order)
76 chunkdata = None
77 else:
---> 78 raise EmptyVolumeException(cutout_bbox)
79
80 img3d = decode(
EmptyVolumeException: Bbox([0, 4, 0],[2, 6, 2], dtype=int32)
The user email should be configurable by the user; it is hard-coded now. I am not sure how to change this, though.
https://github.com/seung-lab/igneous/blob/master/igneous/task_creation.py#L31
oauth2client has been deprecated in favor of google-auth and should be upgraded.
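A hedged sketch of the corresponding change, assuming the credentials come from the service-account JSON already used by CloudVolume (google.oauth2 supersedes the deprecated oauth2client API):
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    '/root/.cloudvolume/secrets/google-secret.json'
)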
I am trying to run GPU inference for a mitochondria map, but I get some errors; any ideas? @william-silversmith
I have tried upgrading taskqueue, but I still have this problem. I am using the chunkflow branch though; do I need to merge master?
(jwu) ~/workspace/igneous/igneous$ export SQS_URL=https://sqs.us-east-1.amazonaws.com/098703261575/jwu-igneous; export LEASE_SECONDS=3600; python task_execution.py --qurl https://sqs.us-east-1.amazonaws.com/098703261575/jwu-igneous --loop
Pulling from pull-queue://https://sqs.us-east-1.amazonaws.com/098703261575/jwu-igneous
raised 'NotImplementedError' object has no attribute 'lease'
Traceback (most recent call last):
File "task_execution.py", line 71, in execute
task = tq.lease(tag=tag, seconds=int(LEASE_SECONDS))
File "/usr/people/jingpeng/workspace/igneous/jwu/lib/python3.5/site-packages/taskqueue/taskqueue.py", line 152, in lease
tasks = self._api.lease(
AttributeError: 'NotImplementedError' object has no attribute 'lease'
undefined task
on host seungworkstation20
Traceback (most recent call last):
File "task_execution.py", line 94, in <module>
command()
File "/usr/people/jingpeng/workspace/igneous/jwu/lib/python3.5/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/usr/people/jingpeng/workspace/igneous/jwu/lib/python3.5/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/people/jingpeng/workspace/igneous/jwu/lib/python3.5/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/people/jingpeng/workspace/igneous/jwu/lib/python3.5/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "task_execution.py", line 34, in command
execute(tag, queue, server, qurl, loop)
File "task_execution.py", line 71, in execute
task = tq.lease(tag=tag, seconds=int(LEASE_SECONDS))
File "/usr/people/jingpeng/workspace/igneous/jwu/lib/python3.5/site-packages/taskqueue/taskqueue.py", line 152, in lease
tasks = self._api.lease(
AttributeError: 'NotImplementedError' object has no attribute 'lease'
With the current behavior, the delete task just keeps following the redirect key specified in the info file. That's definitely bad - if you want to delete a dataset, the path provided should be unambiguous. And it prevents deleting the dataset that's masked by the redirect. (We do have such datasets)
Opening this issue to determine what the best approach is.
I would just pass max_redirects=0 for create_deletion_task as well as DeleteTask.execute(), but maybe there is a better way?
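A rough sketch of the idea (layer_path here is hypothetical; the point is only that CloudVolume accepts max_redirects):
from cloudvolume import CloudVolume

# Hypothetical: inside the deletion task, open the volume without following the
# "redirect" key in the info file so the given path is deleted, not its target.
vol = CloudVolume(layer_path, max_redirects=0)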
In the Dockerfile, all of the Boost libraries were installed, which is pretty big.
Which library needs this?
Maybe we can downsize to a few Boost packages?