sage-bionetworks / synapsepythonclient

Programmatic interface to Synapse services for Python

Home Page: https://www.synapse.org

License: Apache License 2.0

Python 99.97% Dockerfile 0.02% Shell 0.01%

synapsepythonclient's Issues

Let the user specify the number of allowed threads

Operating system

Any

Client version

2.4.0

Description of the problem

synapseclient spawns too many computational threads.

Relevant lines of the code
synapseclient/client.py:from synapseclient.core.pool_provider import DEFAULT_NUM_THREADS
synapseclient/client.py: 'max_threads': DEFAULT_NUM_THREADS,
synapseclient/core/upload/multipart_upload.py: max_threads = pool_provider.DEFAULT_NUM_THREADS
synapseclient/core/pool_provider.py:DEFAULT_NUM_THREADS = multiprocessing.cpu_count() + 4

cpu_count() + 4 can spawn hundreds of threads on a cluster compute node, causing time slicing even when the code runs in an environment with only a single CPU core available to it. As a result, most threads are blocked or run at a fraction of a percent of a CPU core.

Expected behavior

A synapseclient.Synapse attribute to set the number of threads, or allowing pool_provider to read the thread count from an environment variable, would help with this issue.

Actual behavior

cpu_count() + 4 can spawn hundreds of threads on a cluster compute node, causing time slicing even when the code runs in an environment with only a single CPU core available to it. As a result, most threads are blocked or run at a fraction of a percent of a CPU core.
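The environment-variable idea could be sketched as follows. This is a hypothetical patch, not an existing client setting; in particular, the variable name SYNAPSE_MAX_THREADS is an assumption for illustration.

```python
import multiprocessing
import os

def default_num_threads():
    # Hypothetical override: SYNAPSE_MAX_THREADS is an assumed variable
    # name, not something the client currently reads.
    override = os.environ.get("SYNAPSE_MAX_THREADS")
    if override is not None:
        return max(1, int(override))
    # Current behavior from synapseclient/core/pool_provider.py
    return multiprocessing.cpu_count() + 4
```

On a constrained cluster node the job script could then export SYNAPSE_MAX_THREADS=2 without touching any code.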

Certificate has no subjectAltName, falling back to check for a commonName for now

Upgrading to synapseclient 1.6.1, I am now getting the following warning multiple times:

/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:337: SubjectAltNameWarning: Certificate for file-prod.prod.sagebase.org has no `subjectAltName` , falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)

No path found for syn.get

Using python 2.7, I am able to get the example matrix of syn1901033, but when I use an actual SynapseID (syn5511449 in this case), I receive the error:

## retrieve a 100 by 4 matrix
matrix = syn.get('syn5511449')

## inspect its properties
print(matrix.name)
print(matrix.description)
print(matrix.path)

## load the data matrix into a dictionary with an entry for each column
with open(matrix.path, 'r') as f:
    labels = f.readline().strip().split('\t')
    data = {label: [] for label in labels}
    for line in f:
        values = [float(x) for x in line.strip().split('\t')]
        for i in range(len(labels)):
            data[labels[i]].append(values[i])

Walking Activity
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-43-88b0cdf2c38e> in <module>()
      4 ## inspect its properties
      5 print(matrix.name)
----> 6 print(matrix.description)
      7 print(matrix.path)
      8 

/Users/ajenkins/anaconda/lib/python2.7/site-packages/synapseclient/entity.pyc in __getattr__(self, key)
    366             ## about what exceptions it catches. In Python3, hasattr catches
    367             ## only AttributeError
--> 368             raise AttributeError(key)
    369 
    370 

AttributeError: description

Is there a reason why, when I use an actual Synapse ID, I am not able to get a path?

Data are uploaded in duplicate if rows are added and the schema changes simultaneously

As discussed in the RTI/Synapse call, we are seeing duplicated rows in data uploaded to the server while uploading batched data to Synapse. For each table, we load flattened JSON data into a pandas dataframe; after every 100 records are processed, the data is saved to Synapse by calling store(). The issue arises after the initial upload of the table, when rows are added and the schema changes during the second upload.

To reproduce the issue, run the dup_test.py file in our github repo: synapse-span-table

Operating system

  • Docker image (Ubuntu Linux) running on AWS or OSX

Client version

2.3.1

Slow uploads of data with single records

As discussed in the RTI/Synapse call:

We are using Synapse tables for the storage and curation of data for a multi-site study. Our data lives in a document data store as JSON files. We process the data and flatten it into a data table structure for upload to Synapse. Most of the documents have many entries, which creates more than 152 columns. We wrote a Python module that splits the data into 152-column sections and uploads the data to Synapse in columns of type STRING, 50 characters in length.

We are processing documents one at a time as they are received in the document store database. Even when only one row is being uploaded, we see long delays in the API call (multiple seconds in most cases). With more than 120,000 documents to process, our upload strategy became untenable as the processing time approached a month.

To reproduce the issue, run the python3 test.py in our synapse-span-table module.

Is there any improvement to our use of the API that you would suggest to speed up the process?

We understand that Synapse is mainly used and optimized for uploading batched records, but we have run into issues with that strategy as well (see Issue 867).
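The column-splitting step described above can be sketched as a pure function (split_columns is an illustrative name, not part of the synapse-span-table module):

```python
# Illustrative sketch: break one flattened record into chunks of at most
# 152 columns, one chunk per Synapse table section.
def split_columns(record, chunk_size=152):
    keys = sorted(record)
    return [{k: record[k] for k in keys[i:i + chunk_size]}
            for i in range(0, len(keys), chunk_size)]
```

A record with 400 flattened fields would become three sections of 152, 152, and 96 columns.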

support custom Session objects (feature requests)

I have a feature request related to my specific use case. I have a large Synapse project where the files themselves are hosted on Google Drive; the files on Synapse are direct link-outs. Unfortunately, Google Drive caps direct downloads for files over 50MB, instead redirecting the user to a download link with a random confirmation code in the URL's query string.

Therefore, the basic URL request doesn't quite work for me. I need to stream the file, extract the confirmation code, and make a second request while retaining the cookie from the original request.

I can do all of this in a custom get method (see my gist for a derivative request.Session class) but I need a way of getting this object into a Synapse client object. See PR #713 for a simple example.
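The confirmation-code step could be sketched as a small helper. This is illustrative only: extract_confirm_token is a hypothetical name, and the exact query-string format Google Drive emits may differ.

```python
import re

# Hypothetical helper: pull the "confirm" token out of Google Drive's
# interstitial page or redirect URL, so a second request can be issued
# while reusing the cookies from the first response.
def extract_confirm_token(text):
    m = re.search(r"confirm=([0-9A-Za-z_-]+)", text)
    return m.group(1) if m else None
```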

Example usage:

import synapseclient
import synapseutils
from gdrivesession import GDriveSession

session = GDriveSession()
syn = synapseclient.Synapse(session=session)
syn.login()
files = synapseutils.syncFromSynapse(syn, "syn20844101")

I understand that overriding the get methods of requests might expose the user to security issues or simple user error, but perhaps this is of general interest, since there are many URLs that are not simply open on the web. At the very least you might want some type/integrity checking on the session object.

Thank you for the consideration.

downloadTableFile behaviour change in 1.7.1

Hi, I've been using the above method to download mPower files for some time. I just upgraded to synapseclient 1.7.1 (from 1.6.2) and things broke. The method no longer returns a dict; it returns a string (the path) instead. Also, if you specify a downloadLocation of "." as per the docs, it fails with 'cannot find path ""'. If you leave downloadLocation out, it defaults to the cache, as you'd expect. Both are fairly minor; perhaps just a doc update is required?

Implement a verbose mode

I'm trying to download a large file and I can't tell if it's going successfully or not. It would be great to get more diagnostic information from the Synapse client to confirm that the download has begun and, ideally, progress information as well.

download speed, unnecessary REST calls

Bug Report

Operating system

ubuntu 18.04
4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Client version

Output of:

import synapseclient
synapseclient.__version__

'1.9.3'

Description of the problem

I am trying to download a Synapse project via synapseutils.syncFromSynapse, but progress is very slow. The project contains many subfolders (~10k) with 3 files per folder. The download speed itself is not the problem; rather, a REST API request seems to be made per file in Synapse::getProvenance.
This function is called in every recursive invocation of synapseutils.syncFromSynapse on all members of the allFiles array, where allFiles contains all previously processed files.
At t = ~100-200 ms per REST call, re-processing every earlier file on each invocation adds up to roughly t * (1 + 2 + ... + n) for n files, i.e. quadratic growth in the number of files.

Expected behavior

Faster download, do not repeat REST request for all files.

Actual behavior


Is there some fast workaround?
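One possible client-side workaround (a sketch, not an official fix): memoize provenance lookups per entity id, so repeated passes over allFiles cost a dict hit instead of a REST call. make_cached_provenance is an illustrative wrapper name.

```python
# Illustrative memoization wrapper: remembers provenance per entity id so
# a second lookup for the same id does not re-issue the REST request.
def make_cached_provenance(syn):
    cache = {}
    def cached_provenance(entity_id):
        if entity_id not in cache:
            cache[entity_id] = syn.getProvenance(entity_id)
        return cache[entity_id]
    return cached_provenance
```

With n files this turns the quadratic re-lookup pattern into n REST calls total.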

Add unit test for synapseclient.core.utils#printTransferProgress

Issue for use with weekly Code Review:

def printTransferProgress(transferred, toBeTransferred, prefix='', postfix='', isBytes=True, dt=None,
                          previouslyTransferred=0):
    """Prints a progress bar
    :param transferred:             a number of items/bytes completed
    :param toBeTransferred:         total number of items/bytes when completed
    :param prefix:                  String printed before progress bar
    :param postfix:                 String printed after progress bar
    :param isBytes:                 A boolean indicating whether to convert bytes to kB, MB, GB etc.
    :param dt:                      The time in seconds that has passed since transfer started is used to calculate rate
    :param previouslyTransferred:   the number of bytes that were already transferred before this transfer began
                                    (e.g. someone ctrl+c'd out of an upload and restarted it later)
    """
    if not sys.stdout.isatty():
        return
    barLength = 20  # Modify this to change the length of the progress bar
    status = ''
    rate = ''
    if dt is not None and dt != 0:
        rate = (transferred - previouslyTransferred)/float(dt)
        rate = '(%s/s)' % humanizeBytes(rate) if isBytes else rate
    if toBeTransferred < 0:
        defaultToBeTransferred = (barLength*1*MB)
        if transferred > defaultToBeTransferred:
            progress = float(transferred % defaultToBeTransferred) / defaultToBeTransferred
        else:
            progress = float(transferred) / defaultToBeTransferred
    elif toBeTransferred == 0:  # There is nothing to be transferred
        progress = 1
        status = "Done...\n"
    else:
        progress = float(transferred) / toBeTransferred
        if progress >= 1:
            progress = 1
            status = "Done...\n"
    block = int(round(barLength*progress))
    nbytes = humanizeBytes(transferred) if isBytes else transferred
    if toBeTransferred > 0:
        outOf = "/%s" % (humanizeBytes(toBeTransferred) if isBytes else toBeTransferred)
        percentage = "%4.2f%%" % (progress*100)
    else:
        outOf = ""
        percentage = ""
    text = "\r%s [%s]%s   %s%s %s %s %s    " % (prefix,
                                                "#"*block + "-"*(barLength-block),
                                                percentage,
                                                nbytes, outOf, rate,
                                                postfix, status)
    sys.stdout.write(text)
    sys.stdout.flush()

https://github.com/Sage-Bionetworks/synapsePythonClient/blob/develop/synapseclient/core/utils.py#L596

How to list a folder?

I'm not sure how to list a folder. Am I missing something obvious? Seems like along with get/store, list is one of the most important file system actions.

I dug into the code and found _list(), but it's too complicated for me to understand.
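For reference, syn.getChildren pages through the direct children of a container and yields plain dicts; a thin wrapper over it gives an ls-like listing (list_folder is an illustrative name, and availability of getChildren depends on your client version):

```python
# Illustrative "ls" for a Synapse container: getChildren yields dicts with
# at least 'id' and 'name' for each direct child of the given parent.
def list_folder(syn, parent_id):
    return [(child["id"], child["name"]) for child in syn.getChildren(parent_id)]

# Usage sketch (assuming a logged-in client):
#   import synapseclient
#   syn = synapseclient.login()
#   for syn_id, name in list_folder(syn, "syn123456"):
#       print(syn_id, name)
```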

Allow overriding the cache location via Synapse() constructor

Bug Report

Operating system

macOS

Client version

2.1.0

Description of the problem


I would like to download one dataset to one directory, and a second dataset to a different directory, but these need to be downloaded using the syn.downloadTableColumns function.

This function automatically downloads to the cache location, so this is not possible without updating the config file in between.

It would be great if the cache location could be set in the Synapse() constructor directly.

Developer interested in helping our project

Hello, I am a college student interested in helping with the project. I found this project via the Mozilla website; where do I start if I want to help improve it?

Slow synapse get on large projects

On large projects with many files, the synapse get command is very slow.
One suggestion for speeding it up: use multi-threading / multi-processing to make the 'get' calls concurrent.

A simple patch that suggests one way to do this is pasted below (sorry, for some reason GitHub wouldn't let me upload it; just save it to a txt file and apply):

From d6ae4c2 Mon Sep 17 00:00:00 2001
From: fidlr [email protected]
Date: Tue, 18 Oct 2016 10:35:51 +0300
Subject: [PATCH] multi-threaded get

 synapseutils/sync.py | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/synapseutils/sync.py b/synapseutils/sync.py
index dfbfab8..de399a8 100644
--- a/synapseutils/sync.py
+++ b/synapseutils/sync.py
@@ -2,6 +2,13 @@ import errno
 from synapseclient.entity import is_container
 from synapseclient.utils import id_of
 import os
+from concurrent.futures import ThreadPoolExecutor
+
+pool = ThreadPoolExecutor(max_workers=3)  # Synapse allows up to 3 concurrent requests
+
+def getOneEntity(syn, entity_id, downloadLocation, ifcollision, allFilesList):
+    ent = syn.get(entity_id, downloadLocation=downloadLocation, ifcollision=ifcollision)
+    allFilesList.append(ent)  # list.append is thread-safe

 def syncFromSynapse(syn, entity, path=None, ifcollision='overwrite.local', allFiles = None):
@@ -36,7 +43,11 @@ def syncFromSynapse(syn, entity, path=None, ifcollision='overwrite.local', allFi
     for f in entities:
         print(f.path)
     """
-    if allFiles is None: allFiles = list()
+    global pool
+    wait_at_finish = False
+    if allFiles is None:  # initial call
+        allFiles = list()
+        wait_at_finish = True
     id = id_of(entity)
     results = syn.chunkedQuery("select id, name, nodeType from entity where entity.parentId=='%s'" % id)
     for result in results:
@@ -53,6 +64,12 @@ def syncFromSynapse(syn, entity, path=None, ifcollision='overwrite.local', allFi
             new_path = None
             syncFromSynapse(syn, result['entity.id'], new_path, ifcollision, allFiles)
         else:
-            ent = syn.get(result['entity.id'], downloadLocation = path, ifcollision = ifcollision)
-            allFiles.append(ent)
+            # use multi-threaded get function
+            pool.submit(getOneEntity, syn, result['entity.id'], path, ifcollision, allFiles)
+            # ent = syn.get(result['entity.id'], downloadLocation = path, ifcollision = ifcollision)
+            # allFiles.append(ent)
+    if wait_at_finish:
+        pool.shutdown(wait=True)  # wait till all objects were downloaded before returning

     return allFiles

    2.7.4

import error

Bug Report

Operating system

Ubuntu 14.04/18.04

Client version

Python 3.7.4
synapseclient 1.9.3

Description of the problem

>>> import synapseclient
ImportError: cannot import name 'csv' from 'backports' (/app/easybuild/software/Python/3.7.4-foss-2016b-fh1/lib/python3.7/site-packages/backports/__init__.py)

Why use backports with Python 3.x ?

Have an exclusion/inclusion list for syncFromSynapse (feature request)

Apologies if this is already possible, but I could not find it in the documentation.

When using syncFromSynapse() you can not exclude files from download. For example, I do not want to download the *.bam files. It would be great if there was a parameter for syncFromSynapse() with an exclude (or include) list of files.
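Until such a parameter exists, one client-side sketch: decide per child whether to skip it with fnmatch-style patterns, then fetch only the survivors. should_skip is an illustrative helper, not part of synapseutils.

```python
import fnmatch

# Illustrative exclude filter for a sync-like download loop.
def should_skip(name, exclude_patterns):
    return any(fnmatch.fnmatch(name, pat) for pat in exclude_patterns)

# Usage sketch (assuming a logged-in client `syn` with getChildren):
#   for child in syn.getChildren("synFolderId"):
#       if should_skip(child["name"], ["*.bam"]):
#           continue
#       syn.get(child["id"], downloadLocation=".")
```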

Downloading files/folders from synapse with additional jamboree credentials fails

Bug Report

Operating system

MacOS Catalina version 10.15.7

Client version

Output of:

import synapseclient
synapseclient.__version__

'2.2.0'

Description of the problem

Downloading files/folders from synapse with additional jamboree credentials fails.

I am trying to download a folder from synapse, where if I were to do it manually I would click to download each file and then supply my jamboree access key & secret key. I was hoping to do this with the python client because there are a lot of files, but the python client never prompts me for the jamboree keys. Instead each file download silently fails, resulting in an empty list of files.

import synapseclient
import synapseutils 
 
syn = synapseclient.Synapse() 
syn.login('synapse_username','password') 
files = synapseutils.syncFromSynapse(syn, 'synID')

After running this I don't get any errors, but files is empty

Expected behavior

I expected the files in the folder associated with the synapse ID to be downloaded

Actual behavior

No error, but also no successful downloads.

>>> files
[]

SSLError when trying to create a Synapse instance

Dear all,

I've had the following issue for a couple of days now when trying to create a Synapse client with:

import synapseclient
s = synapseclient.Synapse()

I've tried with the newest released package of synapseclient from PyPI.

Here is the error message.
Thanks a lot
Thomas


SSLError Traceback (most recent call last)
in <module>()
----> 1 s = synapseclient.Synapse(debug=True)

/home/cokelaer/Work/virtualenv/lib/python2.7/site-packages/synapseclient/client.pyc in __init__(self, repoEndpoint, authEndpoint, fileHandleEndpoint, portalEndpoint, debug, skip_checks)
149 raise
150
--> 151 self.setEndpoints(repoEndpoint, authEndpoint, fileHandleEndpoint, portalEndpoint, skip_checks)
152
153 ## TODO: rename to defaultHeaders ?

/home/cokelaer/Work/virtualenv/lib/python2.7/site-packages/synapseclient/client.pyc in setEndpoints(self, repoEndpoint, authEndpoint, fileHandleEndpoint, portalEndpoint, skip_checks)
206 # Update endpoints if we get redirected
207 if not skip_checks:
--> 208 response = requests.get(endpoints[point], allow_redirects=False, headers=synapseclient.USER_AGENT)
209 if response.status_code == 301:
210 endpoints[point] = response.headers['location']

/home/cokelaer/Work/virtualenv/lib/python2.7/site-packages/requests/api.pyc in get(url, **kwargs)
     53
     54     kwargs.setdefault('allow_redirects', True)
---> 55     return request('get', url, **kwargs)
     56
     57

/home/cokelaer/Work/virtualenv/lib/python2.7/site-packages/requests/api.pyc in request(method, url, **kwargs)
     42
     43     session = sessions.Session()
---> 44     return session.request(method=method, url=url, **kwargs)
     45
     46

/home/cokelaer/Work/virtualenv/lib/python2.7/site-packages/requests/sessions.pyc in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert)
    359         'allow_redirects': allow_redirects,
    360     }
--> 361     resp = self.send(prep, **send_kwargs)
    362
    363     return resp

/home/cokelaer/Work/virtualenv/lib/python2.7/site-packages/requests/sessions.pyc in send(self, request, **kwargs)
    462     start = datetime.utcnow()
    463     # Send the request
--> 464     r = adapter.send(request, **kwargs)
    465     # Total elapsed time of the request (approximately)
    466     r.elapsed = datetime.utcnow() - start

/home/cokelaer/Work/virtualenv/lib/python2.7/site-packages/requests/adapters.pyc in send(self, request, stream, timeout, verify, cert, proxies)
361 except (_SSLError, _HTTPError) as e:
362 if isinstance(e, _SSLError):
--> 363 raise SSLError(e)
364 elif isinstance(e, TimeoutError):
365 raise Timeout(e)

SSLError: [Errno 1] _ssl.c:504: error:100AE081:elliptic curve routines:EC_GROUP_new_by_curve_name:unknown group

Re-uploads a file when there are no changes

Bug Report

Operating system

Ubuntu Linux 18.04

Client version

1.9.2

Description of the problem

If a file already exists in a Project, it will be uploaded again even though it has not changed. If you upload a second time, it works as expected (it doesn't upload the file again).

Repro. Steps:

  • Create a new Project.
  • Upload a file to the Project through the Synapse website.
  • Do not make any changes to the local or remote file...
  • Re-upload the file with synapse add --parentid syn123456 test_file.txt. The file will be uploaded.

Expected behavior

  • The file will NOT be uploaded since it has not changed.

Actual behavior

  • The file is uploaded even though no change was made to it.

pip install dev does not work

sudo pip install git+https://github.com/Sage-Bionetworks/synapsePythonClient.git@develop
Downloading/unpacking git+https://github.com/Sage-Bionetworks/synapsePythonClient.git@develop
  Cloning https://github.com/Sage-Bionetworks/synapsePythonClient.git (to develop) to /tmp/pip-ouux3d-build
  Running setup.py (path:/tmp/pip-ouux3d-build/setup.py) egg_info for package from git+https://github.com/Sage-Bionetworks/synapsePythonClient.git@develop

Requirement already satisfied (use --upgrade to upgrade): requests>=1.2 in /usr/lib/python2.7/dist-packages (from synapseclient==1.5.2.dev1)
Requirement already satisfied (use --upgrade to upgrade): six in /usr/lib/python2.7/dist-packages (from synapseclient==1.5.2.dev1)
Downloading/unpacking future (from synapseclient==1.5.2.dev1)
  Downloading future-0.15.2.tar.gz (1.6MB): 1.6MB downloaded
  Running setup.py (path:/tmp/pip_build_root/future/setup.py) egg_info for package future

    warning: no files found matching '*.au' under directory 'tests'
    warning: no files found matching '*.gif' under directory 'tests'
    warning: no files found matching '*.txt' under directory 'tests'
Downloading/unpacking backports.csv (from synapseclient==1.5.2.dev1)
  Downloading backports.csv-1.0.1-py2.py3-none-any.whl
Installing collected packages: future, backports.csv, synapseclient
  Running setup.py install for future

    warning: no files found matching '*.au' under directory 'tests'
    warning: no files found matching '*.gif' under directory 'tests'
    warning: no files found matching '*.txt' under directory 'tests'
    Installing pasteurize script to /usr/local/bin
    Installing futurize script to /usr/local/bin
  Running setup.py install for synapseclient

    Installing synapse script to /usr/local/bin
Successfully installed future backports.csv synapseclient
Cleaning up...

I suspect it is mainly because of this line: Running setup.py install for synapseclient. Even when pip clones the develop branch, python setup.py install will not install the dev branch; you must run python setup.py develop to install the dev branch.

Connecting to Synapse documentation

Client version

2.1.1

Description of the problem

Use of the API key is buried in the reference documentation. Recommend documenting API key usage in Connecting to Synapse; users will arrive there first, before searching the reference documentation.

synapseutils.syncFromSynapse fails on empty folder

syncFromSynapse throws ValueError: The provided id: synMyFolderId is was neither a container nor a File when it hits an empty folder.

Folder Structure:

-Folder-1
  -Folder-2
  -some-file.txt

synapseutils.syncFromSynapse(syn, 'Folder-1-id')

No matrix.path

When I type:

print(matrix.path)

it returns an error. When I search for a .path attribute, there is none in the package. Is the documentation incorrect?

cache.py:retrieve_local_file_info:'file' variable not defined.

In cache.py, the function retrieve_local_file_info has `if file is not None` as part of a condition. In Python 2, file is a built-in function (and therefore never None), but the built-in was removed in Python 3, and I guess the author didn't mean to check the built-in here. Should it be removed from the if clause?
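To illustrate the concern: in Python 2 the built-in file is always truthy, so the check is a no-op there, and in Python 3 the name does not exist at all. A sketch of the presumably intended guard, testing the looked-up path rather than the built-in (has_usable_local_copy is an illustrative name, not the cache.py function):

```python
import os

# Sketch of the presumably intended condition: check the *path* that was
# looked up, not the Python 2 built-in `file` (which is never None, and
# which no longer exists in Python 3).
def has_usable_local_copy(file_path):
    return file_path is not None and os.path.isfile(file_path)
```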

Make CACHE_ROOT_DIR customizable

On AWS Lambda you can only write to /tmp. We need a way to change the CACHE_ROOT_DIR.

My current workaround for this is:

import os
import tempfile
import synapseclient.cache

synapseclient.cache.CACHE_ROOT_DIR = os.path.join(tempfile.gettempdir(), 'synapseCache')

from backports import csv

csv is available natively in Python 2 and 3.
Can you change from backports import csv
to import csv?

[GCC 5.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import synapseclient
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/app/easybuild/software/Python/3.6.4-foss-2016b-fh1/lib/python3.6/site-packages/synapseclient-1.7.3-py3.6.egg/synapseclient/__init__.py", line 308, in <module>
    from .client import Synapse, login
  File "/app/easybuild/software/Python/3.6.4-foss-2016b-fh1/lib/python3.6/site-packages/synapseclient-1.7.3-py3.6.egg/synapseclient/client.py", line 86, in <module>
    from .table import Schema, Column, TableQueryResult, CsvFileTable
  File "/app/easybuild/software/Python/3.6.4-foss-2016b-fh1/lib/python3.6/site-packages/synapseclient-1.7.3-py3.6.egg/synapseclient/table.py", line 276, in <module>
    from backports import csv
ImportError: cannot import name 'csv'
>>> import csv
>>> from backports import csv
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'csv'
>>>
[GCC 5.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from backpots import csv
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named backpots
>>> import csv
>>>
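A version-guarded import would keep Python 2 support while avoiding the broken backports namespace on Python 3 (a sketch; the client may prefer to drop the dependency entirely):

```python
import sys

# On Python 3 the stdlib csv module handles unicode natively, so the
# backports.csv shim is only needed on Python 2.
if sys.version_info[0] >= 3:
    import csv
else:
    from backports import csv  # Python 2 only
```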

SynapseFileCacheError on Ubuntu Server edition

Bug Report

Operating system

Ubuntu Server 18.04

Client version

1.9.1

Description of the problem

Something mime-related happens when attempting to store data on Synapse. Is this a dependency issue on the Ubuntu Server edition? On the desktop editions of Ubuntu 18.04 or Debian 9 the issue is absent with the same synapseclient version and the same file to upload.

In [1] import synapseclient
In [2] syn = synapseclient.login()
In [3] f ="/path/to/file"
In [4] syn.store(synapseclient.File(f, parent = "syn17931318"))

##################################################
 Uploading file to Synapse storage
##################################################

---------------------------------------------------------------------------
SynapseFileCacheError                     Traceback (most recent call last)
<ipython-input-4-cbd9cbaa63f3> in <module>
----> 1 syn.store(synapseclient.File(f, parent = "syn17931318"))

~/.local/lib/python3.6/site-packages/synapseclient/client.py in store(self, obj, **kwargs)
    969                                                 md5=local_state_fh.get('contentMd5'),
    970                                                 file_size=local_state_fh.get('contentSize'),
--> 971                                                 mimetype=local_state_fh.get('contentType'))
    972                 properties['dataFileHandleId'] = fileHandle['id']
    973                 local_state['_file_handle'] = fileHandle

~/.local/lib/python3.6/site-packages/synapseclient/upload_functions.py in upload_file_handle(syn, parent_entity, path, synapseStore, md5, file_size, mimetype)
     65         syn.logger.info('\n' + '#' * 50 + '\n Uploading file to ' + storageString + ' storage \n' + '#' * 50 + '\n')
     66
---> 67         return upload_synapse_s3(syn, expanded_upload_path, location['storageLocationId'], mimetype=mimetype)
     68     # external file handle (sftp)
     69     elif upload_destination_type == concrete_types.EXTERNAL_UPLOAD_DESTINATION:

~/.local/lib/python3.6/site-packages/synapseclient/upload_functions.py in upload_synapse_s3(syn, file_path, storageLocationId, mimetype)
    125 def upload_synapse_s3(syn, file_path, storageLocationId=None, mimetype=None):
    126     file_handle_id = multipart_upload(syn, file_path, contentType=mimetype, storageLocationId=storageLocationId)
--> 127     syn.cache.add(file_handle_id, file_path)
    128
    129     return syn._getFileHandle(file_handle_id)

~/.local/lib/python3.6/site-packages/synapseclient/cache.py in add(self, file_handle_id, path)
    218
    219         cache_dir = self.get_cache_dir(file_handle_id)
--> 220         with Lock(self.cache_map_file_name, dir=cache_dir):
    221             cache_map = self._read_cache_map(cache_dir)
    222

~/.local/lib/python3.6/site-packages/synapseclient/lock.py in __enter__(self)
     97     # Make the lock object a Context Manager
     98     def __enter__(self):
---> 99         self.blocking_acquire()
    100
    101     def __exit__(self, exc_type, exc_value, traceback):

~/.local/lib/python3.6/site-packages/synapseclient/lock.py in blocking_acquire(self, timeout, break_old_locks)
     83         if not lock_acquired:
     84             raise SynapseFileCacheError("Could not obtain a lock on the file cache within timeout: %s  "
---> 85                                         "Please try again later" % str(timeout))
     86
     87     def release(self):

SynapseFileCacheError: Could not obtain a lock on the file cache within timeout: 0:01:10  Please try again later

KeyError when trying to download using Synapse Client 1.8.1

I am trying to download syn3157325 using files = synapseutils.syncFromSynapse(syn, 'syn3157325', path = 'ROSMAP/'). I get the error:

Traceback (most recent call last):
  File "download_data.py", line 32, in <module>
    files = synapseutils.syncFromSynapse(syn, 'syn3157325', path = 'ROSMAP/')
  File "/apps/software/Python/3.6.3-foss-2015b/lib/python3.6/site-packages/synapseutils/sync.py", line 104, in syncFromSynapse
    generateManifest(syn, allFiles, filename)
  File "/apps/software/Python/3.6.3-foss-2015b/lib/python3.6/site-packages/synapseutils/sync.py", line 116, in generateManifest
    keys, data = _extract_file_entity_metadata(syn, allFiles)
  File "/apps/software/Python/3.6.3-foss-2015b/lib/python3.6/site-packages/synapseutils/sync.py", line 135, in _extract_file_entity_metadata
    row.update(_get_file_entity_provenance_dict(syn, entity))
  File "/apps/software/Python/3.6.3-foss-2015b/lib/python3.6/site-packages/synapseutils/sync.py", line 152, in _get_file_entity_provenance_dict
    'executed' : ';'.join(prov._getExecutedStringList()),
  File "/apps/software/Python/3.6.3-foss-2015b/lib/python3.6/site-packages/synapseclient/activity.py", line 339, in _getExecutedStringList
    return self._getStringList(wasExecuted=True)
  File "/apps/software/Python/3.6.3-foss-2015b/lib/python3.6/site-packages/synapseclient/activity.py", line 329, in _getStringList
    usedList.append(source['name'])

Printing the source being appended:

{'wasExecuted': True, 'concreteType': 'org.sagebionetworks.repo.model.provenance.UsedURL', 'url': 'https://github.com/Sage-Bionetworks/ampAdScripts/blob/master/Broad-Rush/migrateROSMAPGenotypesFeb2015.R'}

I put a try/except around usedList.append(source['name']); as far as I can double-check, it allowed me to download all the data correctly.
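The underlying problem is that a UsedURL provenance record carries a 'url' (and only sometimes a 'name'), while the code assumes 'name' is always present. A defensive sketch (display_name is an illustrative helper, not the client's actual fix):

```python
# Illustrative fallback: prefer 'name', fall back to 'url', so UsedURL
# provenance records without a 'name' key no longer raise KeyError.
def display_name(source):
    return source.get("name") or source.get("url") or "<unknown>"
```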

UnicodeDecodeError on special characters when storing file

Bug Report

Operating system

Ubuntu 18.04

Client version

1.9.2

Description of the problem

Throws exception when uploading a file where the file path contains special characters.

This is a blocking issue for us.

Repro Script:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import synapseclient

filename = "TestûTest.txt"

with open(filename, mode='w') as f:
    f.write('test text')

syn = synapseclient.Synapse()
syn.login()
syn.store(synapseclient.File(path=filename, parent="syn18521874"))

Expected behavior

Does not error. Uploads file.

Actual behavior

Throws exception. Does not upload file.

Traceback (most recent call last):
  File "./bug.py", line 14, in <module>
    syn.store(synapseclient.File(path=filename, parent="syn18521874"))
  File "/home/user/source/.venv/local/lib/python2.7/site-packages/synapseclient/entity.py", line 578, in __init__
    kwargs['name'] = utils.guess_file_name(path)
  File "/home/user/source/.venv/local/lib/python2.7/site-packages/synapseclient/utils.py", line 243, in guess_file_name
    tokens = [x for x in path.split('/') if x != '']
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 62: ordinal not in range(128)
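A workaround in the meantime is to hand the client an explicit unicode path, so Python 2's implicit ascii decode never runs (the `u''` literal is also valid Python 3 syntax). For illustration, a unicode-safe version of the split that fails in `utils.guess_file_name` might look like this (a sketch, not the library's actual implementation):

```python
# -*- coding: utf-8 -*-

# Workaround: pass a unicode path to synapseclient.File under Python 2.
filename = u"TestûTest.txt"

def guess_file_name(path):
    # Unicode-safe sketch of the failing split in utils.guess_file_name:
    # works because no bytes-to-ascii conversion is triggered.
    tokens = [x for x in path.replace(u"\\", u"/").split(u"/") if x != u""]
    return tokens[-1] if tokens else path
```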

synapseutils.sync.syncFromSynapse throws error when syncing a Table object

There appears to be a bug in synapseutils.sync.syncFromSynapse. I am attempting to sync the 'Wondrous Research Example' (syn1901847) to my local filesystem. The syncFromSynapse function is throwing this error:

Traceback (most recent call last):
  ...
  File "import_synapse.py", line 55, in import_synapse_files
    synapseutils.sync.syncFromSynapse(synapse_client, syn_id, output_path)
  File "~/anaconda/envs/py27/lib/python2.7/site-packages/synapseutils/sync.py", line 82, in syncFromSynapse
    syncFromSynapse(syn, result['entity.id'], new_path, ifcollision, allFiles)
  File "~/anaconda/envs/py27/lib/python2.7/site-packages/synapseutils/sync.py", line 90, in syncFromSynapse
    generateManifest(syn, allFiles, filename)
  File "~/anaconda/envs/py27/lib/python2.7/site-packages/synapseutils/sync.py", line 107, in generateManifest
    row = {'parent': entity['parentId'], 'path': entity.path, 'name': entity.name,
  File "~/anaconda/envs/py27/lib/python2.7/site-packages/synapseclient/entity.py", line 362, in __getattr__
    raise AttributeError(key)
AttributeError: path

I added a print statement to the entity to determine which resource was causing this error, and this appears to be the culprit:

Schema: Synapse Table Demo (syn3079449)
  columns_to_store=None
properties:
  accessControlList=/repo/v1/entity/syn3079449/acl
  annotations=/repo/v1/entity/syn3079449/annotations
  columnIds=[u'36450', u'36451', u'36452', u'36453']
  concreteType=org.sagebionetworks.repo.model.table.TableEntity
  createdBy=273979
  createdOn=2015-01-09T20:49:19.646Z
  entityType=org.sagebionetworks.repo.model.table.TableEntity
  etag=b6d017c7-18a8-47e7-8bd7-497bf8b1a512
  id=syn3079449
  modifiedBy=273979
  modifiedOn=2015-01-09T20:49:19.646Z
  name=Synapse Table Demo
  parentId=syn1901847
  uri=/repo/v1/entity/syn3079449
  versionLabel=1
  versionNumber=1
  versionUrl=/repo/v1/entity/syn3079449/version/1
  versions=/repo/v1/entity/syn3079449/version
annotations:
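A guard along these lines in `generateManifest` would let the sync skip non-file entities such as the `TableEntity` above instead of raising (a sketch over plain dicts; the helper name is hypothetical):

```python
# Hypothetical manifest builder: only entities with a local file path
# (i.e. downloaded File entities) produce a manifest row; Tables and
# other non-file entities are skipped rather than raising AttributeError.
def rows_for_manifest(entities):
    rows = []
    for entity in entities:
        path = entity.get("path")
        if path is None:
            continue  # e.g. a TableEntity has no file on disk
        rows.append({"parent": entity["parentId"],
                     "path": path,
                     "name": entity["name"]})
    return rows
```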

Getting an empty Provenance

Does it make sense to throw an error when calling syn.getProvenance("syn1234567") if syn1234567 has no associated provenance?

i.e., the above call returns an error like:

SynapseHTTPError: 404 Client Error: Not Found
No activity

Whereas calling syn.getProvenance("syn7654321"), where syn7654321 does have associated provenance gives:

{u'createdBy': u'3342492',
 u'createdOn': u'2016-08-17T00:23:09.498Z',
 u'etag': u'3425b097-1016-4a67-934d-31258a42be2a',
 u'id': u'7123748',
 u'modifiedBy': u'3342492',
 u'modifiedOn': u'2016-08-17T00:23:09.498Z',
 u'used': [{u'concreteType': u'org.sagebionetworks.repo.model.provenance.UsedEntity',
   u'reference': {u'targetId': u'syn5406913', u'targetVersionNumber': 2},
   u'wasExecuted': False},
  {u'concreteType': u'org.sagebionetworks.repo.model.provenance.UsedURL',
   u'name': u'https://github.com/taoliu/MACS/',
   u'url': u'https://github.com/taoliu/MACS/',
   u'wasExecuted': True}]}

Which makes me expect a result more like this when calling syn.getProvenance("syn1234567"):

{u'createdBy': u'3342492',
 u'createdOn': u'2016-08-17T00:23:09.498Z',
 u'etag': u'3425b097-1016-4a67-934d-31258a42be2a',
 u'id': u'7123748',
 u'modifiedBy': u'3342492',
 u'modifiedOn': u'2016-08-17T00:23:09.498Z',
 u'used': []}

Though I'm guessing files uploaded without Provenance currently have no Provenance attached, rather than an empty provenance like I've tried to represent here.
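Until the client changes, the suggested behaviour can be sketched with the Synapse call injected as a callable, so the pattern is visible without a live client (with the real client you would catch `synapseclient.core.exceptions.SynapseHTTPError` and inspect `e.response.status_code`; the bare `status_code` attribute below is a stand-in):

```python
# Hypothetical wrapper: treat "404 No activity" as empty provenance
# instead of an exception.
def get_provenance_or_empty(get_provenance, syn_id, not_found_exc):
    try:
        return get_provenance(syn_id)
    except not_found_exc as e:
        if getattr(e, "status_code", None) == 404:
            return {"used": []}  # empty provenance rather than an error
        raise

class FakeHTTPError(Exception):
    """Stand-in for SynapseHTTPError in this sketch."""
    def __init__(self, status_code):
        self.status_code = status_code
```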

Passing a pandas dataframe with a column called "read" breaks the type parsing in as_table_columns()

Bug Report

Operating system

MacOSX

Client version

2.3.1

Description of the problem

Symptom:
Passing a Pandas Dataframe with the column labeled "read" to as_table_columns() throws a 'TypeError' when calling _csv_to_pandas_df().

Bug:
The code tries to parse the value as a string instead of a Pandas DF in this code here:

    # filename of a csv file
    # in Python 3, we can check that the values is instanceof io.IOBase
    # for now, check if values has attr `read`
    if isinstance(values, str) or hasattr(values, "read"):   <----- hasattr(values, "read") is True!
        df = _csv_to_pandas_df(values)               <----- _csv_to_pandas_df() returns a TypeError
    # pandas DataFrame
    if isinstance(values, pd.DataFrame):
        df = values                                        <----- Should assign df here instead

Catching this in the debugger, I see that the input parameter values has the attr read and so the code tries to parse it as a string in _csv_to_pandas_df:

>>>values["read"]
0    
Name: read, dtype: object
>>>isinstance(values, str)
False
>>>hasattr(values, "read")
True

To Reproduce

Note, 'Table(schema, df)' calls as_table_columns() internally:

import pandas as pd
from synapseclient import Schema, Column, Table, Row, RowSet, as_table_columns, build_table, table

project = 'synXXXXXXXX'
df = pd.DataFrame([{'read': '0'}])
columns = []
for column in df.columns:
     columns.append(Column(name=column, columnType='STRING'))
schema = Schema('TEST_TABLE', columns, parent=project)
table = Table(schema, df)

Expected behavior

Users should be able to pass a pandas dataframe with a column called "read" to the function

Actual behavior

If you care to see the error:

  File "/Users/esurface/opt/miniconda2/envs/py3/lib/python3.9/site-packages/pandas/io/common.py", line 554, in get_handle
    if _is_binary_mode(path_or_buf, mode) and "b" not in mode:
  File "/Users/esurface/opt/miniconda2/envs/py3/lib/python3.9/site-packages/pandas/io/common.py", line 859, in _is_binary_mode
    return isinstance(handle, binary_classes) or "b" in getattr(handle, "mode", mode)
TypeError: argument of type 'method' is not iterable
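A minimal fix would be to test for a DataFrame before the `hasattr(values, "read")` duck-typing, so a DataFrame whose column happens to be named "read" is never mistaken for a file-like object. A sketch of the reordered dispatch (function name hypothetical, not the library's actual code):

```python
import io

import pandas as pd

# Hypothetical reordering of the dispatch in as_table_columns: DataFrame
# first, then filename / file-object, so column names can't shadow the
# duck-typed "read" check.
def to_dataframe(values):
    if isinstance(values, pd.DataFrame):
        return values
    if isinstance(values, (str, io.IOBase)):
        return pd.read_csv(values)  # filename or open file object
    raise TypeError("values must be a DataFrame, a csv filename, or a file object")
```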

FileEntity 'path' property has wrong separator in Windows.

Bug Report

Operating system

Windows 10 Pro

Client version

1.9.2

Description of the problem

On Windows (win32) the path separator in the FileEntity is wrong.

Repro. Steps:

  • Upload a file and look at the returned object's path property.

Expected behavior

  • Path separator is \ and the character casing is correct.
    Correct path: C:\\Users\\John\\AppData\\Local\\Temp\\tmpi7kpbq0s\\data\\core\\core_file_ace2.csv

Actual behavior

  • Path separator is / and character casing is incorrect.
    Incorrect path: c:/users/john/appdata/local/temp/tmpi7kpbq0s/data/core/core_file_ace2.csv
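As a client-side workaround, the separators can be normalized after the fact; `ntpath` implements Windows path rules on any platform. Note this cannot restore the original character casing, which is lost once the path has been lowercased.

```python
import ntpath

# Workaround sketch: convert the forward-slash path returned by the
# client into native Windows separators.
def to_windows_separators(path):
    return ntpath.normpath(path)
```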

Error when getting EntityViewSchema that does not exist

Synapse Client 1.8.2

This request fails when the table doesn't exist.

syn.get(EntityViewSchema(name='my_view', parent=my_project), downloadFile=False)

Error:

File "synapseclient/client.py", line 626, in get
    self._check_entity_restrictions(bundle['restrictionInformation'], entity, kwargs.get('downloadFile', True))
TypeError: 'NoneType' object has no attribute '__getitem__'
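A caller-side guard avoids the crash by resolving the name to an id first. The sketch below injects the Synapse calls as callables so the pattern is testable without a live client; with a real client they would be `syn.findEntityId` (which returns `None` for a missing entity) and `syn.get`:

```python
# Hypothetical guard: look the entity up by name first, returning None
# for a missing view instead of crashing inside _check_entity_restrictions.
def get_if_exists(find_entity_id, get_entity, name, parent):
    entity_id = find_entity_id(name, parent)
    if entity_id is None:
        return None  # the view does not exist yet
    return get_entity(entity_id)
```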

Synapse login not the same with `authToken` and `apiKey`

Bug Report

Operating system

MacOS Big Sur

Client version

Versions 2.2.2 and 2.3.1.

Description of the problem

When using the .synapseConfig file with the apiKey attribute (as in, for example, synapseclient==2.2.2), the synapseclient.Synapse.login() method works perfectly. However, when using the .synapseConfig file with the authToken attribute (as in, for example, synapseclient==2.3.1), the login method doesn't work as expected.

A minimal reproducible example:

  • Install version 2.2.2 of the synapseclient
$ pip install synapseclient==2.2.2

$ python
>>> import synapseclient
>>> syn = synapseclient.Synapse(configPath='/Users/spatil/Desktop/schematic/.synapseConfig')
>>> syn.login(silent=True)
  • Repeat the above with version 2.3.1
  • Observe the differences in behaviour

Note: Make sure to use the matching version of the .synapseConfig file as well.

Expected behavior

User should be logged in successfully.

Actual behavior

No output to console when testing with synapseclient==2.3.1 and using .synapseConfig file with authToken.
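For reference, the two authentication styles in .synapseConfig look roughly like this (key names as described in the Synapse configuration docs; normally you would use one or the other, not both):

```ini
[authentication]
; pre-2.3 style
username = me@example.org
apikey = <base64-encoded-api-key>

; 2.3+ style
authtoken = <personal-access-token>
```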

When downloading using Python 2.7 backport csv writer gives an UnicodeEncodeError

When trying to download syn3163039 with files = synapseutils.syncFromSynapse(syn, "syn3163039", path="syn3163039/") I get the error below. It works correctly when using Python 3.

Traceback (most recent call last):
  File "download_data.py", line 12, in <module>
    files = synapseutils.syncFromSynapse(syn, "syn3163039", path="syn3163039/")
  File "/apps/software/Python/2.7.11-foss-2015b/lib/python2.7/site-packages/synapseutils/sync.py", line 85, in syncFromSynapse
    syncFromSynapse(syn, result['id'], new_path, ifcollision, allFiles, followLink=followLink)
  File "/apps/software/Python/2.7.11-foss-2015b/lib/python2.7/site-packages/synapseutils/sync.py", line 85, in syncFromSynapse
    syncFromSynapse(syn, result['id'], new_path, ifcollision, allFiles, followLink=followLink)
  File "/apps/software/Python/2.7.11-foss-2015b/lib/python2.7/site-packages/synapseutils/sync.py", line 85, in syncFromSynapse
    syncFromSynapse(syn, result['id'], new_path, ifcollision, allFiles, followLink=followLink)
  File "/apps/software/Python/2.7.11-foss-2015b/lib/python2.7/site-packages/synapseutils/sync.py", line 105, in syncFromSynapse
    generateManifest(syn, allFiles, filename)
  File "/apps/software/Python/2.7.11-foss-2015b/lib/python2.7/site-packages/synapseutils/sync.py", line 145, in generateManifest
    csvWriter.writerow(row)
  File "/apps/software/Python/2.7.11-foss-2015b/lib/python2.7/site-packages/backports/csv.py", line 685, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "/apps/software/Python/2.7.11-foss-2015b/lib/python2.7/site-packages/backports/csv.py", line 204, in writerow
    return self.fileobj.write(line)

Command line download of tables

The ability to download tables into TSVs using the command line client would be very helpful.
Right now:

synapse get syn3156503

returns


WARNING: No files associated with entity syn3156503

Schema: RNA-Seq Metadata (syn3156503)
  columns_to_store=[]
properties:
  accessControlList=/repo/v1/entity/syn3156503/acl
  annotations=/repo/v1/entity/syn3156503/annotations
  columnIds=[u'4071', u'4192', u'4099', u'5449', u'4152', u'4077', u'4078', u'4079', u'35396', u'4073', u'35397', u'35399', u'35400', u'4124', u'4344', u'4225', u'4226', u'4158', u'4159', u'4227', u'4234', u'4228', u'4166', u'4229', u'4042', u'35398', u'4021', u'4023', u'4242', u'4243', u'4026', u'4233', u'4028', u'4043', u'4044', u'4030', u'4031', u'4032', u'4244', u'4034', u'4035', u'4036', u'4037', u'4045', u'4038', u'4046', u'4047', u'4039', u'4048', u'4245', u'4519', u'4521', u'4162', u'5518', u'4155', u'5515', u'5519', u'5520', u'4528', u'7673', u'7674', u'7705', u'7707']
  concreteType=org.sagebionetworks.repo.model.table.TableEntity
  createdBy=3323072
  createdOn=2015-01-28T18:46:07.159Z
  entityType=org.sagebionetworks.repo.model.table.TableEntity
  etag=aaa03a73-1847-4ec8-b8f0-80305c1adc7a
  id=syn3156503
  modifiedBy=3323072
  modifiedOn=2015-06-05T23:40:51.119Z
  name=RNA-Seq Metadata
  parentId=syn1773109
  uri=/repo/v1/entity/syn3156503
  versionLabel=10
  versionNumber=10
  versionUrl=/repo/v1/entity/syn3156503/version/10
  versions=/repo/v1/entity/syn3156503/version
annotations:


AttributeError: path

There is a download button on the page, but nothing for the command line tool.

deepcopy() of a synapseclient.Synapse object broken after upgrade to 2.2.2

Bug Report

Operating system

Ubuntu 20.04

Client version

2.2.2

Description of the problem

I was running client version 2.0.0. After upgrading to 2.2.2, I'm not able to deepcopy a Synapse object:

import copy
import synapseclient
syn = synapseclient.Synapse()
syn.login()
syn_copy = copy.deepcopy(syn)

Result:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/usr/lib/python3.8/copy.py", line 270, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/lib/python3.8/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python3.8/copy.py", line 230, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python3.8/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/usr/lib/python3.8/copy.py", line 270, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/lib/python3.8/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python3.8/copy.py", line 230, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python3.8/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python3.8/copy.py", line 230, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python3.8/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/usr/lib/python3.8/copy.py", line 264, in _reconstruct
    y = func(*args)
TypeError: __init__() missing 1 required positional argument: 'max_size'

Should I wipe/move my cache and try again? Figured I'd check before re-pulling 270GB.
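The cache is not the culprit; the traceback points at an internal object whose `__init__` requires a `max_size` argument, which the default `copy.deepcopy` reconstruction cannot supply. The generic fix pattern looks like this (class name and fields are illustrative, not the actual synapseclient internals):

```python
import copy

# Generic pattern: implement __deepcopy__ so members whose constructors
# need arguments are rebuilt explicitly instead of reconstructed by copy.
class Client:
    def __init__(self, max_size=128):
        self.max_size = max_size
        self.cache = {}  # stand-in for the non-copyable internal cache

    def __deepcopy__(self, memo):
        clone = Client(max_size=self.max_size)
        clone.cache = dict(self.cache)
        return clone

c = Client()
c.cache["syn123"] = "/tmp/f.csv"
c2 = copy.deepcopy(c)
```

As a practical workaround, constructing a fresh `synapseclient.Synapse()` and logging in again avoids copying the client at all.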
