sage-bionetworks / synapsepythonclient
Programmatic interface to Synapse services for Python
Home Page: https://www.synapse.org
License: Apache License 2.0
Any
2.4.0
synapseclient spawns too many computational threads.
Relevant lines of the code
synapseclient/client.py:from synapseclient.core.pool_provider import DEFAULT_NUM_THREADS
synapseclient/client.py: 'max_threads': DEFAULT_NUM_THREADS,
synapseclient/core/upload/multipart_upload.py: max_threads = pool_provider.DEFAULT_NUM_THREADS
synapseclient/core/pool_provider.py:DEFAULT_NUM_THREADS = multiprocessing.cpu_count() + 4
cpu_count() + 4 can lead to time slicing with hundreds of threads on a cluster compute node even if the code is running in an environment with a single CPU core available to it. As a result most threads are blocked or run on a fraction of a percent of a CPU core.
Adding a synapseclient.Synapse attribute to set the number of threads, and allowing pool_provider to read an environment variable that sets it, would help with this issue.
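A minimal sketch of the suggestion above, not the library's implementation. The function name choose_num_threads and the environment variable SYNAPSE_NUM_THREADS are hypothetical names chosen for illustration. Where the platform supports it, the sketch counts only the cores the process is allowed to run on instead of every core on the node.

```python
import multiprocessing
import os


def choose_num_threads(env_var="SYNAPSE_NUM_THREADS", extra=4):
    """Pick a thread count, preferring an explicit override.

    SYNAPSE_NUM_THREADS is a hypothetical variable name, not one the
    client is known to read.
    """
    override = os.environ.get(env_var)
    if override:
        try:
            value = int(override)
        except ValueError:
            value = 0  # ignore values that are not integers
        if value > 0:
            return value
    # os.sched_getaffinity exists only on some platforms (e.g. Linux).
    # It reports the cores this process may actually use, which can be
    # far fewer than the cores present on a cluster node.
    if hasattr(os, "sched_getaffinity"):
        usable = len(os.sched_getaffinity(0))
    else:
        usable = multiprocessing.cpu_count()
    return usable + extra
```

With the override set to 1, a job confined to a single core would run a single worker regardless of how many cores the node has.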
Upgrading to synapseclient 1.6.1, I am now getting the following warning multiple times:
/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:337: SubjectAltNameWarning: Certificate for file-prod.prod.sagebase.org has no `subjectAltName` , falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
Using python 2.7, I am able to get the example matrix of syn1901033, but when I use an actual SynapseID (syn5511449 in this case), I receive the error:
## retrieve a 100 by 4 matrix
matrix = syn.get('syn5511449')
## inspect its properties
print(matrix.name)
print(matrix.description)
print(matrix.path)
## load the data matrix into a dictionary with an entry for each column
with open(matrix.path, 'r') as f:
    labels = f.readline().strip().split('\t')
    data = {label: [] for label in labels}
    for line in f:
        values = [float(x) for x in line.strip().split('\t')]
        for i in range(len(labels)):
            data[labels[i]].append(values[i])
Walking Activity
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-43-88b0cdf2c38e> in <module>()
4 ## inspect its properties
5 print(matrix.name)
----> 6 print(matrix.description)
7 print(matrix.path)
8
/Users/ajenkins/anaconda/lib/python2.7/site-packages/synapseclient/entity.pyc in __getattr__(self, key)
366 ## about what exceptions it catches. In Python3, hasattr catches
367 ## only AttributeError
--> 368 raise AttributeError(key)
369
370
AttributeError: description
Is there a reason why, when I use an actual Synapse ID, I am not able to get a path?
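Note that the traceback stops at print(matrix.description), so the print(matrix.path) line is never reached. A hedged sketch of reading optional attributes with a default follows. FakeEntity is a stand-in written for this example (it only mimics an object whose __getattr__ raises AttributeError for missing keys), and describe is a hypothetical helper.

```python
class FakeEntity(object):
    """Stand-in that raises AttributeError for properties that are not set."""

    def __init__(self, **properties):
        self._properties = properties

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails.
        try:
            return self.__dict__["_properties"][key]
        except KeyError:
            raise AttributeError(key)


def describe(entity, fields=("name", "description", "path")):
    """Return one line per field, using a placeholder for missing ones."""
    return ["%s: %s" % (field, getattr(entity, field, "<not set>"))
            for field in fields]


if __name__ == "__main__":
    entity = FakeEntity(name="Walking Activity", path="/tmp/example.tsv")
    for line in describe(entity):
        print(line)
```

Because getattr with a default swallows only AttributeError, an entity without a description no longer prevents the later lines from running.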
As discussed in the RTI/Synapse call, we are seeing duplicated rows on the server when uploading batched data to Synapse. For each table, we load flattened JSON data into a pandas dataframe; after every 100 records are processed, the data is saved to Synapse by calling store(). The issue arises after the initial upload of the table, when rows are added and the schema changes during the second upload.
To reproduce the issue, run the dup_test.py file in our GitHub repo: synapse-span-table
2.3.1
When I used source("http://depot.sagebase.org/CRAN.R"); pkgInstall("synapseClient") to install synapseClient in RStudio, I encountered the attached timeout problem. Are there other installation methods? My system is macOS 10.14.6, and my RStudio version is 4.0.3.
As discussed in the RTI/Synapse call:
We are using Synapse tables for the storage and curation of data for a multi-site study. Our data lives in a document data store in JSON files. We process the data and flatten it into a data table structure for upload to Synapse. Most of the documents have many entries which create more than 152 columns. We wrote a python module that splits the data into 152 column sections and uploads the data to Synapse in columns with type STRING and 50 characters in length.
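The column splitting described above can be sketched as follows. This is an illustration only, not the code in the synapse-span-table module; split_columns is a hypothetical helper name, and the limit of 152 is taken from the description.

```python
def split_columns(column_names, max_columns=152):
    """Split a list of column names into consecutive chunks.

    Each chunk holds at most max_columns names, so each chunk can back
    one table.
    """
    if max_columns < 1:
        raise ValueError("max_columns must be at least 1")
    return [column_names[start:start + max_columns]
            for start in range(0, len(column_names), max_columns)]


if __name__ == "__main__":
    names = ["field_%d" % i for i in range(400)]
    for index, chunk in enumerate(split_columns(names)):
        print(index, len(chunk))
```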
We are processing documents one-at-a-time as they are received in the document store database. Even when only one row is being uploaded, we see long delays in the API call (multiple seconds in most cases). With more than 120,000 to process, our upload strategy became untenable as the processing time reached almost a month.
To reproduce the issue, run python3 test.py in our synapse-span-table module.
Is there any improvement to our use of the API that you would suggest to speed up the process?
We understand that Synapse is mainly used and optimized for uploading batched records, but we have run into issues with that strategy as well (see: Issue 867)
I have a feature request related to my specific use case. I have a large synapse project where the files themselves are hosted on google drive. The files on synapse are direct link-outs. Unfortunately google drive caps direct download for files over 50MB, instead redirecting the user to download a link with a random confirmation code in the url's query string.
Therefore, the basic URL request doesn't quite work for me. I need to stream the file, extract the confirmation code, and make a second request while retaining the cookie from the original request.
I can do all of this in a custom get method (see my gist for a derivative requests.Session class), but I need a way of getting this object into a Synapse client object. See PR #713 for a simple example.
Example usage:
import synapseclient
import synapseutils
from gdrivesession import GDriveSession
session = GDriveSession()
syn = synapseclient.Synapse(session=session)
syn.login()
files = synapseutils.syncFromSynapse(syn, "syn20844101")
I understand that overwriting get methods for requests might expose the user to security issues or simple user error, but perhaps this is of general interest since there are many urls that are not simply open on the web. At the very least you might want some type/integrity checking on the session object.
Thank you for the consideration.
Hi, I've been using the above method to download mPower files for some time. I just upgraded to synapseclient 1.7.1 (from 1.6.2) and things broke. The method no longer returns a dict; it returns a string (the path) instead. Also, if you specify a downloadLocation of "." as per the docs, it fails with ' cannot find path "" '. If you leave downloadLocation out, it defaults to the cache, as you'd expect. Both are fairly minor; perhaps just a doc update is required?
I'm trying to download a large file and I can't tell if it's going successfully or not. It would be great to get more diagnostic information from the Synapse client to confirm that the download has begun and, ideally, progress information as well.
ubuntu 18.04
4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Output of:
import synapseclient
synapseclient.__version__
'1.9.3'
I am trying to download a Synapse project via synapseutils.syncFromSynapse, but progress is very slow. The project contains many subfolders (~10k) with 3 files per folder. The download speed is not the problem; the bottleneck is a REST API request that seems to be made per file in Synapse::getProvenance.
This function is called in every recursive invocation of synapseutils.syncFromSynapse on all members of the allFiles array, which contains all previously processed files.
One REST call takes t ≈ 100-200 ms. Because the calls are repeated for files that were already processed, the total duration for n files grows far faster than n * t.
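A rough worked estimate of the cost, under a stated assumption: each folder visit re-queries provenance for every file collected so far, as the description of allFiles suggests. This is a model of the reported behaviour, not a measurement, and both function names are hypothetical. Under that assumption the number of calls grows quadratically with the number of folders.

```python
def repeated_calls(num_folders, files_per_folder):
    """Calls made if every folder visit re-queries all files seen so far."""
    calls = 0
    collected = 0
    for _ in range(num_folders):
        collected += files_per_folder  # files known after this folder
        calls += collected             # one request per known file
    return calls


def single_calls(num_folders, files_per_folder):
    """Calls made if each file is queried exactly once."""
    return num_folders * files_per_folder


if __name__ == "__main__":
    folders, per_folder, seconds_per_call = 10000, 3, 0.1
    for label, count in (("repeated", repeated_calls(folders, per_folder)),
                         ("once per file", single_calls(folders, per_folder))):
        hours = count * seconds_per_call / 3600.0
        print("%s: %d calls, about %.1f hours" % (label, count, hours))
```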
Faster download, do not repeat REST request for all files.
What actually happened? Provide output or error messages from the console, if applicable.
Is there some fast workaround?
Issue for use with weekly Code Review:
def printTransferProgress(transferred, toBeTransferred, prefix='', postfix='', isBytes=True, dt=None,
                          previouslyTransferred=0):
    """Prints a progress bar
    :param transferred: a number of items/bytes completed
    :param toBeTransferred: total number of items/bytes when completed
    :param prefix: String printed before progress bar
    :param postfix: String printed after progress bar
    :param isBytes: A boolean indicating whether to convert bytes to kB, MB, GB etc.
    :param dt: The time in seconds that has passed since transfer started is used to calculate rate
    :param previouslyTransferred: the number of bytes that were already transferred before this transfer began
                                  (e.g. someone ctrl+c'd out of an upload and restarted it later)
    """
    if not sys.stdout.isatty():
        return
    barLength = 20  # Modify this to change the length of the progress bar
    status = ''
    rate = ''
    if dt is not None and dt != 0:
        rate = (transferred - previouslyTransferred)/float(dt)
        rate = '(%s/s)' % humanizeBytes(rate) if isBytes else rate
    if toBeTransferred < 0:
        defaultToBeTransferred = (barLength*1*MB)
        if transferred > defaultToBeTransferred:
            progress = float(transferred % defaultToBeTransferred) / defaultToBeTransferred
        else:
            progress = float(transferred) / defaultToBeTransferred
    elif toBeTransferred == 0:  # There is nothing to be transferred
        progress = 1
        status = "Done...\n"
    else:
        progress = float(transferred) / toBeTransferred
        if progress >= 1:
            progress = 1
            status = "Done...\n"
    block = int(round(barLength*progress))
    nbytes = humanizeBytes(transferred) if isBytes else transferred
    if toBeTransferred > 0:
        outOf = "/%s" % (humanizeBytes(toBeTransferred) if isBytes else toBeTransferred)
        percentage = "%4.2f%%" % (progress*100)
    else:
        outOf = ""
        percentage = ""
    text = "\r%s [%s]%s %s%s %s %s %s " % (prefix,
                                           "#"*block + "-"*(barLength-block),
                                           percentage,
                                           nbytes, outOf, rate,
                                           postfix, status)
    sys.stdout.write(text)
    sys.stdout.flush()
I'm not sure how to list a folder. Am I missing something obvious? Seems like along with get/store, list is one of the most important file system actions.
I dug into the code and found _list(), but it's too complicated for me to understand.
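One possible way to list a folder, assuming the installed synapseclient release provides Synapse.getChildren and that each result is a dict with name, id and type keys (worth verifying against your version). list_folder is a hypothetical helper name, and the syn argument is an already logged-in client.

```python
def list_folder(syn, parent_id):
    """Return (name, id, type) tuples for the direct children of parent_id.

    Assumes syn.getChildren(parent_id) yields dicts with the keys
    'name', 'id' and 'type'.
    """
    return [(child["name"], child["id"], child["type"])
            for child in syn.getChildren(parent_id)]


# Usage sketch (requires a logged-in client, not run here):
#   import synapseclient
#   syn = synapseclient.login()
#   for name, syn_id, kind in list_folder(syn, "syn123456"):
#       print(name, syn_id, kind)
```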
macOS
2.1.0
Provide a description of the problem, and if possible a minimal reproducible example.
I would like to download one dataset to one directory, and a second dataset to a different directory, but these need to be downloaded using the syn.downloadTableColumns function.
This function automatically downloads to the cache location, so this is not possible without updating the config file in between.
It would be great if the cache location could be set in the Synapse() constructor directly.
Release notes have been posted in Synapse - why aren't they here, with the release?
Hello, I am a college student who is interested in helping the project. I found this project on the Mozilla website. Where do I start to help improve the project itself?
On large projects with many files, the synapse get command is very slow.
One suggestion I can make for speeding it up is to use multi-threading / multi-processing to make the calls to 'get' concurrent.
A simple patch that suggests one way to do this is pasted below (sorry, for some reason GitHub wouldn't let me upload it; just save it to a txt file and apply):
From d6ae4c2 Mon Sep 17 00:00:00 2001
From: fidlr [email protected]
Date: Tue, 18 Oct 2016 10:35:51 +0300
Subject: [PATCH] multi-threaded get
synapseutils/sync.py | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/synapseutils/sync.py b/synapseutils/sync.py
index dfbfab8..de399a8 100644
--- a/synapseutils/sync.py
+++ b/synapseutils/sync.py
@@ -2,6 +2,13 @@ import errno
from synapseclient.entity import is_container
from synapseclient.utils import id_of
import os
+from concurrent.futures import ThreadPoolExecutor
+
+pool = ThreadPoolExecutor(max_workers=3) # Synapse allows up to 3 concurrent requests
+
+def getOneEntity(syn, entity_id, downloadLocation, ifcollision, allFilesList):
def syncFromSynapse(syn, entity, path=None, ifcollision='overwrite.local', allFiles = None):
@@ -36,7 +43,11 @@ def syncFromSynapse(syn, entity, path=None, ifcollision='overwrite.local', allFi
for f in entities:
print(f.path)
"""
allFiles = list()
wait_at_finish = True
ent = syn.get(result['entity.id'], downloadLocation = path, ifcollision = ifcollision)
allFiles.append(ent)
# use multi-threaded get function
pool.submit(getOneEntity, syn, result['entity.id'], path, ifcollision, allFiles)
# ent = syn.get(result['entity.id'], downloadLocation = path, ifcollision = ifcollision)
# allFiles.append(ent)
pool.shutdown(wait=True) # wait till all objects were downloaded before returning
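The patch above did not survive pasting intact, so here is a separate, minimal sketch of the same idea rather than the author's code. fetch_all is a hypothetical helper, the limit of 3 workers follows the comment in the patch, and whether a single Synapse client object is safe to share across threads is an assumption to verify.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(fetch_one, entity_ids, max_workers=3):
    """Call fetch_one(entity_id) concurrently for every id.

    Results are returned in the same order as entity_ids. Leaving the
    with block waits until all submitted calls have finished.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, entity_ids))


# Usage sketch (requires a logged-in client, not run here):
#   files = fetch_all(lambda eid: syn.get(eid, downloadLocation=path), ids)
```

Using pool.map keeps the result order deterministic, which avoids appending to a shared list from several threads.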
Installing on ubuntu from source, doesn't generate the easy_path file. Then errors out when trying to find the synapseclient package.
Ubuntu 14.04/18.04
Python 3.7.4
synapseclient 1.9.3
>>> import synapseclient
ImportError: cannot import name 'csv' from 'backports' (/app/easybuild/software/Python/3.7.4-foss-2016b-fh1/lib/python3.7/site-packages/backports/__init__.py)
Why use backports with Python 3.x ?
Biomedical Science and Research
This issue was created by @shirishgoyal via Mozilla Science Lab Collaborate
Apologies if this is already possible, but I could not find it in the documentation.
When using syncFromSynapse() you can not exclude files from download. For example, I do not want to download the *.bam files. It would be great if there was a parameter for syncFromSynapse() with an exclude (or include) list of files.
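Until such a parameter exists, one workaround is to list the file names first and filter them before downloading each remaining file individually. The sketch below shows only the filtering step with the standard-library fnmatch module; filter_excluded is a hypothetical helper, and how the names are obtained is left open.

```python
import fnmatch


def filter_excluded(file_names, exclude_patterns):
    """Drop names matching any shell-style pattern such as '*.bam'.

    fnmatchcase is used so that matching is case-sensitive on every
    platform.
    """
    return [name for name in file_names
            if not any(fnmatch.fnmatchcase(name, pattern)
                       for pattern in exclude_patterns)]


if __name__ == "__main__":
    names = ["sample1.bam", "sample1.bam.bai", "notes.txt"]
    print(filter_excluded(names, ["*.bam"]))
```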
MacOS Catalina version 10.15.7
Output of:
import synapseclient
synapseclient.__version__
'2.2.0'
Downloading files/folders from synapse with additional jamboree credentials fails.
I am trying to download a folder from synapse, where if I were to do it manually I would click to download each file and then supply my jamboree access key & secret key. I was hoping to do this with the python client because there are a lot of files, but the python client never prompts me for the jamboree keys. Instead each file download silently fails, resulting in an empty list of files.
import synapseclient
import synapseutils
syn = synapseclient.Synapse()
syn.login('synapse_username','password')
files = synapseutils.syncFromSynapse(syn, 'synID')
After running this I don't get any errors, but files is empty
I expected the files in the folder associated with the synapse ID to be downloaded
No error, but also no successful downloads.
>>> files
[]
Dear all,
I've had the following issue for a couple of days when trying to create a Synapse client with:
import synapseclient
s = synapseclient.Synapse()
I've tried with the newest released package of synapseclient from PyPI.
Here is the error message.
Thanks a lot
Thomas
SSLError Traceback (most recent call last)
in ()
----> 1 s = synapseclient.Synapse(debug=True)
/home/cokelaer/Work/virtualenv/lib/python2.7/site-packages/synapseclient/client.pyc in __init__(self, repoEndpoint, authEndpoint, fileHandleEndpoint, portalEndpoint, debug, skip_checks)
149 raise
150
--> 151 self.setEndpoints(repoEndpoint, authEndpoint, fileHandleEndpoint, portalEndpoint, skip_checks)
152
153 ## TODO: rename to defaultHeaders ?
/home/cokelaer/Work/virtualenv/lib/python2.7/site-packages/synapseclient/client.pyc in setEndpoints(self, repoEndpoint, authEndpoint, fileHandleEndpoint, portalEndpoint, skip_checks)
206 # Update endpoints if we get redirected
207 if not skip_checks:
--> 208 response = requests.get(endpoints[point], allow_redirects=False, headers=synapseclient.USER_AGENT)
209 if response.status_code == 301:
210 endpoints[point] = response.headers['location']
/home/cokelaer/Work/virtualenv/lib/python2.7/site-packages/requests/api.pyc in get(url, **kwargs)
53
54 kwargs.setdefault('allow_redirects', True)
---> 55 return request('get', url, **kwargs)
56
57
/home/cokelaer/Work/virtualenv/lib/python2.7/site-packages/requests/api.pyc in request(method, url, **kwargs)
42
43 session = sessions.Session()
---> 44 return session.request(method=method, url=url, **kwargs)
45
46
/home/cokelaer/Work/virtualenv/lib/python2.7/site-packages/requests/sessions.pyc in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert)
359 'allow_redirects': allow_redirects,
360 }
--> 361 resp = self.send(prep, **send_kwargs)
362
363 return resp
/home/cokelaer/Work/virtualenv/lib/python2.7/site-packages/requests/sessions.pyc in send(self, request, **kwargs)
462 start = datetime.utcnow()
463 # Send the request
--> 464 r = adapter.send(request, **kwargs)
465 # Total elapsed time of the request (approximately)
466 r.elapsed = datetime.utcnow() - start
/home/cokelaer/Work/virtualenv/lib/python2.7/site-packages/requests/adapters.pyc in send(self, request, stream, timeout, verify, cert, proxies)
361 except (_SSLError, _HTTPError) as e:
362 if isinstance(e, _SSLError):
--> 363 raise SSLError(e)
364 elif isinstance(e, TimeoutError):
365 raise Timeout(e)
SSLError: [Errno 1] _ssl.c:504: error:100AE081:elliptic curve routines:EC_GROUP_new_by_curve_name:unknown group
Ubuntu Linux 18.04
1.9.2
If a file already exists in a Project it will be uploaded even when the file has not changed. If you upload a second time it works as expected (it doesn't upload the file again).
Repro. Steps:
synapse add --parentid syn123456 test_file.txt
The file will be uploaded.
sudo pip install git+https://github.com/Sage-Bionetworks/synapsePythonClient.git@develop
Downloading/unpacking git+https://github.com/Sage-Bionetworks/synapsePythonClient.git@develop
Cloning https://github.com/Sage-Bionetworks/synapsePythonClient.git (to develop) to /tmp/pip-ouux3d-build
Running setup.py (path:/tmp/pip-ouux3d-build/setup.py) egg_info for package from git+https://github.com/Sage-Bionetworks/synapsePythonClient.git@develop
Requirement already satisfied (use --upgrade to upgrade): requests>=1.2 in /usr/lib/python2.7/dist-packages (from synapseclient==1.5.2.dev1)
Requirement already satisfied (use --upgrade to upgrade): six in /usr/lib/python2.7/dist-packages (from synapseclient==1.5.2.dev1)
Downloading/unpacking future (from synapseclient==1.5.2.dev1)
Downloading future-0.15.2.tar.gz (1.6MB): 1.6MB downloaded
Running setup.py (path:/tmp/pip_build_root/future/setup.py) egg_info for package future
warning: no files found matching '*.au' under directory 'tests'
warning: no files found matching '*.gif' under directory 'tests'
warning: no files found matching '*.txt' under directory 'tests'
Downloading/unpacking backports.csv (from synapseclient==1.5.2.dev1)
Downloading backports.csv-1.0.1-py2.py3-none-any.whl
Installing collected packages: future, backports.csv, synapseclient
Running setup.py install for future
warning: no files found matching '*.au' under directory 'tests'
warning: no files found matching '*.gif' under directory 'tests'
warning: no files found matching '*.txt' under directory 'tests'
Installing pasteurize script to /usr/local/bin
Installing futurize script to /usr/local/bin
Running setup.py install for synapseclient
Installing synapse script to /usr/local/bin
Successfully installed future backports.csv synapseclient
Cleaning up...
I suspect it is mainly because of this line: Running setup.py install for synapseclient. Even when we clone the develop branch, running python setup.py install will not install the dev branch. You must run python setup.py develop to install the dev branch.
2.1.1
Use of the API key is buried in the reference documentation. I recommend describing use of the API key in Connecting to Synapse, since users will arrive at that page before searching the reference documentation.
Issue for use with weekly Code Review:
def humanizeBytes(bytes):
    bytes = float(bytes)
    units = ['bytes', 'kB', 'MB', 'GB', 'TB', 'PB', 'EB']
    for i, unit in enumerate(units):
        if bytes < 1024:
            return '%3.1f%s' % (bytes, units[i])
        else:
            bytes /= 1024
    return 'Oops larger than Exabytes'
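As one possible talking point for the review, here is a variant with the same behaviour that avoids shadowing the built-in name bytes and drops the unused index. It is a suggestion for discussion, not the project's code; humanize_bytes and num_bytes are names chosen for this sketch.

```python
def humanize_bytes(num_bytes):
    """Format a byte count using 1024-based steps, e.g. 1536 gives '1.5kB'."""
    units = ['bytes', 'kB', 'MB', 'GB', 'TB', 'PB', 'EB']
    value = float(num_bytes)
    for unit in units:
        if value < 1024:
            return '%3.1f%s' % (value, unit)
        value /= 1024
    # Reached only when the value is 1024 EB or more.
    return 'Oops larger than Exabytes'


if __name__ == "__main__":
    for size in (0, 1023, 1024, 1536, 5 * 1024 ** 3):
        print(size, humanize_bytes(size))
```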
syncFromSynapse throws ValueError: The provided id: synMyFolderId is was neither a container nor a File when it hits an empty folder.
Folder Structure:
-Folder-1
-Folder-2
-some-file.txt
synapseutils.syncFromSynapse(syn, 'Folder-1-id')
When I type:
print (matrix.path)
it returns an error. When I search for a .path attribute, there is none in the package. Is the documentation incorrect?
In cache.py, the function retrieve_local_file_info has if file not None as part of the condition. file is a built-in function in Python 2 and is not None, but it was removed in Python 3, and I guess the coder didn't mean to check whether the function file exists here. Should it be removed from the if clause?
On AWS Lambda you can only write to /tmp. We need a way to change the CACHE_ROOT_DIR.
My current workaround for this is:
synapseclient.cache.CACHE_ROOT_DIR = os.path.join(tempfile.gettempdir(), 'synapseCache')
csv is available natively with Python 2 and 3.
Can you change: from backports import csv
to: import csv
[GCC 5.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import synapseclient
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/app/easybuild/software/Python/3.6.4-foss-2016b-fh1/lib/python3.6/site-packages/synapseclient-1.7.3-py3.6.egg/synapseclient/__init__.py", line 308, in <module>
from .client import Synapse, login
File "/app/easybuild/software/Python/3.6.4-foss-2016b-fh1/lib/python3.6/site-packages/synapseclient-1.7.3-py3.6.egg/synapseclient/client.py", line 86, in <module>
from .table import Schema, Column, TableQueryResult, CsvFileTable
File "/app/easybuild/software/Python/3.6.4-foss-2016b-fh1/lib/python3.6/site-packages/synapseclient-1.7.3-py3.6.egg/synapseclient/table.py", line 276, in <module>
from backports import csv
ImportError: cannot import name 'csv'
>>> import csv
>>> from backports import csv
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name 'csv'
>>>
[GCC 5.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from backpots import csv
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named backpots
>>> import csv
>>>
Installation fails using python 3.8 due to check in setup.py
Ubuntu Server 18.04
1.9.1
Something related to mime when attempting to store data on Synapse. Dependency issue on Ubuntu Server edition? On the desktop editions of Ubuntu 18.04 or Debian 9 the issue is absent using the same synapseclient version and same file to upload.
In [1] import synapseclient
In [2] syn = synapseclient.login()
In [3] f ="/path/to/file"
In [4] syn.store(synapseclient.File(f, parent = "syn17931318"))
##################################################
Uploading file to Synapse storage
##################################################
---------------------------------------------------------------------------
SynapseFileCacheError Traceback (most recent call last)
<ipython-input-4-cbd9cbaa63f3> in <module>
----> 1 syn.store(synapseclient.File(f, parent = "syn17931318"))
~/.local/lib/python3.6/site-packages/synapseclient/client.py in store(self, obj, **kwargs)
969 md5=local_state_fh.get('contentMd5'),
970 file_size=local_state_fh.get('contentSize'),
--> 971 mimetype=local_state_fh.get('contentType'))
972 properties['dataFileHandleId'] = fileHandle['id']
973 local_state['_file_handle'] = fileHandle
~/.local/lib/python3.6/site-packages/synapseclient/upload_functions.py in upload_file_handle(syn, parent_entity, path, synapseStore, md5, file_size, mimetype)
65 syn.logger.info('\n' + '#' * 50 + '\n Uploading file to ' + storageString + ' storage \n' + '#' * 50 + '\n')
66
---> 67 return upload_synapse_s3(syn, expanded_upload_path, location['storageLocationId'], mimetype=mimetype)
68 # external file handle (sftp)
69 elif upload_destination_type == concrete_types.EXTERNAL_UPLOAD_DESTINATION:
~/.local/lib/python3.6/site-packages/synapseclient/upload_functions.py in upload_synapse_s3(syn, file_path, storageLocationId, mimetype)
125 def upload_synapse_s3(syn, file_path, storageLocationId=None, mimetype=None):
126 file_handle_id = multipart_upload(syn, file_path, contentType=mimetype, storageLocationId=storageLocationId)
--> 127 syn.cache.add(file_handle_id, file_path)
128
129 return syn._getFileHandle(file_handle_id)
~/.local/lib/python3.6/site-packages/synapseclient/cache.py in add(self, file_handle_id, path)
218
219 cache_dir = self.get_cache_dir(file_handle_id)
--> 220 with Lock(self.cache_map_file_name, dir=cache_dir):
221 cache_map = self._read_cache_map(cache_dir)
222
~/.local/lib/python3.6/site-packages/synapseclient/lock.py in __enter__(self)
97 # Make the lock object a Context Manager
98 def __enter__(self):
---> 99 self.blocking_acquire()
100
101 def __exit__(self, exc_type, exc_value, traceback):
~/.local/lib/python3.6/site-packages/synapseclient/lock.py in blocking_acquire(self, timeout, break_old_locks)
83 if not lock_acquired:
84 raise SynapseFileCacheError("Could not obtain a lock on the file cache within timeout: %s "
---> 85 "Please try again later" % str(timeout))
86
87 def release(self):
SynapseFileCacheError: Could not obtain a lock on the file cache within timeout: 0:01:10 Please try again later
I am trying to download syn3157325 using files = synapseutils.syncFromSynapse(syn, 'syn3157325', path = 'ROSMAP/'). I get the error
Traceback (most recent call last):
File "download_data.py", line 32, in <module>
files = synapseutils.syncFromSynapse(syn, 'syn3157325', path = 'ROSMAP/')
File "/apps/software/Python/3.6.3-foss-2015b/lib/python3.6/site-packages/synapseutils/sync.py", line 104, in syncFromSynapse
generateManifest(syn, allFiles, filename)
File "/apps/software/Python/3.6.3-foss-2015b/lib/python3.6/site-packages/synapseutils/sync.py", line 116, in generateManifest
keys, data = _extract_file_entity_metadata(syn, allFiles)
File "/apps/software/Python/3.6.3-foss-2015b/lib/python3.6/site-packages/synapseutils/sync.py", line 135, in _extract_file_entity_metadata
row.update(_get_file_entity_provenance_dict(syn, entity))
File "/apps/software/Python/3.6.3-foss-2015b/lib/python3.6/site-packages/synapseutils/sync.py", line 152, in _get_file_entity_provenance_dict
'executed' : ';'.join(prov._getExecutedStringList()),
File "/apps/software/Python/3.6.3-foss-2015b/lib/python3.6/site-packages/synapseclient/activity.py", line 339, in _getExecutedStringList
return self._getStringList(wasExecuted=True)
File "/apps/software/Python/3.6.3-foss-2015b/lib/python3.6/site-packages/synapseclient/activity.py", line 329, in _getStringList
usedList.append(source['name'])
printing usedList:
{'wasExecuted': True, 'concreteType': 'org.sagebionetworks.repo.model.provenance.UsedURL', 'url': 'https://github.com/Sage-Bionetworks/ampAdScripts/blob/master/Broad-Rush/migrateROSMAPGenotypesFeb2015.R'}
I put a try/except around usedList.append(source['name']); as far as I could check, it allowed me to download all the data correctly.
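The printed item above is a UsedURL record without a 'name' key. An alternative to the try/except is to fall back to the URL when the name is missing. The sketch below is a simplified illustration that handles only dicts shaped like the one printed above; it is not the library's _getStringList, and used_labels is a hypothetical helper.

```python
def used_labels(used_items, was_executed):
    """Collect a label for each item whose wasExecuted flag matches.

    Falls back to the 'url' value when 'name' is absent, and skips
    items that have neither.
    """
    labels = []
    for source in used_items:
        if source.get('wasExecuted', False) != was_executed:
            continue
        label = source.get('name') or source.get('url')
        if label is not None:
            labels.append(label)
    return labels


if __name__ == "__main__":
    item = {
        'wasExecuted': True,
        'concreteType': 'org.sagebionetworks.repo.model.provenance.UsedURL',
        'url': 'https://github.com/Sage-Bionetworks/ampAdScripts/blob/master/Broad-Rush/migrateROSMAPGenotypesFeb2015.R',
    }
    print(used_labels([item], was_executed=True))
```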
Ubuntu 18.04
1.9.2
Throws exception when uploading a file where the file path contains special characters.
This is a blocking issue for us.
Repro Script:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import synapseclient
filename = "TestûTest.txt"
with open(filename, mode='w') as f:
    f.write('test text')
syn = synapseclient.Synapse()
syn.login()
syn.store(synapseclient.File(path=filename, parent="syn18521874"))
Does not error. Uploads file.
Throws exception. Does not upload file.
Traceback (most recent call last):
File "./bug.py", line 14, in <module>
syn.store(synapseclient.File(path=filename, parent="syn18521874"))
File "/home/user/source/.venv/local/lib/python2.7/site-packages/synapseclient/entity.py", line 578, in __init__
kwargs['name'] = utils.guess_file_name(path)
File "/home/user/source/.venv/local/lib/python2.7/site-packages/synapseclient/utils.py", line 243, in guess_file_name
tokens = [x for x in path.split('/') if x != '']
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 62: ordinal not in range(128)
I am a postgraduate bioinformatics student interested in exploring the horizons of neuroinformatics.
This issue was created by @SathishKumarNarayanan via Mozilla Science Lab Collaborate
There appears to be a bug in synapseutils.sync.syncFromSynapse. I am attempting to sync the 'Wondrous Research Example' (syn1901847) to my local filesystem. The syncFromSynapse function is throwing this error:
Traceback (most recent call last):
...
File "import_synapse.py", line 55, in import_synapse_files
synapseutils.sync.syncFromSynapse(synapse_client, syn_id, output_path)
File "~/anaconda/envs/py27/lib/python2.7/site-packages/synapseutils/sync.py", line 82, in syncFromSynapse
syncFromSynapse(syn, result['entity.id'], new_path, ifcollision, allFiles)
File "~/anaconda/envs/py27/lib/python2.7/site-packages/synapseutils/sync.py", line 90, in syncFromSynapse
generateManifest(syn, allFiles, filename)
File "~/anaconda/envs/py27/lib/python2.7/site-packages/synapseutils/sync.py", line 107, in generateManifest
row = {'parent': entity['parentId'], 'path': entity.path, 'name': entity.name,
File "~/anaconda/envs/py27/lib/python2.7/site-packages/synapseclient/entity.py", line 362, in __getattr__
raise AttributeError(key)
AttributeError: path
I added a print statement to the entity to determine which resource was causing this error, and this appears to be the culprit:
Schema: Synapse Table Demo (syn3079449)
columns_to_store=None
properties:
accessControlList=/repo/v1/entity/syn3079449/acl
annotations=/repo/v1/entity/syn3079449/annotations
columnIds=[u'36450', u'36451', u'36452', u'36453']
concreteType=org.sagebionetworks.repo.model.table.TableEntity
createdBy=273979
createdOn=2015-01-09T20:49:19.646Z
entityType=org.sagebionetworks.repo.model.table.TableEntity
etag=b6d017c7-18a8-47e7-8bd7-497bf8b1a512
id=syn3079449
modifiedBy=273979
modifiedOn=2015-01-09T20:49:19.646Z
name=Synapse Table Demo
parentId=syn1901847
uri=/repo/v1/entity/syn3079449
versionLabel=1
versionNumber=1
versionUrl=/repo/v1/entity/syn3079449/version/1
versions=/repo/v1/entity/syn3079449/version
annotations:
Does it make sense to throw an error when calling syn.getProvenance("syn1234567") if syn1234567 has no associated provenance?
i.e., the above call returns an error like:
SynapseHTTPError: 404 Client Error: Not Found
No activity
Whereas calling syn.getProvenance("syn7654321"), where syn7654321 does have associated provenance gives:
{u'createdBy': u'3342492',
u'createdOn': u'2016-08-17T00:23:09.498Z',
u'etag': u'3425b097-1016-4a67-934d-31258a42be2a',
u'id': u'7123748',
u'modifiedBy': u'3342492',
u'modifiedOn': u'2016-08-17T00:23:09.498Z',
u'used': [{u'concreteType': u'org.sagebionetworks.repo.model.provenance.UsedEntity',
u'reference': {u'targetId': u'syn5406913', u'targetVersionNumber': 2},
u'wasExecuted': False},
{u'concreteType': u'org.sagebionetworks.repo.model.provenance.UsedURL',
u'name': u'https://github.com/taoliu/MACS/',
u'url': u'https://github.com/taoliu/MACS/',
u'wasExecuted': True}]}
Which makes me expect a result more like this when calling syn.get("syn1234567"):
{u'createdBy': u'3342492',
u'createdOn': u'2016-08-17T00:23:09.498Z',
u'etag': u'3425b097-1016-4a67-934d-31258a42be2a',
u'id': u'7123748',
u'modifiedBy': u'3342492',
u'modifiedOn': u'2016-08-17T00:23:09.498Z',
u'used': []}
Though I'm guessing files uploaded without Provenance currently have no Provenance attached, rather than an empty provenance like I've tried to represent here.
MacOSX
2.3.1
Symptom:
Passing a Pandas DataFrame with a column labeled "read" to as_table_columns() throws a 'TypeError' when calling _csv_to_pandas_df().
Bug:
The code tries to parse the value as a string instead of a Pandas DF in this code here:
# filename of a csv file
# in Python 3, we can check that the values is instanceof io.IOBase
# for now, check if values has attr `read`
if isinstance(values, str) or hasattr(values, "read"): <----- hasattr(values, "read") is True!
    df = _csv_to_pandas_df(values) <----- _csv_to_pandas_df() returns a TypeError
# pandas DataFrame
if isinstance(values, pd.DataFrame):
    df = values <----- Should assign df here instead
Catching this in the debugger, I see that the input parameter values has the attr read, and so the code tries to parse it as a string in _csv_to_pandas_df:
>>>values["read"]
0
Name: read, dtype: object
>>>isinstance(values, str)
False
>>>hasattr(values, "read")
True
Note, 'Table(schema, df)' calls as_table_columns() internally:
import pandas as pd
from synapseclient import Schema, Column, Table, Row, RowSet, as_table_columns, build_table, table
project = 'synXXXXXXXX'
df = pd.DataFrame([{'read': '0'}])
columns = []
for column in df.columns:
    columns.append(Column(name=column, columnType='STRING'))
schema = Schema('TEST_TABLE', columns, parent=project)
table = Table(schema, df)
Users should be able to pass a pandas dataframe with a column called "read" to the function
If you care to see the error:
File "/Users/esurface/opt/miniconda2/envs/py3/lib/python3.9/site-packages/pandas/io/common.py", line 554, in get_handle
if _is_binary_mode(path_or_buf, mode) and "b" not in mode:
File "/Users/esurface/opt/miniconda2/envs/py3/lib/python3.9/site-packages/pandas/io/common.py", line 859, in _is_binary_mode
return isinstance(handle, binary_classes) or "b" in getattr(handle, "mode", mode)
TypeError: argument of type 'method' is not iterable
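The comment in the quoted code already hints at one possible fix: test for io.IOBase rather than for a `read` attribute. The sketch below shows the difference using only the standard library. is_csv_source and ColumnAsAttribute are hypothetical names for illustration; ColumnAsAttribute stands in for a DataFrame whose "read" column is reachable as an attribute.

```python
import io


def is_csv_source(values):
    """True for a filename or an open file object, False for anything else."""
    return isinstance(values, (str, io.IOBase))


class ColumnAsAttribute:
    """Stand-in for an object that exposes a data column named "read" as an attribute."""
    read = ["0"]


frame_like = ColumnAsAttribute()
print(hasattr(frame_like, "read"))          # True: the attribute check is fooled
print(is_csv_source(frame_like))            # False: not a str or file object
print(is_csv_source(io.StringIO("a,b\n")))  # True
print(is_csv_source("data.csv"))            # True
```

One trade-off to keep in mind: some file-like wrappers do not inherit from io.IOBase, so checking for a DataFrame first and keeping the attribute check as a fallback may be the less disruptive change.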
In Python 3.7, os.errno is removed. To make the synapseclient library compatible, add import errno to client.py and change line https://github.com/Sage-Bionetworks/synapsePythonClient/blob/master/synapseclient/client.py#L1856 to
if exception.errno != errno.EEXIST:
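For context, this is roughly what the pattern looks like once errno is imported directly. The function name ensure_dir is hypothetical and the surrounding code in client.py may differ; only the errno comparison comes from the report.

```python
import errno
import os


def ensure_dir(path):
    """Create `path` (and parents), ignoring the error raised when it already exists."""
    try:
        os.makedirs(path)
    except OSError as exception:
        # errno is imported as a top-level module; os.errno no longer exists in 3.7.
        if exception.errno != errno.EEXIST:
            raise
```

On Python 3.2 and later, os.makedirs(path, exist_ok=True) covers the common case without the try/except, if Python 2 support is not needed.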
Windows 10 Pro
1.9.2
On Windows (win32) the path separator in the FileEntity is wrong.
Repro. Steps:
path property.
Path separator is \ and the character casing is correct: C:\\Users\\John\\AppData\\Local\\Temp\\tmpi7kpbq0s\\data\\core\\core_file_ace2.csv
Path separator is / and the character casing is incorrect: c:/users/john/appdata/local/temp/tmpi7kpbq0s/data/core/core_file_ace2.csv
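A path of the second form is what you get when a Windows path is case-normalized and its separators are then rewritten. The sketch below reproduces that transformation with ntpath, which works on any platform. Whether the client code referenced below does exactly this is an assumption, not something confirmed by the report.

```python
import ntpath

# The correctly cased path from the report (single backslashes once unescaped).
original = "C:\\Users\\John\\AppData\\Local\\Temp\\tmpi7kpbq0s\\data\\core\\core_file_ace2.csv"

# normcase lower-cases Windows paths; the replace swaps the separators.
normalized = ntpath.normcase(original).replace("\\", "/")
print(normalized)
```

A normalized form like this is useful as a comparison or cache key, but it loses information if it is stored back as the user-visible path.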
synapsePythonClient/synapseclient/client.py
Line 1934 in 8573429
This is the line in Multipart upload:
Synapse Client 1.8.2
This request fails when the table doesn't exist.
syn.get(EntityViewSchema(name='my_view', parent=my_project), downloadFile=False)
Error:
File "synapseclient/client.py", line 626, in get
self._check_entity_restrictions(bundle['restrictionInformation'], entity, kwargs.get('downloadFile', True))
TypeError: 'NoneType' object has no attribute '__getitem__'
MacOS Big Sur
Versions 2.2.2 and 2.3.1.
When using the .synapseConfig file (with the apiKey attribute), as in (for example) synapseclient==2.2.2, the synapseclient.Synapse.login() method works perfectly. However, when using the .synapseConfig (with the authToken attribute), as in (for example) synapseclient==2.3.1, the login method doesn't work as expected.
A minimal reproducible example:
2.2.2 of the synapseclient
$ pip install synapseclient==2.2.2
$ python
>>> import synapseclient
>>> syn = synapseclient.Synapse(configPath='/Users/spatil/Desktop/schematic/.synapseConfig')
>>> syn.login(silent=True)
2.3.1
Note: Make sure to use the right version of the .synapseConfig file too.
User should be logged in successfully.
No output to console when testing with synapseclient==2.3.1 and using a .synapseConfig file with authToken.
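When debugging a report like this, it can help to confirm which credential entries the config file actually contains before looking at the client. The helper below is a hypothetical diagnostic, not part of synapseclient. It assumes the credentials live in an [authentication] section, and the sample values are placeholders.

```python
import configparser


def configured_credentials(config_text):
    """List the credential-related options present in the [authentication] section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section("authentication"):
        return []
    # ConfigParser lower-cases option names, so apiKey/authToken match these entries.
    candidates = ("username", "password", "apikey", "authtoken")
    return [key for key in candidates if parser.has_option("authentication", key)]


sample = """
[authentication]
username = example_user
authToken = example_token
"""
print(configured_credentials(sample))  # ['username', 'authtoken']
```

To check a real file, pass in its contents, for example open(path).read() on the configPath given to Synapse().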
I am a postgraduate bioinformatics student interested in exploring the horizons of neuroinformatics.
This issue was created by @SathishKumarNarayanan via Mozilla Science Lab Collaborate
1.7.4 is failing to install into my CI environment, however 1.7.3 works fine. It looks related to 923c141#diff-2eeaed663bd0d25b7e608891384b7298
$ env/bin/pip install synapseclient==1.7.4
<snip>
running install_data
error: can't copy '.synapseConfig': doesn't exist or not a regular file
When trying to download syn3163039 with files = synapseutils.syncFromSynapse(syn, "syn3163039", path="syn3163039/"), I get the error below. It works correctly when using Python 3.
Traceback (most recent call last):
File "download_data.py", line 12, in <module>
files = synapseutils.syncFromSynapse(syn, "syn3163039", path="syn3163039/")
File "/apps/software/Python/2.7.11-foss-2015b/lib/python2.7/site-packages/synapseutils/sync.py", line 85, in syncFromSynapse
syncFromSynapse(syn, result['id'], new_path, ifcollision, allFiles, followLink=followLink)
File "/apps/software/Python/2.7.11-foss-2015b/lib/python2.7/site-packages/synapseutils/sync.py", line 85, in syncFromSynapse
syncFromSynapse(syn, result['id'], new_path, ifcollision, allFiles, followLink=followLink)
File "/apps/software/Python/2.7.11-foss-2015b/lib/python2.7/site-packages/synapseutils/sync.py", line 85, in syncFromSynapse
syncFromSynapse(syn, result['id'], new_path, ifcollision, allFiles, followLink=followLink)
File "/apps/software/Python/2.7.11-foss-2015b/lib/python2.7/site-packages/synapseutils/sync.py", line 105, in syncFromSynapse
generateManifest(syn, allFiles, filename)
File "/apps/software/Python/2.7.11-foss-2015b/lib/python2.7/site-packages/synapseutils/sync.py", line 145, in generateManifest
csvWriter.writerow(row)
File "/apps/software/Python/2.7.11-foss-2015b/lib/python2.7/site-packages/backports/csv.py", line 685, in writerow
return self.writer.writerow(self._dict_to_list(rowdict))
File "/apps/software/Python/2.7.11-foss-2015b/lib/python2.7/site-packages/backports/csv.py", line 204, in writerow
return self.fileobj.write(line)
The ability to download tables into TSVs using the command line client would be very helpful.
Right now:
synapse get syn3156503
returns
WARNING: No files associated with entity syn3156503
Schema: RNA-Seq Metadata (syn3156503)
columns_to_store=[]
properties:
accessControlList=/repo/v1/entity/syn3156503/acl
annotations=/repo/v1/entity/syn3156503/annotations
columnIds=[u'4071', u'4192', u'4099', u'5449', u'4152', u'4077', u'4078', u'4079', u'35396', u'4073', u'35397', u'35399', u'35400', u'4124', u'4344', u'4225', u'4226', u'4158', u'4159', u'4227', u'4234', u'4228', u'4166', u'4229', u'4042', u'35398', u'4021', u'4023', u'4242', u'4243', u'4026', u'4233', u'4028', u'4043', u'4044', u'4030', u'4031', u'4032', u'4244', u'4034', u'4035', u'4036', u'4037', u'4045', u'4038', u'4046', u'4047', u'4039', u'4048', u'4245', u'4519', u'4521', u'4162', u'5518', u'4155', u'5515', u'5519', u'5520', u'4528', u'7673', u'7674', u'7705', u'7707']
concreteType=org.sagebionetworks.repo.model.table.TableEntity
createdBy=3323072
createdOn=2015-01-28T18:46:07.159Z
entityType=org.sagebionetworks.repo.model.table.TableEntity
etag=aaa03a73-1847-4ec8-b8f0-80305c1adc7a
id=syn3156503
modifiedBy=3323072
modifiedOn=2015-06-05T23:40:51.119Z
name=RNA-Seq Metadata
parentId=syn1773109
uri=/repo/v1/entity/syn3156503
versionLabel=10
versionNumber=10
versionUrl=/repo/v1/entity/syn3156503/version/10
versions=/repo/v1/entity/syn3156503/version
annotations:
AttributeError: path
There is a download button on the page, but nothing for the command line tool.
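Until the command line client supports this, the Python client can serve as a workaround. The sketch below assumes that syn.tableQuery accepts resultsAs="csv" together with a separator argument, and that the returned object exposes the downloaded file as filepath. Check both against the installed client version. download_table_as_tsv is a hypothetical helper name.

```python
def download_table_as_tsv(syn, table_id):
    """Query every row of a table and return the path of the tab-separated download.

    `syn` is expected to be a logged-in synapseclient.Synapse object.
    """
    results = syn.tableQuery(
        "select * from %s" % table_id,
        resultsAs="csv",
        separator="\t",  # assumption: the separator keyword is supported
    )
    return results.filepath
```

Usage would look like download_table_as_tsv(syn, "syn3156503") after syn.login(). The file lands in the client's cache directory, so it may need to be copied to its final location.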
Ubuntu 20.04
2.2.2
I was running client version 2.0.0. After upgrading to 2.2.2, I'm not able to deepcopy a Synapse object:
import copy
import synapseclient
syn = synapseclient.Synapse()
syn.login()
syn_copy = copy.deepcopy(syn)
Result:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.8/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/lib/python3.8/copy.py", line 270, in _reconstruct
state = deepcopy(state, memo)
File "/usr/lib/python3.8/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.8/copy.py", line 230, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.8/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/lib/python3.8/copy.py", line 270, in _reconstruct
state = deepcopy(state, memo)
File "/usr/lib/python3.8/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.8/copy.py", line 230, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.8/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.8/copy.py", line 230, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.8/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/lib/python3.8/copy.py", line 264, in _reconstruct
y = func(*args)
TypeError: __init__() missing 1 required positional argument: 'max_size'
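The last frames (y = func(*args) followed by a missing max_size argument) are typical of a container subclass whose constructor requires an argument that the default copy protocol does not supply. The sketch below reproduces that general mechanism with made-up classes. It does not claim to identify the object inside the Synapse client that triggers the error.

```python
import copy
from collections import OrderedDict


class BrokenBoundedDict(OrderedDict):
    """Hypothetical bounded mapping whose constructor requires max_size."""

    def __init__(self, max_size):
        super().__init__()
        self.max_size = max_size


class CopyableBoundedDict(BrokenBoundedDict):
    """Same mapping, but it tells copy and pickle which constructor argument to pass."""

    def __reduce__(self):
        # (callable, constructor args, state, list items, dict items)
        return (type(self), (self.max_size,), None, None, iter(self.items()))


def can_deepcopy(obj):
    """Report whether copy.deepcopy succeeds without a TypeError."""
    try:
        copy.deepcopy(obj)
    except TypeError:
        return False
    return True


print(can_deepcopy(BrokenBoundedDict(2)))    # False: rebuilt as cls() with no args
print(can_deepcopy(CopyableBoundedDict(2)))  # True
```

From the caller's side, constructing a second Synapse object and logging in again is likely simpler than copying an existing one, since a client object holds sessions and caches that may not be meaningful to duplicate.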
Should I wipe/move my cache and try again? Figured I'd check before re-pulling 270GB.
When doing recursive download especially, would be good to have the Synapse ID shown (had one fail and was a pain to track it down and see why).