
mano's Introduction

Mano


Mano is a simple Python library that lets you write applications that interact with the Beiwe Research Platform. You can request lists of studies, users, and device settings, download files (with or without encryption), and more! (Actually, not much more.)

Table of contents

  1. Requirements
  2. macOS SSL note
  3. Installation
  4. Initial setup
  5. API for keyring access
  6. API for accessing study information
  7. API for downloading data

Requirements

This software works with Python 2.6+ and Python 3 and has been tested on various flavors of macOS, Linux, and the Windows Subsystem for Linux on Windows 10.

macOS SSL note

I've encountered old versions of OpenSSL on some macOS distributions that cause issues when interacting with Beiwe over HTTPS. The simplest solution I found was to install one of the Miniconda Python distributions, which bundle a more recent version of OpenSSL.

Installation

The simplest way to install mano is with pip

pip install mano

Initial setup

To interact with Beiwe and download files, you will need your Beiwe Research Platform URL, username, password, access key, and secret key in a JSON file. Don't worry, we're going to encrypt this file later on.

{
    "beiwe.onnela": {
        "URL": "...",
        "USERNAME": "...",
        "PASSWORD": "...",
        "ACCESS_KEY": "...",
        "SECRET_KEY": "..."
    }
}

Note that you can also use the environment variables BEIWE_URL, BEIWE_USERNAME, BEIWE_PASSWORD, BEIWE_ACCESS_KEY, and BEIWE_SECRET_KEY to store these values and load your keyring using mano.keyring(None). You won't be able to use an environment variable for storing study-specific secrets (described next), but depending on your situation you may not need study-specific secrets at all.
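The environment-variable route amounts to copying five variables into a dictionary. The sketch below mirrors the excerpt from mano's __init__.py that is quoted in the issues further down this page; keyring_from_env is a hypothetical name used for illustration only, so in real code call mano.keyring(None).

```python
import os

# Keyring fields; each is read from an environment variable named BEIWE_<FIELD>.
ENV_KEYS = ("URL", "USERNAME", "PASSWORD", "ACCESS_KEY", "SECRET_KEY")


class KeyringError(Exception):
    """Raised when a required BEIWE_* variable is missing."""


def keyring_from_env(environ=os.environ):
    """Hypothetical helper: build a keyring dict from BEIWE_* variables."""
    keyring = {}
    for key in ENV_KEYS:
        name = "BEIWE_" + key
        try:
            keyring[key] = environ[name]
        except KeyError:
            raise KeyringError("environment variable not found: {0}".format(name))
    return keyring
```

Passing a plain dict as environ makes the helper easy to exercise without touching your real environment.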

If you intend to use mano to encrypt certain downloaded data stream files at rest, you will want to add study-specific passphrases (which you're responsible for generating) to a special SECRETS section

{
    "beiwe.onnela": {
        "URL": "...",
        "USERNAME": "...",
        "PASSWORD": "...",
        "ACCESS_KEY": "...",
        "SECRET_KEY": "...",
        "SECRETS": {
            "FAS Buckner": "..."
        }
    }
}
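Before encrypting the keyring, it can be worth checking that it parses as strict JSON, which rejects trailing commas and is a common source of errors when editing this file by hand. This is a minimal sketch assuming the layout shown above (a named section with an optional SECRETS object); load_section is a hypothetical helper, not part of mano.

```python
import json

# Fields every keyring section is expected to contain (per the layout above).
REQUIRED = ("URL", "USERNAME", "PASSWORD", "ACCESS_KEY", "SECRET_KEY")


def load_section(text, section):
    """Hypothetical helper: parse keyring JSON and return one section."""
    data = json.loads(text)  # raises ValueError on malformed JSON
    entry = data[section]    # raises KeyError if the section is absent
    missing = [key for key in REQUIRED if key not in entry]
    if missing:
        raise ValueError("missing keys: {0}".format(", ".join(missing)))
    return entry
```

For example, load_section(open(path).read(), 'beiwe.onnela')['SECRETS'] would return the study passphrases from a plain-text copy of the file.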

I'm guessing that you don't want this file sitting around in plain text, so for now this entire JSON blob must be passphrase-protected using the crypt.py utility from the cryptease library, which should be installed automatically along with the mano package:

$ crypt.py --encrypt ~/.nrg-keyring.json --output-file ~/.nrg-keyring.enc

I'll leave it up to the reader to decide where to produce the encrypted version of this file, but I would highly recommend discarding the unencrypted version.

API for keyring access

Before making any API calls, you need to read in your keyring file. The first parameter should be the name of the keyring section as shown above

import mano

Keyring = mano.keyring('beiwe.onnela')

You can pass your keyring passphrase as an argument to this function; otherwise it will look for the passphrase in a special NRG_KEYRING_PASS environment variable, and failing that, it will fall back on prompting you for it. This last strategy could cause non-interactive invocations to hang, so watch out.
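The lookup order described above (explicit argument, then NRG_KEYRING_PASS, then an interactive prompt) can be sketched as follows. resolve_passphrase is a hypothetical helper that illustrates the order only; it is not mano's internal code.

```python
import getpass
import os


def resolve_passphrase(passphrase=None, environ=os.environ, prompt=getpass.getpass):
    """Hypothetical sketch of the passphrase lookup order described above."""
    # 1. An explicit argument always wins.
    if passphrase is not None:
        return passphrase
    # 2. Otherwise use NRG_KEYRING_PASS if it is set.
    if "NRG_KEYRING_PASS" in environ:
        return environ["NRG_KEYRING_PASS"]
    # 3. Finally prompt; this blocks forever in a non-interactive job.
    return prompt("keyring passphrase: ")
```

For cron jobs or cluster batch scripts, exporting NRG_KEYRING_PASS (or passing the passphrase explicitly) keeps the third branch from ever being reached.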

API for accessing study information

With your Keyring loaded, you can now access information about your studies, users (a.k.a. participants or subjects), and device settings using simple functions defined within the mano module

for study in mano.studies(Keyring):
    print(study)

_, study_id = study # get the last printed study id

for user_id in mano.users(Keyring, study_id):
    print(user_id)

for setting in mano.device_settings(Keyring, study_id):
    print(setting)
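The example above simply takes the last study printed. With several studies it may be easier to look one up by name. This sketch assumes, as the unpacking `_, study_id = study` suggests, that each item yielded by mano.studies is a two-element pair whose second element is the study ID; treating the first element as the study name is an assumption, and find_study_id is a hypothetical helper.

```python
def find_study_id(studies, name):
    """Hypothetical helper: return the ID of the study called `name`.

    Assumes `studies` yields (study_name, study_id) pairs.
    """
    for study_name, study_id in studies:
        if study_name == name:
            return study_id
    raise LookupError("no study named {0!r}".format(name))
```

Usage would then look like study_id = find_study_id(mano.studies(Keyring), 'FAS Buckner').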

API for downloading data

With your Keyring loaded, you can also download data from your Beiwe server and extract it to your filesystem using the mano.sync module. And while we're at it, let's turn on more verbose logging so we can actually see what's happening

import logging
import mano.sync as msync

logging.basicConfig(level=logging.INFO)

output_folder = '/tmp/beiwe-data'

zf = msync.download(Keyring, study_id, user_id, data_streams=['identifiers'])

zf.extractall(output_folder)

Notice that I passed data_streams=['identifiers'] to msync.download. If you omit that parameter, the function will request all data for all data streams by default. Check out the backfill section for more information.

The msync.download function will hand back a standard Python zipfile.ZipFile object which you can extract to the filesystem as shown above. Easy.
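Because msync.download can hand back None instead of a ZipFile (as reported in one of the issues below), it may be worth checking the return value before extracting. In this sketch, extract_download and fake_download are hypothetical helpers; fake_download builds a tiny in-memory archive to stand in for a real download, and its user_id/data_stream/filename member layout is an assumption based on the output_folder/user_id/gps path used in the issues.

```python
import io
import zipfile


def extract_download(zf, output_folder):
    """Hypothetical helper: extract a download, failing loudly on None."""
    if zf is None:
        raise RuntimeError("msync.download returned None instead of a ZipFile")
    zf.extractall(output_folder)
    return zf.namelist()


def fake_download():
    """Stand-in for msync.download(): a tiny in-memory archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("user123/identifiers/identifiers.csv", "a,b\n1,2\n")
    buf.seek(0)
    return zipfile.ZipFile(buf)
```

Failing with an explicit error is easier to diagnose than the AttributeError you would otherwise get from calling extractall on None.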

encrypt files at rest

You can also pass the ZipFile object to msync.save if you wish to encrypt data stream files at rest

lock_streams = ['gps', 'audio_recordings']

zf = msync.download(Keyring, study_id, user_id)

passphrase = Keyring['SECRETS']['FAS Buckner']

msync.save(Keyring, zf, user_id, output_folder, lock=lock_streams, passphrase=passphrase)
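Conceptually, lock selects which data streams are encrypted and which are written as plain files. The sketch below shows only that selection step, assuming archive members are laid out as user_id/data_stream/filename (as the output_folder/user_id/gps path in the issues suggests). partition_members is hypothetical; the actual encryption is left to msync.save and cryptease.

```python
def partition_members(names, lock):
    """Hypothetical helper: split archive members into (locked, plain).

    Assumes each member path looks like 'user_id/data_stream/filename'.
    """
    lock = set(lock)
    locked, plain = [], []
    for name in names:
        parts = name.split("/")
        # The data stream is the second path component, if there is one.
        stream = parts[1] if len(parts) > 2 else None
        (locked if stream in lock else plain).append(name)
    return locked, plain
```

With lock_streams = ['gps', 'audio_recordings'], only files under those two stream directories would land in the locked list.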

backfill

By default, msync.download will attempt to download all of the data for the specified user_id, which could end up being prohibitively large either for you or for the Beiwe server. For this reason, the msync.download function exposes the parameters data_streams, time_start, and time_end, which let you download only certain data streams between specific start and end times:

data_streams = ['accel', 'ios_log', 'gps']

time_start = '2015-10-01T00:00:00'

time_end = '2015-12-01T00:00:00'

zf = msync.download(Keyring, study_id, user_id, data_streams=data_streams, time_start=time_start, time_end=time_end)

zf.extractall(output_folder)
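The time_start and time_end values above are ISO 8601 strings of the form YYYY-MM-DDTHH:MM:SS with no time zone. Rather than typing them by hand, you can derive them from datetime objects. This assumes the server accepts exactly the format shown in the example; time_range is a hypothetical helper.

```python
from datetime import datetime, timedelta

# Format of the time_start/time_end strings in the example above.
BEIWE_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def time_range(start, days):
    """Hypothetical helper: (time_start, time_end) strings spanning `days` days."""
    end = start + timedelta(days=days)
    return start.strftime(BEIWE_TIME_FORMAT), end.strftime(BEIWE_TIME_FORMAT)
```

For instance, time_start, time_end = time_range(datetime(2015, 10, 1), 61) reproduces the two strings used in the example.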

Eventually you may find yourself daydreaming about a backfill function that slides a window from some arbitrary starting point to the present time in order to download all of your data in more digestible chunks. You'll be happy to know that the mano.sync module already exposes a function for this:

start_date = '2015-01-01T00:00:00'

msync.backfill(Keyring, study_id, user_id, output_folder, start_date=start_date, lock=lock_streams, passphrase=passphrase)

Note that if you don't pass anything for the lock argument, you won't need to pass a passphrase either.
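The sliding-window idea behind msync.backfill can be sketched as a generator of consecutive (time_start, time_end) pairs running from the start date up to the present. The 30-day window is an arbitrary assumption, not the value mano uses, and windows is a hypothetical helper. The traceback in the issues below suggests mano's real implementation also keeps a timestamp in a .backfill file, which this sketch does not attempt.

```python
from datetime import datetime, timedelta

BEIWE_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def windows(start_date, window=timedelta(days=30), now=None):
    """Hypothetical sketch: yield (time_start, time_end) strings up to `now`."""
    start = datetime.strptime(start_date, BEIWE_TIME_FORMAT)
    now = now or datetime.now()
    while start < now:
        # Clamp the final window so it never extends past `now`.
        end = min(start + window, now)
        yield start.strftime(BEIWE_TIME_FORMAT), end.strftime(BEIWE_TIME_FORMAT)
        start = end
```

Each pair could then be passed to msync.download as time_start and time_end, one chunk at a time.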

mano's People

Contributors

tokeefe, biblicabeebli, zen-slug


mano's Issues

Request for access, possibly shifting code to Onnela Lab

@tokeefe (This is Eli) I'm going to put together some cleanup commits and would like to have admin-or-whatever privileges on this repository so I can make a new branch?

Onnela Lab is interested in taking over maintenance and development of Mano, including the ability to publish it on pypi. It is the primary codebase many researchers are using, including us internally.

I'll reach out over email too.

download() calling non-existent URL

I am attempting to use the Mano library to download data from an independent study using the Beiwe platform.

When using the download() function, it attempts to connect to a URL that does not exist. The URL associated with my study is https://mainemood.com (which is specified in the JSON file), and the function tries to access a URL named "mainemood.com/get-data/v1". Because it cannot find the URL, the function returns 'None'. Below are the lines (130-143) in the '__init__.py' file where I think the issue lies:

    try:
        Keyring['URL'] = os.environ['BEIWE_URL']
        Keyring['USERNAME'] = os.environ['BEIWE_USERNAME']
        Keyring['PASSWORD'] = os.environ['BEIWE_PASSWORD']
        Keyring['ACCESS_KEY'] = os.environ['BEIWE_ACCESS_KEY']
        Keyring['SECRET_KEY'] = os.environ['BEIWE_SECRET_KEY']
    except KeyError as e:
        raise KeyringError('environment variable not found: {0}'.format(e))
    return Keyring

class KeyringError(Exception):
    pass

def expand_study_id(Keyring, segment):
    '''
    Expand a Study ID segment to the full Study ID

Here is the Python code I used (edited for privacy):

# Assign and decrypt keyring from JSON
import mano

Keyring = mano.keyring('beiwe.onnela',
                       keyring_file='/Users/name/github_projects/nrg-keyring.enc',
                       passphrase='password')

# Download data
import logging
import mano.sync as msync

logging.basicConfig(level=logging.INFO)

output_folder = '/Users/name/tmp/Beiwe_Data'

study_id = 'EVER_Study'
user_id = 'xxxxxxxx'
zf = msync.download(Keyring, study_id, user_id, data_streams=['identifiers'])
print(zf)
# Prints 'None'

zf.extractall(output_folder)
# No output

Please let me know if you require additional information to troubleshoot this issue. Thank you.

Possible Security Flaw in mano Package

Hello,

I am helping a friend get your platform set up, so I was going through the code. I noticed that in mano/sync/__init__.py, in the download() function (lines 130-141), you send all the information required to download patient data in clear text (i.e., all the POST data).

It seems this could lead to a security breach if a malicious actor is able to sniff the packets containing this information. If I remember correctly, you require SSL/TLS (HTTPS) for sites using your platform, but you may also want to consider changing this, if possible, as an added layer of protection.

backfill FileNotFoundError .backfill

I get a FileNotFoundError from the backfill function:

FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/sk/c14blb7s493_7lgzdq0m3sjm0000gn/T/tmpt163bi70/<user ID removed for security>/.backfill'

I do not think I am supposed to have the .../.backfill file created before running the backfill function, hence I think it might be a bug.

Below, I provide an example that is reproducible except that (a) the Keyring setup is specific to my local machine, and (b) I use a Beiwe test study (https://staging.beiwe.org/view_study/68) whose study ID has been deemed appropriate to share in GitHub issue code; as with all Beiwe studies, access authorization is needed to download these data.

import mano
import sys  
import os
import mano.sync as msync
import tempfile
from datetime import date

# set up the keyring
sys.path.insert(0, '/Users/martakaras/Documents/data_beiwe_settings')
import keyring_studies_MK_staging
Keyring = mano.keyring(None)

# define study ID and user ID
study_id = 'QrDrgyGFyH6CmTEOCCnZVw1o'
user_id = list(mano.users(Keyring, study_id))[5]

First, showing that I can download, say, GPS data using the "traditional" mano approach (here, 51 GPS files are downloaded):

# set up temporary dir to download the data to
temp_dir_1 = tempfile.TemporaryDirectory()
output_folder = temp_dir_1.name

# download and extract gps data with msync.download
zf = msync.download(Keyring, study_id, user_id, data_streams = ['gps'])
zf.extractall(output_folder)

# see there is data 
print(len(os.listdir(os.path.join(output_folder, user_id, "gps"))))
51 
# remove the temp directory
temp_dir_1.cleanup()

Then, showing the error when attempting to download the same data using the backfill function:

# set up temporary dir to download the data to
temp_dir_1 = tempfile.TemporaryDirectory()
output_folder = temp_dir_1.name

# download and extract gps data with msync.download
start_date = '2021-04-06T00:00:00'
msync.backfill(Keyring, study_id, user_id, output_folder, start_date = start_date,  data_streams = ['gps'])
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
/var/folders/sk/c14blb7s493_7lgzdq0m3sjm0000gn/T/ipykernel_2349/260581604.py in <module>
      5 # download and extract gps data with msync.download
      6 start_date = '2021-04-06T00:00:00'
----> 7 msync.backfill(Keyring, study_id, user_id, output_folder, start_date = start_date,  data_streams = ['gps'])

~/opt/anaconda3/envs/forest_gh/lib/python3.8/site-packages/mano/sync/__init__.py in backfill(Keyring, study_id, user_id, output_dir, start_date, data_streams, lock, passphrase)
     42         backfill_file = os.path.join(output_dir, user_id, '.backfill')
     43         logger.info('reading backfill file %s', backfill_file)
---> 44         with open(backfill_file, 'a+') as fo:
     45             fo.seek(0)
     46             timestamp = fo.read().strip()

FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/sk/c14blb7s493_7lgzdq0m3sjm0000gn/T/tmpt163bi70/<user ID removed for security>/.backfill'
# remove the temp directory
temp_dir_1.cleanup()
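The traceback shows open(backfill_file, 'a+') failing because the output_dir/user_id directory does not exist yet: mode 'a+' creates the file but not its parent directories. A possible workaround, until the library handles this itself, is to create that directory before calling msync.backfill. The sketch below reproduces the failure and the fix in a temporary directory; ensure_user_dir is a hypothetical helper, not part of mano, and it avoids os.makedirs(..., exist_ok=True) so it also runs on Python 2.

```python
import os
import tempfile


def ensure_user_dir(output_dir, user_id):
    """Hypothetical helper: create output_dir/user_id if it is missing."""
    path = os.path.join(output_dir, user_id)
    if not os.path.isdir(path):
        os.makedirs(path)  # creates intermediate directories too
    return path


output_dir = tempfile.mkdtemp()
backfill_file = os.path.join(output_dir, "user123", ".backfill")

try:
    open(backfill_file, "a+").close()  # same mode as in the traceback
except (IOError, OSError) as err:      # FileNotFoundError on Python 3
    print("before:", err)

ensure_user_dir(output_dir, "user123")
with open(backfill_file, "a+") as fo:  # now succeeds and creates the file
    fo.seek(0)
    print("after: contents =", repr(fo.read()))
```

In the reproduction above, calling ensure_user_dir(output_folder, user_id) just before msync.backfill(...) should avoid this particular error, assuming nothing else in backfill depends on that directory being absent.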

Future announcement - Beiwe will eventually be changing some endpoints

(Hi Tim!)

There are some updates to the Beiwe data access endpoints that I'm going to get to in the currently-distant future. Wondering if I could get repo access for when I eventually get to those.

(Probably new /v2 targets, possible credentialing changes, and possibly adding compression back in to data access - that last one is not trivial.)
