
d3b-utils-python's Introduction

Collection of reusable python utilities

Requires

Python >= 3.6

How to install

Using pip

pip install -U git+https://github.com/d3b-center/d3b-utils-python.git@latest-release

Requests with retries

from d3b_utils.requests_retry import Session

response = Session().get("https://www.foo.com")

The Python requests library doesn't retry on connection errors unless you mount your own custom transport adapter. Many connection errors are intermittent and resolve on their own, so retrying them is usually the right default.

Don't use the requests library directly. Use this instead.
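To illustrate the general idea of retrying intermittent connection errors, here is a minimal standard-library sketch. It is not how d3b_utils implements its Session; the helper name call_with_retries and its parameters are hypothetical. It catches the built-in ConnectionError, whereas code written against requests would catch requests.exceptions.ConnectionError instead.

```python
import time


def call_with_retries(func, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call func(), retrying on ConnectionError with exponential backoff.

    Hypothetical helper for illustration only; not part of d3b_utils.
    """
    for attempt in range(attempts):
        try:
            return func()
        except ConnectionError:
            # Out of attempts: surface the last error to the caller.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
```

The sleep parameter is injectable so the backoff schedule can be exercised without actually waiting.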

S3 contents metadata

Fetch S3 bucket contents metadata using fetch_bucket_obj_info

List all the items in a bucket

from d3b_utils.s3_contents import fetch_bucket_obj_info

contents = fetch_bucket_obj_info("my_bucket")

List the items in selected subpaths of a bucket

from d3b_utils.s3_contents import fetch_bucket_obj_info

contents = fetch_bucket_obj_info(
  "my_bucket",
  search_prefixes=["source/pics/", "source/uploads/"]
)

Drop folders (keys ending in "/") from the list of returned objects

from d3b_utils.s3_contents import fetch_bucket_obj_info

contents = fetch_bucket_obj_info(
  "my_bucket",
  drop_folders=True,
)

Write the contents to a delimited file

from d3b_utils.s3_contents import fetch_bucket_obj_info

contents = fetch_bucket_obj_info(
  "my_bucket",
  search_prefixes="source/pics/",
  drop_folders=True,
  output_filename="my_bucket_contents.tsv"
)

Specify the AWS Profile

from d3b_utils.s3_contents import fetch_bucket_obj_info

contents = fetch_bucket_obj_info(
  "my_bucket",
  profile="user1"
)

Get Object versions and delete markers

from d3b_utils.s3_contents import fetch_bucket_obj_info

contents = fetch_bucket_obj_info(
  "my_bucket",
  all_versions=True
)

Fetch S3 metadata for a list of files using fetch_obj_list_info

from d3b_utils.s3_contents import fetch_obj_list_info

contents = fetch_obj_list_info(
  ["s3://bucket1/path1", "bucket2/path2"],
  profile="user1",
  all_versions=False
)
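The example above passes paths both with and without the s3:// scheme. One plausible way to normalize such paths into a bucket and a key is sketched below; split_s3_path is a hypothetical helper, and the library's own parsing may differ.

```python
def split_s3_path(path):
    """Split 's3://bucket/key' or 'bucket/key' into (bucket, key).

    Hypothetical helper for illustration; not part of d3b_utils.
    """
    prefix = "s3://"
    if path.startswith(prefix):
        path = path[len(prefix):]
    # Everything before the first "/" is the bucket; the rest is the key.
    bucket, _, key = path.partition("/")
    return bucket, key
```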

d3b-utils-python's Issues

Fetch metadata in parallel for a given list of S3 paths

We want the ingest library to automatically get S3 data during the ingestion process (issue here). Part of this process is fetching the metadata for all S3 paths in a given list, which might be useful in other contexts. It'd be best to abstract it out as a utility function.
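A rough sketch of what such a utility could look like, assuming a thread pool is acceptable for I/O-bound lookups. The names fetch_all and fake_fetch are hypothetical; fake_fetch stands in for a real per-path metadata request so the sketch runs without AWS access.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(paths, fetch_one, max_workers=8):
    """Apply fetch_one to every path concurrently.

    Results come back in the same order as the input paths.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, paths))


def fake_fetch(path):
    # Hypothetical stand-in for a real S3 metadata lookup.
    return {"Path": path, "Size": len(path)}


results = fetch_all(["s3://b/a", "s3://b/bb"], fake_fetch)
```

Threads suit this case because each lookup spends most of its time waiting on the network rather than on the CPU.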

sort_dicts only available in python>=3.8

File "/Users/vankurenn/projects/d3b-warehouse-redcap/venv/lib/python3.7/site-packages/d3b_utils/requests_retry.py", line 26, in __init__
sort_dicts=False
TypeError: pformat() got an unexpected keyword argument 'sort_dicts'
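One possible workaround, sketched here as an assumption rather than the fix the project adopted, is to pass sort_dicts only on interpreters that support it (the keyword was added to pprint in Python 3.8). The wrapper name pformat_unsorted is hypothetical.

```python
import sys
from pprint import pformat


def pformat_unsorted(obj, **kwargs):
    """pformat that keeps dict insertion order where the interpreter allows.

    On Python < 3.8 the sort_dicts keyword does not exist, so it is omitted
    and dict keys are printed in sorted order.
    """
    if sys.version_info >= (3, 8):
        kwargs.setdefault("sort_dicts", False)
    return pformat(obj, **kwargs)
```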

Consider adding an option to sort by key in _s3meta_to_file() for use case of fetch_aws_bucket_obj_info()

Is your feature request related to a problem? Please describe.
S3 scrapes currently produce columns whose order is not fixed across consecutive calls to d3b_utils.aws_bucket_contents.fetch_aws_bucket_obj_info(). Downstream steps may then have to re-order each scrape into a consistent format, which matters most when diffing multiple scrapes (e.g., to confirm that particular files have been deleted, or even restored).

Describe the solution you'd like
It would be great if we could introduce an option to _s3meta_to_file() that gives the output columns a consistent order, either through a supplied list of the desired column order or simply through a built-in alphanumeric sort.

Describe alternatives you've considered
A rudimentary case-sensitive alphanumeric sort could be implemented by wrapping sorted() around the right-hand side of the following assignment:

keys = set(chain(*(d.keys() for d in content)))

such as:

keys = sorted(set(chain(*(d.keys() for d in content))))
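Both options (a sorted header and a caller-supplied column order) can be sketched together as below. The varying order most likely comes from iterating over a set of strings, whose order depends on per-process hash randomization. The function write_records and its column_order parameter are hypothetical and do not reflect the actual signature of _s3meta_to_file().

```python
import csv
from itertools import chain


def write_records(content, out, column_order=None):
    """Write dict records as CSV with a deterministic header.

    Hypothetical sketch; not the library's _s3meta_to_file().
    """
    keys = set(chain(*(d.keys() for d in content)))
    if column_order is None:
        # Case-sensitive alphanumeric sort, as suggested above.
        fieldnames = sorted(keys)
    else:
        # Requested columns first, then any remaining keys, sorted.
        fieldnames = [k for k in column_order if k in keys]
        fieldnames += sorted(keys - set(fieldnames))
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(content)
    return fieldnames
```

Records that lack a column get an empty cell, so the header is the same regardless of which keys any single record happens to carry.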

Additional context
Using the following code:

#!/usr/bin/python3

from d3b_utils.aws_bucket_contents import fetch_aws_bucket_obj_info as abc

contents = abc(
    bucket_name="kf-strides-study-us-east-1-prd-sd-r0eprsgs",
    search_prefixes=["source/"],
    drop_folders=True,
    output_filename="s3_scrape_marazita.csv",
    profile=None,
)

repeated calls to the same code above produced a different header ordering each time:

$ head -1 *.csv
==> s3_scrape_marazita_2022oct24_2104UTC.csv <==
Key,ETag,StorageClass,Bucket,LastModified,Size

==> s3_scrape_marazita_2022oct25_1509UTC.csv <==
Bucket,Size,Key,ETag,StorageClass,LastModified

==> s3_scrape_marazita_2022oct25_1518UTC.csv <==
ETag,Size,LastModified,Key,StorageClass,Bucket

==> s3_scrape_marazita_2022oct25_1519UTC.csv <==
StorageClass,Key,ETag,Size,Bucket,LastModified

==> s3_scrape_marazita_2022oct25_1730UTC.csv <==
StorageClass,Size,ETag,LastModified,Bucket,Key
