b2blaze's Introduction

b2blaze

Welcome to the b2blaze library for Python.

Backblaze B2 provides the cheapest cloud object storage and transfer available on the internet. Comparatively, AWS S3 is 320% more expensive to store and 400% more expensive to transfer to the internet.

This library lets you easily interact with B2 buckets and files as first-class objects in Python 2 and 3. It is licensed under the MIT license, so feel free to use it anywhere! If you enjoy it, please feel free to contribute or request features.

Installation

To install b2blaze, run the following command in the proper environment:

pip install b2blaze

Setup

You will need a key_id and an application_key to run b2blaze. You can obtain these in the B2 portal. Then, either pass them into B2() or set the environment variables B2_KEY_ID and B2_APPLICATION_KEY.
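As a minimal sketch of the two options above, the helper below reads both values from the environment and fails early if either one is missing. The name load_b2_credentials is hypothetical and not part of b2blaze; only the key_id and application_key argument names and the two environment variable names come from this README.

```python
import os


def load_b2_credentials(environ=None):
    """Return B2 credentials as keyword arguments for B2().

    Hypothetical helper, not part of b2blaze.
    """
    environ = os.environ if environ is None else environ
    # Collect every missing variable so the error names all of them at once
    missing = [name for name in ("B2_KEY_ID", "B2_APPLICATION_KEY")
               if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "key_id": environ["B2_KEY_ID"],
        "application_key": environ["B2_APPLICATION_KEY"],
    }
```

With b2blaze installed, the result can be passed straight through as B2(**load_b2_credentials()).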

Example Usage

b2blaze is built around OOP principles and as such all buckets and files are objects which you can interact with. Let's see an example where we list all of our files in a bucket:

from b2blaze import B2
b2 = B2()
bucket = b2.buckets.get('test_bucket')
files = bucket.files.all()

files is now a list of B2File objects, each with all of its properties. Any of them can be downloaded by running:

content = files[0].download()

This is a BytesIO object that you can manipulate in any way, including saving it locally or serving it on a website.

Guide

The B2 Object

from b2blaze import B2
b2 = B2()

The B2 object is how you access b2blaze's functionality. You can optionally pass in "key_id" and "application_key" as named arguments, but you should probably set them as environment variables as described above.

Buckets

Buckets are essentially the highest level folders in B2, similar to how buckets are used in AWS S3.

Bucket Properties

bucket_id
bucket_name
bucket_type
bucket_info
lifecycle_rules
revision
cors_rules
deleted

List All Buckets

buckets = b2.buckets.all()

Create a Bucket

bucket = b2.buckets.create('test_bucket', security=b2.buckets.public)

Buckets can either be public or private. This does not change the functionality of the library other than that you will need to manually authorize when using file URLs (see below).

Retrieve a bucket

bucket_by_name = b2.buckets.get('test_bucket')
bucket_by_id = b2.buckets.get(bucket_id='abcd')

Delete a bucket

bucket.delete()

This will delete both the bucket and all files within it. There is no confirmation. Use carefully.

Files

Files are the same files you store locally. They can be stored inside folders placed in buckets but this means they simply have a name like "folder/test.txt". There is no distinction between folders and files.

File Properties

file_id
file_name
content_sha1
content_length
content_type
file_info
action
uploadTimestamp
deleted

List All Files in a Bucket

bucket.files.all()

NOTE: There may be tens of thousands of files (or more) in a bucket. This operation will get information and create objects for all of them. It may take quite some time and be computationally expensive to run.

Upload a File

text_file = open('hello.txt', 'rb')
new_file = bucket.files.upload(contents=text_file, file_name='folder/hello.txt')

NOTE: You don't have to call .read(); you can pass the open file directly to contents. This lets the file be buffered directly over HTTP to B2 and saves a significant amount of memory. Also, contents must be binary data or a binary stream.
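The upload example leaves the file open afterwards. A small wrapper, sketched here under the assumption that bucket is an object returned by b2.buckets.get(), streams the file and closes it when the upload finishes. The name upload_path is hypothetical; the upload(contents=..., file_name=...) call is the one shown above.

```python
def upload_path(bucket, local_path, file_name):
    """Upload a local file by streaming it, then close it.

    Hypothetical helper; `bucket` is assumed to be a b2blaze bucket
    object as returned by b2.buckets.get().
    """
    # Binary mode is required. The open file object is passed as-is,
    # without calling .read(), so this helper never loads the whole file.
    with open(local_path, "rb") as fobj:
        return bucket.files.upload(contents=fobj, file_name=file_name)
```

For example: new_file = upload_path(bucket, 'hello.txt', 'folder/hello.txt').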

Upload a Large File

large_file = open('large_file.bin', 'rb')
new_file = bucket.files.upload_large_file(contents=large_file, file_name='folder/large_file.bin', num_threads=4)

NOTE: You cannot call .read() on the file because the function will seek and buffer the file over num_threads for you. Per Backblaze recommendation, part_size defaults to recommendedPartSize from b2_authorize_account (typically 100MB). num_threads defaults to 4 threads. The minimum part size is 5MB and you must have at least 2 parts.
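The two constraints in the note can be checked before starting an upload. The sketch below is not part of b2blaze: count_parts is a hypothetical helper, and treating 5MB as 5,000,000 bytes is an assumption.

```python
# Assumption: "5MB" is taken as 5,000,000 bytes.
MIN_PART_SIZE = 5 * 1000 * 1000


def count_parts(file_size, part_size):
    """Return how many parts a large-file upload would need.

    Raises ValueError when the constraints from the note above are not met.
    """
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5MB minimum")
    # Ceiling division using integers only, so it behaves the same on Python 2 and 3
    parts = (file_size + part_size - 1) // part_size
    if parts < 2:
        raise ValueError("a large-file upload needs at least 2 parts")
    return parts
```

A 250MB file with a 100MB part size gives two full parts plus one 50MB remainder, so count_parts(250 * 1000 * 1000, 100 * 1000 * 1000) returns 3.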

Retrieve a File's Information (Necessary before Downloading)

file_by_name = bucket.files.get(file_name='folder/hello.txt')
file_by_id = bucket.files.get(file_id='abcd1234')

Download a file

file = bucket.files.get(file_name='folder/hello.txt')
downloaded_file = file.download()

This returns a BytesIO object which you can manipulate in Python using a tool like PIL, serve on a website, or easily save like this:

# The with block closes the file even if the write fails
with open('save_pic.jpg', 'wb') as save_file:
    save_file.write(downloaded_file.read())

Delete a file version

file.delete()

This deletes a single version of a file. (See the docs on File Versions at Backblaze for an explanation.)

Hide (aka "Soft-delete") a file

file.hide()

This hides a file (aka "soft-delete") so that downloading by name will not find the file, but previous versions of the file are still stored. (See the docs on hiding files at Backblaze for details.)

Testing

Unit testing is done with pytest. Before running, you must set the environment variables B2_KEY_ID and B2_APPLICATION_KEY.

Run tests:

python3 ./tests.py

LICENSE

MIT License

Copyright (c) 2018 George Sibble

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

b2blaze's People

Contributors

fieldse, g-mc, mitchellhuang, raphaelyancey, sfermigier, sibblegp

b2blaze's Issues

Getting http.client.RemoteDisconnected while uploading certain files

I'm able to upload most of my jpeg files just fine, but certain ones cause me issues.

Trying to upload this image (p1000977) with the following code:

from b2blaze import B2

b2 = B2(key_id="bla", application_key="bla")
bucket = b2.buckets.get('test-bucket-name')

input = open('/PATH/TO/IMAGE.JPG', 'rb')
bucket.files.upload(contents=input, file_name='test.jpg', mime_content_type='image/jpeg')

File upload fails with Connection Error

Uploading a file returns the following error:
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
I've been debugging for a while and can't figure out whether it's a problem with the module or Backblaze's services.

Deeper Unit Testing

Add more unit tests and test more details of results including types and properties. Test more error cases. Strive to get to 100% coverage.

File-like object API support.

Or; "please make your file like object file-like" ;P

Glad a friend pointed me at this project; I love the idea, B2 is epic. I'm very strongly in favour of following Python standard object conventions for file-likes, that is, supplying File-like methods as described by Python 3's io package. Ref: https://docs.python.org/3/library/io.html#io.RawIOBase

An example of a Python 2 file-like compatible object can be found in Pymongo's GridFS implementation; for DB-distributed file chunking. (They went with distinct reader and writer objects.)

Thanks for consideration!

Delete file: B2RequestError: 400 - File not present

Hey there. Another one for today.

Edit: Looks like this is an issue on filenames with spaces. Adjusted original post accordingly.

Delete file fails (for filename containing spaces) with B2RequestError: 400 - File not present

Is this normal/expected behavior with object storage? (ie: Do we always need to clean filenames? Or should this be considered a bug?)

Code:

from b2blaze import B2
b2 = B2(key_id=id, application_key=key)
bucket = b2.buckets.get('some_bucket')
files = bucket.files.all()
>>> files[0].file_name
'a text file.1.txt'  # Note spaces in filename

>>> files[0].delete()
Resetting dropped connection: api002.backblazeb2.com
https://api002.backblazeb2.com:443 "POST /b2api/v1/b2_delete_file_version HTTP/1.1" 400 189
Traceback (most recent call last):
   [...]
    raise B2RequestError(decode_error(response))
b2blaze.b2_exceptions.B2RequestError: 400 - {'code': 'file_not_present', 'status': 400, 'message': 'File not present: a%20text%20file.1.txt [snip...file id] '}
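The error message shows the name as a%20text%20file.1.txt, which suggests the file name reached the delete call in percent-encoded form. The standard-library sketch below only illustrates that encoding round trip. Whether the library should decode the name before deleting is an assumption here, not a confirmed diagnosis.

```python
try:
    from urllib.parse import quote, unquote  # Python 3
except ImportError:
    from urllib import quote, unquote  # Python 2

original = 'a text file.1.txt'
encoded = quote(original)    # each space becomes %20
decoded = unquote(encoded)   # round-trips back to the original name

print(encoded)
print(decoded)
```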

Pypi installation is broken for 0.1.7

Missing models:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/b2blaze/__init__.py", line 1, in <module>
    from b2blaze.b2lib import B2
  File "/usr/local/lib/python3.7/site-packages/b2blaze/b2lib.py", line 9, in <module>
    from b2blaze.models.bucket_list import B2Buckets
ModuleNotFoundError: No module named 'b2blaze.models'

Listing the directory:

Successfully installed b2blaze-0.1.7
baf197a81c1e# ls /usr/local/lib/python3.7/site-packages/b2blaze
__init__.py  b2_exceptions.py  connector.py
__pycache__  b2lib.py          utilities.py

Upload large file fails with 'Connection aborted.', RemoteDisconnected

Hi there. Thanks for making this sweet library!

I've found that upload_large_file fails on my end with a Connection aborted error.

 'Connection aborted.', RemoteDisconnected('Remote end closed connection without response')

My code looks something like this:

            fobj = open(file, 'rb')         

            # Prepend directory if given
            target = os.path.join(directory, filename) if directory else filename
            part_size = 5 * 1024 * 1024    # 5 Mb part size, for testing

            # Upload large
            bucket.files.upload_large_file(
                contents=fobj, 
                file_name=target, 
                part_size=part_size, 
                num_threads=4)

I've tested with file sizes of 5+, 10+, and 100+ Mb.
I don't know if it's something wrong on my end (maybe the small part size?) or a problem with upload_large_file.

Let me know what I can do to help research or debug this issue!
Thanks!

Upload files without opening them

In its current implementation, it looks like you have to read in the entire file before uploading, which is memory intensive/prohibitive when working with large files. It would be nice to be able to pass an open file object that is streamed to the server.

Requests already supports file objects, the big change will be calculating the sha1 hash.
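For the hashing part, one possible approach is to read the open file in fixed-size chunks so the whole file never sits in memory. This is only a sketch using the standard library: sha1_of_stream and its chunk_size default are hypothetical, and it assumes the stream is seekable so it can be rewound for the upload.

```python
import hashlib


def sha1_of_stream(stream, chunk_size=1024 * 1024):
    """Compute the SHA1 hex digest of a seekable binary stream in chunks."""
    digest = hashlib.sha1()
    start = stream.tell()
    # iter() with a sentinel stops once read() returns b"" at end of stream
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    stream.seek(start)  # rewind so the same object can be uploaded afterwards
    return digest.hexdigest()
```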

cheapest?

Well, obviously you write about "object storage", which you assume is cheapest on B2, but if you consider cloud storage in general, then B2 is VERY costly.

According to B2 pricing:
https://www.backblaze.com/b2/cloud-storage-pricing.html#
The cost of just storing 10TB would be more like $700 per year.

With OVH's "hubic" offering, the cost at 10TB scale is only EUR 50 per year.
https://hubic.com/en/offers/

Besides that, your library is excellent on its own, so if you would consider adding other cloud providers (e.g. hubic) in the future, it would be more than superb.

File Modification Time and/or other metadata

Currently, there is no support for file modification time. I know that the b2blaze API reads a byte stream, so it would have to be specified explicitly. It seems B2 supports setting the mod time with the X-Bz-Info-src_last_modified_millis header, so that could be an option. The ability to specify other key-value pairs via X-Bz-Info-* would also be great.
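To make the request concrete, the value for that header could be derived from the local file like this. It is a sketch only: last_modified_header is a hypothetical helper, and nothing here implies b2blaze currently accepts such a dict.

```python
import os


def last_modified_header(path):
    """Build the src_last_modified_millis header for a local file.

    Hypothetical helper; b2blaze is not known to accept this dict.
    """
    mtime_seconds = os.path.getmtime(path)      # seconds since the epoch, as a float
    millis = int(round(mtime_seconds * 1000))   # the header name calls for milliseconds
    return {"X-Bz-Info-src_last_modified_millis": str(millis)}
```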

requests upgrade

The requirements use an older version of requests, which gets flagged as a potential security risk.

Can we upgrade to 2.21 without issue?

File list prefix

Nice work here. It's fun to see the project grow.

Looking through the API for b2_list_file_names, I see that one can send a prefix over to avoid getting more files back than necessary.

Often such a prefix just provides a convenience. But it looks like a prefix is necessary if your credentials don't give you access to an entire bucket.

What do you think about exposing the prefix feature in this API? One could either extend B2FileList.all or add a sister method (perhaps B2FileList.filter?).

503 Error: too busy

Backblaze makes it clear that the 503 Too Busy error is normal and expected. One is supposed to request a new upload URL. How can we do this?
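The retry pattern being asked about can be sketched independently of the library. Everything named here is hypothetical: ServiceBusy stands in for however a 503 surfaces, and get_upload_url and upload are caller-supplied callables, not b2blaze functions.

```python
import time


class ServiceBusy(Exception):
    """Hypothetical stand-in for a 503 response from an upload attempt."""


def upload_with_retry(get_upload_url, upload, max_attempts=5, base_delay=1.0,
                      sleep=time.sleep):
    """Retry an upload, asking for a fresh upload URL after each 503.

    get_upload_url() and upload(url) are caller-supplied callables;
    neither is a b2blaze API.
    """
    for attempt in range(max_attempts):
        url = get_upload_url()  # a new URL for every attempt
        try:
            return upload(url)
        except ServiceBusy:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller see the error
            sleep(base_delay * (2 ** attempt))  # exponential backoff
```

The sleep parameter exists so the backoff can be replaced in tests; in normal use the default time.sleep applies.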

Reading Returned Object

file_by_name = bucket.files.get(file_name='folder/hello.txt')

<b2blaze.models.b2_file.B2File object at 0x7f1249800000>

How do I read the information like file name, URL etc?
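Going by the File Properties list in the README, the returned object should expose its details as plain attributes, for example file_by_name.file_name or file_by_name.file_id. The sketch below gathers those documented names into a dict. file_summary is a hypothetical helper, and since the README lists no URL property, none is assumed.

```python
# Property names as listed under "File Properties" in the README
FILE_PROPERTIES = (
    "file_id", "file_name", "content_sha1", "content_length",
    "content_type", "file_info", "action", "uploadTimestamp", "deleted",
)


def file_summary(b2_file):
    """Collect the documented properties of a file object into a dict.

    Hypothetical helper; properties missing on the object come back as None.
    """
    return {name: getattr(b2_file, name, None) for name in FILE_PROPERTIES}
```

Calling print(file_summary(file_by_name)) would then show each property next to its value.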

401 - unauthorized on v0.2.x

Got some simple code:

b2 = B2(B2_KEY_ID, B2_APP_KEY)
bucket = b2.buckets.get(B2_BUCKET)

That returns a 401 - unauthorized.

This happens in the latest version, 0.2.1, but if I roll back to 0.1.10 there is no issue with the same code. Am I missing a change, or is there an underlying issue with 0.2.1?

Upload with known SHA1

How hard would it be to modify the upload code so that if I already know the SHA1 (since I need it for my app already), it can skip computing it again? It looks like the change would be around the StreamWithHashProgress object. Do you expect any issues in modifying this to accept a known SHA1? Or do you think it is better to keep the code as is?
