
AWS::S3

AWS::S3 is a Ruby library for Amazon’s Simple Storage Service’s REST API (aws.amazon.com/s3). Full documentation of the currently supported API can be found at docs.amazonwebservices.com/AmazonS3/2006-03-01.

Getting started

To get started you need to require ‘aws/s3’:

% irb -rubygems
irb(main):001:0> require 'aws/s3'
# => true

The AWS::S3 library ships with an interactive shell called s3sh. From within it, you have access to all the operations the library exposes from the command line.

% s3sh
>> Version

Before you can do anything, you must establish a connection using Base.establish_connection!. A basic connection would look something like this:

AWS::S3::Base.establish_connection!(
  :access_key_id     => 'abc', 
  :secret_access_key => '123'
)

The minimum connection options that you must specify are your access key id and your secret access key.

(If you don’t already have your access keys, all you need to sign up for the S3 service is an account at Amazon. You can sign up for S3 and get access keys by visiting aws.amazon.com/s3.)

For convenience, if you set two special environment variables with the value of your access keys, the console will automatically create a default connection for you. For example:

% cat .amazon_keys
export AMAZON_ACCESS_KEY_ID='abcdefghijklmnop'
export AMAZON_SECRET_ACCESS_KEY='1234567891012345'

Then source it from your shell’s rc file.

% cat .zshrc
if [[ -f "$HOME/.amazon_keys" ]]; then
  source "$HOME/.amazon_keys";
fi

See more connection details at AWS::S3::Connection::Management::ClassMethods.
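The console’s default-connection behavior boils down to reading those two variables from the environment. A minimal sketch of what that amounts to (the helper name is ours, not the library’s):

```ruby
# Build connection options from the environment, the way s3sh's default
# connection effectively does. Returns nil if either key is absent.
def amazon_credentials(env = ENV)
  id     = env['AMAZON_ACCESS_KEY_ID']
  secret = env['AMAZON_SECRET_ACCESS_KEY']
  return nil unless id && secret
  { :access_key_id => id, :secret_access_key => secret }
end

# With both variables set, you could then do:
#   AWS::S3::Base.establish_connection!(amazon_credentials)
```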

AWS::S3 Basics

The service, buckets and objects

The three main concepts of S3 are the service, buckets and objects.

The service

The service lets you find out general information about your account, like what buckets you have.

Service.buckets
# => []

Buckets

Buckets are containers for objects (the files you store on S3). To create a new bucket you just specify its name.

# Pick a unique name, or else you'll get an error
# if the name is already taken.
Bucket.create('jukebox')

Bucket names must be unique across the entire S3 system, sort of like domain names across the internet. If you try to create a bucket with a name that is already taken, you will get an error.

Assuming the name you chose isn’t already taken, your new bucket will now appear in the bucket list:

Service.buckets
# => [#<AWS::S3::Bucket @attributes={"name"=>"jukebox"}>]

Once you have successfully created a bucket you can fetch it by name using Bucket.find.

music_bucket = Bucket.find('jukebox')

The bucket that is returned will contain a listing of all the objects in the bucket.

music_bucket.objects.size
# => 0

If all you are interested in is the objects of the bucket, you can get to them directly using Bucket.objects.

Bucket.objects('jukebox').size
# => 0

By default all objects will be returned, though there are several options you can use to limit what is returned, such as specifying that only objects whose names come after a certain place in the alphabet be returned. Details about these options can be found in the documentation for Bucket.find.

To add an object to a bucket you specify the name of the object, its value, and the bucket to put it in.

file = 'black-flowers.mp3'
S3Object.store(file, open(file), 'jukebox')

You’ll see your file has been added to the bucket:

music_bucket.objects
# => [#<AWS::S3::S3Object '/jukebox/black-flowers.mp3'>]

You can treat your bucket like a hash and access objects by name:

jukebox['black-flowers.mp3']
# => #<AWS::S3::S3Object '/jukebox/black-flowers.mp3'>

In the event that you want to delete a bucket, you can use Bucket.delete.

Bucket.delete('jukebox')

Keep in mind that, like Unix directories, you cannot delete a bucket unless it is empty. Trying to delete a bucket that contains objects will raise a BucketNotEmpty exception.

Passing the :force => true option to delete will take care of deleting all the bucket’s objects for you.

Bucket.delete('photos', :force => true)
# => true

Objects

S3Objects represent the data you store on S3. They have a key (their name) and a value (their data). All objects belong to a bucket.

You can store an object on S3 by specifying a key, its data and the name of the bucket you want to put it in:

S3Object.store('me.jpg', open('headshot.jpg'), 'photos')

The content type of the object will be inferred from its extension. If the appropriate content type can not be inferred, S3 defaults to binary/octet-stream.
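That inference is essentially a file-extension lookup. A simplified sketch (the table here is abbreviated and illustrative, not the library’s actual mapping):

```ruby
# Map a key's extension to a MIME type, falling back to S3's default.
CONTENT_TYPE_BY_EXTENSION = {
  '.jpg' => 'image/jpeg',
  '.mp3' => 'audio/mpeg',
  '.txt' => 'text/plain'
}.freeze

def guess_content_type(key)
  CONTENT_TYPE_BY_EXTENSION.fetch(File.extname(key).downcase, 'binary/octet-stream')
end
```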

If you want to override this, you can explicitly indicate what content type the object should have with the :content_type option:

file = 'black-flowers.m4a'
S3Object.store(
  file,
  open(file),
  'jukebox',
  :content_type => 'audio/mp4a-latm'
)

You can read more about storing files on S3 in the documentation for S3Object.store.

If you just want to fetch an object you’ve stored on S3, you just specify its name and its bucket:

picture = S3Object.find 'headshot.jpg', 'photos'

N.B. In neither case — when the file appeared in the bucket listing, nor when it was fetched directly — was the actual data for the file downloaded. You get the data for the file like this:

picture.value

You can fetch just the object’s data directly:

S3Object.value 'headshot.jpg', 'photos'

Or stream it by passing a block to stream:

open('song.mp3', 'w') do |file|
  S3Object.stream('song.mp3', 'jukebox') do |chunk|
    file.write chunk
  end
end

The data of the file, once downloaded, is cached, so subsequent calls to value won’t redownload the file unless you tell the object to reload its value:

# Redownloads the file's data
song.value(:reload)

Other functionality includes:

# Checking if an object exists
S3Object.exists? 'headshot.jpg', 'photos'

# Copying an object
S3Object.copy 'headshot.jpg', 'headshot2.jpg', 'photos'

# Renaming an object
S3Object.rename 'headshot.jpg', 'portrait.jpg', 'photos'

# Deleting an object
S3Object.delete 'headshot.jpg', 'photos'

More about objects and their metadata

You can find out the content type of your object with the content_type method:

song.content_type
# => "audio/mpeg"

You can change the content type as well if you like:

song.content_type = 'application/pdf'
song.store

(Keep in mind that due to limitations in S3’s exposed API, the only way to change things like the content_type is to PUT the object onto S3 again. In the case of large files, this will result in fully re-uploading the file.)

A bevy of information about an object can be had using the about method:

pp song.about
{"last-modified"    => "Sat, 28 Oct 2006 21:29:26 GMT",
 "content-type"     => "binary/octet-stream",
 "etag"             => "\"dc629038ffc674bee6f62eb64ff3a\"",
 "date"             => "Sat, 28 Oct 2006 21:30:41 GMT",
 "x-amz-request-id" => "B7BC68F55495B1C8",
 "server"           => "AmazonS3",
 "content-length"   => "3418766"}

You can get and set metadata for an object:

song.metadata
# => {}
song.metadata[:album] = "A River Ain't Too Much To Love"
# => "A River Ain't Too Much To Love"
song.metadata[:released] = 2005
pp song.metadata
{"x-amz-meta-released" => 2005, 
  "x-amz-meta-album"   => "A River Ain't Too Much To Love"}
song.store

That metadata will be saved in S3 and is henceforth available from that object:

song = S3Object.find('black-flowers.mp3', 'jukebox')
pp song.metadata
{"x-amz-meta-released" => "2005", 
  "x-amz-meta-album"   => "A River Ain't Too Much To Love"}
song.metadata[:released]
# => "2005"
song.metadata[:released] = 2006
pp song.metadata
{"x-amz-meta-released" => 2006, 
 "x-amz-meta-album"    => "A River Ain't Too Much To Love"}
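Under the hood, user metadata travels as x-amz-meta-* request headers, which is why the keys come back prefixed and why values round-trip as strings. A sketch of that prefixing (helper name ours):

```ruby
# Turn a metadata hash into the x-amz-meta-* headers S3 expects.
# Values are sent as header strings, which is why an integer like
# 2005 comes back from S3 as "2005".
def to_amz_headers(metadata)
  metadata.each_with_object({}) do |(key, value), headers|
    headers["x-amz-meta-#{key}"] = value.to_s
  end
end
```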

Streaming uploads

When storing an object on the S3 servers using S3Object.store, the data argument can be a string or an I/O stream. If data is an I/O stream it will be read in segments and written to the socket incrementally. This approach may be desirable for very large files so they are not read into memory all at once.

# Non streamed upload
S3Object.store('greeting.txt', 'hello world!', 'marcel')

# Streamed upload
S3Object.store('roots.mpeg', open('roots.mpeg'), 'marcel')
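The incremental write amounts to reading the IO in fixed-size segments and sending each one as it is read. A sketch of that loop (segment size arbitrary):

```ruby
require 'stringio'

# Read an IO in segments and hand each chunk to the caller, so the
# whole payload never has to sit in memory at once.
def each_segment(io, segment_size = 1024 * 1024)
  while (chunk = io.read(segment_size))
    yield chunk
  end
end

# Example with an in-memory IO standing in for a large file:
chunks = []
each_segment(StringIO.new('a' * 2500), 1000) { |c| chunks << c }
```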

Setting the current bucket

Scoping operations to a specific bucket

If you plan on always using a specific bucket for certain files, you can skip always having to specify the bucket by creating a subclass of Bucket or S3Object and telling it what bucket to use:

class JukeBoxSong < AWS::S3::S3Object
  set_current_bucket_to 'jukebox'
end

For all methods that take a bucket name as an argument, the current bucket will be used if the bucket name argument is omitted.

other_song = 'baby-please-come-home.mp3'
JukeBoxSong.store(other_song, open(other_song), :content_type => 'audio/mpeg')

This time we didn’t have to explicitly pass in the bucket name, as the JukeBoxSong class knows that it will always use the ‘jukebox’ bucket.

“Astute readers”, as they say, will note that when the bucket can be inferred, or is explicitly set, as we’ve done in the JukeBoxSong class, the third argument can be used to pass in options such as :content_type, rather than the fourth as when a bucket name must be supplied.

Now all operations that would have required a bucket name no longer do.

other_song = JukeBoxSong.find('baby-please-come-home.mp3')

BitTorrent

Another way to download large files

Objects on S3 can be distributed via the BitTorrent file sharing protocol.

You can get a torrent file for an object by calling torrent_for:

S3Object.torrent_for 'kiss.jpg', 'marcel'

Or just call the torrent method if you already have the object:

song = S3Object.find 'kiss.jpg', 'marcel'
song.torrent

Calling grant_torrent_access_to on a object will allow anyone to anonymously fetch the torrent file for that object:

S3Object.grant_torrent_access_to 'kiss.jpg', 'marcel'

Anonymous requests to

http://s3.amazonaws.com/marcel/kiss.jpg?torrent

will serve up the torrent file for that object.

Access control

Using canned access control policies

By default buckets are private. This means that only the owner has access rights to the bucket and its objects. Objects in that bucket inherit the permission of the bucket unless otherwise specified. When an object is private, the owner can generate a signed url that exposes the object to anyone who has that url. Alternatively, buckets and objects can be given other access levels. Several canned access levels are defined:

  • :private - Owner gets FULL_CONTROL. No one else has any access rights. This is the default.

  • :public_read - Owner gets FULL_CONTROL and the anonymous principal is granted READ access. If this policy is used on an object, it can be read from a browser with no authentication.

  • :public_read_write - Owner gets FULL_CONTROL, the anonymous principal is granted READ and WRITE access. This is a useful policy to apply to a bucket, if you intend for any anonymous user to PUT objects into the bucket.

  • :authenticated_read - Owner gets FULL_CONTROL, and any principal authenticated as a registered Amazon S3 user is granted READ access.

You can set a canned access level when you create a bucket or an object by using the :access option:

S3Object.store(
  'kiss.jpg', 
  data, 
  'marcel', 
  :access => :public_read
)

Since the image we created is publicly readable, we can access it directly from a browser by going to the corresponding bucket name and specifying the object’s key without a special authenticated url:

http://s3.amazonaws.com/marcel/kiss.jpg

Building custom access policies

For both buckets and objects, you can use the acl method to see its access control policy:

policy = S3Object.acl('kiss.jpg', 'marcel')
pp policy.grants
[#<AWS::S3::ACL::Grant FULL_CONTROL to noradio>,
 #<AWS::S3::ACL::Grant READ to AllUsers Group>]

Policies are made up of one or more grants which grant a specific permission to some grantee. Here we see the default FULL_CONTROL grant to the owner of this object. There is also READ permission granted to the AllUsers group, which means anyone has read access to the object.

Say we wanted to grant access to anyone to read the access policy of this object. The current READ permission only grants them permission to read the object itself (for example, from a browser) but it does not allow them to read the access policy. For that we will need to grant the AllUsers group the READ_ACP permission.

First we’ll create a new grant object:

grant = ACL::Grant.new
# => #<AWS::S3::ACL::Grant (permission) to (grantee)>
grant.permission = 'READ_ACP'

Now we need to indicate who this grant is for. In other words, who the grantee is:

grantee = ACL::Grantee.new
# => #<AWS::S3::ACL::Grantee (xsi not set yet)>

There are three ways to specify a grantee: 1) by their internal amazon id, such as the one returned with an object’s Owner, 2) by their Amazon account email address or 3) by specifying a group. As of this writing you can not create custom groups, but Amazon does provide three already: AllUsers, Authenticated and LogDelivery. In this case we want to provide the grant to all users. This effectively means “anyone”.

grantee.group = 'AllUsers'

Now that our grantee is set up, we’ll associate it with the grant:

grant.grantee = grantee
grant
# => #<AWS::S3::ACL::Grant READ_ACP to AllUsers Group>

Our grant now has all the information it needs. We’ll add it to the object’s access control policy’s list of grants:

policy.grants << grant
pp policy.grants
[#<AWS::S3::ACL::Grant FULL_CONTROL to noradio>,
 #<AWS::S3::ACL::Grant READ to AllUsers Group>,
 #<AWS::S3::ACL::Grant READ_ACP to AllUsers Group>]

Now that the policy has the new grant, we reuse the acl method to persist the policy change:

S3Object.acl('kiss.jpg', 'marcel', policy)

If we fetch the object’s policy again, we see that the grant has been added:

pp S3Object.acl('kiss.jpg', 'marcel').grants
[#<AWS::S3::ACL::Grant FULL_CONTROL to noradio>,
 #<AWS::S3::ACL::Grant READ to AllUsers Group>,
 #<AWS::S3::ACL::Grant READ_ACP to AllUsers Group>]

If we were to access this object’s acl url from a browser:

http://s3.amazonaws.com/marcel/kiss.jpg?acl

we would be shown its access control policy.

Pre-prepared grants

Alternatively, the ACL::Grant class defines a set of stock grant policies that you can fetch by name. In most cases, you can just use one of these pre-prepared grants rather than building grants by hand. Two of these stock policies are :public_read and :public_read_acp, which happen to be the two grants that we built by hand above. In this case we could have simply written:

policy.grants << ACL::Grant.grant(:public_read)
policy.grants << ACL::Grant.grant(:public_read_acp)
S3Object.acl('kiss.jpg', 'marcel', policy)

The full details can be found in ACL::Policy, ACL::Grant and ACL::Grantee.

Accessing private objects from a browser

All private objects are accessible via an authenticated GET request to the S3 servers. You can generate an authenticated url for an object like this:

S3Object.url_for('beluga_baby.jpg', 'marcel_molina')

By default authenticated urls expire 5 minutes after they were generated.

Expiration options can be specified either as an absolute time since the epoch with the :expires option, or as a number of seconds relative to now with the :expires_in option:

# Absolute expiration date
# (Expires January 18th, 2038)
doomsday = Time.mktime(2038, 1, 18).to_i
S3Object.url_for('beluga_baby.jpg', 
                 'marcel', 
                 :expires => doomsday)

# Expiration relative to now specified in seconds
# (Expires in 3 hours)
S3Object.url_for('beluga_baby.jpg', 
                 'marcel', 
                 :expires_in => 60 * 60 * 3)

You can specify whether the url should go over SSL with the :use_ssl option:

# Url will use https protocol
S3Object.url_for('beluga_baby.jpg', 
                 'marcel', 
                 :use_ssl => true)

By default, the ssl settings for the current connection will be used.

If you have an object handy, you can use its url method with the same options:

song.url(:expires_in => 30)

To get an unauthenticated url for the object, such as in the case when the object is publicly readable, pass the :authenticated option with a value of false.

S3Object.url_for('beluga_baby.jpg',
                 'marcel',
                 :authenticated => false)
# => http://s3.amazonaws.com/marcel/beluga_baby.jpg
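For context, these authenticated URLs are S3’s query-string authentication: the client signs a canonical string with an HMAC-SHA1 of the secret key and appends the signature as query parameters. A self-contained sketch of that construction for a simple GET, per the 2006-03-01 REST API (helper name ours):

```ruby
require 'openssl'
require 'base64'
require 'cgi'

# Build a query-string-authenticated GET url: sign the canonical string
# "GET\n\n\n{expires}\n/bucket/key" with HMAC-SHA1 of the secret key.
def authenticated_url(bucket, key, access_key_id, secret, expires)
  string_to_sign = "GET\n\n\n#{expires}\n/#{bucket}/#{key}"
  hmac      = OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha1'), secret, string_to_sign)
  signature = CGI.escape(Base64.strict_encode64(hmac))
  "http://s3.amazonaws.com/#{bucket}/#{key}" \
    "?AWSAccessKeyId=#{access_key_id}&Expires=#{expires}&Signature=#{signature}"
end

url = authenticated_url('marcel', 'beluga_baby.jpg', 'abc', '123', 2147483647)
```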

Logging

Tracking requests made on a bucket

A bucket can be set to log the requests made on it. By default logging is turned off. You can check if a bucket has logging enabled:

Bucket.logging_enabled_for? 'jukebox'
# => false

Enabling it is easy:

Bucket.enable_logging_for('jukebox')

Unless you specify otherwise, logs will be written to the bucket you want to log. The logs are just like any other object. By default they will start with the prefix ‘log-’. You can customize which bucket the logs are delivered to, as well as what the log objects’ keys are prefixed with, by setting the target_bucket and target_prefix options:

Bucket.enable_logging_for(
  'jukebox', 'target_bucket' => 'jukebox-logs'
)

Now instead of logging right into the jukebox bucket, the logs will go into the bucket called jukebox-logs.

Once logs have accumulated, you can access them using the logs method:

pp Bucket.logs('jukebox')
[#<AWS::S3::Logging::Log '/jukebox-logs/log-2006-11-14-07-15-24-2061C35880A310A1'>,
 #<AWS::S3::Logging::Log '/jukebox-logs/log-2006-11-14-08-15-27-D8EEF536EC09E6B3'>,
 #<AWS::S3::Logging::Log '/jukebox-logs/log-2006-11-14-08-15-29-355812B2B15BD789'>]

Each log has a lines method that gives you information about each request in that log. All the fields are available as named methods. More information is available in Logging::Log::Line.

logs = Bucket.logs('jukebox')
log  = logs.first
line = log.lines.first
line.operation
# => 'REST.GET.LOGGING_STATUS'
line.request_uri
# => 'GET /jukebox?logging HTTP/1.1'
line.remote_ip
# => "67.165.183.125"
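Each log line is a space-delimited record with bracketed timestamps and quoted request fields, which is what the named accessors above are parsing. A simplified parser sketch over a hand-made sample line (the field list here is abbreviated; the real log format has more columns):

```ruby
# Parse a simplified S3 access-log line: time in [brackets], the
# request URI in quotes, other fields space-delimited.
LINE_PATTERN = /\A(\S+) \[([^\]]+)\] (\S+) (\S+) "([^"]+)" (\d+)\z/

def parse_log_line(line)
  m = LINE_PATTERN.match(line) or return nil
  { :bucket      => m[1],
    :time        => m[2],
    :remote_ip   => m[3],
    :operation   => m[4],
    :request_uri => m[5],
    :status      => m[6].to_i }
end

sample = 'jukebox [14/Nov/2006:07:15:24 +0000] 67.165.183.125 ' \
         'REST.GET.LOGGING_STATUS "GET /jukebox?logging HTTP/1.1" 200'
record = parse_log_line(sample)
```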

Disabling logging is just as simple as enabling it:

Bucket.disable_logging_for('jukebox')

Errors

When things go wrong

Anything you do that makes a request to S3 could result in an error. If it does, the AWS::S3 library will raise an exception specific to the error. All exceptions that are raised as a result of a request returning an error response inherit from the ResponseError exception. So should you choose to rescue any such exception, you can simply rescue ResponseError.

Say you go to delete a bucket, but the bucket turns out to not be empty. This results in a BucketNotEmpty error (one of the many errors listed at docs.amazonwebservices.com/AmazonS3/2006-03-01/ErrorCodeList.html):

begin
  Bucket.delete('jukebox')
rescue ResponseError => error
  # ...
end

Once you’ve captured the exception, you can extract the error message from S3, as well as the full error response, which includes things like the HTTP response code:

error
# => #<AWS::S3::BucketNotEmpty The bucket you tried to delete is not empty>
error.message
# => "The bucket you tried to delete is not empty"
error.response.code
# => 409

You could use this information to redisplay the error in a way you see fit, or just to log the error and continue on.
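Since any request can raise a ResponseError, a small retry wrapper is a common pattern for transient failures. A sketch (rescuing StandardError here so the example runs standalone; in real code you would rescue AWS::S3::ResponseError):

```ruby
# Retry a block a fixed number of times before giving up and
# re-raising the last error.
def with_retries(attempts = 3)
  tries = 0
  begin
    yield
  rescue StandardError
    tries += 1
    retry if tries < attempts
    raise
  end
end

# Example: fails twice, succeeds on the third try.
calls = 0
result = with_retries(3) do
  calls += 1
  raise 'transient' if calls < 3
  :ok
end
```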

Accessing the last request’s response

Sometimes methods that make requests to the S3 servers return some object, like a Bucket or an S3Object. Other times they return just true, and still other times they raise an exception that you may want to rescue. Despite all these possible outcomes, every method that makes a request stores its response object for you in Service.response. You can always get to the last request’s response via Service.response.

objects = Bucket.objects('jukebox')
Service.response.success?
# => true

This is also useful when an error exception is raised in the console which you weren’t expecting. You can root around in the response to get more details of what might have gone wrong.


aws-s3's Issues

Random method_missing called on unexpected T_ZOMBIE object

I'm getting some random method_missing called on unexpected T_ZOMBIE object. I'm using RVM + 1.9.2-p180 on SUSE ES 11.

I'm establishing the connection and keeping it open:

AWS::S3::Base.establish_connection! # account keys

After a few minutes calling this

bucket = AWS::S3::Bucket.find @bucket
files = bucket.objects

I got this error

method `method_missing' called on unexpected T_ZOMBIE object (0x804e450 flags=0x3e klass=0x0)
...gems/aws-s3-0.6.2/support/faster-xml-simple/lib/faster_xml_simple.rb:162:in `block in collapse_text'
...gems/aws-s3-0.6.2/support/faster-xml-simple/lib/faster_xml_simple.rb:162:in `each'
...gems/aws-s3-0.6.2/support/faster-xml-simple/lib/faster_xml_simple.rb:162:in `map'
...gems/aws-s3-0.6.2/support/faster-xml-simple/lib/faster_xml_simple.rb:162:in `collapse_text'
...gems/aws-s3-0.6.2/support/faster-xml-simple/lib/faster_xml_simple.rb:78:in `collapse'
...gems/aws-s3-0.6.2/support/faster-xml-simple/lib/faster_xml_simple.rb:83:in `block in collapse'
...gems/aws-s3-0.6.2/support/faster-xml-simple/lib/faster_xml_simple.rb:81:in `each'
...gems/aws-s3-0.6.2/support/faster-xml-simple/lib/faster_xml_simple.rb:81:in `inject'
...gems/aws-s3-0.6.2/support/faster-xml-simple/lib/faster_xml_simple.rb:81:in `collapse'
...gems/aws-s3-0.6.2/support/faster-xml-simple/lib/faster_xml_simple.rb:83:in `block in collapse'
...gems/aws-s3-0.6.2/support/faster-xml-simple/lib/faster_xml_simple.rb:81:in `each'
...gems/aws-s3-0.6.2/support/faster-xml-simple/lib/faster_xml_simple.rb:81:in `inject'
...gems/aws-s3-0.6.2/support/faster-xml-simple/lib/faster_xml_simple.rb:81:in `collapse'
...gems/aws-s3-0.6.2/support/faster-xml-simple/lib/faster_xml_simple.rb:64:in `out'
...gems/aws-s3-0.6.2/support/faster-xml-simple/lib/faster_xml_simple.rb:53:in `xml_in'
...gems/aws-s3-0.6.2/lib/aws/s3/parsing.rb:64:in `parse'
...gems/aws-s3-0.6.2/lib/aws/s3/parsing.rb:55:in `initialize'
...gems/aws-s3-0.6.2/lib/aws/s3/response.rb:55:in `new'
...gems/aws-s3-0.6.2/lib/aws/s3/response.rb:55:in `parsed'
...gems/aws-s3-0.6.2/lib/aws/s3/extensions.rb:177:in `block in parsed'
...gems/aws-s3-0.6.2/lib/aws/s3/extensions.rb:146:in `expirable_memoize'
...gems/aws-s3-0.6.2/lib/aws/s3/extensions.rb:176:in `parsed'
...gems/aws-s3-0.6.2/lib/aws/s3/response.rb:68:in `bucket'
...gems/aws-s3-0.6.2/lib/aws/s3/bucket.rb:102:in `find'

Error during loading of aws-s3 gem

Hi
We have encountered an issue in our project which manifests itself during the starting of the rails server.
Within the s3.rb file, the require_library_or_gem 'xml/libxml' call succeeds but XML::Parser is not defined, raising a NameError on what is currently line 54.
This appears to happen as a result of our use of the savon gem.

We have validated this in our test project that just uses the aws-s3 gem and this works fine (the require_library_or_gem call fails and thus falls back to XmlSimple)
In our main project, other work brings in the savon gem and the problem is seen
Modifying our test project to require the savon gem then exhibits the problem.
Removing the require from our Gemfile removes the problem.

Allow setting of open_timeout

Sometimes we get timeouts from S3 that are in the order of 10 minutes. We'd much prefer to have timeouts set to a number less than a minute so that our application can deal with the failure and move on.
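For context, the underlying Net::HTTP object does support bounding both connection setup and reads; exposing these through the connection options is what the request amounts to (values here arbitrary):

```ruby
require 'net/http'

# Net::HTTP lets you bound both connection setup and individual reads.
http = Net::HTTP.new('s3.amazonaws.com')
http.open_timeout = 5   # seconds to wait when opening the TCP connection
http.read_timeout = 30  # seconds to wait for each read from the socket
```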

mybucket['file_name'] returns nil while S3Object.find('file_name', 'bucket_name') returns the file

This bug is not happening consistently. When I put this gem under a capacity test of 100 requests/hour, the bug happens around 5% of the time.

I have a file on S3 and I am sure it is there and valid. If I do mybucket['file_name'] I get nil. But for the same file if I do S3Object.find('file_name', 'bucket_name') it returns the file.

The documentation on http://amazon.rubyforge.org/ gave me the impression that I can use these two methods interchangeably and expect the same result.

S3 0.6.1 aborts with wrong number of arguments

Code snippet (a Rake task):

task :boog => :environment do
  require 'aws/s3'
  include AWS::S3

  Base.establish_connection!(
    :access_key_id     => S3[:access_key_id],
    :secret_access_key => S3[:secret_access_key]
  )

  begin
    puts "Checking delete..."
    S3Object.delete("TEST", S3[:bucket])
    puts "...Success"
  rescue S3Exception => e
    p e.message
    puts "Fail: Could not delete"
  end
end

stack trace:

trunk/vendor/gems/aws-s3-0.6.1/lib/aws/s3/extensions.rb:137:in `__method__'
trunk/vendor/gems/aws-s3-0.6.1/lib/aws/s3/extensions.rb:137:in `expirable_memoize'
trunk/vendor/gems/aws-s3-0.6.1/lib/aws/s3/extensions.rb:176:in `canonical_string'
trunk/vendor/gems/aws-s3-0.6.1/lib/aws/s3/authentication.rb:72:in `encoded_canonical'
trunk/vendor/gems/aws-s3-0.6.1/lib/aws/s3/authentication.rb:94:in `initialize'
trunk/vendor/gems/aws-s3-0.6.1/lib/aws/s3/connection.rb:130:in `new'
trunk/vendor/gems/aws-s3-0.6.1/lib/aws/s3/connection.rb:130:in `authenticate!'
trunk/vendor/gems/aws-s3-0.6.1/lib/aws/s3/connection.rb:34:in `request'

S3Object.url_for does not work correctly with CNAME host-specifications.

:001 > AWS::S3::Base.establish_connection! :access_key_id => 'key', :secret_access_key => 'secret', :server => 'my.cname'
 => #<AWS::S3::Connection:0x3317060 @http=#<Net::HTTP my.cname:80 open=false>, @options={:access_key_id=>"key", :secret_access_key=>"secret", :server=>"my.cname", :port=>80}, @secret_access_key="secret", @access_key_id="key"> 
:002 > AWS::S3::S3Object.url_for 'some-file.jpg', 'my.cname'
 => "http://my.cname/my.cname/some-file.jpg?AWSAccessKeyId=key&Expires=1304600700&Signature=fCWKsMFoWhul3JNlU8eR7VPVHRs%3D" 
:003 > AWS::S3::S3Object.url_for 'some-file.jpg', nil
AWS::S3::CurrentBucketNotSpecified: No bucket name can be inferred from your current connection's address (`my.cname')
    from /usr/local/rvm/gems/ree-1.8.7-2011.03@global/gems/aws-s3-0.6.2/bin/../lib/aws/s3/base.rb:107:in `current_bucket'
    from /usr/local/rvm/gems/ree-1.8.7-2011.03@global/gems/aws-s3-0.6.2/bin/../lib/aws/s3/base.rb:179:in `bucket_name'
    from /usr/local/rvm/gems/ree-1.8.7-2011.03@global/gems/aws-s3-0.6.2/bin/../lib/aws/s3/object.rb:300:in `path!'
    from /usr/local/rvm/gems/ree-1.8.7-2011.03@global/gems/aws-s3-0.6.2/bin/../lib/aws/s3/object.rb:291:in `url_for'
    from (irb):3
:004 > AWS::S3::Version
 => "0.6.2" 

The URL returned from :002 (and presumably :003 also) should be:
http://my.cname/some-file.jpg?...
and not
http://my.cname/my.cname/some-file.jpg?...
because the bucket-name should not appear in the path since it can be inferred from the host.

License

Is there a license for this? If so I can't seem to find it.

problem with installation

In my Gemfile I typed: gem 'aws-s3', :git => 'git://github.com/marcel/aws-s3.git' and this message occurs instead of installing.

Could not find gem 'aws-s3 (>= 0) ruby' in git://github.com/marcel/aws-s3.git (at master).
Source does not contain any versions of 'aws-s3 (>= 0) ruby'

Anyone know what is wrong?

AWS::S3::Bucket.object

I have a bucket on S3 with 5 directories in it. When I try to get all the objects from the bucket from the rails console using

AWS::S3::Bucket.object(bucket_name), it gives me objects of only the first directory. I do not get objects of the remaining 4 directories.

Can you please help me out?

Invalid multi byte escape error in Ruby 2.0.0-p0

Getting this error in the following line

.rvm/gems/ruby-2.0.0-p0/gems/aws-s3-0.6.2/lib/aws/s3/extensions.rb:84: invalid multibyte escape: /[\x80-\xFF]/ (SyntaxError)

Ruby - 2.0.0-p0
Rails - 3.2.11

AWS::S3::Connection.prepare_path doesn't properly escape plus sign '+' in key

AWS::S3::Connection.prepare_path isn't properly escaping the plus sign in a key. This is because the plus sign is sometimes used to represent a space (at least according to the Ruby devs), i.e. URI.escape() doesn't convert '+' to '%2B'.

Solution: Add a special gsub on line 11, i.e. change URI.escape(path) to

URI.escape(path).gsub('+', '%2B')
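The effect of the suggested fix, as a standalone sketch (using URI::DEFAULT_PARSER.escape since URI.escape was removed in Ruby 3; helper name ours):

```ruby
require 'uri'

# Percent-escape a path, then additionally escape '+', which the
# default escaper leaves alone even though S3 treats it specially.
def escape_s3_path(path)
  URI::DEFAULT_PARSER.escape(path).gsub('+', '%2B')
end
```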

Cannot set metadata on creation

It would be nice if we were able to add metadata on object creation.

If I do something like:

obj = bucket.new_object
obj.key = "test.txt"
obj.value = "some text"
obj.metadata[:subject] = "My subject"

I get
ArgumentError: wrong number of arguments (0 for 1)
aws-s3-0.6.2/lib/aws/s3/object.rb:513:in `initialize'

After that if I try to fetch this object using
obj = bucket["test.txt"]
then I do
obj.metadata, I get the same error.

I didn't find any way to modify the metadata of this object without restarting the app.

Object#returning has been deprecated in favor of Object#tap

I have an authenticated download method in my asset model as follows:

AWS::S3::S3Object.url_for(upload.path(style || upload.default_style), upload.bucket_name, :use_ssl => upload.s3_protocol == 'https', :expires_in => expires_in)

This generates a secure, expiring link for the asset. After upgrading from Ruby 1.8.7/Rails 2.3.5 to Ruby 1.9.2/Rails 3.0.3 I get the warning in the subject of this issue.

P.S. Is there a more active fork of this gem? A bit disconcerting that Paperclip uses it for S3 when it hasn't been updated in so long. Cheers.

#previous! depending on Ruby version

Starting on line 29 in /lib/aws/s3/extensions.rb:

  if RUBY_VERSION <= '1.9'
    def previous!
      self[-1] -= 1
      self
    end
  else
    def previous!
      self[-1] = (self[-1].ord - 1).chr
      self
    end
  end

Would subtracting 1 from a string ever work in Ruby 1.9? I'm guessing this is a feature in 2.0, but this bit seems to be breaking the S3 gem for us while we're running 1.9.3.
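For reference, the ord-based else branch is the one that works on 1.9 and later, since String#[] there returns a one-character String rather than an Integer. A standalone, non-destructive version of that branch (method name ours):

```ruby
# Return a copy of the string with its last character decremented by
# one codepoint, e.g. for computing an exclusive upper bound on keys.
def previous_string(s)
  result = s.dup
  result[-1] = (result[-1].ord - 1).chr
  result
end
```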

S3Object.store $stdin

Hey,

I've been trying to build a tool that allows me to put some data from STDIN into S3, and I have hit a brick wall with your S3 library. When I try to call AWS::S3::S3Object.store(path, $stdin, bucket) I get an exception thrown...

/Library/Ruby/Gems/1.8/gems/aws-s3-0.6.2/lib/aws/s3/connection.rb:41:in `request': undefined method `size' for #<IO:0x106464bd0> (NoMethodError)
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:543:in `start'
    from /Library/Ruby/Gems/1.8/gems/aws-s3-0.6.2/lib/aws/s3/connection.rb:52:in `request'
    from /Library/Ruby/Gems/1.8/gems/aws-s3-0.6.2/lib/aws/s3/base.rb:69:in `request'
    from /Library/Ruby/Gems/1.8/gems/aws-s3-0.6.2/lib/aws/s3/base.rb:88:in `put'
    from /Library/Ruby/Gems/1.8/gems/aws-s3-0.6.2/lib/aws/s3/object.rb:241:in `store'
    from ./s3pipe.rb:115

Everything works great if I call $stdin.read; however, the data I'm piping into my ruby program is upwards of 2GB, and I'm running this on low-memory machines.

By the looks of things, you're relying on a few methods that are only defined in the File class, and not handling raw IO objects. It would be awesome if you could fix this bug :-)! I can't wait to open source this little ruby file I'm writing!

Thanks.
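Until the library handles raw IO objects, one workaround for the issue above is to spool the stream to a Tempfile, which does respond to size. A sketch (helper name ours):

```ruby
require 'tempfile'
require 'stringio'

# Copy an IO (e.g. $stdin) to a Tempfile in fixed-size chunks so the
# result has a size and can be handed to S3Object.store. Trades the
# memory cost for disk, which suits low-memory machines.
def spool_to_tempfile(io, chunk_size = 1024 * 1024)
  file = Tempfile.new('s3-upload')
  file.binmode
  while (chunk = io.read(chunk_size))
    file.write(chunk)
  end
  file.rewind
  file
end

# Example with an in-memory stream standing in for $stdin:
spooled = spool_to_tempfile(StringIO.new('hello world'), 4)
```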

Blank URL should not exist as an S3Object

Hello,

I think this is personal taste so I would not count this as an issue, but here is my doubt:

AWS::S3::S3Object.exists?( "", "mybucket")

I don't think this should return true because there is no S3Object.
I understand the logic behind it: the URL exists, since it is the address of the bucket itself. But accessed this way, it does not make much sense.
Especially when the find method raises an error:

AWS::S3::S3Object.find( "", "mybucket")   #=> NoMethodError: undefined method `-' for nil:NilClass

At least both should be consistent in terms of failure-or-not.
Also I understand that looking for a blank filename is stupid, but it makes sense if the name is generated.
I personally encountered this while building some tests/specs.

My test originally had nil as a filename, which raises an error, so I thought I could just pass it through NilClass#to_s; that is when my test started failing while it was, imho, correct.
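A guard like this sketch is the sort of consistency I mean (object_exists? is a hypothetical helper, not gem code):

```ruby
# Treat a nil or empty key as nonexistent instead of letting it match
# the bucket URL itself.
def object_exists?(key, bucket)
  return false if key.nil? || key.to_s.empty?
  AWS::S3::S3Object.exists?(key, bucket)
end
```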

What do you guys think about that?
I'm looking forward to hearing your opinion.

Thank you for reading
Mig

Key with leading slash

Hi - I was wondering whether anyone has come across an issue where, if your key starts with "/", you get back AWS::S3::NoSuchKey.
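In case it helps, stripping the slash before handing the key to the gem works around it for me. S3 treats a leading "/" as part of the key name, so "/foo" and "foo" are different objects; a sketch:

```ruby
# Drop any leading slashes so the key matches what most tools upload.
def normalize_key(key)
  key.sub(%r{\A/+}, '')
end

normalize_key('/photos/cat.jpg')  # => "photos/cat.jpg"
```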

Thanks,

Patrick

Updating to 0.6.4 (or later)?

It looks like there are several pretty active forks of this package, some of which seem to be forks of 37signals' fork, which updated the version number to 0.6.4 in August 2010. Would it be possible to get their fork merged into this repo... or are things at a point where amazon.rubyforge.org should point to a different branch?

multithreaded environment

Can this library work in a multithreaded environment? Can a connection object be created rather than establishing an interpreter-global connection with AWS::S3::Base.establish_connection! ?

AWS::S3::S3Object.store doesn't support transfer encoding chunked

When using AWS::S3::S3Object.store with a generic IO object, say the read end of IO.pipe, store should use chunked transfer encoding, since it can't compute the size of the stream. This would allow people to stream uploads. The example use case: you are downloading files from one machine via net/http, net/scp, or net/ftp; instead of downloading each full document and writing it to disk, it would be nice to stream the files down in chunks and upload those chunks directly to S3, rather than downloading them completely and then uploading them.
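At the Net::HTTP level the mechanism already exists; something like this sketch is what store would need to do for IOs without a known size. The request is shown unsigned for illustration only (a real S3 PUT also needs authentication headers), and the bucket and key are placeholders:

```ruby
require 'net/http'
require 'uri'

uri = URI('https://thebucket.s3.amazonaws.com/streamed-key')
req = Net::HTTP::Put.new(uri.request_uri)
req['Transfer-Encoding'] = 'chunked'   # body length unknown up front
req.body_stream = $stdin               # any IO, e.g. the read end of IO.pipe

# Net::HTTP.start(uri.host, uri.port, :use_ssl => true) do |http|
#   http.request(req)
# end
```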

Connection reset by peer and slow uploads

I have about 100 000 files which I'm migrating to S3. Most of the files are under 1MB.

I'm using the aws-s3 gem with Rails 3.1. The bucket is located on the EU server and this is the contents of initializers/s3.rb

require 'aws/s3'

AWS::S3::DEFAULT_HOST.replace "s3-eu-west-1.amazonaws.com"

AWS::S3::Base.establish_connection!(
   :access_key_id     => '...',
   :secret_access_key => '...'
)

The background task responsible for migrating the files simply loops the database rows and uploads the associated files to S3:

for image in Image.unmigrated.limit(20)
  AWS::S3::S3Object.store(image.relative_path, open(image.absolute_path), "thebucket")
  if AWS::S3::Service.response.success?
    image.update_attribute(:s3, true)

    File.delete(image.absolute_path)
  end
end

I run this through a rake task and two bad things happen:

  • At some point (can be after just a few, or even 500 images) the "Connection reset by peer" error comes
  • Sometimes the upload speed becomes really slow (sometimes it recovers, sometimes it gives the above error)

I tried the two fixes suggested here https://forums.aws.amazon.com/message.jspa?messageID=86028 , that is, to change the TCP window scaling as well as changing the above code to use

bytes = nil
File.open(image.absolute_path, "rb") { |f| bytes = f.read }
AWS::S3::S3Object.store(image.relative_path, bytes, "thebucket")

However, I still keep getting the same fatal error. Any ideas as what I could try?
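One thing worth trying while you debug: wrap each upload in a bounded retry with backoff, which often rides out intermittent resets during bulk uploads. A sketch (with_retries is a hypothetical helper, not gem API):

```ruby
require 'timeout'

# Retry the block a few times on connection resets and timeouts,
# backing off between attempts before giving up.
def with_retries(attempts = 3)
  tries = 0
  begin
    yield
  rescue Errno::ECONNRESET, Timeout::Error
    tries += 1
    raise if tries >= attempts
    sleep(2 ** tries)   # exponential backoff: 2s, 4s, ...
    retry
  end
end

# Usage:
#   with_retries do
#     AWS::S3::S3Object.store(image.relative_path, open(image.absolute_path), "thebucket")
#   end
```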

AWS::S3::S3Object.copy method doesn't work with additional options

The AWS::S3::S3Object.copy method doesn't properly merge the options hash passed to it into the store call it makes internally.

steps to reproduce:

  1. AWS::S3::S3Object.copy(source_path, copy_path, bucket, :access => :public_read)
  2. creates copy of file at copy_path
  3. trying to access the copy_path file will return "access denied" xml in Firefox

workaround:
use AWS::S3::S3Object.store(copy_path, open(AWS::S3::S3Object.url_for(source_path, bucket)), bucket, :access => :public_read)

Question about usage

I have an empty folder (bar) in my s3 bucket

bucket = Bucket.find("my.bucket", :prefix => "bar")
bucket.objects.size # => 1000

I'm fairly certain this should return 0, since there aren't any files that match :prefix, yet I'm getting back a bunch of files from other folders on s3.

Is this expected behavior?

Numeric Keys

Buckets containing purely numeric keys fail to list their contents. For example, a key equal to '20092521' will not work correctly, but '20092521.txt' will.

:marker option no longer seems to work with AWS::S3::Bucket.objects()

These both return the same set of 1000 elements, despite all filenames starting with 'p' and using 'q' as the :marker option.

ruby-1.9.3-p125 :012 > AWS::S3::Bucket.objects(@bucket_label, :prefix => @remote_path).first
=> #<AWS::S3::S3Object:0x193416160 '/lumos-data-dump-prod01/reports/purchase-events/part-00000'>

ruby-1.9.3-p125 :013 > AWS::S3::Bucket.objects(@bucket_label, :prefix => @remote_path, :marker => "q").first
=> #<AWS::S3::S3Object:0x196253020 '/lumos-data-dump-prod01/reports/purchase-events/part-00000'>

Rename should copy acl too by default

I'm not sure if this is an issue or a feature request, but I believe that when you use the S3Object#rename method it should pass the :copy_acl => true option to the copy operation.

And anyway, it should be documented...

Conflict with right_http_connection-1.2.4

... right_http_connection-1.2.4/lib/net_fix.rb

... aws-s3-0.6.2/lib/aws/s3/extensions.rb

The original version of send_request_with_body_stream takes 5 arguments, but the rewritten one takes only 4.

"Invalid group: uri" while setting :authenticated_read

My object was private and I wanted to grant the READ permission to authenticated users:

policy = S3Object.acl(name, bucket)
policy.grants << ACL::Grant.grant(:authenticated_read)
# persist it:
S3Object.acl(name, bucket, policy)  # BOOM! doesn't work

The exception is from the server: "Invalid group uri". This is what it was trying to send (for the offending grant):

<Grantee xsi:type="Group" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <URI>http://acs.amazonaws.com/groups/global/Authenticated</URI>
</Grantee>

I think Amazon choked because it wanted this URI instead:

http://acs.amazonaws.com/groups/global/AuthenticatedUsers

Super Weird Class loading when I have a class ending in Bucket

Hi guys,

I had written a class, ConversionsByBucket, and weird things started to happen.

After much hand-wringing, I tracked it down to s3/extensions.rb overriding const_missing and hijacking modules/classes ending in 'Bucket'.

So:

  1. At the very least add a warning message, maybe:
    puts "Transforming #{sym.to_s} into a AWS::S3::Bucket, if using AWS/S3, ending modules/class with Bucket is reserved"

  2. Is the overriding of const_missing really necessary? It seems heavy-handed.
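For anyone else bitten by this, here is a minimal self-contained reproduction of the hijack (Demo stands in for AWS::S3; this mirrors the reported behavior, not the gem's exact code):

```ruby
# A const_missing that claims every name ending in 'Bucket' will
# silently swallow unrelated constants.
module Demo
  class Bucket; end

  def self.const_missing(sym)
    sym.to_s =~ /Bucket\z/ ? Bucket : super
  end
end

Demo::ConversionsByBucket  # resolves to Demo::Bucket instead of raising
```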

Thanks,

Jonathan

Add invalidation for S3 backed Cloudfront files

I would like to see the ability to mark S3-backed CloudFront files as invalid so that CloudFront will replace them sooner rather than later.

The API call might look something like this:
AWS::S3::S3Object.invalidate(file_key, 'my_bucket')

Perhaps it could also be an option to delete or upload as well. Something like this:
AWS::S3::S3Object.delete(file_key, 'my_bucket', :invalidate => true)

Here is more information on the new invalidation feature from Amazon:
http://aws.amazon.com/about-aws/whats-new/2010/08/31/cloudfront-adds-invalidation-feature/

Find doesn't work on buckets with many thousand items (patch included)

The current find wasn't working for me on a bucket with a few thousand items that I'm using to cache documents. I was basically trying to find a doc, then store it if it didn't exist. But I could never find the document.

If I switched to a brand new bucket with zero items, it saved items properly.

Digging through the find code, it seemed like we didn't find things because we weren't in the first 'chunk' of the bucket.

Attached is a patch that will go through each chunk until it finds the item, or raise a NoSuchKey error if the item isn't found.

It fixed my problem locally.
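The idea of the patch, sketched with the listing call stubbed out so the paging logic stands alone (fetch stands in for something like AWS::S3::Bucket.objects(bucket, :marker => marker)):

```ruby
# Walk a bucket listing chunk by chunk, advancing the marker past the
# last key seen, until the wanted key turns up or the listing ends.
def find_in_chunks(key, fetch)
  marker = nil
  loop do
    chunk = fetch.call(marker)   # one page of keys, e.g. 1000 at a time
    return nil if chunk.empty?   # exhausted: caller raises NoSuchKey
    return key if chunk.include?(key)
    marker = chunk.last
  end
end
```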

Segmentation fault

~/.bundler/ruby/1.8/gems/aws-s3-0.6.2/lib/aws/../../support/faster-xml-simple/lib/faster_xml_simple.rb:162: [BUG] Segmentation fault
ruby 1.8.7 (2009-06-12 patchlevel 174) [x86_64-linux], MBARI 0x6770, Ruby Enterprise Edition 2009.10

Kernel __method__ collision

Just a heads up: I think there's a collision with Ruby Facets over the Kernel method patching. If anybody gets a "wrong number of arguments (1 for 0)" error, chances are something is loading Facets and patching Kernel after aws-s3 has been loaded.

POST Object restore support

Hi.

I accidentally backed up my whole bucket to glacier. This is great, except that the files are no longer in S3. I need to restore them all to S3 and since there are a lot - I would prefer to do this programmatically.

I have tried hacking together a method on S3Object called restore_from_glacier but I am having no luck with the signature.

Any plans to support this in the gem? Any ideas why the signature is not working for my custom method?

proper handling of errors for s3 copy

From the Documentation:

There are two opportunities for a copy request to return an error. One can occur when Amazon S3 receives the copy request and the other can occur while Amazon S3 is copying the files. If the error occurs before the copy operation starts, you receive a standard Amazon S3 error. If the error occurs during the copy operation, the error response is embedded in the 200 OK response. This means that a 200 OK response can contain either a success or an error. Make sure to design your application to parse the contents of the response and handle it appropriately.

Current error checking is implemented as:

def error?
  !success? && response['content-type'] == 'application/xml' && parsed.root == 'error'
end

where success? only returns true for response code 200..299

Basically, it looks like aws-s3 won't handle the case specified in the documentation
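Sketched out, the check the documentation calls for looks something like this (parsed_root stands in for the gem's parsed XML root, as used in the error? method quoted above):

```ruby
# A COPY can return 200 OK whose body is an <Error> document, so
# success has to be judged on the payload, not just the status code.
def copy_error?(status, parsed_root)
  return true unless (200..299).include?(status)
  parsed_root == 'error'
end

copy_error?(200, 'copy_object_result')  # => false (real success)
copy_error?(200, 'error')               # => true  (embedded error)
```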

Ruby 1.9 Encoding Issue

Adding the following as the first line of lib/aws/s3/extensions.rb fixes a problem where the gem will not load in Ruby 1.9:

# encoding: BINARY

Without this you will get the error:

gems/aws-s3-0.6.2/lib/aws/s3/extensions.rb:84: invalid multibyte escape: /[\x80-\xFF]/

HTTP SSL Options Error

AWS::S3::Connection.create_connection has the following bit of logic...

http.use_ssl = !options[:use_ssl].nil? || options[:port] == 443

:use_ssl => false thus turns SSL on - which is incorrect/unexpected behaviour
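The intended behaviour is presumably something like this sketch: an explicit :use_ssl value wins, and only when it is unset does the port heuristic apply.

```ruby
# Honor an explicit :use_ssl => false instead of treating any
# non-nil value as true.
def use_ssl?(options)
  if options.key?(:use_ssl)
    !!options[:use_ssl]
  else
    options[:port] == 443
  end
end

use_ssl?(:use_ssl => false)  # => false (the buggy version returns true)
use_ssl?(:port => 443)       # => true
```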
