
Comments (2)

grant-olson commented on July 17, 2024

Okay, I guess I can't attach...

From a3203122831aadbdd7ec641057c949cd7f65db3b Mon Sep 17 00:00:00 2001
From: Grant Olson 
Date: Mon, 19 Jul 2010 17:02:24 -0400
Subject: [PATCH] Find that works for buckets with many thousand items

---
 lib/aws/s3/object.rb |   50 +++++++++++++++++++++-----------------------------
 1 files changed, 21 insertions(+), 29 deletions(-)

diff --git a/lib/aws/s3/object.rb b/lib/aws/s3/object.rb
index bcdf9e1..95b5296 100644
--- a/lib/aws/s3/object.rb
+++ b/lib/aws/s3/object.rb
@@ -143,41 +143,33 @@ module AWS
         # Returns the object whose key is name in the specified bucket. If the specified key does not
         # exist, a NoSuchKey exception will be raised.
         def find(key, bucket = nil)
-          # N.B. This is arguably a hack. From what the current S3 API exposes, when you retrieve a bucket, it
-          # provides a listing of all the files in that bucket (assuming you haven't limited the scope of what it returns).
-          # Each file in the listing contains information about that file. It is from this information that an S3Object is built.
+          # Bucket results come in chunks, 1000 by default.
+          # if the key isn't in the first chunk, we need to look through
+          # subsequent chunks until we find it.
           #
-          # If you know the specific file that you want, S3 allows you to make a get request for that specific file and it returns
-          # the value of that file in its response body. This response body is used to build an S3Object::Value object. 
-          # If you want information about that file, you can make a head request and the headers of the response will contain 
-          # information about that file. There is no way, though, to say, give me the representation of just this given file the same 
-          # way that it would appear in a bucket listing.
-          #
-          # When fetching a bucket, you can provide options which narrow the scope of what files should be returned in that listing.
-          # Of those options, one is marker which is a string and instructs the bucket to return only object's who's key comes after
-          # the specified marker according to alphabetic order. Another option is max-keys which defaults to 1000 but allows you
-          # to dictate how many objects should be returned in the listing. With a combination of marker and max-keys you can
-          # *almost* specify exactly which file you'd like it to return, but marker is not inclusive. In other words, if there is a bucket
-          # which contains three objects who's keys are respectively 'a', 'b' and 'c', then fetching a bucket listing with marker set to 'b' will only
-          # return 'c', not 'b'. 
-          #
-          # Given all that, my hack to fetch a bucket with only one specific file, is to set the marker to the result of calling String#previous on
-          # the desired object's key, which functionally makes the key ordered one degree higher than the desired object key according to 
-          # alphabetic ordering. This is a hack, but it should work around 99% of the time. I can't think of a scenario where it would return
-          # something incorrect.
-          
           # We need to ensure the key doesn't have extended characters but not uri escape it before doing the lookup and comparing since if the object exists, 
           # the key on S3 will have been normalized
-          key    = key.remove_extended unless key.valid_utf8?
-          bucket = Bucket.find(bucket_name(bucket), :marker => key.previous, :max_keys => 1)
-          # If our heuristic failed, trigger a NoSuchKey exception
-          if (object = bucket.objects.first) && object.key == key
-            object 
-          else 
-            raise NoSuchKey.new("No such key `#{key}'", bucket)
+          key = key.remove_extended unless key.valid_utf8?
+          bkt_name = bucket_name bucket
+          partial_bucket = Bucket.find(bkt_name)
+          
+          while not partial_bucket.nil?
+            last_key = nil
+            partial_bucket.each do |s3object|
+              last_key = s3object.key
+              return s3object if last_key == key.to_s
+            end
+            if partial_bucket.is_truncated
+              partial_bucket = Bucket.find(bkt_name, :marker => last_key)
+            else
+              partial_bucket = nil
+            end
           end
+          
+          raise NoSuchKey.new("No such key `#{key}'", bucket)
         end
         
+
         # Makes a copy of the object with key to copy_key, preserving the ACL of the existing object if the :copy_acl option is true (default false).
         def copy(key, copy_key, bucket = nil, options = {})
           bucket          = bucket_name(bucket)
-- 
1.6.5.1

from aws-s3.
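
A minimal caller-side sketch of the patched find, assuming the patch above is applied and the aws-s3 gem is installed; the credentials, bucket name, and key below are placeholders:

require 'aws/s3'

AWS::S3::Base.establish_connection!(
  :access_key_id     => 'YOUR_ACCESS_KEY',   # placeholder credentials
  :secret_access_key => 'YOUR_SECRET_KEY'
)

# With the patched find, the lookup pages through the bucket listing,
# so it works even when the key lies beyond the first 1000-object chunk.
object = AWS::S3::S3Object.find('some/deeply/nested/key.txt', 'my-bucket')
puts object.key   # raises AWS::S3::NoSuchKey if the key is absent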

dfl commented on July 17, 2024

I found this article helpful for getting past the 1000-object listing limit: http://jakanapes.com/blog/2010/11/01/s3s-object-limit/

from aws-s3.
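
For listing rather than finding a single key, the same :marker / is_truncated pattern used in the patch can walk an entire bucket past the 1000-object limit. A minimal sketch, assuming the aws-s3 gem; the credentials and bucket name are placeholders:

require 'aws/s3'

AWS::S3::Base.establish_connection!(
  :access_key_id     => 'YOUR_ACCESS_KEY',   # placeholder credentials
  :secret_access_key => 'YOUR_SECRET_KEY'
)

# Yield every object in the bucket, fetching one 1000-object chunk at a time.
def each_object(bucket_name)
  partial_bucket = AWS::S3::Bucket.find(bucket_name)
  until partial_bucket.nil?
    last_key = nil
    partial_bucket.each do |s3object|
      last_key = s3object.key
      yield s3object
    end
    # Ask S3 for the next chunk, starting just after the last key we saw.
    if partial_bucket.is_truncated
      partial_bucket = AWS::S3::Bucket.find(bucket_name, :marker => last_key)
    else
      partial_bucket = nil
    end
  end
end

each_object('my-bucket') { |obj| puts obj.key }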
