Comments (7)

srikris commented on July 2, 2024

Sorry you are having trouble with this.

Can you tell us a bit more about how you created this dataset? By any chance, was it created by appending many small SFrames? It would also help to know what the contents are (how many rows, how many columns, and the column types).

from turicreate.

teaglin commented on July 2, 2024

Thanks for the quick response!

I am doing object detection, but I created the dataset using existing code I had for TensorFlow. All the image paths and classes are stored in a DB, and the images are stored in a directory. I do some additional processing on each image, so each image is loaded individually via worker queues, and then a single worker writes the results out sequentially to a training dataset file.

This worked well for larger datasets because TensorFlow allows writing out individual training examples; I wasn't able to find anything like that in the SFrame docs. I initially tried appending separate SFrames, but I could never get that to work. So I store everything in a single list, and when all the processing is done I create the SFrame and write it out to a file. This is only possible because I have enough RAM.
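The streaming pattern described above (parallel workers, one sequential writer, each example written as soon as it is ready) can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the poster's code: `process_image` is a hypothetical stand-in for the real per-image processing, and the output is a simple length-prefixed record file with JSON payloads, not the actual TFRecord format. Memory stays bounded because both queues have a maximum size and nothing is accumulated in lists.

```python
import json
import struct
import threading
from queue import Queue

_SENTINEL = None  # tells a worker or the writer to stop


def process_image(item):
    """Hypothetical stand-in for the real per-image processing."""
    return {"path": item["path"], "annotations": item["annotations"]}


def write_records(items, out_path, num_workers=4, max_pending=64):
    """Process items in parallel and write each result as soon as it is ready.

    Each record is a 4-byte little-endian length followed by a JSON payload.
    With several workers, output order follows completion order, not input order.
    """
    work_q = Queue(maxsize=max_pending)
    result_q = Queue(maxsize=max_pending)

    def worker():
        while True:
            item = work_q.get()
            if item is _SENTINEL:
                break
            result_q.put(process_image(item))

    def writer():
        # The only thread that touches the file, so writes are sequential.
        with open(out_path, "wb") as f:
            while True:
                record = result_q.get()
                if record is _SENTINEL:
                    break
                payload = json.dumps(record).encode("utf-8")
                f.write(struct.pack("<I", len(payload)))
                f.write(payload)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    writer_thread = threading.Thread(target=writer)
    for t in workers:
        t.start()
    writer_thread.start()

    for item in items:
        work_q.put(item)  # blocks while max_pending items are already waiting
    for _ in workers:
        work_q.put(_SENTINEL)
    for t in workers:
        t.join()
    result_q.put(_SENTINEL)  # all workers finished, so no more results arrive
    writer_thread.join()


def read_records(path):
    """Yield the records written by write_records, one at a time."""
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break
            (length,) = struct.unpack("<I", header)
            yield json.loads(f.read(length).decode("utf-8"))
```

The bounded queues are the design choice that matters here: a slow writer applies back-pressure to the workers, and the workers apply it to the producer loop, so peak memory depends on `max_pending` rather than on dataset size.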

My SFrame structure is exactly like the one in the object detection documentation: 2 columns, but well over 100k rows.

+------------------------+-------------------------------+
|         image          |          annotations          |
+------------------------+-------------------------------+
| Height: 375 Width: 500 | [{'coordinates': {'y': 204... |
| Height: 375 Width: 500 | [{'coordinates': {'y': 148... |
| Height: 334 Width: 500 | [{'coordinates': {'y': 146... |
| Height: 500 Width: 345 | [{'coordinates': {'y': 321... |
| Height: 480 Width: 500 | [{'coordinates': {'y': 301... |
| Height: 375 Width: 500 | [{'coordinates': {'y': 121... |
| Height: 335 Width: 500 | [{'coordinates': {'y': 119... |
| Height: 335 Width: 500 | [{'coordinates': {'y': 150... |
| Height: 500 Width: 333 | [{'coordinates': {'y': 235... |
| Height: 333 Width: 500 | [{'coordinates': {'y': 120... |
+------------------------+-------------------------------+
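Each row's `annotations` value is a list of dictionaries, one per bounding box. The sketch below checks a row against the layout assumed here from the Turi Create object detection documentation: a `'label'` string plus a `'coordinates'` dict with numeric `'x'`, `'y'`, `'width'` and `'height'`, where `x`/`y` is taken to be the box center. Treat the exact key set and the center convention as assumptions to verify against the docs for your version; `validate_annotations` is a hypothetical helper, not a Turi Create function. Running a check like this before building a 100k-row SFrame is cheaper than discovering a malformed row during training.

```python
from numbers import Real

REQUIRED_COORDS = ("x", "y", "width", "height")


def validate_annotations(annotations, image_width, image_height):
    """Return a list of problems in one row's annotations (empty if none found).

    Assumes x/y is the box center, so the whole box must fit in the image.
    """
    if not isinstance(annotations, list):
        return ["annotations must be a list"]
    problems = []
    for i, box in enumerate(annotations):
        if not isinstance(box, dict):
            problems.append(f"box {i}: not a dict")
            continue
        if not isinstance(box.get("label"), str):
            problems.append(f"box {i}: missing or non-string 'label'")
        coords = box.get("coordinates")
        if not isinstance(coords, dict):
            problems.append(f"box {i}: missing 'coordinates'")
            continue
        missing = [k for k in REQUIRED_COORDS if not isinstance(coords.get(k), Real)]
        if missing:
            problems.append(f"box {i}: bad or missing {missing}")
            continue
        x, y, w, h = (coords[k] for k in REQUIRED_COORDS)
        if w <= 0 or h <= 0:
            problems.append(f"box {i}: non-positive size")
        elif (x - w / 2 < 0 or y - h / 2 < 0
              or x + w / 2 > image_width or y + h / 2 > image_height):
            problems.append(f"box {i}: extends outside the image")
    return problems
```

For example, a box centered at (100, 80) with width 40 and height 30 in a 500x375 image passes, while the same box centered at x=10 is reported as extending outside the image.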

srikris commented on July 2, 2024

This is well within the limits of the SFrame and should not cause any trouble. Can you share the code snippet you are using to create the SFrame and write it out to a file? That could help us better identify why this might be happening.

teaglin commented on July 2, 2024

This is what I ended up having to do to get it to work, as I mentioned in my previous post:

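```python
import turicreate as tc
from queue import Queue
from threading import Thread


class WriteWorker(Thread):
    """Single writer thread: collects processed rows, then saves one SFrame."""

    def __init__(self, savePath):
        Thread.__init__(self)
        self.images = []        # every image stays in memory until wait()
        self.annotations = []
        self.queue = Queue()
        self.daemon = True      # don't block interpreter exit
        self.savePath = savePath
        self.start()

    def run(self):
        # Drain rows pushed onto the queue by the processing workers.
        while True:
            r = self.queue.get()
            self.images.append(r['image'])
            self.annotations.append(r['annotations'])
            self.queue.task_done()

    def wait(self):
        # Block until every queued row has been handled, then build and save.
        self.queue.join()
        sf = tc.SFrame({'image': self.images, 'annotations': self.annotations})
        sf.save(self.savePath)
```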

gustavla commented on July 2, 2024

Hi @teaglin, because you are currently creating the SFrame by first building Python lists and then feeding them to the SFrame constructor, Python needs to keep all your data in memory before it is even handed off to the SFrame. I suspect this could be why your RAM usage is so high.

Instead, you could try using the SFrameBuilder, a helper class created for exactly this purpose: building up an SFrame row by row.
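Applied to the `WriteWorker` above, this suggestion amounts to appending each row to a builder as it arrives instead of collecting Python lists. A minimal sketch follows. To keep it runnable without Turi Create installed, the builder is passed in as any object with `append(row)` and `close()`; with Turi Create you would pass something like `tc.SFrameBuilder([tc.Image, list], column_names=['image', 'annotations'])`, whose `close()` returns the finished SFrame (which you could then `save()`). Check the `SFrameBuilder` signature and accepted column types in the documentation for your Turi Create version; `BuilderWriteWorker` is a hypothetical name.

```python
from queue import Queue
from threading import Thread


class BuilderWriteWorker(Thread):
    """Single writer thread that streams rows into a builder object.

    `builder` is assumed to expose append(row) and close(), as
    turicreate.SFrameBuilder does; each row is [image, annotations].
    """

    def __init__(self, builder):
        super().__init__(daemon=True)
        self.builder = builder
        self.queue = Queue()
        self.error = None
        self.start()

    def run(self):
        while True:
            r = self.queue.get()
            try:
                if self.error is None:
                    # One row per call; no Python-side lists accumulate here.
                    self.builder.append([r["image"], r["annotations"]])
            except Exception as exc:
                # Record the failure but keep draining so wait() cannot hang.
                self.error = exc
            finally:
                self.queue.task_done()

    def wait(self):
        """Block until every queued row is handled, then finalize the builder."""
        self.queue.join()
        if self.error is not None:
            raise self.error
        return self.builder.close()
```

Only the writer thread calls the builder, so the sketch does not rely on the builder being thread-safe.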

Let us know how it goes! I'm also curious to hear about your experience with the object detector, so feel free to drop us a line here when you get some results.

teaglin commented on July 2, 2024

@gustavla I tried out that method and it did help, but the real trick was setting the cache config:

```python
tc.config.set_runtime_config('TURI_CACHE_FILE_LOCATIONS', network_location)
```

My only criticism is that this seems like a very important piece for building large SFrames, but the API is very obscure. Also, doing it this way is much slower than writing directly to a file. For example, in TensorFlow you can write examples directly to a TFRecord file one at a time, whereas with SFrame it's basically all built in memory. That memory then gets partially cached to disk, and after all that it gets written back out as a saved SFrame.

As far as the object detection goes, I will give you guys an update once I get some results. Thanks again for the help!

srikris commented on July 2, 2024

I'll close this and add another issue that points to the specific concern.
