Comments (7)
Sorry you are having trouble with this.
Can you tell us some more about how you created this dataset? By any chance, was this created by appending many small SFrames? It would also help to know what are the contents (how many rows, how many columns, what are the types of columns).
from turicreate.
Thanks for the quick response!
I am doing object detection, but I created the dataset using existing code I had for TensorFlow. All the image paths + classes are stored in a db and the images are stored in a directory. I do some additional processing on each image, so images are loaded individually via worker queues, and a single worker then writes them out sequentially to a training dataset file.
This worked well for larger datasets because TensorFlow allows writing out an individual training example; I wasn't able to find anything like that in the SFrame docs. I initially tried appending separate SFrames, but I could never get that to work. So I store them all in a single list and, when all the processing is done, I create the SFrame and write it out to a file. This is only possible because I have enough RAM.
My SFrame structure is exactly like the one in the object detection documentation: 2 columns, but well over 100k rows.
+------------------------+-------------------------------+
|         image          |          annotations          |
+------------------------+-------------------------------+
| Height: 375 Width: 500 | [{'coordinates': {'y': 204... |
| Height: 375 Width: 500 | [{'coordinates': {'y': 148... |
| Height: 334 Width: 500 | [{'coordinates': {'y': 146... |
| Height: 500 Width: 345 | [{'coordinates': {'y': 321... |
| Height: 480 Width: 500 | [{'coordinates': {'y': 301... |
| Height: 375 Width: 500 | [{'coordinates': {'y': 121... |
| Height: 335 Width: 500 | [{'coordinates': {'y': 119... |
| Height: 335 Width: 500 | [{'coordinates': {'y': 150... |
| Height: 500 Width: 333 | [{'coordinates': {'y': 235... |
| Height: 333 Width: 500 | [{'coordinates': {'y': 120... |
+------------------------+-------------------------------+
This is well within the limits of the SFrame and it should not cause any trouble. Can you share the snippet of code you are using to create the SFrame and write it out to a file? That could help us better identify why this might be happening.
This is what I ended up having to do to get it to work, as I mentioned in my previous post.
from queue import Queue
from threading import Thread

import turicreate as tc


class WriteWorker(Thread):
    def __init__(self, savePath):
        Thread.__init__(self)
        # Every processed row is held in these lists until wait() is called.
        self.images = []
        self.annotations = []
        self.queue = Queue()
        self.daemon = True
        self.savePath = savePath
        self.start()

    def run(self):
        # Drain the queue forever; the daemon flag lets the process exit.
        while True:
            r = self.queue.get()
            self.images.append(r['image'])
            self.annotations.append(r['annotations'])
            self.queue.task_done()

    def wait(self):
        # Block until every queued row is processed, then build and save the SFrame.
        self.queue.join()
        sf = tc.SFrame({'image': self.images, 'annotations': self.annotations})
        sf.save(self.savePath)
Hi @teaglin, the way you are currently creating the SFrame, by first building Python lists and then feeding them to the SFrame constructor, means Python needs to keep all your data in memory before it is even handed off to the SFrame. I suspect this could be why your RAM usage is so high.
Instead, you could try using the SFrameBuilder, which is a helper class created exactly for this purpose: building up an SFrame row by row.
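For illustration, here is a minimal sketch of how the write worker posted earlier in this thread might be adapted to append rows as they arrive instead of collecting them in lists. The names `StreamingWriteWorker` and `make_builder` are hypothetical, and the column types (`tc.Image` and `list`) are assumptions based on the two-column layout shown above. The worker only needs an object with `append(row)` and `close()`, which is the interface `tc.SFrameBuilder` provides.

```python
from queue import Queue
from threading import Thread


class StreamingWriteWorker(Thread):
    """Hypothetical worker that appends each row to a builder as it arrives."""

    def __init__(self, builder):
        super().__init__(daemon=True)
        self.builder = builder  # any object with append(row) and close()
        self.queue = Queue()
        self.start()

    def run(self):
        while True:
            r = self.queue.get()
            # Row values must follow the column order given to the builder.
            self.builder.append([r['image'], r['annotations']])
            self.queue.task_done()

    def wait(self):
        # Block until all queued rows are appended, then finish the builder.
        self.queue.join()
        return self.builder.close()  # SFrameBuilder.close() returns the SFrame


def make_builder():
    # Imported lazily so the worker can be exercised without turicreate.
    import turicreate as tc
    # Assumed column types: an image column and a list of annotation dicts.
    return tc.SFrameBuilder([tc.Image, list],
                            column_names=['image', 'annotations'])
```

With turicreate installed, usage would look roughly like `worker = StreamingWriteWorker(make_builder())`, then `worker.queue.put({'image': img, 'annotations': anns})` per example, and finally `worker.wait().save(save_path)`.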
Let us know how it goes! I'm also curious to hear about your experience with the object detector, so feel free to drop us a line here when you get some results.
@gustavla I tried out that method and it did help, but the real trick was setting the cache config.
tc.config.set_runtime_config('TURI_CACHE_FILE_LOCATIONS', network_location)
My only criticism is that this seems like a very important piece for building large SFrames, but the API is very obscure. Also, doing it this way is much slower than writing directly to a file. For example, in TensorFlow you can write out a single tfrecord directly, whereas with SFrame it's basically all built in memory. That memory then gets partially cached to disk, and only after all that does it get written back out to a saved SFrame.
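As a rough sketch of the cache setting mentioned above, the helper below makes sure the cache directories exist before pointing `TURI_CACHE_FILE_LOCATIONS` at them. The function names are hypothetical. Joining several directories with a colon is an assumption based on how the config option is described in the turicreate docs; with a single directory the call reduces to the one-liner quoted in this comment.

```python
import os


def prepare_cache_dirs(dirs):
    """Create each cache directory and return them as one colon-separated string."""
    for d in dirs:
        os.makedirs(d, exist_ok=True)
    # Assumption: multiple cache locations are separated by a colon.
    return ':'.join(os.path.abspath(d) for d in dirs)


def apply_cache_config(dirs):
    # Imported lazily so prepare_cache_dirs can be used without turicreate.
    import turicreate as tc
    value = prepare_cache_dirs(dirs)
    tc.config.set_runtime_config('TURI_CACHE_FILE_LOCATIONS', value)
    return value
```

Calling `apply_cache_config([network_location])` early in the script, before building the SFrame, would mirror the setup described here.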
As far as the object detection goes I will give you guys an update once I get some results. Thanks again for the help!
I'll close this and add another issue that points to the specific concern.