Comments (9)
It depends a lot on your server performance.
I strongly discourage you from using a virtual machine on a cloud infrastructure, as Pastec will be very slow on it. Pastec runs a lot better on a dedicated server. Typically, you can load an instance with up to 1,000,000 images on a server with 40 GB of RAM (yes, it needs RAM...). With a good CPU, it should answer in less than 4 seconds in this case.
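As a rough sizing aid, the figure quoted above (about 1,000,000 images in 40 GB of RAM) can be turned into a back-of-the-envelope estimate. This is only a sketch: it assumes RAM usage grows linearly with the number of images, which the thread suggests but does not guarantee, and the function names are made up for illustration.

```python
import math

# Figures quoted in the comment above: ~1,000,000 images in ~40 GB of RAM.
REFERENCE_IMAGES = 1_000_000
REFERENCE_RAM_BYTES = 40 * 10**9


def ram_bytes_per_image() -> float:
    """Average RAM per indexed image implied by the reference figures."""
    return REFERENCE_RAM_BYTES / REFERENCE_IMAGES


def estimate_ram_gb(num_images: int) -> float:
    """Estimated RAM in GB, assuming linear growth (an assumption)."""
    return num_images * ram_bytes_per_image() / 10**9


def servers_needed(num_images: int, ram_gb_per_server: float) -> int:
    """Number of servers required if each one holds a separate index."""
    return math.ceil(estimate_ram_gb(num_images) / ram_gb_per_server)


if __name__ == "__main__":
    print(ram_bytes_per_image())           # bytes of RAM per image
    print(estimate_ram_gb(10_000_000))     # GB for ten million images
    print(servers_needed(10_000_000, 40))  # 40 GB servers needed
```

Under these assumptions each image costs roughly 40 KB of RAM, so the server count grows in direct proportion to the collection size.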
from pastec.
I have a few questions too.
- Is there a way to speed up adding images to the index? Currently, adding an image takes about a second (running on OS X, 2.5 GHz Intel Core i7, 16 GB of RAM). I may have to index millions of images, and one second per image is not acceptable. Is there any way to improve this?
- Also, is there a cap on the number of images I can store in an index? Eventually I have to store billions of images, but judging by your last comment, I'm afraid that may not even be possible. Can you suggest anything here?
- Is there a way to estimate the index size from the number of images? Currently I have an index with a few thousand images, and the index file is ~23 MB. If it keeps growing at this rate, I may end up with an index that weighs terabytes for hundreds of millions of images.
Thanks in advance!
- You can first try to multithread your image insertion code. There are also some possible optimizations in the code that I need to write properly and push.
- The maximum number of images you can store in an index is set by the amount of RAM on your machine. Given the signature size, storing billions of images would indeed require a lot of servers... You can, however, try to rewrite the index to store the signatures on disk, but the search will probably be very slow.
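The multithreading suggestion above could look roughly like the following client-side sketch. It assumes Pastec is reachable at `http://localhost:4212` and that an image is added with an HTTP `PUT` of the raw file bytes to `/index/images/<id>`; check both against your own setup. The helper names and the worker count are illustrative, and how much speedup you get depends on how the server handles concurrent insertions.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict
import urllib.request

# Assumption: address of the Pastec instance; adjust host and port as needed.
PASTEC_URL = "http://localhost:4212"


def upload_image(image_id: int, path: Path) -> bytes:
    """PUT one image file to /index/images/<id> and return the response body."""
    request = urllib.request.Request(
        f"{PASTEC_URL}/index/images/{image_id}",
        data=path.read_bytes(),
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read()


def index_images(
    images: Dict[int, Path],
    workers: int = 8,
    upload: Callable[[int, Path], bytes] = upload_image,
) -> Dict[int, object]:
    """Upload images concurrently.

    Returns a dict mapping each image id to the response body, or to the
    exception raised while uploading it.
    """
    results: Dict[int, object] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            image_id: pool.submit(upload, image_id, path)
            for image_id, path in images.items()
        }
        for image_id, future in futures.items():
            try:
                results[image_id] = future.result()
            except Exception as exc:  # keep going if one upload fails
                results[image_id] = exc
    return results
```

A call such as `index_images({1: Path("img/1.jpg"), 2: Path("img/2.jpg")})` would send both files in parallel. The `upload` parameter exists so the function can be exercised without a running server.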
I got it, thanks! And what about the 3rd question? :)
3 - It will keep growing at that rate. Also keep in mind that the size of the index saved on disk is different from its size in RAM. Besides, you will never be able to index hundreds of millions of images with the current Pastec on a single computer.
Got it, thank you. What about scaling the current Pastec app? Can I scale it across multiple servers?
On 15/05/2016 at 21:26, Evgheni C. wrote:
> Got it, thank you. What about scaling the current Pastec app? Can I scale it across multiple servers?
Scaling is currently not supported. You have to manage several instances
on your own.
Adrien Maglo, Ph.D.
Pastec developer
http://www.pastec.io
+33 6 27 94 34 41
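Managing several instances yourself could be sketched as below. Nothing here is part of Pastec: the instance addresses are hypothetical, `search_one` is a placeholder for whatever function you write to query a single instance, and merging by score assumes that scores from different instances are comparable, which may not hold in practice.

```python
from typing import Callable, List, Sequence

# Hypothetical addresses; each one is a separate Pastec process with its own index.
INSTANCES = [
    "http://pastec-0:4212",
    "http://pastec-1:4212",
    "http://pastec-2:4212",
]


def instance_for(image_id: int, instances: Sequence[str] = INSTANCES) -> str:
    """Pick the instance that owns an image id (simple modulo sharding)."""
    return instances[image_id % len(instances)]


def search_all(
    query: bytes,
    search_one: Callable[[str, bytes], List[dict]],
    instances: Sequence[str] = INSTANCES,
) -> List[dict]:
    """Send the query to every instance and merge the hits, best score first.

    `search_one(instance, query)` is expected to return a list of dicts
    with at least a "score" key (an assumed shape, not Pastec's format).
    """
    hits: List[dict] = []
    for instance in instances:
        hits.extend(search_one(instance, query))
    return sorted(hits, key=lambda hit: hit["score"], reverse=True)
```

Insertion goes to exactly one instance via `instance_for`, while every search has to fan out to all of them, so query cost grows with the number of instances.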
Got it, thank you.
So let's assume I figure out how to scale it. Is there a way to reduce the size of the index db file somehow? I have indexed only 13K images so far, and my index (physical size on disk) is about 70 MB. At this rate, it may grow to about 5 GB for 1M images.
There is no easy way.
But this makes little sense since, once again, the size of the index in RAM is different from what is written on disk... What matters is the size in RAM, since you often have less RAM than disk space.
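To put numbers on the disk versus RAM distinction, the two figures mentioned in this thread can be compared per image. This is a rough sketch only: the disk figure (about 70 MB for 13K images) and the RAM figure (about 40 GB for 1,000,000 images) come from different setups, and linear growth is assumed for both.

```python
# Figures taken from the thread; treat both as approximate.
DISK_BYTES, DISK_IMAGES = 70 * 10**6, 13_000    # ~70 MB on disk for 13K images
RAM_BYTES, RAM_IMAGES = 40 * 10**9, 1_000_000   # ~40 GB of RAM for 1M images

disk_per_image = DISK_BYTES / DISK_IMAGES
ram_per_image = RAM_BYTES / RAM_IMAGES


def projected_disk_gb(num_images: int) -> float:
    """Projected on-disk index size in GB, assuming linear growth."""
    return num_images * disk_per_image / 10**9


def projected_ram_gb(num_images: int) -> float:
    """Projected in-RAM index size in GB, assuming linear growth."""
    return num_images * ram_per_image / 10**9


if __name__ == "__main__":
    n = 1_000_000
    print(f"disk: {disk_per_image / 1000:.1f} KB/image, {projected_disk_gb(n):.1f} GB for {n} images")
    print(f"RAM:  {ram_per_image / 1000:.1f} KB/image, {projected_ram_gb(n):.1f} GB for {n} images")
```

With these inputs the on-disk cost is about 5.4 KB per image against roughly 40 KB per image in RAM, which is why RAM rather than disk is the limiting resource.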