
Comments (9)

magwyz commented on August 16, 2024

It depends a lot on your server performance.
I strongly discourage you from using a virtual machine on cloud infrastructure, as Pastec will be very slow on it. Pastec runs a lot better on a dedicated server. Typically, you can load an instance with up to 1,000,000 images on a server with 40 GB of RAM (yes, it needs RAM...). With a good CPU, it should answer in less than 4 seconds in this case.

from pastec.

ecdeveloper commented on August 16, 2024

I have a few questions too.

  1. Is there a way to speed up adding images to the index? Currently, adding an image to the index takes about a second (running on OS X, 2.5 GHz Intel Core i7, 16 GB). I may have to index millions of images, and 1 s per image is not acceptable. Is there any way to improve this?
  2. Also, is there any cap on the number of images I can store in an index? I will eventually have to store billions of images. But judging by your last comment, I'm afraid that may not even be possible. Can you suggest anything here?
  3. Is there a way to estimate the index size based on the number of images? Currently I have an index with a few thousand images, and the index file size is ~23 MB. If it keeps growing at this rate, I may end up with an index that weighs terabytes for hundreds of millions of images.

Thanks in advance!


magwyz commented on August 16, 2024
  1. You can first try multithreading your image insertion code.
    There are also some possible optimizations in the code that I still need to write properly and push.
  2. The maximum number of images you can store in an index is set by the amount of RAM in your computer. Given the signature size, storing billions of images would indeed require a lot of servers. You could, however, try rewriting the index to store the signatures on disk, but searches would probably be very slow.
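
The first suggestion, multithreading the insertion code, could look roughly like this on the client side. This is a minimal sketch, not Pastec's own code: it assumes a Pastec instance listening at `http://localhost:4212` that accepts an image through an HTTP PUT on `/index/images/<id>`. The helper names `put_image` and `index_images` and the worker count are hypothetical. How much it helps depends on how many requests the server can process at once.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

PASTEC_URL = "http://localhost:4212"  # assumption: address of a local Pastec instance


def put_image(image_id, path, base_url=PASTEC_URL):
    """Send one image file with an HTTP PUT (assumed endpoint layout)."""
    with open(path, "rb") as f:
        data = f.read()
    req = urllib.request.Request(
        f"{base_url}/index/images/{image_id}", data=data, method="PUT"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def index_images(items, upload=put_image, workers=8):
    """Upload (image_id, path) pairs concurrently.

    Returns a dict mapping each image_id to the upload result,
    or to the exception raised for that image.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            pool.submit(upload, image_id, path): image_id
            for image_id, path in items
        }
        for future, image_id in futures.items():
            try:
                results[image_id] = future.result()
            except Exception as exc:  # one failed upload should not stop the rest
                results[image_id] = exc
    return results
```

A call such as `index_images([(1, "a.jpg"), (2, "b.jpg")])` would then send both files in parallel. The `upload` parameter is there so the function can be exercised without a running server.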


ecdeveloper commented on August 16, 2024

I got it, thanks! And what about the 3rd question? :)


magwyz commented on August 16, 2024

3 - It will keep growing at that rate. Keep in mind, too, that the size of the index saved on disk differs from its size in RAM. Besides, you will never be able to index hundreds of millions of images with the current Pastec on a single computer.
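
To put rough numbers on the RAM limit, the figure quoted earlier in the thread (up to 1,000,000 images on a server with 40 GB of RAM) can be extrapolated. This is only a back-of-the-envelope sketch: it assumes RAM use grows linearly with the number of images and treats 40 GB per million as exact, neither of which the thread guarantees. The function names are hypothetical.

```python
GB = 10**9  # decimal gigabyte, in bytes

# Figure from the thread: up to 1,000,000 images on a 40 GB server.
REFERENCE_RAM = 40 * GB
REFERENCE_IMAGES = 1_000_000


def ram_per_image():
    """Approximate bytes of RAM per indexed image (assumes linear scaling)."""
    return REFERENCE_RAM // REFERENCE_IMAGES


def ram_needed(images):
    """Approximate bytes of RAM needed to hold `images` images in memory."""
    return images * ram_per_image()


def servers_needed(images, ram_per_server=REFERENCE_RAM):
    """Number of servers of the given RAM size, rounded up."""
    return -(-ram_needed(images) // ram_per_server)  # ceiling division


print(ram_per_image())                 # bytes per image
print(ram_needed(100_000_000) / GB)    # GB for a hundred million images
print(servers_needed(1_000_000_000))   # 40 GB servers for a billion images
```

Under these assumptions each image costs about 40 KB of RAM, so a hundred million images would need about 4 TB, which is why a single machine is out of reach.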


ecdeveloper commented on August 16, 2024

Got it, thank you. What about scaling the current Pastec app? Can I scale it across multiple servers?


magwyz commented on August 16, 2024

On 15/05/2016 21:26, Evgheni C. wrote:

Got it, thank you. What about scaling the current Pastec app? Can I
scale it across multiple servers?



Scaling is currently not supported. You have to manage several instances
on your own.
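
One way to manage several instances yourself is to shard by image id: each image is stored on exactly one instance, and every search is sent to all instances and the results are merged. The following is a minimal sketch of that idea, not a Pastec feature. The instance URLs are hypothetical, and `search_one` is a caller-supplied function (for example, a wrapper around an HTTP request to one instance) because the response format is not described in this thread.

```python
# Hypothetical addresses of independently running Pastec instances.
INSTANCES = [
    "http://pastec-0:4212",
    "http://pastec-1:4212",
    "http://pastec-2:4212",
]


def instance_for(image_id, instances=INSTANCES):
    """Pick the instance that stores a given integer image id (modulo sharding)."""
    return instances[image_id % len(instances)]


def search_all(query, instances, search_one):
    """Run the same query on every instance and concatenate the results.

    `search_one(base_url, query)` must return a list of matches
    for a single instance; its implementation is left to the caller.
    """
    merged = []
    for base_url in instances:
        merged.extend(search_one(base_url, query))
    return merged
```

With modulo sharding, adding or removing an instance changes where most ids map, so the number of instances is best fixed up front or replaced with a scheme that tolerates resizing.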

Adrien Maglo, Ph.D.
Pastec developer
http://www.pastec.io
+33 6 27 94 34 41


ecdeveloper commented on August 16, 2024

Got it, thank you.
So let's assume I figure out how to scale it. But is there a way to reduce the index db file size somehow? I have currently indexed only 13K images, and my index size (physical size on disk) is about 70 MB. At this rate, it may grow to about 5 GB for 1M images.
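
The extrapolation behind that figure can be written out as follows. It is a rough sketch that assumes the on-disk index grows linearly with the number of images, using only the two numbers above (about 70 MB for 13,000 images). The function name is hypothetical.

```python
MB = 10**6  # decimal megabyte, in bytes
GB = 10**9  # decimal gigabyte, in bytes


def estimate_index_size(sample_bytes, sample_images, target_images):
    """Linearly extrapolate the on-disk index size, in bytes."""
    return sample_bytes * target_images // sample_images


per_image = 70 * MB // 13_000
size_1m = estimate_index_size(70 * MB, 13_000, 1_000_000)

print(per_image)      # bytes on disk per image, roughly 5.4 KB
print(size_1m / GB)   # estimated GB on disk for one million images
```

This gives a little over 5 GB on disk for one million images, in line with the estimate above.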


magwyz commented on August 16, 2024

There is no easy way.
But this makes little sense since, once again, the size of the index in RAM differs from what is written to disk. What matters is the size in RAM, since you often have less RAM than disk space.

