Comments (8)

rkingsbury commented on May 22, 2024

Thank you for clarifying @davidlatwe ! The error message makes a little more sense to me now. In my case, pymongo is definitely installed and I am passing use_bson=True to MontyClient(). But it seems the error is triggered if I try to create more than one MontyClient using the memory storage engine. As a minimal example:

>>> from montydb import MontyClient
>>> mc=MontyClient(":memory:", use_bson=True)
>>> mc2=MontyClient(":memory:", use_bson=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ryan/miniconda3/envs/md/lib/python3.9/site-packages/montydb/client.py", line 41, in __init__
    storage_cls = provide_storage(repository)
  File "/Users/ryan/miniconda3/envs/md/lib/python3.9/site-packages/montydb/configure.py", line 269, in provide_storage
    _bson_init(_session["use_bson"])
  File "/Users/ryan/miniconda3/envs/md/lib/python3.9/site-packages/montydb/configure.py", line 282, in _bson_init
    raise ConfigurationError(
montydb.errors.ConfigurationError: montydb has been config to use BSON and cannot be changed in current session.

But if I continue in the above session and create clients using flatfile storage, I don't get this error:

>>> mc3=MontyClient(use_bson=True)
>>> mc4=MontyClient(use_bson=True)

from montydb.

davidlatwe commented on May 22, 2024

It shouldn't be difficult to change that.

Right now every in-memory MontyClient instance reads and writes data from/to a single OrderedDict at this line.

To change that, we will need to extend the in-memory URL from :memory: to something like :memory:any-name, and then maybe use that name as a top-level key of the internal _repo object. The same goes for the _config object.

It also needs to be thread-safe.

I'll see if I can implement that and make a new release by the end of this weekend.
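
To illustrate the idea above, here is a minimal sketch of a name-keyed, lock-protected registry of in-memory repos. It is not montydb's actual implementation: `get_memory_repo`, `_repos`, `_repos_lock` and `_MEMORY_PREFIX` are hypothetical names made up for this example.

```python
import threading
from collections import OrderedDict

_MEMORY_PREFIX = ":memory:"   # URL prefix that selects in-memory storage
_repos = {}                   # hypothetical stand-in for the internal _repo object
_repos_lock = threading.Lock()


def get_memory_repo(url):
    """Return the OrderedDict backing the in-memory repo named by `url`.

    ":memory:" maps to the unnamed repo (key ""), ":memory:foo" to the
    repo named "foo". Repos are created lazily on first access.
    """
    if not url.startswith(_MEMORY_PREFIX):
        raise ValueError("not an in-memory URL: %r" % (url,))
    name = url[len(_MEMORY_PREFIX):]
    # The lock keeps two threads from creating the same repo twice.
    with _repos_lock:
        if name not in _repos:
            _repos[name] = OrderedDict()
        return _repos[name]
```

With this layout, `get_memory_repo(":memory:")` and `get_memory_repo(":memory:2")` return different dictionaries, while repeated calls with the same URL return the same one. Note that the lock only guards the registry itself, not writes into an individual repo.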

rkingsbury commented on May 22, 2024

Thanks so much for adding this @davidlatwe ! It works great. I used the following code to test.

from montydb import MontyClient, set_storage

use_bson = True

set_storage(
    repository=":memory:",
    storage="memory",
    use_bson=use_bson,
)
mc1 = MontyClient(":memory:")
mc2 = MontyClient(":memory:2")


# Proving that the two clients are using independent memory storage
bar1 = mc1.get_database("foo").get_collection("bar")
bar2 = mc2.get_database("foo").get_collection("bar")
bar1.insert_one({"test": "doc"})
assert bar2.count()==0
assert bar1.count()==1

If I understand your PR correctly, I just need to make sure the repo name starts with :memory:, and then as long as the repo names passed to MontyClient are unique, they will point to different instances.

I'm still slightly confused about where to use set_storage vs. the repository kwarg in MontyClient. In the above example, it seems that it's ok to just specify :memory:2 as the repo for the 2nd MontyClient instance, even though I have never pointed set_storage to it.

Anyway, thanks again for addressing! I'll go ahead and close.

davidlatwe commented on May 22, 2024

Hi @rkingsbury , sorry for my late reply 💦

I didn't do the test, but I believe it's possible to operate multiple instances of MontyClient under the same Python session (process).

But the ConfigurationError you get is another story. The message itself doesn't do a good job and is indeed quite confusing. 😅

The thing is, MongoDB uses BSON, so montydb also uses BSON. And usually, one would get BSON from pymongo.

But to keep dependencies minimal, it doesn't make much sense to add pymongo as a dependency of montydb just for the bson module that comes with it.

So the choice I made at that time was to vendor a small part of bson into montydb. And since the bson within montydb is not the same as the bson that comes with pymongo (lots of fake types, e.g. SON, Code...), we have to pick which bson we are going to use for the current Python session, for performance reasons.
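
As a rough illustration of "pick one bson per session", a once-per-session guard could look like the sketch below. This is an assumption-laden example, not montydb's actual code: `choose_bson` and `_session_choice` are hypothetical names, and the `ConfigurationError` class here is a local stand-in for `montydb.errors.ConfigurationError`.

```python
class ConfigurationError(Exception):
    """Local stand-in for montydb.errors.ConfigurationError."""


# Hypothetical session state; empty until the first choice is made.
_session_choice = {}


def choose_bson(use_bson):
    """Record the bson choice on first call; reject a different choice later."""
    use_bson = bool(use_bson)
    if "use_bson" not in _session_choice:
        # First call in this process: remember the choice.
        _session_choice["use_bson"] = use_bson
    elif _session_choice["use_bson"] != use_bson:
        # A later call asking for the other bson module is a conflict.
        raise ConfigurationError(
            "bson choice was already made for this session and cannot be changed"
        )
    return _session_choice["use_bson"]
```

In this sketch, calling `choose_bson(True)` twice is fine, but calling `choose_bson(False)` afterwards raises `ConfigurationError`.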

Now back to the ConfigurationError you get, my guess is this:

  1. a monty database was created with montydb's own bson module
  2. pymongo installed
  3. pymongo's bson was picked by default simply because it now exists
  4. Conflict.

Can you confirm this?

By "repository creation" do you mean "on instantiation of MontyClient"?

Specifically, the first time you initialize a MontyClient for a database that does not yet exist.

Is there any way to set the storage on a per-instance basis

I think that if both databases were created with the same BSON configuration (e.g. have pymongo installed, then create all the databases you need), we are safe.

Please let me know if anything is unclear or incorrect, this issue is on my radar now. 😊

rkingsbury commented on May 22, 2024

Hi @davidlatwe , any thoughts about how to fix or work around this? For additional context, I'm trying to use MontyStore as a replacement for mongomock in a scientific data management package called maggma. But to do so, I really need to be able to instantiate multiple independent databases in memory.

davidlatwe commented on May 22, 2024

Hey @rkingsbury , sorry again for another late reply, I got flooded with work. 😅 And thanks for pinging me!

The use_bson flag shouldn't be set via MontyClient, but via the set_storage function. Here's an example:

from montydb import MontyClient, set_storage

set_storage(
    repository=":memory:",
    storage="memory",
    use_bson=True,
)

mc1 = MontyClient(":memory:")
mc2 = MontyClient(":memory:")

It's not obvious, but the README does mention this, just without using memory storage as the example. See the example code in the Storage section here 😊

Also note that with memory storage, all clients share the same storage instance.

Here's the full code that I just tested, which also checks that montydb is using the correct bson module:

from montydb import MontyClient, set_storage

use_bson = True

set_storage(
    repository=":memory:",
    storage="memory",
    use_bson=use_bson,
)
mc1 = MontyClient(":memory:")
mc2 = MontyClient(":memory:")


# Proving that two clients are using same memory storage
bar1 = mc1.get_database("foo").get_collection("bar")
bar2 = mc2.get_database("foo").get_collection("bar")
bar1.insert_one({"test": "doc"})
assert bar2.find_one({"test": "doc"}) == bar1.find_one({"test": "doc"})


# Check which bson module was used.
doc = bar1.find_one({"test": "doc"})
if use_bson:
    assert type(doc["_id"]).__module__ == "bson.objectid"
else:
    assert type(doc["_id"]).__module__ == "montydb.types.objectid"

Hope this helps!

rkingsbury commented on May 22, 2024

Thanks for clarifying @davidlatwe ! So basically there can only be one MontyClient using memory at a time. That's what I needed to know. Would it be difficult to change that, to make it possible to have multiple independent memory repos?

davidlatwe commented on May 22, 2024

Hey @rkingsbury

Got delayed, but the in-memory engine can now have independent repos. Please try it out. See 2.5.2 😃
