Comments (5)

huahaiy commented on August 27, 2024

That doesn't seem plausible, because the examples are part of the tests.

https://github.com/juji-io/datalevin/blob/master/test/pod/huahaiy/datalevin_test.clj#L132

What exactly have you tried?


noncom commented on August 27, 2024

I see, yeah, that's very strange. Maybe it's my lack of experience with this kind of DSL, but I haven't been able to work out why the results diverge from what I expect.

I work in VS Code with Calva, and this is my deps.edn:

{:paths ["src" "resources"]
 :deps {org.clojure/clojure {:mvn/version "1.11.3"}
        datalevin/datalevin {:mvn/version "0.9.8"}}
 :aliases
 {:run-m {:main-opts ["-m" "noncom.db-test"]}
  :run-x {:ns-default noncom.db-test
          :exec-fn greet
          :exec-args {:name "Clojure"}}
  :jvm-base
  {:jvm-opts ["-Dclojure.compiler.direct-linking=true"
              "--add-opens" "java.base/java.nio=ALL-UNNAMED"
              "--add-opens" "java.base/java.lang=ALL-UNNAMED"
              "--add-opens" "java.base/sun.nio.ch=ALL-UNNAMED"
              "--add-opens" "java.base/jdk.internal.ref=ALL-UNNAMED"]}
  :build {:deps {io.github.clojure/tools.build
                 {:mvn/version "0.10.3"}}
          :ns-default build}
  :test {:extra-paths ["test"]
         :extra-deps {org.clojure/test.check {:mvn/version "1.1.1"}
                      io.github.cognitect-labs/test-runner
                      {:git/tag "v0.5.1" :git/sha "dfb30dd"}}}}}

Note: I had to add all these :jvm-opts because otherwise it would complain, and I select that alias (:jvm-base) when starting the REPL.

And here's my scratchpad file with what I ran and what I got.
I added the evaluation results as comments after each form.
I ran each of these forms exactly once, in this order, in a fresh REPL, with no pre-existing "/db/test.db" file.
I haven't run any data-removal commands yet.

(ns noncom.db-test
  (:require [datalevin.core :as d]))

(def db-path (str (System/getProperty "user.dir") "/db/test.db"))

(def conn (d/get-conn db-path))

(d/transact! conn
             [{:name "Frege", :db/id -1, :nation "France", :aka ["foo" "fred"]}
              {:name "Peirce", :db/id -2, :nation "france"}
              {:name "De Morgan", :db/id -3, :nation "English"}])

;; ---------------------------------------------
;; -------------- find examples ----------------
;; ---------------------------------------------

;; Query the data (original from the readme)
(d/q '[:find ?nation
       :in $ ?alias
       :where
       [?e :aka ?alias]
       [?e :nation ?nation]]
     (d/db conn)
     "fred")
;; Example says => #{["France"]}
;; I get => #{}

;; Altered 1
(d/q '[:find ?alias
       :in $ ?e
       :where
       [?e :aka ?alias]]
     (d/db conn)
     "doesn't matter what I write here, it returns the same value")
;; I expect: nothing because the string doesn't exist in the database, so it can't match anything
;; I get => #{[["foo" "fred"]]}

;; Altered 2
(d/q '[:find ?nation
       :in $ ?e
       :where
       [?e :aka ?alias]
       [?e :nation ?nation]]
     (d/db conn)
     "doesn't matter what I write here, it returns the same value")
;; I expect: nothing because the string doesn't exist in the database, so it can't match anything
;; I get => #{["France"]}

;; ---------------------------------------------
;; -------------- pull examples ----------------
;; ---------------------------------------------

;; !!! NOTE I haven't run this: Retract the name attribute of an entity
;; (d/transact! conn [[:db/retract 1 :name "Frege"]])

;; Pull the entity, now the name is gone (original from the readme)
(d/q '[:find (pull ?e [*])
       :in $ ?alias
       :where
       [?e :aka ?alias]]
     (d/db conn)
     "fred")
;; Example says => ([{:db/id 1, :aka ["foo" "fred"], :nation "France"}])
;; I get => ()

;; Altered 1
(d/q '[:find (pull ?e [*])
       :in $ ?alias
       :where
       [?e :aka ?alias]]
     (d/db conn)
     ["foo" "fred"])
;; I expect: this result, because `["foo" "fred"]` equals the stored `:aka` value (as long as it isn't wrapped in an additional `[]`)
;; I get => ([{:db/id 1, :name "Frege", :nation "France", :aka ["foo" "fred"]}])

;; Altered 3
(d/q '[:find (pull ?e [*])
       :in $ ?alias
       :where
       [?e :aka ?alias]]
     (d/db conn)
     ["foo" "fred" "some random stuff that's not in the db"])
;; I expect: an empty result, so this time it works?
;; I get => ()

So the results don't match the README examples, and in some cases they don't match what I would have expected either, though that may be because my expectations are wrong. I just played around a bit to check that at least something works.

huahaiy commented on August 27, 2024

You missed one of the most important pieces of input: the schema.

(def schema {:aka  {:db/cardinality :db.cardinality/many}
             ;; :db/valueType is optional, if unspecified, the attribute will be
             ;; treated as EDN blobs, and may not be optimal for range queries
             :name {:db/valueType :db.type/string
                    :db/unique    :db.unique/identity}})

(def conn (d/get-conn "/tmp/datalevin/mydb" schema))

You won't get the expected results if you don't tell the database what kind of data you are giving it.
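For reference, here is a sketch of the reporter's scratchpad redone with that schema, so that `:aka` is stored as two separate values rather than one EDN blob. It assumes the deps.edn and `:jvm-base` JVM options shown earlier; the namespace name and the use of a fresh temporary directory (so no data from earlier schema-less runs is reused) are hypothetical choices for illustration.

```clojure
(ns noncom.db-schema-test
  (:require [datalevin.core :as d])
  (:import [java.nio.file Files]
           [java.nio.file.attribute FileAttribute]))

;; Schema from the comment above: :aka holds many values, :name is a unique string.
(def schema {:aka  {:db/cardinality :db.cardinality/many}
             :name {:db/valueType :db.type/string
                    :db/unique    :db.unique/identity}})

;; Hypothetical: a fresh temporary directory so no stale data is picked up.
(def db-dir
  (str (Files/createTempDirectory "datalevin-demo"
                                  (make-array FileAttribute 0))))

(def conn (d/get-conn db-dir schema))

;; With cardinality many, the vector under :aka becomes one datom per element.
(d/transact! conn
             [{:name "Frege", :db/id -1, :nation "France", :aka ["foo" "fred"]}
              {:name "Peirce", :db/id -2, :nation "france"}
              {:name "De Morgan", :db/id -3, :nation "English"}])

;; "fred" now matches one of the individual :aka values.
(def nations
  (d/q '[:find ?nation
         :in $ ?alias
         :where
         [?e :aka ?alias]
         [?e :nation ?nation]]
       (d/db conn)
       "fred"))

(println nations) ; the README example shows #{["France"]}

(d/close conn)
```

The only difference from the original scratchpad is passing `schema` to `d/get-conn`; the transaction and query are unchanged.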

noncom commented on August 27, 2024

Oh yes, right, I'm sorry. I just tried with the schema, and it worked as expected.

The comment above the schema:

;; Define an optional schema.
;; Note that pre-defined schema is optional, as Datalevin does schema-on-write.
;; However, attributes requiring special handling need to be defined in schema,
;; e.g. many cardinality, uniqueness constraint, reference type, and so on.

gave me the impression that the schema is optional in the sense that the engine can query unstructured data made of standard Clojure types, as long as there are no additional constraints such as cardinality or uniqueness.

So I thought this worked like destructuring or pattern matching: extracting data by mapping the query onto the data, while implicitly inferring a basic schema from the query and the data themselves, much as a regexp can search a string for structures without knowing any schema for that string ahead of time. But it looks like that's not how it works.

Then I have a follow-up question, to correct my understanding, if you would be so kind as to comment.
Imagine this scenario:
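As a rough mental model of what happens instead, one can think of the stored data as [entity attribute value] triples that a `:where` clause matches by equality, not by destructuring. The block below is a toy model in plain Clojure, not Datalevin's storage or query engine; the helpers `datoms` and `match-aka` are hypothetical. It reproduces what the scratchpad showed: without a schema the whole vector is a single value, so `"fred"` does not match but `["foo" "fred"]` does; declaring `:aka` as cardinality many splits it into one triple per alias.

```clojure
(ns toy.datoms)

;; Toy model only: expand one entity map into [e a v] triples.
;; `many-attrs` plays the role of attributes declared :db.cardinality/many.
(defn datoms [eid entity many-attrs]
  (for [[a v] entity
        v'    (if (and (contains? many-attrs a) (sequential? v)) v [v])]
    [eid a v']))

;; Entity ids whose :aka value equals `alias`, like the clause [?e :aka ?alias].
(defn match-aka [triples alias]
  (set (for [[e a v] triples
             :when (and (= a :aka) (= v alias))]
         e)))

(def frege {:name "Frege" :nation "France" :aka ["foo" "fred"]})

;; No schema: the vector is stored as one value.
(def no-schema (datoms 1 frege #{}))
;; :aka declared "many": one triple per alias.
(def with-schema (datoms 1 frege #{:aka}))

(println (match-aka no-schema "fred"))         ; #{}
(println (match-aka no-schema ["foo" "fred"])) ; #{1}
(println (match-aka with-schema "fred"))       ; #{1}
```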

  • There is an incoming stream of semi-structured JSON or XML documents, which might contain some repeating structures but don't fully follow any predefined schema.
  • The documents are immediately converted to EDN and saved in the database for further analysis.
  • I want to query these data structures and extract repeating patterns that I suspect might be there.

Essentially, I want to mine a pool of very diverse, unstructured data for patterns that I already know. These patterns involve several matches within the structure, so as I understand it, Datalog could in theory find such data. Would that be possible with a Datalog database, or is it out of scope, so that I would need to use something else instead?

huahaiy commented on August 27, 2024

Right now, the capability you describe is not implemented, but it is on the roadmap, as part of the document store features:

  • 3.0.0 Automatic document indexing.

When users turn on this option, we would perform automatic path extraction and treat paths as keys. That's a rough idea; the details remain to be determined. Of course, this is open source, and we welcome contributions.
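To make "path extraction" a little more concrete, here is a plain-Clojure sketch of one possible reading: flattening a nested EDN document into [path value] pairs, where each path is a vector of map keys and vector indices. The function name `paths` and the sample document are hypothetical, and this is not a Datalevin API; the actual design, as stated above, is still undecided.

```clojure
(ns toy.paths)

;; Hypothetical sketch: flatten nested maps/vectors into [path leaf-value] pairs.
;; Map keys and vector indices are appended to the path while descending.
(defn paths
  ([doc] (paths [] doc))
  ([prefix node]
   (cond
     (map? node)
     (mapcat (fn [[k v]] (paths (conj prefix k) v)) node)

     (sequential? node)
     (mapcat (fn [i v] (paths (conj prefix i) v)) (range) node)

     :else
     [[prefix node]])))

(def doc {:order {:id 7
                  :items [{:sku "a1" :qty 2}
                          {:sku "b2" :qty 1}]}})

(doseq [[path value] (paths doc)]
  (println path "=>" value))
```

Each path could then act as an attribute-like key shared across heterogeneous documents, which is one way known patterns might become queryable without a predefined schema.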
