samvera / valkyrie
A Data Mapper library to enable multiple backends for storage of files and metadata in Samvera
License: Other
Try really hard to use hydra-derivatives here.
Virtus objects can have metadata attached to properties; ordered: true seems like one we could add.
However, this will probably be annoyingly difficult to implement for the AF adapter.
Also, any references to repository should be storage_adapter. Consistent naming is important.
Create a short list of steps on what it takes to add a new work type (IE Book or Page). Consider a generator.
In bulk migration use cases, it might be more efficient to load up a lot of resources, change them in memory, and then persist them all at once (at least for solr/postgres.) The implementations can sometimes be complex (postgres in particular), and it's not efficient for all adapters (AF for instance). Do we want this?
If the method name is find_by_id, I don't think a named id parameter provides any extra clarity.
.find_by_id(id: id)
vs.
.find(id: id)
or
.find_by_id(id)
Would prefer either of the latter two forms.
I think it's more honest.
I have no idea what the interface for this might be like, but the speed was dramatically different and it helped things a lot. A transaction buffer (as is possible now) looks something like this:
memory_adapter = Valkyrie::Persistence::Memory::Adapter.new
adapter = Valkyrie::AdapterContainer.new(
  persister: CompositePersister.new(Valkyrie::Adapter.find(:postgres).persister, memory_adapter.persister),
  query_service: Valkyrie::Adapter.find(:postgres).query_service
)
# Save a bunch of stuff via adapter.persister.save(model: book)
#
# Now that you have a bunch of saved objects in the database,
# you can DRASTICALLY speed up solr indexing (18 mins -> a few seconds for 2600) by doing them all in one call
Valkyrie::Adapter.find(:solr).persister.save_all(models: memory_adapter.query_service.find_all)
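CompositePersister is used above but not defined in this thread. As a rough illustration only, here is a minimal sketch of what such a class might look like, assuming each wrapped persister responds to save(model:) and returns the saved model. InMemoryPersister and CompositePersisterSketch are hypothetical names made up for this example, not Valkyrie classes.

```ruby
# Hypothetical stand-in for a real persister; it only records what it was given.
class InMemoryPersister
  attr_reader :saved

  def initialize
    @saved = []
  end

  # Mirrors the save(model:) keyword signature used in the snippet above.
  def save(model:)
    @saved << model
    model
  end
end

# Hypothetical sketch of a composite: not Valkyrie's actual implementation.
class CompositePersisterSketch
  def initialize(*persisters)
    @persisters = persisters
  end

  # Pass the model through each persister in order. The return value of one
  # becomes the input of the next, so anything the first persister assigns
  # (an ID, for example) is visible to the rest.
  def save(model:)
    @persisters.inject(model) { |current, persister| persister.save(model: current) }
  end
end

primary = InMemoryPersister.new
buffer = InMemoryPersister.new
composite = CompositePersisterSketch.new(primary, buffer)
result = composite.save(model: { title: "Book" })
```

With this shape, the second persister acts as the buffer: everything saved through the composite can later be read back from it and handed to a bulk call in one go.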
It's in the charter.
It takes work in each persister to navigate back and forth between native ruby datatypes and the data-store. We need to document which data types we support.
Right now all that's supported is Internal IDs, language-tagged RDF Literals, and strings. Dates? Times? Integers? ::RDF::URIs?
test.com is a real domain; we should probably use example.com instead and/or make it easier to configure which URIs are used.
Convert methods to use named parameters in persister classes.
I think our forms have dirty tracking, but our models don't (on purpose). We should find a way to document that.
This would be an adapter which is proven to be able to interact with the way Hyrax stores data in Fedora/Solr. It will probably be difficult, and isn't actually part of the charter.
When we work on #53 we're going to need to store the user's identifier. Originally I thought that was going to be the username or email or whatever devise said was the primary key, but I realized it might be better to simply support GlobalIDs as a data type in Valkyrie.
That way you could just have GlobalID turn them into objects if you wanted that, and there'd be a difference between "tpend" and "gid://app/User/1"
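To make the distinction concrete, here is a minimal sketch that tells a plain string such as "tpend" apart from a gid:// URI using only Ruby's standard library. GidReference and parse_user_reference are hypothetical names for illustration; in a real application the globalid gem would presumably do this parsing, and nothing here reflects how Valkyrie would store the value.

```ruby
require "uri"

# Hypothetical value type for illustration only.
GidReference = Struct.new(:app, :model_name, :model_id)

# Returns a GidReference for gid:// URIs and the untouched string otherwise.
def parse_user_reference(value)
  uri = URI.parse(value)
  return value unless uri.scheme == "gid"

  # For "gid://app/User/1" the host is "app" and the path is "/User/1".
  model_name, model_id = uri.path.split("/").reject(&:empty?)
  GidReference.new(uri.host, model_name, model_id)
rescue URI::InvalidURIError
  # Anything that is not a parseable URI is treated as a plain string.
  value
end

reference = parse_user_reference("gid://app/User/1")
plain = parse_user_reference("tpend")
```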
For things like NOIDs. I think this is necessary - especially for migration.
I haven't dug into it a lot, but it seems to have a lot of good and similar opinions to Valkyrie, with a lot more work put into it:
In the readme it says to run rake server:development, but this causes an error because the tmp directory is not part of the git clone.
$ rake server:development
Loading configuration from /Users/jcoyne/.solr_wrapper.yml
Unable to copy /var/folders/9t/rygbnddx0b1ckw6tjs3m18qm0000gq/T/d20170706-12948-s9kond to tmp/blacklight-core: No such file or directory @ dir_s_mkdir - tmp/blacklight-core
Probably going to be something along the lines of
fedora_adapter = Valkyrie::Adapter.find(:fedora)
postgres_adapter = Valkyrie::Adapter.find(:postgres)
book = fedora_adapter.query_service.find_by(id: "myid")
book.id = nil
new_book = postgres_adapter.persister.save(model: book)
The use case exists in Hyrax, and at least two institutions I know of use it (UCSB & CHF):
I have a record which has complex metadata as one of the properties - IE, a date range where it's important that the beginning and the end of the range are stored together.
Possible implementation:
it "can save nested resources" do
  book = resource_class.new(title: "Sub-nested")
  book2 = resource_class.new(title: "Nested", nested_resource: book)
  book3 = persister.save(model: resource_class.new(nested_resource: book2))
  reloaded = query_service.find_by(id: book3.id)
  expect(reloaded.nested_resource.first.title).to eq ["Nested"]
  expect(reloaded.nested_resource.first.nested_resource.first.title).to eq ["Sub-nested"]
end
Now, the problem: Getting that test to pass with the postgres & memory adapters took about 20 LOC. Both natively support the concept of nesting and the abstractions are already written and debugged. However, for the other two adapters:
ActiveFedora: There's no interface for "here's a nested resource, build out the hash URIs and handle this for me please." I can't imagine how to write one, either. I could see this working out with something lower level, IE a Fedora persister which directly integrates with LDP, but I don't think that's an option ATM. Maybe the solution here is to reach out to those institutions that have implemented this and see what they've done, so we can at least have a compatibility layer.
Solr: There is no such thing as nesting. You can add "child documents", but they're indexed independently, require an ID, and don't have the same lifespan as their parents (https://issues.apache.org/jira/browse/SOLR-6096).
So I'm inclined to say we either:
Supporting multiple types per property will be important for use cases such as controlled in-repository terms. Need some way to distinguish "3" as a remote ID from "3" the string.
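One way to picture this, as a sketch only: wrap IDs in their own value type and write a type tag alongside the value when serializing, so "3" the ID and "3" the string survive a round trip as different things. ResourceId, serialize, and deserialize are hypothetical names for this example and are not taken from Valkyrie.

```ruby
# Hypothetical wrapper marking a value as an identifier rather than a string.
class ResourceId
  attr_reader :id

  def initialize(id)
    @id = id.to_s
  end

  # Two ResourceIds are equal when their ids match; a bare String never is.
  def ==(other)
    other.is_a?(ResourceId) && other.id == id
  end
  alias eql? ==

  def hash
    [self.class, id].hash
  end

  def to_s
    id
  end
end

# Store a type tag next to the value so the distinction survives persistence.
def serialize(value)
  case value
  when ResourceId then { "type" => "id", "value" => value.id }
  else { "type" => "string", "value" => value }
  end
end

# Rebuild the original Ruby object from the tagged hash.
def deserialize(hash)
  hash["type"] == "id" ? ResourceId.new(hash["value"]) : hash["value"]
end

round_tripped_id = deserialize(serialize(ResourceId.new("3")))
round_tripped_string = deserialize(serialize("3"))
```

The tag costs a little storage per value, but each adapter can then decide how to map it onto its own native types.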
What features in an example repository would, when fulfilled, mean this is a viable pattern for Hydra (and/or Hyrax)?
Ideas I'd like feedback on:
The shared specs are a good step, but as we start to solidify some of the interfaces we should probably find a good way to add proper documentation.
Piotr Solnica, the main dev behind Virtus, wrote this comment a while ago: https://www.reddit.com/r/ruby/comments/3sjb24/virtus_to_be_abandoned_by_its_creator/ and there hasn't been much activity on the gem recently.
This might not be an issue and Virtus might be stable enough for our needs, but we might have to eliminate Virtus at some point.
Now there's two: Disk & Memory.
I think we'll need at least three for a valid prototype:
In the future I'd like to look at
How do we store mime_type and filename? In Fedora these are stored with the binary.
Right now there's four - Memory/Postgres/Fedora/Solr.
I think I'd like to keep Memory/Postgres/Fedora working at least. If Solr doesn't, it's not the end of the world. If it does, it might make some interactions easier. There may be ongoing problems with supporting multiple data-types though.
DEPRECATION WARNING: schema_migrations_table_name is deprecated and will be removed from Rails 5.2 (called from block (2 levels) in <top (required)> at /Users/jcoyne/workspace/valkyrie/spec/support/database_cleaner.rb:4)
I think this is because it's storing them in string fields.
The alternative is to fix the raw Fedora adapter's performance problem (#72).
This would be nested resources (Using hash code URIs) for the edm:TimeSpan use case (UCSB)
I have a branch now which has two folders in this repository. However, now that I think about it, I wonder if we can turn off autoloading of the lib directory and just have an entire gem structure in lib/valkyrie
I suggest we return void. Returning a File could be expensive and we may not use the result.
I think this basically means make blacklight-access-controls work. What's the difference between that and HydraAccessControls? @jcoyne ?
File upload gems tend to be pretty locked into ActiveRecord norms. It would be nice if we could prove that Carrierwave could be used with a Valkyrie model without too much interference.
https://coderwall.com/p/e9d_ja/using-carrierwave-uploader-for-tableless-model-in-rails relevant?
Question here about where one should draw the line between "query powered by the fact that you have a Solr index" and "query necessary for the backend to support."