The valkyrie from samvera

Implement derivative generation

Try really hard to use hydra-derivatives here.

Generic support for ordered properties?

Virtus objects can have metadata attached to properties - ordered: true seems like one we could add.

However, this will probably be annoyingly difficult to implement for the AF adapter.

Write documentation on how ValueMapper works.

Rename `FileRepository` to `StorageAdapter`

Also any references to repository should be storage_adapter.

Consistent naming is important.

Custom Work Type Implementation

Create a short list of the steps it takes to add a new work type (e.g., Book or Page). Consider a generator.

"save_all"?

In bulk migration use cases, it might be more efficient to load up a lot of resources, change them in memory, and then persist them all at once (at least for solr/postgres.) The implementations can sometimes be complex (postgres in particular), and it's not efficient for all adapters (AF for instance). Do we want this?
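
One possible shape for this, as a rough sketch only: adapters that can batch override `save_all`, and everything else falls back to saving one resource at a time. The module and class names below (`SaveAllFallback`, `OneAtATimePersister`, `BatchPersister`) are made up for illustration and are not part of Valkyrie.

```ruby
# Hypothetical sketch: none of these names come from Valkyrie itself.
module SaveAllFallback
  # Default behaviour: persist each resource individually and
  # return the saved copies in the same order.
  def save_all(models:)
    models.map { |model| save(model: model) }
  end
end

# Stand-in for an adapter with no efficient bulk write (e.g. AF).
class OneAtATimePersister
  include SaveAllFallback
  attr_reader :store

  def initialize
    @store = []
  end

  def save(model:)
    @store << model
    model
  end
end

# Stand-in for an adapter that can write many records in one call.
class BatchPersister
  attr_reader :store, :batch_calls

  def initialize
    @store = []
    @batch_calls = 0
  end

  # A single save is just a batch of one.
  def save(model:)
    save_all(models: [model]).first
  end

  # Represents one bulk write, such as a single Solr add or SQL statement.
  def save_all(models:)
    @batch_calls += 1
    @store.concat(models)
    models
  end
end
```

With this split, callers can always use `save_all`, and only the adapters where batching pays off need a custom implementation.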

named param in find_by_id(id: id) is redundantly redundant

If the method name is find_by_id, I don't think a named id parameter provides any extra clarity.

.find_by_id(id: id)

vs.

.find(id: id)

or

.find_by_id(id)

Would prefer either of the latter two forms.

Rename Fedora adapter to ActiveFedora Adapter?

I think it's more honest.

Formalize a `transaction buffer` pattern

I have no idea what the interface for this might be like, but the speed was dramatically different and it helped things a lot. A transaction buffer (as is possible now) looks something like this:

    memory_adapter = Valkyrie::Persistence::Memory::Adapter.new
    adapter = Valkyrie::AdapterContainer.new(
      persister: CompositePersister.new(Valkyrie::Adapter.find(:postgres).persister, memory_adapter.persister),
      query_service: Valkyrie::Adapter.find(:postgres).query_service
    )
    # Save a bunch of stuff via adapter.persister.save(model: book)
    #
    # Now that you have a bunch of saved objects in the database,
    # you can DRASTICALLY speed up solr indexing (18 mins -> a few seconds for 2600) by doing them all in one call
    Valkyrie::Adapter.find(:solr).persister.save_all(models: memory_adapter.query_service.find_all)
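
For anyone unfamiliar with the composite piece of this, here is a minimal, self-contained sketch of how a `CompositePersister` could work. It is an assumption about the behaviour, not the actual class: it saves through each persister in order and passes the result along, so the ID assigned by the first persister is reused by the buffer. `ToyPersister` is a hypothetical stand-in for the postgres and memory persisters, and plain hashes stand in for models.

```ruby
# Hypothetical sketch of the CompositePersister used above; the real class may differ.
class CompositePersister
  attr_reader :persisters

  def initialize(*persisters)
    @persisters = persisters
  end

  # Save through each persister in order, feeding each result to the next,
  # so an ID assigned by the first persister is reused by the rest.
  def save(model:)
    persisters.inject(model) { |saved, persister| persister.save(model: saved) }
  end
end

# Toy persister standing in for the postgres and memory persisters.
class ToyPersister
  attr_reader :saved

  def initialize
    @saved = {}
    @next_id = 0
  end

  # Assigns an ID only when the model does not already have one.
  def save(model:)
    model = model.merge(id: (@next_id += 1)) unless model[:id]
    @saved[model[:id]] = model
    model
  end
end

primary = ToyPersister.new
buffer = ToyPersister.new
composite = CompositePersister.new(primary, buffer)
book = composite.save(model: { title: "Book" })

# Everything saved during the run is now also in the buffer,
# ready to be handed to a bulk indexer in one call.
buffered = buffer.saved.values
```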

Storage adapter for Fedora

It's in the charter.

Identify supported data types.

It takes work in each persister to navigate back and forth between native ruby datatypes and the data-store. We need to document which data types we support.

Right now all that's supported is Internal IDs, language-tagged RDF Literals, and strings. Dates? Times? Integers? ::RDF::URIs?

We probably shouldn't be using test.com URIs

test.com is a real domain — we should probably use example.com instead and/or make it easier to configure which URIs are used.

indexing_persister configured adapter is a persister, not an adapter.

named parameters in persisters

Convert methods to use named parameters in persister classes.

Document how to use shared specs

Figure out how to talk about the lack of dirty tracking

I think our forms have dirty tracking, but our models don't (on purpose). We should find a way to document that.

Use storage adapter checksums for downloading files.

Isolate Valkyrie model code into `lib` for extraction into gem later.

Hyrax Adapter?

This would be an adapter which is proven to be able to interact with the way Hyrax stores data in Fedora/Solr. It will probably be difficult, and isn't actually part of the charter.

Support for GlobalIDs?

When we work on #53 we're going to need to store the user's identifier. Originally I thought that was going to be the username or email or whatever devise said was the primary key, but I realized it might be better to simply support GlobalIDs as a data type in Valkyrie.

That way you could just have GlobalID turn them into objects if you wanted that, and there'd be a difference between "tpend" and "gid://app/User/1"
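
To make the distinction concrete, here is a small sketch that tells a GlobalID-style string apart from a plain username using only Ruby's standard `uri` library. The helper name `global_id_string?` is hypothetical, and it only checks the `gid://app/Model/id` shape; actually resolving the value to an object would be the globalid gem's job.

```ruby
require "uri"

# Returns true when the value looks like a GlobalID URI (gid://app/Model/id).
# This only inspects the string; it does not use the globalid gem.
def global_id_string?(value)
  uri = URI.parse(value.to_s)
  segments = uri.path.to_s.split("/").reject(&:empty?)
  uri.scheme == "gid" && !uri.host.to_s.empty? && segments.size == 2
rescue URI::InvalidURIError
  false
end

global_id_string?("gid://app/User/1") # GlobalID-shaped
global_id_string?("tpend")            # plain string
```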

Implement configurable ID generation

For things like NOIDs. I think this is necessary - especially for migration.
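
As a rough idea of what "configurable" could mean here: the persister takes any callable that returns a new ID and only calls it when the resource has none, which is what lets a migration keep its existing IDs. `ConfigurableIdPersister`, the `id_generator:` keyword, and the `demo0001`-style IDs are all invented for this sketch; a real NOID minter would plug in where the lambda is.

```ruby
require "securerandom"

# Hypothetical persister: the id_generator keyword is an assumption,
# not an existing Valkyrie option.
class ConfigurableIdPersister
  def initialize(id_generator: -> { SecureRandom.uuid })
    @id_generator = id_generator
    @store = {}
  end

  # Mint an ID only when the model has none, so migrated
  # resources keep the identifiers they arrived with.
  def save(model:)
    model = model.merge(id: @id_generator.call) unless model[:id]
    @store[model[:id]] = model
  end
end

# Stand-in for a NOID minter: a simple zero-padded counter.
counter = 0
noid_like = -> { format("demo%04d", counter += 1) }

persister = ConfigurableIdPersister.new(id_generator: noid_like)
minted = persister.save(model: { title: "New book" })
migrated = persister.save(model: { id: "existing-id", title: "Migrated book" })
```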

Look into Shrine as a replacement for storage adapters

I haven't dug into it a lot, but it seems to have a lot of good and similar opinions to Valkyrie, with a lot more work put into it:

https://github.com/janko-m/shrine

rake tasks fail because tmp directory doesn't exist

In the readme it says to run rake server:development but this causes an error because the tmp directory is not part of the git clone.

    $ rake server:development
    Loading configuration from /Users/jcoyne/.solr_wrapper.yml
    Unable to copy /var/folders/9t/rygbnddx0b1ckw6tjs3m18qm0000gq/T/d20170706-12948-s9kond to tmp/blacklight-core: No such file or directory @ dir_s_mkdir - tmp/blacklight-core
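
One way this might be addressed (a sketch, not a tested patch against the repository) is to make sure the directory exists before the server task copies anything into it. The helper name `ensure_directory` is hypothetical; `FileUtils.mkdir_p` is standard library and is a no-op when the directory is already there.

```ruby
require "fileutils"
require "tmpdir"

# Creates the directory (and any missing parents) if needed;
# does nothing when it already exists.
def ensure_directory(path)
  FileUtils.mkdir_p(path)
  path
end

# Demonstrated against a throwaway directory rather than the real checkout.
Dir.mktmpdir do |root|
  tmp = ensure_directory(File.join(root, "tmp"))
  ensure_directory(tmp) # safe to call again
  puts Dir.exist?(tmp)
end
```

In the Rakefile itself, the same effect could probably be had by calling this at the start of the task, or by declaring the directory as a prerequisite with Rake's `directory` task.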

When migration from Hyrax to another adapter is possible, document how to do it.

Probably going to be something along the lines of

    fedora_adapter = Valkyrie::Adapter.find(:fedora)
    postgres_adapter = Valkyrie::Adapter.find(:postgres)
    book = fedora_adapter.query_service.find_by(id: "myid")
    book.id = nil
    new_book = postgres_adapter.persister.save(model: book)

Run characterization on files during upload.

Implement Collections

Nested resources?

The use case exists in Hyrax, and at least two institutions I know of use it (UCSB & CHF):

I have a record which has complex metadata as one of its properties, for example a date range where it's important that the beginning and the end of the range are stored together.

Possible implementation:

  it "can save nested resources" do
    book = resource_class.new(title: "Sub-nested")
    book2 = resource_class.new(title: "Nested", nested_resource: book)
    book3 = persister.save(model: resource_class.new(nested_resource: book2))

    reloaded = query_service.find_by(id: book3.id)
    expect(reloaded.nested_resource.first.title).to eq ["Nested"]
    expect(reloaded.nested_resource.first.nested_resource.first.title).to eq ["Sub-nested"]
  end

Now, the problem: Getting that test to pass with the postgres & memory adapters took about 20 LOC. Both natively support the concept of nesting and the abstractions are already written and debugged. However, for the other two adapters:

ActiveFedora: There's no interface for "here's a nested resource, build out the hash URIs and handle this for me please." I can't imagine how to write one, either. I could see this working out with something lower level, e.g., a Fedora persister which directly integrates with LDP, but I don't think that's an option ATM. Maybe the solution here is to reach out to the institutions that have implemented this and see what they've done, so we can at least have a compatibility layer.

Solr: There is no such thing as nesting. You can add "child documents", but they're indexed independently, require an ID, and don't have the same lifespan as their parents (https://issues.apache.org/jira/browse/SOLR-6096).

So I'm inclined to say we either:

Figure out how to get those two adapters to do nested objects.
Find a workaround - IE, recommend explicitly creating those nested objects as independent things and coming up with a good way for form objects to handle that.
Start pushing for the solutions which were easy (I don't think this is politically feasible)
Declare the experiment a failure because of the difficulty of abstraction of nested resource behavior.

Add support for `member_ids` having strings and not just local URIs

Supporting multiple types per property will be important for use cases such as controlled in-repository terms. Need some way to distinguish "3" as a remote ID from "3" the string.
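
A sketch of the distinction, assuming IDs get wrapped in a small value object so they never compare equal to a bare string. `ResourceID` here is a hypothetical class written for this example; whatever ID type Valkyrie ends up with may look different.

```ruby
# Hypothetical value object; Valkyrie's real ID class may look different.
class ResourceID
  attr_reader :id

  def initialize(id)
    @id = id.to_s
  end

  # Equal only to another ResourceID with the same value, never to a String.
  def ==(other)
    other.is_a?(ResourceID) && other.id == id
  end
  alias eql? ==

  # Keeps hash-based operations (uniq, Hash keys) consistent with ==.
  def hash
    [self.class, id].hash
  end

  def to_s
    id
  end
end

# "3" the ID and "3" the string can now live in the same property.
member_ids = [ResourceID.new("3"), "3"]
references = member_ids.grep(ResourceID)
literals = member_ids.grep(String)
```

Each adapter would still need its own way to serialize the two cases differently, which is the harder part of the problem.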

Identify MVP Use Cases

Which features of an example repository would, once implemented, show that this is a viable pattern for Hydra (and/or Hyrax)?

Ideas I'd like feedback on:

Collections
Access controls
File upload/download
Derivative generation
1. Storage of derivatives in any of the multiple backends.
Re-ordering of membered resources
Custom work types.
1. I don't think we need to write a generator, but a set of steps to add a new one which could be programmatic.
Load a pre-existing AF model from Hyrax into a Valkyrie model.
1. This would be the ideal migration strategy - in that there's no data migration.

Documentation standards?

The shared specs are a good step, but as we start to solidify some of the interfaces we should probably find a good way to add proper documentation.

Virtus gem not actively maintained

Piotr Solnica, the main dev behind Virtus, wrote this comment a while ago: https://www.reddit.com/r/ruby/comments/3sjb24/virtus_to_be_abandoned_by_its_creator/ and there hasn't been much activity on the gem recently.

This might not be an issue and Virtus might be stable enough for our needs, but we might have to eliminate Virtus at some point.

Determine Prototype Storage Adapters

Right now there are two: Disk & Memory.

I think we'll need at least three for a valid prototype:

Disk
Memory
Fedora

In the future I'd like to look at

AWS
Content-Addressable Storage (Whether this is a disk-offshoot that stores based on fixity, IPFS, or both, I dunno.)

Document memory adapters, link to them from other adapters.

File metadata on storage adapters

How do we store mime_type and filename? In Fedora these are stored with the binary.

Determine prototype adapters.

Right now there are four: Memory, Postgres, Fedora, and Solr.

I think I'd like to keep Memory/Postgres/Fedora working at least. If Solr doesn't, it's not the end of the world. If it does, it might make some interactions easier. There may be ongoing problems with supporting multiple data-types though.

schema_migrations_table_name is deprecated

DEPRECATION WARNING: schema_migrations_table_name is deprecated and will be removed from Rails 5.2 (called from block (2 levels) in <top (required)> at /Users/jcoyne/workspace/valkyrie/spec/support/database_cleaner.rb:4)
