nulib / donut
Digital Object Northwestern University Toolkit
Related to #53
Relates to #89
We probably shouldn't rely on Harvard for hosting this zip file since it stopped working for us last week.
Done looks like:
Fits zip file uploaded to an S3 bucket, and .ebextensions/01_packages.config updated to point to our hosted version.
Rake task should take a CSV and fire off the actual ingest for each row.
Ensure that every column that needs to be ingested maps to something in the Images resource for DONUT (note that not all columns need to be ingested; BE and BF, for example, do not).
Spawn tickets to expand resource as needed. The tickets can be done in a later phase.
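A minimal sketch of the per-row step such a rake task could wrap, assuming the spreadsheet has a header row. The column names, the IGNORED_COLUMNS list, and the helper names are hypothetical, and the block passed to ingest_csv stands in for whatever actually enqueues the ingest:

```ruby
require "csv"

# Hypothetical header names for columns that should not be ingested.
IGNORED_COLUMNS = %w[notes internal_id].freeze

# Turns CSV text into one attribute hash per row, dropping ignored and blank columns.
def rows_to_ingest(csv_text, ignored: IGNORED_COLUMNS)
  CSV.parse(csv_text, headers: true).map do |row|
    row.to_h.reject do |header, value|
      ignored.include?(header) || value.nil? || value.strip.empty?
    end
  end
end

# Calls the given block once per row; a real rake task would enqueue an ingest job here.
def ingest_csv(csv_text, &enqueue)
  rows_to_ingest(csv_text).each(&enqueue)
end

sample = <<~ROWS
  title,admin_set_id,notes
  Coffee,admin_set/default,skip me
  Library,admin_set/default,
ROWS

queued = []
ingest_csv(sample) { |attrs| queued << attrs }
```

Keeping the row parsing separate from the enqueue step means the column mapping can be tested without touching the queue.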
Here's a good example: http://donut.repo.rdc-staging.library.northwestern.edu/concern/images/cee2e75c-1d2e-4551-a46b-661878aa9b5d?locale=en#?c=0&m=0&s=0&cv=0&xywh=-783%2C-58%2C2588%2C1137
The images show up in the universal viewer, but there aren't any representative images showing up on the #show page.
I'm seeing this error in the logs:
I, [2018-01-23T18:23:09.387540 #26275] INFO -- : [239a4161-c3be-4b85-a51b-0e01a357da65] Started POST "/" for 127.0.0.1 at 2018-01-23 18:23:09 +0000
D, [2018-01-23T18:23:09.427695 #26275] DEBUG -- : [239a4161-c3be-4b85-a51b-0e01a357da65] Load LDP (21.5ms) http://fcrepo.repo.vpc.rdc-staging.library.northwestern.edu/rest/bb/52/0f/f8/bb520ff8-9d94-47db-9107-cd0b275b9ad0 Service: 47398585891020
D, [2018-01-23T18:23:09.493206 #26275] DEBUG -- : [239a4161-c3be-4b85-a51b-0e01a357da65] Hyrax::Operation Load (1.8ms) SELECT "curation_concerns_operations".* FROM "curation_concerns_operations" WHERE "curation_concerns_operations"."id" = $1 LIMIT $2 [["id", 167], ["LIMIT", 1]]
F, [2018-01-23T18:23:09.495636 #26275] FATAL -- : [239a4161-c3be-4b85-a51b-0e01a357da65]
F, [2018-01-23T18:23:09.496104 #26275] FATAL -- : [239a4161-c3be-4b85-a51b-0e01a357da65] ActiveRecord::RecordNotFound (Couldn't find Hyrax::Operation with 'id'=167):
F, [2018-01-23T18:23:09.496198 #26275] FATAL -- : [239a4161-c3be-4b85-a51b-0e01a357da65]
F, [2018-01-23T18:23:09.496325 #26275] FATAL -- : [239a4161-c3be-4b85-a51b-0e01a357da65] activerecord (5.1.4) lib/active_record/relation/finder_methods.rb:343:in `raise_record_not_found_exception!'
Which is weird, because I can pull up that record in the Rails console. Maybe it's a race condition or something?
Anyway, I'm looking into this now.
refs: https://github.com/nulib/next-generation-repository/issues/352
Per our workflow, a work needs to be in exactly one admin set, so our spreadsheet batch ingestion needs an admin set column that takes an admin set ID.
So we don't need code for using FileUtils on dev and test environments but the aws-sdk on the production environment (which also means the aws code is never tested by ci).
Right now we're going to start with minio:
Once CreateWorkJob has successfully ingested a resource, CreateWorkJob should enqueue a cleanup job for that masterfile. This could be done via hooks or by calling out to super for CreateWorkJob and then adding in desired code.
This job should delete the file from the pending bucket (#35)
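The ordering described above (clean up only after a successful ingest) could look roughly like this. This is a plain-Ruby sketch, not the actual job: PendingFileCleanup and ingest_then_cleanup are hypothetical names, the bucket name is made up, and FakeS3Client only mimics the delete_object(bucket:, key:) call from the aws-sdk S3 client so the example runs without AWS or minio:

```ruby
# Hypothetical stand-in for the cleanup job; in the app this would more likely
# be an ActiveJob subclass enqueued by CreateWorkJob.
class PendingFileCleanup
  def initialize(s3_client:, bucket:)
    @s3_client = s3_client
    @bucket = bucket
  end

  # Deletes the ingested file from the pending bucket.
  def call(key)
    @s3_client.delete_object(bucket: @bucket, key: key)
    key
  end
end

# Runs the ingest step first; cleanup is reached only if ingest did not raise.
def ingest_then_cleanup(key, ingest:, cleanup:)
  ingest.call(key)
  cleanup.call(key)
end

# Test double that records deletions instead of talking to S3.
class FakeS3Client
  attr_reader :deleted

  def initialize
    @deleted = []
  end

  def delete_object(bucket:, key:)
    @deleted << [bucket, key]
  end
end

client = FakeS3Client.new
cleanup = PendingFileCleanup.new(s3_client: client, bucket: "pending")
ingest_then_cleanup("coffee.jpg", ingest: ->(_key) { :ok }, cleanup: cleanup)
```

Because the client is injected, the same code path can run against minio locally and S3 in production.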
This is just good practice and we'll thank ourselves later.
After Carrick's-update-to-the-latest-hyrax branch passes and is merged I'll start fixing the warnings
Primarily for Berkeley at this point: given JSON, validate it and determine whether a resource can be created. If it cannot, write an error to the log.
For MVP required validations are:
Upgrade CE
For items in /bin , the execute bit isn't being set properly. Right now you'll have to ssh into the eb instances and change them manually, but we should get this fixed.
Bundle update our DONUT and see what happens
When an ingest manifest spreadsheet is added to the correct S3 bucket, trigger ingest via the queue.
We don't need this feature and it's extra overhead in the test suite. Just specify the model in the CSV and that will work.
The Image worktype should have the default derivatives created, verify this occurs when using our new ingest path
As per same fix in blacklight: projectblacklight/blacklight@b88b93a
When running our new import_from_s3 script, records are being imported and show up in donut but no file derivatives are showing up. We should see the coffee and library thumbnails but we're just getting the placeholder thumbnails instead.
My guess is that this has something to do with pulling the binaries from s3 to create derivatives and making sure we're hitting the remote_files part of the actor stack.
nul-ingest user
The user model is storing escaped email strings as ids, which seems to break things like deleting a user from a role. For example, first.last@northwestern is getting stored as the user key, but trying to delete the user from a role fails with a "user key not found" error looking for [email protected].
This might be solved by storing the netid as the user key in User.rb, by changing to:
    def to_s
      username
    end
not found errors
Read the CSV and parse each row as JSON, then pass the JSON to the validator.
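A rough sketch of that CSV-to-JSON-to-validator flow, with invalid rows written to the log. The required fields are assumptions: admin_set_id follows the admin set requirement mentioned elsewhere in these issues, and title is only a placeholder for whatever the MVP validations turn out to be. The method names are hypothetical as well:

```ruby
require "csv"
require "json"
require "logger"

# Assumed required fields; the real list of MVP validations is not known here.
REQUIRED_FIELDS = %w[title admin_set_id].freeze

# Returns a list of error messages; an empty list means the resource can be created.
def validation_errors(json)
  attrs = JSON.parse(json)
  return ["expected a JSON object"] unless attrs.is_a?(Hash)

  REQUIRED_FIELDS
    .select { |field| attrs[field].to_s.strip.empty? }
    .map { |field| "#{field} is required" }
rescue JSON::ParserError => e
  ["invalid JSON: #{e.message}"]
end

# Parses each CSV row as JSON, validates it, and logs rows that cannot be ingested.
# Returns the JSON strings of the rows that passed validation.
def valid_rows(csv_text, logger: Logger.new($stderr))
  CSV.parse(csv_text, headers: true).each_with_index.filter_map do |row, index|
    json = row.to_h.to_json
    errors = validation_errors(json)
    next json if errors.empty?

    logger.error("row #{index + 1}: #{errors.join('; ')}")
    nil
  end
end
```

Returning messages instead of raising keeps one bad row from stopping the rest of the spreadsheet.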
(Breakout of Issue https://github.com/nulib/next-generation-repository/issues/90)
-- for CREATOR ROLE
As a Collection Manager, I want to have authorities attached to certain fields and be able to grab them from a drop down menu (editing) so that I don't have to worry about editors putting in inconsistent information.
Here is an example of using the relator endpoint through our local questioning authority:
http://devbox.library.northwestern.edu/authorities/search/loc/relators?q=art
Right now we're overriding CreateWithRemoteFilesActor from Hyrax so we can exclude the area where it's encoding the URL one too many times, but it's also making rubocop upset.
Since it's not our file, we shouldn't really care if it's violating rubocop rules, and it should be excluded from its checks.
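To illustrate the "one too many times" problem, here is a small sketch of double encoding and one possible guard against it. This is not the Hyrax actor code; encode_once and KEY_PARSER are hypothetical names, and the approach assumes file keys never contain a literal percent sequence such as "%20" that should be preserved:

```ruby
require "uri"

KEY_PARSER = URI::RFC2396_Parser.new

# Percent-encodes a key exactly once, whether or not it arrives already encoded:
# decode first, then encode the decoded form.
def encode_once(key)
  KEY_PARSER.escape(KEY_PARSER.unescape(key))
end

raw = "coffee cup.jpg"
already_encoded = KEY_PARSER.escape(raw)              # "coffee%20cup.jpg"
double_encoded = KEY_PARSER.escape(already_encoded)   # "coffee%2520cup.jpg"
```

Encoding an already-encoded key escapes the "%" itself, which is why the doubly encoded URL no longer points at the original file.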
Hyku's importer code has associated rspec tests; we should get them running in donut to validate our work.
Right now donut requires minio running to mimic s3, a bucket to be created, and then that bucket to be populated to test out our import feature. It requires a few files to be created in the user's home directory, an environment variable or two to be set up, and the aws cli scripts have to be run manually every time minio goes up or down.
It's great that we can mimic s3 locally and help speed up dev without relying on outside sources, but it's become unwieldy, and missing any one of those steps means that your tests will fail or give you a false positive. I'm going to try to organize and automate this as much as possible.
When derivatives are created successfully, make it fail and follow the failure through the logs to verify we're logging. Besides #38, where an error is written if the metadata is invalid, ensure all the other edge cases write out errors, namely:
The redis-store dependency defined in Gemfile.lock has a known high severity security vulnerability in version range < 1.4.0 and should be updated.
This kicked in with Avalon; it may impact DONUT too, and the fix for DONUT might just be a Gemfile update.
avalonmediasystem/avalon#2702 shows the fix in Avalon
We're getting this error message when trying to create derivatives on AWS
Errno::ENOENT (No such file or directory @ rb_sysopen - /var/donut-temp/hyrax/uploaded_file/file/34/<filename>.jpg)
on our EB worker instance. The /var/donut-temp folder exists but there are no subfolders under it.
So the file from s3 isn't being copied over to a temp folder and no derivatives are being created. We need to figure out why and where it's happening
I was just testing our import from s3 job on AWS and the jobs are failing on fits:
E, [2018-01-18T17:50:40.060580 #26161] ERROR -- : [fee2029b-36e4-4401-abea-770862632455] [ActiveJob] [CharacterizeJob] [576bc9c2-67f1-4be4-aa0f-dc9a8b88ee92] Error performing CharacterizeJob (Job ID: 576bc9c2-67f1-4be4-aa0f-dc9a8b88ee92) from BetterActiveElasticJob(default) in 140.21ms: RuntimeError (Unable to execute command "/usr/local/fits-1.0.5/fits.sh -i "/tmp/d20180118-26161-5qzrty/coffee.jpg""
Picked up JAVA_TOOL_OPTIONS: -Xmx128m
Error: Could not find or load main class edu.harvard.hul.ois.fits.Fits
):
/opt/rubies/ruby-2.4.2/lib/ruby/gems/2.4.0/gems/hydra-file_characterization-0.3.3/lib/hydra/file_characterization/characterizer.rb:51:in `internal_call'
part of #53
We missed this one in the initial round...
(Breakout of Issue https://github.com/nulib/next-generation-repository/issues/90)
-- for CREATOR
As a Collection Manager, I want to have authorities attached to certain fields and be able to grab them from a drop down menu (editing) so that I don't have to worry about editors putting in inconsistent information.
Spreadsheet: https://docs.google.com/spreadsheets/d/1F35hLSD11a1mf9UTXvgAc7xKXkAixOaBwVvzYQulnkc/edit#gid=396400352
Since donut was started on hyrax before 1.0 was released (I think), there might be generated views, configs, controllers, etc. that were applicable at the time they were run, but have since been refactored away or are no longer needed.
Carrick and I were talking about starting fresh with Hyrax 2 and bringing over our customizations and configs from donut, but we think a more appropriate time to do that will be when Hyrax 3 is released, since that'll be valkyrie based and will be significantly different than hyrax 2 anyway.
So once Hyrax 3 is released and we're ready to transition Donut to it, we should start a new rails project, run all the updated generators, and then carefully bring over our customizations and configs and refactor where needed.
Determine what controlled vocab updates implemented locally would benefit or be appropriate for Hyrax core. At a minimum, see if Authority Select works with multiple items in Hyrax core, and fix that. Up for debate whether controlled vocab mixed with Authority Select is a generic enough use case for users outside of Northwestern.
We're getting a deprecation warning: DEPRECATION WARNING: Using a dynamic :action segment in a route is deprecated and will be removed in Rails 5.2. (called from block (2 levels) in <top (required)> at /home/travis/build/nulib/donut/config/routes.rb:15)
here: https://github.com/nulib/donut/blob/deploy/staging/config/routes.rb#L15
We should refactor this sooner rather than later, but Carrick and I weren't sure what the new syntax was and didn't want to spend all day on it. I'm putting in this issue as a reminder that we'll need to change this before rails 5.2 is released (which is kind of soon)
URL encoding is handled differently between Minio and S3. This is being handled by checking the Rails environment now, but that is not ideal.
refs: https://github.com/nulib/next-generation-repository/issues/415
We need to create two new properties for technical metadata to store the entire exif hash and the version of the exiftool.
Also document this so people can parse this hash later.
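One way to make the stored hash easy to parse later is to serialize it as JSON. This is only a sketch: the property names exif_all_data and exif_tool_version, the helper names, and the sample values are hypothetical, and the real properties would follow the project's existing metadata pattern:

```ruby
require "json"

# Packs the full exif hash and the exiftool version into two string properties.
def serialize_exif(exif_hash, tool_version)
  {
    "exif_all_data" => JSON.generate(exif_hash),
    "exif_tool_version" => tool_version
  }
end

# Recovers the exif hash from the stored properties; keys come back as strings.
def parse_exif(properties)
  JSON.parse(properties.fetch("exif_all_data"))
end

stored = serialize_exif({ ImageWidth: 2588, Make: "ExampleCam" }, "10.65")
exif = parse_exif(stored)
```

Symbol keys do not survive the round trip, which is worth stating in the documentation for anyone parsing the hash later.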
The specs from hyku run in donut successfully now, but not all of them are passing yet. We should get them all green (we may have to modify some of the specs because we aren't going to be using filesystem based ingestion)
From the spreadsheet in https://github.com/nulib/next-generation-repository/issues/352
Add these properties following the pattern in: #158
See the exif.rb model, fix the two todos for the ns ones
Batch uploads are creating derivative images locally, but not when we run them on staging. Look into why! We know this was working before; the import URLs were being double encoded in a way that was easy to fix, and we had it working using Minio in local dev environments.
refs: https://github.com/nulib/next-generation-repository/issues/352
Possibly a shared spec? Something to exercise the model and ensure the attributes are set and solrized.
There's always a lot of noise in the specs and this is a pretty easy fix