Comments (8)
@chrisdaaz Looking over the XML DTD (available at https://secure.etdadmin.com/dtds/etd.dtd), the element DISS_access_option doesn't enumerate its allowed values (unlike the embargo code, e.g. embargo_code (0 | 1 | 2 | 3 | 4) "0"):
<!--
This element contains the text of the selected access option.
For example "Open access", "Campus use only", etc.
-->
<!ELEMENT DISS_access_option (#PCDATA)>
Is it possible that only a limited number of values are actually used for DISS_access_option, and that we could use them to set visibility? If so, a mapping would help, i.e.
Open Access -> open
Campus use only -> authenticated
??? -> restricted
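One way to find out which values actually occur is to pull DISS_access_option out of each XML file and count the distinct values. This is a minimal sketch using REXML. The sample documents are hypothetical and heavily simplified (real ProQuest files nest the element deeper and carry far more metadata), and the helper names are made up for illustration.

```ruby
require "rexml/document"

# Returns the text of the first DISS_access_option element,
# or nil when the element is missing or empty.
def access_option(xml_string)
  doc = REXML::Document.new(xml_string)
  node = REXML::XPath.first(doc, "//DISS_access_option")
  text = node ? node.text.to_s.strip : ""
  text.empty? ? nil : text
end

# Counts how often each distinct value (including nil) occurs.
def tally_access_options(xml_strings)
  xml_strings.map { |xml| access_option(xml) }.tally
end

# Hypothetical, simplified sample documents.
samples = [
  "<DISS_submission><DISS_access_option>Open access</DISS_access_option></DISS_submission>",
  "<DISS_submission><DISS_access_option>Campus use only</DISS_access_option></DISS_submission>",
  "<DISS_submission><DISS_access_option>Campus use only</DISS_access_option></DISS_submission>",
  "<DISS_submission><DISS_access_option/></DISS_submission>"
]

counts = tally_access_options(samples)
counts.each { |value, count| puts "#{value.inspect}: #{count}" }
```

If the tally only ever shows a handful of values, a fixed mapping to visibility is probably safe.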
from arch.
Hey @chrisdaaz We have code written for the change for new dissertations.
We're trying to figure out how many existing dissertations this may have affected.
We don't have the source files; they lifecycle out. Can you download them? We think we may be able to do some fancy grepping to figure out what we're dealing with.
from arch.
I ran a report and found 771 "Campus use only" dissertations that are likely in Arch. Here's the report.
When you look at the report, the first column, ID, refers to a value we put into "Alternative Identifier". For example, a dissertation with an ID of 15484 would map to http://dissertations.umi.com/northwestern:15594 in the Arch record.
From the report, I can tell that there are two options available for DISS_access_option:
Open Access -> open
Campus use only -> authenticated
There are also blanks which would mean not applicable -- do nothing.
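The mapping above could be expressed as a small lookup. This is only a sketch: the constant and method names are hypothetical, and normalizing case and whitespace is an assumption on my part (the thread spells the value both "Open access" and "Open Access"). Blank or unknown values return nil, meaning "do nothing".

```ruby
# Maps normalized DISS_access_option text to a visibility value.
VISIBILITY_BY_ACCESS_OPTION = {
  "open access" => "open",
  "campus use only" => "authenticated"
}.freeze

# Returns the visibility for an access option, or nil when the
# option is blank or not recognized (leave the work untouched).
def visibility_for(access_option)
  VISIBILITY_BY_ACCESS_OPTION[access_option.to_s.strip.downcase]
end

["Open Access", "Campus use only", "", nil].each do |value|
  puts "#{value.inspect} -> #{visibility_for(value).inspect}"
end
```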
Another thing: all dissertations added before the batch ingest feature was available will not have that "Alternative Identifier", so the ID field in the report won't help us. Can we match by Title?
from arch.
Ok @chrisdaaz I wrote a script to generate a new CSV to determine if we can use titles to find all the dissertations. Here's the script I ran (for future reference if needed):
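# Run from the Rails console, where Aws, CSV, and GenericWork are available.
s3 = Aws::S3::Client.new
resp = s3.get_object(bucket: "stack-p-arch-dropbox", key: "titles_names.csv")
csv = CSV.parse(resp.body.read, headers: true, header_converters: :symbol, liberal_parsing: true)

# For each ProQuest row, look up a work by exact title, then check whether
# the student's last name appears in any of that work's creator names.
csv_string = CSV.generate do |new_csv|
  csv.each.with_index(1) do |row, index|
    gw = GenericWork.where(title: Array(row[:title])).first
    last_name = row[:student_last_name].to_s
    # A missing last name or a missing work counts as "no match".
    match = !last_name.empty? && (gw&.creator&.any? { |c| c.include?(last_name) } || false)
    new_csv << [index, gw&.id, row[:title], row[:student_last_name], match]
  end
end

# Upload the results next to the input file.
s3.put_object(acl: "authenticated-read", body: csv_string, bucket: "stack-p-arch-dropbox", key: "title_matches.csv")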
The output CSV is at s3://stack-p-arch-dropbox/title_matches.csv if you want to download it and take a look. If there is a title match, the second column contains the Arch ID for the dissertation. A blank means no match, which could be for a number of reasons, including funky character encodings; 45 in total didn't match the title query. The last boolean column checks whether the last name in the ProQuest spreadsheet is part of any of the creators' names in the record found by title (I hope that sentence is understandable).
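For anyone reviewing the output, here is a rough sketch of how the file could be summarized. It assumes the column order described above (row number, Arch ID, title, last name, match) and no header row. The sample rows, IDs, and the method name are invented for illustration.

```ruby
require "csv"

# Summarizes a title_matches-style CSV: which titles found no work,
# and which matched works failed the last-name check.
def summarize_matches(csv_text)
  rows = CSV.parse(csv_text)
  unmatched, matched = rows.partition { |r| r[1].to_s.strip.empty? }
  name_mismatch = matched.reject { |r| r[4] == "true" }
  {
    total: rows.size,
    unmatched_titles: unmatched.map { |r| r[2] },
    name_mismatch_ids: name_mismatch.map { |r| r[1] }
  }
end

# Hypothetical sample data, not taken from the real report.
sample = <<~CSV
  1,abc123,A Study of Things,Smith,true
  2,,Another Title,Jones,false
  3,def456,Third Title,Lee,false
CSV

summary = summarize_matches(sample)
puts summary.inspect
```

Rows in unmatched_titles would need a manual search, and rows in name_mismatch_ids are title matches worth double-checking before changing anything.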
from arch.
@bmquinn moving into in progress. Toss points on it at some point.
from arch.
@bmquinn wondering what your thoughts are about this idea: what if we applied authenticated access restrictions on filesets while keeping the works public?
authenticated works in Arch are not discoverable from Google, NUsearch, or Arch's browse/search features. They require the user to log in before they can find and access a work and its files. Users must somehow know a work exists in Arch before they can access it.
Users who can access works via NetID authentication currently have no way of discovering dissertations via Google or NUsearch. I wonder if the following scenario could be done programmatically:
- Find dissertations that have "Campus use only" values in their ProQuest XML metadata
- Change the visibility of those Works to Public
- Change the visibility of those Works' FileSets to Northwestern
This would signal to campus (via Google/NUsearch indexing) that Arch has dissertations that may be relevant to their research. When they visit the public Work record in Arch and attempt to download the dissertation PDF, they will be prompted to log in with NetID. Does this make sense?
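The three steps above could look roughly like this. To keep the sketch runnable outside Rails, it uses plain Struct stand-ins instead of real Hyrax works and file sets; the stub classes, constants, and method name are all hypothetical, and the "open" / "authenticated" strings follow the mapping earlier in this thread. Against real records, each object would also need to be saved.

```ruby
# Stand-ins for works and file sets so the sketch runs without Rails.
FileSetStub = Struct.new(:label, :visibility)
WorkStub = Struct.new(:title, :visibility, :file_sets)

PUBLIC_VISIBILITY = "open"
CAMPUS_VISIBILITY = "authenticated"

# Makes a "Campus use only" work public while restricting its files.
# Returns true when the work was changed, false when it was skipped.
def expose_work_restrict_files(work, access_option)
  return false unless access_option.to_s.strip.casecmp?("Campus use only")

  work.visibility = PUBLIC_VISIBILITY
  work.file_sets.each { |fs| fs.visibility = CAMPUS_VISIBILITY }
  true
end

campus_work = WorkStub.new("Example Dissertation", "authenticated",
                           [FileSetStub.new("Example_1234.pdf", "authenticated")])
open_work = WorkStub.new("Open Dissertation", "open",
                         [FileSetStub.new("Open_5678.pdf", "open")])

changed = expose_work_restrict_files(campus_work, "Campus use only")
skipped = expose_work_restrict_files(open_work, "Open access")
puts "campus work: #{campus_work.visibility}, files: #{campus_work.file_sets.map(&:visibility)}"
```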
As we discussed, you might not be able to find every dissertation in Arch via the script, so I can check on those remaining dissertations manually.
from arch.
Please add your planning poker estimate with ZenHub @bmquinn
from arch.
Hi @chrisdaaz I've been doing some dry-run testing of the script I've written to fix these, but I have a quick question before I hit "go". There are 17 works in the batch of 770 that have FileSets in addition to the PDF with the ProQuest id, e.g. XXXX_1234.pdf (a range of types including video, documents, images, etc.). Should I set the visibility on those the same as the "main" one or leave their visibility as-is? Thanks!
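For reference, here is a rough sketch of how the "main" PDF might be told apart from the extra files in a dry run. The filename pattern is an assumption based on the XXXX_1234.pdf example (any name ending in an underscore, digits, and .pdf), and the method name and sample filenames are made up.

```ruby
# Assumed pattern for the ProQuest PDF: ends in "_<digits>.pdf".
MAIN_PDF_PATTERN = /_\d+\.pdf\z/i

# Splits filenames into [main PDFs, supplementary files].
def split_file_sets(filenames)
  filenames.partition { |name| name.match?(MAIN_PDF_PATTERN) }
end

main, extras = split_file_sets(["Smith_1234.pdf", "interview.mp4", "appendix.docx"])
puts "main: #{main.inspect}"
puts "extras: #{extras.inspect}"
```

Whatever the answer on visibility, listing the extras per work first makes it easy to review those 17 works by hand.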
from arch.