"Campus use only" releases dissertation to the public about arch HOT 8 CLOSED

chrisdaaz commented on July 17, 2024

"Campus use only" releases dissertation to the public

from arch.

Comments (8)

bmquinn commented on July 17, 2024

@chrisdaaz Looking over the XML DTD (can be accessed at https://secure.etdadmin.com/dtds/etd.dtd) the element DISS_access_option doesn't specify the options (unlike embargo code, e.g. embargo_code (0 | 1 | 2 | 3 | 4) "0"):

<!--
  This element contains the text of the selected access option.
  For example "Open access", "Campus use only", etc.
-->
<!ELEMENT DISS_access_option (#PCDATA)>

Is it possible that there is only a limited number of actual values used for DISS_access_option that we could use to set visibility? If so, a mapping would help, i.e.

Open Access -> open
Campus use only -> authenticated
??? -> restricted
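
As a minimal sketch of the mapping idea above: because `DISS_access_option` is free text, the lookup normalizes whitespace and case first. The helper name `visibility_for` and the constant are hypothetical, and since it is not yet settled which value (if any) should map to `restricted`, unrecognized or blank values are returned as `nil` so the caller can decide what to do with them.

```ruby
# Hypothetical lookup table: free-text access option => visibility string.
# Only the two pairings proposed in this thread are included.
ACCESS_OPTION_VISIBILITY = {
  "open access" => "open",
  "campus use only" => "authenticated"
}.freeze

# Returns the visibility for a DISS_access_option value, or nil when the
# value is blank or not recognized (assumption: the caller handles nil).
def visibility_for(access_option)
  key = access_option.to_s.strip.downcase
  ACCESS_OPTION_VISIBILITY[key]
end

puts visibility_for("Campus use only").inspect
puts visibility_for("  Open Access ").inspect
puts visibility_for("").inspect
```

Returning `nil` rather than guessing a default keeps an unexpected value from silently changing a work's visibility.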

davidschober commented on July 17, 2024

Hey @chrisdaaz We have code written for the change for new dissertations.
We're trying to figure out how many of these it may have affected.

We don't have the source files; they lifecycle out. Can you download them? We think we may be able to do some fancy grepping to figure out what we're dealing with.

chrisdaaz commented on July 17, 2024

@bmquinn @davidschober

I ran a report and found 771 "Campus use only" dissertations that are likely in Arch. Here's the report.

When you look at the report, the first column ID refers to a value we put into "Alternative Identifier". For example, a dissertation with an ID of 15484 would map to http://dissertations.umi.com/northwestern:15594 in the Arch record.

From the report, I can tell that there are two options available for DISS_access_option:

Open Access -> open
Campus use only -> authenticated

There are also blanks which would mean not applicable -- do nothing.

Another thing: all dissertations added before the batch ingest feature was available will not have that "Alternative Identifier", so the ID field in the report won't help us. Can we match by Title?

bmquinn commented on July 17, 2024

Ok @chrisdaaz I wrote a script to generate a new CSV to determine if we can use titles to find all the dissertations. Here's the script I ran (for future reference if needed):

require "aws-sdk-s3"
require "csv"

# Run from a Rails console so that GenericWork is loaded.
s3 = Aws::S3::Client.new
resp = s3.get_object(bucket: "stack-p-arch-dropbox", key: "titles_names.csv")
csv = CSV.parse(resp.body.read, headers: true, header_converters: :symbol, liberal_parsing: true)

csv_string = CSV.generate do |new_csv|
  csv.each.with_index(1) do |row, index|
    # Find the work by exact title, then check its creators for the student's last name.
    gw = GenericWork.where(title: Array(row[:title])).first
    last_name = row[:student_last_name].to_s
    match = !gw.nil? && !last_name.empty? && gw.creator.any? { |c| c.include?(last_name) }
    new_csv << [index, gw&.id, row[:title], row[:student_last_name], match]
  end
end

# Upload the results to the same bucket as the input file.
s3.put_object(acl: "authenticated-read", body: csv_string, bucket: "stack-p-arch-dropbox", key: "title_matches.csv")

The output CSV is at s3://stack-p-arch-dropbox/title_matches.csv if you want to download it and take a look. If there is a title match, the second column contains the Arch ID for the dissertation. A blank means no match, which could be for a number of reasons, including funky character encodings; 45 in total didn't match the title query. The last column is a boolean that checks whether the last name in the ProQuest spreadsheet is part of any of the creators' names in the record found by title (I hope that sentence is understandable).
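
For anyone reviewing the output file, here is a rough sketch of how it could be summarized. It assumes the headerless column order described above (index, Arch ID, title, student last name, match). The sample rows are made up for illustration, and `summarize` is a hypothetical helper, not part of the script that was run.

```ruby
require "csv"

# Made-up sample rows in the assumed column order:
# index, Arch ID, title, student last name, last-name match.
SAMPLE = <<~ROWS
  1,abc123,A Study of Things,Smith,true
  2,,Another Title,Jones,false
  3,def456,Third Title,Lee,false
ROWS

# Splits rows into "no title match" (blank Arch ID) and "title matched but
# the last name was not found among the creators", reporting row indexes.
def summarize(csv_text)
  rows = CSV.parse(csv_text)
  unmatched, found = rows.partition { |row| row[1].to_s.empty? }
  name_mismatch = found.reject { |row| row[4] == "true" }
  {
    total: rows.size,
    no_title_match: unmatched.map { |row| row[0].to_i },
    name_mismatch: name_mismatch.map { |row| row[0].to_i }
  }
end

p summarize(SAMPLE)
```

Rows in either list are the ones worth checking by hand: a blank Arch ID means the title query found nothing, and a `false` in the last column means the title matched a record whose creators do not include the expected last name.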

davidschober commented on July 17, 2024

@bmquinn moving this into In Progress. Toss points on it at some point.

chrisdaaz commented on July 17, 2024

@bmquinn wondering what your thoughts are about this idea: what if we applied authenticated access restrictions on filesets while keeping the works public?

authenticated works in Arch are not discoverable from Google, NUsearch, or Arch's browse/search features. They require the user to log in before they can find and access a work and its files, so users must somehow already know a work exists in Arch before they can access it.

Users who can access works via NetID authentication currently have no way of discovering dissertations via Google or NUsearch. I wonder if the following scenario could be done programmatically:

Find dissertations that have "Campus use only" values in their ProQuest XML metadata
Change the visibility of those Works to Public
Change the visibility of those Works' Filesets to Northwestern

This would signal to campus (via Google/NUsearch indexing) that Arch has dissertations that may be relevant to their research. When they visit the public Work record in Arch and attempt to download the dissertation PDF, they will be prompted to log in with NetID. Does this make sense?
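
The scenario above could be sketched roughly as follows. This is not the actual Arch code: `WorkStub` and `FileSetStub` are stand-ins, on the assumption that the real objects expose a writable `visibility` attribute and a list of file sets, and the strings `"open"` and `"authenticated"` are taken from the mapping discussed earlier in this thread. The `dry_run` flag is also an assumption, added so the changes can be inspected before anything is modified.

```ruby
# Stand-ins for the real Work and FileSet models (assumed interface only).
FileSetStub = Struct.new(:label, :visibility)
WorkStub = Struct.new(:title, :visibility, :file_sets)

PUBLIC_VISIBILITY = "open"
CAMPUS_VISIBILITY = "authenticated"

# Makes the work public and its file sets campus-only. Returns a list of
# [name, old_visibility, new_visibility] entries; with dry_run: true the
# objects are left untouched.
def apply_campus_only(work, dry_run: true)
  changes = []
  changes << [work.title, work.visibility, PUBLIC_VISIBILITY] unless work.visibility == PUBLIC_VISIBILITY
  work.file_sets.each do |fs|
    changes << [fs.label, fs.visibility, CAMPUS_VISIBILITY] unless fs.visibility == CAMPUS_VISIBILITY
  end
  return changes if dry_run

  work.visibility = PUBLIC_VISIBILITY
  work.file_sets.each { |fs| fs.visibility = CAMPUS_VISIBILITY }
  changes
end

work = WorkStub.new("Example Dissertation", "authenticated",
                    [FileSetStub.new("example.pdf", "open")])
apply_campus_only(work).each { |change| puts change.join(" | ") }
```

In a real run the stubs would be replaced by the repository's own work and file set objects, each followed by whatever save call they require.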

As we discussed, you might not be able to find every dissertation in Arch via the script, so I can check on those remaining dissertations manually.

kdid commented on July 17, 2024

Please add your planning poker estimate with ZenHub @bmquinn

bmquinn commented on July 17, 2024

Hi @chrisdaaz I've been doing some dry-run testing of the script I've written to fix these, but I have a quick question before I hit "go". There are 17 works in the batch of 770 that have FileSets in addition to the PDF with the ProQuest id e.g. XXXX_1234.pdf (a range of types including video, documents, images, etc.). Should I set the visibility on those the same as the "main" one or leave their visibility as-is? Thanks!
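
Whichever way that question is answered, the "main" PDF first has to be told apart from the supplementary files. A minimal sketch, assuming the main file's name always ends in an underscore, digits, and `.pdf` (as in the `XXXX_1234.pdf` example); the pattern and the helper `split_file_sets` are hypothetical and would need checking against the real filenames.

```ruby
# Assumed pattern for the main ProQuest PDF: "..._<digits>.pdf".
MAIN_PDF_PATTERN = /_\d+\.pdf\z/i

# Splits a list of file labels into the main PDF(s) and everything else.
def split_file_sets(labels)
  main, supplementary = labels.partition { |label| label.match?(MAIN_PDF_PATTERN) }
  { main: main, supplementary: supplementary }
end

labels = ["XXXX_1234.pdf", "interview.mp4", "appendix.docx", "figure1.png"]
groups = split_file_sets(labels)
puts "main: #{groups[:main].join(', ')}"
puts "supplementary: #{groups[:supplementary].join(', ')}"
```

Keeping the two groups separate means the supplementary files can either get the same visibility as the main PDF or be left as-is without changing the rest of the script.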
