catalogue-data's Introduction

IDN Catalogue Data

This repository contains part of the data of the Indigenous Data Network (IDN)'s Knowledge Graph, which is delivered online via the Prez system as a series of catalogues and reference datasets, such as spatial data collections and vocabularies.

The IDN Prez system is online at:

IDN Catalogues and Datasets

The IDN is producing multiple systems and datasets:

  1. Demonstration Catalogue of Australian datasets

    • with varying levels of indigenous relevance, to demonstrate several aspects of indigenous data governance and sovereignty, and how to rate the "indigenous-ness" of data in the first place.

  2. Agents Database

    • containing information about Agents - People and Organisations - that have some relation to indigenous data

  3. University of Melbourne’s Indigenous Data Catalogue

    • this is currently (May, 2023) empty but will fill shortly

  4. Register of vocabularies

    • multiple vocabularies, all assembled and some created by the IDN, that support modelling indigenous data

  5. Indigenous spatial reference data

    • indigenous language, land use, treaty and other areas

    • all from other sources, attributed in the data

Additionally, the IDN will support a catalogue of ANU's indigenous data, under development by ANU’s First Nations Portfolio, that is not online yet.

This repository contains only some of those systems’ data; see the next section.

This repository’s content

This repository contains:

  • Demonstration Catalogue items' metadata

    • metadata entries for the catalogued resources in data/democat/

  • the vocabularies within the IDN’s Register of vocabularies

    • within data/vocabularies/

  • background ontologies used to provide labelling for Prez' data

    • within data/_background/

  • IDN Prez system metadata

    • within data/system/

    • defines things like the multiple IDN catalogues, system labels etc.

Also:

  • data/unpublished/ contains data that was previously published and then removed, but not deleted, as it may be used again

Stored elsewhere are:

  • Agents Database content

    • some test data is stored here, but the Agents DB is building and storing its own data

    • see the AgentsDB data repository

  • Indigenous spatial reference data

    • some of these datasets are large so their raw content isn’t directly available

    • see the repo https://github.com/idn-au/spatial-data for a listing of the datasets and instructions on how they are produced

(Meta)Data Models

The metadata of items in the Demonstration Catalogue and all other catalogues based on IDN work - the UoM IDCat and the ANU’s FNP’s future catalogue - use the IDN Catalogue Profile which is a data cataloguing standard based on DCAT.

Agents data in the Agents Database are formulated according to the Agents Governance Profile.

License & Rights

The contents of this repository are licensed under Creative Commons 4.0 International. See the LICENSE file in the repository for details.

Contact

For technical enquiries:

Jamie Feiss
Data Infrastructure Developer
Indigenous Data Network
University of Melbourne
[email protected]

For policy:

Levi Murray
Strategic Data Manager
Indigenous Data Network
University of Melbourne
[email protected]

Owner Organisation
Indigenous Data Network
https://idnau.org

catalogue-data's People

Contributors

nicholascar, jamiefeiss, ssilcot, metaduck, recalcitrantsupplant, lalewis1

catalogue-data's Issues

Create Dummy data for RIMPA presentation

To demonstrate the use case:

Cassey is a Wurundjeri woman tasked by her community with searching the collections at the Australian National University
for data about them and ensuring that the data is held appropriately.
As a representative of a community that is a stakeholder in some of the data in the system, Cassey wants to be able to
discover data about/from her community, regardless of its particular home location and then to be able to inspect the access
policies applied to it. She then wants to be able to verify that the policies, as stated, are implemented.

  • Create the data
  • Record a demonstration of the use case scenario with the dummy data.

convert aries data < 2015

Data received from Adegboyega 26/04/2024.

To be cleaned and converted to RDF in line with the previously converted aries dump of data from 2015 onwards

push updated prez image

release v3.8.11 contains fixes from recent PRs.

Test locally with PrezUI v3.8.2
Push to Dev AWS / Terraform
Push to Prod MRC

Add Metadata for ARIES/Thesis/OAIPMH data.

Need to define datasets and vocabularies for all the data that I have extracted from ANU.

Clean it up and add all necessary metadata for proper rendering in PrezUI, as we will soon need to upload it all to PrezUI hosted at NCI.

Updating the indigenous-persons-organisation definition

Hi Nick - I went to create a pull-request but then realised I didn't know how to in this repo so I will submit the extended definition as an issue instead. I have updated the definition to include the Office of the Registrar of Indigenous Corporations (ORIC) indigeneity requirement. I have also added a historyNote to show where the definition comes from and the hyperlink to the policy document. I am not sure if the code is correct but I thought I would have a go. - Margie

# IRIs are wrapped in angle brackets and the final statement ends with a full stop.
:indigenous-persons-organisation
    a skos:Concept ;
    dcterms:provenance "Created for the IDN project, 2022"@en ;
    rdfs:isDefinedBy cs: ;
    skos:definition "The organisation comprises indigenous persons that meet the Office of the Registrar of Indigenous Corporations (ORIC) indigeneity requirement."@en ;
    skos:historyNote "Office of the Registrar of Indigenous Corporations, Policy Statement 11."@en ;
    rdfs:seeAlso <https://www.oric.gov.au/sites/default/files/documents/01_2022/PS-11_Indigeneity-requirement_v7-0.pdf> ;
    skos:inScheme cs: ;
    skos:prefLabel "Indigenous Persons Organisation"@en .

Migration of scripts from K-AI to ANU

30 Apr - Lawson to ask Jamie if the migration of the ANU prez dev to NCI is something he needs to get info about

26-Mar Boyega to bring Jamie up to speed of any work required

Briscoe-Smith Metadata

Communicate with Len and Sandra to arrive at proper metadata for the briscoe-smith archive resource.

i.e. the resource that represents the archive itself.

Use the metadata entry tool to create the RDF and gather the required details.

There may be some conflicts with the information presented in the METADATA table in the HDMS database, to be clarified with Len and Sandra.

Pointing to data and/or to landing/provenance description pages -- AUSLANG example -- for consideration

Thinking out loud.... purpose -- to be very clear and distinct about what we are showing in the catalogue.

As at 2022-08-30 1.26pm: exploring http://idn.kurrawong.net/catalog/idndc/AUSLANG

Followed the "Access address" link which resolves to a download of the actual dataset.

Checking under the hood (curl -H -v) the target url is https://collection.aiatsis.gov.au/datasets/austlang/001.csv

The "ex:home" page is https://collection.aiatsis.gov.au/datasets/austlang/001 which is the provider's landing page, providing a data dictionary, a link to a live online search service and a link to the download csv format above.

There is also this: https://collection.aiatsis.gov.au/austlang/about which the data.gov.au LANDING page falsely describes as the "complete Austlang resource" -- it is not, but it is AIATSIS's complete DESCRIPTION of the context, meaning and provenance of the dataset! In fact, one could make a strong argument that unless this is read, one really doesn't understand at all what one is looking at.

There is also data.gov.au's activity list, showing recent changes: https://data.gov.au/data/dataset/activity/austlang-dataset-001

Starting to think we might need a small vocab to encode a range of relationships between the thing our catalog entry is describing and an associated resolvable uri!

e.g.

DESCRIPTIVE relationship: Contextual information (descriptive metadata, provenance) ABOUT the cataloged "dataset":

DERIVATIVE or INSTANCE relationship: An accessible distribution of the catalogued "dataset":

  • a computer file with a particular format (in this case csv)
  • a service end point

I can even imagine a "USAGE" or "APPLICATION" type of relationship -- could point at resources in which dataset featured, or educational/capability resources in how to use it.

Open to discussion but in this PARTICULAR case (a really core "reference" dataset), I think the catalogue presentation would benefit by having multiple "Access address" entries with a clear typing of the nature of the association. Or perhaps even a way for the user to choose their "focus" ("I know what I want... just point me at the data/service" versus "what the hell is this all about, how can I use it?").

Sorry to ramble but devil in the detail here!

Plus issues with scalability... we perhaps need to be thinking about patterns in what "portals" like Trove or data.gov.au or RDA are doing? But that's another story.

Briscoe-Smith DataModel

Create a first draft data model (lucid chart diagram)

Use the RiC-O Ontology to model the data and ensure encapsulation of all primary data fields from:

INVENTORY
SERIES
ACCESSION
PROVENANCE

convert languages to objects

Currently they are just comma-delimited string literals. Terhi would like to see them as objects so that analysis is easier.
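A minimal sketch of one possible approach, using only the Python standard library: split each comma-delimited literal and mint one IRI per language, so each language becomes a node that can be counted or joined on. The base IRI `https://example.org/language/`, the slug rule and the function names are assumptions for illustration, not taken from this repository.

```python
import re

# Hypothetical base IRI for minted language objects (an assumption, not the IDN's).
LANGUAGE_BASE = "https://example.org/language/"


def slugify(label: str) -> str:
    """Lower-case a label and replace runs of non-alphanumerics with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", label.strip().lower()).strip("-")


def languages_to_objects(literal: str) -> list[dict[str, str]]:
    """Split a comma-delimited literal into one {iri, label} object per language."""
    objects = []
    seen = set()
    for part in literal.split(","):
        label = part.strip()
        if not label:
            continue  # skip empty fragments such as a trailing comma
        iri = LANGUAGE_BASE + slugify(label)
        if iri in seen:
            continue  # drop duplicates within the same literal
        seen.add(iri)
        objects.append({"iri": iri, "label": label})
    return objects
```

The slug rule here only keeps ASCII letters and digits, so labels containing other characters would need a different rule. A curated vocabulary of languages would also be a better source of IRIs than minting them from labels.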

Digitize maps from Len

16-Apr Have talked with Liam and he is half way through
09-Apr Need to follow up with Liam to see how he is going with the work

05-Mar Check in with Liam to see if this has started

20-Feb Liam is currently handling this work

Work with Liam to digitize the maps from Len. Once we have them as shape files, we can convert to RDF

Review IDN Prez config

System:

Vocabs:

Spatial:

  • WKT data for NNTT dataset
  • Feature labels for NSW data
  • Search returns no results

Label fix steps:

Create a governance testing KG

Create a small RDF KG that lists dummy datasets, people, organisations and policies, and allows demo querying of it to discover both good and poor governance arrangements.

Flag ANU data as indigenous using a variety of techniques.

02-Apr Work is ongoing from this point

26-Mar There is now a formal process in place that makes it easier to flag things. Some techniques (English words) have been surprisingly useful :) - flagging matches where words are not English (reverse dictionary). Outcome - a list of Indigenous Publications.

Will next search for landmarks/features/place names - particularly old names, e.g. 'Uluru' was 'Ayers Rock'.

Need to expand the reference set of data used to flag works as indigenous, using the data sources provided by Adegboyega.

  • Refactor RDF outputs to use sosa:Observations
  • add flagging techniques
    • english words
    • indigenous language names
    • indigenous place names
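The "reverse dictionary" technique noted above could be sketched roughly as follows: tokenise a title and keep the words that do not appear in an English word list. The tiny `ENGLISH_WORDS` set and the function names are placeholders assumed for illustration. A real run would load a full dictionary and then compare the flagged words against reference sets of language and place names.

```python
import re

# Placeholder English word list; a real run would load a full dictionary.
ENGLISH_WORDS = {"a", "the", "of", "and", "in", "study", "language", "near", "rock"}


def tokenize(text: str) -> list[str]:
    """Return lower-cased alphabetic tokens from a title or abstract."""
    return re.findall(r"[a-z]+", text.lower())


def flag_non_english(text: str, dictionary: set[str] = ENGLISH_WORDS) -> list[str]:
    """Return tokens absent from the dictionary, in order, without duplicates."""
    flagged = []
    for token in tokenize(text):
        if token not in dictionary and token not in flagged:
            flagged.append(token)
    return flagged
```

Words flagged this way are only candidates: proper nouns, typos and technical terms will also fall outside an English dictionary, so a second check against known language or place names is still needed.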

Where formal data dictionaries are publicly available, point at them directly from the catalog?

AGIL is a great dataset to start with. I note it has a formalised (but not machine readable) data dictionary here: https://data.gov.au/data/storage/f/2013-12-02T03:02:16.736Z/agildataset-management-summary-2013-11.pdf which is dated 2013! Those contents really help a potential data user understand in depth what one might be able to do with that dataset.

Eventually I would like to see the catalog capable of pointing directly at such a DD, if one exists, as a kind of "in depth" descriptive resource, and to show the date of that resource (could that be added to our profile as an optional item?).

Risks with broken links using URLs of course (but we can detect that), so perhaps consider mining a copy of those sorts of really core files (there won't be many)?

Contribute ANU researchers to Agents DB

12-Dec-23 Nick will attempt to include this in the workshop tomorrow

Nick will run through the process this afternoon (5/12/23)

Second pass underway

Filter the ARIES list of ANU people for indigenous researchers

Briscoe Smith Archive POC

28 May - work has started based on the RICO ontology [this is also what the UniMelb are going to be using]

Begin work for ANU on conversion of the Briscoe-Smith archive.

As discussed with Leonard Smith and Sandra Silcot from ANU, there is a piece of work to move the Briscoe-Smith archive from an Access Database to RDF and store it with the rest of the ANU Catalogue using Prez on the NCI infrastructure.

This piece of work can be broken down into three parts.

  1. Metadata modelling
    Create a suitable ontological model to support the archive and its needs. Collaboration between KurrawongAI and ANU will be needed to arrive at the destination here.

  2. Conversion
    Once an ontological model has been established, convert the data from RDB to RDF in line with the model. Some parsing of unstructured fields may be required to achieve alignment with the desired model.

  3. Metadata Entry form tooling
    The Archive will require continued additions from not-yet catalogued items. There are a number of possible solutions available to support this.

    1. An adapted version of the Metadata entry tool from the IDN project.
    2. VocExcel templates
    3. custom built data entry/management portal.

Initially, I (LL) will try to convert a sample of the archive to RDF; a rough metamodel draft can be agreed upon, with refinement to happen later. The idea for this first pass is just to get some data converted and visible in the new system (Prez) so that Len and Sandra can get a feel for the process and how it might play out.
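As a rough illustration of the conversion step, the sketch below reads rows from a relational table and writes them out as N-Triples. Everything specific in it is an assumption: an in-memory SQLite table stands in for the Access database, the `inventory` table has made-up `id` and `title` columns, and `https://example.org/briscoe-smith/item/` is a placeholder namespace. The RiC-O class and property used (`Record`, `title`) should be checked against the ontology and replaced with whatever the agreed model specifies.

```python
import sqlite3

# Hypothetical namespaces and column names; the agreed model would replace them.
ITEM_BASE = "https://example.org/briscoe-smith/item/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RICO = "https://www.ica.org/standards/RiC/ontology#"


def escape_literal(value: str) -> str:
    """Escape characters that N-Triples does not allow raw inside a literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(conn: sqlite3.Connection) -> list[str]:
    """Turn each row of the (hypothetical) inventory table into N-Triples lines."""
    triples = []
    for item_id, title in conn.execute("SELECT id, title FROM inventory ORDER BY id"):
        subject = f"<{ITEM_BASE}{item_id}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{RICO}Record> .")
        triples.append(f'{subject} <{RICO}title> "{escape_literal(title)}" .')
    return triples


# Build a tiny in-memory table standing in for the source database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO inventory VALUES (1, 'Field notes \"A\"')")
ntriples = rows_to_ntriples(conn)
```

Writing plain N-Triples keeps the first pass free of extra dependencies. Parsing of unstructured fields, as mentioned in the Conversion step, would happen before the triples are emitted.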

Begin ISU data enquiry in preparation for a catalogue

Review Data Maturity Model and then:
Request five datasets from ISU to categorise in spreadsheet to prototype cataloguing
Work with ISU on spreadsheet if required
Determine what ISU information is required to align with UoM catalogue refresh

How best to include DOIs/PIDs to publications derived from a dataset

These publications were derived from research using the KHRD database:

SS to confirm it is appropriate to add this to the catalog entry as per DCAT2 spec (https://www.w3.org/TR/vocab-dcat-2/#examples-dataset-publication):

dct:isReferencedBy <https://doi.org/10.4225/03/5a9779c80f529>;
dct:isReferencedBy <http://doi.org/10.1007/s12546-020-09253-x>;

Check that CatPrez will appropriately render this relationship (does not need to resolve/fetch anything).

Improve the next AIATSIS dataset

30 Apr - Need to follow up again as we are running out of FY

16-Apr Follow up with Anthony again.

Following on from the meeting will be suggestions to AIATSIS re AusLang (they need to adopt PIDs) - Thesaurus

Metadata schema for Yumi Sabe could be helpful to begin an IDN - AIATSIS task for discussion.
Could also consider having AIATSIS being the sponsor for the IDC metadata profile.

20-Feb Waiting on feedback from Anthony

Feedback on the PlaceNames gazetteer still pending - potentially ask them what they would like to do next?

Other databases:
https://aiatsis.gov.au/research/guides-and-resources/native-title-resources/native-title-law-database

https://aiatsis.gov.au/third-national-indigenous-languages-survey-online/language-status-map-and-graph-data

https://collection.aiatsis.gov.au/austlang/about and then on data.gov.au https://data.gov.au/data/dataset/austlang-dataset-001

LL 08/01/24
Heard Back from Anthony just before Christmas but no meeting scheduled yet. Hopefully hear from him again this week.

Follow up with NCI regarding the catalogue space

4-Jun Information received and will now seek advice on the next steps. The records in the new PURE extract are around 135 000 vs 450 000 from the first extract out of ARIES.

28-May Gboyega to really follow up today

21-May Gboyega will reach out today (post Lawson's engagement with NCI) to begin planning next steps.

14-May Robert having issues with CORS requests. Lawson and Jamie to debug today. will check in with Rob to see if anything else is needed after this issue is resolved.

07-May Robert and Lawson are communicating across this. The assumption is that the PoC is up and running. Next step is to migrate from the Kurrawong infrastructure to ANU.
Lawson will chase up with an email to confirm

30 Apr - Need to follow up with Rob again. Lawson has been keeping them up to speed on the technical questions. Lawson thinks Rob has it up and running.

harvest new collections from oaipmh

As per email from Boyega.

All Metadata in ANU Open Research is open access and harvestable. This is the OAI-PMH details https://openresearch-repository.anu.edu.au/oai/request?verb=Identify that allow you to harvest the metadata.

The Full ANU Research collection contains the following, plus many more collections that may contain some relevant materials:

[NARU](https://openresearch-repository.anu.edu.au/handle/1885/9187) – This is still in progress as we digitise more ANU and NARU publications over the coming year or two
[CAEPR](https://openresearch-repository.anu.edu.au/handle/1885/114085) – This is deposited into directly from CAEPR, so should be fairly up to date with their working papers/reports etc
[NCIS](https://openresearch-repository.anu.edu.au/handle/1885/9491)
[ANU First Nations Portfolio](https://openresearch-repository.anu.edu.au/handle/1885/272442)
[ANU Research publications](https://openresearch-repository.anu.edu.au/handle/1885/26) - will have the most overlap with ARIES as we have a feed from ARIES to this collection.

The Archive and Library Collections section will also have indigenous materials that may be of interest, including photographs from researchers and the ANU Photography department, ANU Annual Reports and AV material:

[ANU Publications](https://openresearch-repository.anu.edu.au/handle/1885/238435) – digitised ANU publications from the library collection, may also contain NARU material and other relevant historical publications
[ANU Publications: Flood Replacements](https://openresearch-repository.anu.edu.au/handle/1885/207875) - digitised ANU publications from the library collection that were lost during the Chifley floods.
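A harvest against the OAI-PMH endpoint above could be structured along these lines. This is a sketch only: the base URL is taken from the Identify link in this issue, `oai_dc` is the metadata format every OAI-PMH repository must support, and the function names are made up for illustration. No network call is made here. The code only builds request URLs and parses a response page, so fetching, retries and the choice of sets are left to the caller.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Endpoint taken from the Identify URL quoted in this issue.
BASE_URL = "https://openresearch-repository.anu.edu.au/oai/request"
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(metadata_prefix="oai_dc", set_spec=None, resumption_token=None):
    """Build a ListRecords URL; a resumption token must be sent on its own."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{BASE_URL}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None) for one response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token
```

A harvesting loop would request `list_records_url(...)`, pass the response body to `parse_page`, and repeat with the returned token until it is `None`. The set identifiers for the collections listed above would need to be looked up with the `ListSets` verb rather than guessed from the handle URLs.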

Troubleshoot performance issues with Fuseki

NCI Fuseki is using lots of disk space and taking ~20 min to start up. There may be some configuration options that are degrading performance.

Perhaps the uid field on the text index should be set, so that dropped triples can be removed from the index?

Ensure usage of example queries

A few example queries that it would be nice to ensure are working properly.

counts of languages and place names
get a list of names of supervisors ranked by total number of publications from people who they supervised.
summary of keywords for publications flagged as language words / place names / unrecognized words
