For each archive, we need a standard way to record some metadata with the archive. At …


Archive metadata and licensing --> js discussion about archives (closed, 16 comments)

ipfs-inactive commented on August 18, 2024

Archive metadata and licensing --> js discussion


Comments (16)

davidar commented on August 18, 2024

👍

However, instead of inventing our own format, ideally we could use an existing standard. For example:


davidar commented on August 18, 2024

I think we should separate metadata into two categories:

1. machine-readable only, such as timestamps and hashes, which human end-users aren't likely to care about
2. both machine- and human-readable, such as descriptions and licenses

For (1) I'm perfectly happy to just dump a (hidden) .metadata.json in the root directory, with whatever format is used by the tool used to update the archive.
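To make category (1) concrete, here is a minimal sketch of what an update tool might write to that hidden file, assuming it records a SHA-256 per file, an update timestamp and its own name. Only the `.metadata.json` file name comes from the proposal; the schema (`tool`, `updated`, `files`) and the function names are hypothetical.

```python
import hashlib
import json
import time
from pathlib import Path

METADATA_NAME = ".metadata.json"  # file name from the proposal; schema below is assumed


def sha256_of(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large archive files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_metadata(root: Path, tool: str) -> dict:
    """Write per-file hashes, the update time and the tool name to root/.metadata.json."""
    files = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.name != METADATA_NAME
    }
    meta = {
        "tool": tool,  # which sync/update tool produced this snapshot
        "updated": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "files": files,
    }
    (root / METADATA_NAME).write_text(json.dumps(meta, indent=2, sort_keys=True))
    return meta
```

Because the file is only ever read by the tool that wrote it, its exact format can stay private to that tool, which is the point of keeping it separate from README and LICENSE.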

For the second, I think we should use either (or both):

- human-readable HTML + machine-readable tags
- human-readable Markdown (or similar) + machine-readable YAML header (like Jekyll uses)

under the conventional README and LICENSE filenames. Personally I'm in favour of Markdown+YAML, and we can include a copy of the Markdown viewer webapp.

To answer your other questions:

Should the metadata include maintainer information?

Yes, I'd say to include this in the license: e.g. "Original source blah, processed and uploaded to IPFS by blah"

Should the metadata include the script/tool that was used to sync/update the archive? It might be useful if the current maintainer goes away.

Definitely, I think it's even been suggested to put a copy of the tool within the archive itself. IPFS de-duplication means this has no more overhead than a link.
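The de-duplication argument can be illustrated with a toy content-addressed store, where each blob is keyed by the hash of its bytes, so copying the same tool into an archive adds no new data. This is a deliberate simplification: real IPFS splits files into chunks and addresses them with CIDs in a Merkle DAG, none of which is modelled here, and `ToyBlockStore` is a hypothetical name.

```python
import hashlib
from typing import Dict


class ToyBlockStore:
    """Toy content-addressed store: each blob is keyed by the SHA-256 of its bytes.

    Models only the idea behind IPFS de-duplication; real IPFS chunks files
    and uses CIDs in a Merkle DAG, none of which is represented here.
    """

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical bytes always produce the same key, so a second copy is a no-op.
        self.blocks.setdefault(key, data)
        return key

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self.blocks.values())


# Usage: the same (hypothetical) sync script, published on its own and
# copied into an archive, occupies storage only once.
store = ToyBlockStore()
script = b"#!/bin/sh\n# hypothetical sync script\n"
standalone_key = store.put(script)
archived_key = store.put(script)
assert standalone_key == archived_key
assert store.stored_bytes() == len(script)
```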


eminence commented on August 18, 2024

What would be the purpose of including hashes? IPFS itself will ensure data integrity.

Something like Markdown or YAML sounds fine. I'd rather not use HTML, because HTML is not very friendly if you don't have a web browser to render it.


davidar commented on August 18, 2024

What would be the purpose of including hashes? IPFS itself will ensure data integrity.

Some protocols (like rsync) support checking the hash of a remote file to see if it has changed. I'm basically talking about any metadata that the update tool can use internally to make its job easier.
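A minimal sketch of that kind of bookkeeping, assuming the tool kept per-file hashes from its last sync and can obtain current hashes from the source (rsync's `--checksum` option compares checksums in a similar spirit). The function name and the dict-of-hashes shape are hypothetical.

```python
from typing import Dict, List


def changed_files(recorded: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Diff the per-file hashes saved at the last sync against the source's current hashes."""
    added = sorted(set(current) - set(recorded))
    removed = sorted(set(recorded) - set(current))
    modified = sorted(p for p in set(recorded) & set(current) if recorded[p] != current[p])
    return {"added": added, "removed": removed, "modified": modified}


# "b.pdf" changed upstream, "c.pdf" disappeared and "d.pdf" is new,
# so only those three files need attention on this run.
last_sync = {"a.pdf": "h1", "b.pdf": "h2", "c.pdf": "h3"}
upstream = {"a.pdf": "h1", "b.pdf": "h2-new", "d.pdf": "h4"}
print(changed_files(last_sync, upstream))
# {'added': ['d.pdf'], 'removed': ['c.pdf'], 'modified': ['b.pdf']}
```

IPFS would still guarantee the integrity of whatever is stored; the hashes here only save the tool from re-fetching unchanged files.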

Something like Markdown or YAML sounds fine. I'd rather not use HTML, because HTML is not very friendly if you don't have a web browser to render it.

Agreed. Specifically I'm proposing something like:

README.md:

```markdown
---
title: arXiv
source: http://arxiv.org/
authors:
  - arXiv contributors
  - IPFS archivists
updated: 2015-03-14
---
This is a mirror of the [Creative Commons](http://creativecommons.org)
subset of [arXiv](http://arxiv.org).

Yada yada
```

LICENSE.md:

```markdown
---
license: http://creativecommons.org/licenses/by-sa/3.0/
title: CC-BY-SA-3.0
morePermissions: blah
attributionURL:
  - http://arxiv.org
  - http://ipfs.io
attributionName: arXiv, IPFS
---
You are free to:

    Share — copy and redistribute the material in any medium or format
    Adapt — remix, transform, and build upon the material

Yada yada
```
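To show what a consumer of these files would have to do, here is a sketch that splits the `---`-delimited header from the Markdown body. It deliberately handles only the flat subset used in the two examples (`key: value` pairs and `- item` lists) and is not a YAML parser; `split_front_matter` is a hypothetical helper, and a real tool would use a proper YAML library.

```python
from typing import Dict, List, Optional, Tuple


def split_front_matter(text: str) -> Tuple[Dict[str, object], str]:
    """Split a '---'-delimited header off a Markdown file.

    Handles only `key: value` lines and `key:` followed by `- item` lines,
    i.e. the subset used in the README.md/LICENSE.md examples. Not a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no header: the whole file is body
    try:
        end = lines.index("---", 1)
    except ValueError:
        return {}, text  # unterminated header: treat everything as body
    meta: Dict[str, object] = {}
    current_list: Optional[List[str]] = None
    for line in lines[1:end]:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- ") and current_list is not None:
            current_list.append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        value = value.strip()
        if value:
            meta[key.strip()] = value
            current_list = None
        else:
            current_list = []  # a bare "key:" starts a list of "- item" lines
            meta[key.strip()] = current_list
    return meta, "\n".join(lines[end + 1:])


meta, body = split_front_matter("---\ntitle: arXiv\nauthors:\n  - arXiv contributors\n---\nMirror of arXiv.")
print(meta)  # {'title': 'arXiv', 'authors': ['arXiv contributors']}
print(body)  # Mirror of arXiv.
```

Note that `partition(":")` splits only on the first colon, so URL values such as `http://arxiv.org/` survive intact.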

We also need to account for the fact that archives may have different licenses for different parts, in which case I'd suggest placing a separate LICENSE file into the relevant directories.
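One way a tool could resolve which LICENSE applies under this per-directory scheme is to walk up from a file towards the archive root and take the first LICENSE it finds. The sketch assumes archive-relative POSIX paths and the file names `LICENSE`, `LICENSE.md` and `LICENSE.txt`; both that naming convention and `governing_license` are assumptions, not something settled here.

```python
from pathlib import PurePosixPath
from typing import Optional, Set

# Assumed naming convention; the proposal only says "a separate LICENSE file".
LICENSE_NAMES = ("LICENSE", "LICENSE.md", "LICENSE.txt")


def governing_license(file_path: str, existing: Set[str]) -> Optional[str]:
    """Return the LICENSE closest to file_path, searching parent directories up to the root.

    `existing` is the set of archive-relative paths present in the archive.
    """
    for directory in PurePosixPath(file_path).parents:  # nearest directory first, "." last
        for name in LICENSE_NAMES:
            candidate = (directory / name).as_posix()
            if candidate in existing:
                return candidate
    return None  # no license anywhere above this file


paths = {"LICENSE", "papers/math/LICENSE.md", "papers/math/a.pdf", "papers/cs/b.pdf"}
print(governing_license("papers/math/a.pdf", paths))  # papers/math/LICENSE.md
print(governing_license("papers/cs/b.pdf", paths))    # LICENSE
```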

@eminence @jbenet Thoughts?


jbenet commented on August 18, 2024

- 👎 on frontmatter. i think it confuses most people.
- a package.jsonld based on OKFN's or npm's would probably work well.
- we should try to use existing formats here if possible
- typical to include purely verbatim license files


davidar commented on August 18, 2024

👎 on frontmatter. i think it confuses most people.

Fair enough. I meant it as a more readable alternative to the HTML microformats recommended by Creative Commons (and many others), which are even more confusing.

a package.jsonld based on OKFN's

👍 Thanks, that looks even better.

or npm's would probably work well.

👎 Yeah... I'm not drinking the NodeJS kool-aid ;)

we should try to use existing formats here if possible
typical to include purely verbatim license files

Sorry, I should have provided a reference, as I'm not the first person to propose something like this:

http://blog.martinfenner.org/2013/06/29/metadata-in-scholarly-markdown/

but I agree that it's not exactly widespread (yet :).


jbenet commented on August 18, 2024

👎 Yeah... I'm not drinking the NodeJS kool-aid ;)

Well, the OKFN data-package.json is directly derived from npm's package.json.

It turns out that node is one of the best programming systems out there, thanks to npm. npm got so much extremely right. The assumption that "it's js, it has to be bad" is so absurdly wrong. It beats go get/vendor, cabal, gem, and so on. cargo promises to be in the ballpark, mostly because it copied npm in all the important things.

http://blog.martinfenner.org/2013/06/29/metadata-in-scholarly-markdown/

The problem with frontmatter is that it makes processing the files very annoying, particularly in APIs. I like it as a writer, but not a programmer.


davidar commented on August 18, 2024

I know this isn't the right place for this discussion, but I'll bite. I haven't used npm much, so I may be missing something, but looking at the spec, nothing particularly novel jumps out at me. It just looks like all the standard package fields, but in JSON.

Don't get me wrong, JavaScript actually ranks reasonably highly on my list compared to a lot of alternatives. But this current trend that JavaScript is the solution to every problem, and somehow solves it better than every other programming language, is frankly ridiculous. People complain about Haskell monads being painful, and yet callback hell is the best thing since sliced bread. Green threads have been around for a long time, and other languages have done a lot more in getting concurrency right. Don't even get me started on atom (1GB+ of ram for a text editor, seriously?).


whyrusleeping commented on August 18, 2024

(1GB+ of ram for a text editor, seriously?).

~17KB baby! anything more is bloat.

```
whyrusl+  5126  4.6  0.2 198740 16996 pts/3    S+   10:37   0:00 vim repo/fsrepo/fsrepo.go
```


eminence commented on August 18, 2024

+1 on the data-packages format. My original proposal is fairly similar to this, so it matches up pretty well with what I had in mind.


davidar commented on August 18, 2024

@eminence @jbenet Ok, so I'm thinking we should have:

- an OKFN datapackage.json file,
- a verbatim LICENSE file (either in the top-level directory, or sub-directories in the case of multiple licenses), and
- a standard README(.md) file containing any lengthy descriptions, etc.
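As a rough illustration of the first item, here is a sketch that builds a `datapackage.json` for the arXiv example used earlier in the thread. The field names (`name`, `title`, `sources`, `licenses`, `contributors`, `resources`) are modelled on the OKFN Data Package specification, but the values, the contributor roles and the `index.csv` resource are hypothetical and should be checked against the spec before use.

```python
import json

# Values mirror the arXiv example earlier in this thread; "index.csv" is a
# hypothetical listing file, and the roles are illustrative guesses.
package = {
    "name": "arxiv-cc-subset",
    "title": "arXiv (Creative Commons subset)",
    "sources": [{"title": "arXiv", "path": "http://arxiv.org/"}],
    "licenses": [{
        "name": "CC-BY-SA-3.0",
        "path": "http://creativecommons.org/licenses/by-sa/3.0/",
        "title": "Creative Commons Attribution-ShareAlike 3.0",
    }],
    "contributors": [
        {"title": "arXiv contributors", "role": "author"},
        {"title": "IPFS archivists", "role": "maintainer"},
    ],
    "resources": [{"name": "index", "path": "index.csv"}],
}

# Long prose lives in README.md and the verbatim terms in LICENSE, so this
# file only carries short, machine-readable fields.
datapackage_json = json.dumps(package, indent=2)
print(datapackage_json)
```

Keeping the verbatim license text out of the JSON means the LICENSE file can be copied unmodified from upstream, while tools read the identifier and URL from `licenses`.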


jbenet commented on August 18, 2024

@davidar SGTM.

And, not talking about javascript. Talking about npm. This is inconsistent:

👎 Yeah... I'm not drinking the NodeJS kool-aid ;)
👍 Thanks, that looks even better.
the OKFN data-package.json is directly derived from npm's package.json.

The point is that the statement "not drinking the <THING> kool-aid" is typical of actively ignoring whatever <THING> is, including anything that may be good and valuable, instead of studying <THING> and dismissing the provably bad parts. I'm really tired of the js-hate, particularly when people make inconsistent or uninformed claims, like dismissing npm without even trying it, or understanding why it is well designed. It is similar to the dismissal that haskell gets from the "hardcore C/C++ systems people" (i.e. because they've not taken the time to understand it).

anyway, yep. not really worth discussing here.


davidar commented on August 18, 2024

The point is that the statement "not drinking the <THING> kool-aid" is typical of actively ignoring whatever <THING> is, including anything that may be good and valuable, instead of studying <THING> and dismissing the provably bad parts.

@jbenet Alright, I apologise for my wording, I could have phrased it better. For the record, I would have been equally dismissive about using Python's packaging format for this, or Debian's, or whatever, simply because software packaging and data packaging are different problems. In any case, I can approve of a data packaging format which happens to be derived from a small part of NPM without necessarily approving of NPM as a whole. It's not that I dislike NPM in particular, I just don't see the relevance to data packaging in comparison to any other software packaging format.

Also, despite what people seem to think, I don't hate JS/NPM any more than I hate Python/PyPI (which I use quite often). What I do hate is when people try to apply them to things outside of the domain in which it makes sense to do so ("when all you have is a hammer, everything looks like a nail"). My motto is "all programming languages suck, but some suck less than others in specific circumstances". JS is a good choice for some problems, for others it sucks (e.g. atom, IMHO). Haskell is good for some things, for systems programming it sucks. Python ... you get the idea.

In terms of NodeJS (and last I checked NPM is an official subproject) in particular, it's kind of the embodiment of applying a language to a problem it was never meant to solve. If it were marketed as a scripting language (in the same category as Python), then I wouldn't have a problem with it, but a lot of people make far more overzealous claims about it (yes, it's better than PHP, but there are a lot of other languages that are better still). It would be like trying to run Perl or Fortran in a web browser (please tell me nobody has tried that :). As a result, it makes me automatically skeptical of assertions about its superiority without any supporting facts. You keep telling me NPM is well-designed, and I've used it a little and tried researching myself to understand what you mean, but I'm not seeing anything all that special TBH. Like I said, please elaborate if you think I'm missing something.

Anyway, those were the thoughts I was trying to convey in my somewhat flippant remark :)


rht commented on August 18, 2024

~17KB baby! anything more is bloat.

neovim ~11KB


rht commented on August 18, 2024

Consider using SPDX for license parsing; see ipfs/kubo#337.
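A minimal sketch of what SPDX-based checking could look like: split a simple license expression on its `AND`/`OR` operators and flag identifiers that are not in a known list. The identifiers below are real SPDX IDs, but the short allowlist is a hand-picked assumption (a real checker would load SPDX's full list), parentheses and `WITH` are not handled, and matching is exact and case-sensitive for simplicity. Nothing here describes what ipfs/kubo#337 actually proposes.

```python
from typing import List

# A hand-picked subset of real SPDX license identifiers; a real checker
# would load the full list published by SPDX instead.
KNOWN_SPDX_IDS = {"CC0-1.0", "CC-BY-3.0", "CC-BY-SA-3.0", "CC-BY-4.0", "MIT", "Apache-2.0"}


def unknown_licenses(expression: str) -> List[str]:
    """Return identifiers in a flat 'A OR B AND C' expression that are not recognised.

    Parentheses and the WITH operator from the SPDX expression grammar are
    not handled, and matching is exact (case-sensitive) for simplicity.
    """
    ids = [token for token in expression.split() if token not in ("AND", "OR")]
    return [i for i in ids if i not in KNOWN_SPDX_IDS]


print(unknown_licenses("CC-BY-SA-3.0"))               # []
print(unknown_licenses("CC-BY-3.0 AND Made-Up-1.0"))  # ['Made-Up-1.0']
```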


jbenet commented on August 18, 2024

I reopened as #45 since this turned into js ~~bikeshedding~~ discussion.
