mrhedmad / data-myr

A way to locally manage data in a FAIR way

License: MIT License

Python 100.00%

data-science fair fair-data formats openscience

data-myr's Introduction

I'm Luca, and these are my repositories.

I work as a PhD student at the University of Turin, Italy in the Complex Systems for Life Sciences program. I'm a bioinformatician, working with everything genomic and transcriptomic related.

I'm an author and the maintainer for (almost) all packages in my laboratory's organization.

If you want to learn more about me, take a look at my ABOUT page. It contains more information on what I do, as well as my Curriculum Vitae and publication lists.

I really like collaborating with others. If you want to collaborate, check the CONTRIBUTING files in the repository for the project you want to contribute to, or contact me directly!

I administer bioinfo.ds, a bioinformatics-centered community on Discord.

📨 Contact Me

All my contact information lives in my linktree. I strive to keep it up-to-date!

My projects

This is a (possibly incomplete) list of my finished, abandoned (💀) and current (🚧) projects. I look at this when I want to feel good about myself.

🔨 Tools

Kerblam! and Kerblam usage examples: a Rust tool / project management system that makes your life easier when dealing with data analysis.
Metasplit: a Python tool to break up (large) CSV files column-wise based on an external metadata file.
Panid: a Python tool to convert between different IDs (e.g. Ensembl, HGNC).
Fast Cohen's D: a small Rust tool to calculate Cohen's D.
BioTEA: a containerized Python tool to analyze microarray expression data.

📖 Resources

🚧 The Data Stewardship Knowledgebase: an aggregator of links related to data stewardship, open science and data management in general.
💀 The Handbook was meant to be the place to go to when looking for code style guides, how-tos and more.
🚧 Mimir: a policy to handle data for a research group with little resources in a FAIR way.
A tiny guide to Make, which converts to this website.
The code for my blog, which you can read here.
💀 The Data Myr specification, the precursor to Mimir.
Programming notes: some notes and exercises on the usage of R.
The Membrane Transport Protein database, a database of transporters.
ICT-Bib, a database of papers about transporters.

📦 Libraries

Bonsai: a Python library to handle tree-like data structures. It's very easy to use.

🔬 Data analyses

🚧 Transportome Profiler, the main analysis of my PhD thesis.
🚧 Data analysis project structure: an analysis on the structure of data analysis projects.
Turin author network: an analysis of the co-authorship network between the authors of the University of Turin.
Code used in my Master thesis.

📝 Papers and preprints

A Typst paper template to jumpstart paper writing in Typst.
A LaTeX paper template to jumpstart paper writing in LaTeX.
🚧 Code of the paper for Kerblam!

🛸 Others

My dotfiles
Milton: my Discord bot.
💀 Chip 8 implementation in Rust (which does not really work).

data-myr's Issues

Move error handling to topmost layer

The error handling, especially the validation errors, should be accessible from the topmost layer, in the case (which is very probable) that the functions are used as a library.

I suggest creating an error type with a slot for the validation errors, and raising it. myr should catch it when main is called and handle it with pretty parsing. If myr is used as a library, then the error can be caught and the validation errors can be handled as the programmer wishes.
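
A minimal sketch of the suggestion above, not the actual data-myr code. The names `MyrValidationError`, `validate_bundle`, and `main`, as well as the two checks performed, are hypothetical and only illustrate the layering: library functions raise one error that carries all validation problems, and only the command-line entry point formats them.

```python
import sys


class MyrValidationError(Exception):
    """Hypothetical error type with a slot for every validation failure."""

    def __init__(self, errors):
        self.errors = list(errors)
        super().__init__(f"{len(self.errors)} validation error(s)")


def validate_bundle(metadata):
    """Hypothetical library function: collect problems, then raise them together."""
    errors = []
    # Illustrative checks only; the real rules would come from the specification.
    for key in ("name", "type"):
        if key not in metadata:
            errors.append(f"missing required key '{key}'")
    if errors:
        raise MyrValidationError(errors)
    return metadata


def main(metadata):
    """Hypothetical CLI entry point: the only place that prints errors."""
    try:
        validate_bundle(metadata)
    except MyrValidationError as exc:
        for problem in exc.errors:
            print(f"error: {problem}", file=sys.stderr)
        return 1
    return 0
```

When the functions are imported as a library, the caller can wrap `validate_bundle` in its own `try`/`except MyrValidationError` and inspect `exc.errors` directly instead of relying on the printed output.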

Reusing existing standards in the specification

Hi @MrHedmad,

I really like the idea of myr and it looks promising. My experience with ro-crate has been similar in the past.
I had a few questions / suggestions regarding the specification.

The contents of "specification" in metadata.json look a lot like jsonschema. Is there any reason not to reuse [a subset of] jsonschema directly? This would allow reusing existing validators. From your example, I think the jsonschema may look something like this. Do you think it is too verbose / obscure for users?

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "definitions": {
    "person": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "description": "The name of a real person."
        },
        "ORCID": {
          "type": "string",
          "pattern": "^\\d{4}-\\d{4}-\\d{4}-\\d{4}$",
          "description": "An ORCID id."
        },
        "email": {
          "type": "string",
          "format": "email",
          "description": "An e-mail address."
        },
        "type": {
          "type": "string",
          "enum": ["person"]
        }
      },
      "required": ["name", "type"]
    },
    "file": {
      "type": "object",
      "properties": {
        "path": {
          "type": "string",
          "description": "The path to the file."
        },
        "MIME_type": {
          "type": "string",
          "description": "The MIME type of the file."
        },
        "author": {
          "$ref": "#/definitions/person",
          "description": "The author of the file."
        },
        "date": {
          "type": "string",
          "format": "date",
          "description": "The date the file was created."
        },
        "type": {
          "type": "string",
          "enum": ["file"]
        }
      },
      "required": ["path", "MIME_type", "type"]
    }
  }
}

The concept of remote keys being expanded in frozen copies is really nice. It seems this can also be done with jsonschema using pointers but I've never tried it.
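
As a rough illustration of how pointers could be expanded into a frozen copy, here is a standard-library-only sketch. It is an assumption about how such expansion might work, not how myr or any jsonschema validator does it: `resolve_pointer` and `expand_refs` are hypothetical helpers, only local pointers of the form `#/...` through objects are handled, and keys next to `$ref` are merged over the referenced definition (draft-07 validators ignore such siblings).

```python
import copy


def resolve_pointer(document, pointer):
    """Follow a local JSON pointer such as '#/definitions/person'."""
    if not pointer.startswith("#/"):
        raise ValueError(f"only local pointers are supported: {pointer!r}")
    node = document
    for token in pointer[2:].split("/"):
        # Undo the RFC 6901 escapes: '~1' means '/', '~0' means '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[token]  # objects only; array indices are not handled
    return node


def expand_refs(node, root):
    """Return a copy of node with every local '$ref' replaced by its target.

    Assumption: the references are not recursive; a cycle would never end.
    """
    if isinstance(node, dict):
        if "$ref" in node:
            target = copy.deepcopy(resolve_pointer(root, node["$ref"]))
            expanded = expand_refs(target, root)
            siblings = {
                key: expand_refs(value, root)
                for key, value in node.items()
                if key != "$ref"
            }
            # Sibling keys (e.g. "description") override the referenced ones.
            return {**expanded, **siblings}
        return {key: expand_refs(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_refs(item, root) for item in node]
    return node


schema = {
    "definitions": {"person": {"type": "object", "required": ["name", "type"]}},
    "properties": {
        "author": {
            "$ref": "#/definitions/person",
            "description": "The author of the file.",
        }
    },
}
frozen = expand_refs(schema, schema)
```

After this runs, `frozen["properties"]["author"]` holds the full person definition plus the local description, while `schema` itself is left unchanged.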

Alternatively, since you mention schema.org, what do you think of json-ld / rdf? In the README, you mentioned

Then, once (and if) a global standard is defined, you can migrate your data to that standard (in some way).

RDF can help with this and there are even tools like SSOM that make this easier. RDF / json-ld is a bit hard to understand / write and so may not be a good solution for data entry but might be interesting as an export format for frozen bundles? Not sure if that would make sense.

Maybe all that is too complex / out of scope, just wanted to hear your thoughts on it :)

Add code coverage reports

It's really easy to add code coverage reports when using Pytest (see here). Why not add an automatic report of code coverage for data-myr too?
