data-myr's Introduction

I'm Luca, and these are my repositories.

I work as a PhD student at the University of Turin, Italy, in the Complex Systems for Life Sciences program. I'm a bioinformatician, working with everything related to genomics and transcriptomics.

I'm an author and the maintainer of (almost) all packages in my laboratory's organization.

If you want to learn more about me, take a look at my ABOUT page. It contains more information on what I do, as well as my Curriculum Vitae and publication lists.

I really like collaborating with others. If you want to collaborate, check the CONTRIBUTING files in the repository for the project you want to contribute to, or contact me directly!

I administer bioinfo.ds, a bioinformatics-centered community on Discord.

📨 Contact Me

All my contact information lives in my linktree. I strive to keep it up-to-date!

My projects

This is a (possibly incomplete) list of my finished, abandoned (💀) and current (🚧) projects. I look at this when I want to feel good about myself.

🔨 Tools

  • Kerblam! and Kerblam usage examples: a Rust tool / project management system that makes your life easier when dealing with data analysis.
  • Metasplit: a Python tool to break up (large) CSV files column-wise based on an external metadata file.
  • Panid: a Python tool to convert between different IDs (e.g. Ensembl, HGNC).
  • Fast Cohen's D: a small Rust tool to calculate Cohen's D.
  • BioTEA: a containerized Python tool to analyze microarray expression data.

📖 Resources

📦 Libraries

  • Bonsai: a Python library to handle tree-like data structures. It's very easy to use.

🔬 Data analyses

📝 Papers and preprints

🛸 Others

data-myr's People

Contributors

dependabot[bot], mrhedmad

data-myr's Issues

Move error handling to topmost layer

The error handling, especially for validation errors, should be accessible from the topmost layer, since the functions will very probably be used as a library.

I suggest creating an error type with a slot for the validation errors, and raising it. myr should catch it when main is called and handle it with pretty parsing. If myr is used as a library, then the error can be caught and the validation errors can be handled as the programmer wishes.
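
A minimal sketch of what this suggestion could look like. All names here (`MyrValidationError`, `validate_bundle`, the required keys, and the shape of `main`) are hypothetical and not taken from the myr codebase:

```python
import sys


class MyrValidationError(Exception):
    """Hypothetical error type carrying every validation problem found."""

    def __init__(self, errors):
        self.errors = list(errors)  # the "slot" for the validation errors
        super().__init__(f"{len(self.errors)} validation error(s)")


def validate_bundle(metadata):
    """Hypothetical library function: raises instead of printing."""
    errors = []
    for key in ("name", "type"):  # illustrative required keys only
        if key not in metadata:
            errors.append(f"missing required key: {key!r}")
    if errors:
        raise MyrValidationError(errors)
    return metadata


def main(metadata):
    """CLI entry point: the only place where errors are pretty-printed."""
    try:
        validate_bundle(metadata)
    except MyrValidationError as exc:
        for problem in exc.errors:
            print(f"error: {problem}", file=sys.stderr)
        return 1  # non-zero exit status for the command line
    return 0
```

With this layout, a program importing the functions can wrap `validate_bundle` in its own `try`/`except MyrValidationError` and inspect `exc.errors` however it likes, while the command line keeps its formatted output.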

Reusing existing standards in the specification

Hi @MrHedmad,

I really like the idea of myr and it looks promising. My experience with ro-crate has been similar in the past.
I had a few questions / suggestions regarding the specification.

The contents of "specification" in metadata.json look a lot like jsonschema. Is there any reason not to reuse [a subset of] jsonschema directly? This would allow reusing existing validators. From your example, I think the jsonschema may look something like this. Do you think it is too verbose / obscure for users?

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "definitions": {
    "person": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "description": "The name of a real person."
        },
        "ORCID": {
          "type": "string",
          "pattern": "^\\d{4}-\\d{4}-\\d{4}-\\d{3}[\\dX]$",
          "description": "An ORCID id."
        },
        "email": {
          "type": "string",
          "format": "email",
          "description": "An e-mail address."
        },
        "type": {
          "type": "string",
          "enum": ["person"]
        }
      },
      "required": ["name", "type"]
    },
    "file": {
      "type": "object",
      "properties": {
        "path": {
          "type": "string",
          "description": "The path to the file."
        },
        "MIME_type": {
          "type": "string",
          "description": "The MIME type of the file."
        },
        "author": {
          "$ref": "#/definitions/person",
          "description": "The author of the file."
        },
        "date": {
          "type": "string",
          "format": "date",
          "description": "The date the file was created."
        },
        "type": {
          "type": "string",
          "enum": ["file"]
        }
      },
      "required": ["path", "MIME_type", "type"]
    }
  }
}

The concept of remote keys being expanded in frozen copies is really nice. It seems this can also be done with jsonschema using pointers but I've never tried it.
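
To make the pointer idea concrete, here is a rough sketch of how local `$ref` pointers could be expanded into a frozen copy. This is only an illustration under assumptions: `resolve_pointer` and `expand_refs` are made-up helper names, only in-document pointers (starting with `#/`) are handled, and remote URLs and cyclic references are not. It is not how myr or any jsonschema validator actually does it.

```python
import copy


def resolve_pointer(document, pointer):
    """Follow a local JSON pointer such as '#/definitions/person'."""
    if not pointer.startswith("#/"):
        raise ValueError(f"only local pointers are handled here: {pointer!r}")
    node = document
    for part in pointer[2:].split("/"):
        # Undo the JSON Pointer escapes (RFC 6901): ~1 is '/', ~0 is '~'.
        part = part.replace("~1", "/").replace("~0", "~")
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


def expand_refs(node, document):
    """Return a copy of `node` with every local '$ref' replaced by its target."""
    if isinstance(node, dict):
        if "$ref" in node:
            target = resolve_pointer(document, node["$ref"])
            expanded = copy.deepcopy(expand_refs(target, document))
            # Keep sibling keys (e.g. "description") next to the expanded target.
            for key, value in node.items():
                if key != "$ref":
                    expanded[key] = expand_refs(value, document)
            return expanded
        return {key: expand_refs(value, document) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_refs(item, document) for item in node]
    return node
```

Calling `expand_refs(schema, schema)` on the schema above would replace the `"author"` entry with a full copy of the `person` definition, leaving the original schema untouched.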

Alternatively, since you mention schema.org, what do you think of json-ld / rdf? In the README, you mentioned:

Then, once (and if) a global standard is defined, you can migrate your data to that standard (in some way).

RDF can help with this and there are even tools like SSOM that make this easier. RDF / json-ld is a bit hard to understand / write and so may not be a good solution for data entry but might be interesting as an export format for frozen bundles? Not sure if that would make sense.
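
As a rough idea of what such an export might look like, here is a small sketch that maps a person entry to a schema.org `Person` in JSON-LD. The function name `person_to_jsonld`, the key mapping, and the choice of the ORCID URL as `@id` are all assumptions for illustration, and the entry itself is a made-up placeholder:

```python
import json


def person_to_jsonld(person):
    """Map a person entry to a schema.org Person in JSON-LD (illustrative mapping)."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": person["name"],
    }
    if "email" in person:
        doc["email"] = person["email"]
    if "ORCID" in person:
        # Assumption: the ORCID URL serves as the node identifier.
        doc["@id"] = f"https://orcid.org/{person['ORCID']}"
    return doc


# Placeholder entry, not real data.
entry = {"type": "person", "name": "Jane Doe", "ORCID": "0000-0000-0000-0000"}
print(json.dumps(person_to_jsonld(entry), indent=2))
```

Data entry would stay in the simpler format, and something like this would only run when a frozen bundle is produced.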

Maybe all that is too complex / out of scope, just wanted to hear your thoughts on it :)

Add code coverage reports

It's really easy to add code coverage reports when using Pytest (see here). Why not add an automatic report of code coverage for data-myr too?
