mieda's People

Contributors

vc1492a

Forkers

ryanstonebraker

mieda's Issues

Update the contribution guidelines

The contribution guidelines say nothing about the branching strategy or about how to submit pull requests that add new features or address known issues. Revise them so that developers and users can see clearly how to help.

Include an example of using a different 'key' parameter in the readme

The current readme.md shows how a user can return a NetworkX graph after merging the input intervals, but it does not show how to specify a key other than set_items.

E.g.

from mieda.intervals import Merge

intervals = [
    {"start": "A", "finish": "D",
     "sensors": {"1"}},
    {"start": "A", "finish": "C",
     "sensors": {"2"}}
]

print(Merge.union(intervals=intervals, key="sensors"))

It would be beneficial to include this in readme.md to show users that using an alternative key is possible.

Specifying an incorrect input key results in a TypeError

[Screenshot taken 2020-05-14, 11:44 AM]

In the current implementation on the dev branch, specifying an incorrect input key results in a TypeError. The error is not helpful: it gives a potential user no indication of how to correct the issue.

When an incorrect key is specified, it would be prudent to return a helpful response that guides the user in correcting the input data or the parameters passed to union(). A unit test should be developed to ensure the proper warning is issued in this scenario.
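One way the check described above might look is sketched here. The helper name validate_key is hypothetical and not part of mieda's API, and raising a KeyError (rather than emitting a warning) is an assumption made for illustration. The sketch also assumes each interval is a dict with "start" and "finish" entries, as in the example in the readme issue.

```python
def validate_key(intervals, key):
    """Raise a descriptive KeyError if any interval lacks `key`.

    Hypothetical helper for illustration; not part of mieda's API.
    """
    for index, interval in enumerate(intervals):
        if key not in interval:
            # List the non-boundary keys so the user can see what is available.
            available = sorted(
                k for k in interval if k not in ("start", "finish")
            )
            raise KeyError(
                f"interval at index {index} has no key {key!r}; "
                f"available keys: {available}"
            )


intervals = [
    {"start": "A", "finish": "D", "sensors": {"1"}},
    {"start": "A", "finish": "C", "sensors": {"2"}},
]

validate_key(intervals, key="sensors")  # correct key: passes silently

try:
    validate_key(intervals, key="set_items")  # wrong key for this data
except KeyError as error:
    message = error.args[0]
    print(message)
```

A unit test for this scenario could then assert that the exception is raised and that its message names both the missing key and the keys that are present.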

Interval merging does not merge overlapping intervals on larger datasets

When using MIEDA's main approach on real-world data, an issue has been encountered whereby sets of many overlapping intervals are not merged appropriately. While the cause of the issue is still unknown and warrants further investigation, it seems to occur on larger datasets where there are many sets of intervals which begin and/or end at the same time. The issue does not occur when using MIEDA's alternative interval balancing approach.

Importantly, our existing unit tests did not capture this issue. The current tests should be reviewed and, where it makes sense, new tests written to cover the scenario behind the root cause, alongside the appropriate fix to the code base.

Library documentation

It would be beneficial to set up documentation for the library outside of the readme. This would help keep functionality well documented and easy for users to understand.

Create self-maintained data structure

Create a data structure of always-split intervals with basic operations for:

  • adding a new base interval
  • deleting an interval
  • checking for interval collisions (returns the tags of collided intervals)
  • looking up an interval by tag (shows original, and all sub-split intervals)
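The four operations listed above could be sketched roughly as follows. The class name SplitIntervals and its method names are hypothetical, and the sketch simplifies the request: it stores only the base intervals and recomputes the split pieces on each lookup instead of maintaining them incrementally. Intervals are treated as half-open, which is also an assumption.

```python
class SplitIntervals:
    """Toy store of tagged intervals that reports them in split form.

    Hypothetical sketch: split pieces are recomputed on lookup rather
    than maintained incrementally.
    """

    def __init__(self):
        self._base = {}  # tag -> (start, finish)

    def add(self, tag, start, finish):
        """Add a new base interval under a unique tag."""
        if start >= finish:
            raise ValueError("start must be less than finish")
        if tag in self._base:
            raise KeyError(f"tag {tag!r} already exists")
        self._base[tag] = (start, finish)

    def delete(self, tag):
        """Delete the base interval stored under `tag`."""
        del self._base[tag]

    def collisions(self, start, finish):
        """Return the tags of stored intervals overlapping [start, finish)."""
        return {
            tag
            for tag, (s, f) in self._base.items()
            if s < finish and start < f
        }

    def lookup(self, tag):
        """Return the original interval and its split sub-intervals."""
        start, finish = self._base[tag]
        # Any stored boundary strictly inside this interval is a cut point.
        cuts = {start, finish}
        for s, f in self._base.values():
            for point in (s, f):
                if start < point < finish:
                    cuts.add(point)
        ordered = sorted(cuts)
        return (start, finish), list(zip(ordered, ordered[1:]))


store = SplitIntervals()
store.add("a", 1, 5)
store.add("b", 3, 8)
hits = store.collisions(4, 6)          # both "a" and "b" overlap [4, 6)
original, pieces = store.lookup("a")   # "a" is cut at 3 by the start of "b"
store.delete("b")
_, pieces_after = store.lookup("a")    # no cut points remain
```

Recomputing on lookup keeps the sketch short; a self-maintained structure as requested would instead update the stored pieces inside add and delete.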

Optional progress bar

A progress bar would be a nice option to have as part of the Merge class so that users can gauge how long the merge operation will take.
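A minimal sketch of an opt-in progress indicator, using only the standard library, is shown here. The function with_progress and its enabled flag are hypothetical; how such an option would be wired into the Merge class is not specified in the issue.

```python
import io
import sys


def with_progress(items, enabled=False, stream=sys.stderr):
    """Yield each item, optionally writing a running counter to `stream`."""
    total = len(items)
    for done, item in enumerate(items, start=1):
        yield item
        if enabled:
            # Carriage return rewrites the same terminal line on each update.
            stream.write(f"\rprocessed {done}/{total}")
            stream.flush()
    if enabled and total:
        stream.write("\n")


# Capture the output in a buffer here; a real run would use the default stderr.
buffer = io.StringIO()
seen = list(with_progress(["x", "y", "z"], enabled=True, stream=buffer))
quiet = list(with_progress(["x", "y", "z"]))  # disabled by default
```

Because the wrapper is a generator around the existing loop input, the merge logic itself would not need to change, and a third-party progress library could replace the counter if a dependency is acceptable.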

Explore update optimizations using a DiGraph

The interval balancing approach sorts and de-conflicts intervals in the same asymptotic time whether or not the list of intervals was pre-sorted. While this isn't a problem for most reasonably sized lists, it leaves room for improvement when scaling to large datasets. A digraph structure could in theory be used to take advantage of pre-sorting and improve the efficiency of the update operation on an already sorted and de-conflicted set of intervals.
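To illustrate the general idea of exploiting pre-sorting, the sketch below inserts a new interval's endpoints into an already sorted list of boundary points without re-sorting. It deliberately uses a plain sorted list with the standard bisect module, not a digraph, and the function name insert_boundaries is hypothetical.

```python
import bisect


def insert_boundaries(sorted_points, start, finish):
    """Insert start and finish into a sorted list in place, skipping duplicates."""
    for point in (start, finish):
        # Binary search locates the slot in O(log n) comparisons.
        position = bisect.bisect_left(sorted_points, point)
        already_present = (
            position < len(sorted_points) and sorted_points[position] == point
        )
        if not already_present:
            sorted_points.insert(position, point)
    return sorted_points


points = [1, 3, 5, 8]
insert_boundaries(points, 4, 8)  # 4 is new; 8 is already a boundary
```

The search is logarithmic, but list insertion still shifts elements in linear time, which is one reason a linked or graph-based structure might be worth exploring.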

This issue depends on mieda having a data structure to update, as mentioned in #10.
