vc1492a / mieda
Merging of set-containing Intervals Efficiently with a Directed-graph Algorithm
License: Other
The contribution guidelines do not explain the branching strategy or how to submit pull requests that add new features or fix known issues. Revise the contribution guidelines so that developers and users know how they can help.
Line 103 in 970be37
While MIEDA uses a break statement to stop processing once pairs have been updated, which provides some scalability, it would be beneficial to understand more thoroughly how many iterations are needed to update the intervals, either during or prior to iteration. Knowing this could improve scalability further.
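One way to study the iteration count is to instrument a naive merge that, like the break described above, restarts its scan after every update, and to compare its pass count with a bound that can be computed before iterating. The sketch below assumes half-open intervals [start, finish) with start < finish. merge_with_restarts, _split_pair and pass_upper_bound are hypothetical names; this is not MIEDA's code.

```python
from typing import Any, Dict, List, Tuple

Interval = Dict[str, Any]


def _split_pair(a: Interval, b: Interval, key: str) -> List[Interval]:
    """Replace two overlapping intervals with non-overlapping pieces whose
    sets are the union of the sets covering each piece."""
    points = sorted({a["start"], a["finish"], b["start"], b["finish"]})
    pieces = []
    for lo, hi in zip(points, points[1:]):
        covering = [iv for iv in (a, b) if iv["start"] <= lo and hi <= iv["finish"]]
        if covering:
            pieces.append({"start": lo, "finish": hi,
                           key: set().union(*(iv[key] for iv in covering))})
    return pieces


def merge_with_restarts(intervals: List[Interval], key: str = "set_items") -> Tuple[List[Interval], int]:
    """Naive merge that restarts after every update; returns (segments, passes)."""
    for iv in intervals:
        if not iv["start"] < iv["finish"]:
            raise ValueError("this sketch assumes start < finish for every interval")
    segments = [{"start": iv["start"], "finish": iv["finish"], key: set(iv[key])} for iv in intervals]
    passes = 0
    while True:
        passes += 1
        segments.sort(key=lambda s: (s["start"], s["finish"]))
        # Find the first overlapping pair; stop scanning as soon as one is found.
        pair = next(
            ((i, j) for i in range(len(segments)) for j in range(i + 1, len(segments))
             if segments[i]["start"] < segments[j]["finish"]
             and segments[j]["start"] < segments[i]["finish"]),
            None,
        )
        if pair is None:
            return segments, passes
        i, j = pair
        pieces = _split_pair(segments[i], segments[j], key)
        segments = [s for k, s in enumerate(segments) if k not in pair] + pieces


def pass_upper_bound(intervals: List[Interval]) -> int:
    """Bound computable before merging: 1 + sum over elementary segments of
    (coverage depth - 1). Each split lowers that total by at least one."""
    points = sorted({iv["start"] for iv in intervals} | {iv["finish"] for iv in intervals})
    excess = 0
    for lo, hi in zip(points, points[1:]):
        depth = sum(1 for iv in intervals if iv["start"] <= lo and hi <= iv["finish"])
        excess += max(depth - 1, 0)
    return 1 + excess
```

The bound holds because every split replaces two intervals with pieces that cover each shared elementary segment once instead of twice. Comparing passes with pass_upper_bound on real data would show how far a given input is from the worst case.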
The current readme.md illustrates how a user can return a NetworkX graph after merging the input intervals, but it does not illustrate how to specify a key other than set_items.
For example:
```python
from mieda.intervals import Merge

intervals = [
    {"start": "A", "finish": "D", "sensors": {"1"}},
    {"start": "A", "finish": "C", "sensors": {"2"}},
]
print(Merge.union(intervals=intervals, key="sensors"))
```
It would be beneficial to include this in readme.md to show users that using an alternative key is possible.
Specifying an incorrect input key results in a TypeError in the current implementation on the dev branch. This isn't helpful, and it gives a potential user no indication of how to correct the issue.
When an incorrect key is specified, it would be prudent to return a helpful message that guides the user in correcting the input data or the parameters passed to union(). A unit test should be developed to ensure that the proper warning is issued in this scenario.
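A possible shape for this check and its test is sketched below. validate_intervals is a hypothetical helper that union() could call before merging; the exception types and message wording are assumptions, not the current MIEDA behaviour. If a warning is preferred over an exception, unittest's assertWarns can replace assertRaisesRegex.

```python
import unittest
from typing import Any, Dict, List


def validate_intervals(intervals: List[Dict[str, Any]], key: str = "set_items") -> None:
    """Hypothetical pre-check run before merging: fail early with an actionable message."""
    for index, interval in enumerate(intervals):
        missing = [name for name in ("start", "finish", key) if name not in interval]
        if missing:
            raise ValueError(
                f"Interval {index} is missing field(s) {missing}; "
                f"available fields are {sorted(interval)}. "
                "If the set-valued field has a different name, pass it with key=..."
            )
        if not isinstance(interval[key], (set, frozenset)):
            raise TypeError(
                f"Interval {index}: field {key!r} must be a set, "
                f"got {type(interval[key]).__name__}"
            )


class TestKeyValidation(unittest.TestCase):
    def test_wrong_key_names_available_fields(self) -> None:
        intervals = [{"start": "A", "finish": "D", "sensors": {"1"}}]
        # The message should point the user at the field they probably meant.
        with self.assertRaisesRegex(ValueError, r"available fields are \['finish', 'sensors', 'start'\]"):
            validate_intervals(intervals, key="set_items")

    def test_non_set_value_is_rejected(self) -> None:
        with self.assertRaises(TypeError):
            validate_intervals([{"start": "A", "finish": "D", "sensors": ["1"]}], key="sensors")

    def test_correct_key_passes(self) -> None:
        validate_intervals([{"start": "A", "finish": "D", "sensors": {"1"}}], key="sensors")
```

The tests can be run with python -m unittest from the directory containing the file.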
When MIEDA's main approach is used on real-world data, sets of many overlapping intervals are sometimes not merged correctly. While the cause is still unknown and warrants further investigation, the issue seems to occur on larger datasets containing many sets of intervals that begin and/or end at the same time. It does not occur with MIEDA's alternative interval balancing approach.
Importantly, this issue was not caught by our existing unit tests, so the current tests should be reviewed and, where it makes sense, new tests written that reproduce the root cause's scenario, alongside the appropriate fix to the code base.
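To capture the shared-endpoint scenario in a test, one option is a randomized comparison against a brute-force oracle on small inputs drawn from only a few distinct time points, so that many intervals begin and end together. The sketch below assumes half-open intervals [start, finish) and an output of uncoalesced elementary segments as (start, finish, frozenset) tuples. brute_force_merge, sweep_merge, random_intervals and check_against_oracle are hypothetical names, and sweep_merge only stands in for the implementation under test. To apply this to MIEDA, the output of Merge.union would first need converting to the same representation, and MIEDA's own conventions (for example, whether adjacent segments are coalesced) may differ.

```python
import random
from collections import Counter
from typing import Any, Callable, Dict, FrozenSet, List, Set, Tuple

Segment = Tuple[Any, Any, FrozenSet[str]]


def brute_force_merge(intervals: List[Dict[str, Any]], key: str = "set_items") -> List[Segment]:
    """Reference oracle: split at every distinct endpoint and union the covering sets."""
    points = sorted({iv["start"] for iv in intervals} | {iv["finish"] for iv in intervals})
    out = []
    for lo, hi in zip(points, points[1:]):
        covering = [iv[key] for iv in intervals if iv["start"] <= lo and hi <= iv["finish"]]
        merged = frozenset().union(*covering)
        if merged:
            out.append((lo, hi, merged))
    return out


def sweep_merge(intervals: List[Dict[str, Any]], key: str = "set_items") -> List[Segment]:
    """Candidate under test: a sweep line that counts active occurrences of each item."""
    events: Dict[Any, List[Tuple[int, Set[str]]]] = {}
    for iv in intervals:
        events.setdefault(iv["start"], []).append((1, iv[key]))
        events.setdefault(iv["finish"], []).append((-1, iv[key]))
    points = sorted(events)
    active: Counter = Counter()
    out = []
    for p, nxt in zip(points, points[1:]):
        # Apply every start and finish at p before emitting the segment [p, nxt).
        for delta, items in events[p]:
            for item in items:
                active[item] += delta
        active = +active  # keep only items with a positive count
        if active:
            out.append((p, nxt, frozenset(active)))
    return out


def random_intervals(rng: random.Random, n: int, n_points: int = 4, items: str = "123") -> List[Dict[str, Any]]:
    """Few distinct time points, so many intervals share starts and finishes."""
    out = []
    for _ in range(n):
        start, finish = sorted(rng.sample(range(n_points), 2))
        chosen = rng.sample(items, rng.randint(1, len(items)))
        out.append({"start": start, "finish": finish, "set_items": set(chosen)})
    return out


def check_against_oracle(candidate: Callable[[List[Dict[str, Any]]], List[Segment]],
                         trials: int = 500, seed: int = 0) -> None:
    rng = random.Random(seed)
    for _ in range(trials):
        ivs = random_intervals(rng, rng.randint(1, 8))
        expected = brute_force_merge(ivs)
        actual = candidate(ivs)
        assert actual == expected, f"mismatch for {ivs}: {actual} != {expected}"
```

A fixed seed keeps any failing case reproducible, and the failing input printed by the assertion can be pasted into a dedicated regression test.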
It would be beneficial to set up documentation for the library outside of the readme, so that its functionality remains well documented and easy for users to understand.
Create a data structure of always-split intervals with basic operations for:
A progress bar would be a nice option to have as part of the Merge class, so that users can gauge how long the merge operation will take.
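A dependency-free progress indicator could wrap whichever loop drives the merge. The sketch below is an assumption about how this might look: with_progress and render_bar are hypothetical helpers, and Merge does not currently expose a progress option. It yields items unchanged and redraws a text bar on stderr after each one.

```python
import sys
from typing import Iterable, Iterator, Optional, TextIO, TypeVar

T = TypeVar("T")


def render_bar(done: int, total: int, width: int = 30) -> str:
    """Return a text bar such as '[#####.....] 5/10'."""
    filled = width if total <= 0 else width * min(done, total) // total
    return f"[{'#' * filled}{'.' * (width - filled)}] {done}/{total}"


def with_progress(items: Iterable[T], total: int,
                  stream: Optional[TextIO] = None, desc: str = "merging") -> Iterator[T]:
    """Yield items unchanged, redrawing the bar after each one is processed."""
    out = sys.stderr if stream is None else stream  # resolved at call time, not import time
    for done, item in enumerate(items, start=1):
        yield item
        out.write(f"\r{desc}: {render_bar(done, total)}")
        out.flush()
    out.write("\n")
```

Inside a hypothetical merge loop this would read for interval in with_progress(sorted_intervals, total=len(sorted_intervals)). If a third-party dependency is acceptable, the tqdm package provides the same behaviour through tqdm(iterable, total=...).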
The interval balancing approach sorts and de-conflicts intervals in the same asymptotic time whether or not the list of intervals is already sorted. While this isn't a problem for most reasonably sized lists, it leaves room for improvement when scaling to large datasets. In theory, a digraph structure could take advantage of pre-sorting and make updates to an already sorted and de-conflicted set of intervals more efficient.
This issue depends on mieda having a data structure to update, as mentioned in #10.
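Separately from the digraph idea, the benefit of keeping intervals sorted and de-conflicted can be illustrated with a simpler structure: a sorted list of non-overlapping half-open segments, where binary search (Python's bisect module) locates only the segments that a new interval touches. SplitIntervals and insert are hypothetical names. This is neither the digraph design proposed above nor the data structure tracked in #10.

```python
from bisect import bisect_left, bisect_right
from typing import Any, FrozenSet, Iterable, List, Tuple


class SplitIntervals:
    """Hypothetical always-split store of sorted, non-overlapping segments [start, finish)."""

    def __init__(self) -> None:
        self._starts: List[Any] = []
        self._finishes: List[Any] = []  # also sorted, because segments never overlap
        self._items: List[FrozenSet[str]] = []

    def insert(self, start: Any, finish: Any, items: Iterable[str]) -> None:
        if not start < finish:
            raise ValueError("start must be strictly less than finish")
        new_items = frozenset(items)
        i = bisect_right(self._finishes, start)  # first segment ending after `start`
        j = bisect_left(self._starts, finish)    # first segment starting at or after `finish`
        touched = list(zip(self._starts[i:j], self._finishes[i:j], self._items[i:j]))
        points = sorted({start, finish, *(p for s, f, _ in touched for p in (s, f))})
        pieces = []
        for lo, hi in zip(points, points[1:]):
            merged = frozenset().union(*(its for s, f, its in touched if s <= lo and hi <= f))
            if start <= lo and hi <= finish:
                merged |= new_items
            pieces.append((lo, hi, merged))
        # Replace only the touched slice; untouched segments are left as they are.
        self._starts[i:j] = [p[0] for p in pieces]
        self._finishes[i:j] = [p[1] for p in pieces]
        self._items[i:j] = [p[2] for p in pieces]

    def segments(self) -> List[Tuple[Any, Any, FrozenSet[str]]]:
        return list(zip(self._starts, self._finishes, self._items))
```

Locating the affected segments costs O(log n). Rebuilding them costs O(k) for k touched segments, but list slice assignment still shifts the tail in O(n). A balanced tree or skip list would be needed for truly logarithmic updates.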
MIEDA currently supports passing datetime objects for the start and finish indices of the intervals passed for merging. Adding support for integers, strings, and perhaps other types of objects would expand its utility.
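Splitting at endpoints relies only on endpoints being hashable and mutually ordered with < and <=, so the same logic can serve datetime, int and str. The sketch below (split_intervals is a hypothetical name, assuming half-open intervals) adds one guard: a clear TypeError when endpoint types are mixed. It does not catch naive and timezone-aware datetimes being mixed. Both have type datetime, and comparing them raises a TypeError from sorted() instead.

```python
from datetime import datetime
from typing import Any, Dict, FrozenSet, List, Tuple


def split_intervals(intervals: List[Dict[str, Any]],
                    key: str = "set_items") -> List[Tuple[Any, Any, FrozenSet[Any]]]:
    """Split half-open intervals at every endpoint, for any single ordered endpoint type."""
    endpoint_types = {type(iv[name]) for iv in intervals for name in ("start", "finish")}
    if len(endpoint_types) > 1:
        names = sorted(t.__name__ for t in endpoint_types)
        raise TypeError(f"start/finish values must share one type, got {names}")
    points = sorted({iv[name] for iv in intervals for name in ("start", "finish")})
    out = []
    for lo, hi in zip(points, points[1:]):
        covering = [iv[key] for iv in intervals if iv["start"] <= lo and hi <= iv["finish"]]
        if covering:
            out.append((lo, hi, frozenset().union(*covering)))
    return out


# The same function with three endpoint types.
by_int = split_intervals([{"start": 1, "finish": 4, "set_items": {"a"}},
                          {"start": 1, "finish": 3, "set_items": {"b"}}])
by_str = split_intervals([{"start": "A", "finish": "D", "sensors": {"1"}},
                          {"start": "A", "finish": "C", "sensors": {"2"}}], key="sensors")
by_time = split_intervals([{"start": datetime(2024, 1, 1), "finish": datetime(2024, 1, 3), "set_items": {"x"}},
                           {"start": datetime(2024, 1, 2), "finish": datetime(2024, 1, 3), "set_items": {"y"}}])
```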