theslimvreal / pse---la-meets-ml
Numerical Linear Algebra meets Machine Learning - PSE Project
License: BSD 2-Clause "Simplified" License
In order for our default loading paths to work, we need to put files (unlabeled matrices, labeled matrices, trained neural network) into the specified folders.
We should write a test for the new_search method from ssget to increase test coverage.
When saving datasets, they should be saved with a fixed name, together with the current date and time.
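A minimal sketch of such a naming scheme; the base name, timestamp format, and `.h5` extension are assumptions, not the project's actual convention:

```python
from datetime import datetime

def dataset_filename(base="dataset"):
    # Fixed base name plus the current date and time,
    # e.g. dataset_2019-06-01_14-30-00.h5 (format is an assumption).
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    return f"{base}_{stamp}.h5"
```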
The command parsing still has some bugs, like an error when entering two spaces instead of one. There should be more tests covering these edge cases.
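One way to make the tokenizing robust against repeated spaces is to rely on `str.split()` with no arguments, which collapses runs of whitespace. This is a sketch with hypothetical names, not our actual parser:

```python
def parse_command(line):
    # str.split() without arguments collapses runs of whitespace,
    # so "collect  -n  5" and "collect -n 5" tokenize identically.
    tokens = line.split()
    if not tokens:
        return None, []  # empty input: no command, no arguments
    return tokens[0], tokens[1:]
```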
Collect and formulate nonfunctional requirements for the specification sheet
Make a list of all required/preferred technologies along with some documentation or tutorial links that help with getting started.
Find out which features of a matrix really matter for the efficiency of an algorithm (e.g. percentage of zeros, diagonal arrangement of the values, etc.).
At the moment, all of our guides and documentation are in the /utils folder.
This is not very self-explanatory for third-party users.
Therefore, we should start using the wiki of our repository. This will also give our guides and documentation a uniform style.
What needs to be done is to move the information contained in the /utils folder to the wiki of our repository.
The documentation created by Doxygen can now be found here.
The documentation of the code is quite good, but the surrounding page doesn't look very pretty.
It would be great if someone familiarized themselves with the Doxygen configuration to find out what can be improved here.
Collect ideas and create global test cases for the specification sheet
We should test the print predictions method from the classifier, including its exceptions, to increase test coverage.
Find a license that fits our needs (maybe BSD-2) and add a LICENSE.md file (see the tutorial for creating a LICENSE file), so the license will be shown on every file.
The time each solver took on a matrix should be saved in the HDF5 file as another vector.
The collector doesn't use the size the user entered. This should be changed. The standard size should be 128 and written into the config file. The user can enter a different size.
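A sketch of how the precedence could work with `configparser`: the user-entered size wins, then the config file, then the hard default of 128. The section and option names here are assumptions, not our real config layout:

```python
import configparser

def read_collector_size(path, user_size=None):
    # Precedence: user input > config file entry > hard default 128.
    # "collector"/"size" are placeholder names for illustration.
    config = configparser.ConfigParser()
    config.read(path)  # silently ignores a missing file
    if user_size is not None:
        return user_size
    return config.getint("collector", "size", fallback=128)
```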
Find good and big sparse matrix data sets for the learning and testing process of the machine learning.
Find out if there are good modules for creating integration tests in Python.
Create about 5 slides for the colloquium on the specification sheet.
Collect and formulate functional requirements for the specification sheet
In order to test our code that uses the Ginkgo library, we need the Ginkgo environment.
But because it's very difficult to set up Ginkgo, we could try to run our code on the lsdf cluster, where we have a working Ginkgo installation.
In order to do this, we would have to set up the Travis build process differently.
This will result in a slower build process, but gives us the opportunity to run tests using the Ginkgo library, which would be very good to have.
We would also not have to always install SSGet ourselves.
Whether this approach is viable needs to be tested and evaluated.
When entering no input, the program crashes.
The problem is that we try to pop an element from an empty array.
This needs to be fixed.
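The fix boils down to guarding the pop with an emptiness check. A minimal sketch with a hypothetical helper name:

```python
def pop_token(tokens):
    # Guard against popping from an empty list; the crash happens because
    # splitting an empty input line yields [] and pop(0) then raises IndexError.
    if not tokens:
        return None
    return tokens.pop(0)
```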
Define what needs to be done soon,
e.g. what are the main goals that we want to achieve, ...
The documentation should be easier to access than having each user generate it themselves.
This might help.
The collector sometimes prints error messages to the command line when trying to check the regularity of some matrices. This warning message should not be displayed, but handled internally.
We should find a solution to set default values automatically with the help of the config file.
The folder data in modules/shared should be moved to a top level folder with a representative name.
We need to either implement the loader or delete the loader class.
Running the collector without a provided name results in a crash of the program. This should not happen. The other commands should also be checked for similar behavior.
We need a definition of the grayscale sparsity image in the glossary.
Work out scenarios and use cases for the specification sheet.
The view still has quite low test coverage. This can be increased with more unit tests; some integration tests with other modules could also help.
We need to separate the unit tests from the integration tests.
At the moment, most of the matrices in a dataset will be labeled with the same solver.
This is not very good for the learning process of the neural network.
It would be good to give the user the opportunity to create a dataset where each solver occurs the same number of times.
This can either be accomplished by creating a new command that connects the collector module directly to the labeling module. With this, you could label each matrix immediately after fetching and only allow (number of matrices / number of solvers) matrices to be labeled with the same solver, discarding the rest. This might be very difficult and/or slow.
An easier way would be to give the labeler a new flag -b/--balanced. This would make the labeler trim its dataset of labeled matrices down to one where each solver occurs the same number of times. This approach is only sensible with a huge number of matrices, because some solvers occur close to never. It will also result in a nondeterministic size of the labeled matrices dataset.
The test coverage is still quite low in the labeling module. This can be increased with at least some more unit tests and maybe some integration tests.
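The trimming step of the balanced approach can be sketched as follows: keep only the first k occurrences of each solver label, where k is the count of the rarest label. This is an illustration, not the labeler's actual implementation:

```python
from collections import Counter

def balance_labels(labels):
    # Keep only the first k occurrences of each solver label, where k is
    # the count of the rarest label, so every solver occurs equally often.
    counts = Counter(labels)
    k = min(counts.values())
    kept, seen = [], Counter()
    for i, label in enumerate(labels):
        if seen[label] < k:
            kept.append(i)
            seen[label] += 1
    return kept  # indices of the matrices to keep
```

Note the nondeterministic output size mentioned above: the result has (number of distinct solvers) * k entries, and k depends entirely on the rarest solver.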
Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
Describe the solution you'd like
A clear and concise description of what you want to happen.
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Add any other context or screenshots about the feature request here.
Find out how we can generate a pattern image from a given matrix so we can pass it to Keras.
Some information is in the paper we got from Markus.
I made a list of things that stood out to me while working on the specification sheet. I'd love it if we could discuss them.
*Is it useful to let the user decide how they want to split the training and test data? Since they cannot change any of the other hyperparameters, it seems out of place.
*Why is the user able to specify a path to save the neural network to, but not a path to save the matrices and labeled grayscale pattern images (maybe a default results folder for every module)?
*If the user only wants to use our classifier, it might be reasonable to let them use our neural network without worrying about specifying a path to it in the file system.
*Both the classifier and the labeling module have to perform a grayscale pattern image conversion. Would it make sense to create a separate module for the grayscale pattern image?
*We could add a feasibility analysis (similar work has been done by our supervisors; there are many preexisting libraries).
*We should each review the work someone else has done.
Gather ideas for command line usage of our software.
Later on, work out an entire command set for our system.
We need to figure out how far we can go with object orientation in Python (like polymorphism, ...). So someone needs to collect some information about this and potentially present to the team what is possible and how things are done in Python.
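As a starting point for that presentation, a minimal sketch showing that Python supports classic polymorphism via abstract base classes and dynamic dispatch; the solver names are placeholders:

```python
from abc import ABC, abstractmethod

class Solver(ABC):
    # Abstract base class: subclasses must implement solve().
    @abstractmethod
    def solve(self, matrix):
        ...

class CgSolver(Solver):
    def solve(self, matrix):
        return f"CG on {matrix}"

class GmresSolver(Solver):
    def solve(self, matrix):
        return f"GMRES on {matrix}"

def run_all(solvers, matrix):
    # Dynamic dispatch: each element's own solve() is called,
    # without run_all knowing the concrete types.
    return [s.solve(matrix) for s in solvers]
```

Because of duck typing, the abstract base class is optional in Python; it mainly documents the interface and makes instantiating an incomplete subclass fail early.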
The README.md file is the first thing a user sees when visiting our repository.
At the moment this file only contains the most basic information.
This should be extended with more explanations of our project, together with some examples of how to use it.
That means: first, the user wants to know what our program does.
The user needs to know how to run our code (setup & installation).
The user needs to know how to use our program (interaction examples).
For more specific information, the README can reference our wiki.
Code Climate still shows some issues with our code. These issues should be inspected and fixed where possible.
When using the loader or saver, they should raise an IOError if something goes wrong.
Every usage of them should therefore be wrapped in a special try/except block to make sure the file was loaded correctly.
This step is important so the program does not crash on incorrect inputs or config file settings.
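A minimal sketch of such a wrapper; the function name and file format are placeholders, not our loader's real interface. Note that in Python 3, IOError is an alias of OSError:

```python
def load_matrix_file(path):
    # Wrap the load in try/except so a bad path or malformed file
    # cannot crash the program; caller checks for None.
    try:
        with open(path, "rb") as f:
            return f.read()
    except OSError as err:  # IOError is an alias of OSError in Python 3
        print(f"Could not load {path}: {err}")
        return None
```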
Create and push a first version of the specification sheet, where we can start working and commiting on.
We need to check whether the input matrix of the classifier is regular and write a test for that.
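For dense matrices, the check can be sketched with NumPy: a square matrix is regular (invertible) iff it has full rank. The function name and tolerance are assumptions; for our large sparse matrices a different strategy would likely be needed:

```python
import numpy as np

def is_regular(matrix, tol=1e-12):
    # A square matrix is regular (invertible) iff it has full rank.
    m = np.asarray(matrix, dtype=float)
    if m.shape[0] != m.shape[1]:
        return False  # only square matrices can be regular
    return np.linalg.matrix_rank(m, tol=tol) == m.shape[0]
```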