theslimvreal / pse---la-meets-ml
Numerical Linear Algebra meets Machine Learning - PSE Project
License: BSD 2-Clause "Simplified" License
In order for our default loading paths to work, we need to put files (unlabeled matrices, labeled matrices, trained neural network) into the specified folders.
We should write a test for the new_search method from ssget to increase test coverage.
When saving datasets, they should be saved with a fixed name, together with the current date and time.
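A minimal sketch of such a naming scheme; the base name, timestamp format, and `.h5` extension are assumptions, not the project's actual convention:

```python
from datetime import datetime

def dataset_filename(base="dataset"):
    # Fixed base name plus the current date and time,
    # e.g. dataset_2019-06-01_14-30-00.h5 (format is an assumption).
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    return f"{base}_{stamp}.h5"
```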
The command parsing still has some bugs, like an error when entering two spaces instead of one. There should be more tests covering these edge cases.
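One way to make the tokenizing robust against repeated spaces is to rely on `str.split()` with no arguments, which collapses runs of whitespace. This is a sketch with hypothetical names, not our actual parser:

```python
def parse_command(line):
    # str.split() without arguments collapses runs of whitespace,
    # so "collect  -n  5" and "collect -n 5" tokenize identically.
    tokens = line.split()
    if not tokens:
        return None, []  # empty input: no command, no arguments
    return tokens[0], tokens[1:]
```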
Collect and formulate nonfunctional requirements for the specification sheet
Make a list of all required/preferred technologies along with some documentation or tutorial links that help with getting started.
Find out which features of a matrix really matter for the efficiency of an algorithm (e.g. percentage of zeros, diagonal arrangement of the values, etc.).
At the moment, all of our guides and documentation are in the /utils folder.
This is not very self-explanatory for third-party users.
Therefore, we should start using the wiki of our repository. This will also give our guides and documentation a uniform style.
What needs to be done is to move the information contained in the /utils folder to the wiki of our repository.
The documentation created by Doxygen can now be found here.
The documentation of the code is quite good, but the surrounding page doesn't look very pretty.
It would be great if someone familiarized themselves with the Doxygen configuration to find out what can be improved here.
Collect ideas and create global test cases for the specification sheet
We should test the print predictions method from the classifier, including its exceptions, to increase test coverage.
Find a license that fits our needs (maybe BSD-2) and add a LICENSE.md file (see the tutorial for creating a LICENSE file), so the license will be shown on every file.
The time each solver took on a matrix should be saved in the HDF5 file as another vector.
The collector doesn't use the size the user entered. This should be changed. The standard size should be 128 and written into the config file. The user can enter a different size.
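A sketch of how the precedence could work with `configparser`: the user-entered size wins, then the config file, then the hard default of 128. The section and option names here are assumptions, not our real config layout:

```python
import configparser

def read_collector_size(path, user_size=None):
    # Precedence: user input > config file entry > hard default 128.
    # "collector"/"size" are placeholder names for illustration.
    config = configparser.ConfigParser()
    config.read(path)  # silently ignores a missing file
    if user_size is not None:
        return user_size
    return config.getint("collector", "size", fallback=128)
```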
Find good and big sparse matrix data sets for the learning and testing process of the machine learning.
Find out if there are good modules for creating integration tests in Python.
Create about 5 slides for the colloquium on the specification sheet.
Collect and formulate functional requirements for the specification sheet
In order to test our code that uses the Ginkgo library, we need the Ginkgo environment.
But because it's very difficult to set up Ginkgo, we could try to run our code on the lsdf cluster, where we have a working Ginkgo installation.
In order to do this, we would have to set up the Travis build process differently.
This will result in a slower build process, but gives us the opportunity to run tests using the Ginkgo library, which would be very good to have.
We would also not have to always install SSGet ourselves.
Whether this approach is viable needs to be tested and evaluated.
When entering no input, the program crashes.
The problem is that we try to pop an element from an empty array.
This needs to be fixed.
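The fix boils down to guarding the pop with an emptiness check. A minimal sketch with a hypothetical helper name:

```python
def pop_token(tokens):
    # Guard against popping from an empty list; the crash happens because
    # splitting an empty input line yields [] and pop(0) then raises IndexError.
    if not tokens:
        return None
    return tokens.pop(0)
```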
Define what needs to be done soon,
e.g. what are the main goals that we want to achieve, ...
The documentation should be easier to access than having each user generate it themselves.
This might help.
The collector sometimes prints error messages to the command line when trying to check the regularity of some matrices. This warning message should not be displayed, but handled internally.
We should find a solution to set default values automatically with the help of the config file.
The folder data in modules/shared should be moved to a top level folder with a representative name.
We need to either implement the loader or delete the loader class.
Running the collector without a provided name results in a crash of the program. This should not happen. The other commands should also be checked for similar behavior.
We need a definition of the grayscale sparsity image in the glossary.
Work out scenarios and use cases for the specification sheet.
The view still has quite low test coverage. This can be increased with more unit tests; some integration tests with other modules could also help.
We need to separate the unit tests from the integration tests.
At the moment, most of the matrices in a dataset will be labeled with the same solver.
This is not very good for the learning process of the neural network.
It would be good to give the user the opportunity to create a dataset where each solver occurs the same number of times.
This can either be accomplished by creating a new command that connects the collector module directly to the labeling module. With this, you could label each matrix immediately after fetching and only allow (number of matrices / number of solvers) matrices to be labeled with the same solver, discarding the rest. This might be very difficult and/or slow.
An easier way would be to give the labeler a new flag -b/--balanced. This would make the labeler trim its dataset of labeled matrices down to one where each solver occurs the same number of times. This approach is only sensible with a huge number of matrices, because some solvers occur close to never. It will also result in a nondeterministic size of the labeled matrices dataset.
The test coverage is still quite low in the labeling module. This can be increased with at least some more unit tests and maybe some integration tests.
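The trimming step of the balanced approach can be sketched as follows: keep only the first k occurrences of each solver label, where k is the count of the rarest label. This is an illustration, not the labeler's actual implementation:

```python
from collections import Counter

def balance_labels(labels):
    # Keep only the first k occurrences of each solver label, where k is
    # the count of the rarest label, so every solver occurs equally often.
    counts = Counter(labels)
    k = min(counts.values())
    kept, seen = [], Counter()
    for i, label in enumerate(labels):
        if seen[label] < k:
            kept.append(i)
            seen[label] += 1
    return kept  # indices of the matrices to keep
```

Note the nondeterministic output size mentioned above: the result has (number of distinct solvers) * k entries, and k depends entirely on the rarest solver.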
Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
Describe the solution you'd like
A clear and concise description of what you want to happen.
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Add any other context or screenshots about the feature request here.
Find out how we can generate a pattern image from a given matrix so we can pass it to Keras.
Some information is in the paper we got from Markus.
I made a list of things that stood out to me while working on the specification sheet. I'd love it if we could discuss them.
*Is it useful to let the user decide how they want to split the training and test data? Since they cannot change any of the other hyperparameters, it seems out of place.
*Why is the user able to specify a path to save the neural network to, but not a path to save the matrices and labeled grayscale pattern images (maybe a default results folder for every module)?
*If the user only wants to use our classifier, it might be reasonable to let them use our neural network without worrying about specifying a path to it in the file system.
*Both the classifier and the labeling module have to perform a grayscale pattern image conversion. Would it make sense to create a separate module for the grayscale pattern image?
*We could add a feasibility analysis (similar work has been done by our supervisors; there are many preexisting libraries).
*We should each review the work someone else has done.
Gather ideas for command line usage of our software.
Later on, work out an entire command set for our system.
We need to figure out how far we can go with object orientation in Python (like polymorphism, ...). So someone needs to collect some information about this and potentially present to the team what is possible and how things are done in Python.
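As a starting point for that presentation, a minimal sketch showing that Python supports classic polymorphism via abstract base classes and dynamic dispatch; the solver names are placeholders:

```python
from abc import ABC, abstractmethod

class Solver(ABC):
    # Abstract base class: subclasses must implement solve().
    @abstractmethod
    def solve(self, matrix):
        ...

class CgSolver(Solver):
    def solve(self, matrix):
        return f"CG on {matrix}"

class GmresSolver(Solver):
    def solve(self, matrix):
        return f"GMRES on {matrix}"

def run_all(solvers, matrix):
    # Dynamic dispatch: each element's own solve() is called,
    # without run_all knowing the concrete types.
    return [s.solve(matrix) for s in solvers]
```

Because of duck typing, the abstract base class is optional in Python; it mainly documents the interface and makes instantiating an incomplete subclass fail early.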
The README.md file is the first thing a user sees when visiting our repository.
At the moment this file only contains the most basic information.
This should be extended with more explanations of our project, together with some examples of how to use it.
That means: first, the user wants to know what our program does.
The user needs to know how to run our code (setup & installation).
The user needs to know how to use our program (interaction examples).
For more specific information, the README can reference our wiki.
Code Climate still shows some issues with our code. These issues should be inspected and fixed where possible.
When using the loader or saver, they should raise an IOError if something goes wrong.
Every usage of them should therefore be wrapped in a special try/except block to make sure the file was loaded correctly.
This step is important so the program does not crash on incorrect inputs or config file settings.
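A minimal sketch of such a wrapper; the function name and file format are placeholders, not our loader's real interface. Note that in Python 3, IOError is an alias of OSError:

```python
def load_matrix_file(path):
    # Wrap the load in try/except so a bad path or malformed file
    # cannot crash the program; caller checks for None.
    try:
        with open(path, "rb") as f:
            return f.read()
    except OSError as err:  # IOError is an alias of OSError in Python 3
        print(f"Could not load {path}: {err}")
        return None
```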
Create and push a first version of the specification sheet, where we can start working and commiting on.
We need to check whether the input matrix of the classifier is regular and write a test for that.
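For dense matrices, the check can be sketched with NumPy: a square matrix is regular (invertible) iff it has full rank. The function name and tolerance are assumptions; for our large sparse matrices a different strategy would likely be needed:

```python
import numpy as np

def is_regular(matrix, tol=1e-12):
    # A square matrix is regular (invertible) iff it has full rank.
    m = np.asarray(matrix, dtype=float)
    if m.shape[0] != m.shape[1]:
        return False  # only square matrices can be regular
    return np.linalg.matrix_rank(m, tol=tol) == m.shape[0]
```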