
statula's Introduction

Statula


Simple terminal-based program for descriptive statistics

Current version: 0.1.16


Initially developed as a respite from a statistics exam, this small piece of software is meant to allow quick analysis of big datasets. The main focus of this project is speed.


Usage

To perform an analysis, type: ./statula dataset_filename into your terminal. The data should be in a text file that contains only whitespace and floating point numbers.
If a character does not meet these requirements, all remaining characters of the current line are skipped.
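The skipping rule above can be sketched in C as follows. This is a minimal illustration, not Statula's actual parser: the function name parse_line and its signature are assumptions, and the standard strtod is used to recognise numbers (so it also accepts forms such as "1e3").

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not taken from Statula's source: parses one line of
 * text into out[], storing at most max values. Parsing stops at the first
 * character that is neither whitespace nor part of a number, so the rest of
 * the line is ignored. Returns the number of values stored. */
size_t parse_line(const char *line, double *out, size_t max)
{
    size_t count = 0;
    const char *p = line;

    while (count < max) {
        /* Skip whitespace between numbers. */
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;

        char *end;
        double value = strtod(p, &end);
        if (end == p)      /* not a number: skip the remainder of the line */
            break;

        out[count++] = value;
        p = end;
    }
    return count;
}
```

With the line "1.5 2 abc 3" this sketch keeps 1.5 and 2 and drops everything from "abc" onwards, including the trailing 3.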

Starting Parameters

-h/--help Prints simple help panel
-o/--open Open specified file
-s/--save Save result to specified file. The number of targets must be equal to the number of files opened via --open
-l/--language Print result using specified language (language file has to be present in the current directory).
--precision Print result using specified float precision.
--silent Disable printing to standard output.
--nosort Disable sorting the input. This improves performance considerably at the risk of incorrect results for mode, median and skewness. USE WITH CAUTION
--stdin Read directly from stdin. It is the default behaviour if Statula does not receive any starting parameters. Use EOF combination (which is CTRL+D on most systems) to finish data input.
--print_name Print file names directly above each printed dataset.

If there is just one string after "./statula" (not starting with "-"), then it shall be used as a default filename for the session.
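To see why --nosort can give an incorrect median, consider a median routine that relies on its input being in ascending order. The sketch below is illustrative only; the function names are assumptions and do not come from Statula's source.

```c
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for qsort: ascending order of doubles. */
static int compare_doubles(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: sorts data[] in place, ascending. */
void sort_ascending(double *data, size_t n)
{
    qsort(data, n, sizeof *data, compare_doubles);
}

/* Hypothetical helper: picks the middle element(s). The result is the
 * median only if data[] is already sorted; returns 0.0 for empty input. */
double median_assuming_sorted(const double *data, size_t n)
{
    if (n == 0)
        return 0.0;
    if (n % 2 == 1)
        return data[n / 2];
    return (data[n / 2 - 1] + data[n / 2]) / 2.0;
}
```

For the input 5 1 3, calling median_assuming_sorted without sorting first returns 1, while sorting first gives the correct median of 3. Skipping the sort is therefore only safe when the input file is known to be ordered already.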


Bit fields
The following section may be useful if you plan to familiarize yourself with the code. Macro definitions can be found in the header files.

Dataset:

Bit     2     1               0
Option  SORT  MULTIPLE_MODES  NO_MODE

Statula:

Bit     3                2           1      0
Option  PRINT_FILE_NAME  PRINT_HELP  STDIN  PRINT_TO_STDOUT
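As a rough sketch, the bit layout in the tables above could be expressed with mask macros like the ones below. The macro names and bit positions come from the tables; defining them as shifted masks and the three helper functions are assumptions, so check the header files for the real definitions.

```c
/* Dataset flags, bit positions as listed in the table above. */
#define NO_MODE         (1u << 0)
#define MULTIPLE_MODES  (1u << 1)
#define SORT            (1u << 2)

/* Statula flags, bit positions as listed in the table above. */
#define PRINT_TO_STDOUT (1u << 0)
#define STDIN           (1u << 1)
#define PRINT_HELP      (1u << 2)
#define PRINT_FILE_NAME (1u << 3)

/* Hypothetical helpers for working with a flags word. */
unsigned set_flag(unsigned flags, unsigned mask)
{
    return flags | mask;
}

unsigned clear_flag(unsigned flags, unsigned mask)
{
    return flags & ~mask;
}

int has_flag(unsigned flags, unsigned mask)
{
    return (flags & mask) != 0;
}
```

For example, a dataset that should be sorted and has no mode would carry SORT | NO_MODE, which is binary 101.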

QTYMCUW

Questions that you might come up with.

Q: What do "memory allocation responsibilities" mean?
A: In short: "None" means that the function doesn't allocate memory that's visible outside of its scope.

"Delegated to other functions" - memory allocation is delegated to other functions, but if you were to compare memory usage from before and after calling the function, new memory is allocated. It's just not done by that function itself.

"Allocates memory" speaks for itself.

You can roughly deduce the complexity of a function from its memory allocation responsibilities - high-level functions usually have the second type of memory allocation responsibilities, whereas primitives may allocate memory (init_ functions) or not (is_ boolean functions).

TODO

  • Mathematical statistics

  • Extend functionality in descriptive statistics

  • Get around limitations of current language system for help panel*

*I think the help panel should work even if Statula is unable to load strings from a file, however I am not quite sure how I can avoid hard-coding it into the program itself.

Tests are written using the Criterion library - it is not necessary to download it in order to use Statula, however I recommend it as it is a great piece of software. Thanks for everything Snaipe!


statula's People

Contributors

cipo28, fee1600d, jaihindhreddy, justthegame, leoche, osiewicz, progborg, rafaelkalan, tunnelsnake


statula's Issues

Revisit macro naming

Currently, some macros might collide if a module of this project were taken out of its scope - for example, SORT is a really general name. A solution to that problem would be to add the prefix "STATULA_" to these.

Add new statistical functions

The Dataset struct has not changed much since its creation - at least not from a functionality point of view. I have focused more on metafunctionality such as command line arguments. Feel free to add new functions.

Add new .lang files

Open up .en-gb,lang and do it on the fly. Ideally your translation should be reviewed by someone else.

Improve mode information

As of today, it's impossible to tell whether a given dataset has no mode, since the message for that case is "No mode/more than one mode". Ideally we'd remove the ambiguity by splitting the MODE_PRESENT flag into two: MULTIPLE_MODES and NO_MODE. If both are set to 0, the dataset has exactly one mode and dataset->mode is valid. The default state for these flags should be 0 and 1, respectively.
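A possible shape for the two-flag scheme is sketched below. It is not the project's implementation: find_mode is a hypothetical helper, the input is assumed to be sorted in ascending order, and "no mode" is assumed to mean that no value occurs more than once.

```c
#include <stddef.h>

#define NO_MODE        (1u << 0)
#define MULTIPLE_MODES (1u << 1)

/* Hypothetical helper: scans sorted[] (ascending order assumed) and returns
 * the mode flags. When the result is 0, *mode holds the unique mode.
 * Assumption: a value must occur at least twice to count as a mode. */
unsigned find_mode(const double *sorted, size_t n, double *mode)
{
    unsigned flags = NO_MODE;   /* default state: NO_MODE = 1, MULTIPLE_MODES = 0 */
    size_t best = 1;            /* longest run of equal values seen so far */
    size_t i = 0;

    while (i < n) {
        /* Measure the run of values equal to sorted[i]. */
        size_t run = 1;
        while (i + run < n && sorted[i + run] == sorted[i])
            run++;

        if (run > best) {
            /* New strictly most frequent value: a single mode for now. */
            best = run;
            *mode = sorted[i];
            flags = 0;
        } else if (run == best && best > 1) {
            /* Another value ties with the current mode. */
            flags = MULTIPLE_MODES;
        }
        i += run;
    }
    return flags;
}
```

With this split, the printing code can report "no mode" and "more than one mode" as separate messages, and read the mode value only when both flags are clear.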

Make eprintf-calling functions easier to test

As of today there is no unified interface for mock error values - functions can return either NULL or (in the case of value-returning functions) arbitrary values. There is no consistency among these, so we'll need something like macros to help us solve this. The exit call in eprintf uses a magic number of '2', which is not okay. I want it to signal something meaningful so that bash scripting can be made a bit easier.
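One way to replace the magic number would be a small set of named exit statuses. The sketch below is only a suggestion: every name and value in it is hypothetical and none of it is taken from Statula's source.

```c
/* Hypothetical exit statuses; names and values are illustrative only. */
enum statula_exit_code {
    STATULA_EXIT_OK = 0,      /* success */
    STATULA_EXIT_USAGE = 1,   /* invalid command line arguments */
    STATULA_EXIT_FILE = 2,    /* a file could not be opened or read */
    STATULA_EXIT_MEMORY = 3   /* memory allocation failed */
};

/* Hypothetical helper: short description of an exit status, usable in
 * error messages and in tests. */
const char *statula_exit_code_name(enum statula_exit_code code)
{
    switch (code) {
    case STATULA_EXIT_OK:     return "ok";
    case STATULA_EXIT_USAGE:  return "usage error";
    case STATULA_EXIT_FILE:   return "file error";
    case STATULA_EXIT_MEMORY: return "out of memory";
    }
    return "unknown";
}
```

A shell script could then branch on the exit status of the program instead of parsing its error output, and tests could compare against the named constants rather than a bare number.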

Move .lang files to separate directory

That will require some fiddling within the code. We should no longer use relative paths in all bits of code.

Intended behaviour: the default search directory for .lang files should be predefined, and not implied to be the same as the binary directory.
Current behaviour: Statula searches the current directory for the .lang file. This caused a minor issue with #6.
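The intended behaviour could be approached by building the file path from a predefined directory instead of relying on the current directory. In the sketch below the directory macro, its value, and the function build_lang_path are all assumptions made for illustration; only the .lang extension comes from the text above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical default location for .lang files. */
#define STATULA_LANG_DIR "/usr/local/share/statula"

/* Hypothetical helper: writes "<dir>/<language>.lang" into buf.
 * Returns 0 on success, -1 if the path does not fit into buf. */
int build_lang_path(char *buf, size_t size, const char *dir, const char *language)
{
    int written = snprintf(buf, size, "%s/%s.lang", dir, language);

    if (written < 0 || (size_t)written >= size)
        return -1;
    return 0;
}
```

Calling build_lang_path(buf, sizeof buf, STATULA_LANG_DIR, "en-gb") would produce a path inside the predefined directory, and the directory argument leaves room for an override, for instance from a command line option.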
