nomegen's People

Contributors

ianbollinger

nomegen's Issues

Graphical user interface

It would be nice to have a GUI to go along with the CLI. The question is what toolkit to use. Currently, hsQML seems to be the most portable.

Release on Hackage

Prerequisites:

  • finish documentation and README
  • finish test suite
  • finalize API

Binary serialization for Nomicons (and MarkovMaps)

Generating the MarkovMap is an expensive operation and is unnecessary if the underlying Nomicon has not changed. Caching a generated MarkovMap on disk is currently impossible, however, because the data structure MarkovMap relies on does not expose its implementation and provides no means of binary serialization.

Additionally, a binary format for Nomicon files would allow for faster deserialization, especially since individual names wouldn't need to be parsed into segments. For ease of use, nomegen would need to detect whether the text format has been altered since it was last "compiled".
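
One possible way to detect a stale "compiled" file is to compare modification times. The following is only a sketch: the names isStale and needsRecompile and the two-path layout are assumptions for illustration, not part of nomegen.

```haskell
import System.Directory (doesFileExist, getModificationTime)

-- Pure decision: the compiled file is stale if it is missing, or if it is
-- not strictly newer than the text source. Equal timestamps count as stale
-- because coarse clock resolution could hide an edit.
isStale :: Ord t => Maybe t -> t -> Bool
isStale Nothing _ = True
isStale (Just compiled) source = compiled <= source

-- Hypothetical wrapper: sourcePath is the text Nomicon, compiledPath is
-- the binary cache written next to it.
needsRecompile :: FilePath -> FilePath -> IO Bool
needsRecompile sourcePath compiledPath = do
  exists <- doesFileExist compiledPath
  compiledTime <-
    if exists
      then Just <$> getModificationTime compiledPath
      else pure Nothing
  sourceTime <- getModificationTime sourcePath
  pure (isStale compiledTime sourceTime)
```

Modification times are only a heuristic; storing a hash of the text file inside the binary file would be more robust, at the cost of reading the whole source each time.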

Write README

  • Document the command line interface.
  • Document the Nomicon file format.

Clean up lens-based interface

  • Generate and export lenses for Nomicon and other data types.
  • Don't export record fields or constructors.
  • Rename and export isomorphisms.
  • Simplify Cons/Snoc instances.

Add back configuration data type

We need a separate configuration type in addition to the Nomicon. It should store: the Nomicon, the size of the prediction context, and the generated MarkovMap.
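
A minimal sketch of what such a type could look like. The type parameters stand in for the real Nomicon and MarkovMap types, and the names Config, mkConfig, and the field names are assumptions made for illustration.

```haskell
-- The three pieces of state listed above, bundled together.
data Config nomicon markovMap = Config
  { configNomicon :: nomicon
  , configContextSize :: !Int
  , configMarkovMap :: markovMap -- lazy field: only built when first used
  }

-- Smart constructor: the MarkovMap is always derived from the stored
-- Nomicon and context size, so the three fields cannot drift apart.
-- The 'build' argument is a placeholder for the real generation function.
mkConfig
  :: (Int -> nomicon -> markovMap)
  -> Int
  -> nomicon
  -> Config nomicon markovMap
mkConfig build size nomicon =
  Config
    { configNomicon = nomicon
    , configContextSize = size
    , configMarkovMap = build size nomicon
    }
```

Exporting only mkConfig and the accessors, not the constructor, would keep the invariant intact.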

Eliminate partial functions

Avoid partial functions like (!!) and incomplete pattern matches. Avoiding the latter will eliminate spurious warnings about (hypothetically) impossible matches. Additionally, don't call error when parsing fails; we need more context about the failure to provide a helpful error message anyway.
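
As a rough illustration of both points, here is a total replacement for (!!) and a parser that returns Either instead of calling error. The names atMay, ParseError, and parseContextSize are hypothetical and do not come from the nomegen source.

```haskell
import Text.Read (readMaybe)

-- Total indexing: returns Nothing instead of crashing when the index is
-- negative or past the end of the list.
atMay :: [a] -> Int -> Maybe a
atMay xs i
  | i < 0 = Nothing
  | otherwise =
      case drop i xs of
        (x : _) -> Just x
        [] -> Nothing

-- The error value carries the offending input, so the caller can build a
-- helpful message.
newtype ParseError = InvalidContextSize String
  deriving (Eq, Show)

-- Parse a positive context size without ever calling 'error'.
parseContextSize :: String -> Either ParseError Int
parseContextSize raw =
  case readMaybe raw of
    Just n | n > 0 -> Right n
    _ -> Left (InvalidContextSize raw)
```

Every case expression above covers all constructors, so the compiler has nothing to warn about.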

YAML error messages

  • Pretty-print YAML error messages.
  • Attempt to provide a better error message than "mzero".
  • Try to explain error messages, though we'd be better served just using a less complex format than YAML.
  • Try not to leak the concrete details of YAML deserialization errors, though this has downsides.

Handle suffixes properly

Currently nomegen generates names that are as long as possible instead of potentially terminating on a suffix.

Unicode normalization

Currently we do no Unicode normalization on either names or segments, which means that seemingly identical characters could fail to parse unexpectedly.
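
To make the problem concrete, the snippet below shows two spellings of the same visible character that compare as different strings. It only demonstrates the mismatch and does not perform normalization; the names are illustrative.

```haskell
-- U+00E9 LATIN SMALL LETTER E WITH ACUTE, a single code point.
precomposed :: String
precomposed = "\x00E9"

-- 'e' followed by U+0301 COMBINING ACUTE ACCENT, two code points.
decomposed :: String
decomposed = "e\x0301"

-- Both render as the same glyph, yet plain String equality tells them
-- apart, so a segment written one way will not match a name written the
-- other way.
looksIdenticalButDiffers :: Bool
looksIdenticalButDiffers = precomposed /= decomposed
```

Normalizing both names and segments to a single form (for example NFC) before comparison would remove the mismatch; a Hackage library such as unicode-transforms or text-icu could presumably be used for that.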

Documentation

  • Write top-level module documentation.
  • Add description field to module header.
  • Document complex algorithms: countSegments, windows, markovGenerate.
  • Write description field in cabal file.

Replace YAML with custom format

Even YAML is too verbose a format and has weird quirks. Namely, having to quote the letters "n" and "y" for segments (unquoted, YAML can interpret them as booleans) is confusing, and forgetting to do so currently yields a meaningless error message. Eliminating yaml as a dependency may also eliminate a few transitive dependencies.

Prefix generation

Currently the generator merely selects a random segment to start with and then terminates after n segments are generated. The results are sloppy and we should instead always select the initial node in the Markov chain and terminate on the final node.

For instance, we could wrap each Segment in the MarkovMap in a data type like this:

data Component a
   = Initial
   | Medial !a
   | Final
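
A sketch of how generation might work with such a wrapper, assuming a simple first-order chain. The names Chain, buildChain, and walk are made up for illustration, the Component type is repeated so the block stands alone, and a list of integers stands in for the PRNG so the example stays deterministic.

```haskell
import qualified Data.Map.Strict as M

data Component a
  = Initial
  | Medial !a
  | Final
  deriving (Eq, Ord, Show)

-- Each component maps to the components observed after it. Duplicates are
-- kept, so frequent transitions are proportionally more likely.
type Chain a = M.Map (Component a) [Component a]

-- Wrap every name in Initial/Final markers and record each transition.
buildChain :: Ord a => [[a]] -> Chain a
buildChain names =
  M.fromListWith
    (flip (++))
    [ (from, [to])
    | name <- names
    , let wrapped = Initial : map Medial name ++ [Final]
    , (from, to) <- zip wrapped (drop 1 wrapped)
    ]

-- Always start at Initial and stop as soon as Final is chosen. Each Int
-- picks a successor (modulo the number of successors); the walk also
-- stops when the choices run out.
walk :: Ord a => Chain a -> [Int] -> [a]
walk chain = go Initial
  where
    go _ [] = []
    go current (choice : rest) =
      case M.lookup current chain of
        Just successors@(_ : _) ->
          case drop (choice `mod` length successors) successors of
            Medial segment : _ -> segment : go (Medial segment) rest
            _ -> []
        _ -> []
```

Because names end only where a training name ended, this would also let generation terminate on a suffix instead of running to a fixed length.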

Allow custom seeding for PRNG (consider modularizing PRNG)

There should at least be a backdoor to seed the PRNG. Additionally, it would be nice if the usage of the PRNG were modular and could be swapped out for another. Currently, mwc-random must be used for its generation of variates in a categorical distribution; however, we can't serialize the tables it generates. Thus it may have to be replaced anyway. Using something like Vose's Alias Method for generating variates seems optimal.
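
For reference, a compact sketch of Vose's Alias Method built on plain maps, which would be straightforward to serialize. It is not tuned for performance, the names AliasTable, buildAlias, and sampleAlias are assumptions, and it expects a non-empty list of non-negative weights with a positive sum. The sampler takes its two uniform variates as arguments, so any PRNG (and any seed) can drive it.

```haskell
import qualified Data.Map.Strict as M

data AliasTable = AliasTable
  { tableSize :: Int
  , tableProb :: M.Map Int Double -- chance of keeping the chosen column
  , tableAlias :: M.Map Int Int -- fallback outcome for each column
  }
  deriving (Eq, Show)

-- Build the table: scale weights so the average is 1, then repeatedly
-- pair an under-full column with an over-full one.
buildAlias :: [Double] -> AliasTable
buildAlias weights = go small large M.empty M.empty
  where
    n = length weights
    total = sum weights
    scaled = [(i, w * fromIntegral n / total) | (i, w) <- zip [0 ..] weights]
    small = filter ((< 1) . snd) scaled
    large = filter ((>= 1) . snd) scaled
    go ((l, ql) : ss) ((g, qg) : ls) probs aliases =
      let qg' = (qg + ql) - 1 -- mass left in g after topping up column l
          probs' = M.insert l ql probs
          aliases' = M.insert l g aliases
       in if qg' < 1
            then go ((g, qg') : ss) ls probs' aliases'
            else go ss ((g, qg') : ls) probs' aliases'
    -- One worklist is empty: remaining columns are full (up to rounding).
    go ss ls probs aliases =
      AliasTable n (foldr (\(i, _) -> M.insert i 1) probs (ss ++ ls)) aliases

-- Sample with a uniform column index in [0, tableSize) and a uniform
-- u in [0, 1).
sampleAlias :: AliasTable -> Int -> Double -> Int
sampleAlias table column u
  | u < M.findWithDefault 1 column (tableProb table) = column
  | otherwise = M.findWithDefault column column (tableAlias table)
```

For weights [0.5, 0.25, 0.25], columns 1 and 2 each keep their own outcome with probability 0.75 and otherwise fall back to outcome 0, which restores the original distribution.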
