ianbollinger / nomegen
a library and utility for randomly generating names
License: Apache License 2.0
It would be nice to have a GUI to go along with the CLI. The question is what toolkit to use. Currently, hsQML seems to be the most portable.
Prerequisites:
Generating the MarkovMap is an expensive operation and is unnecessary if the underlying Nomicon has not changed. Caching the result is currently impossible, however, because the implementation of the data structure that MarkovMap relies on is not exposed and provides no means of binary serialization.
Additionally, a binary format for Nomicon files would allow for faster deserialization, especially since individual names wouldn't need to be parsed into segments. For ease of use, nomegen would need to detect whether the text format had been altered since it was last "compiled".
Nomicon and other data types.
We need a separate configuration type in addition to the Nomicon. It should store: the Nomicon, the size of the prediction context, and the generated MarkovMap.
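As a rough illustration of such a configuration type, here is a minimal sketch. The definitions of Segment, Nomicon, and MarkovMap below are simplified stand-ins invented for this example, not nomegen's actual types, and the names Config, mkConfig, and buildMarkovMap are hypothetical.

```haskell
import Data.List (tails)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Simplified stand-ins for nomegen's real types (assumptions).
type Segment = String

newtype Nomicon = Nomicon { nomiconNames :: [[Segment]] }

-- Maps a context of preceding segments to counts of the segments that follow.
type MarkovMap = Map [Segment] (Map Segment Int)

-- Hypothetical configuration record holding the three items listed above.
data Config = Config
  { configNomicon :: Nomicon
  , configContextSize :: !Int
  , configMarkovMap :: MarkovMap -- left lazy so it is only built on demand
  }

-- All contiguous sublists of exactly n elements.
windows :: Int -> [a] -> [[a]]
windows n xs = takeWhile ((== n) . length) (map (take n) (tails xs))

-- Count, for every context of n segments, which segment follows it.
buildMarkovMap :: Int -> Nomicon -> MarkovMap
buildMarkovMap n (Nomicon names) =
  Map.fromListWith (Map.unionWith (+))
    [ (context, Map.singleton next 1)
    | name <- names
    , window <- windows (n + 1) name
    , (context, [next]) <- [splitAt n window] -- non-matching windows are skipped
    ]

-- Smart constructor that keeps the map consistent with the other two fields.
mkConfig :: Int -> Nomicon -> Config
mkConfig n nomicon = Config nomicon n (buildMarkovMap n nomicon)
```

Building the record through a single constructor function means the context size is no longer a hard-coded constant, and the map can never disagree with the Nomicon it was generated from.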
Currently the value is hard-coded as 2 for "aesthetic" reasons.
Avoid partial functions like (!!) and incomplete pattern matches. Avoiding the latter will eliminate spurious warnings about (hypothetically) impossible matches. Additionally, don't call error when parsing fails; we need more context about the failure to provide a helpful error message anyway.
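One possible shape for this, sketched under assumptions: atMay is a total replacement for (!!), and parse failures are returned as values carrying context rather than raised with error. The ParseError record, its fields, and the parseSegment rules are hypothetical and only illustrate the pattern; they are not nomegen's actual parser.

```haskell
import Data.Char (isSpace)
import Data.Maybe (listToMaybe)

-- Total replacement for (!!): returns Nothing instead of crashing.
atMay :: [a] -> Int -> Maybe a
atMay xs i
  | i < 0 = Nothing
  | otherwise = listToMaybe (drop i xs)

-- Hypothetical error type that records where and why parsing failed.
data ParseError = ParseError
  { errorLine :: !Int
  , errorInput :: String
  , errorReason :: String
  } deriving (Eq, Show)

-- Hypothetical rule: a segment is a non-empty run of non-space characters.
parseSegment :: Int -> String -> Either ParseError String
parseSegment line input
  | null input = Left (ParseError line input "empty segment")
  | any isSpace input = Left (ParseError line input "segment contains whitespace")
  | otherwise = Right input
```

Because the failure is an ordinary value, the caller can decide how to report it, for example by printing the line number alongside the offending input.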
markovGenerate
countSegments
windows
But should this be the default behavior?
Currently nomegen generates names that are as long as possible instead of potentially terminating on a suffix.
Currently we do no Unicode normalization on either names or segments, which means that seemingly identical characters could fail to parse unexpectedly.
countSegments, windows, markovGenerate.
Even YAML is too verbose a format and has weird quirks. Namely, having to quote the letters "n" and "y" for segments is confusing, and forgetting to do so currently yields a meaningless error message. Eliminating yaml as a dependency may also eliminate a few transitive dependencies.
Currently the generator merely selects a random segment to start with and then terminates after n segments are generated. The results are sloppy and we should instead always select the initial node in the Markov chain and terminate on the final node.
For instance, we could wrap each Segment in the MarkovMap in a data type like this:
data Component a
  = Initial
  | Medial !a
  | Final
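To show how generation could start at the initial node and stop at the final node, here is a minimal sketch using the Component type above. It makes several assumptions: the chain is order-1 for brevity, Chain, train, walk, and pickFirst are hypothetical names, and the picker argument stands in for the PRNG (a real implementation would thread random state through it).

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Maybe (listToMaybe)

data Component a
  = Initial
  | Medial !a
  | Final
  deriving (Eq, Ord, Show)

-- Order-1 chain for brevity: each node maps to its observed successors.
type Chain a = Map (Component a) [Component a]

-- Bracket every training name with Initial and Final before counting transitions.
train :: Ord a => [[a]] -> Chain a
train names =
  Map.fromListWith (++)
    [ (from, [to])
    | name <- names
    , let path = Initial : map Medial name ++ [Final]
    , (from, to) <- zip path (drop 1 path)
    ]

-- Walk from Initial until Final is chosen. The fuel argument bounds the
-- length so that cycles in the chain cannot loop forever.
walk :: Ord a => ([Component a] -> Maybe (Component a)) -> Int -> Chain a -> [a]
walk pick fuel chain = go fuel Initial
  where
    go n current
      | n <= 0 = []
      | otherwise =
          case Map.lookup current chain >>= pick of
            Just (Medial x) -> x : go (n - 1) (Medial x)
            _ -> [] -- Final reached, or no successor known

-- Deterministic placeholder picker; a random choice would go here.
pickFirst :: [Component a] -> Maybe (Component a)
pickFirst = listToMaybe
```

With this layout, a generated name always begins with a segment that was seen at the start of some training name, and it ends either when Final is selected or when the fuel runs out.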
The utility isn't too useful on its own without provided data to train it on.
There should at least be a backdoor to seed the PRNG. Additionally, it would be nice if the usage of the PRNG were modular and could be swapped out for another. Currently, mwc-random must be used for its generation of variates in a categorical distribution; however, we can't serialize the tables it generates. Thus it may have to be replaced anyway. Using something like Vose's Alias Method for generating variates seems optimal.
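For reference, a compact sketch of Vose's Alias Method follows. It is an illustration under assumptions, not a drop-in replacement for mwc-random: AliasTable, buildAlias, and sampleAlias are hypothetical names, and the sampler takes two uniform variates in [0, 1) as plain arguments so that any PRNG (seeded or not) could supply them. The table is made of ordinary arrays, so it derives Show and Read and could be given a binary instance.

```haskell
import Data.Array (Array, array, bounds, (!))
import Data.List (partition)

-- Plain arrays, so the table can be serialized.
data AliasTable = AliasTable
  { aliasProb :: Array Int Double -- probability of keeping the chosen column
  , aliasIndex :: Array Int Int   -- outcome used otherwise
  } deriving (Eq, Show, Read)

-- Build the table from non-negative weights; Nothing if the input is invalid.
buildAlias :: [Double] -> Maybe AliasTable
buildAlias weights
  | null weights || any (< 0) weights || total <= 0 = Nothing
  | otherwise =
      Just (AliasTable
        (array (0, n - 1) [(i, p) | (i, (p, _)) <- entries])
        (array (0, n - 1) [(i, a) | (i, (_, a)) <- entries]))
  where
    n = length weights
    total = sum weights
    -- Scale the weights so that their average is exactly 1.
    scaled = zip [0 ..] (map (\w -> w * fromIntegral n / total) weights)
    (small, large) = partition ((< 1) . snd) scaled
    entries = go small large

    go :: [(Int, Double)] -> [(Int, Double)] -> [(Int, (Double, Int))]
    go ((s, ps) : ss) ((l, pl) : ls) =
      let pl' = pl + ps - 1 -- the large entry donates mass to fill column s
          rest
            | pl' < 1 = go ((l, pl') : ss) ls
            | otherwise = go ss ((l, pl') : ls)
      in (s, (ps, l)) : rest
    go ss ls = [(i, (1, i)) | (i, _) <- ss ++ ls] -- leftovers fill their own column

-- Draw an outcome from two uniform variates u1, u2 in [0, 1).
sampleAlias :: AliasTable -> Double -> Double -> Int
sampleAlias (AliasTable probs aliases) u1 u2
  | u2 < probs ! i = i
  | otherwise = aliases ! i
  where
    (lo, hi) = bounds probs
    i = max lo (min hi (lo + floor (u1 * fromIntegral (hi - lo + 1))))
```

Construction is linear in the number of outcomes and each draw takes constant time, which is the usual argument for the method when the same distribution is sampled many times.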
The source code is currently littered with TODO comments that should be tracked here.