ogallagher / arcaeca2-lang-stats
Scripts for analyzing languages
An environment without a Node.js interpreter installed should be able to run the webpage in a browser over the file://
protocol, by replacing the network file fetch with a file upload widget.
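One way this could look, as a rough sketch rather than the project's actual code: a single loader that accepts either a URL (when the page is served over HTTP) or a file picked by the user (when it is opened over file://). The helper name `loadText` and the element id `corpus-upload` are assumptions made for illustration.

```javascript
// Hypothetical helper: resolve input text from either a URL string
// (network fetch) or a File/Blob chosen through an upload widget.
async function loadText(source) {
  if (typeof source === 'string') {
    const response = await fetch(source);
    if (!response.ok) throw new Error(`fetch failed: ${response.status}`);
    return response.text();
  }
  // File extends Blob, and Blob.prototype.text() resolves to a string.
  return source.text();
}

// Browser-only wiring. The element id "corpus-upload" is an assumption,
// standing in for an <input type="file"> on the page.
if (typeof document !== 'undefined') {
  const input = document.getElementById('corpus-upload');
  if (input) {
    input.addEventListener('change', async () => {
      const file = input.files[0];
      if (file) {
        const text = await loadText(file);
        console.log(`${text.length} characters loaded from ${file.name}`);
      }
    });
  }
}
```

Because the upload path never touches the network, it is not blocked by the restrictions browsers place on fetching from file:// pages.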
Isn't this condition backwards?
arcaeca2-lang-stats/FindHoles.js
Lines 108 to 112 in 9d99885
I think we instead want:
if (top > X && bottom > X && pattern < X) {
arcaeca2-lang-stats/FindHoles.js
Lines 257 to 269 in 0351957
Corresponding failing test:
arcaeca2-lang-stats/test/FindHoles.js
Lines 150 to 157 in 0351957
9 passing (17ms)
1 failing
1) FindHoles
#expandCategories
handles non trivial patterns:
AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
1 !== 6
actual expected
1 6
Capital letters A, B, C each represent a phoneme class.
Pattern ABC represents all strings made of an instance of each phoneme class/category, in that order.
For example, if a1 is an instance of A and b1 is an instance of B, then a1b1 is an instance of AB.
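The pattern definition above amounts to a Cartesian product over the categories. Here is a minimal sketch of that expansion; the function name `expandPattern` and the `categories` table are hypothetical and not taken from FindHoles.js.

```javascript
// Hypothetical category table: each capital letter maps to its phonemes.
const categories = {
  A: ['a1', 'a2'],
  B: ['b1', 'b2'],
  C: ['c1'],
};

// Expand a pattern such as "ABC" into every concrete phoneme sequence.
// Sequences are kept as arrays so multi-character phonemes stay intact.
function expandPattern(pattern, categories) {
  let sequences = [[]]; // start with a single empty sequence
  for (const letter of pattern) {
    const members = categories[letter];
    if (!members) throw new Error(`unknown category: ${letter}`);
    // Extend every partial sequence with every member of this category.
    sequences = sequences.flatMap((seq) => members.map((m) => [...seq, m]));
  }
  return sequences;
}

const instances = expandPattern('ABC', categories);
console.log(instances.length); // 2 * 2 * 1 = 4 sequences
console.log(instances[0].join('')); // "a1b1c1"
```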
Currently, we define a 3-phoneme hole anomaly ABC as a case where the number of occurrences (count) of the full pattern is 0, but the counts of both sub-patterns AB and BC are greater than 0. A hole/valley is an unusually low count, and a hill/peak is an unusually high count.
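The current definition can be written as a small predicate. This is only a sketch: the name `isHole` and the layout of the `counts` map (keys are phonemes joined with a space) are assumptions, not the structure used in FindHoles.js.

```javascript
// Hypothetical count lookup: keys are phonemes joined by a single space.
const keyOf = (...phonemes) => phonemes.join(' ');

// True when the full sequence never occurs but both halves do.
function isHole(counts, a, b, c) {
  const whole = counts.get(keyOf(a, b, c)) ?? 0;
  const start = counts.get(keyOf(a, b)) ?? 0;
  const end = counts.get(keyOf(b, c)) ?? 0;
  return whole === 0 && start > 0 && end > 0;
}

// Illustrative counts only, not data from the project.
const counts = new Map([
  [keyOf('a1', 'b1'), 4],
  [keyOf('b1', 'c1'), 7],
  [keyOf('a2', 'b1'), 3],
  [keyOf('a2', 'b1', 'c1'), 2],
]);

console.log(isHole(counts, 'a1', 'b1', 'c1')); // true: a1b1c1 never occurs
console.log(isHole(counts, 'a2', 'b1', 'c1')); // false: a2b1c1 occurs twice
```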
arcaeca2-lang-stats/FindHoles.js
Line 118 in a82201a
Could we instead define valleys and peaks as statistical outliers, by taking an average across all instances of a pattern and then looking for instances that deviate from it?
Start by expressing the frequency of ABC relative to AB + BC as a ratio: ABC / (AB + BC).
Say the average ABC / (AB + BC) occurrence ratio is 0.25, with a standard deviation of 0.1.
That means you should look for any individual character sequence within the ABC pattern whose ratio is less than 0.15 (valley) or greater than 0.35 (peak).
ABC / (AB + BC) for any one instance of the ABC pattern would be, for example:
(count of a1b1c1 occurrences) / ((count of a1b1) + (count of b1c1))
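The outlier idea could be sketched as follows. The function names, the use of the population standard deviation, and the default cutoff of one standard deviation (matching the 0.25 ± 0.1 example) are assumptions made for illustration.

```javascript
// Ratio ABC / (AB + BC) for one concrete sequence.
// countOf is a hypothetical lookup from a joined string to its count.
// Returns null when AB + BC is 0, since the ratio is undefined there.
function holeRatio(countOf, [a, b, c]) {
  const denominator = countOf(a + b) + countOf(b + c);
  return denominator === 0 ? null : countOf(a + b + c) / denominator;
}

// Label each sequence by comparing its ratio with mean ± k standard deviations.
function classifyRatios(ratios, k = 1) {
  const values = [...ratios.values()];
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  // Population variance (divide by n); a sample variance would divide by n - 1.
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  const sd = Math.sqrt(variance);

  const labels = new Map();
  for (const [sequence, value] of ratios) {
    if (value < mean - k * sd) labels.set(sequence, 'valley');
    else if (value > mean + k * sd) labels.set(sequence, 'peak');
    else labels.set(sequence, 'typical');
  }
  return labels;
}
```

Sequences whose ratio is null would need to be filtered out before calling `classifyRatios`, and a larger `k` makes the test stricter.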
As of now, this issue is about half resolved.
A single phoneme can be more than one character. For example, ch and sh could be English phonemes of the consonant category.
Here, the code is assuming that the 3-phoneme string is only 3 characters:
arcaeca2-lang-stats/FindHoles.js
Lines 113 to 115 in a82201a
See proof of corresponding logs:
whole start end: a1a1a1 a1 1a1a1 | 0 undefined undefined
whole start end: a1a1a2 a1 1a1a2 | 0 undefined undefined
whole start end: a1a1a3 a1 1a1a3 | 0 undefined undefined
...
The third row in the above logs should instead be a1a1a3 a1a1 a1a3.
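A possible fix is to keep each 3-phoneme sequence as an array of phonemes and build the sub-pattern keys from array elements rather than character offsets. The sketch below uses a hypothetical function name, `splitTrigram`. The logged values a1 and 1a1a3 are what character-based slicing such as slice(0, 2) and slice(1) would produce for a1a1a3, though the referenced lines are not shown here.

```javascript
// Build the whole/start/end keys from a sequence of exactly three phonemes.
// Each phoneme may be any number of characters long.
function splitTrigram(phonemes) {
  if (phonemes.length !== 3) {
    throw new Error(`expected 3 phonemes, got ${phonemes.length}`);
  }
  const [a, b, c] = phonemes;
  return {
    whole: a + b + c,
    start: a + b, // first two phonemes
    end: b + c, // last two phonemes
  };
}

const { whole, start, end } = splitTrigram(['a1', 'a1', 'a3']);
console.log('whole start end:', whole, start, end);
// whole start end: a1a1a3 a1a1 a1a3
```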