arcaeca2-lang-stats's People

Contributors

ogallagher

arcaeca2-lang-stats's Issues

Make code browser compatible

  1. Create parent webpage.
  2. Replace node file system calls with networked file import.
  3. Replace module syntax where needed.

Condition for 3-phoneme hole

Isn't this condition backwards?

// if "AB" occurs, and "BC" occurs, but "ABC" doesn't
if ( (tTop[sTop] < 2) && (tBottom[sBottom] < 2) && (tPattern[sKey] > 2) ) {
// I'm not really sure what a good metric would be for deciding if a combination "occurs" or not
tOut.push(sKey)
}

I think we instead want:

if (top > X && bottom > X && pattern < X) {
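
A minimal sketch of the proposed condition, assuming missing table entries should count as 0. The helper name `isHole` and the default threshold of 2 are hypothetical (the 2 is taken only from the literal in the current code); what counts as "occurs" is still an open question.

```javascript
// Hypothetical helper: true when "AB" and "BC" both occur but "ABC" does not.
// x is the assumed cutoff for deciding whether a combination "occurs".
function isHole(topCount, bottomCount, patternCount, x = 2) {
  // a sequence that is absent from its table is treated as a count of 0
  const top = topCount ?? 0
  const bottom = bottomCount ?? 0
  const pattern = patternCount ?? 0
  return top > x && bottom > x && pattern < x
}

console.log(isHole(5, 3, 0)) // true: both halves occur, the whole does not
console.log(isHole(1, 3, 0)) // false: "AB" itself is too rare
console.log(isHole(5, 3, 4)) // false: "ABC" occurs
```

With strict comparisons on both sides, a count exactly equal to X is treated as neither occurring nor missing; whether that is desirable depends on the metric chosen.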

expandCategories always returns single phoneme sequence

let output = [[]]
// TODO this method currently overwrites the output list for every subsequent code from the pattern
for (let i = 0, ch = inputString.charAt(0); i < inputString.length; i++, ch = inputString.charAt(i)) {
// input char is a category
if (categories[ch] !== undefined) {
output = multiply(output, categories[ch])
}
// input char is anything else
else {
// for every existing item in out, add this char
output.forEach((op) => op.push(ch))
}
}

Corresponding failing test:

it('handles non trivial patterns', function() {
let sequences = expandCategories('AB', categories)
assert.strictEqual(
sequences.length,
categories['A'].length * categories['B'].length
)
})
})

  9 passing (17ms)
  1 failing

  1) FindHoles
       #expandCategories
         handles non trivial patterns:

      AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:

1 !== 6
      
      actual expected
      1      6
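
One possible fix, sketched under the assumption that the intended result is the Cartesian product of the categories named in the pattern. It does not use the existing `multiply` helper, whose behavior is not shown here, and the `exampleCategories` object is hypothetical, sized so that 'AB' yields 6 sequences as in the failing test.

```javascript
// Sketch: expand a pattern such as 'AB' into every phoneme sequence it matches.
// Each sequence is an array of phonemes, so multi-character phonemes stay intact.
function expandCategories(inputString, categories) {
  let output = [[]]
  for (const ch of inputString) {
    // a category contributes each of its members; any other char is a literal
    const options = categories[ch] !== undefined ? categories[ch] : [ch]
    const next = []
    for (const prefix of output) {
      for (const option of options) {
        // copy the prefix so that sequences never share an array
        next.push([...prefix, option])
      }
    }
    output = next
  }
  return output
}

// Hypothetical categories: 2 members in A, 3 members in B.
const exampleCategories = { A: ['a1', 'a2'], B: ['b1', 'b2', 'b3'] }
console.log(expandCategories('AB', exampleCategories).length) // 6
```

Building a fresh `next` list on every step avoids mutating arrays that are still being iterated, at the cost of some extra allocation.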

Statistical definition of 3-phoneme anomaly

Capital letters A B C each represent a phoneme class.
Pattern ABC represents all strings made of an instance of each phoneme class/category, in that order.
For example, if a1 is an instance of A and b1 is an instance of B, then a1b1 is an instance of AB.

Currently, we define a 3-phoneme hole anomaly as a pattern ABC whose full-pattern number of occurrences (count) is 0, while the counts of the sub-patterns AB and BC are each greater than 0. More generally, a hole/valley is an unusually low count, and a hill/peak is an unusually high count.

if ( (tTop[sTop] >= 1) && (tBottom[sBottom] >= 1) && (tPattern[sKey] < 1) ) {

Could we instead define valleys and peaks as statistical outliers relative to the average?

Start by expressing the frequency of ABC relative to AB and BC as a ratio: ABC / (AB + BC).

Say the average ABC / (AB + BC) occurrence ratio is 0.25, with a standard deviation of 0.1.
Using a cutoff of one standard deviation, that means you should look for any individual char sequence within the ABC pattern whose ratio is less than 0.15 (valley) or greater than 0.35 (peak).

ABC / (AB + BC) for any one instance of the ABC pattern would be, for example:

(count of a1b1c1 occurrences) / ((count of a1b1) + (count of b1c1))
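
A rough sketch of this outlier definition. The function names, the shape of the `ratios` object (sequence key mapped to its ratio), and the cutoff multiplier `k` are all assumptions for illustration; the population standard deviation is used here, which is one choice among several.

```javascript
// Ratio ABC / (AB + BC) for one instance; null when neither half occurs.
function occurrenceRatio(patternCount, topCount, bottomCount) {
  const denominator = topCount + bottomCount
  return denominator === 0 ? null : patternCount / denominator
}

// ratios: hypothetical object mapping a sequence key to its ratio.
// k: how many standard deviations from the mean count as an outlier.
function classifyRatios(ratios, k = 1) {
  const entries = Object.entries(ratios).filter(([, r]) => r !== null)
  if (entries.length === 0) {
    return { mean: NaN, stdDev: NaN, valleys: [], peaks: [] }
  }
  const mean = entries.reduce((sum, [, r]) => sum + r, 0) / entries.length
  const variance =
    entries.reduce((sum, [, r]) => sum + (r - mean) ** 2, 0) / entries.length
  const stdDev = Math.sqrt(variance)
  return {
    mean,
    stdDev,
    // unusually low ratio: valley (hole)
    valleys: entries.filter(([, r]) => r < mean - k * stdDev).map(([key]) => key),
    // unusually high ratio: peak (hill)
    peaks: entries.filter(([, r]) => r > mean + k * stdDev).map(([key]) => key),
  }
}

// Made-up ratios, only to show the call shape.
const result = classifyRatios({ x: 0.25, y: 0.25, z: 0.25, w: 0.25, lo: 0, hi: 0.5 })
console.log(result.valleys, result.peaks) // [ 'lo' ] [ 'hi' ]
```

Instances where both AB and BC have a count of 0 are skipped rather than scored, since the ratio is undefined for them.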

FindHoles start/top and end/bottom sequences are incorrect

A single phoneme can be more than one character. For example: ch and sh could be English phonemes of the consonant category.

Here, the code is assuming that the 3-phoneme string is only 3 characters:

let sTop = sKey.substring(0,2) // the corresponding "AB"
let sBottom = sKey.substring(1) // the corresponding "BC"
console.log(`whole start end:\t${sKey}\t${sTop}\t${sBottom}\t|\t${tPattern[sKey]}\t${tTop[sTop]}\t${tBottom[sBottom]}`)

The corresponding logs confirm this:

whole start end:        a1a1a1  a1      1a1a1   |       0       undefined       undefined
whole start end:        a1a1a2  a1      1a1a2   |       0       undefined       undefined
whole start end:        a1a1a3  a1      1a1a3   |       0       undefined       undefined
...

The third row in the above logs should instead be a1a1a3 a1a1 a1a3.
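
A possible approach, assuming each sequence is kept as an array of phonemes instead of a pre-joined string: slice by phoneme and join afterwards. The helper name `splitSequence` is hypothetical.

```javascript
// seq is an array of three phonemes, e.g. ['a1', 'a1', 'a3'].
function splitSequence(seq) {
  return {
    sKey: seq.join(''),                // the whole "ABC"
    sTop: seq.slice(0, 2).join(''),    // the corresponding "AB"
    sBottom: seq.slice(1).join(''),    // the corresponding "BC"
  }
}

console.log(splitSequence(['a1', 'a1', 'a3']))
// { sKey: 'a1a1a3', sTop: 'a1a1', sBottom: 'a1a3' }
```

Slicing the array sidesteps any assumption about phoneme length, so ch and sh would be handled the same way as single-character phonemes.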
