hlda's Issues

Experiment with Different Gamma

With the other parameters unchanged:

 <GAM   0.2      0.2>
 DEPTH 3
 ETA    3.2       0.025      0.0005
 GEM    0.1     100
 SCALING      0.2      50
 SAMPLE_ETA    0
 SAMPLE_GEM    0


 Path  34/11/9/6/5/5/5/4/4            54
 Word allocation    2116/437/45
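
For intuition about GAM: in the nested CRP, gamma is the concentration parameter controlling how often a document starts a new branch instead of following an existing one, which is what shapes the Path line above (which appears to list documents per path, with the trailing figure the total number of paths). A minimal sketch of the CRP choice step, in the standard formulation rather than the hlda source itself:

    import random

    def crp_choose(branch_counts, gamma):
        """One CRP step: join an existing branch with probability proportional
        to its document count, or open a new branch with probability
        proportional to gamma."""
        total = sum(branch_counts) + gamma
        r = random.uniform(0, total)
        for i, n in enumerate(branch_counts):
            r -= n
            if r < 0:
                return i                 # join existing branch i
        return len(branch_counts)        # open a new branch

    # Larger gamma -> new branches open more often -> a wider tree.
    random.seed(0)
    counts = [1]
    for _ in range(146):                 # remaining documents after the first
        k = crp_choose(counts, gamma=0.2)
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
    print(sorted(counts, reverse=True))  # branch sizes, largest first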

Sampling with Different Initial Values

The hyper-hyperparameters' influence on results for a limited number of iterations:

Previously, we always assumed that as long as we turned on sampling for GEM and ETA, we would get an ideal word allocation and tree structure, and that the corpus model would naturally end up well optimized. In practice the results differ considerably; the main cause is the very limited number of iterations.

ETA  5.2    0.025    0.005
GEM    0.4  100
SAMPLE_ETA 1
SAMPLE_GEM 1
word allocation:         1399/526/673
Path      1      1     1      1      1      1     1    1     1

Final score and sampling results:
Score  58359    
ETA    0.67    1.458    1.459  
GEM_MEAN   0.57
GEM_SCALE   7.17
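
A quick way to check whether the iteration budget is the bottleneck is the trend of the score over the last stretch of the run. A minimal sketch, assuming the per-iteration scores have been collected into a list (how they are logged depends on the run setup):

    def tail_drift(scores, frac=0.2):
        """Average per-iteration change of the score over the last `frac`
        of the run; a clearly nonzero value suggests the chain was still
        moving when it stopped."""
        tail = scores[-max(2, int(len(scores) * frac)):]
        return (tail[-1] - tail[0]) / (len(tail) - 1)

    # If the drift is still visibly positive, neither the sampled ETA/GEM
    # values nor the reported mode should be trusted yet.
    scores = [57000, 57900, 58200, 58300, 58359]   # toy trace
    print(tail_drift(scores))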

Experiment with Different Pi for Word Allocation

Focus on the Pi parameter's Influence on Word Allocation

The Pi parameter (the second GEM value, GEM_SCALE) controls how tightly a document's level proportions concentrate around the mean parameter m (GEM_MEAN), and therefore how strongly m influences the word allocation; see the sketch below. In the runs that follow, the bracketed value is pi.
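
For reference, GEM(m, pi) draws a document's level proportions by stick-breaking. A minimal sketch using the common Beta(m*pi, (1-m)*pi) parameterization; the hlda code may orient m differently, but the role of pi is the same either way:

    import random

    def gem_levels(m, pi, depth=3):
        """Stick-breaking draw of per-level proportions for a depth-3 tree:
        each break point is Beta(m*pi, (1-m)*pi); the last level takes the
        remaining stick."""
        theta, stick = [], 1.0
        for _ in range(depth - 1):
            v = random.betavariate(m * pi, (1 - m) * pi)
            theta.append(stick * v)
            stick *= 1 - v
        theta.append(stick)
        return theta

    # Large pi: every document's proportions hug the mean; small pi: they
    # vary wildly from document to document.
    random.seed(1)
    for pi in (10, 300, 2000):
        print(pi, [round(t, 2) for t in gem_levels(m=0.4, pi=pi)])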

ETA 5.2   0.025  0.005
GEM 0.4  [300]
word allocation      1327/774/497   
Path   13  10   8   4   4   4   4 

ETA 5.2   0.025    0.005
GEM 0.4  [500]
word allocation      1369/740/489
Path  14   9   6   5    4    4    4  

ETA 5.2  0.025     0.005
GEM 0.4   [10]
word allocation      1230/838/530
Path    12    10    6    5   4   4   4

ETA 5.2    0.025    0.005
GEM 0.4    [2000]
word allocation         1338/762/498
Path    14     7      7    6    5     4    4

Same GEM_MEAN and GEM_SCALE settings with other parameters varied

Comparison of word allocations across different iteration counts:

  1. Experiments with GEM_MEAN and GEM_SCALE
     GEM_MEAN 0.5, GEM_SCALE 100.

     Level   Iter 10000   Iter 30000   Iter 50000   Iter 80000
     0       1217         1152         1131         1175
     1       666          757          733          718
     2       715          689          734          705

The table above shows the different iteration counts.

  2. Experiments with ETA
     Not only do GEM_MEAN and GEM_SCALE influence the word allocation; the ETA setting also has a great impact on the allocation across levels. The number of iterations, by contrast, seems to have little influence on the word allocation.

     GEM_MEAN 0.4, GEM_SCALE 100.

     ETA                     Allocation
     3.2   0.025   0.005     1357/485/756
     1.2   0.025   0.005     1403/722/473
     5.2   0.025   0.005     1344/780/474

The table above shows the different ETA settings.

  3. Possible reasons for the missing mode.levels files are explored in the sections below.
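
Returning to item 2: ETA is the per-level symmetric Dirichlet smoothing on the topic-word distributions. A minimal sketch of the smoothed word probability a collapsed Gibbs sampler would use (the notation is assumed, not taken from the hlda source):

    def word_prob(n_topic_word, n_topic_total, eta, vocab_size):
        """Posterior predictive probability of one word under one topic with
        a symmetric Dirichlet(eta) prior on the topic's word distribution."""
        return (n_topic_word + eta) / (n_topic_total + eta * vocab_size)

    # A large eta (e.g. 5.2 at level 0) flattens the topic; a tiny eta
    # (e.g. 0.005 at level 2) leaves it close to the raw counts, making the
    # leaf topics sparse and specialized. This shifts words between levels.
    print(word_prob(3, 50, eta=5.2, vocab_size=1000))    # approx 0.0016
    print(word_prob(3, 50, eta=0.005, vocab_size=1000))  # approx 0.0546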

Experiment with GEM

GEM 0.1

 ETA   3.2        0.025      0.0005
 GAM   1.0        1.0
 SCALING_SHAPE     1.0
 SCALING_SCALE      50
 Path    15/12/8/7/5/5/5/4/4/3                                                 65
 Word  Allocation                   2117/447/34

GEM 0.4

 Path     12/11/10/8/8/7/7/6/6/6/6/5/4                                       50
 Word  Allocation                   1423/757/418

GEM 0.7

 Path      14/12/12/10/6/6/6/6/6/6/5/4                                         39
 Word  Allocation                  1146/703/749
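
Under the Beta(m*pi, (1-m)*pi) convention, the expected level proportions depend only on m, so the three GEM means above imply very different splits. A quick check (the allocations above move mass deeper as GEM grows, suggesting the implementation orients m the other way, which simply flips the ordering):

    def expected_levels(m, depth=3):
        """E[theta_k] = m * (1 - m)**k for the broken levels; the last level
        keeps the expected remainder (1 - m)**(depth - 1)."""
        theta = [m * (1 - m) ** k for k in range(depth - 1)]
        theta.append((1 - m) ** (depth - 1))
        return theta

    for m in (0.1, 0.4, 0.7):
        print(m, [round(t, 3) for t in expected_levels(m)])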

How the ETA and GEM parameters influence the word allocation and mode.levels files

As Issue #2 mentioned, for a fixed ETA setting, decreasing the GEM_MEAN value makes the mode.levels file go missing. However, when we increase the ETA value, mode.levels appears again. There is therefore some interaction between the two parameters in their effect on the final word allocation and tree structure; a sketch of where they meet in the sampler follows the table.
ETA 5.2 0.25 0.005

  GEM_MEAN    mode.levels?    Word allocation
  0.15        YES             2271/291/36
  0.25        YES             1640/687/271
  0.75        YES             786/650/1162
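
Where the two parameters meet: when a word's level is resampled, a GEM term built from the document's current level counts multiplies an ETA-smoothed topic term, so either parameter can tip the allocation. A minimal sketch under the same Beta(m*pi, (1-m)*pi) convention as above; the real sampler differs in details:

    def level_weights(doc_counts, m, pi):
        """Expected stick proportions per level given the document's current
        level counts: each break's posterior mean is
        (m*pi + n_k) / (pi + n_k + n_deeper)."""
        weights, stick = [], 1.0
        for k, n_k in enumerate(doc_counts):
            if k == len(doc_counts) - 1:
                weights.append(stick)        # last level takes the remainder
                break
            n_deeper = sum(doc_counts[k + 1:])
            v = (m * pi + n_k) / (pi + n_k + n_deeper)
            weights.append(stick * v)
            stick *= 1 - v
        return weights

    def level_scores(doc_counts, m, pi, word_counts, totals, etas, vocab):
        """Unnormalized posterior over levels for one word: GEM prior term
        times the Dirichlet(eta)-smoothed topic term at each level."""
        gem = level_weights(doc_counts, m, pi)
        return [g * (word_counts[k] + etas[k]) / (totals[k] + etas[k] * vocab)
                for k, g in enumerate(gem)]

    # Example: a document with most words at level 0 still sends this word
    # deeper when the leaf topic fits it much better.
    print(level_scores([30, 10, 5], m=0.25, pi=100,
                       word_counts=[2, 1, 8], totals=[1640, 687, 271],
                       etas=[5.2, 0.25, 0.005], vocab=1000))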

ETA's influence on topic numbers and doc_no/word_no at every level

Experiment on ETA's influence on the number of topics at each level

GEM_MEAN 0.4 GEM_SCALE 100

ETA 5.2   0.025  0.05
Word Allocation       1267/749/582   
level            topics          doc_no/word_no per topic (largest first)
0                 1                         147/1267
1                 9                         135/728        5/9      1/2    1/2  1/1
2                 44                        48/222   23/110   13/43    12/65    6/19 

ETA  5.2  0.025   0.005
Word Allocation       1288/746/564
level            topics          doc_no/word_no per topic (largest first)
0                  1                     147/1288
1                  12                   136/726         1/4     1/0   1/1   1/2
2                  49                    30/110     16/75    13/47   8/42   7/28   7/20 


ETA 5.2  0.025    0.5
Word Allocation       1335/901/362
level            topics          doc_no/word_no per topic (largest first)
0                  1                      147/1335
1                  13                     134/877   2/12   1/0   1/4   1/2
2                  93                     6/23    6/17    6/17   4/10   4/13   4/11   

ETA 5.2   0.025    0.5
Word Allocation      1342/914/342
level            topics          doc_no/word_no per topic (largest first)
0                  1                      147/1342
1                  11                     136/882      2/6      1/5    1/4   1/3   1/0
2                  103                    6/9    4/10     4/7    3/15  3/11   3/14
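
A plausible place to look for the mechanism behind the level-2 topic counts: in the nCRP path sampling step, a document's words at a given level score against each candidate topic through a Dirichlet-multinomial likelihood, and the leaf ETA sits inside that expression, changing how sparsely populated or empty topics compare with well-fitted ones. A minimal sketch of that likelihood (standard form, not lifted from the hlda source):

    from math import lgamma

    def dm_loglik(doc_counts, topic_counts, eta, vocab_size):
        """Log Dirichlet-multinomial likelihood of a document's word counts
        under a topic with existing counts and symmetric Dirichlet(eta)
        smoothing."""
        n_topic = sum(topic_counts.values())
        n_doc = sum(doc_counts.values())
        ll = (lgamma(n_topic + eta * vocab_size)
              - lgamma(n_topic + n_doc + eta * vocab_size))
        for w, c in doc_counts.items():
            n_w = topic_counts.get(w, 0)
            ll += lgamma(n_w + c + eta) - lgamma(n_w + eta)
        return ll

    # Compare an empty candidate topic against a fitted one at two leaf ETAs.
    doc = {"gene": 3, "dna": 2}
    fitted = {"gene": 40, "dna": 25, "protein": 30}
    for eta in (0.005, 0.5):
        print(eta, round(dm_loglik(doc, {}, eta, 1000), 2),
              round(dm_loglik(doc, fitted, eta, 1000), 2))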

Relation between GEM_MEAN, GEM_SCALE and the mode.levels file

Different ETA with the Same GEM_MEAN and GEM_SCALE

For a long time I assumed that the ETA setting was the reason the mode.levels file disappears. However, the following experiments disproved that assumption.

 ETA                        mode.levels?
 1.2    0.025    0.0005     YES
 1.2    0.025    0.005      YES
 0.2    0.025    0.005      YES
 0.2    0.25     0.005      YES
 2.2    0.25     0.005      YES

All of the above use GEM_MEAN 0.5 and GEM_SCALE 100.

When we use the same ETA parameters and different GEM_MEAN values:

With ETA set to 2.2 0.25 0.005:

 GEM_MEAN   mode.levels?
 0.15       NO
 0.25       NO
 0.35       YES
 0.5        YES
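
For checking these runs mechanically: the word-allocation triples reported throughout can be tallied from mode.levels when it exists. A minimal reader sketch, assuming each line holds one document's space-separated per-word level assignments (the actual file format is an assumption):

    from collections import Counter

    def word_allocation(path="mode.levels"):
        """Tally word tokens per tree level, assuming one document per line
        with space-separated integer level assignments (format assumed)."""
        counts = Counter()
        with open(path) as f:
            for line in f:
                counts.update(int(tok) for tok in line.split())
        return [counts[k] for k in sorted(counts)]

    # Would print e.g. [1403, 722, 473], matching the triples reported above.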

Sampling with Different SCALING_SHAPE and SCALING_SCALE

Different SCALING parameters (the Gamma prior)

Previously we always tried sampling GEM, ETA, and GAM while keeping the other parameters unchanged, and we always found that the top score and mode were almost the same. So we tried changing the SCALING parameters:

 ETA 0.2      2.5        0.5
 GAM    1.0     1.0
 GEM_MEAN   0.1
 GEM_SCALE      100
 SCALING_SHAPE    1.0
 SCALING_SCALE     0.5
 SAMPLE_ETA     1
 SAMPLE_GEM     1

  word allocation  0/0/2598
  Path        114       30        2         1
  Score       -33.9       
  ETA      1.517      1.517          1.097
  GEM_MEAN     1.0
  GEM_SCALE       6.967

Another Comparison with SCALING_SCALE 100

    Iter 126
    ETA 0.2      2.5     0.5
    SAMPLE     ETA     1
    SAMPLE     GEM     1
    SCALING_SHAPE   1.0
    SCALING_SCALE     100
    word allocation         1406/526/666 
    Path          1         1          1           1            1         1           1
    Final Score and Sampling results:
    Score              58710
    ETA        0.7443        1.459           1.456
    GEM        0.573          8.099
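
As used here, SCALING_SHAPE and SCALING_SCALE appear to parameterize the Gamma prior from which GAM is resampled. A minimal sketch under the shape/scale convention (the implementation's exact convention is an assumption):

    import random

    def sample_gam(shape, scale):
        """Draw the nCRP concentration GAM from a Gamma(shape, scale) prior;
        E[GAM] = shape * scale, and larger GAM makes the CRP open new
        branches more readily."""
        return random.gammavariate(shape, scale)

    random.seed(2)
    print([round(sample_gam(1.0, 0.5), 2) for _ in range(5)])   # prior mean 0.5
    print([round(sample_gam(1.0, 100), 1) for _ in range(5)])   # prior mean 100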
