cmu-phil / tetrad

Repository for the Tetrad Project, www.phil.cmu.edu/tetrad.

License: GNU General Public License v2.0

Java 92.56% HTML 6.20% Haskell 0.02% CSS 0.04% Roff 0.03% PostScript 1.11% Fortran 0.05%

tetrad's Issues

Might be a counterexample to the way GFCI is coded; need to revert.

Problem with loading data sets

Dear Experts,

I’m trying to load 12 text files into the data box, and after I press “load all”, the loading log remains blank. After pressing save, I see all 12 tabs, but they all contain the same data (subject 1’s data set, as opposed to each subject’s unique data). I am using version 5.2.1-3 on a Mac. Has anyone else experienced this? I would greatly appreciate any advice.

Thanks,
Eleni

Clean up PerformanceTests so that it can be used more easily.

PerformanceTests is a collection of tests for a variety of algorithms, but the code is a bit of a jumble right now and hard to read, let alone use. Needs to be cleaned up.

Reduce the size of the ejar.

Setting minimize to true for shade resulted in problems with loading the configuration in the Tetrad GUI. However, setting it to false increases the size of the ejar from 16G to 30+G. Need to find a way to split the difference. Maybe leave out specific jars from the build.

Build failure due to test error

Attempting to build the project with tests enabled produces the following error:

Tests in error:
    test8(edu.cmu.tetrad.test.TestStandradizedSem): non symmetric matrix: the difference between entries at (1,2) and (2,1) is larger than 0

This results in a build failure.
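
The message says the entries at (1,2) and (2,1) differ by more than 0, which suggests an exact-equality symmetry check tripping over floating-point rounding. A minimal sketch of a tolerance-based check, assuming a plain `double[][]` matrix; `isSymmetric` is a hypothetical helper for illustration, not part of Tetrad or of any matrix library it uses:

```java
public class SymmetryCheck {

    /**
     * Hypothetical helper: true if the matrix is square and
     * |m[i][j] - m[j][i]| <= tol for every pair of entries.
     */
    public static boolean isSymmetric(double[][] m, double tol) {
        // Every row must have as many entries as there are rows.
        for (double[] row : m) {
            if (row.length != m.length) {
                return false;
            }
        }
        // Compare each entry above the diagonal with its mirror image.
        for (int i = 0; i < m.length; i++) {
            for (int j = i + 1; j < m.length; j++) {
                if (Math.abs(m[i][j] - m[j][i]) > tol) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 0.1 + 0.2 is not exactly 0.3 in double arithmetic.
        double[][] m = {{1.0, 0.1 + 0.2}, {0.3, 1.0}};
        System.out.println("tolerance 0:     " + isSymmetric(m, 0.0));
        System.out.println("tolerance 1e-12: " + isSymmetric(m, 1e-12));
    }
}
```

With a tolerance of 0 the example matrix is rejected; with a small positive tolerance it is accepted. Whether a nonzero tolerance is appropriate for the failing test is a judgment for the maintainers.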

How can I measure program runtime?

Hi,

I am using "tetradcmd-5.1.0-10.jar" on Windows, running it from a batch file over multiple input and output files. I am searching for causal variables in my input data using the FCI algorithm.

How can I measure the program's runtime? I would like to measure the runtime of each search on each input data set, because some input data sets have very complex models.

Thank you,
Sanghoon
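
One way to get per-run timings without changing the jar is to launch each command from a small wrapper and record the wall-clock time around it. A minimal sketch, assuming the command to time is passed as arguments; the class name `RunTimer` and its methods are hypothetical, and the measured time includes JVM startup of the child process:

```java
import java.util.concurrent.TimeUnit;

public class RunTimer {

    /** Runs the task and returns the elapsed wall-clock time in milliseconds. */
    public static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Starts an external command, waits for it to finish, and returns its exit code. */
    static int runCommand(String[] command) {
        try {
            return new ProcessBuilder(command).inheritIO().start().waitFor();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: java RunTimer <command> [arguments...]");
            return;
        }
        // Time one external run, for example one search on one input file.
        long millis = timeMillis(() -> runCommand(args));
        System.out.println("Elapsed: " + millis + " ms");
    }
}
```

Used as `java RunTimer java -jar tetradcmd-5.1.0-10.jar -data input.txt ...`, one call per input file from the batch script, this prints one elapsed time per search.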

Files are in the wrong place for TestTetradCmd--need to move them.

See the TODO in the file.

Change the name of GES in the interface to FGS. Or include both GES and FGS. (Better.)

Go through the unit tests and make sure they all test something. Scale the really complex ones down.

Add a checkbox for FGS in the interface to allow the user to assume one-edge faithfulness if they want.

Currently, the original GES algorithm and the newer FGS algorithm are both in the interface. One feature of FGS is that the user can assume that if X and Y are uncorrelated then X is not adjacent to Y in the graph. Assuming this kind of faithfulness speeds up the search considerably but is not always helpful. So the user should be able to choose whether to assume it or not. We need a switch to let the user decide.
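
The assumption described above can be pictured as a pre-filter: pairs of variables whose sample correlation is (near) zero are never considered as adjacent. A minimal sketch of that idea, assuming continuous data stored column-wise and a fixed cutoff on the absolute Pearson correlation; this is an illustration only, not how FGS is implemented, and `FaithfulnessFilter`, `candidateEdges`, and the `threshold` parameter are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class FaithfulnessFilter {

    /** Sample Pearson correlation of two equally long columns. */
    public static double correlation(double[] x, double[] y) {
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /**
     * Returns the index pairs {i, j} with i < j whose absolute correlation
     * exceeds the threshold. All other pairs are treated as non-adjacent
     * when one-edge faithfulness is assumed.
     */
    public static List<int[]> candidateEdges(double[][] columns, double threshold) {
        List<int[]> edges = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            for (int j = i + 1; j < columns.length; j++) {
                if (Math.abs(correlation(columns[i], columns[j])) > threshold) {
                    edges.add(new int[] {i, j});
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        double[][] columns = {
            {1, 2, 3, 4},    // X
            {2, 4, 6, 8},    // Y, perfectly correlated with X
            {1, -1, -1, 1}   // Z, uncorrelated with X and Y
        };
        for (int[] e : candidateEdges(columns, 0.1)) {
            System.out.println("candidate edge: " + e[0] + " - " + e[1]);
        }
    }
}
```

The filter shrinks the set of edges the search has to score, which is where the speed-up comes from. It can also discard a true edge whose marginal correlation happens to cancel out, which is why the checkbox should leave the choice to the user.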

FOFC doesn't scale with sample size

FOFC isn't scaling with sample size.

With any kind of pure measurement model, FOFC is slow for large N, e.g., N = 10,000.

Matrix Toolkits for Java (MTJ) doesn't play well with 64 bit Windows.

If imports starting with no.uib.cipr.matrix are commented out, the problem code is put in red in IntelliJ. Classes affected are IndTestHsic, KernelUtils, Ling, Lingam.

The compiled classes seem to run OK, but the relevant tests break when run in Maven.

I think we should support 64 bit Windows. If so, need to translate this matrix algebra to a different library.

Joe

manual

I'm a new user of Tetrad V. The program is very impressive, but its usability would benefit greatly from some attention to the manual (new_manual.pdf). To begin with, separate the material into sections and add a table of contents, all with hyperlinks.

Thanks very much.

Error message while running TetradCmd with PC algorithm.

Hi,

I was running TetradCmd in Windows for PC algorithm, but I got an error message. Please see below.

java -jar tetradcmd-5.1.0-10.jar -data input.txt -datatype discrete -algorithm pc -depth -1 -significance 0.01

Exception in thread "main" java.lang.IllegalStateException: No algorithm was specified.
at edu.cmu.tetradapp.TetradCmd.runAlgorithmTetradCmd.java:508
at edu.cmu.tetradapp.TetradCmd.TetradCmd.java:80
at edu.cmu.tetradapp.TetradCmd.mainTetrad.java:945

When I ran it with FCI or other algorithms, I didn't get any error message; only the PC algorithm gives one. Even with the error message, I still got output from PC. When I compared the PC output with the outputs of the other algorithms, they were different, so I think the PC algorithm is running and the error message doesn't seem critical. But I am not sure whether the PC algorithm is working correctly. Could you explain what the error message means and whether I can fix it?

Thank you,
Sanghoon

Fix the Javadoc formatting errors.

Fix Y structure finder in graph subsets gadget.

Doesn't seem to find all and only Y structures.

When multiple data sets are in a data wrapper, save them out to separate files.

The idea here is that when you create several data sets in a DataWrapper and go to save them, you have to save them one at a time. It would be more useful if they could all be saved with one menu command. This would allow simulation facilities in Tetrad to produce data sets useful for other programs.
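
The requested behavior amounts to a loop that writes each data set to its own numbered file. A minimal sketch, assuming each data set is already available as a list of delimited text rows; `SaveAllDataSets`, `saveAll`, and the `prefix_k.txt` naming scheme are hypothetical and do not reflect the DataWrapper API:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SaveAllDataSets {

    /**
     * Writes data set k (1-based) to dir/prefix_k.txt, one row per line,
     * and returns the paths of the files written.
     */
    public static List<Path> saveAll(List<List<String>> dataSets, Path dir, String prefix) {
        try {
            Files.createDirectories(dir);
            List<Path> written = new ArrayList<>();
            for (int k = 0; k < dataSets.size(); k++) {
                Path file = dir.resolve(prefix + "_" + (k + 1) + ".txt");
                Files.write(file, dataSets.get(k), StandardCharsets.UTF_8);
                written.add(file);
            }
            return written;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Two tiny tab-delimited data sets standing in for simulated data.
        List<List<String>> dataSets = Arrays.asList(
                Arrays.asList("X\tY", "1\t0", "0\t1"),
                Arrays.asList("X\tY", "0\t0", "1\t1"));
        Path dir = Files.createTempDirectory("datasets");
        for (Path p : saveAll(dataSets, dir, "data")) {
            System.out.println("wrote " + p);
        }
    }
}
```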

Find a way to run javadocs in Maven.

There was a task in Ant to do this; it's not part of the default Maven deploy. Can it be added?

Generating data sets by different seed number?

When I generate 1000 data sets in the Data box, can I be sure that they are all generated with different seed numbers? Is there no possibility that I would get identical data sets generated from the same seed? This question assumes that I have many causal variables in my graph and a sample size large enough that more than 1000 distinct data sets are possible.

I tried generating 10 data sets with just 1 causal variable and 1 target, and a sample size of 2. As one would expect, many of the data sets (7 to 8 of them) were identical. So I was curious whether TETRAD is programmed to assign a different seed number to each of the 1000 data sets.

Sorry for asking many questions these days, and thank you,
Sanghoon

Pull out unit tests in tetradapp into a separate directory.

The unit tests in tetradapp aren't actually being run. They should go in tetrad/src/test.

"lib-tetrad-5.3.0-20151113.150857-1-tetradcmd.jar" is working in Windows, too.

Hi,

I just tested "lib-tetrad-5.3.0-20151113.150857-1-tetradcmd.jar" on Windows (I am using Windows Server 2008 R2 Enterprise), and it works there, too. The Wiki explains that it should be run on a Unix-type machine, so I am confused. Am I misunderstanding something? Is it okay to use it on Windows, and can I expect the same performance?

Thank you,
Sanghoon

Revise CFCI in light of recent change to FCI.

TestSerializable got broken by the move to submodules.

I'm not sure if it can be salvaged yet; there are now multiple paths to classes instead of just one.

Adjust instructions for downloading and launching ejars.

The instructions at http://www.phil.cmu.edu/tetrad/current.html are out of date. It's not clear how ejars will be launched in the future; this needs to be worked out. The Linux instructions need to be updated, since Tetrad will run under Open JDK now.

Update package Readme files.

instantiate data sets and save them in command line method?

Hi,

This question was asked in the Google group, but it was not answered.

In TETRAD, I used the template "Simulate data from IM" and instantiated 100 data sets in the IM box. I know that I can save the data sets to .txt files in the Data box, but I don't want to save the 100 data set files manually.

Is there a command line method to generate/instantiate data sets as I did in the IM box, and to save them? I think simulating and instantiating data sets from the command line would be very complicated, because it would be difficult to set the 'Graph type' (which variable is direct and which is the target) and the 'Parametric model' (the probabilities for each variable under every condition). Therefore, I was curious whether there is a command line method at least to save the data sets automatically after I have instantiated 100 or 1000 of them in the TETRAD workspace. (I also suspect that there may be no way to extract the instantiated data sets from TETRAD in order to save them from the command line.) I would like to know whether such a method exists.

Thank you,
Sanghoon

Add class-level documentation to all classes in the project.

Most methods are self-documenting because of their names and signatures. Leaving those aside, each class should have a class-level doc, even if it's simple.

Move group ID from edu.cmu.tetrad to edu.cmu

With the group ID edu.cmu.tetrad, paths in the published version have two "tetrad"s in them. This needs to be adjusted in several places.

Check names of algorithms.

Make sure the algorithms have their proper names before they get too ensconced.

Switch copyright notices to Maven gadget.

The old gadget for inserting copyright notices doesn't seem viable any longer with the conversion of the code to submodules. Try the Maven gadget instead.

http://codeoftheday.blogspot.com/2013/10/apache-maven-tips-addappend-copyright.html

Definition for edge directions

I ran TETRAD cmd and got some information about interacting nodes and the edge directions between them. I found this definition of edge directions in the manual of TETRAD v.4:

1) directed (-->), 2) undirected (---), 3) unoriented (o-o), and 4) bidirected (o->)

But I couldn't find a good definition in the manual of TETRAD v.5. Also, I personally don't understand the difference between undirected (---) and unoriented (o-o); they sound the same to me. And I think the symbol for 'bidirected' should be <->, rather than o->. Could you tell me where I can find a good and clear definition of the edge directions? I need to define them for my lab's manuscript.

Thank you,
Sanghoon

command line "tetrad.jar" doesn't work.

Hi,

I followed the introduction for command line tetrad here,
https://github.com/cmu-phil/tetrad/wiki/Command-Line-Tetrad

But it doesn't seem to work for me. I am using a Linux server, and I think my Java version is fine. Please see below the Java version check, the 'tetrad.jar' run, and the error message. I have attached my sample input file. Could you help me?

Simulated_example.txt

[user167@login0 8_TETRAD]$ java -version
java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.1) (rhel-1.45.1.11.1.el6-x86_64)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
[user167@login0 8_TETRAD]$ java -jar lib-tetrad-5.3.0-20151113.150857-1-tetracmd.jar -data Simulated_example.txt -datatype discrete -algorithm pc -depth -1 significance 0.01
Exception in thread "main" java.lang.UnsupportedClassVersionError: edu/cmu/tetrad/cmd/TetradCmd : Unsupported major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
at java.net.URLClassLoader.access$000(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: edu.cmu.tetrad.cmd.TetradCmd. Program will exit.
[user167@login0 8_TETRAD]$
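
"Unsupported major.minor version 51.0" means the classes in the jar were compiled for Java 7 (class-file major version 51), while the transcript shows a Java 6 runtime (1.6.0_24), which accepts class files only up to major version 50. The version is stored in the first eight bytes of every class file, so it can be checked directly. A minimal sketch, with the hypothetical class name `ClassVersion`, that reads the header of a `.class` file extracted from a jar:

```java
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ClassVersion {

    /**
     * Reads the major version from a class-file header:
     * 4 bytes magic (0xCAFEBABE), 2 bytes minor, 2 bytes major, big-endian.
     */
    public static int majorVersion(byte[] classFileBytes) {
        ByteBuffer header = ByteBuffer.wrap(classFileBytes);
        if (header.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        header.getShort();                 // minor version, ignored here
        return header.getShort() & 0xFFFF; // major version as unsigned value
    }

    /** Maps a major version to the Java release: 50 is Java 6, 51 is Java 7, and so on. */
    public static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java ClassVersion <path to .class file>");
            return;
        }
        int major = majorVersion(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("major version " + major + ", needs Java " + javaRelease(major) + " or later");
    }
}
```

Running the jar under a Java 7 or later runtime avoids the error.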

Get rid of unnecessary matrix libraries.

This is a wish-list item, maybe doable. Currently we are using the following matrix libraries:

    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-math3</artifactId>
        <version>3.5</version>
    </dependency>
    <dependency>
        <groupId>colt</groupId>
        <artifactId>colt</artifactId>
        <version>1.2.0</version>
    </dependency>
    <dependency>
        <groupId>gov.nist.math</groupId>
        <artifactId>jama</artifactId>
        <version>1.0.2</version>
    </dependency>
    <dependency>
        <groupId>com.googlecode.matrix-toolkits-java</groupId>
        <artifactId>mtj</artifactId>
        <version>1.0.1</version>
    </dependency>

Much of this is overlapping functionality. We have a class, TetradMatrix, that wraps the Apache matrix library. Can we remove some of the other matrix libraries and use TetradMatrix instead?

Make tetrad into a multi-module project with tetradlib as a module that can be reused.

Here’s what I will add today. I will make this into a multi-module Maven project so that the tetradlib part of the project is a module that can be reused (a description of this is at https://books.sonatype.com/mvnex-book/reference/multimodule.html).

Recode Ling.

The algorithm is unstable, giving radically different answers each time the execute button is pressed. FastICA seems to be more unstable than it should be. Need to recode FastICA first. See Issue #61 and Issue #62.

Matrix Toolkits for Java (MTJ) doesn't play well with 64 bit Windows

If imports starting with no.uib.cipr.matrix are commented out, the problem code is put in red in IntelliJ. Classes affected are IndTestHsic, KernelUtils, Ling, Lingam.

The compiled classes seem to run OK, but the relevant tests break when run in Maven.

I think we should support 64 bit Windows. If so, need to translate this matrix algebra to a different library.

Graph properties for the attached graph doesn't come back.

native_smooth_clean_master_graph.txt

The problem is the cycle checker. It checks for each node, depth first, whether there is a path from that node to itself. The question is whether there's a better way. Perhaps breadth first?
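
Restarting a search from every node repeats a lot of work. One alternative is a single depth-first pass that marks each node as unvisited, on the current path, or finished, and reports a cycle as soon as an edge leads back to a node on the current path. That touches every node and edge once. A minimal sketch, assuming only directed edges and nodes numbered 0 to n-1 with adjacency lists; `CycleCheck` is a hypothetical class, not Tetrad's graph API, and the traversal is iterative so that large graphs do not overflow the call stack:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

public class CycleCheck {

    /** children.get(v) lists the nodes w with a directed edge v --> w. */
    public static boolean hasDirectedCycle(List<List<Integer>> children) {
        int n = children.size();
        int[] state = new int[n]; // 0 = unvisited, 1 = on current path, 2 = finished
        int[] next = new int[n];  // position of the next child to explore per node
        Deque<Integer> path = new ArrayDeque<>();

        for (int start = 0; start < n; start++) {
            if (state[start] != 0) {
                continue; // already explored from an earlier start node
            }
            state[start] = 1;
            path.push(start);
            while (!path.isEmpty()) {
                int v = path.peek();
                if (next[v] < children.get(v).size()) {
                    int w = children.get(v).get(next[v]++);
                    if (state[w] == 1) {
                        return true; // edge back into the current path
                    }
                    if (state[w] == 0) {
                        state[w] = 1;
                        path.push(w);
                    }
                } else {
                    state[v] = 2; // all children explored
                    path.pop();
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<List<Integer>> dag = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(2), Collections.<Integer>emptyList());
        List<List<Integer>> cyclic = Arrays.asList(
                Arrays.asList(1), Arrays.asList(2), Arrays.asList(0));
        System.out.println("dag has cycle:    " + hasDirectedCycle(dag));
        System.out.println("cyclic has cycle: " + hasDirectedCycle(cyclic));
    }
}
```

Switching to breadth-first search alone would not remove the repeated work if the search is still restarted from every node. The saving comes from sharing the visited state across start nodes.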

Remove any classes from the repository that should not be part of the public build.

Fix CCD.

CCD does not pass its tests. The tests are good. It should pass all of them, and the tests should be commented back in.

It's not clear whether CCDGES should work or not. Probably it should be moved to a child repository unless proven correct.

Recode LiNGAM.

See Issue #60.

Difference between 'tetradcmd-5.1.0-10' and 'lib-tetrad-5.3.0-20151113.150857-1-tetradcmd.jar'

Hi,
I was using "tetradcmd-5.1.0-10.jar" in a Windows batch file. I would like to know whether the results of the pc or fci search algorithms differ between "tetradcmd-5.1.0-10" and "lib-tetrad-5.3.0-20151113.150857-1-tetradcmd.jar" in terms of graph edges. When I ran both versions of tetrad cmd, I got 32 edges from 'tetradcmd-5.1.0-10.jar', but just 27 edges from 'lib-tetrad-5.3.0-20151113.150857-1-tetradcmd.jar'. I also found that some edge directions and interacting nodes differ between the two results. For example,
SNP_A-2127756_3 --- SNP_A-1839049_2 vs. SNP_A-2127756_3 <-> SNP_A-1999524_1.

Do you recommend using the latest version of TETRAD cmd for accurate(?) search results?

Thank you,
Sanghoon

Fix random tests in TestGeneralizedSem.

Several tests in TestGeneralizedSem (and maybe some other classes) depend on a random seed and sometimes fail. Need to fix a random seed for which they do not fail.
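
Fixing the seed makes the random draws, and therefore the test outcome, identical on every run. A minimal sketch of the principle using `java.util.Random` as a stand-in for whatever random source the tests actually use; `SeededDraws` and `gaussianDraws` are hypothetical names:

```java
import java.util.Arrays;
import java.util.Random;

public class SeededDraws {

    /** Returns n standard-normal draws from a generator started at the given seed. */
    public static double[] gaussianDraws(long seed, int n) {
        Random rng = new Random(seed);
        double[] draws = new double[n];
        for (int i = 0; i < n; i++) {
            draws[i] = rng.nextGaussian();
        }
        return draws;
    }

    public static void main(String[] args) {
        // Two generators started at the same seed produce the same sequence.
        double[] first = gaussianDraws(42L, 5);
        double[] second = gaussianDraws(42L, 5);
        System.out.println("same seed, same draws: " + Arrays.equals(first, second));
    }
}
```

A test that passes only for hand-picked seeds may be hiding a real numerical instability, so it is worth noting in the test which seeds fail and why.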

Consolidate FAS and FasStableConcurrent classes into one each. Pick the best.

There were several versions of each; only one is needed.

Clean up Misclassifications and put it back in the interface.

Update jars to more recent versions where possible.

Model fitness for instantiated models

I have a data set of 20,000 records. My Bayes parametric model contains latent variables, which is why I used the EM Bayes Estimator to estimate the parameters of the model. The problem is that the running time is very long: I waited a few hours before I stopped the learning process. I found other software (GeNIe, https://dslpitt.org/genie/) that can estimate the parameters of a model from given data, and with it I was able to estimate the parameters of my model in a shorter time. I manually entered the parameter values in the "Instantiated model" component; however, I was not able to find any functionality to estimate the model fit (P-value) so that I can know how good my model is. Could you please tell me whether this type of functionality exists in Tetrad?
