Comments (68)

janamach avatar janamach commented on July 26, 2024 1

Did you remember to do pip install

Oops :-(

The error in the export log suggests that the solve step fails on that video. Can you see if there is something weird in the corresponding solve logs?

The problems seem to start at step 1 of solve. For example:

$ cat matlab_solve_m_16.log 
07:36:14 -I- Reading video information from file
07:36:22 -I- Loading trgraph from antrax/graphs/graph_16_16.mat
07:36:23 -I- Finished loading trgraph with 166 tracklets
07:36:23 -I- Loading ids
07:36:23 -I- Finding single ant nodes
07:36:23 -I- Some preperations
07:36:23 -I- Looking for bottleneck pairs
07:36:23 -I- done distance mat
Undefined function or variable 'pairs'.
Error in trgraph/get_bottleneck_pairs (line 523)

Error in trgraph/solve (line 28)

Error in solve_single_movie (line 54)

Error in antrax_mcr_interface (line 30)
MATLAB:UndefinedFunction

$ cat matlab_solve_m_30.log 
07:36:14 -I- Reading video information from file
07:36:22 -I- Loading trgraph from antrax/graphs/graph_30_30.mat
07:36:24 -I- Finished loading trgraph with 374 tracklets
07:36:24 -I- Loading ids
07:36:25 -I- Finding single ant nodes
07:36:25 -I- Some preperations
07:36:25 -I- Looking for bottleneck pairs
07:36:25 -I- done distance mat
07:36:25 -I- Resetting graph id assigments
07:36:25 -I- Filtering out tracklets identified as non-ant
07:36:25 -I- ...0 tracklets classified as no-ant were filtered
07:36:25 -I- ...7 short, unconnected and unidentified tracklets were filtered
07:36:25 -I- Propagating ids from src tracklets
07:36:26 -I- Propagation loops
07:36:26 -I-     ...assigned 0 tracklets
07:36:26 -I- Biconnected components condition (positive)
Index in position 2 exceeds array bounds.
Error in trgraph/solve>propagate_all (line 536)

Error in trgraph/solve (line 150)

Error in solve_single_movie (line 54)

Error in antrax_mcr_interface (line 30)
MATLAB:badsubscript

In this case, 16 had the MATLAB:UndefinedFunction during tracking, while 30 finished properly. The classify step finished normally in both cases.

from antrax.

janamach avatar janamach commented on July 26, 2024 1

But running solve on a local machine with MATLAB already showed errors in step 1:

$ grep -rHinoL "Done" matlab_solve_m_* 
matlab_solve_m_25.log
matlab_solve_m_28.log
matlab_solve_m_35.log
matlab_solve_m_36.log

All of those logs show the same error:

$ cat matlab_solve_m_25.log 
09:24:23 -D- initializing expreader object
09:24:23 -I- Reading video information from file
09:24:26 -I- Loading trgraph from antrax/graphs/graph_25_25.mat
Error using load
Cannot read file /media/jana/HDD/H2CN0402/antrax/graphs/graph_25_25.mat.

Error in trgraph.load (line 879)
                load(fname,'G');

Error in trhandles/loaddata (line 607)
                GS = trgraph.load(Trck,movlist);

Error in solve_single_movie (line 52)
G = Trck.loaddata(m,colony);

Errors also appeared during step 2:

$ cat matlab_solve_g_3.log 
09:38:37 -D- initializing expreader object
09:38:37 -I- Reading video information from file
09:38:41 -I- solving graph from movies 25-36
09:38:41 -I- Loading trgraph from antrax/graphs/graph_25_25.mat
Error using load
Cannot read file /media/jana/HDD/H2CN0402/antrax/graphs/graph_25_25.mat.

Error in trgraph.load (line 879)
                load(fname,'G');

Error in trhandles/loaddata (line 607)
                GS = trgraph.load(Trck,movlist);

Error in solve_across_movies (line 70)
G = Trck.loaddata(movlist,colony);

But MATLAB seems to be able to load the file:

>> load antrax/graphs/graph_25_25.mat     
Warning: Variable 'G' originally saved as a trgraph cannot be instantiated as an object and will be read in as a uint32. 

And in step 3 the same failures were to be expected:

$ grep -rHinoL "Done" matlab_export_m_*
matlab_export_m_25.log
matlab_export_m_28.log
matlab_export_m_35.log
matlab_export_m_36.log
$ cat matlab_export_m_36.log 
10:27:21 -D- initializing expreader object
10:27:21 -I- Reading video information from file
10:27:24 -I- Loading trgraph from antrax/graphs/graph_36_36.mat
Error using load
Cannot read file /media/jana/HDD/H2CN0402/antrax/graphs/graph_36_36.mat.

Error in trgraph.load (line 879)
                load(fname,'G');

Error in trhandles/loaddata (line 607)
                GS = trgraph.load(Trck,movlist);

Error in export_single_movie (line 51)
G = Trck.loaddata(m,colony);

asafgal avatar asafgal commented on July 26, 2024

janamach avatar janamach commented on July 26, 2024

The HPC I am using is very easy to get access to; maybe its primary purpose is training new users. I asked if they can increase my queued job quota.

Anyhow, you can run a single step of the solve by using --step 1 (or 2 or 3). Note that you have to wait for one step to fully finish before running the next one.
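A minimal sketch of how a --step value selects the steps to run, based on the `step == 0 or step == N` checks in the solve function pasted later in this thread. The helper name steps_to_run is hypothetical and not part of anTraX.

```python
def steps_to_run(step=0):
    """Return the solve steps selected by a --step value.

    Hypothetical helper, not part of anTraX: 0 means "all three steps",
    while 1, 2 or 3 selects that single step.
    """
    if step not in (0, 1, 2, 3):
        raise ValueError("step must be 0, 1, 2 or 3")
    return [1, 2, 3] if step == 0 else [step]


for requested in (0, 2):
    print(requested, "->", steps_to_run(requested))
```

With this mapping, a run with --step 2 would be expected to create only the step 2 jobfile.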

I tried that, but it didn't work:

(antrax) [fr_jm1121@uc2n994 ~]$ antrax solve H1CN0304/ --hpc --step 1 --hpc-options partition=single,[email protected],cpus=4,mem-per-cpu=4000,time=24:00:00

==================================================================================

Welcome to anTraX - a software for tracking color tagged ants (and other insects)

==================================================================================


Jobfile created in H1CN0304/antrax/logs/hpc_solve1.sh

Job number 19452619 was submitted


Jobfile created in H1CN0304/antrax/logs/hpc_solve2.sh

Job number 19452620 was submitted


Jobfile created in H1CN0304/antrax/logs/hpc_solve3.sh

sbatch: error: AssocMaxSubmitJobLimit
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
Traceback (most recent call last):
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/bin/antrax", line 8, in <module>
    sys.exit(main())
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 651, in main
    """)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/sigtools/modifiers.py", line 158, in __call__
    return self.func(*args, **kwargs)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 363, in run
    ret = cli(*args)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
    return func(*posargs, **kwargs)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 262, in _cli
    return func('{0} {1}'.format(name, command), *args)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
    return func(*posargs, **kwargs)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 297, in solve
    jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=3)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/hpc.py", line 258, in antrax_hpc_job
    jid = submit_slurm_job_file(jobfile, waitfor=waitfor)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/hpc.py", line 80, in submit_slurm_job_file
    jid = out.split()[-1]
IndexError: list index out of range
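The traceback ends at `out.split()[-1]`, which raises IndexError when `out` is empty or only whitespace. That is presumably what happens when sbatch rejects the job and prints only error messages. A minimal sketch of a more defensive parse follows; parse_sbatch_job_id is a hypothetical helper, not the anTraX implementation, and it assumes the usual "Submitted batch job <id>" line that sbatch prints on success.

```python
import re


def parse_sbatch_job_id(out):
    """Return the job id found in sbatch output, or None if there is none.

    Hypothetical helper, not the anTraX code. It assumes the standard
    "Submitted batch job <id>" success message from sbatch.
    """
    match = re.search(r"Submitted batch job (\d+)", out or "")
    return match.group(1) if match else None


print(parse_sbatch_job_id("Submitted batch job 19452619"))  # successful submission
print(parse_sbatch_job_id(""))  # rejected submission: nothing to parse
```

Returning None lets the caller report the failed submission instead of crashing on the index lookup.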

If I add --dry, I get a different error:

$ antrax solve H1CN0304/ --step 2 --hpc --dry --hpc-options partition=single,[email protected],cpus=4,mem-per-cpu=4000,time=24:00:00

==================================================================================

Welcome to anTraX - a software for tracking color tagged ants (and other insects)

==================================================================================


Jobfile created in H1CN0304/antrax/logs/hpc_solve1.sh

Dry run, no job submitted.

Traceback (most recent call last):
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/bin/antrax", line 8, in <module>
    sys.exit(main())
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 651, in main
    """)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/sigtools/modifiers.py", line 158, in __call__
    return self.func(*args, **kwargs)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 363, in run
    ret = cli(*args)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
    return func(*posargs, **kwargs)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 262, in _cli
    return func('{0} {1}'.format(name, command), *args)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
    return func(*posargs, **kwargs)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 293, in solve
    jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=1)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/hpc.py", line 265, in antrax_hpc_job
    return jid
UnboundLocalError: local variable 'jid' referenced before assignment
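The UnboundLocalError suggests that `jid` is only assigned on the branch that actually submits a job, so a dry run reaches `return jid` with nothing bound. A minimal sketch of the usual remedy, binding the name before branching, follows; submit_or_skip and its `submit` argument are hypothetical stand-ins, not the anTraX code.

```python
def submit_or_skip(jobfile, dry=False, submit=None):
    """Return a job id, or None when nothing was submitted.

    Hypothetical stand-in for the submission step, not the anTraX code:
    `submit` is any callable that takes the jobfile path and returns an id.
    """
    jid = None  # bound up front, so the final return is always valid
    if dry:
        print("Dry run, no job submitted.")
    elif submit is not None:
        jid = submit(jobfile)
    return jid


print(submit_or_skip("hpc_solve1.sh", dry=True))
print(submit_or_skip("hpc_solve1.sh", submit=lambda path: "19452619"))
```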

But the .sh file it generated has --step 1 in it, although I asked for --step 2.

As far as I understand, sbatch path/to/hpc_solve1.sh in this case should be equivalent to starting the jobs through the antrax interface with --step 1. Is that right?

asafgal avatar asafgal commented on July 26, 2024

janamach avatar janamach commented on July 26, 2024

One of the 60 jobs in step 1 is failing consistently, while the other 59 finished successfully. The log says:

============================= JOB FEEDBACK =============================

NodeName=uc2n405
Job ID: 19453304
Array Job ID: 19453240_50
Cluster: uc2
User/Group: fr_jm1121/fr_fr
State: FAILED (exit code 1)
Nodes: 1
Cores per node: 4
CPU Utilized: 00:00:00
CPU Efficiency: 0.00% of 00:01:04 core-walltime
Job Wall-clock time: 00:00:16
Memory Utilized: 1.02 MB
Memory Efficiency: 0.01% of 15.62 GB

What could be a possible reason? Is there a way to "rescue" this?

asafgal avatar asafgal commented on July 26, 2024

Can you look at the corresponding anTraX-generated logs? These will be session/logs/hpc_solve1_50.log and session/logs/matlab_solve_m_50.log

janamach avatar janamach commented on July 26, 2024

The text above is from hpc_solve1_50.log; the corresponding matlab_solve_m_50.log has not been generated.

While looking at the matlab_solve_m_*.log files, I found more problems that were not reflected in hpc_solve1_*.log. I looked for logs that did not have the word "Done" in them with:

$ grep -rHnoL "Done" matlab_solve_m*
matlab_solve_m_21.log
matlab_solve_m_25.log
matlab_solve_m_44.log
matlab_solve_m_54.log
matlab_solve_m_59.log
matlab_solve_m_60.log
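The same check can be sketched in Python for anyone who prefers a script over grep. This is only an illustration: logs_missing_marker is a hypothetical helper, and the "antrax/logs" path is an assumed location for the session's log folder.

```python
from pathlib import Path


def logs_missing_marker(log_dir, pattern="matlab_solve_m_*.log", marker="Done"):
    """Return sorted names of log files matching `pattern` that lack `marker`."""
    missing = []
    for log in sorted(Path(log_dir).glob(pattern)):
        text = log.read_text(errors="replace")
        # Case-sensitive substring check, like grep without -i.
        if marker not in text:
            missing.append(log.name)
    return missing


if __name__ == "__main__":
    # "antrax/logs" is an assumed location; point it at the session's log folder.
    for name in logs_missing_marker("antrax/logs"):
        print(name)
```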

All had the same UnrecognizedVarName error:

$ cat matlab_solve_m_59.log 
18:22:16 -I- Reading video information from file
18:22:20 -I- Loading trgraph from antrax/graphs/graph_59_59.mat
Error using tracklet/load_ids (line 746)
Unrecognized table variable name 'tracklet'.
Error in trgraph/load_ids (line 667)

Error in trgraph.load (line 891)

Error in trhandles/loaddata (line 607)

Error in solve_single_movie (line 52)

Error in antrax_mcr_interface (line 30)
MATLAB:table:UnrecognizedVarName

asafgal avatar asafgal commented on July 26, 2024

The UnrecognizedVarName error seems to be caused by the fact that there are no classified tracklets in the video (check to see if antrax/labels/autoids_59.csv is indeed empty). This is probably because you either didn't have any detections in those videos, or had only multi-ant detections. Either way, I'll need to patch this. I guess I never tested the software with such a sparse tracking problem. You might be able to ignore this issue for now and continue to the next steps, but it is also possible that the next steps will complain as well.

As for the error in video #50, I'm not sure. It seems the crash happened before matlab was even started, which is weird. Can you verify that the data files exist? These should be:

antrax/graphs/graph_50_50.mat
antrax/tracklets/trdata_50_50.mat
antrax/images/images_50_50.mat
antrax/labels/autoids_50_50.mat
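A quick way to run this check is sketched below. missing_files is a hypothetical helper, the file names are copied from the list above and may need adjusting to the session's actual naming scheme, and "H1CN0304" is the experiment directory used elsewhere in this thread.

```python
from pathlib import Path


def missing_files(session_dir, relative_paths):
    """Return the entries of `relative_paths` that do not exist under `session_dir`."""
    root = Path(session_dir)
    return [rel for rel in relative_paths if not (root / rel).is_file()]


# File names copied from the list above; adjust them if the session uses
# a different naming scheme.
expected = [
    "antrax/graphs/graph_50_50.mat",
    "antrax/tracklets/trdata_50_50.mat",
    "antrax/images/images_50_50.mat",
    "antrax/labels/autoids_50_50.mat",
]

# "H1CN0304" is the experiment directory used elsewhere in this thread.
for rel in missing_files("H1CN0304", expected):
    print("missing:", rel)
```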

Also try to take a look in the logs of the previous steps, maybe there will be some clues there.

janamach avatar janamach commented on July 26, 2024

check to see if antrax/labels/autoids_59.csv is indeed empty

No, none of the ones that showed the UnrecognizedVarName error are empty, they look pretty normal to me:

$ head autoids_59.csv 
tracklet,label,score,best_frame
trj_id10_ti59_13365_tf59_13365,Unknown,0,0
trj_id10_ti59_13372_tf59_13372,GGY,0.9986485838890076,1
trj_id10_ti59_13373_tf59_13373,Unknown,0,0
trj_id10_ti59_13375_tf59_13375,Unknown,0,0

Can you verify that the data files exist? These should be:

antrax/graphs/graph_50_50.mat

Exists!

antrax/tracklets/trdata_50_50.mat

Did you mean trdata_50.mat? That exists.

antrax/images/images_50_50.mat

Did you mean images_50.mat? That exists too.

antrax/labels/autoids_50_50.mat

This doesn't exist. If you meant csv, then there is a file for each video.

asafgal avatar asafgal commented on July 26, 2024

ok, weird.

This will need to be debugged on a local machine. Can you sync your data back?

Try to run solve step 1 for video 50 and see whether it crashes and why.

For the other error, try loading the data in an interactive matlab session with:

Trck = trhandles(uigetdir);
G = Trck.loaddata(59);

janamach avatar janamach commented on July 26, 2024

To keep it simple, I will compare 59 (that failed above) to 58 (completed successfully).

Running solve with either MCR or MATLAB 2019a gives the MATLAB:table:UnrecognizedVarName error in the log, but not in the terminal:

$ antrax solve --step 1 --movlist 59 H1CN0304/

==================================================================================

Welcome to anTraX - a software for tracking color tagged ants (and other insects)

==================================================================================

07/04/21 16:14:39 -I- Starting 2 workers
07/04/21 16:14:39 -I- Started solve movie 59
07/04/21 16:14:39 -D- running matlab mcr 
07/04/21 16:14:39 -D- command is: /home/jana/src/anTraX/bin/antrax_glnxa64_mcr_interface solve_single_movie H1CN0304/ 59 trackingdirname antrax
07/04/21 16:14:39 -D- matlab app exited with code None
07/04/21 16:15:29 -I- Finished solve movie 59
07/04/21 16:15:29 -I- Workers closed

Log with MCR:

$ cat H1CN0304/antrax/logs/matlab_solve_m_59.log 
16:14:53 -D- initializing expreader object
16:14:53 -I- Reading video information from file
16:14:57 -I- Loading trgraph from antrax/graphs/graph_59_59.mat
Error using tracklet/load_ids (line 746)
Unrecognized table variable name 'tracklet'.
Error in trgraph/load_ids (line 667)

Error in trgraph.load (line 891)

Error in trhandles/loaddata (line 607)

Error in solve_single_movie (line 52)

Error in antrax_mcr_interface (line 30)
MATLAB:table:UnrecognizedVarName

Log with MATLAB:

$ cat H1CN0304/antrax/logs/matlab_solve_m_59.log
16:46:49 -D- initializing expreader object
16:46:50 -I- Reading video information from file
16:46:54 -I- Loading trgraph from antrax/graphs/graph_59_59.mat
Error using tracklet/load_ids (line 746)
Unrecognized table variable name 'tracklet'.

Error in trgraph/load_ids (line 675)
                G.trjs.load_ids;

Error in trgraph.load (line 899)
                G.load_ids;

Error in trhandles/loaddata (line 607)
                GS = trgraph.load(Trck,movlist);

Error in solve_single_movie (line 52)
G = Trck.loaddata(m,colony);

Doing the same with 58 gives the same output in the terminal, but a different-looking log:

$ antrax solve --step 1 --movlist 58 H1CN0304/

==================================================================================

Welcome to anTraX - a software for tracking color tagged ants (and other insects)

==================================================================================

07/04/21 16:18:31 -I- Starting 2 workers
07/04/21 16:18:31 -I- Started solve movie 58
07/04/21 16:18:31 -D- running matlab mcr 
07/04/21 16:18:31 -D- command is: /home/jana/src/anTraX/bin/antrax_glnxa64_mcr_interface solve_single_movie H1CN0304/ 58 trackingdirname antrax
07/04/21 16:18:31 -D- matlab app exited with code None
07/04/21 16:42:02 -I- Finished solve movie 58
07/04/21 16:42:02 -I- Workers closed
$ cat H1CN0304/antrax/logs/matlab_solve_m_58.log
16:18:43 -D- initializing expreader object
16:18:43 -I- Reading video information from file
16:18:47 -I- Loading trgraph from antrax/graphs/graph_58_58.mat
16:19:41 -I- Finished loading trgraph with 16476 tracklets
16:19:42 -I- Loading ids
16:19:52 -I- Finding single ant nodes
16:19:54 -I- Some preperations
16:19:56 -I- Resetting graph id assigments
16:19:56 -I- Filtering out tracklets identified as non-ant
16:19:56 -I- ...18 tracklets classified as no-ant were filtered
16:19:56 -I- ...8727 short, unconnected and unidentified tracklets were filtered
16:19:56 -I- Propagating ids from src tracklets
16:19:59 -I-     ...finished 1000/3377
16:19:59 -I-     ...finished 2000/3377
16:19:59 -I-     ...finished 3000/3377
16:19:59 -I- Propagation loops

...

16:39:59 -I- ...working on any_ant
16:40:00 -I- ......found 288 cc's 
16:40:00 -I- ......filtered 1 cc's
16:40:02 -I- ......pruned 18 nodes
16:40:02 -I- Propagation loops
16:40:03 -I-     ...assigned 0 tracklets
16:40:03 -I- Biconnected components condition (positive)
16:40:09 -I-     ...assigned 0 tracklets
16:40:09 -I- Assigning ids to tracklets
16:40:09 -I- Saving
16:41:56 -G- Done

For the interactive matlab session (59 vs 58):

>> G = Trck.loaddata(59);
16:20:11 -I- Loading trgraph from antrax/graphs/graph_59_59.mat
Error using tracklet/load_ids (line 746)
Unrecognized table variable name 'tracklet'.

Error in trgraph/load_ids (line 675)
                G.trjs.load_ids;

Error in trgraph.load (line 899)
                G.load_ids;

Error in trhandles/loaddata (line 607)
                GS = trgraph.load(Trck,movlist);
>> G = Trck.loaddata(58);
16:23:17 -I- Loading trgraph from antrax/graphs/graph_58_58.mat
16:24:21 -I- Finished loading trgraph with 16476 tracklets

asafgal avatar asafgal commented on July 26, 2024

In the matlab command line, try loading the problematic autoids file and display the generated table:

f = 'antrax/labels/autoids_59_59.csv';
T = readtable(f);
head(T)

Also, run locally solve on video 50, which had a different issue.

janamach avatar janamach commented on July 26, 2024

Hmmmm....

>> f = 'antrax/labels/autoids_59.csv';
>> T = readtable(f);
>> head(T)

ans =

  8×6 table

    Var1      Var2      Var3     Var4      Var5                   Var6              
    _____    ______    ______    _____    ______    ________________________________

    'trj'    'id10'    'ti59'    13365    'tf59'    '13365,Unknown,0,0'             
    'trj'    'id10'    'ti59'    13372    'tf59'    '13372,GGY,0.9986485838890076,1'
    'trj'    'id10'    'ti59'    13373    'tf59'    '13373,Unknown,0,0'             
    'trj'    'id10'    'ti59'    13375    'tf59'    '13375,Unknown,0,0'             
    'trj'    'id10'    'ti59'    13381    'tf59'    '13381,Unknown,0,0'             
    'trj'    'id10'    'ti59'    13385    'tf59'    '13385,Unknown,0,0'             
    'trj'    'id10'    'ti59'    13391    'tf59'    '13391,Unknown,0,0'             
    'trj'    'id10'    'ti59'    13393    'tf59'    '13393,Unknown,0,0'             

58 looks different:

>> f = 'antrax/labels/autoids_58.csv';
>> T = readtable(f);
>> head(T)

ans =

  8×4 table

                tracklet                  label       score     best_frame
    ________________________________    _________    _______    __________

    'trj_id10_ti58_10117_tf58_10117'    'GGY'        0.99987        1     
    'trj_id10_ti58_1139_tf58_1139'      'Unknown'          0        0     
    'trj_id10_ti58_1364_tf58_1364'      'Unknown'          0        0     
    'trj_id10_ti58_1372_tf58_1372'      'Unknown'          0        0     
    'trj_id10_ti58_1389_tf58_1389'      'Unknown'          0        0     
    'trj_id10_ti58_1395_tf58_1395'      'GGY'        0.99884        1     
    'trj_id10_ti58_1401_tf58_1401'      'Unknown'          0        0     
    'trj_id10_ti58_1405_tf58_1405'      'GGY'        0.99956        1     

Looks like the underscores were treated as column delimiters in 59...
In bash, these two files look very similar:

$ head autoids_59.csv 
tracklet,label,score,best_frame
trj_id10_ti59_13365_tf59_13365,Unknown,0,0
trj_id10_ti59_13372_tf59_13372,GGY,0.9986485838890076,1
trj_id10_ti59_13373_tf59_13373,Unknown,0,0
trj_id10_ti59_13375_tf59_13375,Unknown,0,0
trj_id10_ti59_13381_tf59_13381,Unknown,0,0
trj_id10_ti59_13385_tf59_13385,Unknown,0,0
trj_id10_ti59_13391_tf59_13391,Unknown,0,0
trj_id10_ti59_13393_tf59_13393,Unknown,0,0
trj_id10_ti59_13396_tf59_13396,Unknown,0,0

$ head autoids_58.csv 
tracklet,label,score,best_frame
trj_id10_ti58_10117_tf58_10117,GGY,0.9998655319213867,1
trj_id10_ti58_1139_tf58_1139,Unknown,0,0
trj_id10_ti58_1364_tf58_1364,Unknown,0,0
trj_id10_ti58_1372_tf58_1372,Unknown,0,0
trj_id10_ti58_1389_tf58_1389,Unknown,0,0
trj_id10_ti58_1395_tf58_1395,GGY,0.9988380074501038,1
trj_id10_ti58_1401_tf58_1401,Unknown,0,0
trj_id10_ti58_1405_tf58_1405,GGY,0.9995608925819397,1
trj_id10_ti58_1409_tf58_1409,Unknown,0,0

Also, run locally solve on video 50, which had a different issue.

Running. This one should take longer.

asafgal avatar asafgal commented on July 26, 2024

That's odd.
Try giving an explicit delimiter:

f = 'antrax/labels/autoids_59_59.csv';
T = readtable(f, 'Delimiter', ',');
head(T)

janamach avatar janamach commented on July 26, 2024

Forcing it worked:

>> f = 'antrax/labels/autoids_59.csv';
>> T = readtable(f);   
>> head(T)

ans =

  8x6 table

    Var1      Var2      Var3     Var4      Var5                   Var6              
    _____    ______    ______    _____    ______    ________________________________

    'trj'    'id10'    'ti59'    13365    'tf59'    '13365,Unknown,0,0'             
    'trj'    'id10'    'ti59'    13372    'tf59'    '13372,GGY,0.9986485838890076,1'
    'trj'    'id10'    'ti59'    13373    'tf59'    '13373,Unknown,0,0'             
    'trj'    'id10'    'ti59'    13375    'tf59'    '13375,Unknown,0,0'             
    'trj'    'id10'    'ti59'    13381    'tf59'    '13381,Unknown,0,0'             
    'trj'    'id10'    'ti59'    13385    'tf59'    '13385,Unknown,0,0'             
    'trj'    'id10'    'ti59'    13391    'tf59'    '13391,Unknown,0,0'             
    'trj'    'id10'    'ti59'    13393    'tf59'    '13393,Unknown,0,0'             

>> T = readtable(f, 'Delimiter', ',');
>> head(T)

ans =

  8x4 table

                tracklet                  label       score     best_frame
    ________________________________    _________    _______    __________

    'trj_id10_ti59_13365_tf59_13365'    'Unknown'          0        0     
    'trj_id10_ti59_13372_tf59_13372'    'GGY'        0.99865        1     
    'trj_id10_ti59_13373_tf59_13373'    'Unknown'          0        0     
    'trj_id10_ti59_13375_tf59_13375'    'Unknown'          0        0     
    'trj_id10_ti59_13381_tf59_13381'    'Unknown'          0        0     
    'trj_id10_ti59_13385_tf59_13385'    'Unknown'          0        0     
    'trj_id10_ti59_13391_tf59_13391'    'Unknown'          0        0     
    'trj_id10_ti59_13393_tf59_13393'    'Unknown'          0        0     
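Why the two tables differ can be illustrated with a small Python sketch. It uses Python's csv module, not MATLAB's readtable, so it only shows the effect of the delimiter choice. Splitting the data row on `_` yields six fields, which matches the six-column table above and suggests that the automatic delimiter detection picked `_` for this file. That is an inference from the output, not something confirmed in this thread.

```python
import csv
import io

# Header and first data row as shown by `head autoids_59.csv` in this thread.
sample = (
    "tracklet,label,score,best_frame\n"
    "trj_id10_ti59_13365_tf59_13365,Unknown,0,0\n"
)


def parse(text, delimiter):
    """Parse CSV text with an explicit delimiter and return the rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


by_comma = parse(sample, ",")
by_underscore = parse(sample, "_")

print(len(by_comma[1]), by_comma[1])            # fields when splitting on ","
print(len(by_underscore[1]), by_underscore[1])  # fields when splitting on "_"
```

Passing the delimiter explicitly, as in the readtable call above, removes the dependence on auto-detection.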

asafgal avatar asafgal commented on July 26, 2024

I have no explanation for this behavior...

Anyhow, I tried to patch the issue on the debug-jana branch; see if it works. It also fixes the other small issues we had in this thread and the previous one. I haven't tested it, so issues might pop up.

janamach avatar janamach commented on July 26, 2024

You are very efficient, thank you!

The readtable thing worked locally with $ antrax solve H1CN0304/ --step 1 --movlist 59:

Before pull:

$ cat matlab_solve_m_59.log 
08:56:02 -D- initializing expreader object
08:56:02 -I- Reading video information from file          
08:56:06 -I- Loading trgraph from antrax/graphs/graph_59_59.mat
Error using tracklet/load_ids (line 746)                                              
Unrecognized table variable name 'tracklet'.   
Error in trgraph/load_ids (line 667)  
                                           
Error in trgraph.load (line 891)
                                           
Error in trhandles/loaddata (line 607)  
                                           
Error in solve_single_movie (line 52)                                                 
                                           
Error in antrax_mcr_interface (line 30)  
MATLAB:table:UnrecognizedVarName        

After pull:

$ head matlab_solve_m_59.log 
08:57:32 -D- initializing expreader object
08:57:32 -I- Reading video information from file
08:57:36 -I- Loading trgraph from antrax/graphs/graph_59_59.mat
08:58:02 -I- Finished loading trgraph with 9369 tracklets
08:58:03 -I- Loading ids
08:58:06 -I- Finding single ant nodes
08:58:07 -I- Some preperations
08:58:08 -I- Looking for bottleneck pairs
08:58:09 -I- done distance mat
09:00:59 -I- Resetting graph id assigments

$ tail matlab_solve_m_59.log 
09:14:30 -I- ......found 359 cc's 
09:14:30 -I- ......filtered 0 cc's
09:14:32 -I- ......pruned 0 nodes
09:14:32 -I- Propagation loops
09:14:32 -I-     ...assigned 0 tracklets
09:14:32 -I- Biconnected components condition (positive)
09:14:35 -I-     ...assigned 0 tracklets
09:14:35 -I- Assigning ids to tracklets
09:14:35 -I- Saving
09:15:33 -G- Done

There's another twist: I ran the solve step on a local computer with MATLAB and all files (including 50) were processed successfully, and the xy csv files were generated for each video. It took more than a day to finish; I saw the result just now.

I am now processing another experiment on the HPC starting with tracking. I got to the solve step yesterday, but it failed as multiple jobs ran into the readtable weirdness. I will let you know how it goes :-)

janamach avatar janamach commented on July 26, 2024

Looks like 3ce63fd worked: I ran solve for 90 videos and none of them ran into that strange readtable problem in step 1. The last one, 90, showed MATLAB:badsubscript, as it barely had any tracklets. I hope it doesn't affect the further steps.

Commit 5f0cb61 didn't seem to help, though; the step option is still being ignored:

$ antrax solve CN0402/ --step 3 --hpc --hpc-options partition=single,cpus=2,mem-per-cpu=2000,time=72:00:00

==================================================================================

Welcome to anTraX - a software for tracking color tagged ants (and other insects)

==================================================================================


Jobfile created in CN0402/antrax/logs/hpc_solve1.sh

Job number 19458033 was submitted


Jobfile created in CN0402/antrax/logs/hpc_solve2.sh

Job number 19458034 was submitted


Jobfile created in CN0402/antrax/logs/hpc_solve3.sh

sbatch: error: AssocMaxSubmitJobLimit
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
Traceback (most recent call last):
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/bin/antrax", line 8, in <module>
    sys.exit(main())
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 651, in main
    """)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/sigtools/modifiers.py", line 158, in __call__
    return self.func(*args, **kwargs)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 363, in run
    ret = cli(*args)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
    return func(*posargs, **kwargs)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 262, in _cli
    return func('{0} {1}'.format(name, command), *args)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
    return func(*posargs, **kwargs)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 297, in solve
    jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=3)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/hpc.py", line 258, in antrax_hpc_job
    jid = submit_slurm_job_file(jobfile, waitfor=waitfor)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/hpc.py", line 80, in submit_slurm_job_file
    jid = out.split()[-1]
IndexError: list index out of range

Also with --dry:

$ antrax solve CN0402/ --step 2 --dry --hpc --hpc-options partition=single,cpus=2,mem-per-cpu=2000,time=72:00:00

==================================================================================

Welcome to anTraX - a software for tracking color tagged ants (and other insects)

==================================================================================


Jobfile created in CN0402/antrax/logs/hpc_solve1.sh

Dry run, no job submitted.

Traceback (most recent call last):
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/bin/antrax", line 8, in <module>
    sys.exit(main())
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 651, in main
    """)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/sigtools/modifiers.py", line 158, in __call__
    return self.func(*args, **kwargs)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 363, in run
    ret = cli(*args)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
    return func(*posargs, **kwargs)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 262, in _cli
    return func('{0} {1}'.format(name, command), *args)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
    return func(*posargs, **kwargs)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 293, in solve
    jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=1)
  File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/hpc.py", line 265, in antrax_hpc_job
    return jid
UnboundLocalError: local variable 'jid' referenced before assignment

asafgal avatar asafgal commented on July 26, 2024

I fixed the dry run issue.

As for the single step run - can you verify that you are on the debug branch on the HPC? If you indeed are, can you paste here the "solve" function in the cli.py file?

janamach avatar janamach commented on July 26, 2024

Thank you for fixing all these things. I just finished processing the new experiment with 90 videos; I did not run into any serious errors, and the files in antdata were generated.

For the single step issue:

$ git branch
* debug-jana
  master

$ less antrax/cli.py
def solve(explist, *, glist: parse_movlist=None, movlist: parse_movlist=None, clist: parse_movlist=None, mcr=False,
          nw=2, hpc=False, hpc_options: parse_hpc_options={}, missing=False, session=None, dry=False, step=0):
    """Run propagation step"""

    explist = parse_explist(explist, session)
    mcr = mcr or ANTRAX_USE_MCR
    hpc = hpc or ANTRAX_HPC

    if hpc:

        for e in explist:

            eglist = glist if glist is not None else e.glist
            emlist = [e.ggroups[g - 1] for g in eglist]
            emlist = [m for grp in emlist for m in grp]

            hpc_options['dry'] = dry
            hpc_options['classifier'] = classifier
            hpc_options['missing'] = missing
            hpc_options['glist'] = eglist
            hpc_options['movlist'] = emlist

            if e.prmtrs['geometry_multi_colony']:
                eclist = clist if clist is not None else e.clist
                for c in eclist:
                    hpc_options['c'] = c
                    hpc_options['waitfor'] = None
                    if step == 0 or step == 1:
                        jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=1)
                        hpc_options['waitfor'] = jid
                    if step == 0 or step == 2:
                        jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=2)
                        hpc_options['waitfor'] = jid
                    if step == 0 or step == 3:
                        jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=3)
            else:
                hpc_options['c'] = None
                hpc_options['waitfor'] = None
                if step == 0 or step == 1:
                    jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=1)
                    hpc_options['waitfor'] = jid
                if step == 0 or step == 2:
                    jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=2)
                    hpc_options['waitfor'] = jid
                if step == 0 or step == 3:
                    jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=3)
    else:

        Q = MatlabQueue(nw=nw, mcr=mcr)

        for e in explist:

            eglist = glist if glist is not None else e.glist
            eclist = clist if clist is not None else e.clist
            emlist = [e.ggroups[g - 1] for g in eglist]
            emlist = [m for grp in emlist for m in grp]
            if movlist is not None:
                emlist = [m for m in emlist if m in movlist]

            if step == 0 or step == 1:
                if e.prmtrs['geometry_multi_colony']:
                    for c in eclist:
                        for m in emlist:
                            w = {'fun': 'solve_single_movie'}
                            w['args'] = [e.expdir, m, 'trackingdirname', e.session, 'colony', c]
                            w['diary'] = join(e.logsdir, 'matlab_solve_m_' + str(m) + '_c_' + str(c) + '.log')
                            w['str'] = 'solve colony ' + str(c) + ' movie ' + str(m)
                            Q.put(w)
                else:
                    for m in emlist:
                        w = {'fun': 'solve_single_movie'}
                        w['args'] = [e.expdir, m, 'trackingdirname', e.session]
                        w['diary'] = join(e.logsdir, 'matlab_solve_m_' + str(m) + '.log')
                        w['str'] = 'solve movie ' + str(m)
                        Q.put(w)

                # wait for single movie tasks to complete
                Q.join()

            # stitch
            if step == 0 or step == 2:
                if e.prmtrs['geometry_multi_colony']:
                    for c in eclist:
                        for g in eglist:
                            w = {'fun': 'solve_across_movies'}
                            w['args'] = [e.expdir, g, 'trackingdirname', e.session, 'colony', c]
                            w['diary'] = join(e.logsdir, 'matlab_solve_g_' + str(g) + '_c_' + str(c) + '.log')
                            w['str'] = 'solve stitch colony ' + str(c) + ' graph ' + str(g)
                            Q.put(w)
                else:
                    for g in eglist:
                        w = {'fun': 'solve_across_movies'}
                        w['args'] = [e.expdir, g, 'trackingdirname', e.session]
                        w['diary'] = join(e.logsdir, 'matlab_solve_g_' + str(g) + '.log')
                        w['str'] = 'solve stitch graph ' + str(g)
                        Q.put(w)

                # wait for stitch to finish
                Q.join()

            if step == 0 or step == 3:
                if e.prmtrs['geometry_multi_colony']:
                    for c in eclist:
                        for m in emlist:
                            w = {'fun': 'export_single_movie'}
                            w['args'] = [e.expdir, m, 'trackingdirname', e.session, 'colony', c]
                            w['diary'] = join(e.logsdir, 'matlab_export_m_' + str(m) + '_c_' + str(c) + '.log')
                            w['str'] = 'export colony ' + str(c) + ' movie ' + str(m)
                            Q.put(w)
                else:
                    for m in emlist:
                        w = {'fun': 'export_single_movie'}
                        w['args'] = [e.expdir, m, 'trackingdirname', e.session]
                        w['diary'] = join(e.logsdir, 'matlab_export_m_' + str(m) + '.log')
                        w['str'] = 'export movie ' + str(m)
                        Q.put(w)

                # wait for stitch to finish
                Q.join()

        # close
        Q.stop_workers()
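
The HPC branch of the function pasted above gates each step with `step == 0 or step == N` and chains the jobs through hpc_options['waitfor'], so that each submitted job waits for the previous one. A minimal sketch of that gating and chaining follows. The submit callable is a hypothetical stand-in for antrax_hpc_job and only hands out fake job ids; nothing here is taken from hpc.py.

```python
def plan_solve_jobs(step=0, submit=None):
    """Mirror the 'step == 0 or step == N' gating from the pasted solve()."""
    if submit is None:
        fake_ids = iter(range(100, 1000))

        def submit(solve_step, waitfor):
            # Hypothetical scheduler call: returns a fake job id.
            return next(fake_ids)

    submitted = []
    waitfor = None
    for n in (1, 2, 3):
        if step == 0 or step == n:
            jid = submit(n, waitfor)
            submitted.append((n, jid, waitfor))
            waitfor = jid  # the next step depends on this job
    return submitted


for solve_step, jid, waitfor in plan_solve_jobs(step=0):
    print(solve_step, jid, waitfor)
```

With step=0 all three steps are submitted and chained; with a single step only that job is submitted and it waits for nothing.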

janamach avatar janamach commented on July 26, 2024

P.S. All this was now done on HPC

janamach avatar janamach commented on July 26, 2024

Unfortunately, there are more issues with that dataset, even though processing seemed to complete successfully.

  1. Some csv files have not been generated even though the videos were not empty. For all of the missing csv files matlab_export_m_*.log showed a MATLAB:badsubscript error:
$ grep -rHno "MATLAB:badsubscript" matlab_export_m_* | sort
matlab_export_m_16.log:11:MATLAB:badsubscript
matlab_export_m_30.log:11:MATLAB:badsubscript
matlab_export_m_45.log:11:MATLAB:badsubscript
matlab_export_m_48.log:11:MATLAB:badsubscript
matlab_export_m_49.log:11:MATLAB:badsubscript
matlab_export_m_52.log:11:MATLAB:badsubscript
matlab_export_m_59.log:11:MATLAB:badsubscript
matlab_export_m_62.log:11:MATLAB:badsubscript
matlab_export_m_64.log:11:MATLAB:badsubscript
matlab_export_m_65.log:11:MATLAB:badsubscript
matlab_export_m_66.log:11:MATLAB:badsubscript
matlab_export_m_68.log:11:MATLAB:badsubscript
matlab_export_m_70.log:11:MATLAB:badsubscript
matlab_export_m_71.log:11:MATLAB:badsubscript
matlab_export_m_72.log:11:MATLAB:badsubscript
matlab_export_m_78.log:11:MATLAB:badsubscript
matlab_export_m_80.log:11:MATLAB:badsubscript
matlab_export_m_81.log:11:MATLAB:badsubscript
matlab_export_m_82.log:11:MATLAB:badsubscript
matlab_export_m_83.log:11:MATLAB:badsubscript
matlab_export_m_85.log:11:MATLAB:badsubscript
matlab_export_m_90.log:11:MATLAB:badsubscript

$ for i in {1..90}; do if [ -f ../antdata/xy_${i}_${i}.csv ]; then : ; else echo "Missing: ${i}" ; fi; done
Missing: 16
Missing: 30
Missing: 45
Missing: 48
Missing: 49
Missing: 52
Missing: 59
Missing: 62
Missing: 64
Missing: 65
Missing: 66
Missing: 68
Missing: 70
Missing: 71
Missing: 72
Missing: 78
Missing: 80
Missing: 81
Missing: 82
Missing: 83
Missing: 85
Missing: 90
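
The same check can be written in Python, which is convenient when the result should feed into further scripting. This is a small sketch that assumes the xy_<m>_<m>.csv naming seen in the shell loop above; the function name missing_xy is made up for this example.

```python
from pathlib import Path


def missing_xy(antdata_dir, n_movies):
    """Return movie indices in 1..n_movies that have no xy_<m>_<m>.csv file."""
    antdata = Path(antdata_dir)
    return [
        m for m in range(1, n_movies + 1)
        if not (antdata / f"xy_{m}_{m}.csv").is_file()
    ]


if __name__ == "__main__":
    for m in missing_xy("../antdata", 90):
        print(f"Missing: {m}")
```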

Possibly related: MATLAB:UndefinedFunction and MATLAB:badsubscript kept popping up throughout the whole process:

$ grep -rHno "MATLAB:UndefinedFunction" | sort
matlab_solve_m_16.log:17:MATLAB:UndefinedFunction
matlab_solve_m_45.log:17:MATLAB:UndefinedFunction
matlab_solve_m_48.log:17:MATLAB:UndefinedFunction
matlab_solve_m_49.log:17:MATLAB:UndefinedFunction
matlab_solve_m_52.log:17:MATLAB:UndefinedFunction
matlab_solve_m_59.log:17:MATLAB:UndefinedFunction
matlab_solve_m_62.log:17:MATLAB:UndefinedFunction
matlab_solve_m_65.log:17:MATLAB:UndefinedFunction
matlab_solve_m_68.log:17:MATLAB:UndefinedFunction
matlab_solve_m_70.log:17:MATLAB:UndefinedFunction
matlab_solve_m_71.log:17:MATLAB:UndefinedFunction
matlab_solve_m_78.log:17:MATLAB:UndefinedFunction
matlab_solve_m_80.log:17:MATLAB:UndefinedFunction
matlab_solve_m_81.log:17:MATLAB:UndefinedFunction
matlab_solve_m_85.log:17:MATLAB:UndefinedFunction
matlab_track_m_16.log:77:MATLAB:UndefinedFunction
matlab_track_m_45.log:77:MATLAB:UndefinedFunction
matlab_track_m_48.log:86:MATLAB:UndefinedFunction
matlab_track_m_49.log:77:MATLAB:UndefinedFunction
matlab_track_m_52.log:77:MATLAB:UndefinedFunction
matlab_track_m_59.log:77:MATLAB:UndefinedFunction
matlab_track_m_62.log:77:MATLAB:UndefinedFunction
matlab_track_m_65.log:77:MATLAB:UndefinedFunction
matlab_track_m_68.log:77:MATLAB:UndefinedFunction
matlab_track_m_70.log:77:MATLAB:UndefinedFunction
matlab_track_m_71.log:77:MATLAB:UndefinedFunction
matlab_track_m_78.log:77:MATLAB:UndefinedFunction
matlab_track_m_80.log:77:MATLAB:UndefinedFunction
matlab_track_m_81.log:77:MATLAB:UndefinedFunction
matlab_track_m_85.log:77:MATLAB:UndefinedFunction

$ grep -rHno "MATLAB:badsubscript" | sort
matlab_export_m_16.log:11:MATLAB:badsubscript
matlab_export_m_30.log:11:MATLAB:badsubscript
matlab_export_m_45.log:11:MATLAB:badsubscript
matlab_export_m_48.log:11:MATLAB:badsubscript
matlab_export_m_49.log:11:MATLAB:badsubscript
matlab_export_m_52.log:11:MATLAB:badsubscript
matlab_export_m_59.log:11:MATLAB:badsubscript
matlab_export_m_62.log:11:MATLAB:badsubscript
matlab_export_m_64.log:11:MATLAB:badsubscript
matlab_export_m_65.log:11:MATLAB:badsubscript
matlab_export_m_66.log:11:MATLAB:badsubscript
matlab_export_m_68.log:11:MATLAB:badsubscript
matlab_export_m_70.log:11:MATLAB:badsubscript
matlab_export_m_71.log:11:MATLAB:badsubscript
matlab_export_m_72.log:11:MATLAB:badsubscript
matlab_export_m_78.log:11:MATLAB:badsubscript
matlab_export_m_80.log:11:MATLAB:badsubscript
matlab_export_m_81.log:11:MATLAB:badsubscript
matlab_export_m_82.log:11:MATLAB:badsubscript
matlab_export_m_83.log:11:MATLAB:badsubscript
matlab_export_m_85.log:11:MATLAB:badsubscript
matlab_export_m_90.log:11:MATLAB:badsubscript
matlab_solve_g_2.log:37:MATLAB:badsubscript
matlab_solve_g_3.log:37:MATLAB:badsubscript
matlab_solve_g_4.log:37:MATLAB:badsubscript
matlab_solve_g_5.log:37:MATLAB:badsubscript
matlab_solve_m_30.log:25:MATLAB:badsubscript
matlab_solve_m_64.log:25:MATLAB:badsubscript
matlab_solve_m_66.log:25:MATLAB:badsubscript
matlab_solve_m_72.log:25:MATLAB:badsubscript
matlab_solve_m_82.log:26:MATLAB:badsubscript
matlab_solve_m_83.log:25:MATLAB:badsubscript
matlab_solve_m_90.log:25:MATLAB:badsubscript
  2. The second problem is that validate does not work on this dataset, although it works on other datasets. I tried it with both MCR and MATLAB, on both the debug-jana and master branches:
$ antrax validate CN0402/

==================================================================================

Welcome to anTraX - a software for tracking color tagged ants (and other insects)

==================================================================================

11:43:15 -D- initializing expreader object
11:43:15 -I- Reading video information from file
Subscripted assignment between dissimilar structures.

Error in trhandles/loadxy (line 514)
                    xy(i) = load([xydir,xyfiles{i}]);

Error in validate_tracking/set_experiment (line 266)
            [app.XY,frames] = app.Trck.loadxy('movlist',app.ti.m:app.tf.m,'type',app.type);

Error in validate_tracking/startupFcn (line 441)
            set_experiment(app, Trck, p.Results.session)

Error in validate_tracking (line 659)
            runStartupFcn(app, @(app)startupFcn(app, varargin{:}))

Traceback (most recent call last):
  File "/home/jana/anaconda3/envs/antrax/bin/antrax", line 8, in <module>
    sys.exit(main())
  File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 651, in main
    """)
  File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/sigtools/modifiers.py", line 158, in __call__
    return self.func(*args, **kwargs)
  File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 363, in run
    ret = cli(*args)
  File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
    return func(*posargs, **kwargs)
  File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 262, in _cli
    return func('{0} {1}'.format(name, command), *args)
  File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
    return func(*posargs, **kwargs)
  File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 149, in validate
    launch_matlab_app('validate_tracking', args, mcr=mcr)
  File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/matlab.py", line 204, in launch_matlab_app
    app = eval('eng.' + appname + '(' + ','.join([str(a) for a in args]) + ')')
  File "<string>", line 1, in <module>
  File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/matlab/engine/matlabengine.py", line 71, in __call__
    _stderr, feval=True).result()
  File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/matlab/engine/futureresult.py", line 67, in result
    return self.__future.result(timeout)
  File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/matlab/engine/fevalfuture.py", line 82, in result
    self._result = pythonengine.getFEvalResult(self._future,self._nargout, None, out=self._out, err=self._err)
matlab.engine.MatlabExecutionError: 
  File /home/jana/src/anTraX/matlab/@trhandles/trhandles.m, line 514, in trhandles.loadxy

  File /home/jana/src/anTraX/matlab/apps/validate_tracking.mlapp, line 266, in validate_tracking.set_experiment

  File /home/jana/src/anTraX/matlab/apps/validate_tracking.mlapp, line 441, in validate_tracking.startupFcn

  File /home/jana/src/anTraX/matlab/apps/validate_tracking.mlapp, line 659, in validate_tracking.validate_tracking
Subscripted assignment between dissimilar structures.

$ antrax validate CN0402/

==================================================================================

Welcome to anTraX - a software for tracking color tagged ants (and other insects)

==================================================================================

09/04/21 11:41:37 -D- running matlab mcr 
09/04/21 11:41:37 -D- command is: /home/jana/src/anTraX/bin/antrax_glnxa64_mcr_interface validate_tracking CN0402/
11:41:46 -D- initializing expreader object
11:41:46 -I- Reading video information from file
Subscripted assignment between dissimilar structures.
Error in trhandles/loadxy (line 514)

Error in validate_tracking/set_experiment (line 254)

Error in validate_tracking/startupFcn (line 429)

Error in appdesigner.internal.service.AppManagementService/tryCallback (line 336)

Error in matlab.apps.AppBase/runStartupFcn (line 41)

Error in validate_tracking (line 640)

Error in antrax_mcr_interface (line 20)
MATLAB:heterogeneousStrucAssignment
09/04/21 11:41:55 -D- matlab app exited with code 249

Maybe it's trying to load a non-existing file? The error inside of one of those logs looks like this:

$ cat matlab_export_m_70.log 
09:23:13 -I- Reading video information from file
09:23:17 -I- Loading trgraph from antrax/graphs/graph_70_70.mat
09:23:18 -I- Finished loading trgraph with 200 tracklets
09:23:18 -I- Loading tracklet data for movie 70
Index in position 2 exceeds array bounds.
Error in trgraph/export_xy (line 82)

Error in export_single_movie (line 52)

Error in antrax_mcr_interface (line 42)
MATLAB:badsubscript

Loading extract-trainset worked and it showed that most blobs were identified as RBR, which is wrong. Could that have contributed to the export error?

asafgal avatar asafgal commented on July 26, 2024

The validate command fails because there is something wrong with the xy files, so let's try to figure that one out first.

The extract-trainset command shows you the results of the blob classifier, so if it is completely off, you should try and understand why. However, it should not cause any program crash downstream, just very bad tracking results.

The error in the export log suggests that the solve step fails on that video. Can you see if there is something weird in the corresponding solve logs?

asafgal avatar asafgal commented on July 26, 2024

btw, the single step solve on hpc works properly for me. Did you remember to run pip install? (This needs to be done after Python code changes, but not after MATLAB code changes.)

janamach avatar janamach commented on July 26, 2024

All files that experienced MATLAB:UndefinedFunction during track also failed during solve, so maybe the fix in #17 will help. The other ones (like 30, see above) had a different error during solve: MATLAB:badsubscript.
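
This relationship can be checked mechanically from the grep output posted earlier: the 15 movies that failed in track are the same 15 that show MATLAB:UndefinedFunction in solve, and together with the 7 movies that show MATLAB:badsubscript in solve they make up the 22 movies that failed in export. The sketch below parses `grep -rHno` lines into sets and compares them; the function name and the regular expression are illustrative, and only a few sample lines from the thread are included.

```python
import re
from collections import defaultdict

# Matches e.g. "matlab_solve_m_16.log:17:MATLAB:UndefinedFunction"
GREP_LINE = re.compile(r"matlab_([a-z]+_[mg])_(\d+)\.log:\d+:(MATLAB:[\w:]+)")


def failures_by_step(grep_output):
    """Map (step, error id) -> set of movie or graph indices."""
    table = defaultdict(set)
    for line in grep_output.splitlines():
        match = GREP_LINE.match(line.strip())
        if match:
            step, index, error = match.groups()
            table[(step, error)].add(int(index))
    return table


sample = """\
matlab_track_m_16.log:77:MATLAB:UndefinedFunction
matlab_solve_m_16.log:17:MATLAB:UndefinedFunction
matlab_solve_m_30.log:25:MATLAB:badsubscript
matlab_export_m_16.log:11:MATLAB:badsubscript
matlab_export_m_30.log:11:MATLAB:badsubscript
"""
table = failures_by_step(sample)
track_failed = table[("track_m", "MATLAB:UndefinedFunction")]
export_failed = table[("export_m", "MATLAB:badsubscript")]
solve_failed = set()
for (step, _error), indices in list(table.items()):
    if step == "solve_m":
        solve_failed |= indices

print("track failures also fail solve:", track_failed <= solve_failed)
print("solve failures equal export failures:", solve_failed == export_failed)
```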

asafgal avatar asafgal commented on July 26, 2024

Yes, all these errors seem related to the degenerate graph case. Let me know how the latest version does.

About the pip install, I like to use pip install -e <path> for packages under development, as it creates a link to the working directory of the package instead of copying the files, so you don't need to install again after every change or branch switch.

janamach avatar janamach commented on July 26, 2024

Thank you for the pip tip, I was unaware of it :-) The solve thing with the --step option works for me now too, thank you for fixing it!

I got to the solve step with the problematic datasets, here's what I got:

  • the fix from #17 worked for MATLAB:badsubscript during track
  • During the solve step, the MATLAB:UndefinedFunction error no longer shows up, but MATLAB:badsubscript did in 22 out of 90 cases. I repeated this twice, and it is always the same videos. In an attempt to fix it, I tried training the classifier specifically on the images extracted from the problematic videos, but that didn't help. Those videos are not empty, btw; there are identifiable ants in them. The matlab_solve_m_*.log typically looks like this:
$ cat matlab_solve_m_52.log 
21:33:26 -I- Reading video information from file
21:33:32 -I- Loading trgraph from antrax/graphs/graph_52_52.mat
21:34:03 -I- Finished loading trgraph with 11734 tracklets
21:34:04 -I- Loading ids
21:34:09 -I- Finding single ant nodes
21:34:09 -I- Some preperations
21:34:10 -I- Looking for bottleneck pairs
21:34:13 -I- done distance mat
21:34:13 -I- Resetting graph id assigments
21:34:13 -I- Filtering out tracklets identified as non-ant
21:34:13 -I- ...10530 tracklets classified as no-ant were filtered
21:34:13 -I- ...2013 short, unconnected and unidentified tracklets were filtered
21:34:14 -I- Propagating ids from src tracklets
21:34:14 -I- Propagation loops
21:34:14 -I-     ...assigned 0 tracklets
21:34:14 -I- Biconnected components condition (positive)
Index in position 2 exceeds array bounds.
Error in trgraph/solve>propagate_all (line 536)

Error in trgraph/solve (line 150)

Error in solve_single_movie (line 54)

Error in antrax_mcr_interface (line 30)
MATLAB:badsubscript

This dataset has 90 videos of 40 min each. I am processing another dataset that has 60 videos of one hour each; that one takes longer to process, and I haven't reached the solve step yet. If that dataset gets through the solve step properly, I will re-slice the videos for this experiment. I will also run the solve step overnight with MATLAB on a local machine to see if this error only occurs with MCR.

janamach avatar janamach commented on July 26, 2024

On a local machine with MATLAB, solve also failed at the same spots. The error looks like this:

$ cat matlab_solve_m_52.log                                                                                          
22:31:17 -D- initializing expreader object
22:31:17 -I- Reading video information from file
22:31:19 -I- Loading trgraph from antrax/graphs/graph_52_52.mat
22:31:48 -I- Finished loading trgraph with 11734 tracklets
22:31:48 -I- Loading ids
22:31:52 -I- Finding single ant nodes
22:31:53 -I- Some preperations
22:31:53 -I- Looking for bottleneck pairs
22:31:55 -I- done distance mat
22:31:55 -I- Resetting graph id assigments
22:31:55 -I- Filtering out tracklets identified as non-ant
22:31:55 -I- ...10530 tracklets classified as no-ant were filtered
22:31:55 -I- ...2013 short, unconnected and unidentified tracklets were filtered
22:31:55 -I- Propagating ids from src tracklets
22:31:56 -I- Propagation loops
22:31:56 -I-     ...assigned 0 tracklets
22:31:56 -I- Biconnected components condition (positive)
Index in position 2 exceeds array bounds.

Error in trgraph/solve>propagate_all (line 536)
G.pairs = G.pairs(argsort(G.pairs(:,3)),:);

Error in trgraph/solve (line 150)
propagate_all(G);

Error in solve_single_movie (line 54)
solve(G,false,false);

I guess the dataset is not good then?

janamach avatar janamach commented on July 26, 2024

Hi,

Is --movlist supposed to work during the solve step 1 on HPC? It seems to be ignored:

$ antrax solve H1CN0304/ --step 1 --movlist 50 --hpc --hpc-options partition=single,,cpus=3,mem-per-cpu=3000,time=72:00:00

==================================================================================

Welcome to anTraX - a software for tracking color tagged ants (and other insects)

==================================================================================


Jobfile created in H1CN0304/antrax/logs/hpc_solve1.sh

Job number 19464706 was submitted


$ squeue -l
Tue Apr 13 11:29:34 2021
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON) 
19464706_[20-60%60    single slv1:H1C fr_jm112  PENDING       0:00 3-00:00:00      1 (Resources) 
        19464706_1    single slv1:H1C fr_jm112  RUNNING       0:02 3-00:00:00      1 uc2n421 
        19464706_2    single slv1:H1C fr_jm112  RUNNING       0:02 3-00:00:00      1 uc2n421 
        19464706_3    single slv1:H1C fr_jm112  RUNNING       0:02 3-00:00:00      1 uc2n370 
        [...]

asafgal avatar asafgal commented on July 26, 2024

Once again you are right - I fixed the movlist issue.

Also made a new fix to the MATLAB:badsubscript issue. Try it now... It's a program bug, not an issue with your dataset. I just need to catch all the spots that reference the problematic variable. It's hard without being able to replicate the error on my side.

janamach avatar janamach commented on July 26, 2024

Thank you for fixing these things :-) I am never really sure if I am right about anything.

Also made a new fix to the MATLAB:badsubscript issue. Try it now... It's a program bug, not an issue with your dataset.

I don't seem to be getting these errors with a different dataset... Or did you fix this days ago? I changed the dataset that was causing all these problems by re-slicing the videos into 1 hour pieces. I also finally figured out that I need to use a far larger number of epochs during the training step than the default 5. In my case I need more than 20 (45 seems like a good number when running from scratch on a good set of examples) to get loss and accuracy values closer to 0.5 and 0.95, respectively.

And what does --missing do in the solve context?

$ antrax solve --help

==================================================================================

Welcome to anTraX - a software for tracking color tagged ants (and other insects)

==================================================================================

Usage: antrax solve [OPTIONS] explist

Run propagation step

Arguments:
  explist

Options:
  --clist=PARSE_MOVLIST
  --dry
  --glist=PARSE_MOVLIST
  --hpc
  --hpc-options=PARSE_HPC_OPTIONS    (default: {})
  --mcr
  --missing
  --movlist=PARSE_MOVLIST
  --nw=INT                           (default: 2)
  --session=STR
  --step=INT                         (default: 0)

Other actions:
  -h, --help                        Show the help

I had some jobs fail because I did not allocate enough memory for them. And some jobs seem to fail repeatedly for no obvious reason, but that can be fixed if I remove the hpc_solve1_*.log for that job. Weird.

janamach avatar janamach commented on July 26, 2024

Once again you are right - I fixed the movlist issue.

Works beautifully!

$ antrax solve JS16/ --step 1 --movlist 2-4 --dry  --hpc --hpc-options partition=single,cpus=3,mem-per-cpu=3000,time=72:00:00

==================================================================================

Welcome to anTraX - a software for tracking color tagged ants (and other insects)

==================================================================================


Jobfile created in JS16/antrax_demo/logs/hpc_solve1.sh

Dry run, no job submitted.


$ cat JS16/antrax_demo/logs/hpc_solve1.sh
#!/bin/bash
#SBATCH --job-name=slv1:JS16
#SBATCH --output=JS16/antrax_demo/logs/hpc_solve1_%a.log
#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=3
#SBATCH --time=72:00:00
#SBATCH --mem-per-cpu=3000
#SBATCH --array=2-4%3
#SBATCH --mail-type=ALL
#SBATCH --mail-user=None

srun -N1 antrax solve JS16/ --session antrax_demo --movlist $SLURM_ARRAY_TASK_ID  --nw 1  --step 1 --mcr
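
The `--array=2-4%3` line in the job file above uses SLURM's job-array syntax: a list of task ids or id ranges, optionally followed by `%N` to cap how many tasks run at once. As a rough illustration of how a movie list maps onto such a value, here is a hypothetical helper. It is not anTraX's own code, and defaulting the cap to the number of tasks is an assumption that merely matches the job files shown in this thread.

```python
def slurm_array_spec(indices, max_parallel=None):
    """Compress task indices into a SLURM --array value such as '2-4%3'."""
    ids = sorted(set(indices))
    if not ids:
        raise ValueError("no task indices given")

    # Collapse consecutive ids into (start, end) ranges.
    ranges = []
    start = prev = ids[0]
    for i in ids[1:]:
        if i != prev + 1:
            ranges.append((start, prev))
            start = i
        prev = i
    ranges.append((start, prev))

    spec = ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)
    # Assumption: cap defaults to the number of tasks, as in the job files here.
    limit = len(ids) if max_parallel is None else max_parallel
    return f"{spec}%{limit}"


print(slurm_array_spec([2, 3, 4]))
```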

I used pip install -e ., which is very handy. Incidentally, it also doesn't trigger the strange HPC permission error I described in #13, which plain pip install . did.

asafgal avatar asafgal commented on July 26, 2024

Using --missing with solve will run solve on videos that do not have an xy file, which is the only output file of the step. It is useful if some jobs failed, and you want to run only those. If you don't specify the step, it will run step 1 on the missing videos, then step 2 on all graphs, and then step 3 again on the missing videos.
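
The intended behaviour described in the paragraph above can be sketched as follows. This is only an illustration of the description, not the code in cli.py or hpc.py; the function name and arguments are hypothetical, and have_xy stands for the set of movies that already have an xy file.

```python
def plan_missing_solve(movies, graphs, have_xy):
    """Return (step, targets) pairs for 'solve --missing' without --step."""
    missing = [m for m in movies if m not in have_xy]
    return [
        (1, missing),       # single-movie solve: missing videos only
        (2, list(graphs)),  # stitch across movies: all graphs
        (3, missing),       # export: missing videos only
    ]


# Example: 60 movies in 5 graphs, with only movie 2 lacking an xy file.
movies = range(1, 61)
graphs = range(1, 6)
have_xy = set(movies) - {2}
for step, targets in plan_missing_solve(movies, graphs, have_xy):
    print(step, targets)
```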

The MATLAB:badsubscript error happens in a very specific case, where the program does not find any topologically equivalent node pairs (see the paper) in the video. I never encountered such a case in my experiments, so it is very likely that you see it only in this specific dataset. Anyhow, it is a good idea to patch it, even if you found a workaround, so let me know if it happens again. The fix was in my last commit, not days ago.

Regarding the classifier: definitely! Usually 50-100 epochs are needed, depending on the complexity of the problem (number of classes, image resolution, etc.). I usually recommend aiming for at least 0.95 accuracy.

I understand that you already completed tracking of a few datasets, and ran the validation procedure? What accuracy do you see?

janamach avatar janamach commented on July 26, 2024

No, I am actually slower than it may seem :-/ With small test datasets it worked out really well, but with large ones (e.g., 60 hours) I kept making different silly mistakes that hindered my progress. For example, I realized only yesterday that I need to run the training step much longer. Hopefully I will get to the point where I can run validation on one of the large experiments sometime this week.

asafgal avatar asafgal commented on July 26, 2024

ok, hopefully the effort will pay off!

janamach avatar janamach commented on July 26, 2024

I think --missing might not be working... One xy file of 60 was not generated, but this restarted all jobs:

$ for i in {1..60}; do if [ -f ~/H2CN0402/antrax/antdata/xy_${i}_${i}.csv ]; then : ; else echo "Missing: ${i}" ; fi; done
Missing: 2
$ antrax solve H2CN0402/ --missing --hpc --hpc-options partition=single,cpus=2,mem-per-cpu=2000,time=72:00:00

==================================================================================

Welcome to anTraX - a software for tracking color tagged ants (and other insects)

==================================================================================


Jobfile created in H2CN0402/antrax/logs/hpc_solve1.sh

Job number 19468563 was submitted


Jobfile created in H2CN0402/antrax/logs/hpc_solve2.sh

Job number 19468564 was submitted


Jobfile created in H2CN0402/antrax/logs/hpc_solve3.sh
$ cat H2CN0402/antrax/logs/hpc_solve1.sh
#!/bin/bash
#SBATCH --job-name=slv1:H2CN0402
#SBATCH --output=H2CN0402/antrax/logs/hpc_solve1_%a.log
#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --time=72:00:00
#SBATCH --mem-per-cpu=2000
#SBATCH --array=1-60%60
#SBATCH --mail-type=ALL

srun -N1 antrax solve H2CN0402/ --session antrax --movlist $SLURM_ARRAY_TASK_ID  --nw 1  --step 1 --mcr


$ cat H2CN0402/antrax/logs/hpc_solve2.sh
#!/bin/bash
#SBATCH --job-name=slv2:H2CN0402
#SBATCH --output=H2CN0402/antrax/logs/hpc_solve2_%a.log
#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --time=72:00:00
#SBATCH --mem-per-cpu=2000
#SBATCH --array=1-5%5
#SBATCH --mail-type=ALL

srun -N1 antrax solve H2CN0402/ --session antrax --movlist $SLURM_ARRAY_TASK_ID  --nw 1  --step 2 --mcr


$ cat H2CN0402/antrax/logs/hpc_solve3.sh
#!/bin/bash
#SBATCH --job-name=slv3:H2CN0402
#SBATCH --output=H2CN0402/antrax/logs/hpc_solve3_%a.log
#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --time=72:00:00
#SBATCH --mem-per-cpu=2000
#SBATCH --array=1-60%60
#SBATCH --mail-type=ALL

srun -N1 antrax solve H2CN0402/ --session antrax --movlist $SLURM_ARRAY_TASK_ID  --nw 1  --step 3 --mcr

janamach avatar janamach commented on July 26, 2024

The log of the missing file is complaining about a possibly corrupt MAT file. The file is physically there. What do you think could have caused the problem?

$ cat matlab_solve_m_2.log 
08:07:21 -I- Reading video information from file
08:07:27 -I- Loading trgraph from antrax/graphs/graph_2_2.mat
Error using load
Unable to read MAT-file /pfs/data5/home/fr/fr_fr/fr_jm1121/H2CN0402/antrax/graphs/graph_2_2_trjs.mat. File might be corrupt.
Error in trgraph.load (line 886)

Error in trhandles/loaddata (line 607)

Error in solve_single_movie (line 52)

Error in antrax_mcr_interface (line 30)
MATLAB:load:unableToReadMatFile

asafgal avatar asafgal commented on July 26, 2024

Can you try and load the file in matlab using the load command?
If it's indeed corrupted, it's possible that something interrupted the writing of the file, so it might be just a random thing. Did the track step finish properly on this video? Try re-running track for that video.

I'll take a look at the --missing issue tomorrow.

janamach avatar janamach commented on July 26, 2024

Matlab has the same complaint:

>> addpath(genpath(['.','/matlab']));
>> load antrax/graphs/graph_2_2_trjs.mat
Error using load
Unable to read MAT-file /media/jana/HDD/bw/H2CN0402/antrax/graphs/graph_2_2_trjs.mat. File might be corrupt.

I think I know what I did wrong: I might have started the next step before the previous one finished. On the upside, it was otherwise a very smooth process, from track to solve.
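
A quick way to spot files that were cut off early, before launching the next step, is to look at the file header. The sketch below is a heuristic only: it assumes the graph files are Level 5 or v7.3 MAT-files, whose first bytes are descriptive text starting with "MATLAB", and it will not notice a file that was truncated later in its body. Loading the file in MATLAB remains the authoritative test. The function name is made up for this example.

```python
from pathlib import Path

MAT_HEADER_BYTES = 128  # header length of a Level 5 MAT-file


def suspicious_mat_files(graph_dir):
    """Return names of .mat files that are too short or lack the 'MATLAB' header text."""
    suspects = []
    for path in sorted(Path(graph_dir).glob("*.mat")):
        with open(path, "rb") as fh:
            header = fh.read(MAT_HEADER_BYTES)
        # Heuristic: a complete header is present and starts with "MATLAB".
        if len(header) < MAT_HEADER_BYTES or not header.startswith(b"MATLAB"):
            suspects.append(path.name)
    return suspects


if __name__ == "__main__":
    for name in suspicious_mat_files("antrax/graphs"):
        print("Suspicious:", name)
```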

janamach avatar janamach commented on July 26, 2024

No, something is still wrong. After re-slicing the videos and starting everything from scratch, I had errors during solve steps 2 and 3.

In step 2 it was either MATLAB:badsubscript or MATLAB:load:cantReadFile (?):

$ grep -rHnoL "Done" matlab_solve_g_*
matlab_solve_g_3.log
matlab_solve_g_4.log
matlab_solve_g_5.log
$ cat matlab_solve_g_3.log
00:43:23 -I- Reading video information from file
00:43:32 -I- solving graph from movies 25-36
00:43:32 -I- Loading trgraph from antrax/graphs/graph_25_25.mat
Error using load
Cannot read file /pfs/data5/home/fr/fr_fr/fr_jm1121/H2CN0402/antrax/graphs/graph_25_25.mat.
Error in trgraph.load (line 879)

Error in trhandles/loaddata (line 607)

Error in solve_across_movies (line 70)

Error in antrax_mcr_interface (line 53)
MATLAB:load:cantReadFile
$ cat matlab_solve_g_4.log 
00:43:51 -I- Reading video information from file
00:43:58 -I- solving graph from movies 37-48
00:43:58 -I- Loading trgraph from antrax/graphs/graph_37_37.mat
00:44:10 -I- Loading trgraph from antrax/graphs/graph_38_38.mat
00:44:14 -I- Loading trgraph from antrax/graphs/graph_39_39.mat
00:44:16 -I- Loading trgraph from antrax/graphs/graph_40_40.mat
00:44:17 -I- Loading trgraph from antrax/graphs/graph_41_41.mat
00:44:19 -I- Loading trgraph from antrax/graphs/graph_42_42.mat
00:44:21 -I- Loading trgraph from antrax/graphs/graph_43_43.mat
00:44:22 -I- Loading trgraph from antrax/graphs/graph_44_44.mat
00:44:24 -I- Loading trgraph from antrax/graphs/graph_45_45.mat
00:44:25 -I- Loading trgraph from antrax/graphs/graph_46_46.mat
00:44:27 -I- Loading trgraph from antrax/graphs/graph_47_47.mat
00:44:28 -I- Loading trgraph from antrax/graphs/graph_48_48.mat
00:44:28 -I- Finished loading trgraph with 10016 tracklets
00:44:30 -I- Loading ids
00:44:33 -I- Finding single ant nodes
00:44:33 -I- Some preperations
00:44:34 -I- Filtering out tracklets identified as non-ant
00:44:34 -I- ...690 tracklets classified as no-ant were filtered
00:44:34 -I- ...729 short, unconnected and unidentified tracklets were filtered
00:44:35 -I- Propagating ids from src tracklets
00:44:36 -I-     ...finished 1000/7355
00:44:36 -I-     ...finished 2000/7355
00:44:36 -I-     ...finished 3000/7355
00:44:36 -I-     ...finished 4000/7355
00:44:36 -I-     ...finished 5000/7355
00:44:36 -I-     ...finished 6000/7355
00:44:36 -I-     ...finished 7000/7355
00:44:36 -I- Propagation loops
Index in position 1 exceeds array bounds.
Error in trgraph/solve>propagate_all (line 522)

Error in trgraph/solve (line 150)

Error in solve_across_movies (line 72)

Error in antrax_mcr_interface (line 53)
MATLAB:badsubscript

In step 3:

$ grep -rHnoL "Done" matlab_export_m_*
matlab_export_m_25.log
matlab_export_m_28.log
matlab_export_m_35.log
matlab_export_m_36.log

$ cat matlab_export_m_25.log
00:58:53 -I- Reading video information from file
00:58:58 -I- Loading trgraph from antrax/graphs/graph_25_25.mat
Error using load
Cannot read file /pfs/data5/home/fr/fr_fr/fr_jm1121/H2CN0402/antrax/graphs/graph_25_25.mat.
Error in trgraph.load (line 879)

Error in trhandles/loaddata (line 607)

Error in export_single_movie (line 51)

Error in antrax_mcr_interface (line 42)
MATLAB:load:cantReadFile

None of the previous logs showed the errors.
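
The `grep -L "Done"` checks above can be extended to show why each log stopped. A minimal Python sketch, assuming (as in the logs above) that a completed job writes a line containing "Done" and that a failed MATLAB job leaves its error identifier on the last non-empty line. The function name and the default glob pattern are illustrative and not part of anTraX:

```python
from pathlib import Path


def unfinished_logs(log_dir, pattern="matlab_solve_g_*.log", done_marker="Done"):
    """Return {log name: last non-empty line} for logs lacking the marker."""
    report = {}
    for log in sorted(Path(log_dir).glob(pattern)):
        text = log.read_text(errors="replace")
        if done_marker in text:
            continue  # the job reported completion
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        report[log.name] = lines[-1] if lines else "(empty log)"
    return report


if __name__ == "__main__":
    # Scan the current directory; pass another pattern for export logs.
    for name, last_line in unfinished_logs(".").items():
        print(f"{name}: {last_line}")
```

Passing `pattern="matlab_export_m_*.log"` would cover the step 3 logs in the same way.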

from antrax.

janamach avatar janamach commented on July 26, 2024

The above was partially solved by re-running the track step for movies 25,28,35,36 on HPC. Step 2 showed the MATLAB:badsubscript error in all logs (5 graphs in total), but step 3 finished successfully and the missing mat/csv files have been generated.

$ cat matlab_solve_g_3.log 
09:26:30 -I- Reading video information from file
09:26:36 -I- solving graph from movies 25-36
09:26:36 -I- Loading trgraph from antrax/graphs/graph_25_25.mat
09:26:48 -I- Loading trgraph from antrax/graphs/graph_26_26.mat
09:26:52 -I- Loading trgraph from antrax/graphs/graph_27_27.mat
09:26:54 -I- Loading trgraph from antrax/graphs/graph_28_28.mat
09:26:57 -I- Loading trgraph from antrax/graphs/graph_29_29.mat
09:26:57 -I- Loading trgraph from antrax/graphs/graph_30_30.mat
09:26:58 -I- Loading trgraph from antrax/graphs/graph_31_31.mat
09:26:59 -I- Loading trgraph from antrax/graphs/graph_32_32.mat
09:26:59 -I- Loading trgraph from antrax/graphs/graph_33_33.mat
09:27:00 -I- Loading trgraph from antrax/graphs/graph_34_34.mat
09:27:01 -I- Loading trgraph from antrax/graphs/graph_35_35.mat
09:27:09 -I- Loading trgraph from antrax/graphs/graph_36_36.mat
09:27:19 -I- Finished loading trgraph with 17914 tracklets
09:27:21 -I- Loading ids
09:27:25 -I- Finding single ant nodes
09:27:26 -I- Some preperations
09:27:28 -I- Filtering out tracklets identified as non-ant
09:27:28 -I- ...8544 tracklets classified as no-ant were filtered
09:27:28 -I- ...6359 short, unconnected and unidentified tracklets were filtered
09:27:29 -I- Propagating ids from src tracklets
09:27:31 -I-     ...finished 1000/7421
09:27:31 -I-     ...finished 2000/7421
09:27:31 -I-     ...finished 3000/7421
09:27:31 -I-     ...finished 4000/7421
09:27:31 -I-     ...finished 5000/7421
09:27:31 -I-     ...finished 6000/7421
09:27:31 -I-     ...finished 7000/7421
09:27:31 -I- Propagation loops
Index in position 1 exceeds array bounds (must not exceed 14008).
Error in trgraph/solve>propagate_all (line 522)

Error in trgraph/solve (line 150)

Error in solve_across_movies (line 72)

Error in antrax_mcr_interface (line 53)
MATLAB:badsubscript

from antrax.

asafgal avatar asafgal commented on July 26, 2024

So, if I understand correctly, the corrupted file issue was solved by the rerun?

Regarding the new MATLAB:badsubscript error, it is different from the previous one we had above. I'm not sure what's going on there. After you tracked some of the videos again, did you also run the classify and solve1?

Step 2 actually "stitches" the graphs of individual videos and propagates information from one video to another. In practice, it is not strictly required, which is why step 3 is able to finish properly. Without it, the tracking might be suboptimal at the interface between the videos.

from antrax.

janamach avatar janamach commented on July 26, 2024

So, if I understand correctly, the corrupted file issue was solved by the rerun?

Yes. It looks like there was some strange error happening that was not reflected in the logs, but produced some corrupt graph MAT files during track. At least that's my best explanation.

After you tracked some of the videos again, did you also run the classify and solve1?

I tried both actually, both worked. But I went with the latter one. What consequences would re-running track and then going directly to solve have on detections?

from antrax.

asafgal avatar asafgal commented on July 26, 2024

Theoretically, the algorithm is completely deterministic, so the two runs should have the same tracklet graph and tracklet names. However, there are occasionally some small misalignments between runs that I cannot explain. Also, when you run track, it cleans some of the data generated by later steps, so it is better to also rerun the downstream steps.

I'm not sure what you mean by "both worked". Was the latest MATLAB:badsubscript in step 2 solved?

from antrax.

janamach avatar janamach commented on July 26, 2024

Sorry, I made it too confusing. It looks like I've been dealing with two separate problems (they just looked like one at first): xy files not being generated after step 3 and step 2 showing different errors (either MATLAB:badsubscript with MCR or Index in position 1 exceeds array bounds with MATLAB). With "both worked" I was referring to the first problem that was caused by the corrupt graph files generated during track and fixed by re-running either just track and then solve, or track, classify, and solve.

Was the latest MATLAB:badsubscript in step 2 solved?

No, it is still happening.

from antrax.

asafgal avatar asafgal commented on July 26, 2024

ok, so let's try to understand this new MATLAB:badsubscript better (it's the same error on MCR/matlab, just reported differently). As I said, it's a different one from the one we had before on this thread. We'll have to do it the painful way, as I can't reproduce it on my side.

I've added a few lines of code to report some info on the problematic place.
Run it in interactive MATLAB:

Trck = trhandles(uigetdir);
solve_across_movies(Trck, 'g', 3);

from antrax.

janamach avatar janamach commented on July 26, 2024

Hmm, maybe I am doing something wrong here:

>> addpath(genpath(['.','/matlab']));
>> Trck = trhandles(uigetdir);       
Warning: uigetdir is no longer supported when MATLAB is started with the -nodisplay or -noFigureWindows option or there is no display. For more information, see "Changes to
-nodisplay and -noFigureWindows Startup Options" in the MATLAB Release Notes. To view the release note in your system browser, run
web('www.mathworks.com/help/matlab/release-notes.html#br5ktrh-3', '-browser') 
> In warnfiguredialog (line 21)
  In uigetdir (line 60) 
Error using javaObjectEDT
Scalar input must be a java object

Error in matlab.ui.internal.dialog.Dialog/getParentFrame (line 46)
               obj.ParentFrame = javaObjectEDT(com.mathworks.hg.peer.utils.DialogUtilities.createParentWindow);

Error in matlab.ui.internal.dialog.FileSystemChooser/getParentFrame (line 129)
                parframe = getParentFrame@matlab.ui.internal.dialog.Dialog(obj);

Error in matlab.ui.internal.dialog.FolderChooser/doShowDialog (line 70)
            javaMethodEDT('showOpenDialog', obj.Peer, getParentFrame(obj));

Error in matlab.ui.internal.dialog.FolderChooser/show (line 48)
            doShowDialog(obj)

Error in uigetdir_helper (line 32)
    dirdlg.show();

Error in uigetdir (line 61)
[directoryname] = uigetdir_helper(varargin{:});
 
>> Exception in thread "AWT-EventQueue-0" java.awt.HeadlessException
	at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:204)
	at java.awt.Window.<init>(Window.java:536)
	at java.awt.Frame.<init>(Frame.java:420)
	at javax.swing.JFrame.<init>(JFrame.java:233)
	at com.mathworks.mwswing.MJFrame.<init>(MJFrame.java:108)
	at com.mathworks.mwswing.MJFrame.<init>(MJFrame.java:101)
	at com.mathworks.hg.peer.utils.DialogUtilities$1.runWithOutput(DialogUtilities.java:56)
	at com.mathworks.jmi.AWTUtilities$Invoker$2.watchedRun(AWTUtilities.java:475)
	at com.mathworks.jmi.AWTUtilities$WatchedRunnable.run(AWTUtilities.java:436)
	at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:311)
	at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
	at java.awt.EventQueue.access$500(EventQueue.java:97)
	at java.awt.EventQueue$3.run(EventQueue.java:709)
	at java.awt.EventQueue$3.run(EventQueue.java:703)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:74)
	at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
	at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:205)
	at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
	at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
	at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
	at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
	at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)

from antrax.

asafgal avatar asafgal commented on July 26, 2024

from antrax.

janamach avatar janamach commented on July 26, 2024

Sorry, that was silly of me :-/ With the dataset that had the problem:

>> Trck = trhandles('.');
21:25:09 -I- Loading tracking session from expdir
21:25:17 -I- Reading video information from file
>> solve_across_movies(Trck, 'g', 3);
Error using solve_across_movies (line 11)
Expected a string scalar or character vector for the parameter name.
 
>> 

from antrax.

asafgal avatar asafgal commented on July 26, 2024

from antrax.

janamach avatar janamach commented on July 26, 2024
>> solve_across_movies(Trck, 3);
21:45:08 -I- solving graph from movies 25-36
21:45:08 -I- Loading trgraph from antrax/graphs/graph_25_25.mat
21:45:37 -I- Loading trgraph from antrax/graphs/graph_26_26.mat
21:45:55 -I- Loading trgraph from antrax/graphs/graph_27_27.mat
21:46:11 -I- Loading trgraph from antrax/graphs/graph_28_28.mat
21:46:24 -I- Loading trgraph from antrax/graphs/graph_29_29.mat
21:46:31 -I- Loading trgraph from antrax/graphs/graph_30_30.mat
21:46:37 -I- Loading trgraph from antrax/graphs/graph_31_31.mat
21:46:42 -I- Loading trgraph from antrax/graphs/graph_32_32.mat
21:46:49 -I- Loading trgraph from antrax/graphs/graph_33_33.mat
21:46:57 -I- Loading trgraph from antrax/graphs/graph_34_34.mat
21:47:02 -I- Loading trgraph from antrax/graphs/graph_35_35.mat
21:47:06 -I- Loading trgraph from antrax/graphs/graph_36_36.mat
21:47:08 -I- Finished loading trgraph with 80451 tracklets
21:47:12 -I- Loading ids
21:47:31 -I- Finding single ant nodes
21:47:33 -I- Some preperations
21:47:38 -I- Filtering out tracklets identified as non-ant
21:47:38 -I- ...1082 tracklets classified as no-ant were filtered
21:47:39 -I- ...13588 short, unconnected and unidentified tracklets were filtered
21:47:41 -I- Propagating ids from src tracklets
21:47:45 -I-     ...finished 1000/25235
21:47:45 -I-     ...finished 2000/25235
21:47:45 -I-     ...finished 3000/25235
21:47:45 -I-     ...finished 4000/25235
21:47:45 -I-     ...finished 5000/25235
21:47:45 -I-     ...finished 6000/25235
21:47:45 -I-     ...finished 7000/25235
21:47:45 -I-     ...finished 8000/25235
21:47:45 -I-     ...finished 9000/25235
21:47:45 -I-     ...finished 10000/25235
21:47:45 -I-     ...finished 11000/25235
21:47:45 -I-     ...finished 12000/25235
21:47:45 -I-     ...finished 13000/25235
21:47:45 -I-     ...finished 14000/25235
21:47:45 -I-     ...finished 15000/25235
21:47:45 -I-     ...finished 16000/25235
21:47:45 -I-     ...finished 17000/25235
21:47:45 -I-     ...finished 18000/25235
21:47:45 -I-     ...finished 19000/25235
21:47:45 -I-     ...finished 20000/25235
21:47:45 -I-     ...finished 21000/25235
21:47:46 -I-     ...finished 22000/25235
21:47:46 -I-     ...finished 23000/25235
21:47:46 -I-     ...finished 24000/25235
21:47:46 -I-     ...finished 25000/25235
21:47:46 -I- Propagation loops
Index in position 1 exceeds array bounds.

Error in trgraph/solve>propagate_all (line 522)
            score = G.assignment_scores(assigned_nodes(i),idix(j));

Error in trgraph/solve (line 150)
propagate_all(G);

Error in solve_across_movies (line 72)
solve(G,false,true);
 
>> 

from antrax.

asafgal avatar asafgal commented on July 26, 2024

You don't seem to have my code changes. Did you pull my latest commit?

from antrax.

asafgal avatar asafgal commented on July 26, 2024

BTW, did you re-track your videos with different parameters? You have 80K tracklets in the latest run, while in the previous run you had 17K for the same videos.

from antrax.

janamach avatar janamach commented on July 26, 2024

hmm, I was on the wrong branch. Sorry :-/

BTW, did you re-track your videos with different parameters?

I have two datasets, both containing 60 hours of videos, but significantly different in tracklet numbers. Both datasets had issues with step2, their parameters are not identical, and their classifiers are different.

Dataset-1:

22:08:20 -I- solving graph from movies 25-36
22:08:20 -I- Loading trgraph from antrax/graphs/graph_25_25.mat
22:08:50 -I- Loading trgraph from antrax/graphs/graph_26_26.mat
22:09:09 -I- Loading trgraph from antrax/graphs/graph_27_27.mat
22:09:24 -I- Loading trgraph from antrax/graphs/graph_28_28.mat
22:09:37 -I- Loading trgraph from antrax/graphs/graph_29_29.mat
22:09:45 -I- Loading trgraph from antrax/graphs/graph_30_30.mat
22:09:50 -I- Loading trgraph from antrax/graphs/graph_31_31.mat
22:09:56 -I- Loading trgraph from antrax/graphs/graph_32_32.mat
22:10:02 -I- Loading trgraph from antrax/graphs/graph_33_33.mat
22:10:10 -I- Loading trgraph from antrax/graphs/graph_34_34.mat
22:10:15 -I- Loading trgraph from antrax/graphs/graph_35_35.mat
22:10:19 -I- Loading trgraph from antrax/graphs/graph_36_36.mat
22:10:21 -I- Finished loading trgraph with 80451 tracklets
22:10:25 -I- Loading ids
22:10:45 -I- Finding single ant nodes
22:10:47 -I- Some preperations
22:10:51 -I- Filtering out tracklets identified as non-ant
22:10:51 -I- ...1082 tracklets classified as no-ant were filtered
22:10:52 -I- ...13588 short, unconnected and unidentified tracklets were filtered
22:10:54 -I- Propagating ids from src tracklets
22:10:58 -I-     ...finished 1000/25235
22:10:58 -I-     ...finished 2000/25235
22:10:58 -I-     ...finished 3000/25235
22:10:58 -I-     ...finished 4000/25235
22:10:58 -I-     ...finished 5000/25235
22:10:58 -I-     ...finished 6000/25235
22:10:58 -I-     ...finished 7000/25235
22:10:58 -I-     ...finished 8000/25235
22:10:58 -I-     ...finished 9000/25235
22:10:58 -I-     ...finished 10000/25235
22:10:58 -I-     ...finished 11000/25235
22:10:58 -I-     ...finished 12000/25235
22:10:58 -I-     ...finished 13000/25235
22:10:58 -I-     ...finished 14000/25235
22:10:58 -I-     ...finished 15000/25235
22:10:58 -I-     ...finished 16000/25235
22:10:58 -I-     ...finished 17000/25235
22:10:58 -I-     ...finished 18000/25235
22:10:58 -I-     ...finished 19000/25235
22:10:58 -I-     ...finished 20000/25235
22:10:59 -I-     ...finished 21000/25235
22:10:59 -I-     ...finished 22000/25235
22:10:59 -I-     ...finished 23000/25235
22:10:59 -I-     ...finished 24000/25235
22:10:59 -I-     ...finished 25000/25235
22:10:59 -I- Propagation loops
22:10:59 -E- error in propagate_all
22:10:59 -I- node is 1
22:10:59 -I- size of assignment scores is 0  0
22:10:59 -I- size of assignment ids is 80451     76
Index in position 1 exceeds array bounds.

Error in trgraph/solve>propagate_all (line 523)
                score = G.assignment_scores(assigned_nodes(i),idix(j));

Error in trgraph/solve (line 150)
propagate_all(G);

Error in solve_across_movies (line 72)
solve(G,false,true);
 
>> 

Dataset-2:

>> solve_across_movies(Trck, 3);
22:15:29 -I- solving graph from movies 25-36
22:15:29 -I- Loading trgraph from antrax/graphs/graph_25_25.mat
22:15:37 -I- Loading trgraph from antrax/graphs/graph_26_26.mat
22:15:39 -I- Loading trgraph from antrax/graphs/graph_27_27.mat
22:15:40 -I- Loading trgraph from antrax/graphs/graph_28_28.mat
22:15:42 -I- Loading trgraph from antrax/graphs/graph_29_29.mat
22:15:42 -I- Loading trgraph from antrax/graphs/graph_30_30.mat
22:15:43 -I- Loading trgraph from antrax/graphs/graph_31_31.mat
22:15:43 -I- Loading trgraph from antrax/graphs/graph_32_32.mat
22:15:43 -I- Loading trgraph from antrax/graphs/graph_33_33.mat
22:15:44 -I- Loading trgraph from antrax/graphs/graph_34_34.mat
22:15:45 -I- Loading trgraph from antrax/graphs/graph_35_35.mat
22:15:50 -I- Loading trgraph from antrax/graphs/graph_36_36.mat
22:15:56 -I- Finished loading trgraph with 17914 tracklets
22:15:57 -I- Loading ids
22:16:00 -I- Finding single ant nodes
22:16:00 -I- Some preperations
22:16:01 -I- Filtering out tracklets identified as non-ant
22:16:01 -I- ...8544 tracklets classified as no-ant were filtered
22:16:01 -I- ...6359 short, unconnected and unidentified tracklets were filtered
22:16:02 -I- Propagating ids from src tracklets
22:16:03 -I-     ...finished 1000/7421
22:16:03 -I-     ...finished 2000/7421
22:16:03 -I-     ...finished 3000/7421
22:16:03 -I-     ...finished 4000/7421
22:16:03 -I-     ...finished 5000/7421
22:16:03 -I-     ...finished 6000/7421
22:16:03 -I-     ...finished 7000/7421
22:16:03 -I- Propagation loops
22:16:03 -E- error in propagate_all
22:16:03 -I- node is 2
22:16:03 -I- size of assignment scores is 0  0
22:16:03 -I- size of assignment ids is 17914     76
Index in position 1 exceeds array bounds.

Error in trgraph/solve>propagate_all (line 523)
                score = G.assignment_scores(assigned_nodes(i),idix(j));

Error in trgraph/solve (line 150)
propagate_all(G);

Error in solve_across_movies (line 72)
solve(G,false,true);
 
>> 

from antrax.

janamach avatar janamach commented on July 26, 2024

What does this error mean exactly? I am wondering if I am causing it by using the software in a way that was not intended...

In my assay, the ants are free to come into and leave the frame. They all have access to the nest and inevitably some unmarked ants come in. I trained the classifier to recognize unmarked ants as any_ant just to separate them from the marked ants I am interested in. But any_ant should be in one place at a time, while in reality there can be more than one unmarked ant in the frame. Hence, it's very possible that at the intersection between movies the data does not look as expected.

I guess if this is the problem, I should either classify unmarked ants as NoAnt to remove them from my data or skip step 2. What would you recommend?

from antrax.

asafgal avatar asafgal commented on July 26, 2024

No, it was a real bug, and I was able to reproduce it.

There was a variable that was assigned during solve1 but cleared after solve2. So when you try to run solve2 again without first running solve1, it complains. However, I think this small bug only masks the problem from above, which involves the same variable. So try running all solve steps again.
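
The diagnostic output above ("size of assignment scores is 0  0" next to an ids matrix of 80451 x 76) fits this explanation: the score matrix is empty at the moment it is indexed. A toy Python sketch of that failure mode, with a shape guard that fails early with a readable message. This is not anTraX code; the class, field, and function names are made up for illustration:

```python
class ToyGraph:
    """Stand-in for a tracklet graph; only the two matrices matter here."""

    def __init__(self, n_nodes, n_ids, with_scores=True):
        self.assignment_ids = [[0] * n_ids for _ in range(n_nodes)]
        # Mimics a field that one step fills in and a later step clears.
        self.assignment_scores = (
            [[0.0] * n_ids for _ in range(n_nodes)] if with_scores else []
        )


def lookup_score(graph, node, id_index):
    """Read one score, failing with a clear message if the matrix is stale."""
    n_rows = len(graph.assignment_scores)
    if n_rows != len(graph.assignment_ids):
        raise RuntimeError(
            f"assignment_scores has {n_rows} rows but assignment_ids has "
            f"{len(graph.assignment_ids)}; rerun the step that computes the scores"
        )
    return graph.assignment_scores[node][id_index]
```

Without the guard, the lookup on an empty matrix raises a bare index error, which is the Python analogue of "Index in position 1 exceeds array bounds".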

Your any_ant solution is fine, but not ideal, as you understand. Do you have good separation between this class and the Unknown class (marked ant that cannot be identified because of posture, bad image etc)? If so, you can try defining the any_ant class in the "NoAnt" category. That sounds weird, but all it does is tell the algorithm that this is a category that cannot be individually tracked. I use it for larvae, food items, etc.

from antrax.

janamach avatar janamach commented on July 26, 2024

So try running all solve steps again.

It worked locally with MATLAB with the dataset that has fewer tracklets! Will also test it on the other dataset on HPC, but it will probably take a very long time.

Do you have good separation between this class and the Unknown class (marked ant that cannot be identified because of posture, bad image etc)?

It's probably not very good right now because I did not consider this at all when I was selecting the examples. I think I will stick with the any_ant class solution for these two datasets and try the alternative with the next experiments.

from antrax.

asafgal avatar asafgal commented on July 26, 2024

Great, let me know.

It's probably not very good right now because I did not consider this at all when I was selecting the examples. I think I will stick with the any_ant class solution for these two datasets and try the alternative with the next experiments.

Sounds good. You want to get to the point where the tracking pipeline works, and the performance can be easily estimated. You can then tune the pipeline accordingly, depending on what you actually need in order to do your science.

from antrax.

janamach avatar janamach commented on July 26, 2024

With the first dataset (many tracklets!) I had another problem during step 2: the graphs that did not show the error would fail because the job ran out of memory. 2 CPUs with 4 GB per CPU was plenty for the second dataset, which has fewer tracklets, while this one failed even with 10 GB per CPU. I increased it to 24 GB per CPU, and then it ran for two days; I ended up cancelling the job because I wasn't sure whether it was stuck in an infinite loop. I tried running it locally too, and it indeed used a lot of memory (over 30 GB of RAM a couple of hours into the job, on a machine with 64 GB of RAM).

Is that something you would expect with a dataset with a very high number of tracklets?

from antrax.

janamach avatar janamach commented on July 26, 2024

It's probably worth mentioning that one of the classify jobs from that "heavy" dataset takes about 60 hours to finish with 2 CPUs / 2 GB, and many others take more than a day. Hmmm....

from antrax.

asafgal avatar asafgal commented on July 26, 2024

Why does your dataset have so many tracklets? What is your video duration and frame rate? I was under the impression that your tracking is expected to be very sparse. Do you have many detections of non-ant blobs?

Generally though, the classify step can benefit from more CPUs, especially if the average tracklet length is longer than a few seconds. The best way to choose the resources per job is to run one task locally and see what the typical CPU/memory consumption is.
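
One way to get those numbers is to run a single task locally and record its wall time, CPU time, and peak memory. A minimal sketch using only the Python standard library on Linux or macOS. `measure_task` is a hypothetical helper, and the command in the example is a placeholder to be replaced with the actual single-task invocation:

```python
import resource
import subprocess
import sys
import time


def measure_task(cmd):
    """Run cmd to completion and report its exit code and resource usage."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    proc = subprocess.run(cmd, check=False)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime + after.ru_stime) - (before.ru_utime + before.ru_stime)
    return {
        "returncode": proc.returncode,
        "wall_seconds": wall,
        "cpu_seconds": cpu,
        # Largest peak RSS among child processes waited for so far:
        # kilobytes on Linux, bytes on macOS.
        "peak_rss": after.ru_maxrss,
    }


if __name__ == "__main__":
    # Placeholder command; substitute the real single-task command line.
    print(measure_task([sys.executable, "-c", "print('task finished')"]))
```

Dividing `cpu_seconds` by `wall_seconds` gives a rough idea of how many cores the task kept busy, which is the number to compare against the CPUs requested per job.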

The solve step is usually not resource-heavy, in either CPU or memory. 30GB is something I have never encountered. What is the typical file size in session/graphs/*.mat?

from antrax.

janamach avatar janamach commented on July 26, 2024

What is your video duration and frame rate?

60 minutes, 12 videos per subdir, @ 25fps. Cataglyphis are fast...

I was under the impression that your tracking is expected to be very sparse.

Not always.

Do you have many detections of non-ant blobs?

Some, but not many. The vast majority of detections are ants.

What is the typical file size in session/graphs/*.mat?

Highly variable, from 1MB to 70+ MB.
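
For scale, the video parameters quoted above (60-minute videos at 25 fps, 12 videos per subdirectory, 60 hours per dataset) translate into the following frame counts. This is plain arithmetic on the numbers given in this thread; the variable names are only for illustration:

```python
minutes_per_video = 60
fps = 25
videos_per_subdir = 12
hours_per_dataset = 60

frames_per_video = minutes_per_video * 60 * fps
frames_per_subdir = frames_per_video * videos_per_subdir
frames_per_dataset = hours_per_dataset * 3600 * fps

print(frames_per_video)    # 90000
print(frames_per_subdir)   # 1080000
print(frames_per_dataset)  # 5400000
```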

I will write you an email later; it will make much more sense if I explain the experiment to you.

from antrax.

lizimai avatar lizimai commented on July 26, 2024

The above was partially solved by re-running the track step for movies 25,28,35,36 on HPC. Step 2 showed the MATLAB:badsubscript error in all logs (5 graphs in total), but step 3 finished successfully and the missing mat/csv files have been generated.

Hi @janamach, I encountered similar issues to yours, but on a much larger scale. I have 6 colonies per video, and 211 out of 334 videos had at least one colony fail at solve --step 1. I am wondering whether rerunning track is the only solution you have found so far?

from antrax.

janamach avatar janamach commented on July 26, 2024

Hi @lizimai . What happens when you run solve for one colony that failed during the solve step by adding the --clist option? What do the logs say for track and solve for the affected videos?

In my experience, many problems in the later steps are caused by issues during the track step, and re-running track can sometimes help. Another issue that I found on my side was corrupt video files. With those, the track step would almost finish and then exit with an error, so I would re-encode the videos (or trim them if possible). Unfortunately, I am not aware of alternative solutions that do not involve rerunning track.
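
Before re-running track, it may be worth checking whether a video decodes cleanly at all. The sketch below is an assumption, not something anTraX provides: it shells out to `ffmpeg` (which must be installed separately) and asks it to decode the whole file while discarding the output, so that decoding errors show up on stderr. The helper names are hypothetical:

```python
import shutil
import subprocess


def decode_check_command(video_path):
    """Build an ffmpeg command that decodes the file and discards the output."""
    return ["ffmpeg", "-v", "error", "-i", str(video_path), "-f", "null", "-"]


def video_decodes_cleanly(video_path):
    """Return True if ffmpeg decodes the file without reporting errors."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    result = subprocess.run(
        decode_check_command(video_path),
        stdin=subprocess.DEVNULL,  # keep ffmpeg from waiting on the terminal
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and not result.stderr.strip()
```

A file that fails this check is a candidate for re-encoding or trimming before track is run again.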

P.S. are you using the latest anTraX version? The problems I described in this issue were solved in 417223f as far as I remember...

from antrax.

asafgal avatar asafgal commented on July 26, 2024

Zimai, Jana is right - the first thing you should do is make sure the tracking step finished ok for the failed cases by looking at the corresponding logs. Can you post the errors you see in a new issue thread? This one is closed and actually very convoluted with what turned out to be many small problems.

from antrax.

lizimai avatar lizimai commented on July 26, 2024

Hi both, thanks for the suggestion and sorry for bringing up this thread again. I will redo the track and open a new thread.

from antrax.

asafgal avatar asafgal commented on July 26, 2024

from antrax.
