Did you remember to run pip install?
Oops :-(
The error in the export log suggests that the solve step fails on that video. Can you see if there is something weird in the corresponding solve logs?
The problems seem to start at step 1 of solve, e.g.:
$ cat matlab_solve_m_16.log
07:36:14 -I- Reading video information from file
07:36:22 -I- Loading trgraph from antrax/graphs/graph_16_16.mat
07:36:23 -I- Finished loading trgraph with 166 tracklets
07:36:23 -I- Loading ids
07:36:23 -I- Finding single ant nodes
07:36:23 -I- Some preperations
07:36:23 -I- Looking for bottleneck pairs
07:36:23 -I- done distance mat
Undefined function or variable 'pairs'.
Error in trgraph/get_bottleneck_pairs (line 523)
Error in trgraph/solve (line 28)
Error in solve_single_movie (line 54)
Error in antrax_mcr_interface (line 30)
MATLAB:UndefinedFunction
$ cat matlab_solve_m_30.log
07:36:14 -I- Reading video information from file
07:36:22 -I- Loading trgraph from antrax/graphs/graph_30_30.mat
07:36:24 -I- Finished loading trgraph with 374 tracklets
07:36:24 -I- Loading ids
07:36:25 -I- Finding single ant nodes
07:36:25 -I- Some preperations
07:36:25 -I- Looking for bottleneck pairs
07:36:25 -I- done distance mat
07:36:25 -I- Resetting graph id assigments
07:36:25 -I- Filtering out tracklets identified as non-ant
07:36:25 -I- ...0 tracklets classified as no-ant were filtered
07:36:25 -I- ...7 short, unconnected and unidentified tracklets were filtered
07:36:25 -I- Propagating ids from src tracklets
07:36:26 -I- Propagation loops
07:36:26 -I- ...assigned 0 tracklets
07:36:26 -I- Biconnected components condition (positive)
Index in position 2 exceeds array bounds.
Error in trgraph/solve>propagate_all (line 536)
Error in trgraph/solve (line 150)
Error in solve_single_movie (line 54)
Error in antrax_mcr_interface (line 30)
MATLAB:badsubscript
In this case, 16 had the MATLAB:UndefinedFunction error during tracking, while 30 finished properly. The classify step finished normally in both cases.
But running solve on a local machine with MATLAB already showed errors in step 1:
$ grep -rHinoL "Done" matlab_solve_m_*
matlab_solve_m_25.log
matlab_solve_m_28.log
matlab_solve_m_35.log
matlab_solve_m_36.log
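The grep -L check is easy to script if you want to rerun it after every step. Here is a minimal sketch with a hypothetical helper name. The search is case-sensitive on purpose: grep -i, as in the command above, would also accept lines such as "done distance mat" and could hide some unfinished runs.

```python
import glob
import os
import re


def _movie_number(name):
    """Extract the trailing movie number from e.g. 'matlab_solve_m_25.log'."""
    match = re.search(r"_(\d+)\.log$", name)
    return int(match.group(1)) if match else -1


def find_unfinished_logs(logdir, pattern="matlab_solve_m_*.log", marker="Done"):
    """Return names of logs in `logdir` that never contain `marker`.

    Same idea as `grep -L "Done" matlab_solve_m_*`: a log without the
    marker is taken to be a run that did not finish. Results are sorted
    by movie number rather than alphabetically.
    """
    unfinished = []
    for path in glob.glob(os.path.join(logdir, pattern)):
        with open(path, errors="replace") as fh:
            if marker not in fh.read():  # case-sensitive substring check
                unfinished.append(os.path.basename(path))
    return sorted(unfinished, key=lambda name: (_movie_number(name), name))
```

Passing pattern="matlab_export_m_*.log" gives the same check for the export step.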
All of those logs show the same error:
$ cat matlab_solve_m_25.log
09:24:23 -D- initializing expreader object
09:24:23 -I- Reading video information from file
09:24:26 -I- Loading trgraph from antrax/graphs/graph_25_25.mat
Error using load
Cannot read file /media/jana/HDD/H2CN0402/antrax/graphs/graph_25_25.mat.
Error in trgraph.load (line 879)
load(fname,'G');
Error in trhandles/loaddata (line 607)
GS = trgraph.load(Trck,movlist);
Error in solve_single_movie (line 52)
G = Trck.loaddata(m,colony);
Errors also appeared during step 2:
$ cat matlab_solve_g_3.log
09:38:37 -D- initializing expreader object
09:38:37 -I- Reading video information from file
09:38:41 -I- solving graph from movies 25-36
09:38:41 -I- Loading trgraph from antrax/graphs/graph_25_25.mat
Error using load
Cannot read file /media/jana/HDD/H2CN0402/antrax/graphs/graph_25_25.mat.
Error in trgraph.load (line 879)
load(fname,'G');
Error in trhandles/loaddata (line 607)
GS = trgraph.load(Trck,movlist);
Error in solve_across_movies (line 70)
G = Trck.loaddata(movlist,colony);
But MATLAB seems to be able to load the file:
>> load antrax/graphs/graph_25_25.mat
Warning: Variable 'G' originally saved as a trgraph cannot be instantiated as an object and will be read in as a uint32.
And in step 3, the errors were quite expected:
$ grep -rHinoL "Done" matlab_export_m_*
matlab_export_m_25.log
matlab_export_m_28.log
matlab_export_m_35.log
matlab_export_m_36.log
$ cat matlab_export_m_36.log
10:27:21 -D- initializing expreader object
10:27:21 -I- Reading video information from file
10:27:24 -I- Loading trgraph from antrax/graphs/graph_36_36.mat
Error using load
Cannot read file /media/jana/HDD/H2CN0402/antrax/graphs/graph_36_36.mat.
Error in trgraph.load (line 879)
load(fname,'G');
Error in trhandles/loaddata (line 607)
GS = trgraph.load(Trck,movlist);
Error in export_single_movie (line 51)
G = Trck.loaddata(m,colony);
The HPC I am using is very easy to get access to; maybe its primary purpose is training new users. I asked whether they can increase my queued job quota.
Anyhow, you can run a single step of the solve by using --step 1 (or 2 or 3). Note that you have to wait for one step to fully finish before running the next one.
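Here is a minimal sketch of that ordering rule, using a hypothetical helper rather than the antrax code. --step 0 (the default in the solve signature pasted later in this thread) runs all three steps, and each step waits for the previous one. A value of 1, 2 or 3 selects a single step with no dependency.

```python
def plan_solve_steps(step=0):
    """Return (solve_step, depends_on) pairs for a requested --step value.

    step == 0 means "all three steps"; 1, 2 or 3 selects a single step.
    In a full run each step depends on the previous one, so it may only
    start once that one has fully finished.
    """
    if step not in (0, 1, 2, 3):
        raise ValueError(f"step must be 0, 1, 2 or 3, got {step!r}")
    selected = [1, 2, 3] if step == 0 else [step]
    plan = []
    previous = None
    for s in selected:
        plan.append((s, previous))  # s waits for `previous` (None = no wait)
        previous = s
    return plan
```

For example, plan_solve_steps(2) yields a single step-2 job with nothing to wait for. Running steps one at a time therefore puts the burden of waiting on the user.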
I tried that, but it somehow didn't work:
(antrax) [fr_jm1121@uc2n994 ~]$ antrax solve H1CN0304/ --hpc --step 1 --hpc-options partition=single,[email protected],cpus=4,mem-per-cpu=4000,time=24:00:00
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
Jobfile created in H1CN0304/antrax/logs/hpc_solve1.sh
Job number 19452619 was submitted
Jobfile created in H1CN0304/antrax/logs/hpc_solve2.sh
Job number 19452620 was submitted
Jobfile created in H1CN0304/antrax/logs/hpc_solve3.sh
sbatch: error: AssocMaxSubmitJobLimit
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
Traceback (most recent call last):
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/bin/antrax", line 8, in <module>
sys.exit(main())
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 651, in main
""")
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/sigtools/modifiers.py", line 158, in __call__
return self.func(*args, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 363, in run
ret = cli(*args)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
return func(*posargs, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 262, in _cli
return func('{0} {1}'.format(name, command), *args)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
return func(*posargs, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 297, in solve
jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=3)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/hpc.py", line 258, in antrax_hpc_job
jid = submit_slurm_job_file(jobfile, waitfor=waitfor)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/hpc.py", line 80, in submit_slurm_job_file
jid = out.split()[-1]
IndexError: list index out of range
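The IndexError is a side effect of the rejected submission. The job violated the QOS limit, so out.split() came back as an empty list and [-1] failed. Below is a sketch of a more defensive parse. It assumes sbatch's usual "Submitted batch job <id>" success message. The helper name is hypothetical and this is not the code in antrax/hpc.py.

```python
import re


def parse_sbatch_job_id(stdout, stderr=""):
    """Extract the job id from sbatch's 'Submitted batch job <id>' line.

    Raises RuntimeError carrying sbatch's own message instead of an
    IndexError when the submission was rejected and stdout is empty.
    """
    match = re.search(r"Submitted batch job (\d+)", stdout)
    if match is None:
        detail = stderr.strip() or stdout.strip() or "no output"
        raise RuntimeError("sbatch did not return a job id: " + detail)
    return match.group(1)
```

With this, a quota rejection such as AssocMaxSubmitJobLimit shows up directly in the error message instead of a list-index traceback.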
If I add --dry, I get a different error:
$ antrax solve H1CN0304/ --step 2 --hpc --dry --hpc-options partition=single,[email protected],cpus=4,mem-per-cpu=4000,time=24:00:00
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
Jobfile created in H1CN0304/antrax/logs/hpc_solve1.sh
Dry run, no job submitted.
Traceback (most recent call last):
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/bin/antrax", line 8, in <module>
sys.exit(main())
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 651, in main
""")
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/sigtools/modifiers.py", line 158, in __call__
return self.func(*args, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 363, in run
ret = cli(*args)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
return func(*posargs, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 262, in _cli
return func('{0} {1}'.format(name, command), *args)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
return func(*posargs, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 293, in solve
jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=1)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/hpc.py", line 265, in antrax_hpc_job
return jid
UnboundLocalError: local variable 'jid' referenced before assignment
But the .sh file it generated has --step 1 in it, although I asked for --step 2. As far as I understand, sbatch path/to/hpc_solve1.sh in this case should be equivalent to starting the jobs through the antrax interface with --step 1, is that right?
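The UnboundLocalError on the dry run means jid is only assigned on the branch that actually submits a job. When nothing is submitted, return jid has nothing to return. Below is a minimal sketch of the usual fix, which binds jid before the branch. The names are hypothetical and this is not the actual hpc.py code.

```python
def submit_or_dry_run(jobfile, dry, submit):
    """Submit `jobfile` with the `submit` callable unless this is a dry run.

    `jid` is bound on every path, so a dry run returns None instead of
    raising UnboundLocalError.
    """
    jid = None
    if dry:
        print("Dry run, no job submitted.")
    else:
        jid = submit(jobfile)
    return jid
```

A caller chaining steps can then pass the None along as "no dependency".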
One of the 60 jobs in step 1 is failing consistently, while all other 59 finished successfully. The log says:
============================= JOB FEEDBACK =============================
NodeName=uc2n405
Job ID: 19453304
Array Job ID: 19453240_50
Cluster: uc2
User/Group: fr_jm1121/fr_fr
State: FAILED (exit code 1)
Nodes: 1
Cores per node: 4
CPU Utilized: 00:00:00
CPU Efficiency: 0.00% of 00:01:04 core-walltime
Job Wall-clock time: 00:00:16
Memory Utilized: 1.02 MB
Memory Efficiency: 0.01% of 15.62 GB
What could be a possible reason? Is there a way to "rescue" this?
Can you look at the corresponding anTraX-generated logs? These will be session/logs/hpc_solve1_50.log and session/logs/matlab_solve_m_50.log
The text above is from hpc_solve1_50.log; the corresponding matlab_solve_m_50.log has not been generated.
While looking at the matlab_solve_m_*.log files, I found more problems that were not reflected in hpc_solve1_*.log. I looked for logs that did not have the word "Done" in them with:
$ grep -rHnoL "Done" matlab_solve_m*
matlab_solve_m_21.log
matlab_solve_m_25.log
matlab_solve_m_44.log
matlab_solve_m_54.log
matlab_solve_m_59.log
matlab_solve_m_60.log
All had the same UnrecognizedVarName error:
$ cat matlab_solve_m_59.log
18:22:16 -I- Reading video information from file
18:22:20 -I- Loading trgraph from antrax/graphs/graph_59_59.mat
Error using tracklet/load_ids (line 746)
Unrecognized table variable name 'tracklet'.
Error in trgraph/load_ids (line 667)
Error in trgraph.load (line 891)
Error in trhandles/loaddata (line 607)
Error in solve_single_movie (line 52)
Error in antrax_mcr_interface (line 30)
MATLAB:table:UnrecognizedVarName
The UnrecognizedVarName error seems to be caused by the fact that there are no classified tracklets in the video (check whether antrax/labels/autoids_59.csv is indeed empty). This is probably because you either didn't have any detections in those videos, or had only multi-ant detections. Either way, I'll need to patch this. I guess I never tested the software with such a sparse tracking problem. You might be able to ignore this issue for now and continue to the next steps, but it is also possible that the next steps will complain as well.
As for the error in video #50, I'm not sure. It seems the crash happened before matlab was even started, which is weird. Can you verify that the data files exist? These should be:
antrax/graphs/graph_50_50.mat
antrax/tracklets/trdata_50_50.mat
antrax/images/images_50_50.mat
antrax/labels/autoids_50_50.mat
Also try to take a look in the logs of the previous steps, maybe there will be some clues there.
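To test the hypothesis that a video has no classified tracklets, you can count the rows whose label is not Unknown. Here is a sketch with a hypothetical count_classified helper. It assumes the column names from the autoids header shown in this thread (tracklet,label,score,best_frame) and fixes the delimiter to a comma.

```python
import csv


def count_classified(path, unknown_label="Unknown"):
    """Return (total_rows, classified_rows) for an autoids CSV file.

    A row counts as classified when its label is not `unknown_label`.
    The delimiter is fixed to ',' rather than guessed.
    """
    total = classified = 0
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh, delimiter=","):
            total += 1
            if row["label"] != unknown_label:
                classified += 1
    return total, classified
```

A result of (0, 0), or a classified count of 0, would support the "no classified tracklets" explanation.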
check to see if antrax/labels/autoids_59.csv is indeed empty
No, none of the ones that showed the UnrecognizedVarName error are empty; they look pretty normal to me:
$ head autoids_59.csv
tracklet,label,score,best_frame
trj_id10_ti59_13365_tf59_13365,Unknown,0,0
trj_id10_ti59_13372_tf59_13372,GGY,0.9986485838890076,1
trj_id10_ti59_13373_tf59_13373,Unknown,0,0
trj_id10_ti59_13375_tf59_13375,Unknown,0,0
Can you verify that the data files exist? These should be:
antrax/graphs/graph_50_50.mat
Exists!
antrax/tracklets/trdata_50_50.mat
Did you mean trdata_50.mat? That exists.
antrax/images/images_50_50.mat
Did you mean images_50.mat? That exists too.
antrax/labels/autoids_50_50.mat
This doesn't exist. If you meant csv, then there is a file for each video.
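The same check fits in a short script. It uses the file names confirmed above: graph_<m>_<m>.mat, trdata_<m>.mat, images_<m>.mat and autoids_<m>.csv, all relative to the antrax session folder. missing_movie_files is a hypothetical helper, and the naming pattern is an assumption that may differ between anTraX versions.

```python
import os


def missing_movie_files(session_dir, m):
    """Return the expected per-movie files that do not exist under session_dir.

    The names follow the ones confirmed in this thread for movie 50;
    treat the list as an assumption for other anTraX versions.
    """
    expected = [
        os.path.join("graphs", f"graph_{m}_{m}.mat"),
        os.path.join("tracklets", f"trdata_{m}.mat"),
        os.path.join("images", f"images_{m}.mat"),
        os.path.join("labels", f"autoids_{m}.csv"),
    ]
    return [p for p in expected if not os.path.isfile(os.path.join(session_dir, p))]
```

An empty list means all four inputs are present, which points away from missing data as the cause of a crash.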
ok, weird.
This will need to be debugged on a local machine. Can you sync your data back?
Try to run solve step 1 for video 50 and see if it crashes, and why.
For the other error, try loading the data in an interactive matlab session with:
Trck = trhandles(uigetdir);
G = Trck.loaddata(59);
To keep it simple, I will compare 59 (that failed above) to 58 (completed successfully).
Running solve with either MCR or MATLAB 2019a gives the MATLAB:table:UnrecognizedVarName error in the log, but not in the terminal:
$ antrax solve --step 1 --movlist 59 H1CN0304/
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
07/04/21 16:14:39 -I- Starting 2 workers
07/04/21 16:14:39 -I- Started solve movie 59
07/04/21 16:14:39 -D- running matlab mcr
07/04/21 16:14:39 -D- command is: /home/jana/src/anTraX/bin/antrax_glnxa64_mcr_interface solve_single_movie H1CN0304/ 59 trackingdirname antrax
07/04/21 16:14:39 -D- matlab app exited with code None
07/04/21 16:15:29 -I- Finished solve movie 59
07/04/21 16:15:29 -I- Workers closed
Log with MCR:
$ cat H1CN0304/antrax/logs/matlab_solve_m_59.log
16:14:53 -D- initializing expreader object
16:14:53 -I- Reading video information from file
16:14:57 -I- Loading trgraph from antrax/graphs/graph_59_59.mat
Error using tracklet/load_ids (line 746)
Unrecognized table variable name 'tracklet'.
Error in trgraph/load_ids (line 667)
Error in trgraph.load (line 891)
Error in trhandles/loaddata (line 607)
Error in solve_single_movie (line 52)
Error in antrax_mcr_interface (line 30)
MATLAB:table:UnrecognizedVarName
Log with MATLAB:
$ cat H1CN0304/antrax/logs/matlab_solve_m_59.log
16:46:49 -D- initializing expreader object
16:46:50 -I- Reading video information from file
16:46:54 -I- Loading trgraph from antrax/graphs/graph_59_59.mat
Error using tracklet/load_ids (line 746)
Unrecognized table variable name 'tracklet'.
Error in trgraph/load_ids (line 675)
G.trjs.load_ids;
Error in trgraph.load (line 899)
G.load_ids;
Error in trhandles/loaddata (line 607)
GS = trgraph.load(Trck,movlist);
Error in solve_single_movie (line 52)
G = Trck.loaddata(m,colony);
Doing the same with 58 gives the same output in the terminal, but a different-looking log:
$ antrax solve --step 1 --movlist 58 H1CN0304/
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
07/04/21 16:18:31 -I- Starting 2 workers
07/04/21 16:18:31 -I- Started solve movie 58
07/04/21 16:18:31 -D- running matlab mcr
07/04/21 16:18:31 -D- command is: /home/jana/src/anTraX/bin/antrax_glnxa64_mcr_interface solve_single_movie H1CN0304/ 58 trackingdirname antrax
07/04/21 16:18:31 -D- matlab app exited with code None
07/04/21 16:42:02 -I- Finished solve movie 58
07/04/21 16:42:02 -I- Workers closed
$ cat H1CN0304/antrax/logs/matlab_solve_m_58.log
16:18:43 -D- initializing expreader object
16:18:43 -I- Reading video information from file
16:18:47 -I- Loading trgraph from antrax/graphs/graph_58_58.mat
16:19:41 -I- Finished loading trgraph with 16476 tracklets
16:19:42 -I- Loading ids
16:19:52 -I- Finding single ant nodes
16:19:54 -I- Some preperations
16:19:56 -I- Resetting graph id assigments
16:19:56 -I- Filtering out tracklets identified as non-ant
16:19:56 -I- ...18 tracklets classified as no-ant were filtered
16:19:56 -I- ...8727 short, unconnected and unidentified tracklets were filtered
16:19:56 -I- Propagating ids from src tracklets
16:19:59 -I- ...finished 1000/3377
16:19:59 -I- ...finished 2000/3377
16:19:59 -I- ...finished 3000/3377
16:19:59 -I- Propagation loops
...
16:39:59 -I- ...working on any_ant
16:40:00 -I- ......found 288 cc's
16:40:00 -I- ......filtered 1 cc's
16:40:02 -I- ......pruned 18 nodes
16:40:02 -I- Propagation loops
16:40:03 -I- ...assigned 0 tracklets
16:40:03 -I- Biconnected components condition (positive)
16:40:09 -I- ...assigned 0 tracklets
16:40:09 -I- Assigning ids to tracklets
16:40:09 -I- Saving
16:41:56 -G- Done
For the interactive matlab session (59 vs 58):
>> G = Trck.loaddata(59);
16:20:11 -I- Loading trgraph from antrax/graphs/graph_59_59.mat
Error using tracklet/load_ids (line 746)
Unrecognized table variable name 'tracklet'.
Error in trgraph/load_ids (line 675)
G.trjs.load_ids;
Error in trgraph.load (line 899)
G.load_ids;
Error in trhandles/loaddata (line 607)
GS = trgraph.load(Trck,movlist);
>> G = Trck.loaddata(58);
16:23:17 -I- Loading trgraph from antrax/graphs/graph_58_58.mat
16:24:21 -I- Finished loading trgraph with 16476 tracklets
In the matlab command line, try loading the problematic autoids file and display the generated table:
f = 'antrax/labels/autoids_59.csv';
T = readtable(f);
head(T)
Also, run locally solve on video 50, which had a different issue.
Hmmmm....
>> f = 'antrax/labels/autoids_59.csv';
>> T = readtable(f);
>> head(T)
ans =
8×6 table
Var1 Var2 Var3 Var4 Var5 Var6
_____ ______ ______ _____ ______ ________________________________
'trj' 'id10' 'ti59' 13365 'tf59' '13365,Unknown,0,0'
'trj' 'id10' 'ti59' 13372 'tf59' '13372,GGY,0.9986485838890076,1'
'trj' 'id10' 'ti59' 13373 'tf59' '13373,Unknown,0,0'
'trj' 'id10' 'ti59' 13375 'tf59' '13375,Unknown,0,0'
'trj' 'id10' 'ti59' 13381 'tf59' '13381,Unknown,0,0'
'trj' 'id10' 'ti59' 13385 'tf59' '13385,Unknown,0,0'
'trj' 'id10' 'ti59' 13391 'tf59' '13391,Unknown,0,0'
'trj' 'id10' 'ti59' 13393 'tf59' '13393,Unknown,0,0'
58 looks different:
>> f = 'antrax/labels/autoids_58.csv';
>> T = readtable(f);
>> head(T)
ans =
8×4 table
tracklet label score best_frame
________________________________ _________ _______ __________
'trj_id10_ti58_10117_tf58_10117' 'GGY' 0.99987 1
'trj_id10_ti58_1139_tf58_1139' 'Unknown' 0 0
'trj_id10_ti58_1364_tf58_1364' 'Unknown' 0 0
'trj_id10_ti58_1372_tf58_1372' 'Unknown' 0 0
'trj_id10_ti58_1389_tf58_1389' 'Unknown' 0 0
'trj_id10_ti58_1395_tf58_1395' 'GGY' 0.99884 1
'trj_id10_ti58_1401_tf58_1401' 'Unknown' 0 0
'trj_id10_ti58_1405_tf58_1405' 'GGY' 0.99956 1
Looks like the underscores, not the commas, were used as delimiters in 59...
In bash these two files look very similar:
$ head autoids_59.csv
tracklet,label,score,best_frame
trj_id10_ti59_13365_tf59_13365,Unknown,0,0
trj_id10_ti59_13372_tf59_13372,GGY,0.9986485838890076,1
trj_id10_ti59_13373_tf59_13373,Unknown,0,0
trj_id10_ti59_13375_tf59_13375,Unknown,0,0
trj_id10_ti59_13381_tf59_13381,Unknown,0,0
trj_id10_ti59_13385_tf59_13385,Unknown,0,0
trj_id10_ti59_13391_tf59_13391,Unknown,0,0
trj_id10_ti59_13393_tf59_13393,Unknown,0,0
trj_id10_ti59_13396_tf59_13396,Unknown,0,0
$ head autoids_58.csv
tracklet,label,score,best_frame
trj_id10_ti58_10117_tf58_10117,GGY,0.9998655319213867,1
trj_id10_ti58_1139_tf58_1139,Unknown,0,0
trj_id10_ti58_1364_tf58_1364,Unknown,0,0
trj_id10_ti58_1372_tf58_1372,Unknown,0,0
trj_id10_ti58_1389_tf58_1389,Unknown,0,0
trj_id10_ti58_1395_tf58_1395,GGY,0.9988380074501038,1
trj_id10_ti58_1401_tf58_1401,Unknown,0,0
trj_id10_ti58_1405_tf58_1405,GGY,0.9995608925819397,1
trj_id10_ti58_1409_tf58_1409,Unknown,0,0
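One plausible reading of the 59 output, which this thread does not confirm, is that readtable's delimiter detection picked "_" instead of ",". The sketch below uses a simplified consistency rule to show why that is possible. It is a hypothetical helper, not MATLAB's actual detection algorithm. On the data rows alone, "_" is just as consistent a candidate as ",", and only the header row breaks the tie, because best_frame contains a single underscore. This does not explain why 58 and 59 behaved differently.

```python
def consistent_delimiters(lines, candidates=",;\t_"):
    """Return candidates that occur the same, non-zero number of times on every line.

    A guesser based only on this rule cannot tell ',' from '_' for the
    autoids data rows, because both are consistent there.
    """
    result = []
    for ch in candidates:
        counts = {line.count(ch) for line in lines}
        # exactly one distinct count across all lines, and it is not zero
        if len(counts) == 1 and counts.pop() > 0:
            result.append(ch)
    return result
```

This is the practical reason for passing the delimiter explicitly instead of relying on detection.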
Also, run locally solve on video 50, which had a different issue.
Running. This one should take longer.
That's odd.
Try giving an explicit delimiter:
f = 'antrax/labels/autoids_59.csv';
T = readtable(f, 'Delimiter', ',');
head(T)
Forcing it worked:
>> f = 'antrax/labels/autoids_59.csv';
>> T = readtable(f);
>> head(T)
ans =
8x6 table
Var1 Var2 Var3 Var4 Var5 Var6
_____ ______ ______ _____ ______ ________________________________
'trj' 'id10' 'ti59' 13365 'tf59' '13365,Unknown,0,0'
'trj' 'id10' 'ti59' 13372 'tf59' '13372,GGY,0.9986485838890076,1'
'trj' 'id10' 'ti59' 13373 'tf59' '13373,Unknown,0,0'
'trj' 'id10' 'ti59' 13375 'tf59' '13375,Unknown,0,0'
'trj' 'id10' 'ti59' 13381 'tf59' '13381,Unknown,0,0'
'trj' 'id10' 'ti59' 13385 'tf59' '13385,Unknown,0,0'
'trj' 'id10' 'ti59' 13391 'tf59' '13391,Unknown,0,0'
'trj' 'id10' 'ti59' 13393 'tf59' '13393,Unknown,0,0'
>> T = readtable(f, 'Delimiter', ',');
>> head(T)
ans =
8x4 table
tracklet label score best_frame
________________________________ _________ _______ __________
'trj_id10_ti59_13365_tf59_13365' 'Unknown' 0 0
'trj_id10_ti59_13372_tf59_13372' 'GGY' 0.99865 1
'trj_id10_ti59_13373_tf59_13373' 'Unknown' 0 0
'trj_id10_ti59_13375_tf59_13375' 'Unknown' 0 0
'trj_id10_ti59_13381_tf59_13381' 'Unknown' 0 0
'trj_id10_ti59_13385_tf59_13385' 'Unknown' 0 0
'trj_id10_ti59_13391_tf59_13391' 'Unknown' 0 0
'trj_id10_ti59_13393_tf59_13393' 'Unknown' 0 0
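If you post-process these files outside MATLAB, the Python counterpart of readtable(f, 'Delimiter', ',') is to pass the delimiter explicitly and check the header up front. Here is a sketch with a hypothetical load_autoids helper. It assumes the four columns shown above.

```python
import csv

EXPECTED_COLUMNS = ["tracklet", "label", "score", "best_frame"]


def load_autoids(path):
    """Read an autoids CSV with an explicit ',' delimiter and check its header.

    Fails immediately with a clear message if the columns are not the
    expected ones, rather than later on a missing 'tracklet' column.
    """
    rows = []
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter=",")
        header = next(reader, None)
        if header != EXPECTED_COLUMNS:
            raise ValueError(f"{path}: unexpected header {header!r}")
        for row in reader:
            if not row:  # skip blank lines
                continue
            tracklet, label, score, best_frame = row
            rows.append({"tracklet": tracklet, "label": label,
                         "score": float(score), "best_frame": int(best_frame)})
    return rows
```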
I have no explanation for this behavior...
Anyhow, I tried to patch the issue on the debug-jana branch; see if it works. It also fixes the other small issues we had in this thread and the previous one... I haven't tested it, so issues might pop up.
You are very efficient, thank you!
The readtable fix worked locally with:
$ antrax solve H1CN0304/ --step 1 --movlist 59
Before pull:
$ cat matlab_solve_m_59.log
08:56:02 -D- initializing expreader object
08:56:02 -I- Reading video information from file
08:56:06 -I- Loading trgraph from antrax/graphs/graph_59_59.mat
Error using tracklet/load_ids (line 746)
Unrecognized table variable name 'tracklet'.
Error in trgraph/load_ids (line 667)
Error in trgraph.load (line 891)
Error in trhandles/loaddata (line 607)
Error in solve_single_movie (line 52)
Error in antrax_mcr_interface (line 30)
MATLAB:table:UnrecognizedVarName
After pull:
$ head matlab_solve_m_59.log
08:57:32 -D- initializing expreader object
08:57:32 -I- Reading video information from file
08:57:36 -I- Loading trgraph from antrax/graphs/graph_59_59.mat
08:58:02 -I- Finished loading trgraph with 9369 tracklets
08:58:03 -I- Loading ids
08:58:06 -I- Finding single ant nodes
08:58:07 -I- Some preperations
08:58:08 -I- Looking for bottleneck pairs
08:58:09 -I- done distance mat
09:00:59 -I- Resetting graph id assigments
$ tail matlab_solve_m_59.log
09:14:30 -I- ......found 359 cc's
09:14:30 -I- ......filtered 0 cc's
09:14:32 -I- ......pruned 0 nodes
09:14:32 -I- Propagation loops
09:14:32 -I- ...assigned 0 tracklets
09:14:32 -I- Biconnected components condition (positive)
09:14:35 -I- ...assigned 0 tracklets
09:14:35 -I- Assigning ids to tracklets
09:14:35 -I- Saving
09:15:33 -G- Done
There's another twist: I ran the solve step on a local computer with MATLAB, and all files (including 50) were processed successfully and the xy csv files were generated for each video. It took more than a day to finish; I saw the result just now.
I am now processing another experiment on the HPC starting with tracking. I got to the solve step yesterday, but it failed as multiple jobs ran into the readtable weirdness. I will let you know how it goes :-)
Looks like 3ce63fd worked: I ran solve for 90 videos and none of them ran into that strange readtable problem in step 1. The last one, 90, showed MATLAB:badsubscript, as it barely had any tracklets; I hope it doesn't affect the further steps.
Commit 5f0cb61 didn't seem to help though, the step option is still being ignored:
$ antrax solve CN0402/ --step 3 --hpc --hpc-options partition=single,cpus=2,mem-per-cpu=2000,time=72:00:00
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
Jobfile created in CN0402/antrax/logs/hpc_solve1.sh
Job number 19458033 was submitted
Jobfile created in CN0402/antrax/logs/hpc_solve2.sh
Job number 19458034 was submitted
Jobfile created in CN0402/antrax/logs/hpc_solve3.sh
sbatch: error: AssocMaxSubmitJobLimit
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
Traceback (most recent call last):
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/bin/antrax", line 8, in <module>
sys.exit(main())
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 651, in main
""")
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/sigtools/modifiers.py", line 158, in __call__
return self.func(*args, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 363, in run
ret = cli(*args)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
return func(*posargs, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 262, in _cli
return func('{0} {1}'.format(name, command), *args)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
return func(*posargs, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 297, in solve
jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=3)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/hpc.py", line 258, in antrax_hpc_job
jid = submit_slurm_job_file(jobfile, waitfor=waitfor)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/hpc.py", line 80, in submit_slurm_job_file
jid = out.split()[-1]
IndexError: list index out of range
Also with --dry:
$ antrax solve CN0402/ --step 2 --dry --hpc --hpc-options partition=single,cpus=2,mem-per-cpu=2000,time=72:00:00
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
Jobfile created in CN0402/antrax/logs/hpc_solve1.sh
Dry run, no job submitted.
Traceback (most recent call last):
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/bin/antrax", line 8, in <module>
sys.exit(main())
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 651, in main
""")
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/sigtools/modifiers.py", line 158, in __call__
return self.func(*args, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 363, in run
ret = cli(*args)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
return func(*posargs, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 262, in _cli
return func('{0} {1}'.format(name, command), *args)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
return func(*posargs, **kwargs)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 293, in solve
jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=1)
File "/home/fr/fr_fr/fr_jm1121/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/hpc.py", line 265, in antrax_hpc_job
return jid
UnboundLocalError: local variable 'jid' referenced before assignment
I fixed the dry run issue.
As for the single step run - can you verify that you are on the debug branch on the HPC? If you indeed are, can you paste here the "solve" function in the cli.py file?
Thank you for fixing all these things! I just finished processing the new experiment with 90 videos. I did not run into any serious errors, and the files in antdata were generated.
For the single step issue:
$ git branch
* debug-jana
master
$ less antrax/cli.py
def solve(explist, *, glist: parse_movlist=None, movlist: parse_movlist=None, clist: parse_movlist=None, mcr=False,
nw=2, hpc=False, hpc_options: parse_hpc_options={}, missing=False, session=None, dry=False, step=0):
"""Run propagation step"""
explist = parse_explist(explist, session)
mcr = mcr or ANTRAX_USE_MCR
hpc = hpc or ANTRAX_HPC
if hpc:
for e in explist:
eglist = glist if glist is not None else e.glist
emlist = [e.ggroups[g - 1] for g in eglist]
emlist = [m for grp in emlist for m in grp]
hpc_options['dry'] = dry
hpc_options['classifier'] = classifier
hpc_options['missing'] = missing
hpc_options['glist'] = eglist
hpc_options['movlist'] = emlist
if e.prmtrs['geometry_multi_colony']:
eclist = clist if clist is not None else e.clist
for c in eclist:
hpc_options['c'] = c
hpc_options['waitfor'] = None
if step == 0 or step == 1:
jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=1)
hpc_options['waitfor'] = jid
if step == 0 or step == 2:
jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=2)
hpc_options['waitfor'] = jid
if step == 0 or step == 3:
jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=3)
else:
hpc_options['c'] = None
hpc_options['waitfor'] = None
if step == 0 or step == 1:
jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=1)
hpc_options['waitfor'] = jid
if step == 0 or step == 2:
jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=2)
hpc_options['waitfor'] = jid
if step == 0 or step == 3:
jid = antrax_hpc_job(e, 'solve', opts=hpc_options, solve_step=3)
else:
Q = MatlabQueue(nw=nw, mcr=mcr)
for e in explist:
eglist = glist if glist is not None else e.glist
eclist = clist if clist is not None else e.clist
# flatten the movie groups of the requested graphs into one movie list
emlist = [e.ggroups[g - 1] for g in eglist]
emlist = [m for grp in emlist for m in grp]
if movlist is not None:
    emlist = [m for m in emlist if m in movlist]
if step == 0 or step == 1:
    if e.prmtrs['geometry_multi_colony']:
        for c in eclist:
            for m in emlist:
                w = {'fun': 'solve_single_movie'}
                w['args'] = [e.expdir, m, 'trackingdirname', e.session, 'colony', c]
                w['diary'] = join(e.logsdir, 'matlab_solve_m_' + str(m) + '_c_' + str(c) + '.log')
                w['str'] = 'solve colony ' + str(c) + ' movie ' + str(m)
                Q.put(w)
    else:
        for m in emlist:
            w = {'fun': 'solve_single_movie'}
            w['args'] = [e.expdir, m, 'trackingdirname', e.session]
            w['diary'] = join(e.logsdir, 'matlab_solve_m_' + str(m) + '.log')
            w['str'] = 'solve movie ' + str(m)
            Q.put(w)
    # wait for single movie tasks to complete
    Q.join()
# stitch
if step == 0 or step == 2:
    if e.prmtrs['geometry_multi_colony']:
        for c in eclist:
            for g in eglist:
                w = {'fun': 'solve_across_movies'}
                w['args'] = [e.expdir, g, 'trackingdirname', e.session, 'colony', c]
                w['diary'] = join(e.logsdir, 'matlab_solve_g_' + str(g) + '_c_' + str(c) + '.log')
                w['str'] = 'solve stitch colony ' + str(c) + ' graph ' + str(g)
                Q.put(w)
    else:
        for g in eglist:
            w = {'fun': 'solve_across_movies'}
            w['args'] = [e.expdir, g, 'trackingdirname', e.session]
            w['diary'] = join(e.logsdir, 'matlab_solve_g_' + str(g) + '.log')
            w['str'] = 'solve stitch graph ' + str(g)
            Q.put(w)
    # wait for stitch to finish
    Q.join()
# export
if step == 0 or step == 3:
    if e.prmtrs['geometry_multi_colony']:
        for c in eclist:
            for m in emlist:
                w = {'fun': 'export_single_movie'}
                w['args'] = [e.expdir, m, 'trackingdirname', e.session, 'colony', c]
                w['diary'] = join(e.logsdir, 'matlab_export_m_' + str(m) + '_c_' + str(c) + '.log')
                w['str'] = 'export colony ' + str(c) + ' movie ' + str(m)
                Q.put(w)
    else:
        for m in emlist:
            w = {'fun': 'export_single_movie'}
            w['args'] = [e.expdir, m, 'trackingdirname', e.session]
            w['diary'] = join(e.logsdir, 'matlab_export_m_' + str(m) + '.log')
            w['str'] = 'export movie ' + str(m)
            Q.put(w)
    # wait for export to finish
    Q.join()
# close
Q.stop_workers()
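The fragment above relies on a task queue Q whose put, join and stop_workers methods act as a barrier between steps: every single-movie task must finish before stitching starts, and every stitch before export. A minimal sketch of such a queue using only the standard library follows; the class name TaskQueue and the run callback are hypothetical stand-ins, not anTraX's actual implementation.

```python
import queue
import threading


class TaskQueue:
    """Hypothetical stand-in for the Q object above: a FIFO drained by worker threads."""

    def __init__(self, nw, run):
        self._q = queue.Queue()
        self._run = run          # callback that executes one task dict w
        self.failures = []       # (task, exception) pairs; a failing task does not kill its worker
        self._threads = [threading.Thread(target=self._worker, daemon=True) for _ in range(nw)]
        for t in self._threads:
            t.start()

    def _worker(self):
        while True:
            w = self._q.get()
            try:
                if w is None:    # sentinel posted by stop_workers()
                    return
                self._run(w)
            except Exception as err:
                self.failures.append((w, err))
            finally:
                self._q.task_done()

    def put(self, w):
        self._q.put(w)

    def join(self):
        # Blocks until every task put so far has been marked done: the barrier between steps.
        self._q.join()

    def stop_workers(self):
        for _ in self._threads:
            self._q.put(None)
        for t in self._threads:
            t.join()
```

Recording failures instead of letting an exception end the thread keeps one broken movie from stalling the remaining tasks, which matches how a single failed video here did not stop the others from being processed.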
from antrax.
P.S. All of this was done on HPC.
from antrax.
Unfortunately, there are more issues with that dataset, even though it seemed to complete successfully.
- Some csv files were not generated even though the videos were not empty. For every missing csv file, the matlab_export_m_*.log showed a MATLAB:badsubscript error:
$ grep -rHno "MATLAB:badsubscript" matlab_export_m_* | sort
matlab_export_m_16.log:11:MATLAB:badsubscript
matlab_export_m_30.log:11:MATLAB:badsubscript
matlab_export_m_45.log:11:MATLAB:badsubscript
matlab_export_m_48.log:11:MATLAB:badsubscript
matlab_export_m_49.log:11:MATLAB:badsubscript
matlab_export_m_52.log:11:MATLAB:badsubscript
matlab_export_m_59.log:11:MATLAB:badsubscript
matlab_export_m_62.log:11:MATLAB:badsubscript
matlab_export_m_64.log:11:MATLAB:badsubscript
matlab_export_m_65.log:11:MATLAB:badsubscript
matlab_export_m_66.log:11:MATLAB:badsubscript
matlab_export_m_68.log:11:MATLAB:badsubscript
matlab_export_m_70.log:11:MATLAB:badsubscript
matlab_export_m_71.log:11:MATLAB:badsubscript
matlab_export_m_72.log:11:MATLAB:badsubscript
matlab_export_m_78.log:11:MATLAB:badsubscript
matlab_export_m_80.log:11:MATLAB:badsubscript
matlab_export_m_81.log:11:MATLAB:badsubscript
matlab_export_m_82.log:11:MATLAB:badsubscript
matlab_export_m_83.log:11:MATLAB:badsubscript
matlab_export_m_85.log:11:MATLAB:badsubscript
matlab_export_m_90.log:11:MATLAB:badsubscript
$ for i in {1..90}; do if [ -f ../antdata/xy_${i}_${i}.csv ]; then : ; else echo "Missing: ${i}" ; fi; done
Missing: 16
Missing: 30
Missing: 45
Missing: 48
Missing: 49
Missing: 52
Missing: 59
Missing: 62
Missing: 64
Missing: 65
Missing: 66
Missing: 68
Missing: 70
Missing: 71
Missing: 72
Missing: 78
Missing: 80
Missing: 81
Missing: 82
Missing: 83
Missing: 85
Missing: 90
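The same check can be scripted in Python to confirm that the set of missing xy files matches the set of export logs reporting MATLAB:badsubscript. A sketch, assuming the file-name patterns seen above (xy_<m>_<m>.csv in the antdata folder and matlab_export_m_<m>.log in the logs folder); both function names are hypothetical.

```python
import re
from pathlib import Path


def missing_xy(antdata_dir, movies):
    """Movies with no xy_<m>_<m>.csv file (the same test as the bash loop above)."""
    antdata_dir = Path(antdata_dir)
    return {m for m in movies if not (antdata_dir / f"xy_{m}_{m}.csv").is_file()}


def movies_with_error(logs_dir, error_id, prefix="matlab_export_m_"):
    """Movie numbers whose <prefix><m>.log contains the given MATLAB error identifier.

    Multi-colony logs such as matlab_export_m_5_c_1.log do not match and are skipped.
    """
    pattern = re.compile(re.escape(prefix) + r"(\d+)\.log")
    hits = set()
    for log in Path(logs_dir).glob(prefix + "*.log"):
        match = pattern.fullmatch(log.name)
        if match and error_id in log.read_text(errors="replace"):
            hits.add(int(match.group(1)))
    return hits
```

Run from the logs folder, missing_xy("../antdata", range(1, 91)) should equal movies_with_error(".", "MATLAB:badsubscript") if every missing file is explained by an export failure.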
Maybe relatedly, MATLAB:UndefinedFunction and MATLAB:badsubscript were popping up throughout the whole process:
$ grep -rHno "MATLAB:UndefinedFunction" | sort
matlab_solve_m_16.log:17:MATLAB:UndefinedFunction
matlab_solve_m_45.log:17:MATLAB:UndefinedFunction
matlab_solve_m_48.log:17:MATLAB:UndefinedFunction
matlab_solve_m_49.log:17:MATLAB:UndefinedFunction
matlab_solve_m_52.log:17:MATLAB:UndefinedFunction
matlab_solve_m_59.log:17:MATLAB:UndefinedFunction
matlab_solve_m_62.log:17:MATLAB:UndefinedFunction
matlab_solve_m_65.log:17:MATLAB:UndefinedFunction
matlab_solve_m_68.log:17:MATLAB:UndefinedFunction
matlab_solve_m_70.log:17:MATLAB:UndefinedFunction
matlab_solve_m_71.log:17:MATLAB:UndefinedFunction
matlab_solve_m_78.log:17:MATLAB:UndefinedFunction
matlab_solve_m_80.log:17:MATLAB:UndefinedFunction
matlab_solve_m_81.log:17:MATLAB:UndefinedFunction
matlab_solve_m_85.log:17:MATLAB:UndefinedFunction
matlab_track_m_16.log:77:MATLAB:UndefinedFunction
matlab_track_m_45.log:77:MATLAB:UndefinedFunction
matlab_track_m_48.log:86:MATLAB:UndefinedFunction
matlab_track_m_49.log:77:MATLAB:UndefinedFunction
matlab_track_m_52.log:77:MATLAB:UndefinedFunction
matlab_track_m_59.log:77:MATLAB:UndefinedFunction
matlab_track_m_62.log:77:MATLAB:UndefinedFunction
matlab_track_m_65.log:77:MATLAB:UndefinedFunction
matlab_track_m_68.log:77:MATLAB:UndefinedFunction
matlab_track_m_70.log:77:MATLAB:UndefinedFunction
matlab_track_m_71.log:77:MATLAB:UndefinedFunction
matlab_track_m_78.log:77:MATLAB:UndefinedFunction
matlab_track_m_80.log:77:MATLAB:UndefinedFunction
matlab_track_m_81.log:77:MATLAB:UndefinedFunction
matlab_track_m_85.log:77:MATLAB:UndefinedFunction
$ grep -rHno "MATLAB:badsubscript" | sort
matlab_export_m_16.log:11:MATLAB:badsubscript
matlab_export_m_30.log:11:MATLAB:badsubscript
matlab_export_m_45.log:11:MATLAB:badsubscript
matlab_export_m_48.log:11:MATLAB:badsubscript
matlab_export_m_49.log:11:MATLAB:badsubscript
matlab_export_m_52.log:11:MATLAB:badsubscript
matlab_export_m_59.log:11:MATLAB:badsubscript
matlab_export_m_62.log:11:MATLAB:badsubscript
matlab_export_m_64.log:11:MATLAB:badsubscript
matlab_export_m_65.log:11:MATLAB:badsubscript
matlab_export_m_66.log:11:MATLAB:badsubscript
matlab_export_m_68.log:11:MATLAB:badsubscript
matlab_export_m_70.log:11:MATLAB:badsubscript
matlab_export_m_71.log:11:MATLAB:badsubscript
matlab_export_m_72.log:11:MATLAB:badsubscript
matlab_export_m_78.log:11:MATLAB:badsubscript
matlab_export_m_80.log:11:MATLAB:badsubscript
matlab_export_m_81.log:11:MATLAB:badsubscript
matlab_export_m_82.log:11:MATLAB:badsubscript
matlab_export_m_83.log:11:MATLAB:badsubscript
matlab_export_m_85.log:11:MATLAB:badsubscript
matlab_export_m_90.log:11:MATLAB:badsubscript
matlab_solve_g_2.log:37:MATLAB:badsubscript
matlab_solve_g_3.log:37:MATLAB:badsubscript
matlab_solve_g_4.log:37:MATLAB:badsubscript
matlab_solve_g_5.log:37:MATLAB:badsubscript
matlab_solve_m_30.log:25:MATLAB:badsubscript
matlab_solve_m_64.log:25:MATLAB:badsubscript
matlab_solve_m_66.log:25:MATLAB:badsubscript
matlab_solve_m_72.log:25:MATLAB:badsubscript
matlab_solve_m_82.log:26:MATLAB:badsubscript
matlab_solve_m_83.log:25:MATLAB:badsubscript
matlab_solve_m_90.log:25:MATLAB:badsubscript
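To see at which stage each movie failed, grep -Hno output like the above can be folded into a per-movie table. A sketch assuming the <log name>:<line>:<error id> format shown above and log names of the form matlab_<step>_m_<movie>.log; graph-level logs (matlab_solve_g_*) are skipped. The function name is hypothetical.

```python
import re
from collections import defaultdict

# e.g. "matlab_solve_m_16.log:17:MATLAB:UndefinedFunction"
LOG_LINE = re.compile(r"matlab_(?P<step>[a-z]+)_m_(?P<movie>\d+)\.log:\d+:(?P<error>MATLAB:\S+)")


def errors_by_movie(grep_lines):
    """Map movie number -> {step: error id} from grep -Hno output lines."""
    table = defaultdict(dict)
    for line in grep_lines:
        match = LOG_LINE.search(line)
        if match:
            table[int(match.group("movie"))][match.group("step")] = match.group("error")
    return dict(table)
```

Fed with the two grep outputs above, it shows for example that movie 16 failed in track, solve and export, while movie 30 failed only in solve and export.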
- The second problem is that validate does not work on this dataset, although it works on other datasets. I tried it with both MCR and MATLAB, and on both the debug-jana and master branches:
$ antrax validate CN0402/
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
11:43:15 -D- initializing expreader object
11:43:15 -I- Reading video information from file
Subscripted assignment between dissimilar structures.
Error in trhandles/loadxy (line 514)
xy(i) = load([xydir,xyfiles{i}]);
Error in validate_tracking/set_experiment (line 266)
[app.XY,frames] = app.Trck.loadxy('movlist',app.ti.m:app.tf.m,'type',app.type);
Error in validate_tracking/startupFcn (line 441)
set_experiment(app, Trck, p.Results.session)
Error in validate_tracking (line 659)
runStartupFcn(app, @(app)startupFcn(app, varargin{:}))
Traceback (most recent call last):
File "/home/jana/anaconda3/envs/antrax/bin/antrax", line 8, in <module>
sys.exit(main())
File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 651, in main
""")
File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/sigtools/modifiers.py", line 158, in __call__
return self.func(*args, **kwargs)
File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 363, in run
ret = cli(*args)
File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
return func(*posargs, **kwargs)
File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 262, in _cli
return func('{0} {1}'.format(name, command), *args)
File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/clize/runner.py", line 220, in __call__
return func(*posargs, **kwargs)
File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/cli.py", line 149, in validate
launch_matlab_app('validate_tracking', args, mcr=mcr)
File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/antrax/matlab.py", line 204, in launch_matlab_app
app = eval('eng.' + appname + '(' + ','.join([str(a) for a in args]) + ')')
File "<string>", line 1, in <module>
File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/matlab/engine/matlabengine.py", line 71, in __call__
_stderr, feval=True).result()
File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/matlab/engine/futureresult.py", line 67, in result
return self.__future.result(timeout)
File "/home/jana/anaconda3/envs/antrax/lib/python3.6/site-packages/matlab/engine/fevalfuture.py", line 82, in result
self._result = pythonengine.getFEvalResult(self._future,self._nargout, None, out=self._out, err=self._err)
matlab.engine.MatlabExecutionError:
File /home/jana/src/anTraX/matlab/@trhandles/trhandles.m, line 514, in trhandles.loadxy
File /home/jana/src/anTraX/matlab/apps/validate_tracking.mlapp, line 266, in validate_tracking.set_experiment
File /home/jana/src/anTraX/matlab/apps/validate_tracking.mlapp, line 441, in validate_tracking.startupFcn
File /home/jana/src/anTraX/matlab/apps/validate_tracking.mlapp, line 659, in validate_tracking.validate_tracking
Subscripted assignment between dissimilar structures.
$ antrax validate CN0402/
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
09/04/21 11:41:37 -D- running matlab mcr
09/04/21 11:41:37 -D- command is: /home/jana/src/anTraX/bin/antrax_glnxa64_mcr_interface validate_tracking CN0402/
11:41:46 -D- initializing expreader object
11:41:46 -I- Reading video information from file
Subscripted assignment between dissimilar structures.
Error in trhandles/loadxy (line 514)
Error in validate_tracking/set_experiment (line 254)
Error in validate_tracking/startupFcn (line 429)
Error in appdesigner.internal.service.AppManagementService/tryCallback (line 336)
Error in matlab.apps.AppBase/runStartupFcn (line 41)
Error in validate_tracking (line 640)
Error in antrax_mcr_interface (line 20)
MATLAB:heterogeneousStrucAssignment
09/04/21 11:41:55 -D- matlab app exited with code 249
Maybe it's trying to load a non-existent file? The error in one of those logs looks like this:
$ cat matlab_export_m_70.log
09:23:13 -I- Reading video information from file
09:23:17 -I- Loading trgraph from antrax/graphs/graph_70_70.mat
09:23:18 -I- Finished loading trgraph with 200 tracklets
09:23:18 -I- Loading tracklet data for movie 70
Index in position 2 exceeds array bounds.
Error in trgraph/export_xy (line 82)
Error in export_single_movie (line 52)
Error in antrax_mcr_interface (line 42)
MATLAB:badsubscript
Loading extract-trainset worked, and it showed that most blobs were identified as RBR, which is wrong. Could that have contributed to the export error?
from antrax.
The validate command fails because there is something wrong with the xy files, so let's try to figure that one out first.
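The MATLAB error "Subscripted assignment between dissimilar structures" at xy(i) = load([xydir,xyfiles{i}]) most likely means that the structs loaded from different xy files do not contain the same set of variables. A diagnostic sketch under that assumption: it finds the files whose variable names differ from the most common set. The helper names are hypothetical; reading the variable names relies on scipy.io.whosmat, which handles MAT-files up to v7 but not v7.3 (HDF5) files.

```python
from collections import Counter


def odd_files(fields_by_file):
    """Return {file: sorted differing names} for files whose variables differ from the majority.

    fields_by_file maps a file name to an iterable of variable (field) names.
    """
    signatures = {name: frozenset(fields) for name, fields in fields_by_file.items()}
    if not signatures:
        return {}
    reference, _count = Counter(signatures.values()).most_common(1)[0]
    # Symmetric difference: names missing from, or extra in, each odd file.
    return {name: sorted(sig ^ reference) for name, sig in signatures.items() if sig != reference}


def mat_variables(path):
    """Variable names stored in a MAT-file (v7 or earlier; v7.3 files need h5py instead)."""
    from scipy.io import whosmat  # imported lazily so odd_files() works without SciPy
    return [name for name, _shape, _cls in whosmat(str(path))]
```

Pointed at the folder that loadxy reads from (xydir), odd_files({p.name: mat_variables(p) for p in sorted(Path(xydir).glob("*.mat"))}) would list the outliers, with Path imported from pathlib.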
The extract-trainset command shows you the results of the blob classifier, so if they are completely off, you should try to understand why. However, that should not cause a program crash downstream, just very bad tracking results.
The error in the export log suggests that the solve step fails on that video. Can you see if there is something weird in the corresponding solve logs?
from antrax.
btw, the single-step solve on HPC works properly for me. Did you remember to do pip install? (This needs to be done for Python code changes, but not for MATLAB code.)
from antrax.
All files that hit MATLAB:UndefinedFunction during track also failed during solve; maybe the fix in #17 will help. Others (like 30, see above) had a different error during solve: MATLAB:badsubscript.
from antrax.
Yes, all these errors seem related to the degenerate graph case. Let me know how the latest version does.
About the pip install: I like to use pip install -e <path> for packages under development, as it links to the package's working directory instead of copying the files, so you don't need to reinstall after every change or branch switch.
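One way to check that an editable install is active is to ask Python where it imports the package from. With pip install -e <path> this should be inside the source checkout rather than site-packages (the traceback above, for example, shows antrax being imported from site-packages). A small sketch using only importlib and pathlib; both helper names are hypothetical.

```python
import importlib.util
from pathlib import Path


def imported_from(package):
    """Directory Python would import a top-level package from, or None if it is not installed."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve().parent


def is_inside(path, root):
    """True if path lies inside root (works on Python 3.6, unlike Path.is_relative_to)."""
    try:
        Path(path).resolve().relative_to(Path(root).resolve())
        return True
    except ValueError:
        return False
```

For example, is_inside(imported_from("antrax"), "/home/jana/src/anTraX") should be True once the editable install is in place.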
from antrax.
Thank you for the pip tip, I was unaware of it :-) The solve thing with the --step option works for me now too, thank you for fixing it!
I got to the solve step with the problematic datasets; here's what I got:
- the fix from #17 worked for MATLAB:badsubscript during track
- During the solve step, the MATLAB:UndefinedFunction error does not show up anymore, but MATLAB:badsubscript did in 22 out of 90 cases. I repeated this twice; it's always the same videos. In an attempt to fix it, I tried training the classifier specifically on images extracted from the problematic videos, but that didn't help. Those videos are not empty, btw; there are identifiable ants in them. The matlab_solve_m_*.log typically looks like this:
$ cat matlab_solve_m_52.log
21:33:26 -I- Reading video information from file
21:33:32 -I- Loading trgraph from antrax/graphs/graph_52_52.mat
21:34:03 -I- Finished loading trgraph with 11734 tracklets
21:34:04 -I- Loading ids
21:34:09 -I- Finding single ant nodes
21:34:09 -I- Some preperations
21:34:10 -I- Looking for bottleneck pairs
21:34:13 -I- done distance mat
21:34:13 -I- Resetting graph id assigments
21:34:13 -I- Filtering out tracklets identified as non-ant
21:34:13 -I- ...10530 tracklets classified as no-ant were filtered
21:34:13 -I- ...2013 short, unconnected and unidentified tracklets were filtered
21:34:14 -I- Propagating ids from src tracklets
21:34:14 -I- Propagation loops
21:34:14 -I- ...assigned 0 tracklets
21:34:14 -I- Biconnected components condition (positive)
Index in position 2 exceeds array bounds.
Error in trgraph/solve>propagate_all (line 536)
Error in trgraph/solve (line 150)
Error in solve_single_movie (line 54)
Error in antrax_mcr_interface (line 30)
MATLAB:badsubscript
This dataset has 90 videos of 40 min each. I am also processing another dataset with 60 one-hour videos; that one takes longer to process and I haven't reached the solve step yet. If that dataset gets through the solve step properly, I will re-slice the videos for this experiment. I will also run the solve step overnight with MATLAB on a local machine to check whether this error occurs only with MCR.
from antrax.
On a local machine with MATLAB, solve failed too, at the same spots. The error looks like this:
$ cat matlab_solve_m_52.log
22:31:17 -D- initializing expreader object
22:31:17 -I- Reading video information from file
22:31:19 -I- Loading trgraph from antrax/graphs/graph_52_52.mat
22:31:48 -I- Finished loading trgraph with 11734 tracklets
22:31:48 -I- Loading ids
22:31:52 -I- Finding single ant nodes
22:31:53 -I- Some preperations
22:31:53 -I- Looking for bottleneck pairs
22:31:55 -I- done distance mat
22:31:55 -I- Resetting graph id assigments
22:31:55 -I- Filtering out tracklets identified as non-ant
22:31:55 -I- ...10530 tracklets classified as no-ant were filtered
22:31:55 -I- ...2013 short, unconnected and unidentified tracklets were filtered
22:31:55 -I- Propagating ids from src tracklets
22:31:56 -I- Propagation loops
22:31:56 -I- ...assigned 0 tracklets
22:31:56 -I- Biconnected components condition (positive)
Index in position 2 exceeds array bounds.
Error in trgraph/solve>propagate_all (line 536)
G.pairs = G.pairs(argsort(G.pairs(:,3)),:);
Error in trgraph/solve (line 150)
propagate_all(G);
Error in solve_single_movie (line 54)
solve(G,false,false);
I guess the dataset is not good then?
from antrax.
Hi,
Is --movlist supposed to work during the solve step 1 on HPC? It seems to be ignored:
$ antrax solve H1CN0304/ --step 1 --movlist 50 --hpc --hpc-options partition=single,,cpus=3,mem-per-cpu=3000,time=72:00:00
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
Jobfile created in H1CN0304/antrax/logs/hpc_solve1.sh
Job number 19464706 was submitted
$ squeue -l
Tue Apr 13 11:29:34 2021
JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
19464706_[20-60%60 single slv1:H1C fr_jm112 PENDING 0:00 3-00:00:00 1 (Resources)
19464706_1 single slv1:H1C fr_jm112 RUNNING 0:02 3-00:00:00 1 uc2n421
19464706_2 single slv1:H1C fr_jm112 RUNNING 0:02 3-00:00:00 1 uc2n421
19464706_3 single slv1:H1C fr_jm112 RUNNING 0:02 3-00:00:00 1 uc2n370
[...]
from antrax.
Once again you are right - I fixed the movlist issue.
I also made a new fix for the MATLAB:badsubscript issue. Try it now... It's a program bug, not an issue with your dataset. I just need to catch all the spots that reference the problematic variable, which is hard without being able to replicate the error on my side.
from antrax.
Thank you for fixing these things :-) I am never really sure if I am right about anything.
Also made a new fix to the MATLAB:badsubscript issue. Try it now... Its a program bug, not an issue with your dataset.
I don't seem to be getting these errors with a different dataset... Or did you fix this days ago? I changed the dataset that was causing all these problems by re-slicing the videos into 1-hour pieces. I also finally figured out that I need a far larger number of epochs during the training step than the default of 5; in my case I need more than 20 (45 seems like a good number when training from scratch on a good set of examples) to get loss and accuracy values close to 0.5 and 0.95, respectively.
And what does --missing do in the solve context?
$ antrax solve --help
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
Usage: antrax solve [OPTIONS] explist
Run propagation step
Arguments:
explist
Options:
--clist=PARSE_MOVLIST
--dry
--glist=PARSE_MOVLIST
--hpc
--hpc-options=PARSE_HPC_OPTIONS (default: {})
--mcr
--missing
--movlist=PARSE_MOVLIST
--nw=INT (default: 2)
--session=STR
--step=INT (default: 0)
Other actions:
-h, --help Show the help
I had some jobs fail because I did not allocate enough memory for them. Some jobs also seem to fail repeatedly for no obvious reason, but that can be fixed by removing the hpc_solve1_*.log for that job. Weird.
from antrax.
Once again you are right - I fixed the movlist issue.
Works beautifully!
$ antrax solve JS16/ --step 1 --movlist 2-4 --dry --hpc --hpc-options partition=single,cpus=3,mem-per-cpu=3000,time=72:00:00
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
Jobfile created in JS16/antrax_demo/logs/hpc_solve1.sh
Dry run, no job submitted.
$ cat JS16/antrax_demo/logs/hpc_solve1.sh
#!/bin/bash
#SBATCH --job-name=slv1:JS16
#SBATCH --output=JS16/antrax_demo/logs/hpc_solve1_%a.log
#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=3
#SBATCH --time=72:00:00
#SBATCH --mem-per-cpu=3000
#SBATCH --array=2-4%3
#SBATCH --mail-type=ALL
#SBATCH --mail-user=None
srun -N1 antrax solve JS16/ --session antrax_demo --movlist $SLURM_ARRAY_TASK_ID --nw 1 --step 1 --mcr
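The --array=2-4%3 line shows how the movie list ends up in the job file: a range of array task IDs followed by %N, which caps how many array tasks SLURM runs at once. A sketch of how a movie list could be compressed into such a specification; the function is hypothetical, and defaulting N to the number of tasks simply mirrors the job files in this thread.

```python
def slurm_array_spec(movies, max_parallel=None):
    """Compress movie numbers into a SLURM --array value such as '1-3,5,7-8%6'."""
    ms = sorted(set(movies))
    if not ms:
        raise ValueError("empty movie list")
    runs = []
    start = prev = ms[0]
    for m in ms[1:]:
        if m == prev + 1:          # extend the current consecutive run
            prev = m
            continue
        runs.append((start, prev))
        start = prev = m
    runs.append((start, prev))
    body = ",".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)
    limit = len(ms) if max_parallel is None else max_parallel
    return f"{body}%{limit}"
```

With this scheme, slurm_array_spec([2, 3, 4]) gives 2-4%3 as in the job file above, and a single missing movie such as [2] gives 2%1.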
I used pip install -e ., very handy. Incidentally, it also doesn't trigger the strange HPC permission error I described in #13, as plain pip install . did.
from antrax.
Using --missing with solve runs solve only on videos that do not have an xy file, which is the only output file of the step. It is useful if some jobs failed and you want to rerun only those. If you don't specify the step, it will run step 1 on the missing videos, then step 2 on all graphs, and then step 3, again on the missing videos.
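The --missing behaviour described above amounts to a simple task plan. A sketch, assuming xy files determine which movies count as present and that graph g covers the movies in ggroups[g - 1], as in the Python fragment quoted earlier in this thread; plan_missing is a hypothetical helper, not the anTraX code.

```python
def plan_missing(ggroups, present):
    """Which movies/graphs each solve step would run on under --missing.

    ggroups: list of movie lists; graph g (1-based) covers ggroups[g - 1].
    present: set of movies that already have an xy file.
    """
    all_movies = [m for grp in ggroups for m in grp]
    missing = [m for m in all_movies if m not in present]
    return {
        1: missing,                            # step 1: single-movie solve, missing videos only
        2: list(range(1, len(ggroups) + 1)),   # step 2: stitch all graphs
        3: missing,                            # step 3: export, missing videos only
    }
```

For 60 movies split into 5 graphs of 12, with only movie 2 missing, this yields [2] for steps 1 and 3 and [1, 2, 3, 4, 5] for step 2.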
The MATLAB:badsubscript error occurs in a very specific case, where the program did not find any topologically equivalent node pairs (see the paper) in the video. I never encountered such a case in my experiments, so it is very likely that you only see it in this specific dataset. Anyhow, it is a good idea to patch it, even if you found a workaround, so let me know if it happens again. The fix was in my last commit, not days ago.
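The failing line quoted in the local MATLAB log above, G.pairs = G.pairs(argsort(G.pairs(:,3)),:);, suggests why this case crashes: with no node pairs, G.pairs is presumably an empty array with no third column, so indexing column 3 is out of bounds. The same failure and one possible guard, illustrated in NumPy as an analogy (this is not the actual anTraX MATLAB fix; sort_pairs_by_score is a hypothetical name):

```python
import numpy as np


def sort_pairs_by_score(pairs):
    """Sort rows by the third column (MATLAB's pairs(:,3)), tolerating the no-pairs case."""
    pairs = np.asarray(pairs, dtype=float)
    if pairs.size == 0:
        # Degenerate graph: no topologically equivalent pairs were found.
        # Return an empty 0x3 array so later column indexing stays valid.
        return np.empty((0, 3))
    pairs = np.atleast_2d(pairs)
    return pairs[np.argsort(pairs[:, 2], kind="stable")]


if __name__ == "__main__":
    try:
        np.empty((0, 0))[:, 2]   # unguarded equivalent of G.pairs(:,3) on an empty matrix
    except IndexError as err:
        print("unguarded:", err)
    print(sort_pairs_by_score(np.empty((0, 0))).shape)  # (0, 3)
```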
Regarding the classifier: definitely! Usually 50-100 epochs are needed, depending on the complexity of the problem (number of classes, image resolution, etc.). I usually recommend aiming for at least 0.95 accuracy.
If I understand correctly, you have already completed tracking of a few datasets and run the validation procedure? What accuracy do you see?
from antrax.
No, I am actually slower than it may seem :-/ With small test datasets it worked out really well, but with large ones (e.g., 60 hours) I kept making different silly mistakes that hindered my progress. For example, I realized only yesterday that I need to run the training step much longer. Hopefully I will get to run validation on one of the large experiments sometime this week.
from antrax.
ok, hopefully the effort will pay off!
from antrax.
I think --missing might not be working... Only one of the 60 xy files was not generated, but this restarted all jobs:
$ for i in {1..60}; do if [ -f ~/H2CN0402/antrax/antdata/xy_${i}_${i}.csv ]; then : ; else echo "Missing: ${i}" ; fi; done
Missing: 2
$ antrax solve H2CN0402/ --missing --hpc --hpc-options partition=single,cpus=2,mem-per-cpu=2000,time=72:00:00
==================================================================================
Welcome to anTraX - a software for tracking color tagged ants (and other insects)
==================================================================================
Jobfile created in H2CN0402/antrax/logs/hpc_solve1.sh
Job number 19468563 was submitted
Jobfile created in H2CN0402/antrax/logs/hpc_solve2.sh
Job number 19468564 was submitted
Jobfile created in H2CN0402/antrax/logs/hpc_solve3.sh
$ cat H2CN0402/antrax/logs/hpc_solve1.sh
#!/bin/bash
#SBATCH --job-name=slv1:H2CN0402
#SBATCH --output=H2CN0402/antrax/logs/hpc_solve1_%a.log
#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --time=72:00:00
#SBATCH --mem-per-cpu=2000
#SBATCH --array=1-60%60
#SBATCH --mail-type=ALL
srun -N1 antrax solve H2CN0402/ --session antrax --movlist $SLURM_ARRAY_TASK_ID --nw 1 --step 1 --mcr
$ cat H2CN0402/antrax/logs/hpc_solve2.sh
#!/bin/bash
#SBATCH --job-name=slv2:H2CN0402
#SBATCH --output=H2CN0402/antrax/logs/hpc_solve2_%a.log
#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --time=72:00:00
#SBATCH --mem-per-cpu=2000
#SBATCH --array=1-5%5
#SBATCH --mail-type=ALL
srun -N1 antrax solve H2CN0402/ --session antrax --movlist $SLURM_ARRAY_TASK_ID --nw 1 --step 2 --mcr
$ cat H2CN0402/antrax/logs/hpc_solve3.sh
#!/bin/bash
#SBATCH --job-name=slv3:H2CN0402
#SBATCH --output=H2CN0402/antrax/logs/hpc_solve3_%a.log
#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --time=72:00:00
#SBATCH --mem-per-cpu=2000
#SBATCH --array=1-60%60
#SBATCH --mail-type=ALL
srun -N1 antrax solve H2CN0402/ --session antrax --movlist $SLURM_ARRAY_TASK_ID --nw 1 --step 3 --mcr
from antrax.
The log for the missing file complains about a possibly corrupt MAT file. The file is physically there; what do you think could have caused the problem?
$ cat matlab_solve_m_2.log
08:07:21 -I- Reading video information from file
08:07:27 -I- Loading trgraph from antrax/graphs/graph_2_2.mat
Error using load
Unable to read MAT-file /pfs/data5/home/fr/fr_fr/fr_jm1121/H2CN0402/antrax/graphs/graph_2_2_trjs.mat. File might be corrupt.
Error in trgraph.load (line 886)
Error in trhandles/loaddata (line 607)
Error in solve_single_movie (line 52)
Error in antrax_mcr_interface (line 30)
MATLAB:load:unableToReadMatFile
from antrax.
Can you try to load the file in MATLAB using the load command?
If it's indeed corrupted, it's possible that something interrupted the writing of the file, so it might just be a random thing. Did the track step finish properly on this video? Try re-running track for that video.
I'll take a look at the --missing issue tomorrow.
from antrax.
Matlab has the same complaint:
>> addpath(genpath(['.','/matlab']));
>> load antrax/graphs/graph_2_2_trjs.mat
Error using load
Unable to read MAT-file /media/jana/HDD/bw/H2CN0402/antrax/graphs/graph_2_2_trjs.mat. File might be corrupt.
I think I know what I did wrong: I might have started the next step before the previous one finished. On the upside, it was otherwise a very smooth process, from track to solve.
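Since a graph file that was still being written when the next step started (or whose write was interrupted) can end up unreadable, it may be worth checking that every graph MAT-file loads before launching the next step. A sketch with a pluggable loader; the helper names are hypothetical, and the default loader uses scipy.io.loadmat, which reads whole files (slow for large graphs) and handles MAT-files up to v7 but not v7.3 (HDF5), for which h5py would be needed instead.

```python
from pathlib import Path


def _scipy_loader(path):
    from scipy.io import loadmat  # lazy import: only needed when no loader is passed
    loadmat(str(path))


def unreadable_files(folder, pattern="graph_*.mat", loader=None):
    """Return (file name, error) pairs for files in folder that fail to load."""
    loader = loader or _scipy_loader
    bad = []
    for path in sorted(Path(folder).glob(pattern)):
        try:
            loader(path)
        except Exception as err:  # corrupt or truncated files surface as various exception types
            bad.append((path.name, repr(err)))
    return bad
```

The default pattern also matches the *_trjs.mat companions, so unreadable_files("H2CN0402/antrax/graphs") would cover graph_2_2_trjs.mat from the error above.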
from antrax.
No, something is still wrong. After re-slicing the videos and starting everything from scratch, I had errors during solve steps 2 and 3.
In step 2 it was either MATLAB:badsubscript or MATLAB:load:cantReadFile (?):
$ grep -rHnoL "Done" matlab_solve_g_*
matlab_solve_g_3.log
matlab_solve_g_4.log
matlab_solve_g_5.log
$ cat matlab_solve_g_3.log
00:43:23 -I- Reading video information from file
00:43:32 -I- solving graph from movies 25-36
00:43:32 -I- Loading trgraph from antrax/graphs/graph_25_25.mat
Error using load
Cannot read file /pfs/data5/home/fr/fr_fr/fr_jm1121/H2CN0402/antrax/graphs/graph_25_25.mat.
Error in trgraph.load (line 879)
Error in trhandles/loaddata (line 607)
Error in solve_across_movies (line 70)
Error in antrax_mcr_interface (line 53)
MATLAB:load:cantReadFile
$ cat matlab_solve_g_4.log
00:43:51 -I- Reading video information from file
00:43:58 -I- solving graph from movies 37-48
00:43:58 -I- Loading trgraph from antrax/graphs/graph_37_37.mat
00:44:10 -I- Loading trgraph from antrax/graphs/graph_38_38.mat
00:44:14 -I- Loading trgraph from antrax/graphs/graph_39_39.mat
00:44:16 -I- Loading trgraph from antrax/graphs/graph_40_40.mat
00:44:17 -I- Loading trgraph from antrax/graphs/graph_41_41.mat
00:44:19 -I- Loading trgraph from antrax/graphs/graph_42_42.mat
00:44:21 -I- Loading trgraph from antrax/graphs/graph_43_43.mat
00:44:22 -I- Loading trgraph from antrax/graphs/graph_44_44.mat
00:44:24 -I- Loading trgraph from antrax/graphs/graph_45_45.mat
00:44:25 -I- Loading trgraph from antrax/graphs/graph_46_46.mat
00:44:27 -I- Loading trgraph from antrax/graphs/graph_47_47.mat
00:44:28 -I- Loading trgraph from antrax/graphs/graph_48_48.mat
00:44:28 -I- Finished loading trgraph with 10016 tracklets
00:44:30 -I- Loading ids
00:44:33 -I- Finding single ant nodes
00:44:33 -I- Some preperations
00:44:34 -I- Filtering out tracklets identified as non-ant
00:44:34 -I- ...690 tracklets classified as no-ant were filtered
00:44:34 -I- ...729 short, unconnected and unidentified tracklets were filtered
00:44:35 -I- Propagating ids from src tracklets
00:44:36 -I- ...finished 1000/7355
00:44:36 -I- ...finished 2000/7355
00:44:36 -I- ...finished 3000/7355
00:44:36 -I- ...finished 4000/7355
00:44:36 -I- ...finished 5000/7355
00:44:36 -I- ...finished 6000/7355
00:44:36 -I- ...finished 7000/7355
00:44:36 -I- Propagation loops
Index in position 1 exceeds array bounds.
Error in trgraph/solve>propagate_all (line 522)
Error in trgraph/solve (line 150)
Error in solve_across_movies (line 72)
Error in antrax_mcr_interface (line 53)
MATLAB:badsubscript
In step 3:
$ grep -rHnoL "Done" matlab_export_m_*
matlab_export_m_25.log
matlab_export_m_28.log
matlab_export_m_35.log
matlab_export_m_36.log
$ cat matlab_export_m_25.log
00:58:53 -I- Reading video information from file
00:58:58 -I- Loading trgraph from antrax/graphs/graph_25_25.mat
Error using load
Cannot read file /pfs/data5/home/fr/fr_fr/fr_jm1121/H2CN0402/antrax/graphs/graph_25_25.mat.
Error in trgraph.load (line 879)
Error in trhandles/loaddata (line 607)
Error in export_single_movie (line 51)
Error in antrax_mcr_interface (line 42)
MATLAB:load:cantReadFile
None of the logs from the earlier steps showed any errors.
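grep -L "Done" lists the logs that never reached their final message, which also catches runs that died without writing a MATLAB error identifier. A Python equivalent that additionally reports the last line of each such log, assuming (as the grep above implies) that successful runs write a line containing Done; the function name is hypothetical.

```python
from pathlib import Path


def unfinished_logs(logs_dir, pattern, marker="Done"):
    """Map log name -> last non-empty line, for logs that do not contain marker."""
    result = {}
    for log in sorted(Path(logs_dir).glob(pattern)):
        text = log.read_text(errors="replace")
        if marker in text:
            continue
        lines = [ln for ln in text.splitlines() if ln.strip()]
        result[log.name] = lines[-1] if lines else ""
    return result
```

For matlab_export_m_25.log shown above, unfinished_logs(".", "matlab_export_m_*.log") would report MATLAB:load:cantReadFile as the last line.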
from antrax.
The above was partially solved by re-running the track step for movies 25, 28, 35 and 36 on HPC. Step 2 showed the MATLAB:badsubscript error in all logs (5 graphs in total), but step 3 finished successfully and the missing mat/csv files were generated.
$ cat matlab_solve_g_3.log
09:26:30 -I- Reading video information from file
09:26:36 -I- solving graph from movies 25-36
09:26:36 -I- Loading trgraph from antrax/graphs/graph_25_25.mat
09:26:48 -I- Loading trgraph from antrax/graphs/graph_26_26.mat
09:26:52 -I- Loading trgraph from antrax/graphs/graph_27_27.mat
09:26:54 -I- Loading trgraph from antrax/graphs/graph_28_28.mat
09:26:57 -I- Loading trgraph from antrax/graphs/graph_29_29.mat
09:26:57 -I- Loading trgraph from antrax/graphs/graph_30_30.mat
09:26:58 -I- Loading trgraph from antrax/graphs/graph_31_31.mat
09:26:59 -I- Loading trgraph from antrax/graphs/graph_32_32.mat
09:26:59 -I- Loading trgraph from antrax/graphs/graph_33_33.mat
09:27:00 -I- Loading trgraph from antrax/graphs/graph_34_34.mat
09:27:01 -I- Loading trgraph from antrax/graphs/graph_35_35.mat
09:27:09 -I- Loading trgraph from antrax/graphs/graph_36_36.mat
09:27:19 -I- Finished loading trgraph with 17914 tracklets
09:27:21 -I- Loading ids
09:27:25 -I- Finding single ant nodes
09:27:26 -I- Some preperations
09:27:28 -I- Filtering out tracklets identified as non-ant
09:27:28 -I- ...8544 tracklets classified as no-ant were filtered
09:27:28 -I- ...6359 short, unconnected and unidentified tracklets were filtered
09:27:29 -I- Propagating ids from src tracklets
09:27:31 -I- ...finished 1000/7421
09:27:31 -I- ...finished 2000/7421
09:27:31 -I- ...finished 3000/7421
09:27:31 -I- ...finished 4000/7421
09:27:31 -I- ...finished 5000/7421
09:27:31 -I- ...finished 6000/7421
09:27:31 -I- ...finished 7000/7421
09:27:31 -I- Propagation loops
Index in position 1 exceeds array bounds (must not exceed 14008).
Error in trgraph/solve>propagate_all (line 522)
Error in trgraph/solve (line 150)
Error in solve_across_movies (line 72)
Error in antrax_mcr_interface (line 53)
MATLAB:badsubscript
from antrax.
So, if I understand correctly, the corrupted file issue was solved by the rerun?
Regarding the new MATLAB:badsubscript error, it is different from the previous one we had above. I'm not sure what's going on there. After you tracked some of the videos again, did you also run classify and solve step 1?
Step 2 actually "stitches" the graphs of individual videos together and propagates information from one video to another. In practice it is not strictly required, which is why step 3 is able to finish properly; the tracking might just be suboptimal at the boundaries between videos.
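Because step 2 works on groups of movies, it helps to know which graph a re-tracked movie belongs to. Following the Python fragment quoted earlier in this thread, graph g covers the movies in e.ggroups[g - 1]; a small sketch that inverts that list (the helper name is hypothetical):

```python
def graph_of_movie(ggroups):
    """Invert ggroups (graph g covers ggroups[g - 1]) into a movie -> graph mapping."""
    mapping = {}
    for g, movies in enumerate(ggroups, start=1):
        for m in movies:
            mapping[m] = g
    return mapping
```

With 60 movies in groups of 12, movies 25, 28, 35 and 36 all map to graph 3, which matches "solving graph from movies 25-36" in matlab_solve_g_3.log.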
from antrax.
So, if I understand correctly, the corrupted file issue was solved by the rerun?
Yes. It looks like some strange error happened during track that was not reflected in the logs but produced some corrupt graph MAT files. At least that's my best explanation.
After you tracked some of the videos again, did you also run the classify and solve1?
I actually tried both, and both worked, but I went with the latter. What consequences would re-running track and then going directly to solve have on detections?
from antrax.
Theoretically, the algorithm is completely deterministic, so the two runs should produce the same tracklet graph and tracklet names. However, there are occasionally some small misalignments between runs that I cannot explain. Also, when you run track, it cleans up some of the data generated by later steps, so it is better to also rerun the downstream steps.
I'm not sure what you mean by "both worked". Was the latest MATLAB:badsubscript in step 2 solved?
from antrax.
Sorry, I made it too confusing. It looks like I've been dealing with two separate problems (they just looked like one at first): xy files not being generated after step 3, and step 2 showing different errors (either MATLAB:badsubscript with MCR or Index in position 1 exceeds array bounds with MATLAB). By "both worked" I was referring to the first problem, which was caused by the corrupt graph files generated during track and fixed by re-running either just track and then solve, or track, classify, and solve.
Was the latest MATLAB:badsubscript in step 2 solved?
No, it is still happening.
from antrax.
ok, so let's try to understand this new MATLAB:badsubscript better (it's the same error on MCR and MATLAB, just reported differently). As I said, it's different from the one we had earlier in this thread. We'll have to do it the painful way, as I can't reproduce it on my side.
I've added a few lines of code to report some info about the problematic spot.
Run it in interactive MATLAB:
Trck = trhandles(uigetdir);
solve_across_movies(Trck, 'g', 3);
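uigetdir only opens a folder picker, so on a machine without a display the experiment path can be passed to trhandles directly instead. The sketch below builds that MATLAB command string in Python and, optionally, runs it through the MATLAB Engine API for Python (the matlab.engine package that appears in the traceback earlier in this thread). The helper names are hypothetical; start_matlab, addpath, genpath, eval and quit are standard engine calls.

```python
def matlab_str(s):
    """Quote a Python string as a MATLAB char literal (single quotes are doubled)."""
    return "'" + str(s).replace("'", "''") + "'"


def debug_command(expdir, graph):
    """MATLAB statements equivalent to the interactive snippet above, without uigetdir."""
    return (
        f"Trck = trhandles({matlab_str(expdir)}); "
        f"solve_across_movies(Trck, 'g', {int(graph)});"
    )


def run_headless(src_dir, expdir, graph):
    """Run the debug command in a display-less MATLAB session (requires MATLAB Engine)."""
    import matlab.engine  # lazy import: only needed when actually running MATLAB
    eng = matlab.engine.start_matlab("-nodisplay")
    try:
        eng.addpath(eng.genpath(src_dir), nargout=0)
        eng.eval(debug_command(expdir, graph), nargout=0)
    finally:
        eng.quit()
```

The string from debug_command("/media/jana/HDD/bw/H2CN0402", 3) can also be pasted directly into a MATLAB session started with -nodisplay.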
from antrax.
Hmm, maybe I am doing something wrong here:
>> addpath(genpath(['.','/matlab']));
>> Trck = trhandles(uigetdir);
Warning: uigetdir is no longer supported when MATLAB is started with the -nodisplay or -noFigureWindows option or there is no display. For more information, see "Changes to
-nodisplay and -noFigureWindows Startup Options" in the MATLAB Release Notes. To view the release note in your system browser, run
web('www.mathworks.com/help/matlab/release-notes.html#br5ktrh-3', '-browser')
> In warnfiguredialog (line 21)
In uigetdir (line 60)
Error using javaObjectEDT
Scalar input must be a java object
Error in matlab.ui.internal.dialog.Dialog/getParentFrame (line 46)
obj.ParentFrame = javaObjectEDT(com.mathworks.hg.peer.utils.DialogUtilities.createParentWindow);
Error in matlab.ui.internal.dialog.FileSystemChooser/getParentFrame (line 129)
parframe = getParentFrame@matlab.ui.internal.dialog.Dialog(obj);
Error in matlab.ui.internal.dialog.FolderChooser/doShowDialog (line 70)
javaMethodEDT('showOpenDialog', obj.Peer, getParentFrame(obj));
Error in matlab.ui.internal.dialog.FolderChooser/show (line 48)
doShowDialog(obj)
Error in uigetdir_helper (line 32)
dirdlg.show();
Error in uigetdir (line 61)
[directoryname] = uigetdir_helper(varargin{:});
>> Exception in thread "AWT-EventQueue-0" java.awt.HeadlessException
at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:204)
at java.awt.Window.<init>(Window.java:536)
at java.awt.Frame.<init>(Frame.java:420)
at javax.swing.JFrame.<init>(JFrame.java:233)
at com.mathworks.mwswing.MJFrame.<init>(MJFrame.java:108)
at com.mathworks.mwswing.MJFrame.<init>(MJFrame.java:101)
at com.mathworks.hg.peer.utils.DialogUtilities$1.runWithOutput(DialogUtilities.java:56)
at com.mathworks.jmi.AWTUtilities$Invoker$2.watchedRun(AWTUtilities.java:475)
at com.mathworks.jmi.AWTUtilities$WatchedRunnable.run(AWTUtilities.java:436)
at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:311)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
at java.awt.EventQueue.access$500(EventQueue.java:97)
at java.awt.EventQueue$3.run(EventQueue.java:709)
at java.awt.EventQueue$3.run(EventQueue.java:703)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:74)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:205)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
from antrax.
Sorry, that was silly of me :-/ With the dataset that had the problem:
>> Trck = trhandles('.');
21:25:09 -I- Loading tracking session from expdir
21:25:17 -I- Reading video information from file
>> solve_across_movies(Trck, 'g', 3);
Error using solve_across_movies (line 11)
Expected a string scalar or character vector for the parameter name.
>>
from antrax.
>> solve_across_movies(Trck, 3);
21:45:08 -I- solving graph from movies 25-36
21:45:08 -I- Loading trgraph from antrax/graphs/graph_25_25.mat
21:45:37 -I- Loading trgraph from antrax/graphs/graph_26_26.mat
21:45:55 -I- Loading trgraph from antrax/graphs/graph_27_27.mat
21:46:11 -I- Loading trgraph from antrax/graphs/graph_28_28.mat
21:46:24 -I- Loading trgraph from antrax/graphs/graph_29_29.mat
21:46:31 -I- Loading trgraph from antrax/graphs/graph_30_30.mat
21:46:37 -I- Loading trgraph from antrax/graphs/graph_31_31.mat
21:46:42 -I- Loading trgraph from antrax/graphs/graph_32_32.mat
21:46:49 -I- Loading trgraph from antrax/graphs/graph_33_33.mat
21:46:57 -I- Loading trgraph from antrax/graphs/graph_34_34.mat
21:47:02 -I- Loading trgraph from antrax/graphs/graph_35_35.mat
21:47:06 -I- Loading trgraph from antrax/graphs/graph_36_36.mat
21:47:08 -I- Finished loading trgraph with 80451 tracklets
21:47:12 -I- Loading ids
21:47:31 -I- Finding single ant nodes
21:47:33 -I- Some preperations
21:47:38 -I- Filtering out tracklets identified as non-ant
21:47:38 -I- ...1082 tracklets classified as no-ant were filtered
21:47:39 -I- ...13588 short, unconnected and unidentified tracklets were filtered
21:47:41 -I- Propagating ids from src tracklets
21:47:45 -I- ...finished 1000/25235
21:47:45 -I- ...finished 2000/25235
21:47:45 -I- ...finished 3000/25235
21:47:45 -I- ...finished 4000/25235
21:47:45 -I- ...finished 5000/25235
21:47:45 -I- ...finished 6000/25235
21:47:45 -I- ...finished 7000/25235
21:47:45 -I- ...finished 8000/25235
21:47:45 -I- ...finished 9000/25235
21:47:45 -I- ...finished 10000/25235
21:47:45 -I- ...finished 11000/25235
21:47:45 -I- ...finished 12000/25235
21:47:45 -I- ...finished 13000/25235
21:47:45 -I- ...finished 14000/25235
21:47:45 -I- ...finished 15000/25235
21:47:45 -I- ...finished 16000/25235
21:47:45 -I- ...finished 17000/25235
21:47:45 -I- ...finished 18000/25235
21:47:45 -I- ...finished 19000/25235
21:47:45 -I- ...finished 20000/25235
21:47:45 -I- ...finished 21000/25235
21:47:46 -I- ...finished 22000/25235
21:47:46 -I- ...finished 23000/25235
21:47:46 -I- ...finished 24000/25235
21:47:46 -I- ...finished 25000/25235
21:47:46 -I- Propagation loops
Index in position 1 exceeds array bounds.
Error in trgraph/solve>propagate_all (line 522)
score = G.assignment_scores(assigned_nodes(i),idix(j));
Error in trgraph/solve (line 150)
propagate_all(G);
Error in solve_across_movies (line 72)
solve(G,false,true);
>>
from antrax.
You don't seem to have my code changes. Did you pull my latest commit?
from antrax.
BTW, did you re-track your videos with different parameters? You have 80K tracklets in the latest run, while in the previous run you had 17K for the same videos.
from antrax.
hmm, I was on the wrong branch. Sorry :-/
BTW, did you re-track your videos with different parameters?
I have two datasets, both containing 60 hours of video but with significantly different tracklet numbers. Both datasets had issues with step 2; their parameters are not identical, and their classifiers are different.
Dataset-1:
22:08:20 -I- solving graph from movies 25-36
22:08:20 -I- Loading trgraph from antrax/graphs/graph_25_25.mat
22:08:50 -I- Loading trgraph from antrax/graphs/graph_26_26.mat
22:09:09 -I- Loading trgraph from antrax/graphs/graph_27_27.mat
22:09:24 -I- Loading trgraph from antrax/graphs/graph_28_28.mat
22:09:37 -I- Loading trgraph from antrax/graphs/graph_29_29.mat
22:09:45 -I- Loading trgraph from antrax/graphs/graph_30_30.mat
22:09:50 -I- Loading trgraph from antrax/graphs/graph_31_31.mat
22:09:56 -I- Loading trgraph from antrax/graphs/graph_32_32.mat
22:10:02 -I- Loading trgraph from antrax/graphs/graph_33_33.mat
22:10:10 -I- Loading trgraph from antrax/graphs/graph_34_34.mat
22:10:15 -I- Loading trgraph from antrax/graphs/graph_35_35.mat
22:10:19 -I- Loading trgraph from antrax/graphs/graph_36_36.mat
22:10:21 -I- Finished loading trgraph with 80451 tracklets
22:10:25 -I- Loading ids
22:10:45 -I- Finding single ant nodes
22:10:47 -I- Some preperations
22:10:51 -I- Filtering out tracklets identified as non-ant
22:10:51 -I- ...1082 tracklets classified as no-ant were filtered
22:10:52 -I- ...13588 short, unconnected and unidentified tracklets were filtered
22:10:54 -I- Propagating ids from src tracklets
22:10:58 -I- ...finished 1000/25235
22:10:58 -I- ...finished 2000/25235
22:10:58 -I- ...finished 3000/25235
22:10:58 -I- ...finished 4000/25235
22:10:58 -I- ...finished 5000/25235
22:10:58 -I- ...finished 6000/25235
22:10:58 -I- ...finished 7000/25235
22:10:58 -I- ...finished 8000/25235
22:10:58 -I- ...finished 9000/25235
22:10:58 -I- ...finished 10000/25235
22:10:58 -I- ...finished 11000/25235
22:10:58 -I- ...finished 12000/25235
22:10:58 -I- ...finished 13000/25235
22:10:58 -I- ...finished 14000/25235
22:10:58 -I- ...finished 15000/25235
22:10:58 -I- ...finished 16000/25235
22:10:58 -I- ...finished 17000/25235
22:10:58 -I- ...finished 18000/25235
22:10:58 -I- ...finished 19000/25235
22:10:58 -I- ...finished 20000/25235
22:10:59 -I- ...finished 21000/25235
22:10:59 -I- ...finished 22000/25235
22:10:59 -I- ...finished 23000/25235
22:10:59 -I- ...finished 24000/25235
22:10:59 -I- ...finished 25000/25235
22:10:59 -I- Propagation loops
22:10:59 -E- error in propagate_all
22:10:59 -I- node is 1
22:10:59 -I- size of assignment scores is 0 0
22:10:59 -I- size of assignment ids is 80451 76
Index in position 1 exceeds array bounds.
Error in trgraph/solve>propagate_all (line 523)
score = G.assignment_scores(assigned_nodes(i),idix(j));
Error in trgraph/solve (line 150)
propagate_all(G);
Error in solve_across_movies (line 72)
solve(G,false,true);
>>
Dataset-2:
>> solve_across_movies(Trck, 3);
22:15:29 -I- solving graph from movies 25-36
22:15:29 -I- Loading trgraph from antrax/graphs/graph_25_25.mat
22:15:37 -I- Loading trgraph from antrax/graphs/graph_26_26.mat
22:15:39 -I- Loading trgraph from antrax/graphs/graph_27_27.mat
22:15:40 -I- Loading trgraph from antrax/graphs/graph_28_28.mat
22:15:42 -I- Loading trgraph from antrax/graphs/graph_29_29.mat
22:15:42 -I- Loading trgraph from antrax/graphs/graph_30_30.mat
22:15:43 -I- Loading trgraph from antrax/graphs/graph_31_31.mat
22:15:43 -I- Loading trgraph from antrax/graphs/graph_32_32.mat
22:15:43 -I- Loading trgraph from antrax/graphs/graph_33_33.mat
22:15:44 -I- Loading trgraph from antrax/graphs/graph_34_34.mat
22:15:45 -I- Loading trgraph from antrax/graphs/graph_35_35.mat
22:15:50 -I- Loading trgraph from antrax/graphs/graph_36_36.mat
22:15:56 -I- Finished loading trgraph with 17914 tracklets
22:15:57 -I- Loading ids
22:16:00 -I- Finding single ant nodes
22:16:00 -I- Some preperations
22:16:01 -I- Filtering out tracklets identified as non-ant
22:16:01 -I- ...8544 tracklets classified as no-ant were filtered
22:16:01 -I- ...6359 short, unconnected and unidentified tracklets were filtered
22:16:02 -I- Propagating ids from src tracklets
22:16:03 -I- ...finished 1000/7421
22:16:03 -I- ...finished 2000/7421
22:16:03 -I- ...finished 3000/7421
22:16:03 -I- ...finished 4000/7421
22:16:03 -I- ...finished 5000/7421
22:16:03 -I- ...finished 6000/7421
22:16:03 -I- ...finished 7000/7421
22:16:03 -I- Propagation loops
22:16:03 -E- error in propagate_all
22:16:03 -I- node is 2
22:16:03 -I- size of assignment scores is 0 0
22:16:03 -I- size of assignment ids is 17914 76
Index in position 1 exceeds array bounds.
Error in trgraph/solve>propagate_all (line 523)
score = G.assignment_scores(assigned_nodes(i),idix(j));
Error in trgraph/solve (line 150)
propagate_all(G);
Error in solve_across_movies (line 72)
solve(G,false,true);
>>
from antrax.
What does this error mean exactly? I am wondering whether I am causing it by using the software in a way it was not intended...
In my assay, the ants are free to enter and leave the frame. They all have access to the nest, and inevitably some unmarked ants come in. I trained the classifier to recognize unmarked ants as any_ant, just to separate them from the marked ants I am interested in. But any_ant is expected to be in one place at a time, while in reality there can be more than one unmarked ant in the frame. Hence, it's quite possible that at the boundaries between movies the data does not look as expected.
I guess if this is the problem, I should either classify unmarked ants as NoAnt to remove them from my data, or skip step 2. What would you recommend?
from antrax.
No, it was a real bug, and I was able to reproduce it.
There was a variable that was assigned during solve step 1 but cleared after solve step 2. So, when you try to run solve step 2 again without running step 1, it complains. However, I think this small bug only masks the earlier problem, which involves the same variable. So try running all solve steps again.
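The debug output earlier in the thread (size of assignment scores 0 0, size of assignment ids 80451 76) shows the failure mode: propagate_all indexes a row of a score matrix that has been emptied. The following hypothetical guard, sketched in Python rather than the MATLAB of trgraph/solve, turns that into an actionable message; the function name and message are illustrations, not anTraX code.

```python
from typing import Optional, Sequence


def score_shape_problem(
    assignment_scores: Sequence[Sequence[float]],
    assignment_ids: Sequence[Sequence[int]],
) -> Optional[str]:
    """Return an explanatory message if the score matrix has a different
    number of rows than the id matrix, or None if they agree.

    Hypothetical check: it mirrors the situation in the debug log, where the
    scores were 0x0 while the ids had one row per tracklet.
    """
    n_scores, n_ids = len(assignment_scores), len(assignment_ids)
    if n_scores == n_ids:
        return None
    return (
        f"assignment_scores has {n_scores} rows but assignment_ids has {n_ids}; "
        "the scores were probably cleared by an earlier run, "
        "so re-run all solve steps starting from step 1"
    )


if __name__ == "__main__":
    # An empty score matrix next to a 3-row id matrix, as in the debug log.
    print(score_shape_problem([], [[0, 1], [1, 0], [0, 0]]))
```

Checking shapes up front reports the root cause (a cleared variable) instead of a bare out-of-bounds index deep inside the propagation loop.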
Your any_ant solution is fine but, as you understand, not ideal. Do you have good separation between this class and the Unknown class (marked ants that cannot be identified because of posture, bad image, etc.)? If so, you can try defining the any_ant class in the "NoAnt" category. That sounds weird, but all it does is tell the algorithm that this is a category that cannot be individually tracked. I use it for larvae, food items, etc.
from antrax.
So try running all solve steps again.
It worked locally in MATLAB on the dataset with fewer tracklets! I will also test it on the other dataset on the HPC, but it will probably take a very long time.
Do you have good separation between this class and the Unknown class (marked ant that cannot be identified because of posture, bad image etc)?
It's probably not very good right now, because I did not consider this at all when I was selecting the examples. I think I will stick with the any_ant class solution for these two datasets and try the alternative in the next experiments.
from antrax.
Great, let me know.
It's probably not very good right now, because I did not consider this at all when I was selecting the examples. I think I will stick with the any_ant class solution for these two datasets and try the alternative in the next experiments.
Sounds good. You want to get to the point where the tracking pipeline works, and the performance can be easily estimated. You can then tune the pipeline accordingly, depending on what you actually need in order to do your science.
from antrax.
With the first dataset (many tracklets!) I had another problem during step 2: the graphs that did not show the error would fail because the job ran out of memory. 2 CPUs / 4 GB per CPU was plenty for the second dataset with fewer tracklets, while this one would fail with 10 GB per CPU. I increased it to 24 GB per CPU, and then it ran for two days; I ended up cancelling the job because I wasn't sure whether it was stuck in an infinite loop. I tried running it locally too, and it indeed used a lot of memory (over 30 GB of RAM a couple of hours into the job, on a machine with 64 GB of RAM).
Is that something you would expect with a dataset with a very high number of tracklets?
from antrax.
It's probably worth mentioning that one of the classify jobs from that "heavy" dataset takes about 60 hours to finish with 2 CPUs / 2 GB, and many others take more than a day. Hmmm....
from antrax.
Why does your dataset have so many tracklets? What is your video duration and frame rate? I was under the impression that your tracking is expected to be very sparse. Do you have many detections of non-ant blobs?
Generally, though, the classify step can benefit from more CPUs, especially if the average tracklet length is longer than a few seconds. The best way to choose the resources per job is to run one task locally and see what its typical CPU/memory consumption is.
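One way to measure a single local task, as suggested above, is to wrap it with Python's standard resource module. This is a minimal Unix-only sketch; the command passed in is a placeholder, so substitute the exact command your job script runs for one task.

```python
import resource
import subprocess
import sys
import time
from typing import Dict, List


def profile_command(cmd: List[str]) -> Dict[str, float]:
    """Run one command to completion and report its wall-clock time, CPU time,
    average number of busy CPUs and peak resident memory (Unix only)."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall_s = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # User + system CPU time consumed by the finished child process.
    cpu_s = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    # ru_maxrss is the largest peak RSS among all children waited for so far,
    # so profile one task per fresh Python process for a clean number.
    # Units: kilobytes on Linux, bytes on macOS.
    kib = after.ru_maxrss / 1024 if sys.platform == "darwin" else after.ru_maxrss
    return {
        "wall_s": wall_s,
        "cpu_s": cpu_s,
        "avg_cpus": cpu_s / wall_s if wall_s > 0 else 0.0,
        "peak_rss_mib": kib / 1024,
    }


if __name__ == "__main__":
    # Placeholder workload that allocates roughly 50 MiB.
    stats = profile_command([sys.executable, "-c", "x = bytearray(50 * 1024 * 1024)"])
    print({k: round(v, 2) for k, v in stats.items()})
```

avg_cpus close to 1 suggests extra CPUs will not help that task, while peak_rss_mib gives a lower bound for the memory request per job.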
The solve step is usually not resource-heavy, in either CPU or memory; 30 GB is something I have never encountered. What is the typical file size in session/graphs/*.mat?
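To answer the file-size question quickly, a short Python sketch can summarize the graph files. It assumes only the session/graphs/*.mat layout mentioned above (the logs in this thread load them from antrax/graphs/); sizes are reported in MB (10^6 bytes).

```python
from pathlib import Path
from statistics import median
from typing import Dict


def graph_file_sizes_mb(session_dir: str) -> Dict[str, float]:
    """Map each graphs/*.mat file name in the session to its size in MB."""
    graphs = Path(session_dir) / "graphs"
    if not graphs.is_dir():
        return {}
    return {p.name: p.stat().st_size / 1e6 for p in sorted(graphs.glob("*.mat"))}


def summarize(sizes: Dict[str, float]) -> str:
    """One-line summary: file count, median size and the largest file."""
    if not sizes:
        return "no graph files found"
    largest = max(sizes, key=sizes.get)
    return (
        f"{len(sizes)} files, median {median(sizes.values()):.1f} MB, "
        f"largest {largest} ({sizes[largest]:.1f} MB)"
    )


if __name__ == "__main__":
    import sys

    print(summarize(graph_file_sizes_mb(sys.argv[1] if len(sys.argv) > 1 else ".")))
```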
from antrax.
What is your video duration and frame rate?
60 minutes, 12 videos per subdir, @ 25fps. Cataglyphis are fast...
I was under the impression that your tracking is expected to be very sparse.
Not always.
Do you have many detections of non-ant blobs?
Some, but not many. The vast majority of detections are ants.
What is the typical file size in session/graphs/*.mat?
Highly variable, from 1MB to 70+ MB.
I will write you an email later; it will make much more sense once I explain the experiment to you.
from antrax.
The above was partially solved by re-running the track step for movies 25, 28, 35, and 36 on the HPC. Step 2 showed the MATLAB:badsubscript error in all logs (5 graphs in total), but step 3 finished successfully and the missing mat/csv files were generated.
$ cat matlab_solve_g_3.log
09:26:30 -I- Reading video information from file
09:26:36 -I- solving graph from movies 25-36
09:26:36 -I- Loading trgraph from antrax/graphs/graph_25_25.mat
09:26:48 -I- Loading trgraph from antrax/graphs/graph_26_26.mat
09:26:52 -I- Loading trgraph from antrax/graphs/graph_27_27.mat
09:26:54 -I- Loading trgraph from antrax/graphs/graph_28_28.mat
09:26:57 -I- Loading trgraph from antrax/graphs/graph_29_29.mat
09:26:57 -I- Loading trgraph from antrax/graphs/graph_30_30.mat
09:26:58 -I- Loading trgraph from antrax/graphs/graph_31_31.mat
09:26:59 -I- Loading trgraph from antrax/graphs/graph_32_32.mat
09:26:59 -I- Loading trgraph from antrax/graphs/graph_33_33.mat
09:27:00 -I- Loading trgraph from antrax/graphs/graph_34_34.mat
09:27:01 -I- Loading trgraph from antrax/graphs/graph_35_35.mat
09:27:09 -I- Loading trgraph from antrax/graphs/graph_36_36.mat
09:27:19 -I- Finished loading trgraph with 17914 tracklets
09:27:21 -I- Loading ids
09:27:25 -I- Finding single ant nodes
09:27:26 -I- Some preperations
09:27:28 -I- Filtering out tracklets identified as non-ant
09:27:28 -I- ...8544 tracklets classified as no-ant were filtered
09:27:28 -I- ...6359 short, unconnected and unidentified tracklets were filtered
09:27:29 -I- Propagating ids from src tracklets
09:27:31 -I- ...finished 1000/7421
09:27:31 -I- ...finished 2000/7421
09:27:31 -I- ...finished 3000/7421
09:27:31 -I- ...finished 4000/7421
09:27:31 -I- ...finished 5000/7421
09:27:31 -I- ...finished 6000/7421
09:27:31 -I- ...finished 7000/7421
09:27:31 -I- Propagation loops
Index in position 1 exceeds array bounds (must not exceed 14008).
Error in trgraph/solve>propagate_all (line 522)
Error in trgraph/solve (line 150)
Error in solve_across_movies (line 72)
Error in antrax_mcr_interface (line 53)
MATLAB:badsubscript
Hi @janamach, I encountered issues similar to yours, but on a much larger scale. I have 6 colonies per video, and in 211 out of 334 videos at least one colony failed at solve --step 1. I am wondering whether re-running track is the only solution you have found so far?
from antrax.
Hi @lizimai. What happens when you run solve for one colony that failed during the solve step, by adding the --clist option? What do the logs say for track and solve for the affected videos?
In my experience, many problems in the later steps are caused by issues during the track step, and re-running track can sometimes help. Another issue I found on my side was corrupt video files: in such cases, the track step would almost finish and then exit with an error, and I would re-encode the videos (or trim them, if possible). Unfortunately, I am not aware of any alternative solutions that do not involve re-running track.
P.S. Are you using the latest anTraX version? As far as I remember, the problems I described in this issue were solved in 417223f.
from antrax.
Zimai, Jana is right: the first thing you should do is make sure the tracking step finished OK for the failed cases by looking at the corresponding logs. Can you post the errors you see in a new issue thread? This one is closed and very convoluted, with what turned out to be many small problems.
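With hundreds of videos, checking logs by hand is slow. The following minimal Python sketch scans a directory of logs and reports, per file, the tracklet count and the MATLAB error identifier. The patterns and the matlab_*.log file names are taken from the excerpts in this thread (for example matlab_solve_m_16.log and matlab_solve_g_3.log); the directory holding the logs is passed in as an argument rather than assumed.

```python
import re
import sys
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Optional

# Patterns based on the log excerpts quoted in this thread.
TRACKLETS_RE = re.compile(r"Finished loading trgraph with (\d+) tracklets")
MATLAB_ID_RE = re.compile(r"^(MATLAB:\w+)\s*$", re.MULTILINE)


@dataclass
class LogSummary:
    tracklets: Optional[int]  # from "Finished loading trgraph with N tracklets"
    error_id: Optional[str]  # e.g. "MATLAB:badsubscript", or None if no error


def scan_log(text: str) -> LogSummary:
    """Extract the tracklet count and the MATLAB error identifier, if any."""
    count = TRACKLETS_RE.search(text)
    error = MATLAB_ID_RE.search(text)
    return LogSummary(
        tracklets=int(count.group(1)) if count else None,
        error_id=error.group(1) if error else None,
    )


def scan_log_dir(log_dir: str, pattern: str = "matlab_*.log") -> Dict[str, LogSummary]:
    """Summarize every log file in log_dir whose name matches pattern."""
    return {
        p.name: scan_log(p.read_text(errors="replace"))
        for p in sorted(Path(log_dir).glob(pattern))
    }


if __name__ == "__main__":
    # Usage: python scan_logs.py <directory containing the matlab_*.log files>
    for name, summary in scan_log_dir(sys.argv[1] if len(sys.argv) > 1 else ".").items():
        if summary.error_id:
            print(f"{name}: {summary.error_id} (tracklets: {summary.tracklets})")
```

Grouping failures by error identifier (MATLAB:UndefinedFunction versus MATLAB:badsubscript, for instance) makes it easier to tell apart the separate problems that got mixed together in this thread.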
from antrax.
Hi both, thanks for the suggestion and sorry for bringing up this thread again. I will redo the track and open a new thread.
from antrax.
Related Issues (20)
- `Dot indexing is not supported for variables of this type` error when running `validate`
- MATLAB:badsubscript error crashes solve step
- MCR Interface scaling on high resolution displays
- Many small non-ant objects still tracked and classified
- Installation issue on m1 Mac
- MATLAB:nonLogicalConditional during export
- "ValueError: invalid literal for int() with base 10: 'images' " during classification step
- [enhancement] Printing details of validation session
- Opening anTrax issue
- Problems with compiling mex file
- Extract Trainset Responding Slow
- Problems with using command train
- solve step 1 produce error contains `MATLAB:UndefinedFunction`
- Solve step 2 Non-binary MAT file
- The multi-colony masks do not get generated when using MATLAB full installation
- Potential graph corruption at solve 2 resulting three types of error messages in matlab_export_*.log
- Classifying issues
- problems with setting up the ffmpeg
- Installation issue
- Problem with anTraX-MATLAB link