
Comments (13)

EomSooHwan commented on August 20, 2024

Actually, I think I figured out what the problem was. I had to run perl run.pl JOB=1:9 $log_dir/${name}.JOB.log cat "$split_dir/x0JOB" \| eval $cmd because otherwise the code was reading the whole command as some kind of directory.

Thank you for helping me so much with this problem!

from beer.

lucasondel commented on August 20, 2024

Hey,

The unpickling error comes from the fact that the feature extraction failed, so you have an empty features archive and nothing to load.

The recipe assumes an SGE-like cluster (i.e. the qsub command) to parallelize the feature extraction, but apparently your environment doesn't have one. To be able to use beer you will definitely need a cluster (we use it for the feature extraction and the training) and also a GPU (for later stages).

Could you tell us more about your computing environment? Do you have access to a distributed cluster? If so, is it using SGE or something else like SLURM?


EomSooHwan commented on August 20, 2024

I am sorry, I am not familiar with distributed clusters.
Can you help me find out whether my server has a distributed cluster?
I am working on a remote server shared with my lab members.
Also, our server has 8 GPUs with around 12 GB of memory each.


lucasondel commented on August 20, 2024

The best is probably to ask your system administrator and/or your colleagues about your lab's computing facilities.

It seems that you don't have an SGE-like environment (no qsub command). If you have a SLURM environment, then you would have the sbatch command. If you just have access to a single machine (like a server), then you can use either the Kaldi script https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/utils/parallel/run.pl or, alternatively, the GNU parallel software (run.pl is probably easier).

In all cases, since you don't have qsub, you will need to create a new directory, say my_parallel_env, in https://github.com/beer-asr/beer/tree/master/recipes/aud/utils/parallel/ that contains two scripts:

  • parallel.sh, which launches n parallel jobs (used for feature extraction and training)
  • single.sh, which launches just one task.

Have a look at the example here: https://github.com/beer-asr/beer/tree/master/recipes/aud/utils/parallel/sge.
Once this is done, you can specify my_parallel_env as the parallel environment when calling submit_parallel.sh.
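For a machine without a scheduler, a minimal single.sh built on run.pl might look like the sketch below. The argument interface is an assumption modeled on parallel.sh, not taken from the beer sources:

```shell
# Hypothetical single.sh for a run.pl-based environment: runs one task
# and logs it the same way as the parallel jobs. The argument order is
# an assumption mirroring parallel.sh, minus the job-splitting arguments.
mkdir -p my_parallel_env
cat > my_parallel_env/single.sh <<'EOF'
#!/bin/bash
if [ $# -ne 4 ]; then
    echo "usage: $0 <name> <opts> <cmd> <log-dir>"
    exit 1
fi
name=$1
opts=$2      # scheduler options; unused by run.pl, kept for interface parity
cmd=$3
log_dir=$4
# Without a JOB range, run.pl executes the command exactly once.
perl run.pl "$log_dir/${name}.log" bash -c "$cmd"
EOF
chmod +x my_parallel_env/single.sh
```

The opts argument is kept only so both scripts share the same calling convention.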


EomSooHwan commented on August 20, 2024

Thank you for your detailed advice!
It seems my server does not have slurm-client installed, so I will check the Kaldi script. I will try the methods you have suggested and let you know if any problem comes up.


EomSooHwan commented on August 20, 2024

Sorry, I am currently having problems with how to use run.pl inside the code.

I have added my_parallel_env in recipes/hshmm/utils/parallel with run.pl, but I am confused about how to write parallel.sh and single.sh for run.pl. The code currently cannot interpret run.pl as a command. Also, would the command run.pl JOB=1:$njobs $log_dir/${name}.JOB.log "$cmd" $split_dir || exit 1 be okay?

Sorry for the inconvenience.


lucasondel commented on August 20, 2024

Not sure I understand your issue here. You need to download run.pl and write a new parallel.sh that calls this file directly, something like:

/my/path/to/run.pl JOB=1:$nbjobs ... 

As for how to write parallel.sh: the purpose of this script is to execute a task, say feature extraction, on several input files in parallel. The input files are produced by the split command, so they all have names of the form x[0-9]*. For instance, this is how I handle them with the SGE qsub command:

time cat $splitdir/x*(0)${SGE_TASK_ID} | eval $cmd || exit 1

So I think your parallel.sh should look like this (not tested):

# Given as argument to the script.
splitdir=...

# This option is necessary to use the pattern `x*(0)JOB`.
shopt -s extglob

# I use JOBID as the job identifier, but run.pl uses JOB.
cmd=$(echo "$cmd" | sed 's/JOBID/JOB/g')
cmd="cat $splitdir/x*(0)JOB \| eval $cmd"

run.pl JOB=1:$njobs $log_dir/${name}.JOB.log "$cmd"
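As a side note, the x*(0)JOB pattern relies on bash's extglob: *(0) matches any run of zeros, so the same pattern covers x1, x01, or x0001. A quick self-contained check of that matching:

```shell
# Show that x*(0)<n> matches the split file for job <n> regardless of
# how many leading zeros the split command added to its name.
shopt -s extglob
demo=$(mktemp -d)
touch "$demo/x0007" "$demo/x0070"
(cd "$demo" && echo x*(0)7)     # prints: x0007
rm -rf "$demo"
```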


EomSooHwan commented on August 20, 2024

I think cmd="cat $splitdir/x*(0)JOB \| eval $cmd" is not working because run.pl expands x*(0)JOB to x*(0)1, x*(0)2, and so on. Is there any way I could indicate x0001, x0002, ... using JOB?

I have tried various methods, including perl run.pl JOB=1:$njobs $log_dir/${name}.JOB.log "cat $split_dir/x$(printf "%04d" $JOB) | eval $cmd", but it failed to access each of x0001, x0002, and so on.

Also, even if it can access the split directory, it produces the error message

cat /mnt/hdd/workspace/features/alffa/sw/train/split/x0001 | eval beer features extract conf/mfcc.yml - /mnt/hdd/workspace/features/alffa/sw/train/mfcc_tmp: No such file or directory

However, I checked that mfcc_tmp is created every time, and I also checked conf/mfcc.yml, so I am not sure where this error comes from.


lucasondel commented on August 20, 2024

I think cmd="cat $splitdir/x*(0)JOB | eval $cmd" is not working because run.pl expands x*(0)JOB to x*(0)1, x*(0)2, and so on. Is there any way I could indicate x0001, x0002, ... using JOB?

Yes, I suspected this wouldn't work so easily. The split command is very annoying with its output file names. I think the easiest solution is for your script to rename all the files:

  • x00...1 -> x1
  • x00...2 -> x2
  • ...
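That rename can be sketched in a few lines (a hypothetical helper, demonstrated here on a throwaway directory; point splitdir at the real split directory instead):

```shell
# Strip leading zeros from the split output names: x0001 -> x1, etc.
# Demonstrated on a throwaway directory for safety; in the recipe,
# splitdir would be the real split directory.
splitdir=$(mktemp -d)
touch "$splitdir/x0001" "$splitdir/x0012"

for f in "$splitdir"/x*; do
    base=$(basename "$f")              # e.g. x0001
    num=$((10#${base#x}))              # drop "x", force base 10: 0001 -> 1
    [ "$base" = "x$num" ] && continue  # already canonical, skip
    mv "$f" "$splitdir/x$num"
done

ls "$splitdir"                         # lists x1 and x12
```

The 10# prefix is needed so bash does not parse a zero-padded suffix like 0012 as an octal number.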

Also, even if it can access the split directory, it produces the error message
cat /mnt/hdd/workspace/features/alffa/sw/train/split/x0001 | eval beer features extract conf/mfcc.yml - /mnt/hdd/workspace/features/alffa/sw/train/mfcc_tmp: No such file or directory
However, I checked that mfcc_tmp is created every time, and I also checked conf/mfcc.yml, so I am not sure where this error comes from.

This one I'm not so sure about. Perhaps try providing absolute paths; it is possible that run.pl executes the process from a different working directory. Otherwise, you can check the feature extraction script https://github.com/beer-asr/beer/blob/master/beer/cli/subcommands/features/extract.py to see which file cannot be found.


EomSooHwan commented on August 20, 2024

Thank you for your advice. I have changed parallel.sh to invoke run.pl once per number of digits (1 to 9, 10 to 99, ...), so I think this problem is partially fixed.

However, the issue is that cat /mnt/hdd/workspace/features/alffa/sw/train/split/x01 | eval beer features extract /mnt/hdd/workspace/beer/recipes/hshmm/conf/mfcc.yml - /mnt/hdd/workspace/features/alffa/sw/train/mfcc_tmp still gives me a No such file or directory error message. Moreover, when I run this command in the terminal, beer features extract works totally fine (I did change extract.py a bit so that it creates the directory for the save path). I am not sure which part is causing this error.


lucasondel commented on August 20, 2024

Could you show me the result of:

head /mnt/hdd/workspace/features/alffa/sw/train/split/x01

And also the content of parallel.sh (and related scripts)? My guess is that the paths in /mnt/hdd/workspace/features/alffa/sw/train/split/x01 are relative to your working directory.


EomSooHwan commented on August 20, 2024

The paths in /mnt/hdd/workspace/features/alffa/sw/train/split/x01 are absolute paths. Also, my parallel.sh is:

#!/bin/bash

if [ $# -ne 6 ]; then
    echo "$0 <name> <opts> <njobs> <split-dir> <cmd> <log-dir>"
    exit 1
fi

name=$1
opts=$2
njobs=$3
split_dir=$4
cmd=$5
log_dir=$6

shopt -s extglob
cmd=$(echo "$cmd" | sed 's/JOBID/JOB/g')

perl run.pl JOB=1:9 $log_dir/${name}.JOB.log "cat $split_dir/x0JOB | eval $cmd"
perl run.pl JOB=10:32 $log_dir/${name}.JOB.log "cat $split_dir/xJOB | eval $cmd"
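For reference, those two hard-coded invocations could be generalized to any job count and suffix width. The sketch below (hypothetical, not from the beer recipes) is a dry run that prints the run.pl commands instead of launching them:

```shell
# Generalization of the hard-coded 1:9 / 10:32 calls: one run.pl
# invocation per digit count, with the matching number of leading
# zeros in the split-file pattern. Prints the commands as a dry run;
# swap echo for an actual call to use it.
gen_runpl_cmds() {
    local njobs=$1 pad=$2 d lo hi zeros i
    for ((d = 1; d <= pad; d++)); do
        lo=$((10 ** (d - 1)))              # smallest d-digit index
        hi=$((10 ** d - 1))                # largest d-digit index
        (( hi > njobs )) && hi=$njobs
        (( lo > njobs )) && break
        zeros=""
        for ((i = d; i < pad; i++)); do zeros+="0"; done
        echo "perl run.pl JOB=$lo:$hi \$log_dir/\${name}.JOB.log \"cat \$split_dir/x${zeros}JOB | eval \$cmd\""
    done
}

gen_runpl_cmds 32 2
```

With njobs=32 and a suffix width of 2 this reproduces exactly the two invocations above.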


lucasondel commented on August 20, 2024

This looks OK to me... Just for debugging purposes, could you please add

echo "$cmd"

just before the perl ... statements and let me know the output?

