yunitate segmentation outside audio duration #153
Description
yunitate.sh seems to produce RTTM files with segments that extend beyond, or even lie completely outside, the duration of the source WAV file.
The audio I'm using is this:
vagrant ssh -c "sox --i '/vagrant/data/0513.wav'"
Input File : '/vagrant/data/0513.wav'
Channels : 1
Sample Rate : 44100
Precision : 16-bit
Duration : 00:10:04.12 = 26641575 samples = 45308.8 CDDA sectors
File Size : 53.3M
Bit Rate : 706k
Sample Encoding: 16-bit Signed Integer PCM
That amounts to a duration of 604.12 seconds.
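For reference, the 604.12 figure is just samples divided by sample rate. Here is a minimal sketch of that check using Python's standard wave module; the helper name wav_duration_seconds is made up for illustration and is not part of DiViMe.

```python
import wave

def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()

# Same arithmetic on the figures sox reports: 26641575 samples at 44100 Hz.
print(round(26641575 / 44100, 2))  # 604.12
```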
After running vagrant ssh -c "yunitate.sh data/", I get the following RTTM (only the last few lines shown):
SPEAKER 0513.rttm 1 601.4 0.1 CHI
SPEAKER 0513.rttm 1 601.5 1.2 FEM
SPEAKER 0513.rttm 1 602.7 2.1 CHI
where the last segment starts inside the source wave file's duration, but goes beyond the end (602.7 + 2.1 = 604.8, which is more than 604.12).
When running vagrant ssh -c "yunitate.sh data/ english" things become even stranger:
SPEAKER 0513.rttm 1 601.6 0.6 FEM
SPEAKER 0513.rttm 1 603.3 0.1 CHI
SPEAKER 0513.rttm 1 603.6 0.1 CHI
SPEAKER 0513.rttm 1 603.9 0.3 CHI
SPEAKER 0513.rttm 1 604.2 0.1 FEM
Here the last segment starts at 604.2 s, after the end of the source audio (604.12 s). The segment before it (603.9 + 0.3 = 604.2) also ends past the end.
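To find such segments automatically, something like the following could work. It is only a sketch: segments_past_end is a hypothetical helper, and it assumes the onset and duration sit in the 4th and 5th whitespace-separated fields, as in the lines pasted above.

```python
def segments_past_end(rttm_lines, audio_duration, tolerance=0.0):
    """Return (onset, duration, overshoot, line) for segments that end after the audio."""
    bad = []
    for line in rttm_lines:
        fields = line.split()
        # Skip blank or non-SPEAKER lines instead of failing on them.
        if len(fields) < 5 or fields[0] != "SPEAKER":
            continue
        onset, dur = float(fields[3]), float(fields[4])
        overshoot = onset + dur - audio_duration
        if overshoot > tolerance:
            bad.append((onset, dur, round(overshoot, 3), line.strip()))
    return bad

# Two of the segments from the "english" run, checked against 604.12 s.
lines = [
    "SPEAKER 0513.rttm 1 603.9 0.3 CHI",
    "SPEAKER 0513.rttm 1 604.2 0.1 FEM",
]
for onset, dur, overshoot, _ in segments_past_end(lines, 604.12):
    kind = "starts after the end" if onset >= 604.12 else "runs past the end"
    print(onset, dur, overshoot, kind)
```

A small tolerance (for example one 0.1 s frame) may be appropriate if the overshoot turns out to be a rounding effect of the frame step, but that is a guess on my part.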
This becomes problematic when using the latter file for vagrant ssh -c "~/launcher/WCE_from_SAD_outputs.sh /vagrant/data/ yunitator_english". Here, the tool finishes without an error message, but doesn't produce the word count output. The wav_tmp folder is still present and contains this empty (corrupt?) wav file, for which sox --i reports no duration:
Input File : '/vagrant/data/wav_tmp/yunitator_english_0513_00604200-00000100.wav'
Channels : 1
Sample Rate : 44100
Precision : 16-bit
Sample Encoding: 16-bit Signed Integer PCM
And finally, if I use this file in the analyze.sh pipeline, I get the following message:
(MSG) [2] in SMILExtract : openSMILE starting!
(MSG) [2] in SMILExtract : config file is: MED_2s_100ms_htk.conf
(MSG) [2] in cComponentManager : successfully registered 96 component types.
(MSG) [2] in cComponentManager : successfully finished createInstances
(19 component instances were finalised, 1 data memories were finalised)
(MSG) [2] in cComponentManager : starting single thread processing loop
(MSG) [2] in cComponentManager : Processing finished! System ran for 60436 ticks.
sox WARN trim: End position is after expected end of audio.
sox WARN trim: Last 1 position(s) not reached.
/home/vagrant/utils/analyze.sh: line 40: /vagrant/data//detailed_outputs/WCE_yunitator_english_0513.rttm: No such file or directory
paste: /vagrant/data//wce.temp: No such file or directory
vcm_0513.rttm and yunitator_english_0513.rttm are present in detailed_outputs, but the corresponding wce_0513.rttm is missing.
One hackish solution might be to append a second or two of silence to the end of the source wave file, I suppose. I haven't tried that yet.
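For anyone who wants to try the padding workaround, here is a minimal sketch using only Python's standard wave module. The function name append_silence and the 2-second default are my own choices, and I have not verified that this actually makes the WCE step succeed.

```python
import wave

def append_silence(src_path, dst_path, seconds=2.0):
    """Copy a PCM WAV file to dst_path with `seconds` of silence appended."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    n_pad = int(round(seconds * params.framerate))
    # One frame is sampwidth bytes per channel. Zero bytes are silence for
    # signed PCM (16-bit and wider); 8-bit WAV is unsigned and would need 0x80.
    silence = b"\x00" * (n_pad * params.sampwidth * params.nchannels)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        # The frame count in the header is corrected when the file is closed.
        dst.writeframes(frames + silence)
```

It reads the whole file into memory, which is fine for a 53 MB recording. With sox itself, the pad effect (sox in.wav out.wav pad 0 2) should do the same thing.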