Commit 682d47c
Adding an Iban recipe
1 parent 9e8ff73 commit 682d47c

23 files changed: +978 −3 lines changed

egs/iban/README

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
###
# Iban data collected by Sarah Samson Juan and Laurent Besacier
# Prepared by Sarah Samson Juan and Laurent Besacier
# Created in GETALP, Grenoble, France
###


## INTRODUCTION ##
This package contains Iban text and speech corpora used for Automatic Speech Recognition (ASR) experiments. The data is available in the subdirectories of /data:
a. train - training transcripts for training an ASR system with Kaldi (http://kaldi.sourceforge.net/)
b. test - test transcripts for evaluating the ASR system (also in Kaldi format)
c. wav - speech corpus

The text corpus and language model are provided in the /LM directory, and the pronunciation dictionary in the /lang directory.

### PUBLICATIONS ON IBAN DATA AND ASR ###
Details on the corpora and our experiments on Iban ASR can be found in the following publications. We would appreciate it if you cite them when publishing work based on this data.
@inproceedings{Juan14,
  Author = {Sarah Samson Juan and Laurent Besacier and Solange Rossato},
  Booktitle = {Proceedings of the Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU)},
  Month = {May},
  Title = {Semi-supervised G2P bootstrapping and its application to ASR for a very under-resourced language: Iban},
  Year = {2014}}

@inproceedings{Juan2015,
  Title = {Using resources from a closely-related language to develop ASR for a very under-resourced language: A case study for Iban},
  Author = {Sarah Samson Juan and Laurent Besacier and Benjamin Lecouteux and Mohamed Dyab},
  Booktitle = {Proceedings of INTERSPEECH},
  Year = {2015},
  Address = {Dresden, Germany},
  Month = {September}}

### IBAN SPEECH CORPUS ###
News data provided by a local radio station in Sarawak, Malaysia.

Directory: data/train
Files: text (training transcript), wav.scp (file id and path to audio file), utt2spk (file id and speaker id), spk2utt (speaker id and file ids), wav (.wav files).
For more information about the format, please refer to the Kaldi website: http://kaldi.sourceforge.net/data_prep.html
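
As a sketch of what these files look like (the utterance and speaker ids below are hypothetical, following the corpus naming scheme), and of how spk2utt relates to utt2spk:

```shell
# Hypothetical Kaldi data-dir entries; real ids come from the corpus.
cat > utt2spk <<'EOF'
ibf_001_001 ibf_001
ibf_001_002 ibf_001
ibm_002_001 ibm_002
EOF

# spk2utt is just utt2spk inverted: one speaker per line, followed by all
# of that speaker's utterance ids. Kaldi ships utils/utt2spk_to_spk2utt.pl
# for this; plain awk does the same thing.
awk '{utts[$2] = utts[$2] " " $1} END {for (s in utts) print s utts[s]}' \
  utt2spk | sort > spk2utt

cat spk2utt
```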
Description: training data in Kaldi format, about 7 hours. Note: the paths of the wav files in wav.scp MUST BE MODIFIED to point to their actual location.

Directory: data/test
Files: text (test transcript), wav.scp (file id and path to audio file), utt2spk (file id and speaker id), spk2utt (speaker id and file ids), wav (.wav files).
Description: test data in Kaldi format, about 1 hour. Note: the paths of the wav files in wav.scp MUST BE MODIFIED to point to their actual location.
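
A minimal sketch of fixing those paths with sed (both the stale prefix /home/sarah/iban and the new one are hypothetical; substitute whatever prefixes your copy of wav.scp actually contains):

```shell
# Hypothetical wav.scp entries with stale paths.
cat > wav.scp <<'EOF'
ibf_001_001 /home/sarah/iban/data/wav/ibf_001_001.wav
ibm_002_001 /home/sarah/iban/data/wav/ibm_002_001.wav
EOF

# Rewrite the path prefix to point at the corpus location on this machine
# (GNU sed; on BSD/macOS use: sed -i '' ...).
sed -i 's|/home/sarah/iban|/my/path/to/iban|' wav.scp

cat wav.scp
```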

The audio files are named:
ib[m|f]_SPK_UTT, where m refers to a male and f to a female speaker, SPK denotes the speaker id and UTT the utterance id.
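
As a sketch, this naming convention can be unpacked with plain shell parameter expansion (the id ibf_003_017 below is made up for illustration):

```shell
utt=ibf_003_017

prefix=${utt%%_*}   # "ibf": corpus tag plus gender letter
gender=${prefix#ib} # "f": m = male, f = female
rest=${utt#*_}      # "003_017"
spk=${rest%%_*}     # "003": speaker id
uttid=${rest#*_}    # "017": utterance id

echo "$gender $spk $uttid"
```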

#### IBAN TEXT CORPUS ####
Directory: /LM/
Files: iban-bp-2012.txt, iban-lm-o3.arpa

# /iban-bp-2012.txt
Contains 2M words. Full text data crawled from an online newspaper and cleaned as much as we could.

# /iban-lm-o3.arpa
The language model built with SRILM (http://www.speech.sri.com/projects/srilm/) using iban-bp-2012.txt.


#### LEXICON/PRONUNCIATION DICTIONARY ####
Directory: /lang
Files: lexicon.txt (lexicon), nonsilence_phones.txt (speech phones), optional_silence.txt (silence phone)
Description: the lexicon contains words and their respective pronunciations, plus non-speech sounds and noise, in Kaldi format. Details on the development of the dictionary can be found in our papers. (For this package, we provide the Iban-Hybrid version.)


# TO DOWNLOAD THE REPOSITORY #

svn co https://github.com/sarahjuan/iban

### SCRIPTS ###
In /kaldi-scripts you can find all the scripts used to train and test models from the existing data and lang directories. Note: paths need to be changed to make them work in your own directory.

You can launch run.sh to prepare the data and language model, compute MFCCs and train the acoustic models.


## WER RESULTS OBTAINED USING OUR CORPORA AND SETTINGS ##
The results below were obtained after updating the test transcript; the ones reported in our papers were obtained before this update.

See the latest results in the s5/RESULTS file (they will not match the results from the papers).

## ACKNOWLEDGEMENT ##
We would like to thank the Ministry of Higher Education Malaysia for providing financial support to conduct this study. We also thank The Borneo Post news agency for providing online materials for building the text corpus, and Radio Televisyen Malaysia (RTM), Sarawak, Malaysia, for providing the news data.

egs/iban/s5/RESULTS

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
%WER 15.32 [ 1686 / 11006, 220 ins, 338 del, 1128 sub ] exp/sgmm2_5b2/decode_dev.big/wer_18_0.0
%WER 15.36 [ 1691 / 11006, 214 ins, 322 del, 1155 sub ] exp/nnet3/nnet_tdnn_h_sp_4_850_170/decode_dev.big/wer_18_0.0
%WER 15.50 [ 1706 / 11006, 212 ins, 327 del, 1167 sub ] exp/nnet3/nnet_tdnn_h_sp_4_850_170/decode_dev.rescored/wer_18_0.0
%WER 15.84 [ 1743 / 11006, 242 ins, 332 del, 1169 sub ] exp/sgmm2_5b2/decode_dev.rescored/wer_15_0.0
%WER 17.45 [ 1921 / 11006, 252 ins, 326 del, 1343 sub ] exp/nnet3/nnet_tdnn_h_sp_4_850_170/decode_dev/wer_15_0.0
%WER 17.55 [ 1932 / 11006, 266 ins, 323 del, 1343 sub ] exp/sgmm2_5b2/decode_dev/wer_13_0.0
%WER 19.08 [ 2100 / 11006, 245 ins, 503 del, 1352 sub ] exp/tri3b/decode_dev.rescored/wer_20_0.0
%WER 20.92 [ 2302 / 11006, 263 ins, 518 del, 1521 sub ] exp/tri3b/decode_dev/wer_19_0.0
%WER 24.19 [ 2662 / 11006, 243 ins, 900 del, 1519 sub ] exp/tri2b/decode_dev.rescored/wer_14_0.0
%WER 25.26 [ 2780 / 11006, 294 ins, 736 del, 1750 sub ] exp/tri3b/decode_dev.si/wer_16_0.0
%WER 26.44 [ 2910 / 11006, 292 ins, 832 del, 1786 sub ] exp/tri2b/decode_dev/wer_13_0.0
%WER 30.99 [ 3411 / 11006, 245 ins, 1391 del, 1775 sub ] exp/tri1/decode_dev.rescored/wer_12_0.0
%WER 33.31 [ 3666 / 11006, 260 ins, 1428 del, 1978 sub ] exp/tri1/decode_dev/wer_12_0.0
%WER 33.81 [ 3721 / 11006, 241 ins, 1585 del, 1895 sub ] exp/tri2a/decode_dev.rescored/wer_11_0.0
%WER 35.69 [ 3928 / 11006, 243 ins, 1750 del, 1935 sub ] exp/tri2a/decode_dev/wer_12_0.0
%WER 39.41 [ 4338 / 11006, 190 ins, 1237 del, 2911 sub ] exp/mono/decode_dev/wer_11_0.0
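
Each line follows the standard Kaldi scoring format: %WER, the error percentage, then [ errors / words, ins, del, sub ] and the decode directory. A quick way to rank systems in such a file (the sample reuses three lines from above):

```shell
# A few result lines copied from the RESULTS file.
cat > RESULTS.sample <<'EOF'
%WER 15.32 [ 1686 / 11006, 220 ins, 338 del, 1128 sub ] exp/sgmm2_5b2/decode_dev.big/wer_18_0.0
%WER 39.41 [ 4338 / 11006, 190 ins, 1237 del, 2911 sub ] exp/mono/decode_dev/wer_11_0.0
%WER 20.92 [ 2302 / 11006, 263 ins, 518 del, 1521 sub ] exp/tri3b/decode_dev/wer_19_0.0
EOF

# Sort numerically on the WER field (column 2); the first line is the
# best-performing system.
sort -k2,2n RESULTS.sample | head -n1
```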

egs/iban/s5/cmd.sh

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
export train_cmd="run.pl --max-jobs-run 32"
export decode_cmd="run.pl --max-jobs-run 32"

egs/iban/s5/conf/decode.config

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
# Use wider-than-normal decoding beams (settings carried over from the RM recipe).
first_beam=16.0
beam=20.0
lattice_beam=10.0

egs/iban/s5/conf/decode_dnn.config

Whitespace-only changes.

egs/iban/s5/conf/mfcc.conf

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
--use-energy=false # the only non-default option.

egs/iban/s5/conf/mfcc_hires.conf

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
# config for high-resolution MFCC features, intended for neural network training.
# Note: we keep all cepstra, so it has the same info as filterbank features,
# but MFCC is more easily compressible (because less correlated), which is why
# we prefer this method.
--use-energy=false # use average of log energy, not energy.
--num-mel-bins=40 # similar to Google's setup.
--num-ceps=40 # there is no dimensionality reduction.
--low-freq=20 # low cutoff frequency for mel bins... this is high-bandwidth data, so
              # there might be some information at the low end.
--high-freq=-400 # high cutoff frequency, relative to the Nyquist rate of 8000 (= 7600)

egs/iban/s5/conf/online_cmvn.conf

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
# configuration file for apply-cmvn-online, used in the script ../local/run_online_decoding.sh

egs/iban/s5/local/arpa2G.sh

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
#!/bin/bash
# Copyright 2013-2014 Johns Hopkins University (authors: Yenda Trmal, Daniel Povey)

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#  http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABLITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.

# Simple utility script to convert the gzipped ARPA lm into a G.fst file.

oov_prob_file=
unk_fraction=
cleanup=true
# end configuration section.

echo $0 $@

[ -f ./path.sh ] && . ./path.sh
[ -f ./cmd.sh ] && . ./cmd.sh
. parse_options.sh || exit 1;

if [ $# -ne 3 ]; then
  echo "Usage: $0 [options] <arpa-lm-file> <lang-dir> <dest-dir>"
  echo "Options: --oov-prob-file <oov-prob-file>  # e.g. data/local/oov2prob"
  echo "  # with this option it will replace <unk> with OOVs in G.fst."
  exit 1;
fi

set -e          # Exit on non-zero return code from any command
set -o pipefail # Exit if any of the commands in the pipeline returns
                # a non-zero return code

lmfile=$1
langdir=$2
destdir=$3

mkdir $destdir 2>/dev/null || true

if [ ! -z "$oov_prob_file" ]; then
  if [ ! -s "$oov_prob_file" ]; then
    echo "$0: oov-prob file $oov_prob_file does not exist"
    exit 1;
  fi
  if [ -z "$unk_fraction" ]; then
    echo "--oov-prob option requires --unk-fraction option";
    exit 1;
  fi

  min_prob=$(gunzip -c $lmfile | perl -e ' $minlogprob = 0.0;
    while(<STDIN>) { if (m/\\(\d)-grams:/) { $order = $1; }
      if ($order == 1) { @A = split;
        if ($A[0] < $minlogprob && $A[0] != -99) { $minlogprob = $A[0]; }}} print $minlogprob')
  echo "Minimum prob in LM file is $min_prob"

  echo "$0: creating LM file with unk words, using $oov_prob_file, in $destdir/lm_tmp.gz"
  gunzip -c $lmfile | \
    perl -e ' ($oov_prob_file,$min_prob,$unk_fraction) = @ARGV; $ceilinged=0;
      $min_prob < 0.0 || die "Bad min_prob"; # this is a log-prob
      $unk_fraction > 0.0 || die "Bad unk_fraction"; # this is a prob
      open(F, "<$oov_prob_file") || die "opening oov file";
      while (<F>) { push @OOVS, $_; }
      $num_oovs = @OOVS; # count of OOV entries just read
      while(<STDIN>) {
        if (m/^ngram 1=(\d+)/) { $n = $1 + $num_oovs; print "ngram 1=$n\n"; }
        else { print; } # print all lines unchanged except the one that says ngram 1=X.
        if (m/^\\1-grams:$/) {
          foreach $l (@OOVS) {
            @A = split(" ", $l);
            @A == 2 || die "bad line in oov2prob: $_;";
            ($word, $prob) = @A;
            $log10prob = (log($prob * $unk_fraction) / log(10.0));
            if ($log10prob > $min_prob) { $log10prob = $min_prob; $ceilinged++;}
            print "$log10prob $word\n";
          }
        }} print STDERR "Ceilinged $ceilinged unk-probs\n";' \
      $oov_prob_file $min_prob $unk_fraction | gzip -c > $destdir/lm_tmp.gz
  lmfile=$destdir/lm_tmp.gz
fi

if [[ $lmfile == *.bz2 ]] ; then
  decompress="bunzip2 -c $lmfile"
elif [[ $lmfile == *.gz ]] ; then
  decompress="gunzip -c $lmfile"
else
  decompress="cat $lmfile"
fi

$decompress | \
  grep -v '<s> <s>' | grep -v '</s> <s>' | grep -v '</s> </s>' | \
  arpa2fst - | \
  fstprint | \
  utils/eps2disambig.pl | \
  utils/s2eps.pl | \
  fstcompile --isymbols=$langdir/words.txt \
    --osymbols=$langdir/words.txt --keep_isymbols=false --keep_osymbols=false | \
  fstrmepsilon | fstarcsort --sort_type=olabel > $destdir/G.fst || exit 1
fstisstochastic $destdir/G.fst || true;

if $cleanup; then
  rm $destdir/lm_tmp.gz 2>/dev/null || true;
fi

exit 0
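
The perl one-liner in the script scans the \1-grams section for the most negative unigram log10 probability, ignoring the -99 back-off placeholder. The same scan on a toy ARPA fragment (the fragment below is made up; real LMs are gzipped and much larger):

```shell
# Toy ARPA fragment for illustration.
cat > toy.arpa <<'EOF'
\data\
ngram 1=3

\1-grams:
-1.5 foo
-99 <s>
-2.7 bar

\end\
EOF

# Track the current n-gram order; within the 1-grams section keep the
# most negative log10 prob, skipping the -99 placeholder.
awk '/\\[0-9]-grams:/ {order = substr($0, 2, 1)}
     order == 1 && NF >= 2 && $1 != -99 && $1 < min {min = $1}
     END {print min}' toy.arpa
```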
Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
#!/bin/bash

## Script was adapted from WSJ and RM (some settings)

. ./cmd.sh
mfccdir=mfcc

stage=1

. ./path.sh
. ./utils/parse_options.sh

if [ $stage -le 1 ]; then
  for datadir in train; do
    utils/perturb_data_dir_speed.sh 0.9 data/${datadir} data/temp1
    utils/perturb_data_dir_speed.sh 1.1 data/${datadir} data/temp2
    utils/combine_data.sh data/${datadir}_tmp data/temp1 data/temp2
    utils/validate_data_dir.sh --no-feats data/${datadir}_tmp
    rm -r data/temp1 data/temp2

    mfccdir=mfcc_perturbed
    steps/make_mfcc.sh --cmd "$train_cmd" --nj 17 \
      data/${datadir}_tmp exp/make_mfcc/${datadir}_tmp $mfccdir || exit 1;
    steps/compute_cmvn_stats.sh data/${datadir}_tmp exp/make_mfcc/${datadir}_tmp $mfccdir || exit 1;
    utils/fix_data_dir.sh data/${datadir}_tmp

    utils/copy_data_dir.sh --spk-prefix sp1.0- --utt-prefix sp1.0- data/${datadir} data/temp0
    utils/combine_data.sh data/${datadir}_sp data/${datadir}_tmp data/temp0
    utils/fix_data_dir.sh data/${datadir}_sp
    rm -r data/temp0 data/${datadir}_tmp
  done
fi

mkdir -p exp/nnet3

if [ $stage -le 2 ]; then
  steps/align_fmllr.sh --nj 16 --cmd "$train_cmd" \
    data/train_sp data/lang exp/tri3b exp/nnet3/tri3b_ali_sp || exit 1
fi

mfccdir=mfcc_hires
if [ $stage -le 3 ]; then
  utils/copy_data_dir.sh data/train_sp data/train_hires || exit 1
  steps/make_mfcc.sh --nj 16 --mfcc-config conf/mfcc_hires.conf \
    --cmd "$train_cmd" data/train_hires exp/make_hires/train $mfccdir || exit 1;
  steps/compute_cmvn_stats.sh data/train_hires exp/make_hires/train $mfccdir || exit 1;

  for datadir in dev; do
    utils/copy_data_dir.sh data/$datadir data/${datadir}_hires || exit 1
    steps/make_mfcc.sh --nj 6 --mfcc-config conf/mfcc_hires.conf \
      --cmd "$train_cmd" data/${datadir}_hires exp/make_hires/$datadir $mfccdir || exit 1;
    steps/compute_cmvn_stats.sh data/${datadir}_hires exp/make_hires/$datadir $mfccdir || exit 1;
  done
fi

if [ $stage -le 4 ]; then
  # Train a small system just for its LDA+MLLT transform. We use --num-iters 13
  # because after we get the transform (12th iter is the last), any further
  # training is pointless.
  steps/train_lda_mllt.sh --cmd "$train_cmd" --num-iters 13 \
    --realign-iters "" --splice-opts "--left-context=3 --right-context=3" \
    5000 10000 data/train_hires data/lang \
    exp/nnet3/tri3b_ali_sp exp/nnet3/tri5b || exit 1
fi

if [ $stage -le 5 ]; then
  steps/online/nnet2/train_diag_ubm.sh --cmd "$train_cmd" --nj 16 --num-frames 200000 \
    data/train_hires 256 exp/nnet3/tri5b exp/nnet3/diag_ubm || exit 1
fi

if [ $stage -le 6 ]; then
  # even though $nj is just 10, each job uses multiple processes and threads.
  steps/online/nnet2/train_ivector_extractor.sh --cmd "$train_cmd" \
    --nj 10 --num-processes 1 --num-threads 2 --ivector-dim 50 \
    data/train_hires exp/nnet3/diag_ubm exp/nnet3/extractor || exit 1;
fi

if [ $stage -le 7 ]; then
  # having a larger number of speakers is helpful for generalization, and to
  # handle per-utterance decoding well (iVector starts at zero).
  steps/online/nnet2/copy_data_dir.sh --utts-per-spk-max 2 data/train_hires \
    data/train_hires_max2 || exit 1

  steps/online/nnet2/extract_ivectors_online.sh --cmd "$train_cmd" --nj 16 \
    data/train_hires_max2 exp/nnet3/extractor exp/nnet3/ivectors_train || exit 1
fi

if [ $stage -le 8 ]; then
  steps/online/nnet2/extract_ivectors_online.sh --cmd "$train_cmd" --nj 6 \
    data/dev_hires exp/nnet3/extractor exp/nnet3/ivectors_dev || exit 1
fi

exit 0;
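
Stage 1 above triples the training data by adding 0.9x and 1.1x speed-perturbed copies; utils/perturb_data_dir_speed.sh and utils/copy_data_dir.sh keep the copies distinct by prefixing every utterance and speaker id (sp0.9-, sp1.1-, sp1.0-). A sketch of that id bookkeeping, using made-up ids:

```shell
# Hypothetical utt2spk from the original training directory.
cat > utt2spk <<'EOF'
ibf_001_001 ibf_001
ibm_002_001 ibm_002
EOF

# Emulate the prefixing done for each perturbed copy, then combine:
# the merged directory holds three distinct entries per original utterance.
for factor in 0.9 1.0 1.1; do
  sed "s/^/sp${factor}-/; s/ / sp${factor}-/" utt2spk
done | sort > utt2spk_sp

wc -l < utt2spk_sp
```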
