On the Vulnerability of Speaker Verification to Realistic Voice Spoofing

This package contains the experiments done in the following paper published at IEEE BTAS 2015:

@inproceedings{avspoof,
  author = {Serife Kucur Erg\"unay and Elie Khoury and Alexandros Lazaridis and S\'ebastien Marcel },
  title = {On the Vulnerability of Speaker Verification to Realistic Voice Spoofing},
  booktitle = {IEEE Intl. Conf. on Biometrics: Theory, Applications and Systems (BTAS)},
  year = {2015},
  url = {https://publidiap.idiap.ch/downloads//papers/2015/KucurErgunay_IEEEBTAS_2015.pdf}
}

I- Installation

xspear.btas2015 is based on the BuildOut python linking system. You only need to use buildout to bootstrap and have a working environment ready for experiments:

$ python bootstrap
$ ./bin/buildout

This also requires that bob (>= 2.0) is installed.

II- Running experiments

The above two commands will automatically download all desired packages from pypi and generate some scripts in the bin directory. The interface for the AVspoof database (bob.db.avspoof) will be downloaded automatically. You only need to download the AVspoof data (it is free of charge but requires to sign the EULA)

$ https://www.idiap.ch/dataset/avspoof

Then to run the I-Vector scripts:

$ bin/train_ivector.py -vv -d avspoof -p mod-4hz -e mfcc-60 -a ivec-avspoof -s ivec --groups world -g demanding
$ bin/verify.py -vv -d avspoof -p energy-2gauss -e mfcc-60 -a ivec-avspoof -s ivec --groups {dev,eval} -g demanding --skip-projector-training

To run the ISV scripts:

$ bin/train_isv.py -vv -d avspoof -p mod-4hz -e mfcc-60 -a ivec-avspoof -s isv --groups world -g demanding
$ bin/verify.py -vv -d avspoof -p energy-2gauss -e mfcc-60 -a ivec-avspoof -s isv --groups {dev,eval} -g demanding --skip-projector-training

Notice that the pre-processing of the training data is done using 4Hz modulation energy (mod-4hz) based voice activity detection (VAD) while the preprocessing of the DEV and EVAL set is done using Two-Gaussians energy-based VAD.

To evaluate the two systems:

$ bin/evaluate_vulnerability.py  -d /path/to/ivec/nonorm/scores-dev -e /path/to/ivec/nonorm/scores-eval
$ bin/evaluate_vulnerability.py  -d /path/to/isv/nonorm/scores-dev -e /path/to/isv/nonorm/scores-eval

For I-vectors, the expected output error rates are

---------------  male  -----------------
----------------------------------------------
EER = 6.9%       Threshold = 43.973
replay_phone1 : 29.1%
replay_phone2 : 27.7%
replay_laptop : 39.8%
replay_laptop_HQ : 77.4%
speech_synthesis_logical_access : 96.5%
speech_synthesis_physical_access : 60.6%
speech_synthesis_physical_access_HQ : 93.5%
voice_conversion_logical_access : 92.6%
voice_conversion_physical_access : 84.0%
voice_conversion_physical_access_HQ : 88.8%

---------------  female  -----------------
----------------------------------------------
EER = 17.5%      Threshold = 44.632
replay_phone1 : 11.8%
replay_phone2 : 11.1%
replay_laptop : 32.2%
replay_laptop_HQ : 69.4%
speech_synthesis_logical_access : 81.5%
speech_synthesis_physical_access : 69.5%
speech_synthesis_physical_access_HQ : 83.7%
voice_conversion_logical_access : 71.6%
voice_conversion_physical_access : 75.8%
voice_conversion_physical_access_HQ : 73.0%

For ISV, the expected output error rates are

---------------  male  -----------------
----------------------------------------------
EER = 4.9%       Threshold = 0.597
replay_phone1 : 19.2%
replay_phone2 : 45.9%
replay_laptop : 45.3%
replay_laptop_HQ : 74.1%
speech_synthesis_logical_access : 97.0%
speech_synthesis_physical_access : 65.9%
speech_synthesis_physical_access_HQ : 94.1%
voice_conversion_logical_access : 93.4%
voice_conversion_physical_access : 77.4%
voice_conversion_physical_access_HQ : 89.3%

---------------  female  -----------------
----------------------------------------------
EER = 10.6%      Threshold = 0.690
replay_phone1 : 12.2%
replay_phone2 : 23.1%
replay_laptop : 35.7%
replay_laptop_HQ : 68.5%
speech_synthesis_logical_access : 83.5%
speech_synthesis_physical_access : 67.9%
speech_synthesis_physical_access_HQ : 83.7%
voice_conversion_logical_access : 71.2%
voice_conversion_physical_access : 50.7%
voice_conversion_physical_access_HQ : 73.0%

Table Of Contents

This Page