This package contains the experiments reported in the following paper, published at IEEE BTAS 2015:
@inproceedings{avspoof,
author = {Serife Kucur Erg\"unay and Elie Khoury and Alexandros Lazaridis and S\'ebastien Marcel },
title = {On the Vulnerability of Speaker Verification to Realistic Voice Spoofing},
booktitle = {IEEE Intl. Conf. on Biometrics: Theory, Applications and Systems (BTAS)},
year = {2015},
url = {https://publidiap.idiap.ch/downloads//papers/2015/KucurErgunay_IEEEBTAS_2015.pdf}
}
xspear.btas2015 uses Buildout to set up its Python environment. Bootstrapping with Buildout is all that is needed to get a working environment ready for the experiments:
$ python bootstrap
$ ./bin/buildout
Bob (>= 2.0) must also be installed.
The two commands above automatically download all required packages from PyPI and generate scripts in the bin directory. The interface for the AVspoof database (bob.db.avspoof) is downloaded automatically as well. You only need to download the AVspoof data yourself (it is free of charge but requires signing the EULA):
https://www.idiap.ch/dataset/avspoof
Then to run the I-Vector scripts:
$ bin/train_ivector.py -vv -d avspoof -p mod-4hz -e mfcc-60 -a ivec-avspoof -s ivec --groups world -g demanding
$ bin/verify.py -vv -d avspoof -p energy-2gauss -e mfcc-60 -a ivec-avspoof -s ivec --groups {dev,eval} -g demanding --skip-projector-training
To run the ISV scripts:
$ bin/train_isv.py -vv -d avspoof -p mod-4hz -e mfcc-60 -a ivec-avspoof -s isv --groups world -g demanding
$ bin/verify.py -vv -d avspoof -p energy-2gauss -e mfcc-60 -a ivec-avspoof -s isv --groups {dev,eval} -g demanding --skip-projector-training
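The I-Vector and ISV runs above differ only in the training script and in the `-s` sub-directory, so the four command lines can be generated from one helper. The following is a minimal sketch: the function names `_args` and `build_commands` are hypothetical, all flags are copied from the commands above (including `-a ivec-avspoof` for ISV, as written in this README), and `{dev,eval}` is written out as the two arguments the shell would expand it to.

```python
def _args(preprocessor, system, groups):
    """Arguments shared by the training and verification commands above."""
    return ["-vv", "-d", "avspoof", "-p", preprocessor, "-e", "mfcc-60",
            "-a", "ivec-avspoof", "-s", system,
            "--groups", *groups, "-g", "demanding"]


def build_commands(system):
    """Return (train, verify) argument lists for system 'ivec' or 'isv'."""
    train_script = {"ivec": "bin/train_ivector.py",
                    "isv": "bin/train_isv.py"}[system]
    # Training uses the 4Hz modulation energy VAD on the world group.
    train = [train_script] + _args("mod-4hz", system, ["world"])
    # Verification uses the two-Gaussian energy VAD on dev and eval.
    verify = (["bin/verify.py"]
              + _args("energy-2gauss", system, ["dev", "eval"])
              + ["--skip-projector-training"])
    return train, verify


if __name__ == "__main__":
    for name in ("ivec", "isv"):
        for command in build_commands(name):
            print("$ " + " ".join(command))
```

Each returned list can be passed to `subprocess.check_call` from the package root once the environment has been bootstrapped.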
Note that the training data are preprocessed with voice activity detection (VAD) based on 4Hz modulation energy (mod-4hz), whereas the DEV and EVAL sets are preprocessed with a two-Gaussian energy-based VAD (energy-2gauss).
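As a rough illustration of the idea behind a two-Gaussian energy-based VAD, the sketch below fits two Gaussians to per-frame log-energies with a few EM iterations and keeps the frames assigned to the higher-energy Gaussian. It is not the Bob implementation behind `energy-2gauss`; the function names, the frame sizes (400 samples with a hop of 160) and the iteration count are assumptions chosen for illustration only.

```python
import math


def frame_log_energy(signal, frame_len=400, hop=160):
    """Log-energy of each full frame (frame sizes are illustrative)."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(math.log(sum(s * s for s in frame) + 1e-10))
    return energies


def two_gaussian_vad(log_energy, n_iter=20):
    """Fit a 2-component 1-D Gaussian mixture with EM and return one
    boolean per frame: True if the higher-mean component is more likely."""
    x = list(log_energy)
    n = len(x)
    if n == 0:
        return []
    mean_all = sum(x) / n
    var_all = sum((v - mean_all) ** 2 for v in x) / n + 1e-6
    means = [min(x), max(x)]          # low-energy / high-energy start
    variances = [var_all, var_all]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior of each component for every frame.
        resp = []
        for v in x:
            logp = [math.log(weights[k])
                    - 0.5 * math.log(2 * math.pi * variances[k])
                    - 0.5 * (v - means[k]) ** 2 / variances[k]
                    for k in range(2)]
            top = max(logp)
            p = [math.exp(l - top) for l in logp]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and variance per component.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-10
            weights[k] = nk / n
            means[k] = sum(r[k] * v for r, v in zip(resp, x)) / nk
            variances[k] = sum(r[k] * (v - means[k]) ** 2
                               for r, v in zip(resp, x)) / nk + 1e-6
    speech = 0 if means[0] > means[1] else 1
    return [r[speech] > 0.5 for r in resp]
```

A production VAD would normally also smooth the frame decisions over time; that step is left out here to keep the sketch short.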
To evaluate the two systems:
$ bin/evaluate_vulnerability.py -d /path/to/ivec/nonorm/scores-dev -e /path/to/ivec/nonorm/scores-eval
$ bin/evaluate_vulnerability.py -d /path/to/isv/nonorm/scores-dev -e /path/to/isv/nonorm/scores-eval
For I-Vector, the expected output error rates are:
--------------- male -----------------
----------------------------------------------
EER = 6.9% Threshold = 43.973
replay_phone1 : 29.1%
replay_phone2 : 27.7%
replay_laptop : 39.8%
replay_laptop_HQ : 77.4%
speech_synthesis_logical_access : 96.5%
speech_synthesis_physical_access : 60.6%
speech_synthesis_physical_access_HQ : 93.5%
voice_conversion_logical_access : 92.6%
voice_conversion_physical_access : 84.0%
voice_conversion_physical_access_HQ : 88.8%
--------------- female -----------------
----------------------------------------------
EER = 17.5% Threshold = 44.632
replay_phone1 : 11.8%
replay_phone2 : 11.1%
replay_laptop : 32.2%
replay_laptop_HQ : 69.4%
speech_synthesis_logical_access : 81.5%
speech_synthesis_physical_access : 69.5%
speech_synthesis_physical_access_HQ : 83.7%
voice_conversion_logical_access : 71.6%
voice_conversion_physical_access : 75.8%
voice_conversion_physical_access_HQ : 73.0%
For ISV, the expected output error rates are:
--------------- male -----------------
----------------------------------------------
EER = 4.9% Threshold = 0.597
replay_phone1 : 19.2%
replay_phone2 : 45.9%
replay_laptop : 45.3%
replay_laptop_HQ : 74.1%
speech_synthesis_logical_access : 97.0%
speech_synthesis_physical_access : 65.9%
speech_synthesis_physical_access_HQ : 94.1%
voice_conversion_logical_access : 93.4%
voice_conversion_physical_access : 77.4%
voice_conversion_physical_access_HQ : 89.3%
--------------- female -----------------
----------------------------------------------
EER = 10.6% Threshold = 0.690
replay_phone1 : 12.2%
replay_phone2 : 23.1%
replay_laptop : 35.7%
replay_laptop_HQ : 68.5%
speech_synthesis_logical_access : 83.5%
speech_synthesis_physical_access : 67.9%
speech_synthesis_physical_access_HQ : 83.7%
voice_conversion_logical_access : 71.2%
voice_conversion_physical_access : 50.7%
voice_conversion_physical_access_HQ : 73.0%
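This README does not define the columns of the listings above. One plausible reading, stated here as an assumption, is that the threshold is fixed at the equal error rate (EER) point of genuine and zero-effort impostor scores (for example on the development set), and that each per-attack percentage is the share of spoofing attempts scoring at or above that threshold. The sketch below illustrates that computation on made-up scores; the function names are hypothetical and this is not the code of `evaluate_vulnerability.py`.

```python
def far_frr(negatives, positives, threshold):
    """False acceptance and false rejection rates at a threshold.
    A score is accepted when it is >= threshold."""
    far = sum(s >= threshold for s in negatives) / len(negatives)
    frr = sum(s < threshold for s in positives) / len(positives)
    return far, frr


def eer_threshold(negatives, positives):
    """Candidate threshold (taken from the observed scores) at which
    FAR and FRR are closest to each other."""
    def gap(threshold):
        far, frr = far_frr(negatives, positives, threshold)
        return abs(far - frr)
    return min(sorted(set(negatives) | set(positives)), key=gap)


def spoof_acceptance_rate(spoof_scores, threshold):
    """Share of spoofing attempts accepted at the given threshold."""
    return sum(s >= threshold for s in spoof_scores) / len(spoof_scores)


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    impostors = [0.1, 0.2, 0.3, 0.4, 0.6]
    genuine = [0.5, 0.7, 0.8, 0.9, 1.0]
    attacks = [0.55, 0.65, 0.75, 0.95]
    threshold = eer_threshold(impostors, genuine)
    far, frr = far_frr(impostors, genuine, threshold)
    print("EER = %.1f%% Threshold = %.3f" % (50 * (far + frr), threshold))
    print("attack : %.1f%%" % (100 * spoof_acceptance_rate(attacks, threshold)))
```

Under this reading, a per-attack figure far above the EER means that the attack is accepted much more often than a zero-effort impostor at the same operating point.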