Groupe Audio Acoustique

Wiki of the LIMSI audio and acoustics research group


Semantically Unpredictable Sentences for Text-to-Speech Synthesis Evaluation and Speech Reception Threshold Measurement in French


SUS Phrase List in French

The Phrase List is available here: sus-evasy.pdf

The EVASY project is dedicated to the evaluation of speech synthesis systems for the French language. The project is financed by the French Ministry of Research in the context of the Technolangue programme.

In the EVASY/SUS campaign, an intelligibility test based on semantically unpredictable sentences (SUS) was designed. The SUS paradigm allows for word-level intelligibility testing.
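As a rough illustration of word-level scoring, the sketch below counts how many target words of a reference sentence reappear in a listener's transcription. The function-word set, the tokenisation rule and the names `target_words` and `word_score` are assumptions made for this example; they are not taken from the EVASY protocol.

```python
import re
import unicodedata

# Function words that are not scored. This set is an assumption for the example,
# not an official list from the EVASY campaign.
FUNCTION_WORDS = {
    "le", "la", "l", "qui", "sur", "par", "dans",
    "quand", "comment", "où", "t", "il", "elle",
}


def tokenize(sentence):
    """Lowercase the sentence and split it on anything that is not a word character."""
    text = unicodedata.normalize("NFC", sentence.lower())
    return [token for token in re.split(r"[^\w]+", text) if token]


def target_words(sentence):
    """Return the content words (nouns, verbs, adjectives) of a SUS sentence."""
    return [token for token in tokenize(sentence) if token not in FUNCTION_WORDS]


def word_score(reference, response):
    """Proportion of target words of `reference` found in the listener's `response`."""
    targets = target_words(reference)
    if not targets:
        raise ValueError("reference sentence has no target words")
    heard = tokenize(response)
    correct = 0
    for word in targets:
        if word in heard:
            heard.remove(word)  # each transcribed word can match only once
            correct += 1
    return correct / len(targets)


# Hypothetical listener response with two of the four target words misheard.
score = word_score("Le test clair mange la haine.", "le test clair range la laine")
```

Exact string matching is the simplest possible criterion; a real test would probably need to accept homophones and spelling variants.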

A list of 288 semantically unpredictable sentences was built and divided into blocks covering 4 syntactic structures:

  1. adverb det. Noun1 Verb-t-pron. det. Noun2 Adjective ?
  2. determiner Noun1 Adjective Verb determiner Noun2
  3. det. Noun1 Verb1 determiner Noun2 qui (“that”) Verb2
  4. determiner Noun1 Verb preposition determiner Noun2

Structure 3 was not kept, because it only contained 3 target words (nouns, verbs or adjectives, here written with a capital initial letter) instead of 4 in the other structures. Each block was composed of 12 sentences.

In order to have comparable sentences and blocks, all content words were singular, monosyllabic (unless a final schwa was uttered) and had a high frequency of use according to the BRULEX lexicon.

Prepositions were also monosyllabic: e.g. sur (“on”). Determiners were definite articles (“the”) le, la or l’ before a vowel. Adjectives which are normally located before nouns in French or which were homonyms of verb forms were excluded. For the remaining ones, the agreement in gender with nouns and determiners was checked. In each sentence, the first noun (which also had to agree grammatically with the anaphoric pronoun in the first construction) was different from the second noun.

Verbs, in the third person present tense, had to be transitive in structures 1 and 2 (as Verb1 in structure 3) and could be intransitive in structure 4 (as Verb2 in structure 3). Whether they were transitive or intransitive, the possibility that they could be used with no complement was carefully checked: e.g. songe (“thinks”). Finally, some tokens which might have raised pronunciation issues (such as heterophonous homographs) were discarded.
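Some of the constraints above can be sketched in code for structure 2 (determiner Noun1 Adjective Verb determiner Noun2): two different nouns, an adjective agreeing in gender with Noun1, and the article l’ before a vowel. This is only a minimal sketch. The mini-lexicon is derived from the sample sentences on this page, and the gender table, the elision rule and the function names are assumptions, not the generator used in the project.

```python
import random

# Tiny illustrative lexicon as (noun, gender) pairs; the real material has over 400 target words.
NOUNS = [("classe", "f"), ("frein", "m"), ("test", "m"),
         ("haine", "f"), ("or", "m"), ("dôme", "m")]
# Adjective forms keyed by gender (hypothetical mini-table).
ADJECTIVES = [{"m": "clair", "f": "claire"},
              {"m": "gai", "f": "gaie"},
              {"m": "jaune", "f": "jaune"}]
VERBS = ["montre", "mange", "porte"]  # transitive, third person singular

# Simplified elision rule: elide only before a vowel letter. This handles the
# aspirated h of "haine" but would be wrong for nouns with a mute h.
VOWELS = "aeiouyàâéèêîôûœ"


def determiner(noun, gender):
    """Return the noun preceded by the definite article le, la or l’."""
    if noun[0] in VOWELS:
        return "l’" + noun
    return ("le " if gender == "m" else "la ") + noun


def structure_2(rng):
    """Draw one sentence of the form: determiner Noun1 Adjective Verb determiner Noun2."""
    (noun1, gender1), (noun2, gender2) = rng.sample(NOUNS, 2)  # two distinct nouns
    adjective = rng.choice(ADJECTIVES)[gender1]  # agrees in gender with Noun1
    verb = rng.choice(VERBS)
    sentence = (f"{determiner(noun1, gender1)} {adjective} {verb} "
                f"{determiner(noun2, gender2)}.")
    return sentence[0].upper() + sentence[1:]


rng = random.Random(2006)  # fixed seed so that the draw is reproducible
examples = [structure_2(rng) for _ in range(3)]
```

Sentences drawn this way would still have to be checked by hand, since a random draw can accidentally produce a sequence that makes sense.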

  • La loi brille par la chance creuse.
  • La classe gaie montre le frein.
  • Quand le lien signe-t-il l’onde pleine ?
  • Le test clair mange la haine.
  • L’or jaune porte le dôme.
  • Comment la soif lance-t-elle le bol proche ?
  • Le mur siffle la buée qui vole.
  • La banque dit la dinde qui plaît.
  • La terre dresse la boîte qui rage.
  • Où l’oeuf cite-t-il le thé doué ?
  • Le nom luit sur le bras nu.
  • Le choix tape dans la queue close.

Sample of semantically unpredictable sentences with the type of their syntactic structure.

Once the word material was designed (over 400 target words), SUS lists were randomly generated. Several trials were made, and the one which provided the most balanced distribution of phonemes across the blocks was retained. The phoneme distribution of each block was compared to that of two authoritative French lexica, BRULEX (Content et al., 1994) and LEXIQUE (New et al., 2004), using chi-square tests.
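The selection step can be illustrated with a Pearson chi-square statistic, which compares the phoneme counts of a block with the counts expected from reference proportions. The sketch below is a toy version: the three-phoneme inventory, the counts and the "smallest worst-block statistic" criterion are assumptions for illustration, since the text does not specify how the most balanced trial was chosen.

```python
def chi_square(observed, reference_proportions):
    """Pearson chi-square statistic of observed counts against reference proportions.

    Phonemes absent from `reference_proportions` are ignored, so the reference
    is assumed to cover the whole inventory.
    """
    total = sum(observed.get(phoneme, 0) for phoneme in reference_proportions)
    statistic = 0.0
    for phoneme, proportion in reference_proportions.items():
        expected = total * proportion
        statistic += (observed.get(phoneme, 0) - expected) ** 2 / expected
    return statistic


def most_balanced_trial(trials, reference_proportions):
    """Index of the trial whose worst block deviates least from the reference.

    Each trial is a list of blocks; each block maps phonemes to counts.
    The min-max criterion is an assumption made for this example.
    """
    def worst_block(blocks):
        return max(chi_square(block, reference_proportions) for block in blocks)

    return min(range(len(trials)), key=lambda index: worst_block(trials[index]))


# Toy data: hypothetical proportions for a three-phoneme inventory.
reference = {"a": 0.5, "l": 0.3, "r": 0.2}
balanced = {"a": 5, "l": 3, "r": 2}
skewed = {"a": 8, "l": 1, "r": 1}

trials = [[skewed, balanced], [balanced, balanced]]
best = most_balanced_trial(trials, reference)
```

A full test would convert the statistic into a p-value using the chi-square distribution with (number of phonemes - 1) degrees of freedom; that step is left out here to keep the example dependency-free.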

The resulting list was tuned manually: words were exchanged within the same block whenever an automatically generated word sequence happened to make sense. This way, no block was favoured, because configurations which could have helped understanding were avoided. The final SUS list was meaningless by construction and more thoroughly controlled than those of previous studies, and it can be used as a reference for various experiments.

The SUS list was read by a professional male speaker in a soundproof booth, and the recordings were sampled at 16 kHz (16 bits, mono) in WAV format. In addition to this natural reference, 6 systems were tested. The participating teams had to synthesise the 288 sentences mentioned above within a few hours, at the same sampling rate as the natural reference.
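A file can be checked against this recording format with Python's standard `wave` module. The helper name `matches_corpus_format` and the file name in the usage comment are hypothetical; only the format values (16 kHz, 16 bits, mono) come from the text above.

```python
import wave


def matches_corpus_format(path, rate=16000, sample_width_bytes=2, channels=1):
    """Return True if the WAV file is sampled at 16 kHz, 16 bits, mono (by default)."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == rate
                and wav.getsampwidth() == sample_width_bytes  # 2 bytes = 16 bits
                and wav.getnchannels() == channels)


# Usage (hypothetical file name):
# matches_corpus_format("sentence_001.wav")
```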


SUS Phrase Audio Corpus

In the context of the OPERA project, we created an audio corpus of the above Semantically Unpredictable Sentences (SUS) for use in the measurement of Speech Reception Threshold (SRT) in French.

Abstract of the study:

  • We propose a new method for measuring the threshold of 50% sentence intelligibility in noisy or multi-source speech communication situations (Speech Reception Threshold, SRT). Our SRT test complements those available for other languages, e.g. English, German, Dutch, Swedish and Finnish, with a French test method. The approach we take is based on semantically unpredictable sentences (SUS), which can in principle be created for various languages. This way, the proposed method enables better cross-language comparisons of intelligibility tests. As a starting point for the French language, a set of 288 sentences (24 lists of 12 sentences each) was created from the above-mentioned SUS phrase list (available here: phraselist). Each of the 24 lists is optimized for homogeneity in terms of phoneme distribution as compared to average French, and for word occurrence frequency of the employed monosyllabic keywords as derived from French language databases. Based on the optimized text material, a speech target sentence database has been recorded with a trained speaker. A test calibration was carried out to yield uniform measurement results over the set of target sentences. First intelligibility measurements show good reliability of the method.

After the calibration was validated, 20 of the 24 lists were found to yield uniform measurement results and were therefore retained to create the SUS Audio Corpus.
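For readers unfamiliar with the SRT, the sketch below estimates the signal-to-noise ratio at which intelligibility crosses 50% by linear interpolation between measured points. It is an illustration of the definition only: the measurement values are invented, and interpolation is an assumption chosen for simplicity, not the procedure documented in the publication.

```python
def srt_from_points(points, target=0.5):
    """Linearly interpolate the SNR (in dB) at which intelligibility equals `target`.

    `points` is a list of (snr_db, proportion_correct) pairs.
    """
    ordered = sorted(points)
    for (snr_low, p_low), (snr_high, p_high) in zip(ordered, ordered[1:]):
        if p_low <= target <= p_high and p_high > p_low:
            fraction = (target - p_low) / (p_high - p_low)
            return snr_low + fraction * (snr_high - snr_low)
    raise ValueError("target intelligibility is not bracketed by the measurements")


# Hypothetical measurements: proportion of sentences understood at each SNR.
measurements = [(-12, 0.10), (-9, 0.30), (-6, 0.70), (-3, 0.90)]
srt = srt_from_points(measurements)  # about -7.5 dB for these invented values
```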

The calibrated audio corpus is available for download here: sus-corpus.zip

This corpus is provided as is. It is available for educational use on the condition that the paper cited below is properly referenced.

Details of the development and validation of the phrase list and corpus are documented in the following publication by Alexander Raake and Brian F. G. Katz:

  • A. Raake and B. Katz, “SUS-based method for speech reception threshold measurement in French,” in Fifth Conf. on Language Resources and Evaluation (LREC), (Genoa, Italy), pp. 2028–2033, May 2006.

Related works using this corpus include the following:

  • P. Boula de Mareüil, C. d’Alessandro, A. Raake, G. Bailly, M.-N. Garcia, and M. Morel, “A joint intelligibility evaluation of French text-to-speech synthesis systems: the EvaSy SUS/ACR campaign,” in Fifth International Conference on Language Resources and Evaluation (LREC), (Genoa, Italy), pp. 2034–2037, 2006.
  • C. d’Alessandro, P. Boula de Mareüil, M.-N. Garcia, G. Bailly, M. Morel, A. Raake, F. Béchet, J. Véronis, and R. Prudon, “La campagne EvaSy d’évaluation de la synthèse de la parole à partir du texte,” in L’évaluation technologique dans le domaine du traitement automatique de la langue : l’expérience du programme Technolangue (S. Chaudiron and K. Choukri, eds.), pp. 183–208, Paris: Hermès, 2008. ISBN 978-2-7462-1992-2.
  • A. Raake, B. Katz, and G. Perez, “Vorhersage und Kontrolle der Sprachverständlichkeit in räumlich dargebotenen Audio-Konferenzschaltungen (Prediction and control of speech intelligibility in spatially rendered audio-conferences),” in 33rd German Annual Conf. on Acoustics (DAGA), (Stuttgart), pp. 1–2, Mar 2007.
  • A. Raake and B. F. G. Katz, “Measurement and prediction of speech intelligibility in a virtual chat room,” in 2nd ISCA/DEGA Tutorial and Research Workshop on Perceptual Quality of Systems, pp. 40–43, 2006.
  • B. Katz and A. Raake, “Prédicteur d’intelligibilité dans une scène audio multi-locuteur,” 2007. International Deposit Digital Number IDDN.FR.001.340011.000.S.P.2007.000.10400.

groupeaa.limsi.fr/downloads/sus.txt · Last modified: 2015/01/20 15:38 by Brian Katz