Wiki of the LIMSI audio and acoustics research group
The Phrase List is available here: sus-evasy.pdf
The EVASY project is dedicated to the evaluation of speech synthesis systems for the French language. The project is financed by the French Ministry of Research in the context of the Technolangue programme.
In the EVASY/SUS campaign, an intelligibility test based on semantically unpredictable sentences (SUS) has been designed. The SUS paradigm allows for word level intelligibility testing.
A list of 288 semantically unpredictable sentences was built and divided into blocks covering 4 syntactic structures:
Structure 3 was not kept, because it only contained 3 target words (nouns, verbs or adjectives, here written with a capital initial letter) instead of 4 in the other structures. Each block was composed of 12 sentences.
In order to have comparable sentences and blocks, all content words were singular, monosyllabic (unless a final schwa was uttered) and had a high frequency of use according to the BRULEX lexicon.
Prepositions were also monosyllabic, e.g. sur (on). Determiners were the definite articles le, la, or l’ before a vowel (the). Adjectives which are normally placed before the noun in French, or which were homonyms of verb forms, were excluded. For the remaining ones, gender agreement with nouns and determiners was checked. In each sentence, the first noun (which also had to agree grammatically with the anaphoric pronoun in the first construction) was different from the second noun.
Verbs, in the third person present tense, had to be transitive in structures 1 and 2 (as Verb1 in structure 3) and could be intransitive in structure 4 (as Verb2 in structure 3). Whether they were transitive or intransitive, the possibility that they could be used with no complement was carefully checked, e.g. songe (thinks). Finally, some tokens which might have raised pronunciation issues (such as heterophonous homographs) were discarded.
Sample of semantically unpredictable sentences with the type of their syntactic structure.
Once the word material was designed (over 400 target words), SUS lists were randomly generated. Several trials were made, and the one which provided the most balanced distribution of phonemes across the blocks was retained. The phoneme distribution of each block was compared to that of two authoritative French lexica, BRULEX (Content et al., 1994) and LEXIQUE (New et al., 2004), using chi-square tests.
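The generate-and-select procedure described above can be illustrated with a minimal Python sketch. It is not the EVASY implementation: the word pools, the slot pattern `TEMPLATE`, the use of letters as stand-ins for phonemes, the reference distribution (computed from the toy pools rather than from BRULEX or LEXIQUE) and the selection criterion (smallest worst-block chi-square statistic) are all assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical toy word material; the real EVASY word lists are not reproduced here.
POOLS = {
    "NOUN": ["mur", "sac", "roi", "lac", "bol", "fil"],
    "VERB": ["lave", "mord", "suit", "perd"],
    "ADJ": ["vide", "sale", "mou", "sec"],
}
# Hypothetical slot pattern with 4 target words; not one of the actual structures.
TEMPLATE = ("NOUN", "VERB", "NOUN", "ADJ")


def make_sentence(rng):
    """Draw one word per slot; the two nouns are forced to differ."""
    nouns = iter(rng.sample(POOLS["NOUN"], 2))
    return [next(nouns) if slot == "NOUN" else rng.choice(POOLS[slot])
            for slot in TEMPLATE]


def symbol_counts(sentences):
    """Count symbols over all words (letters stand in for phonemes here)."""
    return Counter(ch for sentence in sentences for word in sentence for ch in word)


def chi_square_stat(observed, reference_props):
    """Pearson chi-square statistic of observed counts vs. reference proportions."""
    total = sum(observed.get(s, 0) for s in reference_props)
    if total == 0:
        return 0.0
    return sum((observed.get(s, 0) - total * p) ** 2 / (total * p)
               for s, p in reference_props.items() if p > 0)


def reference_props():
    """Stand-in reference distribution, computed from the toy pools themselves."""
    counts = Counter(ch for words in POOLS.values() for w in words for ch in w)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def best_trial(n_trials=20, n_blocks=4, block_size=12, seed=0):
    """Generate several random lists; keep the one whose worst block is most balanced."""
    rng = random.Random(seed)
    ref = reference_props()
    best_blocks, best_score = None, float("inf")
    for _ in range(n_trials):
        blocks = [[make_sentence(rng) for _ in range(block_size)]
                  for _ in range(n_blocks)]
        score = max(chi_square_stat(symbol_counts(b), ref) for b in blocks)
        if score < best_score:
            best_blocks, best_score = blocks, score
    return best_blocks, best_score
```

Calling `best_trial(n_trials=20, seed=1)` returns the retained blocks together with their worst-block statistic. Turning that statistic into a formal test would additionally require the degrees of freedom and a significance level, which this sketch leaves out.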
The resulting list was then tuned manually: whenever an automatically generated word sequence happened to make sense, words were exchanged within the same block. This way, no block was favoured, since configurations which could have helped understanding were avoided. The final SUS list was definitely meaningless and more thoroughly controlled than those of previous studies; it can be used as a reference for various experiments.
The SUS list was read by a professional male speaker in a soundproof booth, and the recordings were sampled at 16 kHz (16 bits, mono) in the Wave format. In addition to this natural reference, 6 systems were tested. The participating teams had to synthesise the 288 sentences mentioned above within a few hours, at the same sampling rate as the natural reference.
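For anyone preparing comparable stimuli, the recording format stated above (16 kHz, 16 bits, mono, Wave) can be verified with Python's standard `wave` module. The following is a minimal sketch; the function name `matches_reference_format` and the file name in the usage example are hypothetical and not part of the EVASY tooling.

```python
import wave


def matches_reference_format(path, rate=16000, sample_width_bytes=2, channels=1):
    """Return True if the WAV file is 16 kHz, 16-bit (2 bytes per sample), mono."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == rate
                and wav.getsampwidth() == sample_width_bytes
                and wav.getnchannels() == channels)
```

A call such as `matches_reference_format("sentence_001.wav")` (hypothetical file name) returns `False` for a file with a different sampling rate, bit depth or channel count.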
In the context of the OPERA project, we created an audio corpus of the above Semantically Unpredictable Sentences (SUS) for use in the measurement of Speech Reception Threshold (SRT) in French.
Abstract of the study:
After the calibration validation, 20 of the 24 lists were found to yield uniform measurement results and were therefore retained to create the SUS Audio Corpus.
The calibrated audio corpus is available for download here: sus-corpus.zip
This corpus is provided as is. It is available for educational use on the condition that the above paper is properly referenced.
Details of the development and validation of the phrase list and corpus are documented in the following publication by Alexandre Raake and Brian FG Katz:
Related works using this corpus include the following: