Orthographic, grapho-phonological, and morphological characteristics
of written words from French elementary textbooks


Manulex_Infra v.2 - General description

The Manulex_Infra database provides quantitative data on the orthographic and grapho-phonological characteristics of written words that appear in French school books (Grades 1-5). The corpus of 45 000 words and their frequencies of occurrence are derived from the Manulex database. The first version of Manulex_Infra was published in 2007 (see reference below) and could be downloaded from the University of Bourgogne’s website or queried online. Version 2 is available on this site only and introduces major changes relative to version 1. It also provides additional data that facilitate searching for and selecting words according to their grapho-phonological characteristics. The differences between versions are described further under the 'changes from version 1' tab.

Manulex_Infra mainly provides the statistical characteristics of graphemes (G) and phonemes (Ph) within printed words. The relation between graphemes and phonemes is analyzed for reading (from G to Ph) as well as for writing (from Ph to G). The latest version also allows users to consider larger units corresponding to the phonological rime of words (/uR/ in ‘tour’; /al/ in ‘mal’). The analyses can cover all the lexical forms included in the database or focus specifically on lexemes (also called lemmas). In the latter case, they exclude forms inflected for gender (feminine) or number (plural), as well as verbal inflections (person, tense, mood).
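As an illustration of what a directional grapheme-phoneme statistic can look like, the sketch below computes a simple consistency ratio: the share of occurrences of a source unit that map to each target unit, in either the reading (G to Ph) or the writing (Ph to G) direction. This is only one common way to quantify such mappings and is not taken from the Manulex_Infra documentation. The function name `mapping_consistency`, the toy lexicon, its segmentations, and its frequencies are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: (word, frequency, [(grapheme, phoneme), ...]).
# Frequencies and segmentations are invented for illustration only;
# they are NOT values from Manulex or Manulex_Infra.
TOY_LEXICON = [
    ("tour", 10, [("t", "t"), ("ou", "u"), ("r", "R")]),
    ("mal", 5, [("m", "m"), ("a", "a"), ("l", "l")]),
    ("autour", 3, [("au", "o"), ("t", "t"), ("ou", "u"), ("r", "R")]),
    ("sceau", 2, [("sc", "s"), ("eau", "o")]),
    ("seau", 1, [("s", "s"), ("eau", "o")]),
]


def mapping_consistency(lexicon, direction="G->Ph", weighted=True):
    """Return {source: {target: proportion}} for the chosen direction.

    direction="G->Ph" treats graphemes as sources (reading);
    direction="Ph->G" treats phonemes as sources (writing).
    weighted=True counts each word by its frequency, otherwise once.
    """
    counts = defaultdict(Counter)
    for _word, freq, pairs in lexicon:
        weight = freq if weighted else 1
        for grapheme, phoneme in pairs:
            if direction == "G->Ph":
                source, target = grapheme, phoneme
            else:
                source, target = phoneme, grapheme
            counts[source][target] += weight
    return {
        source: {t: n / sum(targets.values()) for t, n in targets.items()}
        for source, targets in counts.items()
    }


reading = mapping_consistency(TOY_LEXICON, "G->Ph")
writing = mapping_consistency(TOY_LEXICON, "Ph->G")
```

In this toy sample every grapheme has a single pronunciation, whereas the phoneme /o/ is split between 'au' and 'eau', so the two directions give different pictures of the same words.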

Other analyses provide word length indexes (in letters, graphemes, and syllables), orthographic neighborhood measures ('n-count', Levenshtein distance), bigram and trigram frequencies (sequences of two or three adjacent letters), and non-homograph homophones (e.g., saut - sceau). Detailed descriptions are available under the 'Understanding Manulex_Infra' tab.
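The measures named above can be sketched in a few lines. The example assumes the usual definitions: Levenshtein distance as the minimum number of single-letter insertions, deletions, and substitutions, and the n-count as the number of same-length words that differ by exactly one letter. Whether Manulex_Infra weights these counts by frequency or by letter position is not stated here, so the unweighted type counts below are an assumption. The word list and function names are hypothetical.

```python
from collections import Counter


def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions from a to b."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def n_count(word, lexicon):
    """Number of same-length words differing from `word` by exactly one letter."""
    return sum(
        1
        for other in lexicon
        if len(other) == len(word)
        and sum(x != y for x, y in zip(word, other)) == 1
    )


def ngram_counts(lexicon, n=2):
    """Count letter n-grams over word types (each word counted once)."""
    counts = Counter()
    for word in lexicon:
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


# Hypothetical mini word list, for illustration only.
WORDS = ["mal", "bal", "mai", "mil", "tour", "jour", "saut", "sceau"]

neighbors_of_mal = n_count("mal", WORDS)        # bal, mai, mil
distance = levenshtein("saut", "sceau")
bigrams = ngram_counts(WORDS, n=2)
```

Calling `ngram_counts(WORDS, n=3)` gives trigram counts in the same way.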

The current version of the database is v.2, which makes several major changes to the previous version (December 2021).

We kindly request that you cite the following publication whenever you use Manulex_Infra in your work: Peereman, R., Lété, B., & Sprenger-Charolles, L. (2007). Manulex-Infra: Distributional characteristics of grapheme-phoneme mappings, infra-lexical and lexical units in child-directed written material. Behavior Research Methods, 39, 579-589. Please also mention the version of the database used (v.2) and the website address. The reference announcing the new version will be added soon.

Manulex_Infra is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0).