Orthographic, grapho-phonological, and morphological characteristics
of written words from French elementary textbooks


Manulex-Morpho v.2. - General description

Manulex_morpho provides the frequency and consistency of the Grapheme-Phoneme (G-Ph) and Phoneme-Grapheme (Ph-G) correspondences for about 10,000 words from the Manulex database, which lists word frequency values for about 49,000 words found in elementary school textbooks. The selection covers approximately 20% of the lexical entries in Manulex and more than 90% of the textual entries.

The main contribution of Manulex_morpho compared to Manulex_infra is that the grapho-phonological associations are analyzed separately according to whether or not they correspond to morphological markers, which are located mainly at the end of words. A particular source of difficulty in learning to read and write is that nominal inflections ('e', 's' and 'x') and verbal inflections ('e', 'es', 'ent'...) are often silent in speech. Morphological markers were coded and analyzed at the grapho-phonemic level in four cases: nominal inflections of gender and number; verbal inflections (e.g., 'er', 'ont', 'ant', 'ais', but not 'ir' or 'oir', which have more than one phoneme); final consonants that may be silent in the root word ('d' of 'grand') but heard in the gender-inflected and/or derived forms ('grande' - 'grandeur'); and the final '-ent' of manner adverbs in '-ment' ('rarement', 'vraiment').

For each word, Manulex_morpho provides the frequency and consistency of G-Ph and Ph-G associations as a function of their position in the word: initial (first G and Ph of the word), final (last G and Ph), and intermediate (G and Ph in the middle of the word). Frequency and consistency of the associations are computed by type (lexical frequency) and by token (textual frequency). Lexical frequency (i.e., count by type) is the number of words in the database that include a given G-Ph or Ph-G correspondence, with each word counted only once. Textual frequency (i.e., count by token) is the number of words in the texts that include the correspondence, with each word counted as often as it appears in the text corpus. For example, the grapheme 'ch' appears in 562 French words, including 9 words with 'ch' pronounced /k/ ('chorale') and 553 with 'ch' pronounced /ʃ/ ('cheval'). Consistency is a percentage: the ratio between the frequency of the G-Ph correspondence and the overall frequency of the grapheme, multiplied by 100. For the association 'ch' - /k/, the lexical consistency is therefore 1.60% (9/[553 + 9] x 100 = 1.60%). The textual consistency is calculated in the same way, using textual frequency data. Given that the total frequency is 184 for the 9 words with 'ch' pronounced /k/ and 22,388 for the 553 words with 'ch' pronounced /ʃ/, the textual consistency of the 'ch' - /k/ association is 0.82% (184/[22,388 + 184] x 100 = 0.82%).
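
The consistency calculation described above can be sketched in a few lines of Python. This is only an illustration using the 'ch' figures quoted in the text. The function name `consistency` and the dictionary layout are assumptions made for this example and do not reflect the actual format of the Manulex_morpho files.

```python
def consistency(target_freq, all_freqs):
    """Consistency (%) = frequency of one correspondence divided by
    the summed frequency of all correspondences of the same grapheme."""
    total = sum(all_freqs)
    return 100.0 * target_freq / total


# Figures quoted in the text for the grapheme 'ch' (hypothetical layout).
lexical = {"k": 9, "ʃ": 553}        # by type: each word counted once
textual = {"k": 184, "ʃ": 22388}    # by token: each occurrence counted

lex_k = consistency(lexical["k"], lexical.values())
tex_k = consistency(textual["k"], textual.values())

print(f"'ch' - /k/ lexical consistency: {lex_k:.2f}%")   # 1.60%
print(f"'ch' - /k/ textual consistency: {tex_k:.2f}%")   # 0.82%
```

Since the consistencies of all pronunciations of a grapheme sum to 100%, the same function gives about 98.40% (lexical) and 99.18% (textual) for the 'ch' - /ʃ/ association.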

We kindly request that you cite the following publication in articles and reports that have benefited from the use of Manulex_morpho: Peereman, R., Sprenger-Charolles, L., & Messaoud-Galusi, S. (2013). The contribution of morphology to the consistency of spelling-to-sound relations: A quantitative analysis based on French elementary school readers. Année Psychologique, 113, 3-33. Please also mention the version of the database used (v.2) and the website address.

Manulex-Morpho is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0).