Download
Two types of files (.xlsx format) are available for download (detailed description below):
1. Lexical databases listing the words and their
characteristics (ver. 2.4.2). The ManuAll
file contains all the lexical entries, whereas the ManuLemme file
contains only the orthographic forms corresponding to
lemmas. The grapho-phonological statistics are computed
independently for each file. The ManuLemme file makes it possible
to characterize the grapho-phonological properties of words
independently of gender/number and verbal inflections. Note that
words that appear in schoolbooks only in an inflected form are
not included in ManuLemme.
Filters at the top of the word lists help with selecting entries.
2. General statistics derived from the lexical databases.
Several files are available:
• Consistency and frequency of G-Ph, Ph-G, and word rime
associations. Statistics computed from all lexical entries are
in the ManuAll-Associations file, and statistics computed from
lemmas are in the ManuLemme-Associations file.
• Other orthographic statistics computed from the lexical corpus
of ManuAll. These statistics, gathered in the file
ManuAll-OrthoStat, are a) the frequency of letters and b) the
frequency of bigrams and trigrams. These data are identical to
those described in Manulex_Infra version 1.
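To make the type/token distinction concrete, letter, bigram, and trigram counts can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to build ManuAll-OrthoStat: the input format (a mapping from word to frequency), the function names `ngram_counts` and `position_label`, and the toy frequencies are hypothetical. We assume that a type count adds 1 for each occurrence of an n-gram in a word, regardless of the word's frequency, while a token count adds the word's frequency. Each n-gram is also labeled initial, internal, or final, as in the positional bigram measures of ManuAll.

```python
from collections import Counter


def position_label(i, last):
    """Label the n-gram starting at index i; last is the index of the final n-gram."""
    if i == 0:
        return "initial"
    if i == last:
        return "final"
    return "internal"


def ngram_counts(lexicon, n):
    """Count letters (n=1), bigrams (n=2), or trigrams (n=3) by position.

    lexicon: dict mapping word -> frequency (hypothetical input format).
    Returns two Counters keyed by (ngram, position): type counts add 1 per
    occurrence in a word, token counts add the word's frequency (assumption).
    """
    types, tokens = Counter(), Counter()
    for word, freq in lexicon.items():
        last = len(word) - n
        for i in range(last + 1):  # empty range if the word is shorter than n
            key = (word[i:i + n], position_label(i, last))
            types[key] += 1
            tokens[key] += freq
    return types, tokens


# Toy example with hypothetical frequencies
toy = {"chat": 120, "chien": 80, "rat": 40}
bigram_types, bigram_tokens = ngram_counts(toy, 2)
print(bigram_types[("ch", "initial")], bigram_tokens[("ch", "initial")])  # prints: 2 200
```

With this toy lexicon, the initial bigram `ch` occurs in two word types (chat, chien), so its type count is 2 and its token count is 120 + 80 = 200.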
Note: Google Sheets allows you to browse the files from your
Google Drive. To import files directly into your Google Drive,
use Chrome and the "Save to Google Drive" extension available
on the Chrome Web Store. Then right-click on the file link to
save it to your Google Drive.
FILE DESCRIPTION
ManuAll
• Orthographic and phonological codes
• Grammatical category
• Number of letters, phonemes, graphemes, syllables
• Graphemic complexity (n of letters / n of phonemes)
• Syllabification (phonological)
• Word frequency in Grade 1 (CP), Grade 2 (CE1), and Grade 1 to
Grade 5 (CP-CM2) according to the Manulex database (U values,
which take into account the dispersion of word frequencies across
textbooks)
• Number of heterographic homophones (e.g., port-porc-pore) for
singular adjectives and nouns
• Orthographic neighborhood (N-Count and Levenshtein OLD20
index)
• Average bigram frequency (values per type and per token), and
bigram frequency as a function of position (initial bigram,
internal bigram(s), final bigram)
• G-Ph segmentation and Ph-G segmentation
• Phonological rime and orthographic counterpart
• Frequency and consistency of G-Ph associations (values per
type and per token) as a function of the position within the
word (initial, internal, final)
• Frequency and consistency of Ph-G associations (values per
type and per token) as a function of the position within the
word (initial, internal, final)
• Least frequent and least consistent G-Ph and Ph-G associations
in the word
• Consistency and frequency of orthography-to-phonology (reading
direction) and phonology-to-orthography (spelling direction)
associations for the phonological rime of words. Values per type
and per token.
(note: token values are based on word frequency from Grade 1
to Grade 5)
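The two orthographic neighborhood measures listed above have standard definitions: the N-Count (Coltheart's N) is the number of words of the same length that differ from the target by a single letter substitution, and OLD20 is the mean Levenshtein distance from the target to its 20 closest words. The Python sketch below applies these definitions to a plain word list. The function names and the toy lexicon are hypothetical, and details such as which lexicon serves as the reference set for ManuAll are not specified here.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = curr
    return prev[-1]


def n_count(word, lexicon):
    """Coltheart's N: same-length words differing by exactly one letter."""
    return sum(
        1 for other in lexicon
        if other != word and len(other) == len(word)
        and sum(x != y for x, y in zip(word, other)) == 1
    )


def old(word, lexicon, k=20):
    """OLD-k: mean Levenshtein distance to the k closest other words.

    If fewer than k other words exist, averages over all of them.
    """
    dists = sorted(levenshtein(word, other) for other in lexicon if other != word)
    closest = dists[:k]
    return sum(closest) / len(closest)


# Toy example (hypothetical reference lexicon)
lexicon = ["chat", "char", "chas", "rat", "chant", "plat"]
print(n_count("chat", lexicon), old("chat", lexicon, k=5))  # prints: 2 1.4
```

Here `chat` has two substitution neighbors (char, chas), and its five closest words lie at distances 1, 1, 1, 2, and 2, hence an OLD-5 of 1.4. With a realistic lexicon, `k=20` gives OLD20.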
ManuLemme
• Orthographic and phonological codes
• Grammatical category
• Number of letters, phonemes, graphemes, syllables
• Graphemic complexity (n of letters / n of phonemes)
• Syllabification (phonological)
• Word frequency in Grade 1 (CP), Grade 2 (CE1), and Grade 1 to
Grade 5 (CP-CM2) according to the Manulex database (U values,
which take into account the dispersion of word frequencies across
textbooks)
• G-Ph segmentation and Ph-G segmentation
• Phonological rime and orthographic counterpart
• Frequency and consistency of G-Ph associations (values per
type and per token) as a function of the position within the
word (initial, internal, final)
• Frequency and consistency of Ph-G associations (values per
type and per token) as a function of the position within the
word (initial, internal, final)
• Least frequent or least consistent G-Ph and Ph-G associations
in the word
• Consistency and frequency of orthography-to-phonology (reading
direction) and phonology-to-orthography (spelling direction)
associations for the phonological rime of words. Values per type
and per token.
(note: token values are based on word frequency from Grade 1
to Grade 5)
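Consistency is commonly defined as the proportion of occurrences of a unit (here, a grapheme in a given position) that map onto a given phoneme. The sketch below computes this proportion per type (each word counts once) or per token (each word weighted by its frequency), then finds the least consistent association in a word. The ratio definition, the input format (a G-Ph segmentation given as a list of (grapheme, phoneme) pairs, with `#` marking a silent letter), the toy frequencies, and all names are assumptions for illustration, not the Manulex-Infra code.

```python
from collections import Counter


def position(i, n):
    """Position of segment i in a word of n segments."""
    if i == 0:
        return "initial"
    if i == n - 1:
        return "final"
    return "internal"


def gph_consistency(lexicon, by_token=False):
    """Consistency of grapheme->phoneme associations, by position.

    lexicon: list of (segments, frequency); segments is a list of
    (grapheme, phoneme) pairs (hypothetical format, '#' = silent letter).
    Returns {(grapheme, phoneme, position): proportion in [0, 1]}.
    """
    assoc, unit = Counter(), Counter()
    for segments, freq in lexicon:
        weight = freq if by_token else 1  # token vs type counting
        for i, (g, p) in enumerate(segments):
            pos = position(i, len(segments))
            assoc[(g, p, pos)] += weight
            unit[(g, pos)] += weight
    return {(g, p, pos): count / unit[(g, pos)]
            for (g, p, pos), count in assoc.items()}


def least_consistent(segments, table):
    """(consistency, (grapheme, phoneme)) of the word's least consistent association."""
    n = len(segments)
    return min((table[(g, p, position(i, n))], (g, p))
               for i, (g, p) in enumerate(segments))


# Toy G-Ph segmentations (hypothetical) with made-up frequencies
CHAT = [("ch", "ʃ"), ("a", "a"), ("t", "#")]
CHAOS = [("ch", "k"), ("a", "a"), ("o", "o"), ("s", "#")]
CHIEN = [("ch", "ʃ"), ("i", "j"), ("en", "ɛ̃")]
LEXICON = [(CHAT, 120), (CHAOS, 10), (CHIEN, 80)]

type_table = gph_consistency(LEXICON)
token_table = gph_consistency(LEXICON, by_token=True)
print(least_consistent(CHAOS, type_table))
```

In this toy lexicon, initial ch → /k/ has a type consistency of 1/3, which makes it the least consistent association in chaos. Swapping the two elements of each pair gives the Ph-G (spelling) direction with the same functions.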
ManuAll-Associations and ManuLemme-Associations
• G-Ph and Ph-G associations, and rime associations in both
directions (orthography-to-phonology and phonology-to-orthography)
• Frequency and consistency of associations (type and token
values) as a function of the position within the word (initial,
internal, final)
• Entropy and ‘surprisal’ of associations (type and token
values) as a function of the position within the word (initial,
internal, final)
(note: token values are based on word frequency from Grade 1
to Grade 5)
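Entropy and surprisal can both be derived from the conditional probabilities that underlie consistency. Assuming base-2 logarithms (values in bits) and that P(phoneme | grapheme, position) is the type or token consistency expressed as a proportion, the entropy of a unit is H = -Σ P log2 P over all of its associations, and the surprisal of a single association is -log2 P. The sketch below applies these textbook definitions; whether Manulex-Infra uses the same log base or normalization is an assumption, and the function name and toy probabilities are hypothetical.

```python
import math
from collections import defaultdict


def entropy_and_surprisal(consistency):
    """Entropy per unit and surprisal per association, in bits.

    consistency: {(grapheme, phoneme, position): P(phoneme | grapheme, position)},
    containing only observed associations (P > 0) (hypothetical input format).
    """
    by_unit = defaultdict(list)
    for (g, p, pos), prob in consistency.items():
        by_unit[(g, pos)].append(prob)
    # H = -sum(P * log2 P) over the associations of each (grapheme, position) unit
    entropy = {unit: -sum(q * math.log2(q) for q in probs if q > 0)
               for unit, probs in by_unit.items()}
    # Surprisal = -log2 P for each observed association
    surprisal = {key: -math.log2(prob) for key, prob in consistency.items()}
    return entropy, surprisal


# Toy probabilities (hypothetical)
probs = {("ch", "ʃ", "initial"): 0.5, ("ch", "k", "initial"): 0.5,
         ("a", "a", "internal"): 1.0}
entropy, surprisal = entropy_and_surprisal(probs)
```

A grapheme with two equally likely pronunciations has an entropy of 1 bit, and each of its associations has a surprisal of 1 bit; a fully consistent grapheme has an entropy and a surprisal of 0.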