Main changes in Manulex_Infra version 2
• Separate G-Ph and Ph-G segmentations. In version 1 of
Manulex_Infra, the description of grapho-phonological
associations focused mainly on grapheme-to-phoneme mappings
(reading direction). The same associations were then used to
analyze the associations between phonemes and graphemes (writing
direction). However, this procedure causes problems, especially
(though not exclusively) when the words include silent letters.
Indeed, Ph-G consistency is defined as the probability of
writing a particular grapheme from the pronounced phoneme. In
the case of a silent grapheme, no phoneme is produced, so the
letter to be written cannot be predicted with certainty
(unless one knows the exact spelling of the word). Therefore,
associations must be described differently when considering
reading (G-Ph) and writing (Ph-G). This information is now
available in version 2 (see tab 'Understanding Manulex_Infra').
• Analyses of the consistency and frequency of
grapho-phonological associations on the final rime unit of
words.
• Information theory measures (surprisal, entropy) are computed
on G-Ph and Ph-G associations.
• Analyses of lexemes (lemmas)
• For each word, the least consistent and the least frequent G-Ph
or Ph-G associations are reported. Note that the least consistent
association is not always the least frequent, and vice versa.
• Phonological codes and segmentations into graphemes and
phonemes were modified.
• The distinction between the two 'a' (/a/ of 'patte' and /ɑ/ of
'pâte') is removed from the consistency calculation. They are
considered the same phoneme.
• Words including the grapheme 'ai' ('maison', 'laine') can be
transcribed as /E/ or /e/. Therefore, the consistency calculation
treats both transcriptions as the same G-Ph association.
• The distinction between 'e' that is obligatorily
pronounced, obligatorily silent, or an optional schwa (see
'phonetic codes' tab) is now included in the analyses.
• The syllabic segmentation of words, in accordance with the
coding of silent or non-silent 'e', is included in the analyses.
• The G-Ph consistency for the grapheme 'e' whose schwa is
optional ('gare', 'parle') is set to 100%, since the 'e' may or
may not be pronounced.
• In the case of Ph-G associations only, the rare silent
consonants in internal position (e.g. 'm' in 'automne', 'p' in
'baptême') are not present in the speech signal, and their Ph-G
consistency is therefore 0%.
• Case of 'e' followed by two identical consonants. In version 1
of Manulex_Infra, the orthographic sequences 'emm' and 'enn'
were each coded as a single graphemic group, while 'e' followed
by other doublets ('err', 'ett') was coded as two groups (e.g. 'e.rr'
in 'terre'). The coding of 'emm' and 'enn' followed the
segmentation principle of highlighting inconsistencies in
word pronunciation, as these two orthographic sequences are
pronounced differently in 'antenne' and 'flemme' than in
'solennel' and 'patiemment'. However, given the high number of
adverbs in '-emment' sharing the association 'emm'-/am/
('patiemment', 'évidemment', 'récemment'), a word like 'femme' was
described as consistent. The coding for reading was standardized
by coding 'e', when followed by two identical consonants, as
.e[CC]. (with CC indicating two identical consonants), so the word
'femme' is now coded as 'f.e[CC].mm.e'. With this change, the word
'femme' receives a low consistency score
because 'e' followed by a doublet is usually pronounced /e/ or
/E/. The 'e[CC]' coding applies only to G-Ph associations,
not to Ph-G associations, since in French nothing signals
the presence of the doublet when words are spoken.
This coding of G-Ph associations with 'e[CC]' applies to all
words in order to highlight inconsistencies, including words
such as 'ennui', coded 'e[CC]-@.nn-n.u-8.i-i'.
• Coding has been modified when '-eill' or '-eil' is not preceded
by 'u' ('abeille', 'bienveillant', 'sommeil'). 'eil' and 'eill'
are now single blocks in which 'il' or 'ill' is always associated
with the semi-vowel /j/, never with the consonant /l/.
• For reading (G-Ph), verb endings in '-ent' are
segmented 'en.t'. This is comparable to the description of verb
endings in '-ant', '-ont', '-ons', ... ('an.t', 'on.t', 'on.s').
• By-token values are computed using a log transform of word
frequency, log10(frequency+1) (ver. 2.4).
• In order to eliminate some rare G-Ph or Ph-G associations,
proper names are excluded from the analyses.
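
The consistency, surprisal, entropy and by-token measures listed above can be illustrated with a minimal Python sketch. It is not the procedure used to build Manulex_Infra: the toy lexicon, its frequencies, the simplified segmentations (final 'e' omitted), the function names and the base-2 logarithm used for surprisal and entropy are all assumptions made for illustration. G-Ph consistency is taken here as the percentage of occurrences of a grapheme that map to a given phoneme; Ph-G consistency would condition on the phoneme instead.

```python
import math
from collections import defaultdict

# Toy lexicon (hypothetical): word -> (frequency, [(grapheme, phoneme), ...]).
# Frequencies are invented and segmentations are simplified (final 'e' omitted).
TOY_LEXICON = {
    "femme": (999.0, [("f", "f"), ("e[CC]", "a"), ("mm", "m")]),
    "flemme": (9.0, [("f", "f"), ("l", "l"), ("e[CC]", "E"), ("mm", "m")]),
    "terre": (99.0, [("t", "t"), ("e[CC]", "E"), ("rr", "R")]),
}


def association_weights(lexicon, by_token=False):
    """Sum one weight per occurrence of each (grapheme, phoneme) pair.

    By type, each occurrence weighs 1; by token, it weighs log10(frequency + 1).
    """
    weights = defaultdict(float)
    for frequency, pairs in lexicon.values():
        weight = math.log10(frequency + 1) if by_token else 1.0
        for pair in pairs:
            weights[pair] += weight
    return dict(weights)


def gph_consistency(weights):
    """Return P(phoneme | grapheme), in percent, for every association."""
    totals = defaultdict(float)
    for (grapheme, _), weight in weights.items():
        totals[grapheme] += weight
    return {(g, p): 100.0 * w / totals[g] for (g, p), w in weights.items()}


def surprisal(consistency_pct):
    """Surprisal in bits of an association with the given consistency (%)."""
    if consistency_pct <= 0:
        return math.inf  # choice made for this sketch: 0% gives infinite surprisal
    return -math.log2(consistency_pct / 100.0)


def entropy(consistency, grapheme):
    """Entropy in bits of the phoneme distribution for one grapheme."""
    probs = [c / 100.0 for (g, _), c in consistency.items() if g == grapheme]
    return -sum(p * math.log2(p) for p in probs if p > 0)


def least_consistent(word, lexicon, consistency):
    """Return the association of `word` with the lowest consistency."""
    _, pairs = lexicon[word]
    return min(pairs, key=lambda pair: consistency[pair])


by_type = gph_consistency(association_weights(TOY_LEXICON))
by_token = gph_consistency(association_weights(TOY_LEXICON, by_token=True))
print(by_type[("e[CC]", "a")], by_token[("e[CC]", "a")])
print(surprisal(by_type[("e[CC]", "a")]), entropy(by_type, "e[CC]"))
print(least_consistent("femme", TOY_LEXICON, by_type))
```

In this toy data, 'e[CC]'-/a/ is the least consistent association of 'femme' by type (one occurrence out of three), in line with the low consistency score described above for that word. The by-token value differs because each occurrence is weighted by log10(frequency+1).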