OPTIMISING MEASURES OF LEXICAL VARIATION IN EFL LEARNER CORPORA

Sylviane Granger and Martin Wynne
Université Catholique de Louvain, University of Lancaster

1 Introduction

While the earliest English corpora such as the LOB and the BROWN represented the standard varieties of the language, some of the more recent collections have begun to include varieties which diverge to a greater or lesser extent from the standard norms. These 'special corpora', as Sinclair (1995: 24) calls them, constitute a challenge for corpus linguists since the methods and tools commonly used in the field were designed for or trained on the standard varieties and it is very much an open question whether they can be applied to more specialised varieties. Computer learner corpora, which contain spoken and written texts produced by foreign/second language learners, are a case in point. Their degree of divergence from the native standard norm(s) is a function of the learners' proficiency level: the lower the level, the wider the gap. In this paper we investigate to what extent the lexical variation measures commonly used in corpus linguistics studies can be used to assess the lexical richness in essays written by advanced EFL learners.

One of the most commonly used measures of lexical richness in texts is the type/token (T/t) ratio. More precisely, the type/token ratio measures lexical variation, which is the number of different words in a text. It is computed by means of the following formula:

\[
\text{T/t ratio} = \frac{\text{Number of word types} \times 100}{\text{Number of word tokens} \times 1}
\]

This measure has proved useful in a variety of linguistic investigations, most notably in variation studies. Chafe and Danielewicz (1987), for instance, have compared samples of written and spoken English and found that the spoken samples had lower T/t ratios than the written samples, a phenomenon which they attribute to the restrictions of online production.
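As an illustration of the formula, the ratio can be computed along the following lines. This is a minimal sketch and not the procedure used in the paper: the function name `type_token_ratio`, the lowercasing step and the regular-expression tokeniser are all assumptions made for the example, and a different definition of what counts as a word would give different figures.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return the type/token ratio of `text`, expressed as a percentage."""
    # Assumption: a token is a run of letters, optionally followed by an
    # apostrophe and more letters (e.g. "don't"); case is ignored.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    types = set(tokens)  # each distinct word form counts once
    return len(types) * 100 / len(tokens)


sample = "the cat sat on the mat and the dog sat too"
# 8 distinct word forms out of 11 running words
print(round(type_token_ratio(sample), 1))  # prints 72.7
```

The sample sentence has 11 tokens but only 8 types, since "the" occurs three times and "sat" twice, which gives a ratio of about 72.7.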
The necessarily rapid production of spoken language consistently produces a less varied vocabulary. Type-token ratio is also one of the linguistic features in Biber's (1988) multidimensional analysis. Biber finds that a high type-token ratio is associated with a more informational style, especially non-technical informational style, while a low type-token ratio is indexical of a more involved style. Of all the text types Biber investigates, press reviews have the highest type/token ratio and telephone conversations the lowest.

Type/token ratio has also been used in EFL studies as one way among many to investigate lexical richness in learner productions. Linnarud (1975), for instance, used a variety of measures, including the type-token ratio, to assess lexical proficiency in Swedish secondary school pupils and found that the Swedish learners varied their lexis much less than the native speakers. However, investigating type/token ratio in learner corpora presents some specific difficulties which are often disregarded by EFL specialists. A learner corpus may

This note was uploaded on 10/02/2010 for the course EARTH SCIE gasificati taught by Professor Frey during the Summer '10 term at Imperial College.
