Explorations in automated language classification

Eric W. Holman
University of California, Los Angeles

Søren Wichmann
Max Planck Institute for Evolutionary Anthropology & Leiden University

Cecil H. Brown
Northern Illinois University

Viveka Velupillai
Justus-Liebig-Universität Giessen

André Müller
Leipzig University

Dik Bakker
University of Amsterdam & University of Lancaster

0. Introduction

An earlier paper, to which some authors of the present paper have contributed (Brown et al., 2008), describes a method for automating language classification based on the 100-item referent list of Swadesh (1955). Here we discuss a refinement of the method, involving calculation of relative stabilities of list items and reduction of the list to a shorter one by eliminating the least stable items. The result is a 40-item referent list. The method for determining stabilities is explained, as well as a method for comparing the classificatory performance of different-sized reduced lists with that of the full 100-item list. A statistical investigation of the relationship of lexical similarity of languages to their geographical proximity is presented. Finally, we test the possibility that information involving typological features of languages can be combined with lexical data to enhance classificatory accuracy.

1. Summary of Brown et al. (2008)

The earlier paper describes a procedure for automated comparisons of word lists, henceforth ASJP for 'automated similarity judgment program'. The approach on the whole is similar to lexicostatistics (Swadesh 1950, 1955), but differs in two fundamental ways: (1) the judgment of similarities is done by a computer program following a consistent set of rules, and (2) graphic branching structures illustrating language relatedness (family trees) are generated through use of standard software and algorithms originally developed for the use of biologists in studying phylogenetic relationships.

A 100-item Swadesh list is assembled for each language to be compared. Words on all lists are transcribed into a standardized orthography, ASJPcode, which employs only symbols of the standard QWERTY keyboard. ASJPcode has 7 different vowel symbols, merging two or more vowels under a single specific symbol when a language has more than 7 vowel qualities. Nasalized vowels are indicated, but vowel length, tone, and stress are not. The orthography also employs 34 consonant symbols. These symbols are used for phonological segments defined by the most common points and manners of articulation. Rarer segments are represented by symbols for the more common segments they most closely resemble in terms of point and manner of articulation.

There are also modifiers to indicate that a single segment is composed of the sounds corresponding to two symbols (and, occasionally, three), typically in the cases of labialization, aspiration, and palatalization. Other modifiers indicate glottalization and nasalization. Word-initial glottal stops are not recorded and
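
The list reduction outlined in the Introduction (rank the items by stability, then drop the least stable ones) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name reduce_list and the stability scores are invented here, and nothing in the text above specifies how stabilities are computed or which items survive in the 40-item list.

```python
from typing import Dict, List


def reduce_list(stabilities: Dict[str, float], keep: int) -> List[str]:
    """Return the `keep` items with the highest stability, most stable first.

    Ties are broken alphabetically so the result is deterministic.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    ranked = sorted(stabilities, key=lambda item: (-stabilities[item], item))
    return ranked[:keep]


# Hypothetical scores for five Swadesh-list items; NOT values from the paper.
example_stabilities = {
    "louse": 0.9,
    "two": 0.8,
    "water": 0.7,
    "cloud": 0.2,
    "bark": 0.1,
}

# Keep the three most stable items from the toy list.
shortlist = reduce_list(example_stabilities, keep=3)
```

With real data, the same call with keep=40 on 100 scored items would yield a shortened list of the kind described, provided a stability score is available for every item.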
