Ling 110Lg
Lab Study 2: Corpora Studies

General

This is a long handout, so it may look like a lot of work. In fact, it is long because it is very detailed and includes illustrations, so don't worry: we think you will be able to complete the lab in the allotted time without any problems.

One of the primary sources for learning about language has been the study of corpora, that is, the study of existing texts that we can scrutinize for issues involving vocabulary, word order, relations between words, and so on. The study of corpora is important in developing and confirming both morphological and syntactic theories; it is extremely helpful in studying how concepts are organized; it is a crucial tool in the study of language development in children; and it has been enormously useful in developing computer applications for language. In this day and age, with computers everywhere and able to store and search very large amounts of text, the study of corpora has become considerably easier, and hence considerably more valuable for developing and confirming language-related research.

This lab is intended to give you a taste of what can be done with corpus studies. Corpus studies sometimes appear mechanical, but one should never forget that a large part of a scientist's work is simply collecting facts. Admittedly, collecting facts is not nearly as much fun as telling stories about them, but without facts, a scientific theory has no leg to stand on.

In this lab you will learn:
- how to use computer tools to calculate word frequency in texts
- how to use computer tools to compare word frequency in different texts
- how to use computer tools to find collocates in a text

The texts we will be using are approximately one third of the novel Emma by Jane Austen and approximately one third of the novel Sense and Sensibility, also by Jane Austen.
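The lab itself uses WordSmith Tools for these tasks. For readers curious about what such a tool computes, here is a minimal Python sketch of the three operations listed above: a word frequency list, a comparison of relative frequencies across two texts, and collocate counts within a fixed window around a "node" word. This is not part of the assignment and not how WordSmith Tools is implemented; the tokenizer (lowercase letters with internal apostrophes), the per-1,000-words normalization, the default window size, and all function names are illustrative assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a "word" is a run of ASCII letters, optionally with internal
    apostrophes (so "Emma's" stays one token). Real tools may tokenize differently.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def word_frequencies(tokens):
    """Return a frequency list: each distinct word mapped to its raw count."""
    return Counter(tokens)


def relative_frequency(tokens, word, per=1000):
    """Occurrences of `word` per `per` tokens, so texts of different lengths can be compared."""
    if not tokens:
        return 0.0
    return per * tokens.count(word) / len(tokens)


def collocates(tokens, node, window=5):
    """Count the words appearing within `window` tokens to the left or right of each
    occurrence of `node` (the node word itself, at that position, is not counted)."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts


if __name__ == "__main__":
    # Two tiny made-up sample strings standing in for the lab's excerpts.
    text_a = "Emma was handsome, clever, and rich. Emma was happy."
    text_b = "Elinor was sensible; Marianne was not."
    tokens_a, tokens_b = tokenize(text_a), tokenize(text_b)

    # Frequency list: the most common words in the first sample.
    print(word_frequencies(tokens_a).most_common(3))
    # Comparison: "was" per 1,000 words in each sample.
    print(relative_frequency(tokens_a, "was"), relative_frequency(tokens_b, "was"))
    # Collocates of "emma" within one word on either side.
    print(collocates(tokens_a, "emma", window=1))
```

Normalizing to a rate per 1,000 words matters when comparing the two excerpts: they differ in length, so raw counts alone would tend to favor the longer text.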
We have downloaded these texts from the web and shortened them a bit. If you are interested in downloading the texts of books, we recommend Project Gutenberg, which you can find at http://gutenberg.org. Whether you decide to visit that site is up to you, and it is NOT PART OF THIS ASSIGNMENT.

You will be using the computer program WordSmith Tools to analyze text. Follow the instructions on the following pages closely. You can take notes for yourself either on the back of these pages or in a notebook.

WordSmith Tools is not shareware and is only available on USC computers. We therefore strongly advise you to complete the 'computer' part of your lab in the lab. If you do not, you will have to complete it on your own at another USC computer facility. The instruction sheet below specifies which steps can be completed at home and which require WordSmith Tools and therefore cannot be completed at home. ...