        elsif ($count == 0) {
            delete $$tref{$chunk};
        } else {
            die "Decrementing a chunk (\"$chunk\") not in the table";
        }
    }
    return $tref;
}

We would be justified in working fairly hard to make this computation efficient; fortunately, Perl has done a very good job of implementing hash tables, so the basic lookup implied by {$chunk} will do fine for most conceivable applications. Notice that there are no explicit hash objects in the two routines (hash object names would start with "%"). That's a bit startling and not ideal for reading the code, but natural to the use of references in Perl. Whenever we
access or set an element of the hash table, we refer to the scalar element (a leading "$") of the reference to the table, which is itself a scalar, $tref; hence, the prevalent idiom of $$tref{$chunk}. These routines do some error checking, and use the standard Perl die statement. If that sounds a bit drastic in code intended to be used from R, not to worry. The RSPerl interface does a nice job of wrapping the resulting error message and exiting the calling R expression cleanly, with no permanent damage.

8.6 Examples of Text Computations

In this section we examine or re-examine some examples, looking both at R and Perl. Choosing and designing computations for text data involves many tradeoffs. Nearly any example can be treated in multiple ways, more than one of which might be suitable depending on the experience of the programmer and/or the size or detailed characteristics of the application. The examples illustrate some of the choices and the tradeoffs they involve.

Data with repeated values

A common departure from strictly "rectangular" or data-frame-like structure comes when some variables are observed repeatedly, so that the observation is not a single number or quantity, but several repetitions of the same quantity. If the number of repetitions varies from one observation to the next, the data has a list-like structure: in R terminology, each observation is an element in the list consisting of a vector of the values recorded for that observation. Either R or Perl can deal with such data in a simple way. The differences are useful to consider.

To import such data, there must be a way to distinguish the repeated values from other variables. The simplest case diverts the repeated values to a separate file, written one line per set of repeated observations. In Section 8.2, page 296, we showed a computation for this case, based on reading the lines of repeated values as separate strings and then splitting them by calling strsplit().
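The computation from Section 8.2 is not reproduced here, but a minimal sketch of the idea is easy to give. The file name ("repeated.txt") and whitespace-separated layout are assumptions for illustration: the repeated values are read as whole lines and then split into numeric vectors.

    ## read each set of repeated values as one string per line
    lines <- readLines("repeated.txt")    # hypothetical file name
    ## split each line on whitespace and convert the pieces to numeric,
    ## giving a list with one numeric vector per observation
    repeated <- lapply(strsplit(trimws(lines), "[[:space:]]+"), as.numeric)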
Here's an alternative, allowing a more flexible form of data. In this version, successive lines may have different formats, provided each of the lines is interpretable by the scan() function. The lines might come in pairs, with the first line of each pair having non-repeated variables and the second the repeated values. For example, the first line might be data for a state in the United States, with the abbreviation, population, area, and center (as in the state data of the R datasets package). The following line might list data for the largest cities in that state.
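Here is a minimal sketch of that pairwise reading. The file name ("states.txt") and the exact layout are assumptions for illustration: odd-numbered lines hold a state's abbreviation, population, area, and center coordinates, and even-numbered lines hold the populations of its largest cities.

    con <- file("states.txt", open = "r")   # hypothetical file name
    states <- list()
    repeat {
        ## first line of each pair: abbreviation, population, area, center (x, y)
        header <- scan(con, nlines = 1, quiet = TRUE,
                       what = list(abb = "", pop = 0, area = 0, x = 0, y = 0))
        if (length(header$abb) == 0)
            break                            # no more pairs: end of file
        ## second line of the pair: a varying number of repeated values
        cities <- scan(con, what = numeric(), nlines = 1, quiet = TRUE)
        states[[header$abb]] <- c(header, list(cities = cities))
    }
    close(con)

The result is a list with one element per state, each element itself a list containing the single-valued variables and the vector of repeated values. The same pattern extends to any alternating-line layout, as long as each line on its own can be interpreted by scan().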