P94-1020

Word-Sense Disambiguation Using Decomposable Models

Rebecca Bruce and Janyce Wiebe
Computing Research Lab and Department of Computer Science
New Mexico State University
Las Cruces, NM 88003
rbruce@cs.nmsu.edu, wiebe@cs.nmsu.edu

Abstract

Most probabilistic classifiers used for word-sense disambiguation have either been based on only one contextual feature or have used a model that is simply assumed to characterize the interdependencies among multiple contextual features. In this paper, a different approach to formulating a probabilistic model is presented along with a case study of the performance of models produced in this manner for the disambiguation of the noun interest. We describe a method for formulating probabilistic models that use multiple contextual features for word-sense disambiguation, without requiring untested assumptions regarding the form of the model. Using this approach, the joint distribution of all variables is described by only the most systematic variable interactions, thereby limiting the number of parameters to be estimated, supporting computational efficiency, and providing an understanding of the data.

Introduction

This paper presents a method for constructing probabilistic classifiers for word-sense disambiguation that offers advantages over previous approaches. Most previous efforts have not attempted to systematically identify the interdependencies among contextual features (such as collocations) that can be used to classify the meaning of an ambiguous word. Many researchers have performed disambiguation on the basis of only a single feature, while others who do consider multiple contextual features assume that all contextual features are either conditionally independent given the sense of the word or fully independent. Of course, all contextual features could be treated as interdependent, but, if there are several features, such a model could have too many parameters to estimate in practice.

We present a method for formulating probabilistic models that describe the relationships among all variables in terms of only the most important interdependencies, that is, models of a certain class that are good approximations to the joint distribution of contextual features and word meanings. This class is the set of decomposable models: models that can be expressed as a product of marginal distributions, where each marginal is composed of interdependent variables. The test used to evaluate a model gives preference to those that have the fewest interdependencies, thereby selecting models expressing only the most systematic variable interactions.

To summarize the method, one first identifies informative contextual features (where "informative" is a well-defined notion, discussed in Section 2). Then, out of all possible decomposable models characterizing interdependency relationships among the selected variables, those that are found to produce good approximations to the data are identified (using the test mentioned above) and one of those models is used to perform disambiguation. Thus, we are able to use multiple contextual features without the need for untested assumptions regarding the form of the model. Further, approximating the joint distribution of all variables with a model identifying only the most important systematic interactions among variables limits the number of parameters to be estimated, supports computational efficiency, and provides an understanding of the data.

The biggest limitation associated with this method is the need for large amounts of sense-tagged data. Because asymptotic distributions of the test statistics are used, the validity of the results obtained using this approach is compromised when it is applied to sparse data (this point is discussed further in Section 2).

To test the method of model selection presented in this paper, a case study of the disambiguation of the noun interest was performed.
Interest was selected because it has been shown in previous studies to be a difficult word to disambiguate. We selected as the set of sense tags all non-idiomatic noun senses of interest defined in the electronic version of Longman's Dictionary of Contemporary English (LDOCE) ([23]). Using the models produced in this study, we are able to assign an LDOCE sense tag to every usage of interest in a held-out test set with 78% accuracy. Although it is difficult to compare our results to those reported for previous disambiguation experiments, as will be discussed later, we feel these results are encouraging.

The remainder of the paper is organized as follows. Section 2 provides a more complete definition of the methodology used for formulating decomposable models and Section 3 describes the details of the case study performed to test the approach. The results of the disambiguation case study are discussed and contrasted with similar efforts in Sections 4 and 5. Section 6 is the conclusion.

Decomposable Models

In this Section, we address the problem of finding the models that generate good approximations to a given discrete probability distribution, as selected from among the class of decomposable models. Decomposable models are a subclass of log-linear models and, as such, can be used to characterize and study the structure of data ([2]), that is, the interactions among variables as evidenced by the frequency with which the values of the variables co-occur.

Given a data sample of objects, where each object is described by d discrete variables, let $x = (x_1, x_2, \ldots, x_q)$ be a q-dimensional vector of counts, where each $x_i$ is the frequency with which one of the possible combinations of the values of the d variables occurs in the data sample (and the frequencies of all such possible combinations are included in x). The log-linear model expresses the logarithm of E[x] (the mean of x) as a linear sum of the contributions of the "effects" of the variables and the interactions among the variables.

Assume that a random sample consisting of N independent and identical trials (i.e., all trials are described by the same probability density function) is drawn from a discrete d-variate distribution. In such a situation, the outcome of each trial must be an event corresponding to a particular combination of the values of the d variables. Let $p_i$ be the probability that the i-th event (i.e., the i-th possible combination of the values of all variables) occurs on any trial and let $x_i$ be the number of times that the i-th event occurs in the random sample. Then $(x_1, x_2, \ldots, x_q)$ has a multinomial distribution with parameters N and $p_1, \ldots, p_q$. For a given sample size, N, the likelihood of selecting any particular random sample is defined once the population parameters, that is, the $p_i$'s or, equivalently, the $E[x_i]$'s (where $E[x_i]$ is the mean frequency of event i), are known. Log-linear models express the value of the logarithm of each $E[x_i]$ or $p_i$ as a linear sum of a smaller (i.e., less than q) number of new population parameters that characterize the effects of individual variables and their interactions.

The theory of log-linear models specifies the sufficient statistics (functions of x) for estimating the effects of each variable and of each interaction among variables on E[x]. The sufficient statistics are the sample counts from the highest-order marginals composed of only interdependent variables. These statistics are the maximum likelihood estimates of the mean values of the corresponding marginal distributions.

Consider, for example, a random sample taken from a population in which four contextual features are used to characterize each occurrence of an ambiguous word. The sufficient statistics for the model describing contextual features one and two as independent but all other variables as interdependent are, for all i, j, k, m, n (in this and all subsequent equations, f is an abbreviation for feature):

\[
\hat{E}[\mathrm{count}(f_2 = j, f_3 = k, f_4 = m, tag = n)] = \sum_{i} x_{f_1 = i,\, f_2 = j,\, f_3 = k,\, f_4 = m,\, tag = n}
\]

and

\[
\hat{E}[\mathrm{count}(f_1 = i, f_3 = k, f_4 = m, tag = n)] = \sum_{j} x_{f_1 = i,\, f_2 = j,\, f_3 = k,\, f_4 = m,\, tag = n}
\]

Within the class of decomposable models, the maximum likelihood estimate for E[x] reduces to the product of the sufficient statistics divided by the sample counts defined in the marginals composed of the common elements in the sufficient statistics. As such, decomposable models are models that can be expressed as a product of marginals (footnote 1), where each marginal consists of only interdependent variables. Returning to our previous example, the maximum likelihood estimate for E[x] is, for all i, j, k, m, n:

\[
\hat{E}[x_{f_1 = i,\, f_2 = j,\, f_3 = k,\, f_4 = m,\, tag = n}] =
\frac{\hat{E}[\mathrm{count}(f_1 = i, f_3 = k, f_4 = m, tag = n)] \times \hat{E}[\mathrm{count}(f_2 = j, f_3 = k, f_4 = m, tag = n)]}
     {\hat{E}[\mathrm{count}(f_3 = k, f_4 = m, tag = n)]}
\]

Expressing the population parameters as probabilities instead of expected counts, the equation above can be rewritten as follows, where the sample marginal relative frequencies are the maximum likelihood estimates of the population marginal probabilities. For all i, j, k, m, n:

\[
P(f_1 = i, f_2 = j, f_3 = k, f_4 = m, tag = n) =
P(f_1 = i \mid f_3 = k, f_4 = m, tag = n) \times
P(f_2 = j \mid f_3 = k, f_4 = m, tag = n) \times
P(f_3 = k, f_4 = m, tag = n)
\]

Footnote 1: The marginal distributions can be represented in terms of counts or relative frequencies, depending on whether the parameters are expressed as expected frequencies or probabilities, respectively.

The degree to which the data is approximated by a model is called the fit of the model. In this work, the likelihood ratio statistic, $G^2$, is used as the measure of the goodness-of-fit of a model. It is distributed asymptotically as $\chi^2$ with degrees of freedom corresponding to the number of interactions (and/or variables) omitted from (unconstrained in) the model. Assessing the fit
of a model in terms of the significance of its $G^2$ statistic gives preference to models with the fewest interdependencies, thereby assuring the selection of a model specifying only the most systematic variable interactions.

Within the framework described above, the process of model selection becomes one of hypothesis testing, where each pattern of dependencies among variables expressible in terms of a decomposable model is postulated as a hypothetical model and its fit to the data is evaluated. The "best fitting" models are identified, in the sense that the significance of their reference $\chi^2$ values is large, and, from among this set, a conceptually appealing model is chosen. The exhaustive search of decomposable models can be conducted as described in [12].

What we have just described is a method for approximating the joint distribution of all variables with a model containing only the most important systematic interactions among variables. This approach to model formulation limits the number of parameters to be estimated, supports computational efficiency, and provides an understanding of the data. The single biggest limitation remaining in this day of large-memory, high-speed computers results from reliance on asymptotic theory to describe the distribution of the maximum likelihood estimates and the likelihood ratio statistic. The effect of this reliance is felt most acutely when working with large sparse multinomials, which is exactly when this approach to model construction is most needed.

When the data is sparse, the usual asymptotic properties of the distribution of the likelihood ratio statistic and the maximum likelihood estimates may not hold. In such cases, the fit of the model will appear to be too good, indicating that the model is in fact overconstrained for the data available.
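As a rough illustration of the fitting and testing steps described in this Section, the following Python sketch fits a reduced version of the running example and computes its likelihood ratio statistic. It is a simplification under stated assumptions, not the authors' implementation: only two features and the sense tag are used (so the common marginal is count(tag) rather than count(f3, f4, tag)), the function name and the toy data format are hypothetical, and the degrees-of-freedom formula assumes that every feature value occurs with every tag.

```python
import math
from collections import Counter


def fit_conditional_independence(samples):
    """Fit the decomposable model 'f1 independent of f2 given tag'.

    samples: list of (f1, f2, tag) tuples (hypothetical toy format).
    Returns (expected, g2, df), where expected maps each cell to its
    maximum likelihood expected count under the model.
    """
    joint = Counter(samples)                            # observed cell counts x
    m1 = Counter((f1, tag) for f1, _, tag in samples)   # marginal count(f1, tag)
    m2 = Counter((f2, tag) for _, f2, tag in samples)   # marginal count(f2, tag)
    mt = Counter(tag for _, _, tag in samples)          # common marginal count(tag)

    # Product of the sufficient statistics divided by the common marginal.
    expected = {
        (f1, f2, tag): m1[(f1, tag)] * m2[(f2, tag2)] / mt[tag]
        for (f1, tag) in m1
        for (f2, tag2) in m2
        if tag == tag2
    }

    # Likelihood ratio statistic G^2 = 2 * sum(x * ln(x / E[x]));
    # empty cells contribute nothing to the sum.
    g2 = 2.0 * sum(
        obs * math.log(obs / expected[cell]) for cell, obs in joint.items()
    )

    # Degrees of freedom for this model, assuming a complete table.
    n1 = len({f1 for f1, _, _ in samples})
    n2 = len({f2 for _, f2, _ in samples})
    df = len(mt) * (n1 - 1) * (n2 - 1)
    return expected, g2, df
```

The significance of the resulting value would then be read from a chi-square distribution with df degrees of freedom; that lookup is left out here to keep the sketch free of external libraries.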
In this work, we have limited ourselves to considering only those models with sufficient statistics that are not sparse, where the significance of the reference $\chi^2$ is not unreasonable; most such models have sufficient statistics that are lower-order marginal distributions. In the future, we will investigate other goodness-of-fit tests ([18], [1], [22]) that are perhaps more appropriate for sparse data.

The Experiment

Unlike several previous approaches to word-sense disambiguation ([29], [5], [7], [10]), nothing in this approach limits the selection of sense tags to a particular number or type of meaning distinctions. In this study, our goal was to address a non-trivial case of ambiguity, but one that would allow some comparison of results with previous work. As a result of these considerations, the word interest was chosen as a test case, and the six non-idiomatic noun senses of interest defined in LDOCE were selected as the tag set.

The only restriction limiting the choice of corpus is the need for large amounts of on-line data. Due to availability, the Penn Treebank Wall Street Journal corpus was selected. In total, 2,476 usages (footnote 2) of interest as a noun (footnote 3) were automatically extracted from the corpus and manually assigned sense tags corresponding to the LDOCE definitions. During tagging, 107 usages were removed from the data set due to the authors' inability to classify them in terms of the set of LDOCE senses. Of the rejected usages, 43 are metonymic, and the rest are hybrid meanings specific to the domain, such as public interest group.

Because our sense distinctions are not merely between two or three clearly defined core senses of a word, the task of hand-tagging the tokens of interest required subtle judgments, a point that has also been observed by other researchers disambiguating with respect to the full set of LDOCE senses ([6], [28]).
Although this undoubtedly degraded the accuracy of the manually assigned sense tags (and thus the accuracy of the study as well), this problem seems unavoidable when making semantic distinctions beyond clearly defined core senses of a word ([17], [11], [14], [15]).

Of the 2,369 sentences containing the sense-tagged usages of interest, 600 were randomly selected and set aside to serve as the test set. The distribution of sense tags in the data set is presented in Table 1.

We now turn to the selection of individually informative contextual features. In our approach to disambiguation, a contextual feature is judged to be informative (i.e., correlated with the sense tag of the ambiguous word) if the model for independence between that feature and the sense tag is judged to have an extremely poor fit using the test described in Section 2. The worse the fit, the more informative the feature is judged to be (similar to the approach suggested in [9]). Only features whose values can be automatically determined were considered, and preference was given to features that intuitively are not specific to interest (but see the discussion of collocational features below). An additional criterion was that the features not have too many possible values, in order to curtail sparsity in the resulting data matrix.

We considered three different types of contextual features: morphological, collocation-specific, and class-based, with part-of-speech (POS) categories serving as the word classes. Within these classes, we chose a number of specific features, each of which was judged to be informative as described above. We used one morphological feature: a dichotomous variable indicating the presence or absence of the plural form. The values of the class-based variables are a set of twenty-five POS tags formed, with one exception, from the first letter of the tags used in the Penn Treebank corpus. Two different sets of class-based variables were selected.
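The informativeness criterion used to select these features can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the dictionary-based data layout, and the toy feature names are hypothetical, and features are ranked here by the raw $G^2$ of the independence model, whereas the judgment in the text rests on the significance of the fit, which also depends on the degrees of freedom.

```python
import math
from collections import Counter


def g2_independence(pairs):
    """G^2 for the model 'feature independent of tag'.

    pairs: list of (feature_value, tag) observations.
    Under independence, E[x] = count(feature) * count(tag) / N.
    """
    n = len(pairs)
    joint = Counter(pairs)
    by_feature = Counter(value for value, _ in pairs)
    by_tag = Counter(tag for _, tag in pairs)
    return 2.0 * sum(
        obs * math.log(obs * n / (by_feature[value] * by_tag[tag]))
        for (value, tag), obs in joint.items()
    )


def rank_features(rows, tags):
    """Rank candidate features from worst to best fit of independence.

    rows: list of dicts mapping feature name to value (one per usage).
    tags: list of sense tags aligned with rows.
    A larger G^2 means a poorer fit, i.e. a more informative feature.
    """
    scores = {
        name: g2_independence([(row[name], tag) for row, tag in zip(rows, tags)])
        for name in rows[0]
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Given rows describing each usage and the aligned list of sense tags, rank_features(rows, tags) returns (name, score) pairs with the candidate that departs most from independence first.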
The first set of class-based variables contained only the POS tags of the word immediately preceding and the word immediately succeeding the ambiguous word, while the second set was extended to include the POS tags of the two immediately preceding and two succeeding words.

A limited number of collocation-specific variables were selected, where the term collocation is used loosely to refer to a specific spelling form occurring in the same sentence as the ambiguous word. All of our collocational variables are dichotomous, indicating the presence or absence of the associated spelling form. While collocation-specific variables are, by definition, specific to the word being disambiguated, the procedure used to select them is general. The search for collocation-specific variables was limited to the 400 most frequent spelling forms in a data sample composed of sentences containing interest. Out of these 400, the five spelling forms found to be the most informative using the test described above were selected as the collocational variables.

It is not enough to know that each of the features described above is highly correlated with the meaning of the ambiguous word. In order to use the features in concert to perform disambiguation, a model describing the interactions among them is needed. Since we had no reason to prefer, a priori, one form of model over another, all models describing possible interactions among the features were generated, and a model with good fit was selected. Models were generated and tested as described in Section 2.

Footnote 2: For sentences with more than one usage, the tool used to automatically extract the test data ignored all but one of them. Thus, some usages were missed.

Footnote 3: The Penn Treebank corpus comes complete with POS tags.

Results

Both the form and the performance of the model selected for each set of variables are presented in Table 2. Performance is measured in terms of the total percentage of the test set tagged correctly by a classifier using the specified model. This measure combines both precision and recall. Portions of the test set that are not covered by the estimates of the parameters made from the training set are not tagged and, therefore, counted as wrong.

The form of the model describes the interactions among the variables by expressing the joint distribution of the values of all contextual features and sense tags as a product of conditionally independent marginals, with each marginal being composed of non-independent variables. Models of this form describe a Markov field ([8], [21]) that can be represented graphically as is shown in Figure 1 for Model 4 of Table 2. In both Figures 1 and 2, each of the variables short, in, pursue, rate(s), percent (i.e., the sign '%') is the presence or absence of that spelling form. Each of the variables r1pos, r2pos, l1pos, and l2pos is the POS tag of the word 1 or 2 positions to the left (l) or right (r). The variable ending is whether interest is in the singular or plural, and the variable tag is the sense tag assigned to interest.

The graphical representation of Model 4 is such that there is a one-to-one correspondence between the nodes of the graph and the sets of conditionally independent variables in the model. The semantics of the graph topology is that all variables that are not directly connected in the graph are conditionally independent given the values of the variables mapping to the connecting nodes. For example, if node a separates node b from node c in the graphical representation of a Markov field, then the variables mapping to node b are conditionally independent of the variables mapping to node c given the values of the variables mapping to node a. In the case of Model 4, Figure 1 graphically depicts the fact that the value of the morphological variable ending is conditionally independent of the values of all other contextual features given the sense tag of the ambiguous word.

[Figure 1: Markov field (undirected graph) representation of Model 4]

The Markov field depicted in Figure 1 is represented by an undirected graph because conditional independence is a symmetric relationship. But decomposable models can also be characterized by directed graphs and interpreted according to the semantics of a Bayesian network ([21]; also described as "recursive causal models" in [27] and [16]). In a Bayesian network, the notions of causation and influence replace the notion of conditional independence in a Markov field. The parents of a variable (or set of variables) V are those variables judged to be the direct causes or to have direct influence on the value of V; V is called a "response" to those causes or influences. The Bayesian network representation of a decomposable model embodies an explicit ordering of the n variables in the model such that variable i may be considered a response to some or all of variables {i + 1, ..., n}, but is not thought of as a response to any one of the variables {1, ..., i - 1}. In all models presented in this paper, the sense tag of the ambiguous word causes or influences the values of all other variables in the model. The Bayesian network representation of Model 4 is presented in Figure 2. In Model 4, the variables in and percent are treated as influencing the values of rate, short, and pursue in order to achieve an ordering of variables as described above.

[Figure 2: Bayesian network (directed graph) representation of Model 4]

Comparison to Previous Work

Many researchers have avoided characterizing the interactions among multiple contextual features by considering only one feature in determining the sense of an ambiguous word. Techniques for identifying the optimum feature to use in disambiguating a word are presented in [7], [30] and [5].
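By contrast with such single-feature techniques, a decomposable model of the kind described in the Results section combines several features when assigning a sense tag. The Python sketch below illustrates this for a deliberately small, hypothetical model, not for Model 4 itself (whose full form is given in Table 2): the variable ending is conditionally independent of the pair (l1pos, r1pos) given tag, while l1pos and r1pos are left interdependent. The class name, the training tuple layout, and the tag and POS labels used with it are assumptions made for illustration. As in the evaluation described above, a usage whose feature combination is not covered by the training counts is left untagged.

```python
from collections import Counter


class DecomposableClassifier:
    """Toy classifier for the model [ending, tag][l1pos, r1pos, tag]."""

    def __init__(self, training):
        # training: iterable of (ending, l1pos, r1pos, tag) tuples.
        self.count_ending = Counter()   # marginal count(ending, tag)
        self.count_pos = Counter()      # marginal count(l1pos, r1pos, tag)
        self.count_tag = Counter()      # common marginal count(tag)
        for ending, l1pos, r1pos, tag in training:
            self.count_ending[(ending, tag)] += 1
            self.count_pos[(l1pos, r1pos, tag)] += 1
            self.count_tag[tag] += 1

    def score(self, ending, l1pos, r1pos, tag):
        # Product of the marginals divided by their common marginal,
        # i.e. the estimated expected count of this cell.
        return (
            self.count_ending[(ending, tag)]
            * self.count_pos[(l1pos, r1pos, tag)]
            / self.count_tag[tag]
        )

    def classify(self, ending, l1pos, r1pos):
        # Pick the tag with the largest estimated count; return None
        # (untagged) when no tag has a non-zero estimate.
        best_tag, best_score = None, 0.0
        for tag in self.count_tag:
            current = self.score(ending, l1pos, r1pos, tag)
            if current > best_score:
                best_tag, best_score = tag, current
        return best_tag
```

Because the estimated joint count differs from the estimated joint probability only by the constant factor N, choosing the tag with the largest estimated count is equivalent to choosing the most probable tag under the model.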
Other works consider multiple contextual features in performing disambiguation without formally characterizing the relationships among the features. The majority of these efforts ([13], [31]) weight each feature in predicting the sense of an ambiguous word in accordance with frequency information, without considering the extent to which the features co-occur with one another. Gale, Church and Yarowsky ([10]) and Yarowsky ([29]) formally characterize the interactions that they consider in their model, but they simply assume that their model fits the data.

Other researchers have proposed approaches to systematically combining information from multiple contextual features in determining the sense of an ambiguous word. Schütze ([26]) derived contextual features from a singular value decomposition of a matrix of letter four-gram co-occurrence frequencies, thereby assuring the independence of all features. Unfortunately, interpreting a contextual feature that is a weighted combination of letter four-grams is difficult. Further, the clustering procedure used to assign word meaning based on these features is such that the resulting sense clusters do not have known statistical properties. This makes it impossible to generalize the results to other data sets.

Black ([3]) used decision trees ([4]) to define the relationships among a number of pre-specified contextual features, which he called "contextual categories", and the sense tags of an ambiguous word. The tree construction process used by Black partitions the data according to the values of one contextual feature before considering the values of the next, thereby treating all features incorporated in the tree as interdependent. The method presented here for using information from multiple contextual features is more flexible and makes better use of a small data set by eliminating the need to treat all features as interdependent.

The work that bears the closest resemblance to the work presented here is the maximum entropy approach to developing language models ([24], [25], [19] and [20]). Although this approach has not been applied to word-sense disambiguation, there is a strong similarity between that method of model formulation and our own. A maximum entropy model for multivariate data is the likelihood function with the highest entropy that satisfies a pre-defined set of linear constraints on the underlying probability estimates. The constraints describe interactions among variables by specifying the expected frequency with which the values of the constrained variables co-occur. When the expected frequencies specified in the constraints are linear combinations of the observed frequencies in the training data, the resulting maximum entropy model is equivalent to a maximum likelihood model, which is the type of model used here.

To date, in the area of natural language processing, the principles underlying the formulation of maximum entropy models have been used only to estimate the parameters of a model. Although the method described in this paper for finding a good approximation to the joint distribution of a set of discrete variables makes use of maximum likelihood models, the scope of the technique we are describing extends beyond parameter estimation to include selecting the form of the model that approximates the joint distribution.

Several of the studies mentioned in this Section have used interest as a test case, and all of them (with the exception of Schütze [26]) considered four possible meanings for that word. In order to facilitate comparison of our work with previous studies, we re-estimated the parameters of our best model and tested it using data containing only the four LDOCE senses corresponding to those used by others (usages not tagged as being one of these four senses were removed from both the test and training data sets).
The results of the modified experiment along with a summary of the published results of previous studies are presented in Table 3. While it is true that all of the studies reported in Table 3 used four senses of interest, it is not clear that any of the other experimental parameters were held constant in all studies. Therefore, this comparison is only suggestive. In order to facilitate more meaningful comparisons in the future, we are donating the data used in this experiment to the Consortium for Lexical Research (ftp site: clr.nmsu.edu) where it will be available to all interested parties.

Conclusions and Future Work

In this paper, we presented a method for formulating probabilistic models that use multiple contextual features for word-sense disambiguation without requiring untested assumptions regarding the form of the model. In this approach, the joint distribution of all variables is described by only the most systematic variable interactions, thereby limiting the number of parameters to be estimated, supporting computational efficiency, and providing an understanding of the data. Further, different types of variables, such as class-based and collocation-specific ones, can be used in combination with one another. We also presented the results of a study testing this approach. The results suggest that the models produced in this study perform as well as or better than previous efforts on a difficult test case.

We are investigating several extensions to this work. In order to reasonably consider doing large-scale word-sense disambiguation, it is necessary to eliminate the need for large amounts of manually sense-tagged data. In the future, we hope to develop a parametric model or models applicable to a wide range of content words and to estimate the parameters of those models from untagged data. To those ends, we are currently investigating a means of obtaining maximum likelihood estimates of the parameters of decomposable models from untagged data.
The procedure we are using is a variant of the EM algorithm that is specific to models of the form produced in this study. Preliminary results are mixed, with performance being reasonably good on models with low-order marginals (e.g., 63% of the test set was tagged correctly with Model 1 using parameters estimated in this manner) but poorer on models with higher-order marginals, such as Model 4. Work is needed to identify and constrain the parameters that cannot be estimated from the available data and to determine the amount of data needed for this procedure.

We also hope to integrate probabilistic disambiguation models, of the type described in this paper, with a constraint-based knowledge base such as WordNet. In the past, there have been two types of approaches to word-sense disambiguation: 1) a probabilistic approach such as that described here, which bases the choice of sense tag on the observed joint distribution of the tags and contextual features, and 2) a symbolic knowledge-based approach that postulates some kind of relational or constraint structure among the words to be tagged. We hope to combine these methodologies and thereby derive the benefits of both. Our approach to combining these two paradigms hinges on the network representations of our probabilistic models as described in Section 4 and will make use of the methods presented in [21].

Acknowledgements

The authors would like to thank Gerald Rogers for sharing his expertise in statistics, Ted Dunning for advice and support on software development, and the members of the NLP group in the CRL for helpful discussions.

References

[1] Baglivo,...

Find millions of documents on Course Hero - Study Guides, Lecture Notes, Reference Materials, Practice Exams and more. Course Hero has millions of course specific materials providing students with the best way to expand their education.

Below is a small sample set of documents:

UPenn - P - 06
Segment-based Hidden Markov Models for Information ExtractionZhenmei Gu Nick Cercone David R. Cheriton School of Computer Science Faculty of Computer Science University of Waterloo Dalhousie University Waterloo, Ontario, Canada N2l 3G1 Halifax, Nova
UPenn - FNACT - 99
UNIFORM STATUTE AND RULE CONSTRUCTION ACT (1995) Drafted by the NATIONAL CONFERENCE OF COMMISSIONERS
Wisconsin - CASE - 19
Jan 5, 2003 McMurdo Station 00 UTC140012001000800 Theta 6004002000 268269270271272273274275276277278
Wisconsin - CASE - 7
Dec 18, 2000 McMurdo Station 00 UTC1600140012001000800T Td6004002000 -25 -20 -15 -10 -5 0 5 10 15 20
Wisconsin - CASE - 11
Jan 20, 2002 McMurdo Station 00 UTC140012001000800600T Td U V SPD4002000 -30 -25 -20 -15 -10 -5 0 5 10 15
Wisconsin - CASE - 15
Oct 11, 2002 McMurdo Station 12 UTC140012001000800600T Td U V SPD4002000 -35 -25 -15 -5 5 15
Wisconsin - CASE - 11
Jan 16, 2002 McMurdo Station 12 UTC140012001000800600T Td U V SPD4002000 -30 -25 -20 -15 -10 -5 0 5 10 15
Wisconsin - CASE - 15
Oct 12, 2002 McMurdo Station 12 UTC140012001000800600T Td U V SPD4002000 -35 -25 -15 -5 5 15
Wisconsin - CASE - 16
Oct 28, 2002 McMurdo Station 00 UTC140012001000800600T Td U V SPD4002000 -35 -25 -15 -5 5 15 25
Wisconsin - CASE - 21
Feb 3, 2003 McMurdo Station 00 UTC140012001000800600T Td U V SPD4002000 -25 -20 -15 -10 -5 0 5 10 15 20
Wisconsin - CASE - 18
Dec 30, 2002 McMurdo Station 00 UTC140012001000800600T Td U V SPD4002000 -20 -15 -10 -5 0 5 10
Wisconsin - CASE - 30
Feb 17, 2005 McMurdo Station 12 UTC1000 900 800 700 600 500 400 300 200 100 0 260 Theta262264266268270272
Wisconsin - CASE - 21
Feb 3, 2003 McMurdo Station 12 UTC140012001000800 Theta 6004002000 260262264266268270272274
Wisconsin - CASE - 4
Oct 29, 2000 McMurdo Station 12 UTC140012001000800600T Td U V SPD4002000 -35 -30 -25 -20 -15 -10 -5 0 5 10 15
Wisconsin - CASE - 16
Oct 28, 2002 McMurdo Station 00 UTC140012001000800 Theta 6004002000 255256257258259260261262263
Wisconsin - CASE - 17
Dec 7, 2002 McMurdo Station 00 UTC140012001000800600T Td U V SPD4002000 -30 -25 -20 -15 -10 -5 0 5
Wisconsin - CASE - 24
Jan 12, 2004 McMurdo Station 00 UTC (plot: Theta)
Wisconsin - CASE - 25
McMurdo Station Jan 17, 2004 00 UTC (plot: T, Td, U, V, SPD)
Wisconsin - CASE - 14
Oct 5, 2002 McMurdo Station 00 UTC (plot: Theta)
Wisconsin - CASE - 18
Dec 30, 2002 McMurdo Station 12 UTC (plot: Theta)
Wisconsin - CASE - 6
Nov 29, 2000 McMurdo Station 00 UTC (plot: Theta)
Wisconsin - CASE - 29
Feb 4, 2005 McMurdo Station 12 UTC (plot: T, Td, U, V, SPD)
Wisconsin - CASE - 30
Feb 18, 2005 McMurdo Station 12 UTC (plot: T, Td, U, V, SPD)
Wisconsin - CASE - 18
Dec 29, 2002 McMurdo Station 00 UTC (plot: T, Td, U, V, SPD)
Wisconsin - CASE - 25
McMurdo Station Jan 18, 2004 00 UTC (plot: T, Td, U, V, SPD)
Wisconsin - CASE - 6
P      H     DelZ  Richardson  Theta    T     Td    U    V     AveT  DelT  RiT  RiB  SPD   DIR  DelV  DelU
2.8    32.9  0.0   0.0000      1443.65  -4.4  -5.4  8.7  -7.3  0.0   0.0   0.0  0.0  11.3  130  0.0   0.0
994.6
Wisconsin - CASE - 1
There is Oct 4 0 UTC data in the Oct 5 0 UTC file. There appears to be no data for Oct 5 at 0 UTC. M. Lazzara Feb, 2007
Wisconsin - CASE - 6
P      H     DelZ  Richardson  Theta    T     Td    U    V    AveT  DelT  RiT  RiB  SPD  DIR  DelV  DelU
5.6    32.9  0.0   0.0000      1187.12  -3.7  -4.2  4.8  4.0  0.0   0.0   0.0  0.0  6.2  50   0.0   0.0
0.0
Wisconsin - CASE - 16
P      H     DelZ  Richardson  Theta   T      Td     U    V    AveT  DelT  RiT  RiB  SPD  DIR  DelV  DelU
988.1  32.9  0.0   0.0000      262.05  -12.0  -18.6  0.0  0.0  0.0   0.0   0.0  0.0  0.0  0    0.0   0.0
980.9
Wisconsin - CASE - 21
P      H     DelZ  Richardson  Theta   T     Td   U    V    AveT  DelT  RiT  RiB  SPD  DIR  DelV  DelU
985.1  32.9  0.0   0.0000      285.47  11.1  7.0  9.3  0.0  0.0   0.0   0.0  0.0  9.3  90   0.0   0.0
975.8
Wisconsin - CASE - 7
P      H      DelZ  Theta    T     Td     AveT   DelT  RiT
7.0    32.9   0.0   1112.49  -4.0  -17.2  0.0    0.0   0.0
995.8  121.9  89.0  268.47   -5.0  -15.7  690.5  *     *
988.6  178.9  57.0  268.63   -5.4  -16.0  268
UPenn - EMTM - 900
Search, Search Mechanisms and Price Discrimination in Electronic Markets Ravi Aron Search In the beginning There were no brands or so the economists would have us believe. If two stores that sold exactly the same products were located next to each
Wisconsin - ENGR - 476
Task and Scheduling Enhancement at UWHC Internal Medicine Clinics Andrew Forecki Melinawati Tedjo San Phanphiphat Curtis Landry Current Progress Met with clinic managers to discuss the goals of observation Completed 8/10 observation trips (5 clin
Wisconsin - PC - 2002
Distributed Network Monitoring in the Wisconsin Advanced Internet Lab Paul Barford Computer Science Department University of Wisconsin Madison Spring, 2002 Motivation Many applications that run over the Internet have minimum performance requiremen
Wisconsin - PCW - 2008
Grid Computing at The Hartford Condor Week 2008 Robert Nordlund robert.nordlund@hartfordlife.com About The Hartford Headquartered in Hartford, CT Founded in 1810 Fortune 100 31,000 Employees Worldwide $26.5 Billion Revenues $2.9 Billion Core
Wisconsin - CONDORWEEK - 2005
Managing Storage with NeST Nick LeRoy & Jeff Weber Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu http://www.cs.wisc.edu/condor/nest Overview of NeST NeST: Network Storage Technology L
UPenn - C - 92
The Typology of Unknown Words: An Experimental Study of Two Corpora Xiaobo Ren and Francois Perrault xren@ccrit.doc.ca, perrault@ccrit.doc.ca CCRIT, Communications Canada, 1575 Chomedey Blvd, Laval, Québec, Canada, H7V 2X2 Table of contents Introdu
UPenn - T - 87
They say it's a new sort of engine: but the SUMP's still there Karen Sparck Jones Computer Laboratory, University of Cambridge Corn Exchange Street, Cambridge CB2 3QG, England sparckjones%cl.cam.ac.uk@cs.ucl.ac.uk I shall lump the specific semantic f
UPenn - P - 84
THE SYNTAX AND SEMANTICS OF USER-DEFINED MODIFIERS IN A TRANSPORTABLE NATURAL LANGUAGE PROCESSOR Bruce W. Ballard Dept. of Computer Science Duke University Durham, N.C. 27708 ABSTRACT The Layered Domain Class system (LDC) is an e
UPenn - J - 01
Book Reviews Knowledge Representation: Logical, Philosophical, and Computational Foundations John F. Sowa Pacific Grove, CA: Brooks/Cole, 2000, xiv+594 pp; hardbound, ISBN 0-534-94965-7, $67.95 Reviewed by Stuart C. Shapiro University at Buffalo, The
UPenn - C - 00
A Statistical Theory of Dependency Syntax Christer Samuelsson Xerox Research Centre Europe 6, chemin de Maupertuis 38240 Meylan, FRANCE Christer.Samuelsson@xrce.xerox.com Abstract A gene
UPenn - CSE - 381
CIS381 Tutorial #2 Shells, Redirection, and Job Control* *With a little Signal Info thrown in for good measure. Brought to you in living color by your friendly co-instructors: Sandy Clark and Micah Sherr September 16, 2008 Topics Advanced Process R
UPenn - CSE - 330
Relational Query Optimization Susan B. Davidson University of Pennsylvania CIS330 Database Management Systems November 20, 2008 Slide content courtesy of Raghu Ramakrishnan. Highlights of System R Optimizer Impact: Most widely used currently;
UPenn - C - 82
COLING 82, J. Horecký (ed.) North-Holland Publishing Company © Academia 1982 Towards a mechanical analysis of French tense forms in texts.1) Christian Rohrer Universität Stuttgart In this paper we want to present a sys
UPenn - C - 82
THE TRANSFER OF FINITE VERB FORMS IN A MACHINE TRANSLATION SYSTEM Bente Maegaard Institut for anvendt og matematisk lingvistik, Københavns Universitet Njalsgade 96, DK-2300 København S, Denmark This paper is based on work done joint
UPenn - C - 92
PREDICTING NOUN-PHRASE SURFACE FORMS USING CONTEXTUAL INFORMATION Takayuki YAMAOKA, Hitoshi IIDA and Hidekazu ARITA† ATR Interpreting Telephony Research Laboratories, Souraku-gun, Kyoto, JAPAN †Mitsubishi Electric Corporation, Amagas
UPenn - H - 05
Combining Multiple Forms of Evidence While Filtering Jamie Callan Yi Zhang Information System and Technology Management Language Technologies Institute School of Engineering School of Computer Science University of California, Santa Cruz Carnegie Me
UPenn - C - 82
A PROCEDURE OF AN AUTOMATIC GRAPHEME-TO-PHONEME TRANSFORMATION OF GERMAN Sabine Koch, Wolfgang Menzel, Ingrid Starke Zentralinstitut für Sprachwissenschaft, AdW DDR, Berlin, DDR The automatic transformation of texts graphemically stored
UPenn - C - 90
AN INTEGRATED SYSTEM FOR MORPHOLOGICAL ANALYSIS OF THE SLOVENE LANGUAGE Tomaž Erjavec, Peter Tancig NLU Lab., Department of Computer Science and Informatics Jožef Stefan Institute Jamova 39, 61000 Ljubljana Yugoslavia AB
UPenn - P - 04
Part-of-Speech Tagging Considering Surface Form for an Agglutinative Language Do-Gil Lee and Hae-Chang Rim Dept. of Computer Science & Engineering Korea University 1, 5-ka, Anam-dong, Seongbuk-ku Seoul 136-701, Korea {dglee, rim}@nlp.korea.ac.kr Abstr
UPenn - C - 90
A MORPHOLOGICAL PARSER FOR AFRIKAANS L G de Stadler and M W Coetzer University of Stellenbosch South Africa The parser has been developed as a component in a text-to-speech system. The system, wh
UPenn - A - 94
Spelling Correction in Agglutinative Languages Kemal Oflazer and Cemaleddin Güzey Department of Computer Engineering and Information Science Bilkent University Ankara, 06533, Turkey ko@cs.bilkent.ed
UPenn - CSE - 140
CSE 140 Assignment 1 Solutions 1. Running AC-3 on the cheese leaves the following labels: (figure: Result of AC-3 on cheese drawing; labels L1, L5, L6 and A1, A3) Since there were ten different ways of arriving at this answer, I
UPenn - ECON - 002
PART 1 OF 3 NAME_ RECITATION INSTRUCTOR_ Instructions for Professor Eudey's Spring 2004 Econ 2 midterm: There are 3 parts to the exam. Write your answers in the space provided. This is a 60-minute examination. You have ten minutes for review.
UPenn - ECON - 002
Econ 2 Second Midterm Makeup Exam Honors Exam Instructions for Dr. Eudey's Spring 2005 Honors Econ 2 midterm: Write your answers in the bluebooks provided. This is a 60-minute examination. You have ten minutes for review. Show all work. Use di
UPenn - ECON - 001
Econ Honors: Midterm 1 (Anthony Yuen) October 13, 2007 Instructions: This is a 60-minute examination. Show all work. Use diagrams where appropriate and label all diagrams carefully. This exam is given under the rules of Penn's Honor system. The
UPenn - ECON - 001
Econ Honors: Midterm 2 (Anthony Yuen) November 14, 2007 Instructions: This is a 60-minute examination. Show all work. Use diagrams where appropriate and label all diagrams carefully. This exam is given under the rules of Penn's Honor system. The
UPenn - CSE - 380
University of Pennsylvania CSE380 Operating Systems 1st Midterm Exam 10/12/2004 © 2004 Matt Blaze Instructions: Write your answers in an exam book. Remember to print your name clearly on the exam book cover, to erase or cross out any material you
UPenn - CSE - 380
CSE 380 - Operating Systems Notes for lecture 4 - 9/20/05 Matt Blaze Inter-process communication So far, the process model has been a useful way to isolate running programs separate resources, state, etc narrow communication channel (wait, kill,
UPenn - CSE - 380
CSE380 - Operating Systems Notes for Lecture 8 - 10/4/05 Matt Blaze (some examples by Insup Lee) Frequently Asked Questions about the midterm What will the exam cover? reading and class notes through 10/6/05 How should I study for it? emphasis