Methods
Defining Nearest-Neighbor "connectivity" between points
Point data can be used in various ways to measure the degree to which the point pattern exhibits spatial autocorrelation.
But first, care must be taken in describing the nature of connectivity, that is, the degree to which any two points are likely to be defined as "connected" (the extent to which they might be dependent based on distance or separation).
Conversion: Point data → Areal Unit data
Point data may also be converted to polygons to better understand the spatial distribution of potential point influence, as defined by the implied areal units which divide up a study area based on an observed point pattern.
In short, the point pattern itself may also be used to create areal units (polygons) that divide up a study area. Instead of overlaying a grid or a series of cells, polygons may be drawn/derived to reflect the spacing of points and the extent to which individual points are separated from their neighbors... in effect to create polygons where all locations within a given polygon are closer to a corresponding point than to any other points in the study region (while allowing points to share polygon boundaries).
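The defining property above (every location belongs to the polygon of its closest point) can be sketched in a few lines. This is a minimal brute-force illustration, not a polygon-construction routine; the function name `nearest_generator` and the sample coordinates in `wells` are hypothetical.

```python
import math

def nearest_generator(location, generators):
    """Return the index of the generator point closest to `location`.

    A location falls inside the Thiessen polygon of generator i exactly
    when generator i is its nearest generator.
    """
    return min(range(len(generators)),
               key=lambda i: math.dist(location, generators[i]))

# Hypothetical generator points (e.g., well locations).
wells = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]

print(nearest_generator((1.0, 1.0), wells))  # index of the polygon containing (1, 1)
print(nearest_generator((3.5, 0.5), wells))  # index of the polygon containing (3.5, 0.5)
```

Classifying many locations this way (for example, every cell of a fine grid) traces out the polygons without constructing their boundaries explicitly.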
Common technique: convert point data → polygon data
An alternative to area/grid-based methods or quadrat counts... is to evaluate the pattern using "Voronoi-Thiessen polygons" (aka Voronoi tessellation), by dividing the study region or plane up into non-overlapping polygons (or space into polyhedra).
The construction of a net of Thiessen polygons may proceed in two ways: (a) the inter-point line method; and (b) the intersecting circles method.
Voronoi-Thiessen polygons
Thiessen polygon (shown in gray) constructed from k=5 surrounding or neighboring points.
Classic application of Thiessen polygons to John Snow's 1854 study of the spatial incidence of Cholera (cases shown in red) in the city of London; an outbreak said to be linked to tainted drinking water from public wells (blue dots).
In this case well locations were used to create the polygons.
Inter-point line method:
1. Lines are drawn from each point to each adjacent point;
2. Each of these inter-point lines is "bisected" to yield the midpoint of the line; and
3. From each midpoint, a boundary line is drawn at right angles to the original inter-point line to create a series of convex polygons, such that the area within each polygon is nearer to the enclosed point than it is to any other point.
Such polygons have been widely used in the central place theory literature to define the sphere of influence of urban centers under the assumption that each center dominates the area that lies nearest to it (geometrically speaking).
[Figure: an illustration of the inter-point line method. Panel A: network based on adjacency; panel B: bisectors; panel C: Thiessen polygons. The intersection of bisector extension lines forms a corner of the polygon.]
Kopec's intersecting circles method for delineation of Voronoi-Thiessen polygons:
1. Consider two adjacent points which are at a distance d apart.
2. Draw circles of radius d centered on each of the points.
3. The side of the polygon (between two points) is located by drawing a line through the points of intersection of the circles... a process that is continued until all sides of all polygons are identified.
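The construction in steps 1 to 3 can be checked numerically: the two circle intersections are each equidistant from both points, so the line through them is the perpendicular bisector used by the inter-point line method. A minimal sketch for one point pair follows; the function name `kopec_intersections` is hypothetical.

```python
import math

def kopec_intersections(p, q):
    """Intersections of the two circles of radius d = |pq| centered on p and q."""
    d = math.dist(p, q)
    mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    # Half-length of the chord joining the two intersection points.
    h = math.sqrt(d ** 2 - (d / 2) ** 2)
    # Unit vector perpendicular to the inter-point line pq.
    ux, uy = -(q[1] - p[1]) / d, (q[0] - p[0]) / d
    return ((mid[0] + h * ux, mid[1] + h * uy),
            (mid[0] - h * ux, mid[1] - h * uy))

p, q = (0.0, 0.0), (2.0, 0.0)
a, b = kopec_intersections(p, q)
# Both intersection points are equidistant from p and q, so the line
# through a and b carries the polygon side between p and q.
print(a, b)
```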
An illustration of Kopec's intersecting circles method:
Another look at a series of Voronoi Thiessen polygons
The size (areal coverage) or perimeters of the generated Thiessen polygons can be used to test if a point pattern is significantly different from a random pattern based on expectations.
Thiessen Polygons are being used for edge detection in landscapes to separate or sub-divide continuous populations using information on barriers or genetic diversity.
States and their capitals with political boundaries
States represented as Thiessen polygons using capitals as nodes representing their center of gravity (equally weighted)
GIS Applications
Service facility locations
Zip coded dataset for service points in a regional service area in the state of Calif.
Zip coded polygons have a population attribute (say number of households)
Source: ET-GEO Wizards
Step 1. Build Thiessen Polygons over entire area
(using buffered convex hull Option to ensure large enough Buffer distance)
Step 2. Clip the Thiessen Polygons with ZIP code polygons to expose service areas.
Step 3. Isolate resulting service area for the smaller region.
Step 4. Service areas are then overlaid with demand polygons.
Step 5. Transfer the population attributes from the ZIP polygons to the Thiessen polygons using the "count" (sum proportion) option.
Step 6. Transfer attributes from the Thiessen polygons to the Service Points using the Spatial Join function... yielding the total number of households per Thiessen polygon, i.e., households assigned to the closest service facility (size of circle...).
Point-pattern analysis using derived polygon area
Given a point density parameter $\lambda$, which is defined as $\lambda = n/\text{area}$, there are "expectations" regarding the area and perimeter of the system of Thiessen polygons obtained when the underlying point pattern is an outcome of a simple Poisson process that randomly allocates n points:

parameter            expected value (mean)          variance
polygon perimeter    $E(P) = 4/\sqrt{\lambda}$      n/a*
polygon area         $E(A) = 1/\lambda$             $0.28018/\lambda^{2}$

* Note that the variance of the polygon perimeter variable must be found using simulations (based on permutations randomly allocating n points to a study area and summarizing the distributions of various statistics from a very large number of realizations).

Test statistic: Z
Although the sampling distributions of these parameters are not known in a usable form, tests of significance can still be carried out when n is sufficiently large (n >> 40) and we appeal to normality theory as it relates to the central limit theorem. In the case of polygon area (A), a logical test to employ would involve the use of standard normal deviates (Z), where

$$Z = \frac{\hat{A}_{\text{observed}} - (1/\lambda)}{\sqrt{[0.28018/\lambda^{2}]/n}}$$
For the case of polygon perimeter, the sampling distribution could be constructed empirically (via results of a Monte Carlo simulation experiment).
Visual and Network-related Tools for Analyzing Point Patterns..
Alternative ways to define connectivity or connectedness
Neighborhood networks or graphs
Point patterns may also be analyzed as a "graph" (from graph theory): a system of points (vertices) connected by links (edges) to form a "network" of points.
Consider the following distribution of points
Various graphs can be used to denote the features of this point pattern and its underlying spatial structure or connectivity:
1. Disc or sphere graph -- inserting an edge between two points p(i) and p(j), i ≠ j, whenever the distance between p(i) and p(j) is smaller than some given or predetermined value r. Strategy: continue increasing r incrementally until all points are connected, while retaining connections up to that point.
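The disc-graph rule and the "increase r until connected" strategy can be sketched as follows. This is a brute-force illustration under stated assumptions: the function names, the step size, and the sample points are hypothetical, and at least one point is assumed.

```python
import math
from itertools import combinations

def disc_graph(points, r):
    """Edges (i, j) for every pair of points closer together than r."""
    return [(i, j) for i, j in combinations(range(len(points)), 2)
            if math.dist(points[i], points[j]) < r]

def is_connected(n, edges):
    """Depth-first search from vertex 0; True if every vertex is reached."""
    adjacent = {i: set() for i in range(n)}
    for i, j in edges:
        adjacent[i].add(j)
        adjacent[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        for neighbor in adjacent[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == n

def smallest_connecting_radius(points, step=0.5):
    """Increase r in fixed increments until the disc graph is connected."""
    r = step
    while not is_connected(len(points), disc_graph(points, r)):
        r += step
    return r

pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]   # hypothetical points
print(disc_graph(pts, 1.5))
print(smallest_connecting_radius(pts))
```

The returned radius depends on the increment: a finer step gives a value closer to the true threshold distance.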
2. Gabriel graph -- formed by inserting an edge between two points x and y in N if the closed disc centered at (x + y)/2, with the segment from x to y as its diameter, does not contain any other point in N.
Note: In this case we are talking about a fully connected graph.
The Gabriel graph algorithm connects points in ascending order of proximity and continues until each point is connected to at least one other point in the study area and meets the stated restriction.
[Figure: two point pairs x and y with the disc centered on their midpoint; one case with no edge connecting point pair x and y, and one case with an edge connecting point pair x and y.]
Note that (x + y)/2 is the midpoint of the line connecting points x and y. The Gabriel graph is a sub-graph of the Delaunay triangulation (i.e., every edge of the Gabriel graph is also an edge of the Delaunay graph).
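The Gabriel condition can be tested directly for every point pair. The sketch below is a brute-force check (not the incremental algorithm described above); the function name `gabriel_edges` and the sample points are hypothetical.

```python
import math
from itertools import combinations

def gabriel_edges(points):
    """Edges (i, j) whose closed diametral disc contains no other point."""
    edges = []
    for i, j in combinations(range(len(points)), 2):
        x, y = points[i], points[j]
        mid = ((x[0] + y[0]) / 2, (x[1] + y[1]) / 2)   # (x + y)/2
        radius = math.dist(x, y) / 2
        # The disc is closed, so a third point on the boundary also blocks the edge.
        if all(math.dist(mid, points[k]) > radius
               for k in range(len(points)) if k not in (i, j)):
            edges.append((i, j))
    return edges

pts = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.5)]   # hypothetical points
print(gabriel_edges(pts))
```

Here the third point lies inside the disc on the first two points, so that pair receives no edge.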
3. Delaunay Tessellation -- the "dual" of the Voronoi tessellation, constructed from the triangles formed when connecting lines are drawn between points across the shared boundaries of their polygons (also called the Delaunay triangulation or Delaunay graph), such that there are no cross-over connections.
Delaunay Tessellations/Triangulations (shown in black) with Voronoi Tessellations/Thiessen Polygons (shown by the light gray lines)
Another look at Thiessen polygons (broken lines) and Delaunay Tessellations or Triangulations (solid lines denoting connections between points based on minimum distance)
Note: radius r can vary across locations depending on the size of nodes (for a hierarchy) or variability in point density over space.
3. Sphere-of-influence graph -- inserting edges between points x and y in N wherever they fall on the boundary or within the disc of radius r centered about a given point or location (where r can be constant or vary over space in response to point density or mass).
4. Nearest-Neighbor graph -- formed by inserting an edge between two points x and y in N that are nearest neighbors (note that there are reflexive and non-reflexive pairs of points).
[Figure labels: reflexive point pair; non-reflexive point pair.]
5. k-Neighbor graph (k>1) -- shown for k=3 nearest neighbors, where edges are inserted between two points x and y in N whenever y is one of the k-nearest neighbors of x (for k=1,2,.. m), starting with the first-nearest neighbor, then the second-nearest, the third-nearest,... up to some order m.
Each point is connected to at least 3 neighbors depending on reflexivity
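The nearest-neighbor and k-neighbor graphs, and the distinction between reflexive and non-reflexive pairs, can be sketched by brute force. The function names and sample points are hypothetical; ties in distance are broken by point index, which is an assumption of this sketch.

```python
import math

def k_nearest(points, k):
    """Map each point index to the indices of its k nearest neighbors."""
    result = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        result[i] = others[:k]
    return result

def reflexive_pairs(points):
    """Pairs (i, j) in which i and j are each other's nearest neighbor."""
    nn = {i: neighbors[0] for i, neighbors in k_nearest(points, 1).items()}
    return sorted({tuple(sorted((i, j))) for i, j in nn.items() if nn[j] == i})

pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]   # hypothetical points
print(k_nearest(pts, 2))      # directed k-neighbor edges for k=2
print(reflexive_pairs(pts))   # mutual nearest-neighbor pairs
```

Because y being among the k nearest neighbors of x does not imply the reverse, the undirected k-neighbor graph can give a point more than k connections.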
6. Minimal Spanning Tree -- a connected graph, tree or network with V vertices and V-1 linkages; minimally, yet fully connected with minimal aggregate travel distance (minimum total edge length), and linkages between closest sets of neighboring points.
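A minimal spanning tree can be built with Prim's algorithm, one standard choice among several. The sketch below is a simple (not optimized) version; the function name and sample points are hypothetical.

```python
import math

def minimal_spanning_tree(points):
    """Prim's algorithm: grow the tree by the shortest edge leaving it."""
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        i, j = min(((a, b) for a in in_tree for b in range(n) if b not in in_tree),
                   key=lambda e: math.dist(points[e[0]], points[e[1]]))
        edges.append((i, j))
        in_tree.add(j)
    return edges

pts = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (0.0, 1.0)]   # hypothetical points
tree = minimal_spanning_tree(pts)
total = sum(math.dist(pts[i], pts[j]) for i, j in tree)
print(tree, total)   # V-1 edges and their total length
```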
7. Radial spanning tree -- a "tree" in the graph theoretic sense (non-directed, acyclic, and connected), which is defined with respect to one particular location (a focal point that is called the "origin" o, or a central location or center of gravity from which all links/arms radiate).
n.b. Total edge or linkage length is typically not minimized as in the case of a minimal spanning tree.
[Figure: radial spanning tree with arms radiating from the origin o.]
Steiner Trees -- spanning trees that reduce total network length below that of the Minimal Spanning Tree (MST) by permitting the inclusion of new or intermediate points, rather than connecting all points or vertices using only straight line segments between the original points.
Steiner Trees based on a (hypothetical) network
Adding intermediate points can reduce the network length, by a factor of up to (√3/2). For example, any three neighboring points or vertices may be connected using an intermediate point (also known as a Steiner point)... effectively reducing the length of the local minimal spanning tree (LMST), iff the angle between the two legs or links of the LMST is less than 120° and the points or vertices are un-weighted. [Many times, the optimal placement of a Steiner point is at an intermediate location where the angles between the lines leading from the Steiner point to the three original points are all equal (120°).]
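For three un-weighted points, the Steiner point is the location that minimizes the summed distance to the three vertices. The sketch below approximates it with Weiszfeld's iteration (a technique chosen here for illustration, not one named in these notes) and compares network lengths for an equilateral triangle, where the reduction reaches the limiting factor. All function names are hypothetical.

```python
import math

def fermat_point(a, b, c, iterations=200):
    """Approximate the point minimizing total distance to a, b and c."""
    x = (a[0] + b[0] + c[0]) / 3
    y = (a[1] + b[1] + c[1]) / 3
    for _ in range(iterations):
        wsum = xs = ys = 0.0
        for px, py in (a, b, c):
            d = math.hypot(x - px, y - py)
            if d == 0.0:            # iterate landed on a vertex; stop there
                return (px, py)
            wsum += 1 / d
            xs += px / d
            ys += py / d
        x, y = xs / wsum, ys / wsum
    return (x, y)

def mst_length(a, b, c):
    """MST of three points: the two shortest sides of the triangle."""
    sides = sorted([math.dist(a, b), math.dist(b, c), math.dist(a, c)])
    return sides[0] + sides[1]

def steiner_length(a, b, c):
    """Length of the network joining a, b and c through one Steiner point."""
    s = fermat_point(a, b, c)
    return sum(math.dist(s, v) for v in (a, b, c))

tri = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))   # equilateral, side 1
ratio = steiner_length(*tri) / mst_length(*tri)
print(round(ratio, 4), round(math.sqrt(3) / 2, 4))
```

When one angle of the triangle is 120° or more, the minimizing location coincides with that vertex and no reduction is obtained.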
Consider a set of points that are connected in minimal fashion.
[Figures: (1) Minimal Spanning Tree (MST) for points a, b, and c, in which points a and c are not connected and the angle between the two links is < 120°. (2) MST with 1 Steiner Point (a.k.a. Steiner Tree, a modified MST), in which points a and c are now connected and the resultant angles = 120°. (3) MST with 2 Steiner Points (a.k.a. Steiner Tree), with resultant angles = 120°. (4) An alternative Steiner Tree. (5) Yet another possible Steiner Tree.]
Multiple Steiner points may be added to a network. Note, however, that although an MST may be systematically modified through the addition of Steiner points, and solutions may be found to reduce total network length, these solutions will not necessarily result in a Minimal Length Steiner Tree or MLST.
More advanced algorithms must be applied to identify minimal length Steiner trees, with consideration given to the direction and weighting of flows and the weighting of vertices or points in terms of influence or importance. Steiner solutions derived from any given number of new nodes (k) added to a network may also involve inter-linked Steiner points. This further complicates the elusive search for an MLST.
Analysis of line segment lengths
The patterns or line segment lengths found by connecting points can be analyzed and compared to various outcomes for a random point distribution process (using simulation analysis to generate the distributions of expected outcomes via Monte Carlo methods, or the use of various theoretical probability distributions)... beyond the scope and objective of this course.
Observed Point Pattern: for a given number of points n, find the total link distance using a given criterion.
versus
Generated Random Point Pattern (Simulated): m runs create a theoretical sampling distribution for n points; compare the observed value to the tail values.
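The observed-versus-simulated comparison can be sketched as a small Monte Carlo experiment. The example below uses the mean nearest-neighbor distance as the summary criterion (any link-length criterion could be substituted); the function names, the rectangular study area, the number of runs, the seed, and the observed value are all assumptions made for illustration.

```python
import math
import random

def mean_nn_distance(points):
    """Mean distance from each point to its nearest neighbor."""
    return sum(min(math.dist(p, q) for j, q in enumerate(points) if j != i)
               for i, p in enumerate(points)) / len(points)

def simulate_mean_nn(n, width, height, runs, seed=0):
    """Mean NN distance for `runs` random allocations of n points."""
    rng = random.Random(seed)
    return [mean_nn_distance([(rng.uniform(0, width), rng.uniform(0, height))
                              for _ in range(n)])
            for _ in range(runs)]

# Hypothetical inputs: n=8 points in a 12 x 12 region, m=199 runs.
sims = simulate_mean_nn(n=8, width=12.0, height=12.0, runs=199)
observed = 4.125   # hypothetical observed value of the criterion

# Upper-tail pseudo p-value: share of realizations at least as large as observed.
p_upper = (1 + sum(s >= observed for s in sims)) / (len(sims) + 1)
print(round(p_upper, 3))
```

A small upper-tail value suggests the observed criterion is larger than expected under random allocation; a lower-tail test would count realizations at or below the observed value instead.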
Distance Methods for the Statistical Analysis of Spatial Point Patterns
Several widely used Nearest-Neighbor techniques...
- Nearest Neighbor (NN) Statistics
- Higher-Order Nearest Neighbor
- Linear Nearest Neighbor
- Ripley's K
Note: Sometimes there is a need to identify second-order characteristics of the distances between points.
First-order properties identify the global or dominant pattern of a point-pattern distribution: where it is centered, how far it spreads, point orientation, etc. Second-order properties are used to identify subregional or neighborhood patterns that exist within the overall distribution. Importance of 2nd-order Analysis: If there are clusters of points, the distribution of points may be influenced more by the local pattern than by the regional/global pattern.
A Popular Distance-based method
Nearest-Neighbor (NN) Analysis -- a technique developed by botanists/ecologists (Clark and Evans, 1954), designed for measuring pattern in terms of the arrangement of a set of points in space (separated by known distances), with the points representing the location(s) of a given plant species or characteristic. The objective was to search for regularity/irregularity in a spatial process by analyzing the location tendencies of various types of plant species.
Consider a point pattern of n=8 points showing the spatial distribution of large-scale supermarkets for competing retail food chains in an urban market (shown below).
The stores appear to be regularly spaced or distributed over space.
[Figure: study area of approx. 12x12 miles, shown as the square associated with the northern-, eastern-, western-, and southern-most points.]
[Figure: the orientation of pairs of "nearest neighboring" points (symmetrical or asymmetrical) among points A through H, with one pairing illustrating a "reflexive pair" and another illustrating a "unique pair".]
Distances are calculated for each nearest neighboring (NN) pair of points and summed to yield the total NN distance d*. The average NN distance is then calculated by dividing by the number of points.
Point (i)   NN   di(NN)
A           B    5
B           A    5
C           D    4
D           F    3
E           C    4
F           D    3
G           F    3
H           G    6

$$d^{*} = \sum_{i=1}^{n} d_i(NN) = 33$$
...where distance is in miles and d* is the sum of the NN distances.
Observed mean NN distance is d(NN)obs. = d* / n = 33 / 8 = 4.125 (mi.)
It is possible to calculate 3 "theoretical limiting values" of the mean NN distance for certain types of spatial point patterns (i.e., extreme or benchmark cases).
1. The theoretical or "expected" mean NN distance for a point pattern that is "random" is
$$E(d(NN)) = d(NN)_{R} = 1 / (2\sqrt{p})$$
where p is point density (number of points per unit area) and p = (n / area).
In our example, the hypothetical study region is roughly 12x12 miles, or an area of 144 sq. miles. Thus, point density is p = 8 / 144 ≈ .056 points per square mile.
Now we may calculate the mean random NN distance, d(NN)R = 1 / (2√0.056) ≈ 2.11 miles... which will serve as our first reference number, showing the "expected value" of the mean NN distance if n=8 points were randomly allocated to an area of 144 sq. miles.
It is also possible to calculate the expected value of the mean NN distance for points allocated or arranged in a way that is "maximally dispersed"... generating a point pattern that conforms to a "triangular lattice" or triangle grid: a dispersed point pattern where all points are of equal distance from one another (equally spaced and uniformly distributed) and where, when n is large, each point is surrounded by six nearest neighboring points whose outer boundary forms a hexagon composed of six equilateral triangles.
A triangular lattice -- a maximally dispersed distribution of points on a plane with adjacent equilateral triangles that form hexagons (most-efficient packing arrangement).
Hexagons and the triangular lattice
Another look at the triangular lattice
...an optimal packing arrangement
Based on the triangular lattice...
2. The theoretical or "expected" mean NN distance for a point pattern that is "maximally dispersed" is
$$E(d(NN)) = d(NN)_{md} = \frac{\sqrt{2}}{\sqrt[4]{3}\,\sqrt{p}} = \frac{2^{.5}}{3^{.25}\,\sqrt{p}}$$
In our example, it can be shown that dmd = 1.4142 / [1.3160 × 0.237] = 4.534 (approx.)
Another limiting value can be generated for a pattern of points that are as close as possible to their "nearest neighbors", that is, for a pattern that is perfectly clustered in space.
[Figure: theoretical perfectly clustered point pattern.]
3. The theoretical or "expected" mean NN distance for a point pattern that is "perfectly clustered" is
E(d(NN)) = d(NN)pc = 0.
Hence, a point pattern with a point density of 0.056 points per square mile will yield the following limiting values:

perfectly clustered    random    maximally dispersed
0                      2.11      4.534
Note that our "observed" mean NN distance d(NN)obs. is 4.125, a value that is indicative of a point pattern that is highly dispersed. In fact, it is close to the maximally dispersed limiting value (as would be expected given the spatial point distribution).
Standardization of NN values: the "nearest neighbor index"
A standardized nearest neighbor index (RNN) can be found by dividing the observed mean NN distance by the mean NN distance for a random point pattern, yielding a range of limiting values that are applicable for any point pattern and density. The standardized NN index is
RNN = d(NN)obs. / d(NN)R
Given that the index is standardized, the range of values of RNN is always between 0 and 2.15:

perfectly clustered    random    maximally dispersed
0                      1.0       2.15
In our example, the observed nearest-neighbor index is
RNN = d(NN)obs. / d(NN)R = 4.125 / 2.11 = 1.954 (approx.)
which falls between the random (1.0) and maximally dispersed (2.15) limiting values.
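The worked example can be reproduced in a few lines. This sketch keeps the point density unrounded, so the random benchmark and the index come out slightly different (about 2.12 and 1.94) from the values of 2.11 and 1.954 obtained above with the rounded density; variable names are illustrative.

```python
import math

nn_distances = [5, 5, 4, 3, 4, 3, 3, 6]   # di(NN) for points A..H, in miles
n, area = len(nn_distances), 12 * 12      # 8 points, 144 sq. miles

d_obs = sum(nn_distances) / n                          # observed mean NN distance
density = n / area                                     # points per square mile
d_random = 1 / (2 * math.sqrt(density))                # expected for a random pattern
d_max = 2 ** 0.5 / (3 ** 0.25 * math.sqrt(density))    # maximally dispersed pattern
r_nn = d_obs / d_random                                # nearest neighbor index

print(d_obs, round(d_random, 3), round(d_max, 3), round(r_nn, 3))
print(round(d_max / d_random, 2))   # upper limit of the standardized index
```

The last line shows where the upper bound of the index comes from: it is the ratio of the maximally dispersed benchmark to the random benchmark, which does not depend on density.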
As d(NN)R = 1 / [2√(n / area)], the nearest neighbor index may be expressed as...
$$R_{NN} = d(NN)_{obs} \,/\, \{\, 1 / [\, 2\sqrt{n/\text{area}}\, ] \,\} = d(NN)_{obs}\, [\, 2\sqrt{n/\text{area}}\, ]$$
To test whether an observed mean nearest neighbor distance differs significantly from the expected mean for a random point pattern, the "Z-statistic" may be calculated and compared to critical values based on the normal distribution. This test statistic becomes more applicable and reliable as the number of points n increases.
The test statistic Z is defined as follows:
$$Z = \frac{d(NN)_{obs} - d(NN)_{R}}{se(d)_{R}}$$
where the standard error of the mean nearest neighbor distance (under a random process) is defined as
$$se(d)_{R} = \frac{0.26136}{\sqrt{n^{2}/\text{area}}}$$
... and Z is approximately a "standard normal deviate" when n is relatively large (say n > 30 or more).
In our example, the standard error of the mean nearest-neighbor (NN) distance is
$$se(d)_{R} = \frac{0.26136}{\sqrt{8^{2}/144}} \approx .392$$
Thus, in our example...
$$Z = \frac{4.125 - 2.11}{.392} \approx 5.14 \quad \text{(a very large value)}$$
Hypothesis test --
Ho: random point pattern
Ha: non-random point pattern
Note: If |Z| > |Z(α/2)| for a given α value, then we must "reject" the null hypothesis at the (1-α) x 100% level of confidence.
In our example, testing at the α = .05 level... 5.14 > 1.96, so we reject the null hypothesis at 95% confidence.
There is evidence to conclude that the point pattern is other than random.
Since Z is found to be significantly greater than zero, we can conclude that there is evidence to suggest that the observed point pattern is significantly more dispersed than would be expected under a random point allocation process... and, in this case, relatively close to a maximally dispersed point pattern.
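The Z-test above can be written as a short function. This is a sketch using the formulas as given in these notes; the function name is hypothetical, and because the random benchmark is computed from the unrounded density, Z comes out near 5.11 rather than the 5.14 obtained with the rounded value of 2.11.

```python
import math

def nn_z_statistic(d_obs, n, area):
    """Z for the observed mean NN distance against a random pattern."""
    d_random = 1 / (2 * math.sqrt(n / area))    # expected mean NN distance
    se = 0.26136 / math.sqrt(n ** 2 / area)     # standard error under randomness
    return (d_obs - d_random) / se

z = nn_z_statistic(d_obs=4.125, n=8, area=144)
reject_null = abs(z) > 1.96    # two-tailed test at the .05 level
print(round(z, 2), reject_null)
```

A positive Z points toward dispersion and a negative Z toward clustering, subject to the small-sample caution that applies to n=8.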
Things to consider....
Note on large sample size: The Z-statistic is most reliable when the number of points is large (say n >> 40), and, therefore, its use is not recommended for "small" samples (as in the test example).
One advantage of the NN method: It does not require the pre-specification of an outer boundary for the study region (as we could use the most-northerly, -southerly, -easterly, and -westerly points)... nor does it require the calculation of an optimal quadrat size estimate.
Caveat. Nonetheless, using a Z-test and the standard normal distribution as a test criterion may be subject to criticism, as the index RNN is bounded [0, 2.15] and the standard normal distribution is not (-∞, +∞), suggesting that an adjustment be made to probability values due to the fact that we are dealing with a truncated normal distribution. This problem is not critical, and can be overlooked given that it typically has little bearing on the statistical results.
Boundary and edge effects: a much bigger concern
Nevertheless, the demarcation of a study area before a NN analysis is undertaken is problematic. There is the issue concerning the treatment of points whose nearest neighbors lie just outside the boundary of the study area (and the possibility of re-defining the boundaries of the study area to include those points whose influence may spill over or spill inward across a boundary, based on the assessment of reflexive and unique NN distances for points located at the edge of the study area).
Spillover assumption: data from the blue cells are used to obtain a representative value for the red cell in a raster GIS to take into account edge/boundary effects.
Windowing operations such as this are common in spatial analysis and provide a way to smooth the data using spatial moving averages.
[Figure: study area showing the original boundary and modifications (in red) based on NN distances; nearest neighbors are used to define boundary limits (spillover).]
Similar procedures can be employed with point data, using information from nearby points (nearest neighbors), even including them as part of the study area (though outside the original boundary) to form an adaptive boundary.
Alternative indexing of nearest-neighbor distance
An alternative distance-based approach involves the calculation of the RNN* index, defined as follows:
$$R_{NN}^{*} = (\pi\lambda)\,\Big\{ \Big[ \sum_{i=1}^{n} (d_{i,NN})^{2} \Big] / n \Big\}$$
where d(i,NN) are the nearest-neighbor distances for points i=1,..n; π = 3.14159; and λ = (n / area). Note: This index is most applicable when n is very large (say for n > 100) and when point density is relatively high.
Test of significance for the RNN* statistic: Under the null hypothesis of randomness in the point pattern, it can be shown that the statistic (2 RNN*) is distributed (roughly) as χ² with (2) degrees of freedom.
Theoretical limiting values are as follows: RNN* = 0 when there is a perfect clustering of points; RNN* = 1 for a random point pattern; and RNN* = 3.63 for a maximally spaced/maximally dispersed point pattern.
In our example, it can be shown that
$$R_{NN}^{*} = (\pi\lambda)\,\Big\{ \Big[ \sum_{i=1}^{n} (d_{i,NN})^{2} \Big] / n \Big\} = (3.14159 \times .0555)\,\{ [145] / 8 \} = 3.16$$
...a value that indicates that the index is tending toward the maximally dispersed case value of 3.63. Note: (2 × 3.16) = 6.32 > 5.99 (χ² critical) at 95% confidence.
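The RNN* calculation can be verified with the eight NN distances from the supermarket example. This is a minimal sketch; the function name is hypothetical and the density is the unrounded n / area.

```python
import math

def rnn_star(nn_distances, area):
    """Index based on squared NN distances: (pi * density) * mean(d^2)."""
    n = len(nn_distances)
    density = n / area
    return math.pi * density * sum(d ** 2 for d in nn_distances) / n

nn_distances = [5, 5, 4, 3, 4, 3, 3, 6]     # miles; squared distances sum to 145
index = rnn_star(nn_distances, area=144)
upper_limit = 2 * math.pi / math.sqrt(3)    # value for a triangular lattice
print(round(index, 2), round(upper_limit, 2))
```

The upper limit follows from the triangular lattice, where every squared NN distance equals 2 / (√3 × density).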
Independence Assumption
The suggested tests of significance for the nearest-neighbor statistics depend critically on the assumption that the n nearest-neighbor distances are "independent observations". If this assumption cannot be sustained, the tests are no longer valid. This proviso should not be taken lightly.
Solution: To validate a nearest neighbor analysis, a sample of points should be used instead of examining all points in the study area (the population).
Recommended strategy
Hence, one could either (a) measure nearest-neighbor distances for a sample of n randomly selected points out of the population of N points located within the study area; or
(b) analyze nearest-neighbor distances for successive points on or along randomly selected transects that cut across the study area.
Extensions of the Nearest-Neighbor Method: Higher-order Nearest Neighbors
Thus far, we have looked at distance-based analyses that consider the distance to a point's nearest neighbor using a "first-order" model. This approach is a special case of a more generalized framework that can be used to evaluate point pattern relationships for higher-order models: the inspection of second-nearest neighbors, third-nearest neighbors, and so on.
Consider the ordered neighbors (for an order k=1...K, with K=5) for a point i found within a portion of a study area. Ordered NN distances may be defined as follows:
$$d_{i1} < d_{i2} < d_{i3} < \dots < d_{iK}$$
where d(ik) represents the distance from a given point i to its k-th nearest neighbor, from order k=1 to K.
[Figure: point i and its five neighbors, labeled 1 through 5 in order of increasing distance.]
We may index whether the distance to points' k-th nearest neighbors departs dramatically from random expectation (i.e., a value of 1.0). One may use the generalized version of the Dacey, Skellam, and Moore statistic:
$$R_{NN}^{*}(k) = (\pi\lambda)\,\Big\{ \Big[ \sum_{i=1}^{n} (d_{i,k})^{2} \Big] / n \Big\}$$
with λ = (n / area), although a usable significance test for this statistic does not exist in this form. Nevertheless, numerical simulations can provide approximations of variance estimates if needed.
The k-th order Nearest Neighbor (NN) Index
A preferred statistical approach would involve the use of the k-order nearest neighbor index for a point pattern with n points. First, we would compute the mean observed k-th order nearest-neighbor distance (for a given k):
$$d(NN)_{k,obs} = \Big[ \sum_{i=1}^{n} d_{i,k} \Big] / n$$
Next, we would need to find the expected value of the k-th order mean nearest-neighbor distance under a random process.
Under a random process of allocating n points to a study region whose area is A, the mean k-th order NN distance (mean random distance) may be defined as
$$d(NN)_{k,R} = \frac{k\,(2k)!}{(2^{k}\,k!)^{2}\,\sqrt{n/A}}$$
where k is the predetermined order and the symbol "!" refers to the factorial of the associated integer values.
The k-th order NN index may now be defined as
RNN(k) = d(NN)k,obs. / d(NN)k,R
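The expected k-th order distance is easy to evaluate numerically. The sketch below implements the formula as stated above and checks that k=1 reduces to the first-order benchmark 1 / [2√(n / area)]; the function name is hypothetical, and the n and area values are those of the supermarket example.

```python
import math

def expected_kth_nn_distance(k, n, area):
    """Mean k-th order NN distance for n points allocated at random over `area`."""
    numerator = k * math.factorial(2 * k)
    denominator = (2 ** k * math.factorial(k)) ** 2 * math.sqrt(n / area)
    return numerator / denominator

first = expected_kth_nn_distance(1, n=8, area=144)
second = expected_kth_nn_distance(2, n=8, area=144)
print(round(first, 3), round(second, 3))

# The k-th order index divides an observed mean k-th order distance
# by the corresponding expected value, as in RNN(k) above.
```

Dividing the observed mean distance at each order by these expected values gives the index at successive spatial lags.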
Note: A good significance test for the k-th nearest neighbor index does not exist due to the non-independence of points at orders k>1. Nonetheless, the index is useful for evaluating the overall spatial point distribution and can give a picture of how clustered a point distribution is at different k values or spatial lags.
Caveat While there are no restrictions governing the exact number of nearest neighbors (spatial lags) that can be calculated, it is known that the average mean NN distance will increase as k increases.
Thus, there is a potential for "bias" from edge effects that will increase with increasing values of k. It is suggested that no more than 50 nearest neighbors (maximum) be examined, even if n is excessively large.
Linear Nearest-Neighbor Analysis
A variation of the NN index, applied to a street or road network. Distance calculations are based on "Manhattan" (metric) distances... where travel is assumed to take place along a superimposed grid and is based on minimum grid-based travel distances between points.
While NN routines calculate expected distances between neighbors in a random distribution of n points using the geographic area of the study region, the linear nearest-neighbor routine uses the total length of the street or road network overlaid onto the point pattern (one which allows a connection via the road network from each point to every other point in the network based on shortest travel paths/routes)... working under the assumption that the distance from each point to the grid (a road) and the nearest neighbor distance are minimized throughout the network.
[Figure: shortest path along the grid (road network) highlighted in red. Point i's nearest neighbor is p, despite the fact that point q is closer in terms of 'as-the-crow-flies' distance.]
First, the observed mean nearest-neighbor distance is calculated from the minimum grid-based linear distances Ld(i)(min) from each point (i) to its nearest neighboring point:
$$Ld(NN)_{obs} = \Big[ \sum_{i=1}^{n} Ld_{i}(min) \Big] / n$$
Next, assuming a random point process in a study region with a spatial distribution of n points and a street/road network of total length L, the mean linear NN distance is known to be
$$Ld(NN)_{R} = 0.5\,\Big( \frac{L}{n-1} \Big)$$
The LNN index may now be defined as
RLNN = Ld(NN)obs. / Ld(NN)R
Note that the use of Manhattan distances will allow for a test of significance based on the Central Limit theorem.
Since the theoretical standard error for RLNN is not known, an approximate standard deviation (s) for the observed linear nearest-neighbor distance may be defined as
$$s_{Ld(NN)} = \sqrt{ \frac{ \sum_{i=1}^{n} [\, Ld_{i}(min) - Ld(NN)_{obs} \,]^{2} }{ n - 1 } }$$
... and a standard error (se) calculated from
$$se_{Ld(NN)} = s_{Ld(NN)} / \sqrt{n}$$
when the number of points n is large (n > 40).
Since the empirical (estimated) standard deviation is being used instead of a theoretical value, the appropriate test statistic is t, where
$$t = [\, Ld(NN)_{obs} - Ld(NN)_{R} \,] \,/\, se_{Ld(NN)}$$
...with (n-2) degrees of freedom.
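The linear NN index and its t statistic can be sketched as follows. The network distances and total network length below are made-up values for illustration only, the function name is hypothetical, and the small n is used purely to keep the arithmetic easy to follow.

```python
import math
import statistics

def linear_nn_test(linear_distances, network_length):
    """Return (RLNN, t) from network-based NN distances and total network length L."""
    n = len(linear_distances)
    ld_obs = statistics.mean(linear_distances)       # observed mean linear NN distance
    ld_random = 0.5 * network_length / (n - 1)       # expected under randomness
    s = statistics.stdev(linear_distances)           # sample std. deviation (n - 1)
    se = s / math.sqrt(n)                            # standard error of the mean
    return ld_obs / ld_random, (ld_obs - ld_random) / se

# Hypothetical shortest-path distances to each point's nearest neighbor.
distances = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
r_lnn, t = linear_nn_test(distances, network_length=100.0)
print(round(r_lnn, 3), round(t, 3))
```

An index below 1 with a negative t points toward clustering along the network; the resulting t would be compared against a critical value for the chosen α and the degrees of freedom given above.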
Hypothesis Test
Ho: random point pattern
If |t| > |t(α/2)|, then we must "reject" the null hypothesis of a random point pattern at the (1-α) x 100% level of confidence (for two-tailed applications) → non-random point pattern.
If t > t(α), then we must "reject" the null hypothesis at the (1-α) x 100% level of confidence (one-tailed application) → non-random point pattern that exhibits a tendency to be significantly more dispersed than a random pattern.
If t < -t(α), then we must "reject" the null hypothesis at the (1-α) x 100% level of confidence (one-tailed application) → non-random point pattern that exhibits a tendency to be significantly more clustered than a random pattern.
Sectoral (Regional) nearest-neighbor method
A technique in which a circle is superimposed over a series of m reference points in a study region.
The circle is divided up into k equal-sized sectors or wedges (typically k=4, 6, or 8... for starters).
Distances between the reference point (i) and its first nearest neighbor in each sector j are measured and recorded for m reference points (i=1,...m). Note: The total number of points being evaluated should be n* = (mk) > 40... or better yet >> 40. Typically, the m reference points are randomly chosen (although "centralized" reference points are preferred to "peripheral" reference points to sidestep issues of potential bias due to boundary or edge effects).
[Figure: a circle centered on reference point i, divided into k=8 sectors of 45° each (labeled I through VIII), drawn over the potential pool of m reference points.]
The mean NN distance is evaluated (as before) with respect to n*= (mk) total points; thus yielding a global NN measure. Alternatively, individual sector-specific mean NN distances may be found by evaluating the distances to nearest neighbors for m randomly selected reference points contained within each sector or wedge (for a circle drawn around the most central point in the study area); thus yielding k separate sectoral NN measures. The sectoral NN method may be used to test if directional differences are observed in the spatial distributions of points as they emanate from the center of the study area.
The global mean NN measure or individual sectoral mean NN distances are then tested to see if the observed mean NN distance is significantly different from that which would be expected under a random process assuming a simple (first order) Poisson process with as-the-crow-flies distances or Manhattan metric distances.
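The sector bookkeeping described above can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the function name `sectoral_nn_distances`, the sample coordinates, and the convention that sectors are numbered counter-clockwise starting from the positive x-axis are all assumptions made for the example.

```python
import numpy as np

def sectoral_nn_distances(points, ref_index, k=8):
    """Distance from one reference point to its nearest neighbor in each of k sectors.

    Sectors are equal wedges numbered counter-clockwise from the positive
    x-axis (an assumed convention). Empty sectors are returned as np.nan.
    """
    pts = np.asarray(points, dtype=float)
    ref = pts[ref_index]
    others = np.delete(pts, ref_index, axis=0)
    dx, dy = (others - ref).T
    dist = np.hypot(dx, dy)
    # Angle of each neighbor in [0, 2*pi), then the index of the wedge it falls in.
    angle = np.mod(np.arctan2(dy, dx), 2 * np.pi)
    sector = np.minimum((angle // (2 * np.pi / k)).astype(int), k - 1)
    nearest = np.full(k, np.nan)
    for s in range(k):
        in_sector = dist[sector == s]
        if in_sector.size:
            nearest[s] = in_sector.min()
    return nearest

# Hypothetical coordinates; point 0 plays the role of reference point i.
pts = [(0, 0), (1, 1), (2, 2), (-1, 1), (-3, -3)]
nearest = sectoral_nn_distances(pts, ref_index=0, k=4)
print(nearest)
```

Repeating the call for each of the m reference points and averaging (for example with `np.nanmean`) would give either the global mean over all n* = mk distances or, column by column, the k sector-specific means.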
Reflexive nearest-neighbor methods. In a random point pattern generated by a Poisson process, there are situations in which an excessively high proportion of points (say .500 or higher) occur as "reflexive pairs", suggesting that individual points/observations occur as highly dispersed or fairly evenly spaced "couples": clusters of two points that are widely dispersed over the study area.
[Figure: point pattern in which points show many reflexive nearest neighbors (first order, k=1)]
For a high proportion of points,
$$d_{i(\min)} = d_{j(\min)} = d_{ij} ,$$
where $d_{i(\min)}$ is the distance to the NN from i, $d_{j(\min)}$ is the distance to the NN from j, and $d_{ij}$ is the distance from i to j, making points i and j a "reflexive pair".
Order of process: k=1, clustering; k>1, random or dispersed pattern.
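The reflexive-pair condition $d_{i(\min)} = d_{j(\min)} = d_{ij}$ amounts to checking that the nearest neighbor of i's nearest neighbor is i itself. A minimal sketch follows; the function name and the sample coordinates are hypothetical, and ties in distance are broken by index order.

```python
import numpy as np

def reflexive_proportion(points):
    """Share of points whose first-order nearest neighbor points back at them."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)      # a point is not its own neighbor
    nn = d.argmin(axis=1)            # index of each point's nearest neighbor
    is_reflexive = nn[nn] == np.arange(len(pts))
    return is_reflexive.mean()

# Hypothetical pattern: two tight "couples" plus one isolated point.
pts = [(0, 0), (1, 0), (10, 0), (11, 0), (5, 0)]
prop = reflexive_proportion(pts)
print(prop)
```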
Edge effects can bias the nearest-neighbor calculations. Hard boundaries (a non-malleable study area) can exaggerate nearest-neighbor distances. This is especially true for k-th order indices when k becomes large, resulting in "overestimation" of the observed mean nearest-neighbor distance and suggesting that the actual distribution of points may be more clustered than the NN index indicates. As a result of edge effects, biased indices will potentially lean toward describing a point pattern as more dispersed than it actually is.
Correcting Edge/Boundary Effects: Four strategies...
(1) Assume that a hypothetical point lies just outside the outer boundary of the study area for all "peripheral points": points that are closer to the border or boundary than to their observed and measured nearest neighboring points to the interior of the study region. If the computed distance to the boundary of the study region is less than the distance to an observed point, the distance to the hypothetical point that lies just outside the boundary is recorded.
[Figure: Strategy #1. Edge-correction dummy points (highlighted in red) for k=1]
Note: The hypothetical or dummy point serves as a proxy for a nearest neighbor. This edge-correction strategy has the effect of reducing the mean nearest neighbor distance (underestimating the true mean NN distance). The true value for the mean NN distance at order/spatial lag k lies somewhere between the "unrestricted" observed (measured mean NN distance) and the "edge-corrected" mean NN distance.
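Strategy (1) can be sketched for a rectangular study area by replacing each point's nearest-neighbor distance with its distance to the boundary whenever the boundary is closer. The function name, the rectangle limits, and the sample coordinates are assumptions made for illustration.

```python
import numpy as np

def edge_corrected_nn(points, xmin, xmax, ymin, ymax):
    """Return (observed NN distances, edge-corrected NN distances) for k=1.

    The dummy point is taken to sit on the nearest boundary of the rectangle,
    so the corrected distance is min(observed NN distance, distance to edge).
    """
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)
    d_nn = d.min(axis=1)
    x, y = pts[:, 0], pts[:, 1]
    d_edge = np.minimum.reduce([x - xmin, xmax - x, y - ymin, ymax - y])
    return d_nn, np.minimum(d_nn, d_edge)

# Hypothetical points in a 10 x 10 study area; the outer two are peripheral.
raw, corrected = edge_corrected_nn([(1, 5), (5, 5), (9, 5)], 0, 10, 0, 10)
print(raw.mean(), corrected.mean())
```

The two means bracket the true mean NN distance in the sense described in the note above: the uncorrected mean is the upper value and the edge-corrected mean the lower one.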
(2) Instead of specifying or assuming that the study area is a square or rectangle, redefine the study region as a circle (i.e., if the shape of the study area will allow), working under the assumption that hypothetical points now exist just outside the circle's edge; and are evaluated/treated using the logic employed in (1).
Depending on the size of the chosen radius r, this may cut out some of the peripheral observations. Nearest neighboring distances to those points can now be used providing real (observed) distance information, as opposed to information obtained from proxies.
[Figure: Strategy #2. Circular study region of radius r; edge-correction "dummy points" (highlighted in red) for k=1; real edge-correctors (highlighted in brown); lost observations outside the circle]
(3) Assume that a hypothetical point pattern/distribution lies just outside the study area boundary, and assume it is the mirror image of the internal point pattern. Once again, these exterior points are evaluated/treated using the logic employed in (1) when found to be a NN.
[Figure: Strategy #3. Observed points near the boundary/edge and their mirror image (hypothetical points, not observed) serving as first-order nearest neighbors]
(4) Mathematical corrections modifying the data at peripheral locations of the study area to correct for potential biases that may occur due to the positioning of points about the boundary or edge. More on this later.
"Global" Measures of Spatial Association and Clustering (Part 2)
...for polygon & point count data
as defined for k-nearest neighbors
6. Cuzick-Edwards Test (for case-control data)
7. Global Quadrat Test (for case-control data)
8. Modified Cuzick-Edwards Test... as an extension of 6.
6. The Cuzick-Edwards Test (1990) is a global or general test for clustering for use with case-control data.
The test statistic is simply a count of the k-nearest neighbors of a case that are also cases, summed over all n data points observed in a study region:
$$T_k = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\,\delta_i\,\delta_j ,$$
where $\delta_i$ (or $\delta_j$) is equal to 1 if location (point) i (or j) is a case, and is equal to zero if location (point) i (or j) is a control. The term $w_{ij}$ is equal to 1 if j is a k-nearest neighbor of i, and equal to zero otherwise, where the order k is predefined.
Note: The choice of k determines the "spatial scale" at which the analysis is carried out (for k=1,2,...,m nearest neighbors).
Null hypothesis H0: the set of all cases and controls has been assigned to the set of all locations "randomly", resulting in a random spatial pattern (an outcome known as random labeling). The "expected value" of $T_k$ is thus defined as
$$E[T_k] = [k\,n_0(n_0 - 1)] / (n - 1) ,$$
where $n_0$ is the number of cases, $n_1$ is the number of controls, and $n = n_0 + n_1$ is the total number of observed cases and controls (points or locations).
The "variance" is estimated using the following equation:
$$V[T_k] = k_1 + k_2 - k_3 ,$$
with
$$k_1 = (kn + N_s)\,[p_1(1 - p_1)];$$
$$k_2 = \{[(3k^2 - k)\,n] + N_t - 2N_s\}\,(p_2 - p_1^2);$$
$$k_3 = \{[k^2(n^2 - 3n)] + N_s - N_t\}\,(p_1^2 - p_3);$$
where $p_j$ is
$$p_j = \prod_{i=0}^{j} \frac{n_0 - i}{n - i} ;$$
and where $N_s$ is twice the number of pairs of points that are k-th nearest neighbors of each other,
$$N_s = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\,w_{ji} ;$$
and $N_t$ is defined as follows:
$$N_t = \sum_{j=1}^{n} \Big[\sum_{i=1}^{n} w_{ij}\Big] \Big\{\Big[\sum_{i=1}^{n} w_{ij}\Big] - 1\Big\} ,$$
where $w_{ij} = 1$ represents a nearest-neighboring pair of points {i,j} at a predetermined nearest neighbor order k.
The test statistic
$$Z_{T_k} = \{T_k - E[T_k]\} / \sqrt{V[T_k]}$$
may be used to evaluate the null hypothesis of a random spatial pattern (when sample size is relatively large), as it can be treated as a standard normal variable. Note that the Z-statistic defined above is "asymptotically normal": normality is achieved only in very large samples. Hence, the test is not recommended for small samples. Note: n >> 100 is needed to achieve stability in the sampling distribution for this statistic, with a fair amount of cases and/or controls (np > 30).
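The pieces of the Cuzick-Edwards test ($T_k$, its expectation, the variance terms, and the Z-score) can be put together as follows. This is a sketch under stated assumptions rather than a reference implementation: the function name, the tie-breaking rule for equidistant neighbors (index order), and the sample coordinates and labels are all hypothetical. The variance counts ordered pairs of nearest-neighbor links by how many points they share, with $k^2(n^2 - 3n) + N_s - N_t$ pairs sharing none.

```python
import numpy as np

def cuzick_edwards(points, is_case, k=1):
    """Return (T_k, E[T_k], V[T_k], Z) under random labeling. Requires n >= 4."""
    pts = np.asarray(points, dtype=float)
    delta = np.asarray(is_case, dtype=int)        # 1 = case, 0 = control
    n, n0 = len(pts), int(delta.sum())
    diff = pts[:, None, :] - pts[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)
    # w[i, j] = 1 if j is one of the k nearest neighbors of i.
    w = np.zeros((n, n), dtype=int)
    nearest = np.argsort(d, axis=1, kind="stable")[:, :k]
    w[np.arange(n)[:, None], nearest] = 1
    t_k = int(delta @ w @ delta)
    # p_j = prod_{i=0..j} (n0 - i) / (n - i)
    p1, p2, p3 = (np.prod([(n0 - i) / (n - i) for i in range(j + 1)])
                  for j in (1, 2, 3))
    n_s = int((w * w.T).sum())                    # mutual (reflexive) links, counted twice
    col = w.sum(axis=0)                           # how often each point is someone's neighbor
    n_t = int((col * (col - 1)).sum())
    expected = k * n0 * (n0 - 1) / (n - 1)
    var = ((k * n + n_s) * p1 * (1 - p1)
           + ((3 * k**2 - k) * n + n_t - 2 * n_s) * (p2 - p1**2)
           - (k**2 * (n**2 - 3 * n) + n_s - n_t) * (p1**2 - p3))
    z = (t_k - expected) / np.sqrt(var)
    return t_k, expected, var, z

# Hypothetical case-control data: three cases close together, five scattered controls.
pts = [(0, 0), (0.2, 0.1), (0.1, 0.3), (3, 3), (5, 1), (1, 5), (4, 4), (6, 6)]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
print(cuzick_edwards(pts, labels, k=1))
```

With so few points the Z-score is only illustrative, in line with the sample-size warning above.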
7. The Global Quadrat Test of clustering is a viable global test for "spatial randomness" (also for applications involving case-control data). The test is inherently based on the spatial distribution and spacing of points. Let C represent the number of observed cases and N be the number of controls. The test starts out by specifying each control location i (i = 1,..., N) as the center of a Thiessen polygon; thus, locations inside the polygon are closer to its center than to any other control center. The null hypothesis is that the probability that an observed case falls into any particular polygon is 1/N; in other words, the spatial pattern of observed cases is random, borrowing from the property of equal probability of occurrence over space (CSR and maximum entropy).
Let $C_i$ be the number of observed cases or occurrences of a given phenomenon inside polygon i (denoting cases that are closer to control i than to any other control). Conditional on the total number of observed cases C, the null hypothesis is that the cases/occurrences are distributed across polygons according to a multinomial distribution. Note that each polygon is assumed to have an expectation of C/N cases under the null (the assumption of equal probability under maximum uncertainty).
[Figure: Thiessen polygon i = 3 formed about a control, containing $C_i$ = 4 cases]
We can apply a $\chi^2$ test to compare the observed versus expected values in each polygon:
$$\chi^2_G = \sum_{i=1}^{N} \frac{(C_i - C/N)^2}{C/N} = \frac{N}{C} \sum_{i=1}^{N} (C_i - C/N)^2 = \Big\{\frac{N}{C} \sum_{i=1}^{N} C_i^2\Big\} - C ,$$
as the test statistic $\chi^2_G$ will have a $\chi^2$ distribution with (N − 1) degrees of freedom under the null hypothesis, conditional upon the number of cases C. The above statistic is useful in identifying potential "hot spots" or hot zones.
Suppose we have N = 15 controls (polygons) and C = 6 cases.

polygon i   C_i   expected (C/N)   C_i^2   [C_i - (C/N)]^2 / (C/N)
    1        0         0.4           0            0.4
    2        0         0.4           0            0.4
    3        4         0.4          16           32.4
    4        2         0.4           4            6.4
    5        0         0.4           0            0.4
    6        0         0.4           0            0.4
    7        0         0.4           0            0.4
    8        0         0.4           0            0.4
    9        0         0.4           0            0.4
   10        0         0.4           0            0.4
   11        0         0.4           0            0.4
   12        0         0.4           0            0.4
   13        0         0.4           0            0.4
   14        0         0.4           0            0.4
   15        0         0.4           0            0.4
  sum        6                      20           44.0  (traditional $\chi^2$)

In our problem, the Global Quadrat Test of Clustering yields the following chi-square statistic:
$$\chi^2_G = \Big\{\frac{N}{C} \sum_{i=1}^{N} C_i^2\Big\} - C = \{(15/6)(20)\} - 6 = 50.0 - 6 = 44.0 ,$$
which agrees with the traditional $\chi^2$ sum in the last column of the table.
Test result at $\alpha$ = .01 (at the 99% confidence level): $\chi^2_G > \chi^2_{crit}$, as 44.0 > 29.14. "Reject H0: random/non-clustered pattern"; hence, there is statistical evidence that spatial clusters of cases do exist in sub-regions/polygons formed about the controls at $\alpha$ = .01.
[Figure: probability density of the $\chi^2$ distribution with N − 1 = 14 d.f.; non-rejection region from 0 to 29.14 and rejection region beyond 29.14 at $\alpha$ = .01; the test statistic 44.0 falls in the rejection region]
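The worked example can be checked numerically by evaluating both the term-by-term sum and the shortcut form of $\chi^2_G$ for the N = 15, C = 6 counts. The function name is hypothetical, and the critical value 29.14 is taken from the text rather than computed.

```python
import numpy as np

def global_quadrat_chi2(case_counts):
    """Return (direct sum, shortcut form, degrees of freedom) for the global quadrat test."""
    c_i = np.asarray(case_counts, dtype=float)
    n_poly, c_total = c_i.size, c_i.sum()
    expected = c_total / n_poly                              # C/N cases per polygon under H0
    direct = ((c_i - expected) ** 2 / expected).sum()        # sum of (C_i - C/N)^2 / (C/N)
    shortcut = (n_poly / c_total) * (c_i ** 2).sum() - c_total   # (N/C) * sum(C_i^2) - C
    return direct, shortcut, n_poly - 1

counts = [0, 0, 4, 2] + [0] * 11     # cases per polygon, polygons 1..15
direct, shortcut, dof = global_quadrat_chi2(counts)
reject_h0 = direct > 29.14           # critical value at alpha = .01 with 14 d.f. (from the text)
print(direct, shortcut, dof, reject_h0)
```

The two forms are algebraically identical, so any disagreement between them in hand calculations signals an arithmetic slip.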
Requirements and Limitations
The chi-square approximation under the null will only be realistic when N > 3 and $C^2/N$ > 1.0, and where expectations (C/N) are greater than or equal to 0.25. Of particular interest here are high values of $\chi^2$ caused by a high number of cases in a subset of polygons formed around the controls (high values indicate "hot spots"). A potential limitation of this approach is the requirement that the polygons be centered about the controls rather than the cases. This could result in a spatial cluster of cases being split up evenly or unnaturally (across the conjoined or adjacent polygons), with individual cases in the cluster assigned to different controls, thereby weakening the power of the statistic to detect a cluster.
Consider the following spatial distribution of observed cases and controls with a noticeable cluster.
[Figure: map of controls and observed cases with a highlighted circular cluster; legend: Control, Observed Case]
The Thiessen polygons formed about the controls in the highlighted circle tend to break up the cluster of observed cases unnaturally... and the maximum number of clustered observations is reduced from 19 total to 4 per polygon within the shaded region.
If the global quadrat statistic $\chi^2_G$ is significant and the null hypothesis is rejected (or there are unnatural splits in the assignment of cases in a spatial cluster), then it will often be of interest to look at "local test statistics".
For example, one may evaluate the local Z-score for an i-th polygon:
$$Z_i = \{C_i - 0.5 - C/N\} / \sqrt{C/N} ,$$
where the local $Z_i$ stats include a "correction" for continuity. This approach is only recommended when the expected number of cases in a given polygon or cell is large (say 30 or more), and is not intended for small/smaller samples.
In the case of polygon i = 3, with 4 out of 6 cases:
$$Z_3 = \{4 - 0.5 - 0.4\} / \sqrt{0.4} = 3.1 / 0.6325 = 4.90 ,$$
though the test is not applicable here, as this result is for a small sample size.
Reversing the analysis: recognition of "cool spots". We may also redo the analysis by generating Thiessen polygons around the C case locations and examining the distribution of the N controls across those polygons. Note that this approach adopts the perspective that the case locations are fixed, and interest lies in evaluating how unusual the selection of a control is (somewhat counterintuitive); yet under the null hypothesis, we are testing that there is no difference in the spatial distributions of cases and controls (which is viable).
Let $N_i$ be the number of controls that are closer to case i than to any other case. The chi-square statistic may be re-stated as
$$\chi^2_{G*} = \sum_{i=1}^{C} \frac{(N_i - N/C)^2}{N/C} = \frac{C}{N} \sum_{i=1}^{C} (N_i - N/C)^2 ,$$
a technique that may be useful in identifying locations that are "cool spots" or cool zones, where there are significantly more controls near a particular observed case than expected under the null hypothesis.
8. The Modified Cuzick-Edwards Test is a distance-based extension of the original Cuzick-Edwards Test. Rather than rely on k-nearest neighbors to define the number and orientation of case-case pairs, we may wish to analyze the number of case-case pairs that are within a given distance (say h) of one another. Hence, we calculate a distance-constrained Cuzick-Edwards test, computing Z-scores as before by subtracting the expectation of the number of case-case pairs from the test statistic and then dividing by the standard deviation.
As an alternative, one could employ a K-function approach, defining the observed number of "case pairs" that lie within distance h of one another as
$$CC_{obs} = [N_0^2\,\hat{K}_{00}(h)] / R ,$$
where R is the size of the study area, $N_0$ is the number of observed cases, and $\hat{K}_{00}$ is the estimated K-function using the case locations:
$$\hat{K}_{00}(h) = \frac{R}{N_0^2} \sum_{i=1}^{N_0} \sum_{j \ne i} I_{ij}(h) ,$$
where $I_{ij}(h) = 1$ if cases i and j are separated by a distance less than h, and $I_{ij}(h) = 0$ otherwise.
The "expected" count of case pairs at distance h may be found using
$$CC_{exp} = \frac{N_0 - 1}{N_0 + N_1 - 1}\,\{N_2 + N_3\} ,$$
where $N_1$ is the number of controls; $N_2 = [N_0^2\,\hat{K}_{00}(h)] / R$ is the number of case locations within a distance h of each case (summed over all cases); $N_3 = [N_0 N_1\,\hat{K}_{01}(h)] / R$ is the number of control locations within a distance h of each case (summed over all cases); and $K_{01}$ is the "cross K-function", defined as
$$\hat{K}_{01}(h) = \frac{R}{N_0 N_1} \sum_{i=1}^{N_0} \sum_{j=1}^{N_1} I^{c}_{ij}(h) ,$$
where $I^{c}_{ij}(h) = 1$ if control j and case i are separated by a distance less than h, and $I^{c}_{ij}(h) = 0$ otherwise.
The Null Hypothesis H0: Random Labeling (i.e., random case distribution).
The variance of the statistic is approximately equal to the expectation (the mean), a fact that has been confirmed by simulation experiments. The Z-statistic is used as an approximation (and the test statistic is distributed as standard normal), where
$$Z = [CC_{obs} - CC_{exp}] / \sqrt{CC_{exp}} .$$
This test is reasonable for relatively small values of h, and can essentially be run over a range of h values to examine the sensitivity of the test statistic to various distances.
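Because $N_0^2 \hat{K}_{00}(h)/R$ and $N_0 N_1 \hat{K}_{01}(h)/R$ reduce to simple pair counts, the study-area size R cancels and the statistic can be sketched directly from distances. The function name, the sample coordinates, and the choice of h below are hypothetical; pairs are counted as ordered pairs (i, j) with j ≠ i, matching the double sum in the K-function estimate.

```python
import numpy as np

def modified_cuzick_edwards(cases, controls, h):
    """Return (CC_obs, CC_exp, Z) for the distance-based Cuzick-Edwards test."""
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    n0, n1 = len(cases), len(controls)

    def pair_dist(a, b):
        diff = a[:, None, :] - b[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))

    d_cc = pair_dist(cases, cases)
    np.fill_diagonal(d_cc, np.inf)                 # exclude j == i
    cc_obs = int((d_cc < h).sum())                 # ordered case-case pairs within h
    case_control = int((pair_dist(cases, controls) < h).sum())   # controls within h of each case
    cc_exp = (n0 - 1) / (n0 + n1 - 1) * (cc_obs + case_control)
    # Variance is approximated by the expectation; Z is undefined if nothing lies within h.
    z = (cc_obs - cc_exp) / np.sqrt(cc_exp) if cc_exp > 0 else float("nan")
    return cc_obs, cc_exp, z

# Hypothetical data: three clustered cases, one nearby control, three distant controls.
cases = [(0, 0), (0.1, 0), (0, 0.1)]
controls = [(5, 5), (0.2, 0.2), (9, 9), (3, 3)]
cc_obs, cc_exp, z = modified_cuzick_edwards(cases, controls, h=0.5)
print(cc_obs, cc_exp, z)
```

Calling the function over a grid of h values gives the sensitivity analysis mentioned above.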
Reference for this section: Rogerson and Yamada (2009)
Note: You might want to consider downloading the latest version of CrimeStat or SaTScan (freeware) to assist you in performing your point-pattern analysis. These packages offer an array of tools and methods, as well as some sample databases and tutorials that you should find very useful.
Feel free to use the software and available databases to construct labs and experiment.
Link: http://www.icpsr.umich.edu/CrimeStat/
SaTScan™ is free software that analyzes spatial, temporal and space-time data using the spatial, temporal, or space-time scan statistics. It is designed for any of the following interrelated purposes:
1. Perform geographical surveillance of disease, to detect spatial or space-time disease clusters, and to see if they are statistically significant.
2. Test whether a disease is randomly distributed over space, over time or over space and time.
3. Evaluate the statistical significance of disease cluster alarms.
4. Perform repeated time-periodic disease surveillance for early detection of disease outbreaks.
5. The software may also be used for similar problems in other fields such as archaeology, astronomy, botany, criminology, ecology, economics, engineering, forestry, genetics, geography, geology, history, neurology or zoology.
Go to www.satscan.org for more information.
Link http://www.satscan.org/
Analysis of Pattern Measures at the Local/Regional Scale
9. Local Moran's I
10. The Score Statistic (also: Getis G-statistic)
11. Tango's CF (which we will review later)
12. $\chi^2$ Cumulative test
13. Maximum $\chi^2$ test
14. Local Quadrat Test
15. Fuchs & Kenett's M Test(s)
16. Diggle & Rowlingson's Maximum Likelihood Approach
17. Kernel Density Estimators/Tests
18. Besag & Newell's Test
19. Kulldorff's Likelihood Ratio Test
20. Rogerson's Method(s)
Local Statistics for analyzing spatial patterns and clusters
9. Local Moran: used to determine if there is evidence of "local spatial autocorrelation" for a given variable (y: $y_i$, i = 1,..., n sample observations) around a specific sub-region, neighborhood, or locality. Local Moran's I may be defined (for a given i-th sub-region) as
$$I_i = \Big[ n\,(y_i - \bar{y}) \Big/ \sum_{j=1}^{n} (y_j - \bar{y})^2 \Big] \sum_{j=1}^{n} w_{ij}\,(y_j - \bar{y}) ,$$
where $w_{ij}$ are the typical elements of the spatial weights matrix W (n x n). Note that the sum of the local Moran values obtained for all sub-regions in the study region is equal to the "global Moran" coefficient multiplied by the sum of the spatial weights ($w_{ij}$) (Anselin, 1995).
Data Requirements
(a) Valued point data for n points/locations; or
(b) Raster/grid cell or polygon data
    i. Measured at interval or ratio scale
    ii. Count data or rate data (raw, smoothed, or adjusted)
    ...for n cells/polygons; and
(c) Specification of Spatial Weights/Connectivity Matrix
    Binary or Standardized
    Inverse Distance Weighting
    Shared Boundary (Rook's, Queen's case joins)
    Thiessen polygons, Delaunay Tessellation
    Gabriel graph
    k-th Nearest Neighbor
Note: Large sample size preferred (N > 100)
Output: Local Moran's $I_i$ for each polygon, Z-score, and p-value
Global Moran's I: a general test for global spatial autocorrelation (SA)
Local Moran's $I_i$, i = 1,..., n: tests for local SA
...for a given data set and pre-specified W (n x n) spatial weights/connectivity matrix.
Global Moran's I is an aggregate coefficient composed of the scaled sum of the Local Moran statistics. More formally,
$$I = \sum_{i=1}^{n} I_i \Big/ \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} .$$
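The link between the local and global coefficients can be verified numerically. In the sketch below, the function names, the illustrative y values, and the binary contiguity matrix W are all hypothetical; any weights matrix with a zero diagonal would serve.

```python
import numpy as np

def local_moran(y, W):
    """Vector of Local Moran coefficients I_i for weights matrix W."""
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    z = y - y.mean()
    return y.size * z * (W @ z) / (z @ z)

def global_moran(y, W):
    """Global Moran's I for the same data and weights."""
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    z = y - y.mean()
    return (y.size / W.sum()) * (z @ W @ z) / (z @ z)

# Illustrative values for six sub-regions and a hypothetical symmetric binary W.
y = [14, 12, 11, 25, 28, 30]
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    W[a, b] = W[b, a] = 1

I_local = local_moran(y, W)
I_global = global_moran(y, W)
print(I_local.sum() / W.sum(), I_global)   # the two values coincide
```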
As defined by Anselin (1995), the expected value of the local Moran coefficient is
$$E[I_i] = -w_i / (n - 1), \quad \text{where } w_i = \sum_{j=1}^{n} w_{ij} ,$$
and the variance of $I_i$ under the "randomization" hypothesis may be expressed as $V[I_i] = A + B$, where
$$A = [w_{i(2)}\,(n - b_2)] / (n - 1); \quad B = [2w_{i(kh)}\,(2b_2 - n)] / [(n - 1)(n - 2)] - w_i^2 / (n - 1)^2 ,$$
with
$$w_{i(2)} = \sum_{j \ne i} w_{ij}^2 , \quad 2w_{i(kh)} = \sum_{k \ne i} \sum_{h \ne i,\, h \ne k} w_{ik}\,w_{ih} , \quad b_2 = n \sum_{i=1}^{n} (y_i - \bar{y})^4 \Big/ \Big[\sum_{i=1}^{n} (y_i - \bar{y})^2\Big]^2 .$$
Caveat. The test of significance is typically carried out under the assumption that the test statistic has a normal distribution under the null hypothesis of no (zero) spatial autocorrelation (SA). Anselin, however, has concluded that the normal distribution may be inappropriate to approximate the distribution of $I_i$, and will lead to possible over-rejection of the null when used with small samples (n < 100). This concern becomes less serious as sample size increases, as the number of neighbors increases, or both. Nevertheless, Z-scores of Local Moran coefficients are widely used as a way to detect hot spots or spatial clusters, with
$$Z_i = \{I_i - E[I_i]\} / \sqrt{V[I_i]} .$$
Suppose we are testing for hot spots or clusters of like values at the 90% confidence level under the null hypothesis H0: random spatial pattern, and N is very large (> 100).
If Z > +1.645 (or p < .10): Reject the null hypothesis; clustering of "like values" or positive spatial autocorrelation (high-high value cluster or low-low value cluster).
If −1.645 < Z < 1.645 (or p > .10): Fail to reject the null hypothesis; the local pattern appears to be random (zero SA).
If Z < −1.645 (or p < .10): Reject the null hypothesis; a low (high) value surrounded by high (low) valued neighbors, or a possible local outlier / extreme observation / negative SA.
One interpretation (at 95% confidence) of positive spatial autocorrelation:
High-high like value clusters, Z > +1.96: hot spots
Low-low like value clusters, Z > +1.96: cool zones
Problem: Identifying the correct sampling distribution for Local Moran's I is difficult, and requires computer assistance.
Solution: To find the sampling distribution of $I_i$ for a given sample size n, one must engage in a Monte Carlo simulation exercise using a conditional randomization approach. Anselin suggests holding the observation value $y_i$ at sub-region i fixed, while the remaining (n − 1) values are randomly permuted over the other sub-regions (rearranging the values in sub-regions other than i) to recover a very large number of estimated Local Moran coefficients $I_i$. The resulting distribution of values can then be used to obtain a "pseudo-sampling distribution" (and variance estimate), as well as a pseudo-significance level. [Note that this can be done regardless of the presence or absence of global SA.]
Consider the pattern of the variable y (point counts) for the 6 sub-regions shown below. Is there statistical evidence of local spatial autocorrelation for region 6?

sub-region i   y_i
     1         14
     2         12
     3         11
     4         25
     5         28
     6         30   (suspected hot spot)

[Figure: map of the 6 sub-regions with their point counts]
For region i = 6, the Local Moran coefficient may be written as
$$I_6 = \Big[ n\,(y_6 - \bar{y}) \Big/ \sum_{j=1}^{n} (y_j - \bar{y})^2 \Big] \sum_{j=1}^{n} w_{6j}\,(y_j - \bar{y}) = [6\,(10) / 370]\,(13) = 2.108 ,$$
as n = 6, $\bar{y}$ = 20, and $\sum_{j=1}^{n} (y_j - \bar{y})^2 = 370$.
The expected value and variance of the Local Moran coefficient for this problem are as follows: $E[I_6] = -0.4$ and $V[I_6] = 1.406$, respectively. Here $w_6 = 2$, $w_{6(2)} = 2$, $b_2 = 1.169$, and the cross-product sum over the two ordered pairs of distinct neighbors equals 2, so that A = 2(6 − 1.169)/5 = 1.932 and B = 2[2(1.169) − 6]/[(5)(4)] − 4/25 = −0.526. The Z-score (used only as a "benchmark" given its known limitations when used on small samples) is
$$Z_6 = [2.108 - (-0.4)] / \sqrt{1.406} = 2.115 ,$$
a value suggesting local positive spatial autocorrelation about or around sub-region 6 (as $Z_6$ > +1.96 at the 95% confidence level, again noting the limitations of this test for small samples). Implication: hot spot.
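The conditional randomization approach can be sketched for the six-region example: $y_6$ is held fixed while the other five values are shuffled, and the observed $I_6$ is compared with the permuted values. The function names, the number of permutations, the seed, and the one-sided (upper-tail) pseudo p-value are choices made for this illustration; the weights row assumes that regions 4 and 5 are the two binary neighbors of region 6, which is consistent with the neighbor sum of 13 used in the example.

```python
import numpy as np

def local_moran_i(y, w_row, i):
    """Local Moran coefficient for sub-region i, given row i of the weights matrix."""
    y = np.asarray(y, dtype=float)
    z = y - y.mean()
    return y.size * z[i] * (np.asarray(w_row, dtype=float) @ z) / (z @ z)

def conditional_permutation_p(y, w_row, i, n_perm=999, seed=0):
    """Observed I_i and an upper-tail pseudo p-value, holding y[i] fixed."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = local_moran_i(y, w_row, i)
    others = np.delete(np.arange(y.size), i)
    count = 0
    for _ in range(n_perm):
        y_perm = y.copy()
        y_perm[others] = rng.permutation(y[others])   # shuffle every value except y[i]
        if local_moran_i(y_perm, w_row, i) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

y = [14, 12, 11, 25, 28, 30]
w6 = [0, 0, 0, 1, 1, 0]          # assumed neighbors of region 6: regions 4 and 5
i6, p6 = conditional_permutation_p(y, w6, i=5)
print(round(i6, 3), p6)
```

With only six sub-regions the pseudo p-value is coarse, since only 20 distinct ordered neighbor assignments exist, which is one more reason to treat small-sample results as a benchmark only.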
[Figure: Local Moran's I for lung cancer rates; output from the statistical software TerraSeer. Source: Tony Blanchard and Wes James, Social Science Research Center, Mississippi State University, CDC data]
[Figure: Local Moran coefficient clusters, Natural Hazard Mortality Rates (1970-2004) for k=11 different categories of natural hazards. Source: Int J Health Geogr, 2008, Borden and Cutter]
Spatial patterns of Economic Change (Orlando, FL); N = 328 Census Tracts
[Figure legend: CBD (Sarzynski et al., 2006), 8,000 jobs / mi²; Old Central City (Inner City), 1950 City of Orlando boundary; Annexed Central City, Inner Suburbs (Lee and Leigh, 2005), 50% of housing built 50~69; Suburban (Theobald, 2001), 60 DW / mi²; Rural. Source: J. Kim (2011)]
Clustering of like change:
Choropleth map of Z-scores for Moran's I (change in number of people below the poverty line), Orlando MSA (2000-2010)
Local Moran's I stats (Z's) -Queen's case connectivity (shared boundary)
Hot spots (mostly high-high clusters) shown in orange/red Note: mostly rural/hinterland, outer suburban areas, with one small hot spot in the old central/ inner city area. (J. Kim, 2011 Student Projects)
Global Moran's I for change in number of people living below the Poverty Level (2000-2010); with Evidence of positive Spatial Autocorrelation using various connectivity structures.
Moran's I Results

Connectivity     Moran's Index   Z Score   P-value
Queen               0.0996       3.3680    0.00076
Rook                0.0959       2.9495    0.00318
IDW 2               0.0550       3.5573    0.00038
Delaunay Tri.       0.0794       2.6473    0.00811
NN(2)               0.0779       1.6680    0.0953
NN(5)               0.0717       2.3724    0.01767
NN(8)               0.0821       3.4167    0.00063
Global versus Local Measure. The global measure suggests that changes in the number of people in poverty (per census tract) follow a positively autocorrelated pattern/process (as implied by the table above), whereas the local measures, as mapped on the previous slide, indicate hot spots of like clustered values.
Local Moran's I values for breast cancer counts (EBS smoothed data), N=351 towns in the state of Massachusetts
[Figure: EBS (smoothed) Breast Cancer Rates (1997-2003) for the state of Massachusetts (GeoDa); legend: High-high cluster, Low-low cluster, Low neighborhood, High neighborhood, Not Significant]
Case-control studies of breast cancer risk in Massachusetts' Upper Cape Cod region (Vieira et al, Paulu et al 2008) showed spatial clustering of risk based on residential location.
Two hot spots identified
Link below will give you more information about the Breast Cancer study.
http://gis.uml.edu/mediawiki/index.php/Breast_Cancer_Risk_Project
Lab #3. For a spatial data set of your choice (valued point data or polygon data, for i=1,..., n observations) compute and interpret the following: (a) Global Moran statistics (for k=1,...m NN); and (b) Local Moran's I (n of them)
...using alternative weighting schemes {wij}.
Test to see if there are any "hot spots" (or "cool zones"); that is, neighborhoods or clusters of like high (low) values.
Find millions of documents on Course Hero - Study Guides, Lecture Notes, Reference Materials, Practice Exams and more.
Course Hero has millions of course specific materials providing students with the best way to expand
their education.
Below is a small sample set of documents:
University of Florida - GEO - 6938
Analysis of Pattern Measuresat the Local/Regional Scale9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20.Local Moran's I The Score Statistic Also Getis G-statistic Tango's CF (which we will review later) 2 Cumulative test Maximum 2 test Local Quadrat Test
University of Florida - GEO - 6938
Identification of Local Clusters for Count Data: A Model-Based Moran's I TestTonglin Zhang and Ge LinPurdue University and West Virginia University February 14, 2007Department of Statistics, Purdue University, 250 North University Street,West Lafayette
University of Florida - GEO - 6938
Chapter 4 Modelling Counts - The Poisson and Negative Binomial RegressionIn this chapter, we discuss methods that model counts. In a longitudinal setting, these counts typically result from the collapsing repeated binary events on subjects measured over
University of Florida - GEO - 6938
Notes on the Negative Binomial DistributionJohn D. Cook October 28, 2009Abstract These notes give several properties of the negative binomial distribution. 1. Parameterizations 2. The connection between the negative binomial distribution and the binomia
University of Florida - GEO - 6938
On Model Fitting Procedures for Inhomogeneous Neyman-Scott ProcessesYongtao GuanJuly 31, 2006ABSTRACTIn this paper we study computationally efficient procedures to estimate the second-order parameters for a class of inhomogeneous Neyman-Scott processe
University of Florida - GEO - 6938
Spatial AutocorrelationGeography 683 - Introduction to Geographic AnalysisSpatial AutocorrelationGuoxiang Ding Department of Geography1155 Derby Hall Phone: 292-2704 Email: ding.45@osu.edu First law of geography: "everything is related to everything
University of Florida - GEO - 6938
Overdispersion and Poisson RegressionRichard Berk John MacDonald Department of Statistics Department of Criminology University of Pennsylvania November 19, 2007Abstract This article discusses the use of regression models for count data. A claim is often
University of Florida - GEO - 6938
Spatial AutocorrelationMoran's I Geary's C Arthur J. Lembo, Jr. Salisbury UniversitySpatial Autocorrelation First law of geography: "everything is related to everything else, but near things are more related than distant things" Waldo Tobler Many geog
University of Florida - GEO - 6938
Analysing spatial point patterns in RAdrian Baddeley CSIRO and University of Western Australia Adrian.Baddeley@csiro.au adrian@maths.uwa.edu.au Workshop Notes Version 3 October 2008 Copyright c CSIRO 2008Abstract This is a detailed set of notes for a wo
University of Florida - GEO - 6938
136Poisson Regression Analysis13. Poisson Regression AnalysisWe have so far considered situations where the outcome variable is numeric and Normally distributed, or binary. In clinical work one often encounters situations where the outcome variable is
University of Florida - GEO - 6938
Parametric Test Quadrat AnalysisEquations taken from Rogerson, 2001.i=m i =1s2 = (xs2 xi- x )2m -1 m -1 (VMR - 1) z= 2 m is the number of quadrats, x is the mean of the number of points per quadrat, s2 is the variance of the number of points per
University of Florida - GEO - 6938
AN INTRODUCTION TO QUADRAT ANALYSISR.W.ThomasISSN 0306-6142ISBN 0 902246 66 6 1977 R.W. ThomasCONCEPTS AND TECHNIQUES IN MODERN GEOGRAPHY No. 12CATMOG(Concepts and Techniques in Modern Geography) CATMOG has been created to fill a teaching need in th
University of Florida - GEO - 6938
Rate Transformations and SmoothingLuc Anselin Nancy Lozano Julia KoschinskySpatial Analysis Laboratory Department of Geography University of Illinois, Urbana-Champaign Urbana, IL 61801 http:/sal.uiuc.edu/Revised Version, January 31, 2006Copyright c 20
University of Florida - GEO - 6938
Change Detection Thresholds: Alternative Statistical Approaches to Detecting Temporal Change in Spatial PatternsPeter A. Rogerson Daikwon Han Ikuho Yamada Department of Geography National Center for Geographic Information and Analysis University at Buffa
University of Florida - GEO - 6938
The author(s) shown below used Federal funds provided by the U.S. Department of Justice and prepared the following final report: Document Title: Author(s): Document No.: Date Received: Award Number: Crime Analysis Geographic Information System Services: A
University of Florida - GEO - 6938
Spatial AutocorrelationMorans I Gearys C Arthur J. Lembo, Jr. Salisbury UniversitySpatial Autocorrelation First law of geography: everything is related to everything else, but near things are more related than distant things Waldo Tobler Many geograph
University of Florida - GEO - 6938
SaTScan User GuideTMfor version 8.0By Martin Kulldorff February, 2009 http:/www.satscan.org/ContentsIntroduction . 4 The SaTScan Software . 4 Download and Installation . 5 Test Run . 5 Sample Data Sets. 5 Statistical Methodology.
University of Florida - GEO - 6938
Further Methods for Point Pattern AnalysisBailey and Gatrell Chapter 4Variations in Populationn Certain types of events will exhibit clustering due to heterogeneity in the underlying distribution e.g disease cases or crimes will tend to cluster where t
University of Florida - GEO - 6938
Non-technical Overview of Geospatial Statistical MethodsGIS/Mapping and Census Data Second Annual Census Workshop Series Workshop 3: Spatial Statistics, Spatial Research & Confidential Census DataNew York Census Research Data Center (CRDC) Baruch Colleg
University of Florida - GEO - 6938
Package `spatstat'December 21, 2011Version 1.25-1 Date 2011-12-21 Title Spatial Point Pattern analysis, model-fitting, simulation, tests Author Adrian Baddeley <Adrian.Baddeley@csiro.au> and Rolf Turner <r.turner@auckland.ac.nz> with substantial contrib
University of Florida - GEO - 6938
Andrei Rogers and Norbert G. GomarStatistical inference in Quadrat AnalysisThe growing recognition of the need for establishing a systematic and quantitative means for describing and analyzing, the spatial dispersion of activities in urban areas has gen
University of Florida - GEO - 6938
Biometrical Journal 50 (2008) 1, 4357 DOI: 10.1002/bimj.20061033943Parameter Estimation and Model Selection for Neyman-Scott Point ProcessesUshio Tanaka1, Yosihiko Ogata*, 1, 2, and Dietrich Stoyan31 2 3The Graduate University for Advanced Studies, M