Exploiting Structure of Symmetric or Triangular Matrices on a GPU

Jin Hyuk Jung, Dianne P. O'Leary

January 2008

Abstract

Matrix computations are expensive, and GPUs have the potential to deliver results at reduced cost by exploiting parallel computation. We focus on dense matrices of the form $AD^2A^T$, where $A$ is an $m \times n$ matrix ($m \le n$) and $D$ is an $n \times n$ diagonal matrix. Many important numerical problems require solving linear systems of equations involving matrices of this form. These problems include normal equations approaches to solving linear least squares and weighted linear least squares problems, and interior point algorithms for linear and nonlinear programming problems. We develop in this work efficient GPU algorithms for forming and factoring $AD^2A^T$ by exploiting the triangular rasterization capabilities of the GPU.

This report summarizes work from 2005 to 2007 and was supported in part by the US Department of Energy under Grant DE-FG02-04ER25655 and by the National Science Foundation under Grant CCF 05-14213.

Keywords: GPGPU, general purpose graphics processing units, symmetric matrix, triangular matrix, rectangular packed format, matrix computation, factorization, decomposition, weighted least squares.

Jin Hyuk Jung: Department of Computer Science, University of Maryland, College Park, MD 20742. jjung@cs.umd.edu, salbang+csr@gmail.com

Dianne P. O'Leary: Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742. oleary@cs.umd.edu

1 Introduction

Given time series data $b_0, \dots, b_{n-1}$ measured at times $t_0, \dots, t_{n-1}$, we may want to fit a model to the data in order to reduce the effects of noise. We choose a set of basis functions $\phi_0(t), \dots, \phi_{m-1}(t)$ that can describe the behavior of the system and build a model

$$u(t) = \sum_{j=0}^{m-1} x_j \phi_j(t). \qquad (1)$$

To find the coefficients $x_0, \dots, x_{m-1}$, we solve a weighted least squares problem involving a matrix $A$, with entries computed by evaluating the basis functions at the measured times, and a diagonal matrix $D$, with entries determined by the uncertainties in the measurements. We solve the problem by forming the matrix $AD^2A^T$ and then computing a factorization of this matrix [5]. The cost is proportional to $nm^2$. In data fitting problems, $m$ may be on the order of tens or hundreds, but $n$ can be quite large.

Many other important computational problems also involve forming and factoring a matrix of the form $AD^2A^T$. For example, interior point algorithms for linear and nonlinear programming solve a sequence of weighted least squares problems [12, 10]. In these applications, $m$ and $n$ can be thousands or more.

Because of the expense of solving these problems, the massive parallelism of the GPU architecture is very attractive [8, 3, 1, 9], and we develop in this work efficient GPU algorithms for forming and factoring $AD^2A^T$.

In section 2, we briefly review the weighted least squares problem and how it is solved. Since the matrix $AD^2A^T$ is symmetric, only its lower triangular part is needed, which roughly halves the computation. We discuss in section 3 how the matrix can be assembled on a GPU using triangular rasterization. Then we present a GPU algorithm based on a rectangular packed storage form. For packed storage, we store a lower triangular matrix by moving the submatrix at the bottom right, rotated by 180 degrees, to the unused upper right corner of the array, thus reducing the required storage. Section 4 explains how factorization can be performed in either full or packed storage. Results in section 5 demonstrate that by rasterizing two triangles simultaneously, we can assemble and factor the packed matrix as fast as the non-packed matrix. Section 6 gives our conclusions and implications for the design of GPU languages.
For consistency with GPU notation, we number elements in matrices and vectors starting with 0 instead of 1.

2 Weighted Least Squares

Given the data $(t_k, b_k)$ ($k = 0, \dots, n-1$) and the model (1), we want the model to match the observed data as well as possible:

$$b_k \approx \sum_{j=0}^{m-1} x_j \phi_j(t_k).$$

To find good coefficients, we could think of summing the squares of the differences between the values $b_k$ and the model prediction. Let $A$ be the $m \times n$ matrix with entries $a_{jk} = \phi_j(t_k)$. If we form a vector $b$ from the values $b_k$ and a vector $x$ from the coefficients $x_j$, then this sum of squares can be written as

$$\sum_{k=0}^{n-1} \Bigl( b_k - \sum_{j=0}^{m-1} x_j \phi_j(t_k) \Bigr)^2 = \| b - A^T x \|_2^2 .$$

The terms in the summation should be weighted by how sure we are about the data; we multiply the $k$-th term by $d_k^2$, where we assume that the error in the observation $b_k$ has mean 0 and standard deviation $d_k^{-1}$. Then our problem becomes

$$\min_x \sum_{k=0}^{n-1} d_k^2 \Bigl( b_k - \sum_{j=0}^{m-1} x_j \phi_j(t_k) \Bigr)^2 = \min_x \| D (b - A^T x) \|_2^2 , \qquad (2)$$

where $D$ is a diagonal matrix with entries $d_k$. To solve this problem, we can set the gradient of $\| D (b - A^T x) \|_2^2$ to zero, obtaining the linear system of equations

$$AD^2A^T x = AD^2 b. \qquad (3)$$

If $A$ has full rank (i.e., the basis functions $\phi_j$ are linearly independent and the data points $t_k$ are distinct), then this system has a unique solution. Therefore, we can solve our weighted least squares problem (2) by forming the matrix $AD^2A^T$ and then solving the linear system of equations. The most efficient method for solving this requires computing a lower triangular matrix $L$ so that $LL^T = AD^2A^T$. The matrix $L$ is called the Cholesky factor of $AD^2A^T$, and its computation requires approximately $m^3/3$ operations.

We propose to form $AD^2A^T$ and factor it on a GPU. This might limit us to single precision. If we require a double precision answer $x$, then it might be necessary to perform mixed precision iterative refinement, as explained in Algorithm 1.
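The normal-equations approach above, together with refinement in the spirit of Algorithm 1, can be sketched on a CPU in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `to_single`, `cholesky`, `solve_llt` and `weighted_lsq` are hypothetical, and single precision is merely emulated by rounding the double-precision factor and the correction vectors to IEEE single values.

```python
import math
import struct


def to_single(v):
    """Round a double to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def norm(vec):
    return math.sqrt(sum(v * v for v in vec))


def cholesky(C):
    """Return lower triangular L with L L^T = C (C symmetric positive definite)."""
    m = len(C)
    L = [[0.0] * m for _ in range(m)]
    for j in range(m):
        L[j][j] = math.sqrt(C[j][j] - sum(L[j][p] ** 2 for p in range(j)))
        for i in range(j + 1, m):
            s = sum(L[i][p] * L[j][p] for p in range(j))
            L[i][j] = (C[i][j] - s) / L[j][j]
    return L


def solve_llt(L, rhs):
    """Solve L L^T x = rhs by forward, then back substitution."""
    m = len(L)
    y = [0.0] * m
    for i in range(m):
        y[i] = (rhs[i] - sum(L[i][p] * y[p] for p in range(i))) / L[i][i]
    x = [0.0] * m
    for i in reversed(range(m)):
        x[i] = (y[i] - sum(L[p][i] * x[p] for p in range(i + 1, m))) / L[i][i]
    return x


def weighted_lsq(A, d2, b, tol=1e-8, limit=100):
    """Solve A D^2 A^T x = A D^2 b; A is m x n, d2 holds the diagonal of D^2."""
    m, n = len(A), len(A[0])
    # C = A D^2 A^T and rhs = A D^2 b, both kept in double precision.
    C = [[sum(d2[k] * A[i][k] * A[j][k] for k in range(n)) for j in range(m)]
         for i in range(m)]
    rhs = [sum(d2[k] * A[i][k] * b[k] for k in range(n)) for i in range(m)]
    # Assumption: the GPU factor is imitated by rounding L to single precision.
    L = [[to_single(v) for v in row] for row in cholesky(C)]
    x = [to_single(v) for v in solve_llt(L, rhs)]
    k = 0
    while True:
        r = [rhs[i] - sum(C[i][j] * x[j] for j in range(m)) for i in range(m)]
        dx = solve_llt(L, [to_single(v) for v in r])
        x = [xi + to_single(di) for xi, di in zip(x, dx)]
        k += 1
        if norm(r) <= tol * norm(x) or k >= limit:
            return x, k
```

Calling `weighted_lsq(A, d2, b)` returns the refined coefficient vector and the number of refinement steps taken. The residual is always evaluated in double precision, which is what lets the low-precision factor be reused at every step.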
Note that we form the initial $x^0_{sgl}$ by forward and back substitution, solving $Ly = AD^2 b$ from the first to the last equation, and then solving $L^T x^0_{sgl} = y$ from the last to the first equation.

Algorithm 1 Mixed-precision iterative refinement [2, 5]

    x_dbl^0 = double(x_sgl^0);
    k = 0;
    repeat
        r^k = A D^2 b - A D^2 A^T x_dbl^k;      // A, D and b in double precision
        Solve L L^T x_sgl = single(r^k);        // L in single precision
        x_dbl^(k+1) = x_dbl^k + double(x_sgl);
        k = k + 1
    until ||r^k||_2 / ||x_dbl^(k+1)||_2 <= tolerance or k >= iteration limit

3 Assembling Symmetric Matrices

In this section we discuss how $AD^2A^T$ can be formed efficiently.

3.1 Full format on a CPU

The full or non-packed format stores an $m \times m$ matrix in an $m \times m$ array or texture as illustrated in Fig. 1. The figure also shows how we store other matrices and vectors used for assembling $AD^2A^T$. Denote the $k$-th column of $A$ by $a_k$. We observe that

$$C \equiv AD^2A^T = \sum_{k=0}^{n-1} d_k^2 \, a_k a_k^T .$$

Therefore, we can initialize the array $C$ to 0 and, at step $k$, add in $b a_k^T$, where $b = d_k^2 a_k$.

Assembling the matrix $AD^2A^T$ without considering its structure wastes computational resources, because its upper and lower triangular parts are exactly the same. To avoid redundancy we may omit computing the upper triangular part and copy the lower triangular part if necessary.

The algorithm consists of two main routines, column scaling and outer product addition. At the $k$-th iteration, we scale $a_k$ by $d_k^2$ and store it in a temporary vector $b$. Then we add the outer product of $b$ and $a_k$ to $C$. Considering the redundancy, we may write a CPU version that computes only the lower triangular matrix as in Algorithm 2.

3.2 Full format on a GPU

We can match the two routines, column scaling and outer product addition, to GPU kernels as described in Kernels 1 and 2 and Fig. 2. The key to saving computational time lies in how we arrange the two nested for loops (iterating with i and j) for the outer product addition. We design the loops so that they iterate over the lower triangular matrix.
We can implement this simple strategy using the triangular rasterization on a GPU. Drawing an isosceles right triangle can initiate the outer product addition kernel to work on the lower triangular matrix as illustrated in Fig. 2(b).

[Figure 1 shows the texel layouts of C, A, b and d, with texel centers at half-integer (x, y) coordinates.]

Figure 1: This figure illustrates how we store matrices and vectors for m = 6 and n = 8. We use C for storing the result of $AD^2A^T$, b for temporary storage, and d for storing the diagonal of $D^2$. The white colored entries of C store 0.

Algorithm 2 A CPU algorithm to form $AD^2A^T$

    // Indices are in (row, column) order
    // Input d contains the diagonal elements of the scaling matrix D^2
    // We store a scaled column in b
    C = zeros(m,m);
    for k = 0 to n-1 do
        // Column scaling
        for i = 0 to m-1 do
            b(i) = d(k)*A(i,k);
        end for
        // Outer product addition
        for j = 0 to m-1 do
            for i = j to m-1 do
                C(i, j) = C(i, j) + A(i, k)*b(j);
            end for
        end for
    end for

Kernel 1 GPU kernel for column scaling

    float main (
        uniform samplerRECT A : TEXUNIT0,
        uniform samplerRECT d : TEXUNIT1,
        float3 index : TEXCOORD0 ) : COLOR {
        return texRECT(A, index.yx) * texRECT(d, index.yz);
    }

Kernel 2 GPU kernel for outer product addition

    float main (
        uniform samplerRECT C : TEXUNIT0,
        uniform samplerRECT A : TEXUNIT1,
        uniform samplerRECT b : TEXUNIT2,
        float2 C_index : WPOS,
        float2 A_index : TEXCOORD0,
        float2 b_index : TEXCOORD1 ) : COLOR {
        return texRECT(C, C_index.xy)
             + texRECT(A, A_index.xy) * texRECT(b, b_index.xy);
    }

To generate texture coordinates for fetching the input textures A and b in the outer product addition kernel, we should attach texture coordinates to each vertex of the triangle. The matrix A is stored in an $n \times m$ texture and the temporary vector b, which stores a scaled column of A, is stored in an $m \times 1$ texture with width $\times$ height ordering. At the $k$-th iteration, we need two sets of texture coordinates, $(k + 0.5, j)$ for fetching A and $(i, 0.5)$ for b, to process a fragment at $(i, j)$ (with $(x, y)$ coordinate ordering). The y coordinate of $(k + 0.5, j)$ and the x coordinate of $(i, 0.5)$ are synchronized with the fragment position. So for a vertex at $(x, y)$, we attach two sets of texture coordinates, $(k + 0.5, y)$ for A and $(x, 0.5)$ for b. Then the rasterizer will linearly interpolate the required coordinates for fetching the inputs.

Implementing the column scaling operation is easier than the outer product addition. We first set the viewport size to $m \times 1$ so that the size of a pixel matches the size of a texel. Of course, we should change the viewport size to $m \times m$ when we perform the outer product addition. To make the column scaling kernel operate on the temporary vector b, we draw a rectangle with vertices at (0, 0), (0, 1), (m, 1) and (m, 0). In processing a fragment at $(i, 0.5)$ of b, the kernel needs to fetch the entries at $(k + 0.5, i)$ of A and at $(k + 0.5, 0.5)$ of d. The y coordinate of the texture coordinates for fetching A is synchronized with the x coordinate of the fragment position. So for a vertex at $(x, y)$, we attach a set of texture coordinates $(x, k + 0.5, 0.5)$, and use the y and x coordinates of the interpolated coordinate triad for fetching A and the y and z coordinates for d.

By combining the two routines, we can write an algorithm for assembling $AD^2A^T$ on a GPU as described in Algorithm 3.
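The interplay between the oversized triangle and the interpolated texture coordinates can be checked with a small software rasterizer. The sketch below is an assumption-laden illustration, not GPU code: `edge`, `rasterize` and `outer_product_fragments` are hypothetical helper names, fragments are taken to be the pixels whose centers lie strictly inside the triangle, and the triangle vertices (0, -0.5), (0, m), (m + 0.5, m) are those listed for Fig. 2(b).

```python
def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of the triangle (a, b, p)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(verts, attrs, size):
    """Map each covered pixel (x, y) to its linearly interpolated attributes.

    verts: three (x, y) vertex positions.
    attrs: three attribute tuples, one per vertex.
    A pixel is covered when its center (x + 0.5, y + 0.5) is inside the triangle.
    """
    (x0, y0), (x1, y1), (x2, y2) = verts
    area = edge(x0, y0, x1, y1, x2, y2)
    frags = {}
    for y in range(size):
        for x in range(size):
            px, py = x + 0.5, y + 0.5
            # Barycentric weights of the pixel center.
            w0 = edge(x1, y1, x2, y2, px, py) / area
            w1 = edge(x2, y2, x0, y0, px, py) / area
            w2 = edge(x0, y0, x1, y1, px, py) / area
            if w0 > 0 and w1 > 0 and w2 > 0:
                frags[(x, y)] = tuple(
                    w0 * a0 + w1 * a1 + w2 * a2 for a0, a1, a2 in zip(*attrs)
                )
    return frags


def outer_product_fragments(m, k):
    """Fragments of the k-th outer product addition in the full format."""
    verts = [(0.0, -0.5), (0.0, float(m)), (m + 0.5, float(m))]
    # Per vertex (x, y): T0 = (k + 0.5, y) for A and T1 = (x, 0.5) for b.
    attrs = [(k + 0.5, vy, vx, 0.5) for (vx, vy) in verts]
    return rasterize(verts, attrs, m)
```

For m = 6 the covered pixels are exactly those with x at most y, i.e. the lower triangle including the diagonal, and each fragment receives T0 = (k + 0.5, y + 0.5) and T1 = (x + 0.5, 0.5). The extra half pixel in the triangle is what keeps the diagonal pixel centers strictly inside.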
Algorithm 3 Assembling $AD^2A^T$ in the full format on a GPU

    Create two textures, C_S and C_T, of size m x m;
    Create a texture b of size m x 1;
    Bind C_S as the target; Clear the target buffer;
    Bind C_T as the target; Clear the target buffer;
    for k = 0 to n-1 do
        // Scale the k-th column of A
        Load column scaling kernel;
        Bind b as the target;
        Bind A and d as the input textures;
        Set viewport size as m x 1;
        Draw a rectangle covering the texture b;
        // Add the k-th outer product to C_T
        Load outer product addition kernel;
        Bind C_T as the target;
        Bind C_S, A and b as the input textures;
        Set viewport size as m x m;
        Draw a triangle covering the lower triangular part of C_T;
        Swap C_T and C_S;
    end for

[Figure 2 vertex data. (a) Rasterizing a rectangle for the k-th column scaling: V(0, 0) with T0(0, k+.5, .5); V(m, 0) with T0(m, k+.5, .5); V(m, 1) with T0(m, k+.5, .5); V(0, 1) with T0(0, k+.5, .5). (b) Rasterizing an isosceles right triangle for the k-th outer product addition: vertex 1 at V(0, -.5) with T0(k+.5, -.5) and T1(0, .5); vertex 2 at V(0, m) with T0(k+.5, m) and T1(0, .5); vertex 3 at V(m+.5, m) with T0(k+.5, m) and T1(m+.5, .5).]

Figure 2: k-th iteration of the matrix multiplication for computing $AD^2A^T$. Coordinates are in (x, y, z) order. $T_i$ represents the i-th texture coordinates attached to a vertex V. For fixed texture coordinates, we use an offset of 0.5 to point to the center of texels. The rasterizer will generate the 0.5 offset for interpolated coordinates.

[Figure 3 panels: (a) An example of packing a 6 x 6 matrix in a 3 x 7 texture. (b) An example of packing a 7 x 7 matrix in a 4 x 7 texture. In both panels the moved block is marked "Rotate 180".]

Figure 3: In the packed format, we save storage by storing the right lower triangular submatrix in the upper corner by rotating it.

3.3 Rectangular packed format

In addition to saving computational resources, we can save storage space by exploiting the matrix structure. Since our GPU algorithm computes only the lower triangular matrix, the upper triangular part is not used at all. Gunnels and Gustavson explored a rectangular packed format requiring half the storage space [6].
In our work, we store the rotated right lower triangular submatrix in the upper right corner as illustrated in Fig. 3. Packing a symmetric or triangular matrix results in a $w \times h$ texture, where $w = \lceil m/2 \rceil$ and $h = m + \mathrm{mod}(m+1, 2)$ represent width and height, respectively.

The key to assembling the matrix in the packed format is the triangular rasterization. The column scaling is no different from section 3.2. For the outer product addition, we need to draw two separate triangles, because the input access pattern of the fragments in the lower trapezoid is different from that of the upper triangle. So one triangle, with vertices 1 through 3, covers the trapezoid and the other, with vertices 4 through 6, covers the upper triangular part. We can use the same kernels by attaching different texture coordinates T0 and T1 to each vertex V of the triangle covering the upper part. We attach texture coordinates for the vertices of the upper part as we would for their original positions in the full format. The texture coordinates attached to vertices 1 through 3 are exactly the same as those used for the full format. We summarize how we issue vertices with attached texture coordinates in Fig. 4.

[Figure 4 panels: (a) When m is even, $h = m + 1$ and $w = m/2$. (b) When m is odd, $h = m$ and $w = m/2 + 0.5$. The per-vertex positions and texture coordinates of the two panels are not recoverable from the extracted text.]

4 Cholesky Decomposition

We can factor an $m \times m$ symmetric positive definite matrix $C$ as $C = LL^T$, where $L$ is a lower triangular matrix and is usually referred to as the Cholesky factor.
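To make the packed layout of section 3.3 and the factorization $C = LL^T$ concrete, the following CPU sketch stores the lower triangle in a $w \times h$ array and factors it in place. The index mapping in `packed_xy` is an assumption inferred from Fig. 3 and from the even and odd cases described for Fig. 4 (left columns kept in place and shifted down one row when m is even, right lower triangle rotated by 180 degrees into the upper right corner); the report does not state it as a formula, and all function names here are hypothetical.

```python
import math


def packed_shape(m):
    """Width, height and row shift of the packed array for an m x m triangle."""
    s = (m + 1) % 2  # 1 when m is even, 0 when m is odd
    return (m + 1) // 2, m + s, s


def packed_xy(i, j, m):
    """Texel (x, y) assumed to hold lower-triangular entry (row i, column j)."""
    w, _, s = packed_shape(m)
    if j < w:
        return j, i + s  # left columns stay in place
    return m - j - s, m - 1 - i  # right lower triangle, rotated by 180 degrees


def pack(C):
    """Copy the lower triangle of C into a packed array P[y][x]."""
    m = len(C)
    w, h, _ = packed_shape(m)
    P = [[0.0] * w for _ in range(h)]
    for i in range(m):
        for j in range(i + 1):
            x, y = packed_xy(i, j, m)
            P[y][x] = C[i][j]
    return P


def cholesky_packed(P, m):
    """Overwrite the packed matrix with its Cholesky factor."""
    def get(i, j):
        x, y = packed_xy(i, j, m)
        return P[y][x]

    def put(i, j, v):
        x, y = packed_xy(i, j, m)
        P[y][x] = v

    for k in range(m):
        put(k, k, math.sqrt(get(k, k)))  # square rooting
        for i in range(k + 1, m):  # normalization
            put(i, k, get(i, k) / get(k, k))
        for i in range(k + 1, m):  # outer product subtraction
            for j in range(k + 1, i + 1):
                put(i, j, get(i, j) - get(i, k) * get(j, k))
    return P


def unpack_lower(P, m):
    """Expand the packed array back into a full lower triangular matrix."""
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            x, y = packed_xy(i, j, m)
            L[i][j] = P[y][x]
    return L
```

Under this mapping every texel of the $w \times h$ array holds exactly one entry of the lower triangle for both even and odd m, which matches the storage claim of the packed format. Routing all accesses through one mapping function plays the role that the per-vertex texture coordinates play on the GPU.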
Figure 4: Drawing two isosceles right triangles is the key to assembling the matrix in packed format. Notice that the texture coordinates for the 6th vertex are the same as those for the 3rd. Texture coordinates for the 5th vertex would be the same as those for a vertex at (w, h).

4.1 Full format on CPU

A CPU implementation of the Cholesky decomposition consists of three components, described in Algorithm 4. We repeat square rooting the k-th diagonal entry, normalizing the entries below the diagonal, and subtracting the outer product of the normalized column from the entries in the remaining lower triangular submatrix. In the k-th normalization process, we use the pivot to indicate the k-th diagonal entry. In the k-th outer product subtraction process, we refer to the entries at (j, k) and (i, k) as the pivot and neighbor of an active entry at (i, j), respectively.

Algorithm 4 A CPU version of Cholesky decomposition

    // Indices are in (row, column) order
    L = lower triangle(C);
    for k = 0 to m-2 do
        // Square rooting
        L(k, k) = sqrt(L(k, k));
        // Normalization
        for i = k+1 to m-1 do
            L(i, k) = L(i, k)/L(k, k);
        end for
        // Outer product subtraction
        for i = k+1 to m-1 do
            for j = k+1 to i do
                L(i, j) = L(i, j) - L(j, k)*L(i, k);
            end for
        end for
    end for
    L(m-1, m-1) = sqrt(L(m-1, m-1));

4.2 Full format on GPU

We can transform each component of Algorithm 4 to a GPU kernel as described in Kernels 3, 4 and 5.

Kernel 3 GPU kernel for square rooting

    float main (
        uniform samplerRECT L : TEXUNIT0,
        float2 index : WPOS ) : COLOR {
        return sqrt(texRECT(L, index.xy));
    }

Kernel 4 GPU kernel for normalization

    float main (
        uniform samplerRECT L : TEXUNIT0,
        float2 index : WPOS,
        float2 pivot_index : TEXCOORD0 ) : COLOR {
        return texRECT(L, index.xy) / texRECT(L, pivot_index.xy);
    }

Kernel 5 GPU kernel for outer product subtraction

    float main (
        uniform samplerRECT L : TEXUNIT0,
        float2 index : WPOS,
        float2 neighbor_index : TEXCOORD0,
        float2 pivot_index : TEXCOORD1 ) : COLOR {
        return texRECT(L, index.xy)
             - texRECT(L, neighbor_index.xy) * texRECT(L, pivot_index.yx);
    }

For the outer product subtraction, we draw an oversized triangle whose vertices are at (k + 1, k + 0.5), (k + 1, m) and (m + 0.5, m) to cover the lower triangular matrix. As seen in Fig. 5, we attach two sets of texture coordinates to each vertex to generate indices for the active neighbor and pivot. An active fragment at (j, i) will have the interpolated texture coordinates T0 of (k + 0.5, i) and T1 of (j, k + 0.5). As described in Fig. 5(b), the pivot is at (k + 0.5, j). So we use the GPU's swizzle operator to rearrange indices for the pivot.

For the square rooting, we simply draw a square to make the kernel operate on the diagonal entry. For the normalization kernel, we draw a rectangle covering the off-diagonal column with attached texture coordinates pointing to the square rooted diagonal entry. By combining all three rasterization processes, we can write a GPU version of Cholesky decomposition as described in Algorithm 5.

Algorithm 5 Cholesky decomposition on a GPU

    Create two streams L_S and L_T of size equal to C;
    Copy C to L_S;
    for k = 0 to m-2 do
        Load square root kernel;
        Bind L_T as the target and L_S as an input texture;
        Draw a square covering the entry at (k, k);
        Load copy kernel;
        Bind L_S as the target and L_T as an input texture;
        Draw a square covering the entry at (k, k);
        Load normalization kernel;
        Bind L_T as the target and L_S as an input texture;
        Draw a rectangle covering the k-th column below the diagonal;
        Load copy kernel;
        Bind L_S as the target and L_T as an input texture;
        Draw a rectangle covering the k-th column below the diagonal;
        Load outer product subtraction kernel;
        Bind L_T as the target and L_S as an input texture;
        Draw an isosceles right triangle covering lower triangular submatrix;
        Swap L_S and L_T;
    end for
    Load square root kernel;
    Bind L_T as the target and L_S as an input texture;
    Draw a square covering the entry at (m-1, m-1);
    // Now L_T is the Cholesky factor of C

To complete the factorization, we draw m squares, m - 1 rectangles, and m - 1 triangles to initiate the square rooting, normalization, and outer product subtraction kernels, respectively. Drawing a square covering a single pixel to initiate the square root kernel does not utilize the parallel architecture [7]. As a result our Cholesky algorithm utilizes less memory bandwidth for small matrices than the GPU LU decomposition algorithm of Galoppo et al. [4]. Coping with this trouble requires architectural changes supporting thread level parallelism. With that support, square rooting could be done in conjunction with the outer product subtraction once the outer product subtraction kernel finishes computing the left-most diagonal entry.

[Figure 5(a) vertex data: vertex 1 at V(k+1, k+.5) with T0(k+.5, k+.5) and T1(k+1, k+.5); vertex 2 at V(k+1, m) with T0(k+.5, m) and T1(k+1, k+.5); vertex 3 at V(m+.5, m) with T0(k+.5, m) and T1(m+.5, k+.5).]

Figure 5: These figures illustrate how we rasterize an oversized triangle for the k-th outer product subtraction of Cholesky decomposition. (a) We draw an oversized triangle to cover the sub-lower triangular part of the matrix. We attach texture coordinates T0 and T1 to the vertices to get the indexes for the active neighbor and pivot elements. (b) An active fragment at (j, i) has its active neighbor at (k + 0.5, i) and active pivot at (k + 0.5, j), whereas it has interpolated texture coordinates T0 of (k + 0.5, i) and T1 of (j, k + 0.5).

4.3 Rectangular packed format

In order to factor a symmetric positive definite matrix in packed format, we draw two triangles for the outer product subtraction in the first half of the iterations as we did in section 3.3.
In the second half of the iterations, we draw a single triangle covering the upper triangular part. Since we draw triangles differently, we need to divide the single for loop in Algorithm 5 into two, the first of which iterates k from 0 to $w - 1$ and the second of which iterates k from $w - 1$ to 1 (if m is even, as in Fig. 6) or to 2 (if m is odd, as in Fig. 7). We can also use the same kernels that we used for the full format by attaching different texture coordinates to each vertex of the triangles. We summarize how to specify vertices and texture coordinates for the outer product subtraction in Fig. 6 and Fig. 7.

Figure 6: These figures illustrate how we rasterize the triangles for the k-th outer product subtraction of Cholesky decomposition for the packed format when m is even. (a) For the first half of the iterations, we iterate k from 0 to $w - 1$. The texture coordinates attached to the 6th vertex are the same as those for the 3rd vertex. (b) For the second half of the iterations, we decrease k from $w - 1$ to 1. [Per-vertex data not recoverable from the extracted text.]

Figure 7: These figures illustrate how we rasterize the triangles for the k-th outer product subtraction of Cholesky decomposition for the packed format when m is odd. (a) For the first half of the iterations, we iterate k from 0 to $w - 1$. The texture coordinates attached to the 6th vertex are the same as those for the 3rd vertex. (b) For the second half of the iterations, we decrease k from $w - 1$ to 2. [Per-vertex data not recoverable from the extracted text.]

5 Results

We tested our algorithms using a PC equipped with an Intel Xeon 3.0 GHz CPU and a GeForce 7800 GTX GPU attached to a 16x PCIe slot, running 64-bit Red Hat Linux.

5.1 Assembling $AD^2A^T$

The packed format can save half of the storage space without sacrificing speed. Fig. 8 compares our implementation for the packed format with an algorithm made with saxpy and ssyrk of the ATLAS (Automatically Tuned Linear Algebra Software) library. As the packed version performs almost the same as the non-packed version, we do not provide the timing result for the non-packed format. (For m = 2048, the non-packed version is only 1% faster than the packed version.) Drawing two triangles only requires three more vertices and three more sets of texture coordinates, and the time is negligible. The number of processed fragments at every iteration determines the performance of our algorithm, and it is exactly the same in both the non-packed and the packed version.

Figure 8: Timing result for assembling $AD^2A^T$. Transferring A and d to the GPU and retrieving the result from it are not considered.

Our GPU implementation outperforms that of ATLAS for m larger than 768, but, due to latency caused by initiating kernels through drawing shapes, GPU implementations are slower than ATLAS for small matrices. Problems of size less than 1024 are considered small by today's standards.

5.2 Cholesky Decomposition

Triangular rasterization is the key to implementing the algorithms for decomposition. Without the triangular rasterization, we cannot achieve a performance gain over LU [7]. We compared our Cholesky algorithm with the LU algorithm written by Galoppo et al. [4]. The LU has wider applicability (since it can be used for nonsymmetric matrices, too) but requires twice as many arithmetic operations. As seen in Fig. 9, our algorithm outperformed the LU algorithm by 83.5% for m = 3584. We also compared our algorithms with spotrf, the Cholesky decomposition algorithm of ATLAS.

Figure 9: Timing result for Cholesky decomposition.

As for the Cholesky decomposition algorithm, the number of generated fragments at every iteration is exactly the same in both the non-packed and the packed version. For this reason, the algorithm for the rectangular packed format performs as fast as that for the non-packed format. For m = 3328, the non-packed version is only 1% faster than the packed. We cannot pack the matrix for m = 4096, because the packed matrix would have height 4097 whereas the maximum texture height that the GPU supports is 4096.

5.3 Weighted Least Squares

In our implementation, we used the MATLAB-C interface to make use of MATLAB functions for operations other than the assembly and factorization. CPU operations use double precision. MATLAB uses the Intel Math Kernel Library for the BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra PACKage) routines.

Table 1 and Table 2 show the timing results and performance gain of our implementation using the GPU over the CPU implementation in solving randomly generated problems of various sizes, with n = 2m. We measured the speedup by dividing the CPU timing result by that of the GPU. We measured the relative error of $x^k_{dbl}$ by computing

$$\frac{\| x^k_{dbl} - x_{CPU} \|}{\| x_{CPU} \|}$$

Table 1: Results for randomly generated weighted least squares.

            Relative error      Refinement             Time (s)
                                iteration    GPU       GPU      GPU
       m   x^0_sgl    Refined     count     x^0_sgl   Refine   Total    CPU   Speedup
     512   3.11E-04   3.37E-13      4        0.39      0.02     0.41    0.20    0.49
    1024   3.10E-04   4.25E-13      4        0.49      0.04     0.53    1.03    1.94
    1536   4.61E-04   6.96E-13      4        1.33      0.09     1.42    3.41    2.40
    2048   1.01E-03   1.76E-12      5        2.99      0.22     3.21    7.95    2.48

Table 2: Results for randomly generated ill-conditioned weighted least squares.
            Relative error      Refinement             Time (s)
                                iteration    GPU       GPU      GPU
       m   x^0_sgl    Refined     count     x^0_sgl   Refine   Total    CPU   Speedup
     512   2.92E-02   1.16E-10      7        0.36      0.02     0.38    0.20    0.53
    1024   7.19E-02   2.01E-10     10        0.48      0.10     0.58    1.05    1.81
    1536   9.35E-02   2.37E-10     13        1.33      0.30     1.63    3.41    2.09
    2048   1.13E-01   3.41E-10     15        3.00      0.66     3.66    7.98    2.18

In the relative error, $x_{CPU}$ is the solution to the system of equations (3) computed only by the CPU in double precision. Timing results to get $x^0_{sgl}$ include transferring the data A and d, and retrieving the Cholesky factor L. We set the tolerance parameter to $10^{-8}$ and the iteration limit to 100 in Algorithm 1.

We generated the data matrix A, the diagonal entries of $D^2$, and the vector b using uniformly distributed random numbers between 0 and 1. To generate ill-conditioned problems, we set the i-th diagonal element of $D^2$ to $10^{-4 + 8i/(n-1)}$ for $i = 0, \dots, n-1$. Ill-conditioned problems require more refinement iterations. Notice that the GPU implementation is more than twice as fast as the CPU implementation for $m \ge 1536$ in both types of problems.

6 Conclusions

We have presented efficient GPU algorithms for forming a dense positive definite matrix $AD^2A^T$ and then computing its Cholesky factor. The best algorithms exploit triangular rasterization to minimize storage without increasing computation time. By comparing our implementations with conventional CPU algorithms, we demonstrated the potential of the commodity parallel architecture of a GPU for solving important numerical problems.

We demonstrated a performance advantage in solving weighted least squares problems by using a GPU. In such applications, assembling and factoring are the most demanding of computational resources. Thus support for triangular rasterization is essential in exploiting the structure of symmetric or triangular matrices. This provides one reason for developers of streaming languages such as BrookGPU [1] and Accelerator [11] to consider implementing this feature.

References

[1] I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, P. Hanrahan, Brook for GPUs: stream computing on graphics hardware, in: SIGGRAPH '04: ACM SIGGRAPH 2004 Papers, ACM, New York, NY, USA, 2004.
[2] A. Buttari, J. Dongarra, J. Langou, J. Langou, P. Luszczek, J. Kurzak, Mixed precision iterative refinement techniques for the solution of dense linear systems, International Journal of High Performance Computing Applications 21 (4) (2007) 457-466.
[3] K. Fatahalian, J. Sugerman, P. Hanrahan, Understanding the efficiency of GPU algorithms for matrix-matrix multiplication, in: HWWS '04: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, ACM Press, New York, NY, USA, 2004.
[4] N. Galoppo, N. K. Govindaraju, M. Henson, D. Manocha, LU-GPU: Efficient algorithms for solving dense linear systems on graphics hardware, in: SC '05: Proceedings of the 2005 ACM/IEEE Conference on Supercomputing, IEEE Computer Society, Washington, DC, USA, 2005.
[5] G. H. Golub, C. F. Van Loan, Matrix Computations (3rd ed.), Johns Hopkins University Press, Baltimore, MD, USA, 1996.
[6] J. A. Gunnels, F. G. Gustavson, A new array format for symmetric and triangular matrices, in: PARA, 2004.
[7] J. H. Jung, Cholesky decomposition and linear programming on a GPU, Scholarly Paper (Jan. 2006).
[8] J. Krüger, R. Westermann, A GPU framework for solving systems of linear equations, in: M. Pharr (ed.), GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation, chap. 44, Addison-Wesley, 2005, pp. 703-718.
[9] W. R. Mark, R. S. Glanville, K. Akeley, M. J. Kilgard, Cg: a system for programming graphics hardware in a C-like language, in: SIGGRAPH '03: ACM SIGGRAPH 2003 Papers, ACM, New York, NY, USA, 2003.
[10] Y. Nesterov, A. Nemirovski, Interior Point Polynomial Algorithms in Convex Programming, SIAM, 1995.
[11] D. Tarditi, S. Puri, J. Oglesby, Accelerator: simplified programming of graphics processing units for general-purpose uses via data-parallelism, Tech. Rep. MSR-TR-2005-184, Microsoft (Dec. 2005).
[12] S. J. Wright, Primal-Dual Interior-Point Methods, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1997.