LSU EE 7700-1 Homework 4 Solution
Due: 28 April 2010

To complete this assignment, follow the setup instructions on the course Web page (if not already followed). The setup instructions bring you to the point where you can compile the CPU-only examples, but for this assignment that won't be necessary. Then follow the instructions in the first problem to complete and test the setup.

Problem 0: Check out the latest copy of the balls project base code and the include directory. The transcript below shows the steps needed.

(a) Read the following overview of how the code operates.

This code includes something like a solution to Homework 3, except that direct CUDA calls were not made and the code returns the actual proximity pairs, not just a count. The code computing proximity pairs runs inefficiently due to data access problems and other issues; the point of this homework assignment is to improve it. Here is an outline of how the affected code operates:

The code in balls.cc:cuda_contact_pairs_find sorts the balls (phys) in z order using class PSList. An array on CUDA, z_sort_indices, maps a z-order index to a balls array index. For example, z_sort_indices[5] == 12 means that ball number 12 (the index to CPU array physs and GPU arrays ball_x) has the sixth smallest z value. This array is initialized in balls.cc:cuda_contact_pairs_find and sent over to CUDA. The z max values are also placed in an array and sent to CUDA.

The routine launches a CUDA kernel, pass_sched, to compute the proximity pairs, and then reads the results back from the CUDA array cuda_prox. The data in cuda_prox is unpacked in routine contact_pairs_find.

The problems below are related to the data moving between the CPU and GPU, and to data moving between the different memory areas in the GPU.
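The z-order index mapping described above can be illustrated with a short host-side sketch. This is not the course code: the handout says the sort is done with class PSList, whereas the sketch below uses std::stable_sort, and the function name make_z_sort_indices and the input array ball_z are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper (not from balls.cc): build an index array in which
// entry k holds the ball-array index of the ball with the (k+1)-th
// smallest z value.
std::vector<int> make_z_sort_indices(const std::vector<float>& ball_z)
{
  assert(!ball_z.empty());

  // Start with the identity mapping 0, 1, 2, ...
  std::vector<int> z_sort_indices(ball_z.size());
  std::iota(z_sort_indices.begin(), z_sort_indices.end(), 0);

  // Sort the indices, not the balls, by comparing the z values they refer to.
  std::stable_sort(z_sort_indices.begin(), z_sort_indices.end(),
                   [&ball_z](int a, int b) { return ball_z[a] < ball_z[b]; });

  return z_sort_indices;
}
```

For z values {5, 1, 3} the function returns {1, 2, 0}: ball 1 has the smallest z value, ball 2 the next smallest, and ball 0 the largest. Sorting indices rather than the ball data itself leaves the ball arrays in place, which fits the description of ball data staying on the GPU between updates.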
Here is a summary of the data movement that the proximity code currently induces:

The kernel pass_sched reads the z_sort_indices and z max arrays, and also reads the data in balls_x that is already present on the GPU. (That is, the data for balls_x is only sent to the GPU when balls are added or removed. Otherwise the GPU keeps using the copy it has.) The kernel writes only the array cuda_prox.

Each thread of the kernel pass_sched, in file balls-kernel.cu, performs one outer iteration of the idx9 loop from contact_pairs_proximity_check. That is, each thread has its own idx9 value and must iterate over idx1 values until there is no possibility of proximity. Unlike contact_pairs_proximity_check, pass_sched does not check for proximity between tiles and balls, only between balls and balls. (This was to make the assignment less tedious.)

The CPU code, in contact_pairs_proximity_check, prepares a Contact structure for each nearby idx1 it finds. The corresponding code in pass_sched simply passes the idx1 values back to the CPU code....
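The per-thread work described above, one idx9 value scanning idx1 values until proximity is no longer possible, can be sketched on the CPU as follows. This is a simplified model and not the code in balls-kernel.cu: it assumes every ball has the same radius, that the balls are sorted by ascending z coordinate of their centers, and that the scan runs toward larger z. The Ball struct and the function proximity_for are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical ball record; the real code keeps ball data in separate arrays.
struct Ball { float x, y, z; };

// Work of one "thread": for z-order position idx9, return the z-order
// positions idx1 > idx9 whose balls lie within two radii of ball idx9.
std::vector<int> proximity_for(const std::vector<Ball>& balls,
                               const std::vector<int>& z_sort_indices,
                               int idx9, float radius)
{
  const int n = static_cast<int>(z_sort_indices.size());
  assert(idx9 >= 0 && idx9 < n);

  const Ball& b9 = balls[z_sort_indices[idx9]];
  const float reach = 2 * radius;  // Center distance at which balls touch.
  std::vector<int> nearby;

  for (int idx1 = idx9 + 1; idx1 < n; idx1++) {
    const Ball& b1 = balls[z_sort_indices[idx1]];
    const float dz = b1.z - b9.z;

    // Balls are in ascending z order, so once the z gap alone exceeds
    // the reach no later ball can be close either: stop scanning.
    if (dz > reach) break;

    const float dx = b1.x - b9.x;
    const float dy = b1.y - b9.y;
    if (dx * dx + dy * dy + dz * dz <= reach * reach) nearby.push_back(idx1);
  }
  return nearby;
}
```

Like the kernel as described, the sketch returns only idx1 values and leaves any further processing, such as building Contact structures, to the caller. Note that consecutive idx1 values refer to ball-array entries through z_sort_indices, so the ball reads are scattered rather than contiguous.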