CSE 721 Programming Assignment 1
Due 2/28/2011, 3:00PM

For this assignment, you are to create versions of the following two codes using SSE intrinsics. Submit via Carmen; be sure to include source code listings and report on performance. A single file must be uploaded to Carmen for the assignment, which includes the written report as well as code listings and output from execution of the programs.

1. (35 points) The following code implements a “hardwired” matrix-matrix product for 4x4 single-precision floating-point matrices. In some computational domains such as QCD (Quantum Chromo Dynamics), a large number of such matrix products of small fixed-size matrices is required. The performance of math library matrix multiplication routines is generally not optimized for such small matrix sizes. Hence SSE-based specialized codes are used.

    void mul4x4(float *A, float *B, float *C)
    {
      int i,j,k;
      for(i=0;i<4;i++) {
        for(j=0;j<4;j++) C[4*i+j] = 0.0;
        for(k=0;k<4;k++)
          for(j=0;j<4;j++)
            C[4*i+j] += A[4*i+k]*B[4*k+j];
    ...
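As an illustration of what an SSE-intrinsics version of the 4x4 product in problem 1 might look like, here is a minimal sketch. It is one possible formulation, not the assignment's reference solution. The function name mul4x4_sse is hypothetical, the matrices are assumed to be stored row-major as in the scalar code, and unaligned loads and stores are used because the handout does not say whether the arrays are 16-byte aligned. Each row of C is built as a sum of the four rows of B, each scaled by one broadcast element of the matching row of A.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, ... */

/* Hypothetical SSE variant of mul4x4 (name and signature are assumptions).
 * Computes C = A * B for row-major 4x4 single-precision matrices. */
void mul4x4_sse(const float *A, const float *B, float *C)
{
    /* Load the four rows of B once; unaligned loads because
     * 16-byte alignment of the inputs is not guaranteed here. */
    __m128 b0 = _mm_loadu_ps(B + 0);
    __m128 b1 = _mm_loadu_ps(B + 4);
    __m128 b2 = _mm_loadu_ps(B + 8);
    __m128 b3 = _mm_loadu_ps(B + 12);
    int i;

    for (i = 0; i < 4; i++) {
        /* Row i of C = sum over k of A[i][k] * (row k of B).
         * _mm_set1_ps broadcasts one scalar of A to all four lanes. */
        __m128 row = _mm_mul_ps(_mm_set1_ps(A[4*i + 0]), b0);
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(A[4*i + 1]), b1));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(A[4*i + 2]), b2));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(A[4*i + 3]), b3));

        /* Store the finished row of C. */
        _mm_storeu_ps(C + 4*i, row);
    }
}
```

The accumulation order over k matches the scalar loop, so results should agree with the original code up to ordinary floating-point behaviour. If the arrays are known to be 16-byte aligned, _mm_load_ps and _mm_store_ps could be substituted for the unaligned variants.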
This note was uploaded on 03/08/2012 for the course CSE 721 taught by Professor Saday during the Winter '11 term at Ohio State.

