Make sure that you use the m thread option when you

2. Scaling efficiency: Without using any tools, run the bt.A solver with 1, 2, 4 and 8 threads and record the total operation throughput reported by the benchmark (Mop/s total) for each run. Plot this throughput and contrast it against an ideal scaling throughput. Discuss what you observe.

3. Differential analysis: For this step you have to use HPCToolkit. Collect only time measurements (PAPI_TOT_CYC) for two runs with 2 and 4 threads, respectively. Compute a performance database with the data for thread 0 from each run, as shown in the lecture slides example.
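
For the scaling-efficiency step, one possible way to tabulate the numbers before plotting is sketched below. It assumes that "ideal scaling" means linear scaling from the run with the fewest threads, so the ideal throughput at n threads is n times the 1-thread throughput. The function name scaling_table and the values in measured are hypothetical placeholders, not benchmark results; substitute the Mop/s total figures from your own runs.

```python
def scaling_table(throughput):
    """Compare measured throughput against ideal linear scaling.

    throughput: dict mapping thread count -> measured Mop/s total.
    Returns a list of (threads, measured, ideal, speedup, efficiency) tuples,
    sorted by thread count.
    """
    base_threads = min(throughput)      # baseline run, normally 1 thread
    base = throughput[base_threads]
    rows = []
    for n in sorted(throughput):
        ideal = base * n / base_threads     # linear scaling from the baseline
        speedup = throughput[n] / base      # measured gain over the baseline
        efficiency = throughput[n] / ideal  # 1.0 means perfect scaling
        rows.append((n, throughput[n], ideal, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # PLACEHOLDER values for illustration only; replace with your measurements.
    measured = {1: 100.0, 2: 190.0, 4: 340.0, 8: 520.0}
    print(f"{'threads':>7} {'Mop/s':>10} {'ideal':>10} {'speedup':>8} {'eff':>6}")
    for n, mops, ideal, speedup, eff in scaling_table(measured):
        print(f"{n:>7} {mops:>10.2f} {ideal:>10.2f} {speedup:>8.2f} {eff:>6.2f}")
```

The measured and ideal columns can be plotted against the thread count with any plotting tool. An efficiency that falls as the thread count grows is the gap between the two curves that the discussion should address.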
