Here we use a CP3b implementation as an example. To collect a range of useful statistics on the performance of your implementation, try running, for example, this command:
perf stat -d ./cp-benchmark 4000 4000 10
See the perf manual for more information on its usage. To get more meaningful results, it is often a good idea to switch off hyper-threading, so that each core runs only one hardware thread. Then, for example, the number of instructions per cycle per thread is much easier to interpret.
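To make the instructions-per-cycle figure concrete, here is a minimal sketch that computes it from perf's machine-readable output. The helper name ipc_from_csv and the file name stat.csv are made up for this illustration. The sketch assumes the CSV layout produced by perf stat -x, (counter value in the first field, event name in the third) and that only the cycles and instructions events were requested.

```shell
# Sketch: compute instructions per cycle (IPC) from perf's CSV output.
# ipc_from_csv is a hypothetical helper, not part of perf.
# Assumed input format (perf stat -x,): value,unit,event,... one counter per line.
ipc_from_csv() {
    awk -F, '
        $3 ~ /cycles/       { cycles = $1 }   # e.g. "cycles" or "cycles:u"
        $3 ~ /instructions/ { instr  = $1 }   # e.g. "instructions"
        END {
            # "+ 0" forces a numeric comparison, so non-numeric values
            # such as "<not counted>" are treated as 0.
            if (cycles + 0 > 0) printf "%.2f\n", instr / cycles
            else exit 1
        }
    '
}

# Example use (perf stat reports to stderr unless -o names an output file):
#   perf stat -x, -e cycles,instructions -o stat.csv ./cp-benchmark 4000 4000 10
#   ipc_from_csv < stat.csv
```

On machines that report several cycle counters (for instance, one per core type), this sketch simply keeps the last matching line, so treat the result as a rough indication only.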
In the perf output, these numbers are often very helpful:
To identify the most critical parts of your code, you can also try e.g.:
perf record ./cp-benchmark 4000 4000 10
This creates a data file, perf.data, that you can then study with the simple text-based user interface of perf report.
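Opening the recorded data can be sketched as follows. The wrapper show_profile is a hypothetical convenience function for this illustration; the underlying command is perf report, which reads perf.data from the current directory by default and accepts -i to name the input file explicitly.

```shell
# Sketch: open a recorded profile in perf's text-based browser.
# show_profile is a hypothetical wrapper, not part of perf.
show_profile() {
    data_file="${1:-perf.data}"   # default file written by perf record
    if [ ! -f "$data_file" ]; then
        echo "no profile at $data_file; run perf record first" >&2
        return 1
    fi
    perf report -i "$data_file"
}

# Example use, in the directory where perf record was run:
#   show_profile
```

Running plain perf report in the same directory has the same effect; the wrapper only adds a clearer message when the data file is missing.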
Select the relevant function, select “Annotate”, and it should take you directly to the most performance-critical part in the assembly code.