I can compute GFLOPS from the number of instructions and the instruction type.
For example, I have 17694720000 FMA instructions, so I get 17694720000*2*8 = 283 GFLOP (2 floating-point operations per FMA, 8 single-precision lanes per 256-bit AVX register). Great! Intel Advisor gives the same number.
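A quick check of that arithmetic, as a minimal sketch. It assumes 256-bit AVX registers holding single-precision floats and counts one FMA as 2 floating-point operations per lane; neither assumption is stated in the report, but together they reproduce the Self GFLOP value.

```python
# Assumptions (not stated by Advisor): 256-bit AVX registers holding
# single-precision floats, and one FMA counted as 2 operations per lane.
FMA_INSTRUCTIONS = 17_694_720_000   # "FMA" row of the instruction mix
FLOP_PER_LANE = 2                   # multiply + add
LANES_PER_REGISTER = 256 // 32      # 8 single-precision floats

gflop = FMA_INSTRUCTIONS * FLOP_PER_LANE * LANES_PER_REGISTER / 1e9
print(f"Estimated Self GFLOP: {gflop:.5f}")
```

Note that this is a total operation count (GFLOP). The rate (GFLOPS) additionally needs the elapsed time.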
However, how do I get the data transfers between the CPU and the memory sub-system (total traffic, including L1, L2, LLC and DRAM traffic)? This is the number in the bottom right of the report. It does not match the number of memory load instructions.
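To make the mismatch concrete, here is a small sketch that applies the same kind of estimate to the memory instructions. It assumes every AVX memory instruction moves a full 32-byte register, which is only a guess, and compares the result with the 129.76128 GB that Advisor reports.

```python
AVX_MEMORY_INSTRUCTIONS = 11_796_480_000  # "Memory > Vector > AVX" row
BYTES_PER_AVX_REGISTER = 32               # assumption: full 256-bit access
REPORTED_GBYTES = 129.76128               # data transfers reported by Advisor

# Traffic if every memory instruction moved a full register.
naive_gbytes = AVX_MEMORY_INSTRUCTIONS * BYTES_PER_AVX_REGISTER / 1e9

# Average bytes per memory instruction implied by the reported traffic.
bytes_per_instruction = REPORTED_GBYTES * 1e9 / AVX_MEMORY_INSTRUCTIONS

print(f"Naive estimate:        {naive_gbytes:.5f} GB")
print(f"Reported by Advisor:   {REPORTED_GBYTES:.5f} GB")
print(f"Implied bytes/instr.:  {bytes_per_instruction:.2f}")
```

The naive estimate comes out roughly three times larger than the reported figure, so the reported traffic cannot be a simple count of memory instructions times the register width.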
Instruction Set: AVX; FMA
Self time: 3.842s

Dynamic Instruction Mix Summary
Metric format: (XXXX – Total count of instructions, YYYY – average count of instructions per iteration).
Warning: Currently Dynamic Instruction Mix doesn’t reflect instructions inside of non-inlined function calls.
Memory: 34% (11796480000)
  Vector: 34% (11796480000)
    AVX: 34% (11796480000)
Compute: 57% (19722240000)
  Vector: 51% (17694720000)
    FMA: 51% (17694720000)
  Scalar: 6% (2027520000)
    x86: 6% (2027520000)
Other: 9% (3133440000)
Statistics for FLOPS And Data Transfers
Self GFLOPS: 808.89193
  Giga Floating-point Operations Per Second
  Self GFLOPS = Self GFLOP / Self Elapsed Time
Self AI: 2.18182
  Self AI - Self Arithmetic Intensity - Ratio Of Self Floating-Point Operations To Self L1 Transferred Bytes
Self GFLOP: 283.11552
  Giga Floating-Point Operations, Not Including GFLOP For Functions Called In The Loop Or Function
Self Elapsed Time: 0.350s
  Elapsed Time Is The Exclusive (Self-Time-Based) Wall Time From The Beginning To The End Of Loop/Function Execution. For Single-Threaded Applications Elapsed Time Is Equal To Self-Time
Total Elapsed Time: 0.350s
  Total Elapsed Time Is The Inclusive (Total-Time-Based) Wall Time From The Beginning To The End Of Loop/Function Execution. For Single-Threaded Applications Total Elapsed Time Is Equal To Total-Time
Data transfers between CPU and memory sub-system (total traffic, including L1, L2, LLC and DRAM traffic)
  In Giga Bytes, Not Including Transfers For Functions Called In The Loop Or Function: 129.76128
  In Giga Bytes Per Second: 370.74213
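
The derived figures in this report can be cross-checked against each other. The sketch below recomputes the rates and the arithmetic intensity from the reported totals. The elapsed time is displayed rounded to three decimals, so the recomputed rates are only expected to match approximately.

```python
# Values copied from the Advisor report above.
self_gflop = 283.11552     # Self GFLOP
self_gbytes = 129.76128    # data transfers, in GB
self_elapsed_s = 0.350     # Self Elapsed Time (rounded in the report)

# Self GFLOPS = Self GFLOP / Self Elapsed Time
self_gflops = self_gflop / self_elapsed_s
# Arithmetic intensity: floating-point operations per transferred byte
self_ai = self_gflop / self_gbytes
# Data transfer rate in GB/s
bandwidth_gbs = self_gbytes / self_elapsed_s

print(f"Self GFLOPS ~ {self_gflops:.2f} (reported 808.89193)")
print(f"Self AI     ~ {self_ai:.5f} (reported 2.18182)")
print(f"GB/s        ~ {bandwidth_gbs:.2f} (reported 370.74213)")
```

The arithmetic intensity matches the reported value when the 129.76128 GB figure is used as the denominator, which suggests that Self AI is computed from this same data transfer number.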