I found that reporting of vector speedup is more realistic when based on compilation with Qunroll4. This is only partly explained by weak non vector performance of intel compilers without that option.
In that connection, the advice sometimes issued to cut back unrolling when time is spent in remainder loop appears wrong. Vectorized remainder loop perform as well as main loop would without unrolling. Where advisor claims more efficiency for vector loop without unroll, it doesn't look right.
Intel comparisons with gnu compilers seem always to be based on not setting good unroll options, taking advantage of the gnu default being worse than Intel 's In the application I'm characterizing now in advisor, unroll4 gives 4% overall gain even though the top 10 hotspots are vectorizable.