This change builds on the compiler support for selectively disabling assembler-inserted padding (we really need the assembler syntax finalized) to selectively disable nop/prefix insertion in cold basic blocks. This reduces the code size impact of the branch alignment mitigation.
Based on some quick manual analysis of our assembly on a randomly chosen Java workload, padding in slow paths was the most glaringly obvious deficiency in what's currently checked in.
For the moment, the detection of a cold region follows the precedent used elsewhere in the backend and relies on ProfileSummaryInfo. This may change in the future, as I can't really find documentation for what this is or how a frontend might generate it. I figured it was better to get something in and tested than to roll too much into one change, so expect some follow-up there.
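To make the intent concrete, here is a rough standalone sketch of the idea, not the code in this patch. It uses no LLVM types: `Block`, `isCold`, `paddingWithColdBlocksSkipped`, and the explicit `coldThreshold` parameter are all hypothetical names chosen for illustration, and the simple count comparison only approximates whatever ProfileSummaryInfo actually does.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a machine basic block; not an LLVM type.
struct Block {
  std::uint64_t profileCount; // execution count taken from a profile
  unsigned paddingBytes;      // nop/prefix bytes the assembler would insert
};

// Toy coldness test (assumption): a block is cold when its count falls
// below a caller-supplied threshold.
bool isCold(const Block &b, std::uint64_t coldThreshold) {
  return b.profileCount < coldThreshold;
}

// Sums the padding that would still be emitted once cold blocks opt out.
unsigned paddingWithColdBlocksSkipped(const std::vector<Block> &blocks,
                                      std::uint64_t coldThreshold) {
  unsigned total = 0;
  for (const Block &b : blocks)
    if (!isCold(b, coldThreshold))
      total += b.paddingBytes;
  return total;
}
```

With a threshold of 0 nothing is cold, so the result is the full padding cost; raising the threshold drops the padding attributed to rarely executed blocks, which is the code size saving this change is after.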
Missing -mtriple=x86_64.