This is a follow-up to r336178.
This would be one way of working around the alignment problem.
Actually, it would be enough to just check for clang here, because clang does not emit functions with less than 4-byte alignment, even at -Os.
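For reference, the packing trick amounts to something like the following. `PackedCallback` and `twice` are hypothetical names, not the class under review. The sketch assumes that function pointers round-trip through `std::uintptr_t` (implementation-defined, but true on the usual targets), and it uses the GCC/Clang `aligned` function attribute so that the example target is guaranteed to have its two low bits free.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper: a plain function pointer with two flag bits
// packed into the pointer's low bits. Assumes every function passed in
// is at least 4-byte aligned.
class PackedCallback {
public:
  using Fn = int (*)(int);
  static constexpr std::uintptr_t FlagMask = 0x3; // the two low bits

  PackedCallback(Fn f, unsigned flags) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(f);
    // This is the alignment problem: if f sits at an address with either
    // low bit set, OR-ing in the flags would corrupt the pointer.
    assert((bits & FlagMask) == 0 && "function is less than 4-byte aligned");
    assert((flags & ~FlagMask) == 0 && "only two flag bits are available");
    Storage = bits | flags;
  }

  unsigned flags() const { return static_cast<unsigned>(Storage & FlagMask); }

  int operator()(int x) const {
    // Strip the flag bits before calling through the pointer.
    Fn f = reinterpret_cast<Fn>(Storage & ~FlagMask);
    return f(x);
  }

private:
  std::uintptr_t Storage; // pointer bits | flag bits
};

// The whole point of the trick: no storage beyond the pointer itself.
static_assert(sizeof(PackedCallback) == sizeof(PackedCallback::Fn),
              "flags are packed into the pointer");

// Example target. Without an explicit alignment attribute, nothing in the
// language guarantees that a function's address has its low bits clear.
__attribute__((aligned(4))) int twice(int x) { return 2 * x; }
```

With this, `PackedCallback(twice, 0x2)` reports `flags() == 2` and still calls `twice` correctly.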
It's not very pretty though. Is this class performance sensitive enough that it's worth the complexity of packing flags into the function pointer, or could we trade size for simplicity and just store the flags in another byte?
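For comparison, the separate-byte alternative might look roughly like this (again a hypothetical sketch, not the actual class). It needs no casts and no alignment assumption. The cost is padding: the object grows from one pointer to, typically, two pointers' worth of storage on a 64-bit target.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper: the function pointer is stored untouched and the
// flags live in their own byte, so any function address is acceptable.
class SimpleCallback {
public:
  using Fn = int (*)(int);

  SimpleCallback(Fn f, std::uint8_t flags) : F(f), Flags(flags) {}

  unsigned flags() const { return Flags; }
  int operator()(int x) const { return F(x); }

private:
  Fn F;
  std::uint8_t Flags; // padded out to the pointer's alignment
};

// Example target; no alignment attribute required.
int triple(int x) { return 3 * x; }
```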