Reland the reverted commits below, with some changes:
As an optimization, split the entry block of kernel K just after the topmost
static alloca cluster.
For better code transformation and optimization, all static allocas are expected
to appear as a single contiguous cluster at the start of the entry block. If this
canonical form is *not* maintained, then some static allocas may become dynamic
after the entry block split, because any alloca left below the split point is no
longer in the entry block.
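
The effect described above can be illustrated with a toy model. This is only a
sketch in Python, not the LLVM API: `Inst`, `split_after_leading_allocas`, and
`demoted_allocas` are hypothetical names, and the entry block is modeled as a
plain list of instructions. It assumes the LLVM rule that an alloca counts as
static only while it sits in the entry block and has a constant size.

```python
from dataclasses import dataclass


@dataclass
class Inst:
    """Hypothetical stand-in for an IR instruction."""
    opcode: str                 # e.g. "alloca", "store", "call"
    constant_size: bool = True  # only meaningful for allocas


def is_static_candidate(inst):
    # An alloca with a constant size; it is static only while in the entry block.
    return inst.opcode == "alloca" and inst.constant_size


def split_after_leading_allocas(entry):
    """Split the entry block just after its topmost run of static allocas."""
    split_at = 0
    while split_at < len(entry) and is_static_candidate(entry[split_at]):
        split_at += 1
    # head stays the entry block; tail becomes a new successor block.
    return entry[:split_at], entry[split_at:]


def demoted_allocas(tail):
    """Constant-size allocas that landed outside the entry block after the split."""
    return [inst for inst in tail if is_static_candidate(inst)]


# Canonical form: every alloca is in one cluster at the top.
canonical = [Inst("alloca"), Inst("alloca"), Inst("store"), Inst("call")]
# Non-canonical form: an alloca appears after a non-alloca instruction.
scattered = [Inst("alloca"), Inst("store"), Inst("alloca"), Inst("call")]

canonical_head, canonical_tail = split_after_leading_allocas(canonical)
scattered_head, scattered_tail = split_after_leading_allocas(scattered)
```

In the canonical case nothing is demoted; in the scattered case the second
alloca ends up in the new block and would no longer be treated as static.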
There is no reason to assume allocas are clustered in any way in the entry block. That is certainly never the case.