I experimented with a new pass that improves the generation of BRCT loops for SystemZ. It is based on PPCCTRLoops.cpp.
Results:
Number of brct's on SPEC:
w/out patch: 4752
w/out patch, unrolled: 1367
w/ patch: 8743
w/ patch, unrolled: 11518 (new prologue loops increase the number)
For comparison, gcc with unrolling generates 20864 brct's, so this result is an improvement, although probably not yet near optimal.
The pass seemed necessary in order to enable the loop unroller and tune it properly; otherwise a loop might get unrolled but lose its BRCT, and with it some performance.
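For readers unfamiliar with the instruction: BRCT decrements a register and branches while the result is nonzero, so it fits loops whose trip count is known on entry. The sketch below is only a hand-written illustration of that loop shape, not output of the pass; the function names `sum_up` and `sum_down` are made up for the example.

```cpp
#include <cstdint>

// Source form: an up-counting loop whose trip count (n) is known on entry.
int64_t sum_up(const int64_t *a, uint64_t n) {
    int64_t s = 0;
    for (uint64_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

// Illustrative down-counting form (hypothetical, written by hand):
// a single counter is decremented and tested against zero at the bottom
// of the loop, which is the pattern a decrement-and-branch instruction
// such as BRCT can implement in one instruction.
int64_t sum_down(const int64_t *a, uint64_t n) {
    int64_t s = 0;
    if (n == 0)              // guard: the do-while body runs at least once
        return s;
    uint64_t count = n;      // loop counter, set up before the loop
    do {
        s += *a++;
    } while (--count != 0);  // decrement, branch back if not zero
    return s;
}
```

Both functions compute the same sum. If unrolling or a later transformation breaks the single decrement-and-test at the loop latch, the second form is no longer recognizable, which is presumably how the "w/out patch, unrolled" count dropped.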