The following analysis concerns bug 33362, which is related to instruction scheduling.
1/ The pre-RA scheduler uses a DAG-based list-scheduling algorithm that scans instructions bottom-up. By default it picks the instruction for the next cycle from the available instructions in source order. Instructions enter the priority queue only after they satisfy precedence and resource constraints. As a result, LZCNT64rr on the second argument, which corresponds to the IR %4 = tail call i64 @llvm.ctlz.i64(i64 %1, i1 false) (the call that is swapped between the two functions), is scheduled later. Remember that the algorithm works bottom-up, so the final schedule is the reverse of the order in which the algorithm picks instructions.
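2/ The peephole optimizer, which runs later in the pipeline, could fold the TEST into the preceding LZCNT and rewrite CMPNE as CMPAE in the first function (clz_i128). This was possible because the LZCNT operating on the first argument was adjacent to the TEST, and both are MachineInstrs that define EFLAGS. Since LZCNT sets CF exactly when its source is zero, its flags can replace those of the TEST.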
3/ In the second function (clz_i128_swap), the llvm.ctlz calls are swapped, so, for the reason given in point 1/, the LZCNT operating on the first argument is no longer adjacent to the TEST. The LZCNT on the second argument, which also defines EFLAGS, now sits between them. That is why the flags could not be folded.
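The fold condition in points 2/ and 3/ can be illustrated with a toy model. This is a simplified sketch, not LLVM's X86 peephole code (which lives in X86InstrInfo::optimizeCompareInstr and performs more checks). It assumes a block is a plain list of instructions and that TEST r, r may reuse flags only when the nearest preceding EFLAGS def is an LZCNT of the same register. The helper names (Instr, can_fold_test, folded_cc, make_lzcnt, make_test) and both instruction orders are hypothetical and are not actual llc output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instr:
    opcode: str        # toy opcodes, e.g. "LZCNT64rr", "TEST64rr" (hypothetical model)
    uses: List[str]    # registers read by the instruction
    defs_eflags: bool  # True if the instruction writes EFLAGS


def can_fold_test(block: List[Instr], test_idx: int) -> bool:
    """Return True if TEST r, r at test_idx can reuse the flags of an LZCNT of r.

    Simplified model: walk backwards to the nearest EFLAGS def. If it is an
    LZCNT reading the same register, its flags can stand in for TEST's.
    Any other EFLAGS def in between clobbers them, so no fold.
    """
    reg = block[test_idx].uses[0]
    for instr in reversed(block[:test_idx]):
        if instr.defs_eflags:
            return instr.opcode == "LZCNT64rr" and instr.uses == [reg]
    return False


def folded_cc(cc: str) -> str:
    """Map a condition on TEST r, r (ZF=1 iff r == 0) to LZCNT's CF (CF=1 iff r == 0)."""
    return {"NE": "AE", "E": "B"}[cc]


def make_lzcnt(reg: str) -> Instr:
    return Instr("LZCNT64rr", [reg], True)


def make_test(reg: str) -> Instr:
    return Instr("TEST64rr", [reg, reg], True)


# Assumed orders for illustration: a is the first argument, b the second.
clz_i128_like = [make_lzcnt("b"), make_lzcnt("a"), make_test("a")]
swap_like = [make_lzcnt("a"), make_lzcnt("b"), make_test("a")]

print(can_fold_test(clz_i128_like, 2), folded_cc("NE"))  # True AE
print(can_fold_test(swap_like, 2))                       # False
```

In the swapped order the LZCNT on b is the nearest EFLAGS def before the TEST, so the model refuses the fold, mirroring point 3/.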
I have added a priority function, selected with the option -pre-RA-sched=guided-src, based on a heuristic that prefers the instruction in the available queue that shares operands with already-scheduled instructions. When no such instruction exists, selection falls back to source order.
This fixes the case above. Please share your thoughts on this approach, and on what a better heuristic could be.
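The proposed priority can be sketched with a toy bottom-up list scheduler. This is a minimal sketch under stated assumptions, not the actual guided-src implementation. The four-node DAG for clz_i128_swap is hypothetical and heavily simplified (node names LZCNT_a, LZCNT_b, TEST_a, CMOV are made up, and the real DAG has more nodes). EFLAGS liveness and resource constraints are not modelled. The guided=True flag stands in for -pre-RA-sched=guided-src and prefers ready nodes that share an input operand with the most recently scheduled node. That is one possible reading of the heuristic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    name: str
    order: int                                      # source order (position in the IR)
    uses: Set[str]                                  # input operands read by the node
    users: List[str] = field(default_factory=list)  # nodes consuming its result


def schedule_bottom_up(nodes: Dict[str, Node], guided: bool) -> List[str]:
    """Toy bottom-up list scheduler; returns the final (top-down) order.

    A node becomes ready once all of its users are scheduled. The default
    priority is source order (highest first, since picks are made bottom-up).
    With guided=True, ready nodes sharing an input operand with the most
    recently scheduled node win; otherwise selection falls back to source order.
    """
    preds: Dict[str, List[str]] = {n: [] for n in nodes}
    for name, node in nodes.items():
        for user in node.users:
            preds[user].append(name)
    pending = {name: len(node.users) for name, node in nodes.items()}
    ready = [name for name, count in pending.items() if count == 0]
    picked: List[str] = []  # bottom-up pick order

    def by_source_order(name: str) -> int:
        return nodes[name].order

    while ready:
        candidates = ready
        if guided and picked:
            last_uses = nodes[picked[-1]].uses
            sharing = [n for n in ready if nodes[n].uses & last_uses]
            if sharing:
                candidates = sharing
        pick = max(candidates, key=by_source_order)
        ready.remove(pick)
        picked.append(pick)
        for pred in preds[pick]:
            pending[pred] -= 1
            if pending[pred] == 0:
                ready.append(pred)
    return list(reversed(picked))  # reversed bottom-up picks = final schedule


def toy_swap_dag() -> Dict[str, Node]:
    """Hypothetical DAG for clz_i128_swap (a = first argument, b = second)."""
    return {
        "LZCNT_a": Node("LZCNT_a", 1, {"a"}, ["CMOV"]),
        "LZCNT_b": Node("LZCNT_b", 2, {"b"}, ["CMOV"]),
        "TEST_a": Node("TEST_a", 3, {"a"}, ["CMOV"]),
        "CMOV": Node("CMOV", 4, {"lz_a", "lz_b", "EFLAGS"}),
    }


print(schedule_bottom_up(toy_swap_dag(), guided=False))
# ['LZCNT_a', 'LZCNT_b', 'TEST_a', 'CMOV']
print(schedule_bottom_up(toy_swap_dag(), guided=True))
# ['LZCNT_b', 'LZCNT_a', 'TEST_a', 'CMOV']
```

With source order alone, LZCNT_b lands between LZCNT_a and TEST_a. With the guided priority, once TEST_a is picked, LZCNT_a wins because it also reads a, so it ends up directly above the TEST. Matching only against the most recently scheduled node is a choice of this sketch. Matching against all scheduled nodes gives the same order on this toy DAG.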