We can use the rotate-left-then-mask-insert instructions (rlwimi and rldimi) to implement bitfield insert efficiently (as well as similar code sequences generated by SROA etc.). However, LLVM currently generates an inefficient code sequence for this purpose. For example, for bitfieldinsert64 in the added unit test, it generates four instructions instead of a single rldimi instruction.
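For readers unfamiliar with the pattern, here is a minimal sketch of a 64-bit bitfield insert written by hand in C++. The helper name `insertField` and its parameters are hypothetical and are not taken from the patch or its unit test; the sketch only illustrates the clear-then-or shape that a rotate-and-insert instruction can cover.

```cpp
#include <cstdint>

// Hypothetical helper (not from the patch): replace `width` bits of `word`,
// starting at bit `shift` (counted from the least significant bit), with the
// low `width` bits of `field`. Assumes shift < 64 and shift + width <= 64.
inline uint64_t insertField(uint64_t word, uint64_t field,
                            unsigned shift, unsigned width) {
  // Ones in the bit positions occupied by the destination field.
  uint64_t lowOnes = (width >= 64) ? ~0ULL : ((1ULL << width) - 1);
  uint64_t mask = lowOnes << shift;
  // Clear the old field, then OR in the shifted and masked new value.
  return (word & ~mask) | ((field << shift) & mask);
}
```

Written out naively, this is a shift, two ANDs, and an OR, which is the sort of multi-instruction sequence the description refers to.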
PPCDAGToDAGISel already has a method, tryBitfieldInsert, that generates rotate-left-then-mask-insert for 32-bit integers, but it is not reached for most simple bitfield inserts, because tryBitPermutation runs before tryBitfieldInsert and generates suboptimal code.
This patch runs tryBitfieldInsert before tryBitPermutation, but only for the simplest cases. It also adds a 64-bit version of tryBitfieldInsert that generates the rldimi instruction.
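As a reference for what a single rldimi computes, the following is a small software model of the instruction's semantics (rotate RS left by SH, then insert under a mask running from IBM bit MB through bit 63 - SH, where bit 0 is the most significant). The function names `rotl64`, `maskIBM`, and `rldimiModel` are made up for this sketch; it models the instruction only and says nothing about how the patch matches the DAG.

```cpp
#include <cstdint>

// Rotate a 64-bit value left by n bits (n taken modulo 64).
inline uint64_t rotl64(uint64_t x, unsigned n) {
  n &= 63;
  return n == 0 ? x : (x << n) | (x >> (64 - n));
}

// Ones from IBM bit mb through IBM bit me (bit 0 = MSB, bit 63 = LSB).
// If mb > me the mask wraps around, as in the ISA's MASK(mb, me).
inline uint64_t maskIBM(unsigned mb, unsigned me) {
  uint64_t fromMb = ~0ULL >> mb;            // IBM bits mb..63
  uint64_t throughMe = ~0ULL << (63 - me);  // IBM bits 0..me
  return mb <= me ? (fromMb & throughMe) : (fromMb | throughMe);
}

// Model of "rldimi RA, RS, SH, MB": bits under the mask come from the
// rotated RS, all other bits of RA are preserved. Assumes sh, mb < 64.
inline uint64_t rldimiModel(uint64_t ra, uint64_t rs,
                            unsigned sh, unsigned mb) {
  uint64_t m = maskIBM(mb, 63 - sh);
  return (rotl64(rs, sh) & m) | (ra & ~m);
}
```

For example, inserting an 8-bit field at bit offset 8 (from the LSB) corresponds to SH = 8 and MB = 48 in this model, so the whole clear-shift-or sequence collapses into one operation.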
Should this be uint32_t rather than unsigned, to emphasize that it is a 32-bit value?