Base and offset are always separated when a GlobalAddress node is lowered
(rL332641) as an optimization to reduce instruction count. However, this
optimization is not profitable if the global address ends up being used in only
one instruction.
This patch adds peephole optimizations that merge the offset of
an address calculation into the LUI %hi and ADDI %lo of the lowering sequence.
The peephole handles three patterns:
1) ADDI (ADDI (LUI %hi(global)) %lo(global)), offset ---> ADDI (LUI %hi(global + offset)) %lo(global + offset).
This generates:
lui a0, %hi(global + offset)
addi a0, a0, %lo(global + offset)
Instead of:
lui a0, %hi(global)
addi a0, a0, %lo(global)
addi a0, a0, offset
This pattern is for cases when the offset is small enough to fit in the signed 12-bit immediate field of ADDI.
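As an illustration (not part of the patch), a minimal C sketch that produces this kind of small constant offset; the global g is hypothetical:

/* g is a hypothetical global used only for illustration. */
extern int g[1024];

int load_small_offset(void) {
  /* Address is g + 8; the offset 8 fits in ADDI's signed 12-bit immediate. */
  return g[2];
}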
2) ADD (ADDI (LUI %hi(global)) %lo(global)), (LUI hi_offset) ---> ADDI (LUI %hi(global + offset)) %lo(global + offset), where offset = hi_offset << 12.
Which generates the ASM:
lui a0, %hi(global + offset)
addi a0, a0, %lo(global + offset)
Instead of:
lui a0, %hi(global)
addi a0, a0, %lo(global)
lui a1, hi_offset
add a0, a0, a1
This pattern is for cases when the offset doesn't fit in the immediate field of ADDI but its lower 12 bits are all zero.
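For example, a minimal C sketch (again with a hypothetical global g) whose offset has all-zero low 12 bits:

/* g is a hypothetical global used only for illustration. */
extern int g[4096];

int load_aligned_offset(void) {
  /* Address is g + 4096 (0x1000); the low 12 bits are zero, so the
     offset alone could be materialized by a single LUI. */
  return g[1024];
}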
3) ADD (ADDI (LUI %hi(global)) %lo(global)), (ADDI (LUI hi_offset) lo_offset) ---> ADDI (LUI %hi(global + offset)) %lo(global + offset), where offset = (hi_offset << 12) + lo_offset.
Which generates the ASM:
lui a0, %hi(global + offset)
addi a0, a0, %lo(global + offset)
Instead of:
lui a0, %hi(global)
addi a0, a0, %lo(global)
lui a1, hi_offset
addi a1, a1, lo_offset
add a0, a0, a1
This pattern is for cases when the offset doesn't fit in the immediate field of ADDI and both its lower 12 bits and upper 20 bits are non-zero.
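For example, a minimal C sketch (hypothetical global g) whose offset needs both a high and a low part:

/* g is a hypothetical global used only for illustration. */
extern int g[4096];

int load_large_offset(void) {
  /* Address is g + 4100; 4100 = (1 << 12) + 4, so both the upper 20 bits
     and the lower 12 bits of the offset are non-zero. */
  return g[1025];
}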
There are a few clang-tidy issues in this function (an extra space after return, and a missing space before RISCVII::MO_HI).