This is an archive of the discontinued LLVM Phabricator instance.

[mips] Fix 64-bit address loading in case of applying 32-bit mask to the result
ClosedPublic

Authored by atanasyan on Aug 14 2019, 9:35 AM.

Details

Summary

If result of 64-bit address loading combines with 32-bit mask, LLVM tries to optimize the code and remove "redundant" loading of upper 32-bits of the address. It leads to incorrect code on MIPS64 targets.

MIPS backend creates the following chain of commands to load 64-bit address in the MipsTargetLowering::getAddrNonPICSym64 method:

(add (shl (add (shl (add %highest(sym), %higher(sym)),
                    16),
               %hi(sym)),
          16),
     %lo(%sym))

If the mask presents, LLVM decides to optimize the chain of commands. It really does not make sense to load upper 32-bits because the 0x0fffffff mask anyway clears them. After removing redundant commands we get this chain:

(add (shl (%hi(sym), 16), %lo(%sym))

There is no patterns matched (MipsHi (i64 symbol)). Due a bug in SYM_32 predicate definition, backend incorrectly selects a pattern for a 32-bit symbols and uses the lui instruction for loading %hi(sym).

As a result we get incorrect set of instructions with unnecessary 16-bit left shifting:

lui     at,0x0
    R_MIPS_HI16     foo
dsll    at,at,0x10
daddiu  at,at,0
    R_MIPS_LO16     foo

This patch resolves two problems:

  • Fix SYM_32/SYM_64 predicates to prevent selection of patterns dedicated to 32-bit symbols in case of using N64 ABI.
  • Add missed patterns for 64-bit symbols for %hi/%lo.

Fix PR42736.

Diff Detail

Event Timeline

atanasyan created this revision.Aug 14 2019, 9:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 14 2019, 9:35 AM

This looks OK to me, I'd like to take a hard look at what DAGCombiner is doing here.

Can you also supply a test case where the mask has the topmost bit in an i32 set so that we can catch the sign extension cases?

llvm/lib/Target/Mips/Mips64InstrInfo.td
702

This addition and below needs SYM_64 I believe.

I am not really sure if this is correct way to fix PR42736. I looked in td files and found a few inconsistencies, again I might have misinterpreted something.

MipsHi(the sd node) seems to have multiple interpretations if we add new patterns

There is MipsHiLoRelocs multiclass in MipsInstrInfo.td

multiclass MipsHiLoRelocs<Instruction Lui, Instruction Addiu,
                          Register ZeroReg, RegisterOperand GPROpnd> {
  def : MipsPat<(MipsHi tglobaladdr:$in), (Lui tglobaladdr:$in)>;
...

and in Mips64InstrInfo.td

defm : MipsHiLoRelocs<LUi64, DADDiu, ZERO_64, GPR64Opnd>, ISA_MIPS3, GPR_64,
       SYM_32;

equivalent to

def : MipsPat<(MipsHi tglobaladdr:$in), (LUi64 tglobaladdr:$in)>;

disagrees with the new

def : MipsPat<(shl (MipsHi (i64 tglobaladdr:$in)), (i32 16)),
              (LUi64 tglobaladdr:$in)>, ISA_MIPS3, GPR_64;

In td file MipsHi is interpreted like: "16 bit imm(bits 31-16 of global addr in our case) is placed into bits 31-16"
while in getAddrNonPICSym64 it is: "16 bit imm(bits 31-16 of global addr in our case) is placed into low 16 bits of result and shift for 16 is required?"

If we use existing MipsISD::Hi interpretation from td file getAddrNonPICSym64 could be modified like this:

       SDValue HigherPart =
           DAG.getNode(ISD::ADD, DL, Ty, Highest,
                       DAG.getNode(MipsISD::Higher, DL, Ty, Higher));
-      SDValue Cst = DAG.getConstant(16, DL, MVT::i32);
+      SDValue Cst = DAG.getConstant(32, DL, MVT::i32);
       SDValue Shift = DAG.getNode(ISD::SHL, DL, Ty, HigherPart, Cst);
       SDValue Add = DAG.getNode(ISD::ADD, DL, Ty, Shift,
                                 DAG.getNode(MipsISD::Hi, DL, Ty, Hi));
-      SDValue Shift2 = DAG.getNode(ISD::SHL, DL, Ty, Add, Cst);
 
-      return DAG.getNode(ISD::ADD, DL, Ty, Shift2,
+      return DAG.getNode(ISD::ADD, DL, Ty, Add,
                          DAG.getNode(MipsISD::Lo, DL, Ty, Lo));

Also MipsISD::Highest and MipsISD::Higher do not seem consistent with their names,
MipsISD::Highest places 16 bit imm(bits 48-63 of global addr in our case) into bits 31-16

MipsISD::Highest is selected like MipsISD::Hi (lui), while
MipsISD::Higher is selected like MipsISD::Lo (addiu)

Replacing MipsISD::Highest and MipsISD::Higher with MipsISD::Hi and MipsISD::Lo in getAddrNonPICSym64, results in same instructions being selected.
Now, I might be missing something here since I am not that familiar with llvm options and ways that global address and similar should be transformed into instructions,
but SDNodes

def MipsHigher : SDNode<"MipsISD::Higher", SDTIntUnaryOp>;
def MipsHighest : SDNode<"MipsISD::Highest", SDTIntUnaryOp>;

in MipsInstrInfo.td, and patterns they define seem to be duplicates of

def MipsHi    : SDNode<"MipsISD::Hi", SDTIntUnaryOp>;
def MipsLo    : SDNode<"MipsISD::Lo", SDTIntUnaryOp>;

and its patterns. Thoughts?

sdardis requested changes to this revision.Aug 15 2019, 12:55 PM

I took a second look, and I believe this patch is the incorrect solution. The bug actually lies in the implementation of PredicateControl and SYM_32/SYM_64, @Petar.Avramovic nearly spotted it.

Those two adjectives add a predicate which checks if the sym32 feature is enabled or not[1]. However, as they are not appended to the list of predicates for a pattern--as SYMPredicates is not in the definition of PredicateControl.

Without that adjective taking effect there are two sets of patterns for a MipsHi node depending on it's parent. If the parent node is a register and an add--as expected--the instruction selection machinery produces an DADDiu with the relevant operand and relocation. If it is not an add, i.e. the case where the known values of the upper 32 bits are zero which would allow DAGCombiner to elide the addition of the upper components and the addition of the upper components to the MipsHi node as they are all zero, then the instruction selection machinery can't maximally munch to the graph like in the (add (MipsHi ...) ..) case and picks the transformation of MipsHi -> LUi64, when that node is the child of a shift because the SYM_32 patterns provide for that result at a lower complexity.

Those patterns should not be selectable, leading to an instruction selection failure. Since they are selectable, the result of an LUi64 gets shifted into MipsHigher's space then has MipsLo added to it.

The correct approach I believe is to fix the SYM_32 bug, then provide the patterns for cases of where intermediate/end nodes such as MipsHi / MipsLo can appear without an add appearing as their parent node.

[1] There's a bug there too! it should be hasSym32() rather than HasSym32().

llvm/lib/Target/Mips/Mips64InstrInfo.td
702

I don't believe this comment is relevant now.

llvm/test/CodeGen/Mips/global-address-with-mask.ll
1 ↗(On Diff #215143)

Call this file pr42736.ll

This revision now requires changes to proceed.Aug 15 2019, 12:55 PM

@sdardis What is the difference between MipsHi and MipsLo SD nodes? Some of their patterns are the same, e.g. :

def : MipsPat<(add GPR64:$hi, (MipsHi (i64 tglobaladdr:$lo))),
              (DADDiu GPR64:$hi, tglobaladdr:$lo)>, ISA_MIPS3, GPR_64, SYM_64;

def : MipsPat<(add GPR64:$hi, (MipsLo (i64 tglobaladdr:$lo))),
              (DADDiu GPR64:$hi, tglobaladdr:$lo)>, ISA_MIPS3, GPR_64, SYM_64;

Both act like "glue" between add/shift/nothing node and node with part of the address (controlled by target flag) and are used for pattern match into lui or addiu.
They are used interchangeably but I didnt find any comments about what these nodes actually do.
MipsHi can be alone (lui) or used in combine with add (into addiu) according to patterns. By the name I would use MipsLo if I wanted it to combine with add into addiu, not MipsHi.
Aren't all of the MipsHighest, MipsHigher, MipsHi and MipsLo equivalent to some Mips16bitImm node
that would have patterns
Mips16bitImm + shl 16 -> lui
Mips16bitImm + add -> addiu
Mips16bitImm alone -> addiu with zero
but then instead of using MipsHi and MipsHighest alone and expecting lui to be selected we need to make two nodes: Mips16bitImm and shl 16.
MipsHigher and MipsLo are fine if we just replace them with Mips16bitImm.
What difference does SYM_32/SYM_64 make, all selected instructions use only 16 bits of something that is resolved later?

@sdardis What is the difference between MipsHi and MipsLo SD nodes? Some of their patterns are the same, e.g. :

Maybe the difference is in the relocations related to each nodes.

In getAddrNonPICSym64 Relocation looks to come from another SDValue (Hi bellow)
MipsISD::Hi doesn't seem to be connected with any of the target flags (like MipsII::MO_ABS_HI)

SDValue Shift = DAG.getNode(ISD::SHL ...

SDValue Hi = getTargetNode(N, Ty, DAG, MipsII::MO_ABS_HI);	<-Relocation
SDValue Add = DAG.getNode(ISD::ADD, DL, Ty, Shift,		<-Add should become Daddiu when combined with MipsISD::Hi node(below) and  SDValue Hi (above)
                          DAG.getNode(MipsISD::Hi, DL, Ty, Hi));  <-Glue Node, pattern matches immediate daddiu instead of ordinary daddu. This can be whatever as long as these three nodes give Daddiu with imm that is defined in SDValue Hi(MipsISD::Lo also works)

Thanks for review. Could you clarify some points in your comments?

The correct approach I believe is to fix the SYM_32 bug, then provide the patterns for cases of where intermediate/end nodes such as MipsHi / MipsLo can appear without an add appearing as their parent node.

  1. If we do not change the getAddrNonPICSym64 function, at some point (before lowering) we anyway(?) get the following chain of commands (add (shl (%hi(sym), 16), %lo(%sym)). How can we lower it to a correct set of instructions, if we do not have a pattern with the shl? Please correct we if I'm wrong.
  2. What do you mean by "patterns for cases of where intermediate/end nodes such as MipsHi / MipsLo can appear without an add appearing as their parent node"? Do you mean a pattern like add zero, MipsHi(tglobaladdr), or just MipsHi(tglobaladdr) or something else?

Thanks for review. Could you clarify some points in your comments?

The correct approach I believe is to fix the SYM_32 bug, then provide the patterns for cases of where intermediate/end nodes such as MipsHi / MipsLo can appear without an add appearing as their parent node.

  1. If we do not change the getAddrNonPICSym64 function, at some point (before lowering) we anyway(?) get the following chain of commands (add (shl (%hi(sym), 16), %lo(%sym)). How can we lower it to a correct set of instructions, if we do not have a pattern with the shl? Please correct we if I'm wrong.

Match the MipsHi to daddiu zero, MipsHi. The shift will move the Hi value to the correct place, this is correct behaviour. If you look at the stanza of patterns for Higher, you'll notice they're duplicated, both (MipsHigher (i64 symboltype:%sum)) and (add $highest (MipsHigher (i64 symboltype:%sum))). Providing the first set of patterns for (MipsHi (i64 symbol)) will allow DAGISel to pick the correct instructions. We can''t use LUi64 in this case as it would sign extend the symbol value into the upper 32 bits.

  1. What do you mean by "patterns for cases of where intermediate/end nodes such as MipsHi / MipsLo can appear without an add appearing as their parent node"? Do you mean a pattern like add zero, MipsHi(tglobaladdr), or just MipsHi(tglobaladdr) or something else?

We need to ensure that (add Hi$, MipsHi/Higher(tglobaladdr)) can be selected for the chain of nodes created by getAddrNonPICSym64. If that series of nodes is modified we could have a case where the .td patterns don't match what getAddrNonPICSym64 produce, and DAGISel sees (shl (MipsHi (tglobaladdr), 16). We need two sets of patterns, one for (add (MipsHi ...)) and (MipsHi ..).

Also, without fixing the sym32 bug, this patch won't produce the correct result.

...
What difference does SYM_32/SYM_64 make, all selected instructions use only 16 bits of something that is resolved later?

SYM_32 is a special submode of the N64 ABI where symbols are 32bits wide. This means embedded/kernel developers who precisely know how their memory map is laid out can save on code-size.

atanasyan updated this revision to Diff 217256.Aug 26 2019, 3:50 PM
atanasyan edited the summary of this revision. (Show Details)
  • Fix SYM_32/SYM64 predicates definitions
  • Add missed patterns to load 64-bit symbol's address

LGTM. Let's wait for @sdardis.

sdardis accepted this revision.Aug 28 2019, 12:46 PM

LGTM.

llvm/lib/Target/Mips/Mips.td
28

Nit: Predicates for a symbol's sizee such as hasSym32.

llvm/lib/Target/Mips/Mips64InstrInfo.td
654

Touch up the formatting in a separate NFC commit.

This revision is now accepted and ready to land.Aug 28 2019, 12:46 PM

Thanks for review.

This revision was automatically updated to reflect the committed changes.