This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Prevent Merging Bitcast with non-normal loads
ClosedPublic

Authored by niravd on Apr 3 2017, 10:46 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

niravd created this revision.Apr 3 2017, 10:46 AM
uweigand edited edge metadata.Apr 4 2017, 6:40 AM

I guess this is OK with me, but I admit I don't quite understand why this would make any difference ... As far as I can see, the flavor of DAG.getLoad currently used would use the existing MemOperand, including its existing alignment, while the flavor in your patch creates a new MemOperand using the passed alignment value which the patch also gets from the current MemOperand. Either way we ought to end up with the same alignment ...

Would you mind explaining what I'm missing here?

niravd added a comment.Apr 4 2017, 8:38 AM

I was slightly incorrectl it's not the alignment. I mixed up the various getLoads in my tracing. It's an issue with offset. I just pulled the corresponding call from the visitBITCAST since it's doing the same operation.

The original was indirectly calling:

SDValue getLoad(ISD::MemIndexedMode AM, ISD::LoadExtType ExtType, EVT VT,

const SDLoc &dl, SDValue Chain, SDValue Ptr, SDValue Offset,
EVT MemVT, MachineMemOperand *MMO);

with Offset set to undef which seems wrong. I suspect it should be zero.

The new version calls:

SDValue getLoad(EVT VT, const SDLoc &dl, SDValue Chain, SDValue Ptr,

MachinePointerInfo PtrInfo, unsigned Alignment = 0,
MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
const AAMDNodes &AAInfo = AAMDNodes(),
const MDNode *Ranges = nullptr);

I was slightly incorrectl it's not the alignment. I mixed up the various getLoads in my tracing. It's an issue with offset. I just pulled the corresponding call from the visitBITCAST since it's doing the same operation.

The original was indirectly calling:

SDValue getLoad(ISD::MemIndexedMode AM, ISD::LoadExtType ExtType, EVT VT,
                  const SDLoc &dl, SDValue Chain, SDValue Ptr, SDValue Offset,
                  EVT MemVT, MachineMemOperand *MMO);

with Offset set to undef which seems wrong. I suspect it should be zero.

According to the comments and assertions in ::getLoad, Offset must be Undef for non-indexed loads.

The new version calls:

SDValue getLoad(EVT VT, const SDLoc &dl, SDValue Chain, SDValue Ptr,
                 MachinePointerInfo PtrInfo, unsigned Alignment = 0,
                 MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
                 const AAMDNodes &AAInfo = AAMDNodes(),
                 const MDNode *Ranges = nullptr);

This likewise forwards to another version of getLoad specifying an unindexed load and an Offset of Undef, so I'm afraid I still don't see any difference:

SDValue SelectionDAG::getLoad(EVT VT, const SDLoc &dl, SDValue Chain,
                              SDValue Ptr, MachinePointerInfo PtrInfo,
                              unsigned Alignment,
                              MachineMemOperand::Flags MMOFlags,
                              const AAMDNodes &AAInfo, const MDNode *Ranges) {
  SDValue Undef = getUNDEF(Ptr.getValueType());
  return getLoad(ISD::UNINDEXED, ISD::NON_EXTLOAD, VT, dl, Chain, Ptr, Undef,
                 PtrInfo, VT, Alignment, MMOFlags, AAInfo, Ranges);
}

I'm not trying to be difficult here :-) I'd just like to understand what's going on to make sure there is no problem lurking with any of the other getLoad calls in the SystemZ back-end ...

jonpa edited edge metadata.Apr 5 2017, 1:56 AM

I don't follow how this relates to Offset (which seems to be 1 w or w/out patch)..?

The assert that triggers says 'memvt.getStoreSize() <= MMO->getSize() && "Size mismatch!"', which to me means that when calling getLoad(), the memory operand must have a size that is at least as big as the the size in memory of the load VT.

At trunk, the MMO Size is 1, as indicated on the DAG dump:

t0: ch = EntryToken
  t27: i32,ch = load<LD1[undef+1], sext from i8> t0, undef:i64, undef:i64
t22: f32 = bitcast t27

The load is going to load 1 byte (LD1), and then sext to i32.

With the trunk call to getLoad, we try to make a new Load of type f32 with a memory operand of size 1, which triggers the assert.

The patch instead calls getLoad (line 5621) with VT==f32, which calls getLoad (line 5551) with MemVT==f32. This method then creates a new MachineMemOperand of size 'MemVT.getStoreSize()', which is 4. Therefore this works.

This however means we are now loading 4 bytes as a float, instead of 1 byte which is sign-extended and then bitcast. I am guessing this must be wrong, or?

jonpa added a comment.Apr 5 2017, 4:19 AM

On the other hand, it's a load from an undef address, so maybe that isn't illegal in this case...?

Ah, I think the problem is simply that this piece of code in lowerBITCAST does not expect to be handling extending loads at all, and does the wrong thing for them. It probably just should verify this, and fall through to the rest of the code if the load is actually an extending one ...

niravd added a comment.Apr 5 2017, 6:30 AM

Ah, I think the problem is simply that this piece of code in lowerBITCAST does not expect to be handling extending loads at all, and does the wrong thing for them. It probably just should verify this, and fall through to the rest of the code if the load is actually an extending one ...

This seems reasonable. We can change line 2794 to check isNormalLoad(N) and from my quick look at SystemZISelLowering it looks like Extended Loads have no direct realization.

jonpa added a comment.Apr 5 2017, 6:40 AM

Ah, I think the problem is simply that this piece of code in lowerBITCAST does not expect to be handling extending loads at all, and does the wrong thing for them. It probably just should verify this, and fall through to the rest of the code if the load is actually an extending one ...

This seems reasonable. We can change line 2794 to check isNormalLoad(N) and from my quick look at SystemZISelLowering it looks like Extended Loads have no direct realization.

Yes - I just tried

  • return DAG.getLoad(ResVT, DL, LoadN->getChain(), LoadN->getBasePtr(),
  • LoadN->getMemOperand());

+ if (LoadN->getExtensionType() == ISD::NON_EXTLOAD)
+ return DAG.getLoad(ResVT, DL, LoadN->getChain(), LoadN->getBasePtr(),
+ LoadN->getMemOperand());

And also had to add the -mcpu=zEC12 to the test (at least on my x86 laptop):
+; RUN: llc -mtriple=s390x-linux-gnu -mcpu=zEC12 -o - %s | FileCheck %s

We should basically not get a new Load SDNode if the load is extending.

/Jonas

jonpa added a comment.Apr 5 2017, 6:41 AM
-    return DAG.getLoad(ResVT, DL, LoadN->getChain(), LoadN->getBasePtr(),
-                       LoadN->getMemOperand());
+    if (LoadN->getExtensionType() == ISD::NON_EXTLOAD)
+      return DAG.getLoad(ResVT, DL, LoadN->getChain(), LoadN->getBasePtr(),
+                         LoadN->getMemOperand());

sorry - this is with formatting
niravd updated this revision to Diff 94231.Apr 5 2017, 8:00 AM

update to prevent extended / indexed loads from being merged with bitcast.

niravd retitled this revision from [SystemZ] Fix bitcast of load translation to preserve alignment. to [SystemZ] Prevent Merging Bitcast with non-normal loads.Apr 5 2017, 8:00 AM

Patch also updated test to remove undef input address. Looks right to me.

uweigand accepted this revision.Apr 5 2017, 8:20 AM

LGTM, thanks!

This revision is now accepted and ready to land.Apr 5 2017, 8:20 AM
This revision was automatically updated to reflect the committed changes.