This is an archive of the discontinued LLVM Phabricator instance.

[TypePromotion] Search from ZExt + PHI
ClosedPublic

Authored by avieira on Oct 6 2021, 9:06 AM.

Details

Summary

Hi all,

This is a WIP patch to teach TypePromotion to promote PHI-nodes with a single use by a ZExt or SExt, where the incoming values are either constants or loads where the target can extend the loaded value for free.

See below a motivating example:

#include <stdint.h>
int foo(uint8_t* p) {
  uint16_t index = *p;
  do {
      index = p[index];
  } while (index < 100);
  return index;
}

For aarch64 this would end up with the following IR:

define dso_local i32 @_Z3fooPh(i8* nocapture readonly %p) local_unnamed_addr #0 {
entry:
  %0 = load i8, i8* %p, align 1, !tbaa !8
  br label %do.body
do.body:                                          ; preds = %do.body, %entry
  %index.0.in = phi i8 [ %0, %entry ], [ %1, %do.body ]
  %idxprom = zext i8 %index.0.in to i64
  %arrayidx = getelementptr inbounds i8, i8* %p, i64 %idxprom
  %1 = load i8, i8* %arrayidx, align 1, !tbaa !8
  %cmp = icmp ult i8 %1, 100
  br i1 %cmp, label %do.body, label %do.end, !llvm.loop !11
do.end:                                           ; preds = %do.body
  %conv2 = zext i8 %1 to i32
  ret i32 %conv2
}

Which in turn leads to an unnecessary 'and' at the start of the loop. The goal here is to turn that PHI-node to an i64 and push the zext to the loads.

Let me know what you think of this approach and whether there might be more appropriate ways to ensure we get a better codegen for issues like this.

Diff Detail

Event Timeline

avieira created this revision.Oct 6 2021, 9:06 AM
avieira requested review of this revision.Oct 6 2021, 9:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 6 2021, 9:06 AM
dmgreen added a subscriber: dmgreen.Oct 6 2021, 9:49 AM

Do you have any tests?

llvm/lib/CodeGen/TypePromotion.cpp
906

TII -> TTI

SjoerdMeijer edited the summary of this revision. (Show Details)
SjoerdMeijer retitled this revision from [RFC] TypePromotion: Promote PHI-nodes to [TypePromotion] Promote PHI-nodes.Oct 6 2021, 10:45 AM
SjoerdMeijer edited the summary of this revision. (Show Details)

Let me know what you think of this approach and whether there might be more appropriate ways to ensure we get a better codegen for issues like this.

Sounds like a good and nice addition to this patch, and seems to be the right place for this kind of thing.

Some first drive-by comments: some comments would be nice, some testing and perf numbers too as this pass was really tricky to get right, and like Dave said, some tests.

I found that other than the motivation example I showed above coming up with C-level examples, especially for cases where I don't think transforming is a good idea is difficult. I'll see if I have better luck with IR tests.

llvm/lib/CodeGen/TypePromotion.cpp
906

Ah sorry didn't add that change to the patch I posted upstream :(

I found that other than the motivation example I showed above coming up with C-level examples, especially for cases where I don't think transforming is a good idea is difficult. I'll see if I have better luck with IR tests.

Adding your motivating example as an IR test is a good start! :) And cases that you think should not be transformed are equally valuable, so please add them too.

avieira abandoned this revision.Oct 22 2021, 2:27 AM

Abandoning this in favor of D112300, I think that it would be better to try to avoid the sinking of the Exts rather than trying to undo them this late.

avieira updated this revision to Diff 388144.Nov 18 2021, 3:45 AM

Coming back to this as I abandoned the approach in D112300, since that was deemed too early to be consulting target information.

Reworked the patch to be able to support more general cases. Only promote now if all incoming values of the PHI-node lead to free extensions.

Benchmarked SPEC 2017 intrate on aarch64 and saw no regressions and a small (0.7%) performance uplift on perlbench. Size differences were negligible.

avieira retitled this revision from [TypePromotion] Promote PHI-nodes to [TypePromotion] Promote PHI + [SZ]Ext.Nov 18 2021, 3:45 AM

I forgot to mention this address https://bugs.llvm.org/show_bug.cgi?id=51317 which is a suboptimal codegen issue encountered in snappy. Currently Snappy uses an inline asm workaround to make sure it doesn't end up with an extra 'and', this patch removes the need for that workaround.

Here's a round of review. I haven't got though everything in TryToPromotePHI yet, but it mostly seemed sensible from what I could tell. Reducing the indenting would be good, if we can.

llvm/lib/CodeGen/TypePromotion.cpp
925

It may be possible to check for legal types, as opposed to using this single RegisterBitWidth (and ignoring vector sizes).

937–943

20 is quite high, it is usually around 4. I think you can leave the amount off in most cases and it will choose a sensible default.

947

I will always be an Instruction.

948–949

It is preferred in llvm to remove indenting with early exits/continues. From what I can tell, this algorithm is in two stages now, and this code needn't change? The phis are optimized first, then sequences starting from icmps.

951

I think you may be able to turn any Transform = false; break; into return false;

960

It may be better to check for legal operations instead of the cost. Not sure though with the masked operations.

977

The user of an instruction will always be an instruction. llvm likes to use continue to reduce heavy indenting too, where it can.

1045

Is this reachable?

llvm/test/Transforms/TypePromotion/AArch64/dont-promote-phi-ext.ll
5 ↗(On Diff #388144)

I like to remove "; Function Attrs:..", "dso_local" and "local_unnamed_addr"

My knee jerk reaction is that this looks like a lot of code to add little functionality... the pass can already promote phis, so, IIUC, the only functional change is to enable the use of sign extends for their operands? If so, I don't really understand why this couldn't be added more gracefully into the existing search. My handwavey suggestion would be to search from SExt nodes (in the same manner as icmp, but with a search depth limit of 2), create a dedicated function to estimate the cost of promotion and modify the IRPromoter to enable sign extending some sources. Does any of that sound reasonable?

llvm/lib/CodeGen/TypePromotion.cpp
944

Can these new loops not just be fused into the original? When I originally wrote this, I treated phis the same as icmps, but admittedly didn't find any additional improvements.

Thanks for the suggestion @samparker . I hadn't look too much into it after I found that the ICmp promotion leaves the code in a somewhat 'dirty' state.
For instance if you take the motivating example from Snappy, the BB before the one with the PHI + ZEXT has an icmp that feeds into the relevant PHI node and it ends up yielding:

%tag.0.in10 = phi i32 [ %2, %for.body ], [ %0, %for.body.preheader ]
%1 = trunc i32 %tag.0.in10 to i8
%tag.0 = zext i8 %1 to i64

Which ofc gets cleaned up later, removing the trunc, but it would mean I'd need to account for stuff like that when pattern matching PHI + Ext. I found it was easier to do the transformation I wanted before doing the ICmp transformations. I could alternatively try to avoid generating the trunc, but I haven't looked at why it does this yet and I don't know whether it does so by design...
I realized I didn't actually include a testcase that showcases this... I'll make sure to add one.

@dmgreen Thanks for the comments, I'm looking into those.

I could alternatively try to avoid generating the trunc

Yes, there's a TODO about avoiding the insertion in TypePromotion::isSink

I could alternatively try to avoid generating the trunc

Yes, there's a TODO about avoiding the insertion in TypePromotion::isSink

I see. Am I misreading this or is the TODO saying we don't need isSink? That's a rather odd one ;) I could add some code to return isSink for the user of a PHI node if that PHI node only has a single use. That would work for the ZExt case, as that's recognized in isSink.

Sorry for missing your last comment @avieira , I've uploaded a suggested change in D115451, though I still think something is off with isSink.

adriantong1024 added inline comments.Feb 2 2022, 4:09 PM
llvm/lib/CodeGen/TypePromotion.cpp
140–143

Should we change this in a separate patch, its not the intend of this patch to change how these structs are declared.

951

I agree with @dmgreen here. Once Transform becomes false, it cant become true and we end up return from this function. So should return early.

955

can we do isa<LoadInst> here ? I also see you use getOpcode() in other places. Isn't the standard way isa cast ?

avieira updated this revision to Diff 450734.Aug 8 2022, 1:54 AM
avieira retitled this revision from [TypePromotion] Promote PHI + [SZ]Ext to [TypePromotion] Search from ZExt + PHI.

Hi all,

This is a complete revamp of my original patch, based on @samparker's patch in https://reviews.llvm.org/D118906/new/ with some changes. The most important ones are:

  • we start searching from a ZExt and use it's result type to determine the width to promote the sequence to,
  • we only promote phi node's inside a loop.

Benchmarked SPEC CPU 2017 Intrate on aarch64-none-linux. No significant changes in performance or size. Benchmarked Snappy and with this patch there is no performance regression when the inline assembly hack is removed that was put in place to prevent the generation of the superfluous 'and'.

Also ran llvm-test-suite and geomean stays the same, though I must say that there was a lot of variability from run to run even with the same binaries :(

Herald added a project: Restricted Project. · View Herald TranscriptAug 8 2022, 1:54 AM

I'd be tempted to split this into two and handle the NFC + InstRemove fix in one and the (zext (phi)) in another, just in case the functional change needs reverting. Completely up to you though.

And thanks for doing this!

llvm/lib/CodeGen/TypePromotion.cpp
124

Use a SmallPtrSetImpl, like we do for wrap.

179

We should also mark it as preserved: AU.addPreserved<LoopInfoWrapperPass>

954

We know it's a phi, so use static_cast.

962

nit: indentation looks off here.

avieira updated this revision to Diff 451126.Aug 9 2022, 5:53 AM
avieira marked 8 inline comments as done.

Addressed comments by @samparker. This patch depends on D131487, D131488 and D131489.

samparker accepted this revision.Aug 9 2022, 6:53 AM

LGTM, cheers

This revision is now accepted and ready to land.Aug 9 2022, 6:53 AM
adriantong1024 accepted this revision.Aug 9 2022, 10:41 AM
avieira updated this revision to Diff 451511.Aug 10 2022, 8:58 AM

Had not run this for ARM and there is a testcase there that also benefits from this change moving the zext up to the loads outside of the loop.

This revision was landed with ongoing or failed builds.Aug 11 2022, 1:50 AM
This revision was automatically updated to reflect the committed changes.
samparker added inline comments.Aug 11 2022, 2:01 AM
llvm/test/Transforms/TypePromotion/ARM/casts.ll
53

Update the comment please.

Fixed it in a NFC.

@avieira This patch looks to be breaking the AArch64/SVE buildbots. It's a compile time failure where getFixedSizeInBits() is being called on a scalable vector. Can you please take a look and perhaps revert the patch if it's not a quick fix?

See https://lab.llvm.org/buildbot/#/builders/197/builds/2467. Note 2466 also has failures but I've narrowed down the MultiSource/Applications/sgefa & SingleSource/UnitTests/Vectorizer ones to this patch.

@avieira This patch looks to be breaking the AArch64/SVE buildbots. It's a compile time failure where getFixedSizeInBits() is being called on a scalable vector. Can you please take a look and perhaps revert the patch if it's not a quick fix?

See https://lab.llvm.org/buildbot/#/builders/197/builds/2467. Note 2466 also has failures but I've narrowed down the MultiSource/Applications/sgefa & SingleSource/UnitTests/Vectorizer ones to this patch.

Thanks for reporting this Paul. This is because I don't check the type of the ZExt is an Integer, testing a patch for that now and creating a reduced IR testcase, will post it tomorrow.