This is an archive of the discontinued LLVM Phabricator instance.

[BDCE] SExt -> ZExt when no sign bits is used and instruction has multiple uses
ClosedPublic

Authored by dnsampaio on Apr 8 2019, 10:23 AM.

Details

Summary

This allows to convert any SExt to a ZExt when we know none of the extended bits are used, specially in cases where there are multiple uses of the value.

Diff Detail

Event Timeline

dnsampaio created this revision.Apr 8 2019, 10:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 8 2019, 10:23 AM

InstCombine knows how to do this, but only when there is a single use of the sext. Multiple uses is much more difficult to do. This would probably fit best in AggressiveInstCombine.

dnsampaio updated this revision to Diff 194166.Apr 8 2019, 10:29 AM

Added transformation for comparing.

dnsampaio updated this revision to Diff 194703.Apr 11 2019, 8:58 AM
dnsampaio retitled this revision from SExt -> Zext when no sign bits is used to SExt -> ZExt when no sign bits is used with multiple uses.
dnsampaio edited the summary of this revision. (Show Details)

Moved to agressiveInstCombine.

dnsampaio updated this revision to Diff 194704.Apr 11 2019, 8:59 AM

Moved conversion to the existing folding loop

dnsampaio edited the summary of this revision. (Show Details)Jul 7 2020, 5:27 AM
dnsampaio added reviewers: dmgreen, eli.friedman.
lebedev.ri retitled this revision from SExt -> ZExt when no sign bits is used with multiple uses to [AggressiveInstCombine] SExt -> ZExt when no sign bits is used with multiple uses.Jul 7 2020, 5:29 AM
lebedev.ri added reviewers: spatel, lebedev.ri, nikic.

Why doesn't InstCombiner::SimplifyDemandedUseBits() handle this?
I would have expected this to already deal with it.

nikic added a comment.Jul 7 2020, 8:56 AM

Why doesn't InstCombiner::SimplifyDemandedUseBits() handle this?
I would have expected this to already deal with it.

SimplifyDemandedUseBits() can only handle single-use cases, because demanded bits are only computed for a specific use. There is SimplifyMultipleUseDemandedBits(), but it can only simplify a specific use-site, not replace a whole instruction.

This patch has the right general idea, in that DemandedBits is the analysis that can determine this for the multi-use case. However, I'm not very comfortable with performing a full demanded bits calculation (on the whole function) in AggressiveInstCombine, just for this purpose. I think it would be better to repurpose BDCE (which already computes DemandedBits) to be a bit more general and also perform some demanded-bits based folds there, rather than only DCE. (This is similar to how SCCP has recently started replacing sext with zext if possible, even though that is not the primary purpose of that pass.)

nikic added inline comments.Jul 7 2020, 8:58 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
356 ↗(On Diff #276016)

It's quite likely that this analysis may get invalidated by some of the performed transforms. AssumptionCache should also be made a pass dependency, not constructed inline.

But as mentioned, I would recommend moving this into BDCE instead.

dnsampaio updated this revision to Diff 276291.Jul 7 2020, 6:00 PM

Moved to BDCE.

lebedev.ri accepted this revision.Jul 8 2020, 1:20 AM

Thanks, this looks about right to me, but please wait for @nikic.

This revision is now accepted and ready to land.Jul 8 2020, 1:20 AM
spatel added inline comments.Jul 8 2020, 5:02 AM
llvm/test/Transforms/AggressiveInstCombine/sext_multi_uses.ll
3

Run line should be more like this:
opt -S -bdce < %s

The test file should be moved to this folder:
https://github.com/llvm/llvm-project/tree/master/llvm/test/Transforms/BDCE

(please also update the title of this review / commit message)

dnsampaio updated this revision to Diff 276423.Jul 8 2020, 7:13 AM

Fixed test location and command

dnsampaio retitled this revision from [AggressiveInstCombine] SExt -> ZExt when no sign bits is used with multiple uses to [BDCE] SExt -> ZExt when no sign bits is used with multiple uses.Jul 8 2020, 7:13 AM
dnsampaio retitled this revision from [BDCE] SExt -> ZExt when no sign bits is used with multiple uses to [BDCE] SExt -> ZExt when no sign bits is used and instruction has multiple uses.
nikic requested changes to this revision.Jul 8 2020, 9:25 AM
nikic added inline comments.
llvm/lib/Transforms/Scalar/BDCE.cpp
127

You probably need to clearAssumptionsOfUsers() here. Please check this test case: https://alive2.llvm.org/ce/z/caMis2

This revision now requires changes to proceed.Jul 8 2020, 9:25 AM
dnsampaio updated this revision to Diff 276674.Jul 9 2020, 2:34 AM
dnsampaio marked 3 inline comments as done.

Clear users assumptions (and test it)

dnsampaio added inline comments.Jul 9 2020, 2:40 AM
llvm/lib/Transforms/Scalar/BDCE.cpp
127

Indeed, many thanks.

spatel added a comment.Jul 9 2020, 6:44 AM

Fixed test location and command

The file is not moved here in the review, so something may be out-of-sync. The best thing would be to commit that file first with the current CHECK lines, then update it after applying this code patch. That way, we will highlight the test diffs.

Fixed test location and command

The file is not moved here in the review, so something may be out-of-sync. The best thing would be to commit that file first with the current CHECK lines, then update it after applying this code patch. That way, we will highlight the test diffs.

@spatel Ups, sorry about that, I got two working environments now wfh, one was not fixed. Will do as you recommend.

dnsampaio updated this revision to Diff 276741.Jul 9 2020, 8:08 AM

Re-fixed test file, now showing only differences

nikic accepted this revision.Jul 9 2020, 9:46 AM

LG

llvm/lib/Transforms/Scalar/BDCE.cpp
125

Please pass SE->getName() here to preserve the instruction name.

This revision is now accepted and ready to land.Jul 9 2020, 9:46 AM
dnsampaio marked an inline comment as done.Jul 10 2020, 12:01 AM

Preserve name

This revision was automatically updated to reflect the committed changes.
bjope added a subscriber: bjope.Aug 18 2020, 8:42 AM

Having some problems with this patch downstream:

  1. Patch does not update any dbg.value that might refer to the value that now is zext:ed instead of sext:ed. So debugging experience might become weird. It might be possible to rewrite debug uses, using some nifty debug expression to express a trunc+sext of the value. But in my case it was an <128 x i40> vector, and we got no dwarf expressions to express a vector sext afaik. Instead I guess any debug uses should be invalidated.
  1. My target really prefers sext before zext when it comes to extending from <128 x i32> to <128 x i40>. Well, at the moment the zext isn't even legal. So now I get compilation failures. Pretty hard to redo the (global) demanded bits analysis during ISel to perform the reverse transform. Not sure really how such problems are dealt with in general. For my use case it had been much better to hoist the trunc that implies that the upper bits is irrelevant, and that way avoiding an <128 x i40> vector with irrelevant upper bits altogether.

At least (1) is something that I think we need to do something about.
When it comes to (2) I'd welcome any suggestions on how I could avoid this transform for my target (or ideas about how to add a reverse transform closer to codegen).

lebedev.ri added a subscriber: vsk.Aug 18 2020, 8:50 AM

Having some problems with this patch downstream:

  1. Patch does not update any dbg.value that might refer to the value that now is zext:ed instead of sext:ed. So debugging experience might become weird. It might be possible to rewrite debug uses, using some nifty debug expression to express a trunc+sext of the value. But in my case it was an <128 x i40> vector, and we got no dwarf expressions to express a vector sext afaik. Instead I guess any debug uses should be invalidated.

Isn't that an issue for any of the other places that do such a transform, among other things?
This very transform also exists in InstCombine and CVP.

  1. My target really prefers sext before zext when it comes to extending from <128 x i32> to <128 x i40>. Well, at the moment the zext isn't even legal. So now I get compilation failures. Pretty hard to redo the (global) demanded bits analysis during ISel to perform the reverse transform. Not sure really how such problems are dealt with in general. For my use case it had been much better to hoist the trunc that implies that the upper bits is irrelevant, and that way avoiding an <128 x i40> vector with irrelevant upper bits altogether.

You can always just do sext and then mask unneeded high bits away.

At least (1) is something that I think we need to do something about.
When it comes to (2) I'd welcome any suggestions on how I could avoid this transform for my target (or ideas about how to add a reverse transform closer to codegen).

bjope added a comment.Aug 18 2020, 9:12 AM

Having some problems with this patch downstream:

  1. Patch does not update any dbg.value that might refer to the value that now is zext:ed instead of sext:ed. So debugging experience might become weird. It might be possible to rewrite debug uses, using some nifty debug expression to express a trunc+sext of the value. But in my case it was an <128 x i40> vector, and we got no dwarf expressions to express a vector sext afaik. Instead I guess any debug uses should be invalidated.

Isn't that an issue for any of the other places that do such a transform, among other things?
This very transform also exists in InstCombine and CVP.

That might be a bigger problem ofcourse. But if a transform doing RAUW isn't replacing with the same thing, then I guess it need to salvage/undef the debug uses and only replace the non-debug uses. Otherwise I think the debug values will be incorrect.

  1. My target really prefers sext before zext when it comes to extending from <128 x i32> to <128 x i40>. Well, at the moment the zext isn't even legal. So now I get compilation failures. Pretty hard to redo the (global) demanded bits analysis during ISel to perform the reverse transform. Not sure really how such problems are dealt with in general. For my use case it had been much better to hoist the trunc that implies that the upper bits is irrelevant, and that way avoiding an <128 x i40> vector with irrelevant upper bits altogether.

You can always just do sext and then mask unneeded high bits away.

At least (1) is something that I think we need to do something about.
When it comes to (2) I'd welcome any suggestions on how I could avoid this transform for my target (or ideas about how to add a reverse transform closer to codegen).

Oh, I wish I had a vector-AND. I can probably implement the zext using SHL+LSHR, but that is two operations instead of one (SEXT+AND would also be two ops instead of one single SEXT). The target also support sextload but not zextload, so I might miss out on that as well. In general the target got better support for sext than zext (also in some scalar cases). Sext is often for free, while zext isn't. Maybe I need to teach CodegenPrepare to turn zext into sext if the upper bits are irrelevant. Undoing all these transforms changing zext into sext.

bjope added a comment.Aug 19 2020, 1:49 AM

One minor problem with implementing a reverse transform for targets that prefer sext over zext, e.g. in CodeGenPrepare, is that it will be hard to restore the debug info (considering that we do the right thing here an invalidate metadata uses).

Maybe we want to have anyext already in IR (to allow targets to use either zext or sext)?

I'm out in vacations, just have my phone here. I'm surprised I forgot to
preserve de debug information, sorry about that.

Regarding use of anyext on IR, I don't know how well that would play with
all other optimizations.

The sext to zext for example already exists in instruction combine, but
just for those with a single use. Perhaps these transformations could just
check the backend if it prefers sext or zext and decide to act based on
that?

bjope added a comment.Aug 24 2020, 6:08 AM

I've submitted a PR, https://bugs.llvm.org/show_bug.cgi?id=47296, regarding the debuginfo problem.

For know I simply hardcoded a condition to avoid the transform downstream for out target (when vectors are involved), as a workaround to the problem with now having any simple way to lower a vector zext.