This allows to convert any SExt to a ZExt when we know none of the extended bits are used, specially in cases where there are multiple uses of the value.
Details
Diff Detail
- Repository
- rL LLVM
- Build Status
Buildable 31002 Build 31001: arc lint + arc unit
Event Timeline
InstCombine knows how to do this, but only when there is a single use of the sext. Multiple uses is much more difficult to do. This would probably fit best in AggressiveInstCombine.
Why doesn't InstCombiner::SimplifyDemandedUseBits() handle this?
I would have expected this to already deal with it.
SimplifyDemandedUseBits() can only handle single-use cases, because demanded bits are only computed for a specific use. There is SimplifyMultipleUseDemandedBits(), but it can only simplify a specific use-site, not replace a whole instruction.
This patch has the right general idea, in that DemandedBits is the analysis that can determine this for the multi-use case. However, I'm not very comfortable with performing a full demanded bits calculation (on the whole function) in AggressiveInstCombine, just for this purpose. I think it would be better to repurpose BDCE (which already computes DemandedBits) to be a bit more general and also perform some demanded-bits based folds there, rather than only DCE. (This is similar to how SCCP has recently started replacing sext with zext if possible, even though that is not the primary purpose of that pass.)
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp | ||
---|---|---|
356 ↗ | (On Diff #276016) | It's quite likely that this analysis may get invalidated by some of the performed transforms. AssumptionCache should also be made a pass dependency, not constructed inline. But as mentioned, I would recommend moving this into BDCE instead. |
llvm/test/Transforms/AggressiveInstCombine/sext_multi_uses.ll | ||
---|---|---|
2 ↗ | (On Diff #276291) | Run line should be more like this: The test file should be moved to this folder: (please also update the title of this review / commit message) |
llvm/lib/Transforms/Scalar/BDCE.cpp | ||
---|---|---|
127 ↗ | (On Diff #276423) | You probably need to clearAssumptionsOfUsers() here. Please check this test case: https://alive2.llvm.org/ce/z/caMis2 |
llvm/lib/Transforms/Scalar/BDCE.cpp | ||
---|---|---|
127 ↗ | (On Diff #276423) | Indeed, many thanks. |
The file is not moved here in the review, so something may be out-of-sync. The best thing would be to commit that file first with the current CHECK lines, then update it after applying this code patch. That way, we will highlight the test diffs.
@spatel Ups, sorry about that, I got two working environments now wfh, one was not fixed. Will do as you recommend.
LG
llvm/lib/Transforms/Scalar/BDCE.cpp | ||
---|---|---|
125 ↗ | (On Diff #276741) | Please pass SE->getName() here to preserve the instruction name. |
Having some problems with this patch downstream:
- Patch does not update any dbg.value that might refer to the value that now is zext:ed instead of sext:ed. So debugging experience might become weird. It might be possible to rewrite debug uses, using some nifty debug expression to express a trunc+sext of the value. But in my case it was an <128 x i40> vector, and we got no dwarf expressions to express a vector sext afaik. Instead I guess any debug uses should be invalidated.
- My target really prefers sext before zext when it comes to extending from <128 x i32> to <128 x i40>. Well, at the moment the zext isn't even legal. So now I get compilation failures. Pretty hard to redo the (global) demanded bits analysis during ISel to perform the reverse transform. Not sure really how such problems are dealt with in general. For my use case it had been much better to hoist the trunc that implies that the upper bits is irrelevant, and that way avoiding an <128 x i40> vector with irrelevant upper bits altogether.
At least (1) is something that I think we need to do something about.
When it comes to (2) I'd welcome any suggestions on how I could avoid this transform for my target (or ideas about how to add a reverse transform closer to codegen).
Isn't that an issue for any of the other places that do such a transform, among other things?
This very transform also exists in InstCombine and CVP.
- My target really prefers sext before zext when it comes to extending from <128 x i32> to <128 x i40>. Well, at the moment the zext isn't even legal. So now I get compilation failures. Pretty hard to redo the (global) demanded bits analysis during ISel to perform the reverse transform. Not sure really how such problems are dealt with in general. For my use case it had been much better to hoist the trunc that implies that the upper bits is irrelevant, and that way avoiding an <128 x i40> vector with irrelevant upper bits altogether.
You can always just do sext and then mask unneeded high bits away.
At least (1) is something that I think we need to do something about.
When it comes to (2) I'd welcome any suggestions on how I could avoid this transform for my target (or ideas about how to add a reverse transform closer to codegen).
That might be a bigger problem ofcourse. But if a transform doing RAUW isn't replacing with the same thing, then I guess it need to salvage/undef the debug uses and only replace the non-debug uses. Otherwise I think the debug values will be incorrect.
- My target really prefers sext before zext when it comes to extending from <128 x i32> to <128 x i40>. Well, at the moment the zext isn't even legal. So now I get compilation failures. Pretty hard to redo the (global) demanded bits analysis during ISel to perform the reverse transform. Not sure really how such problems are dealt with in general. For my use case it had been much better to hoist the trunc that implies that the upper bits is irrelevant, and that way avoiding an <128 x i40> vector with irrelevant upper bits altogether.
You can always just do sext and then mask unneeded high bits away.
At least (1) is something that I think we need to do something about.
When it comes to (2) I'd welcome any suggestions on how I could avoid this transform for my target (or ideas about how to add a reverse transform closer to codegen).
Oh, I wish I had a vector-AND. I can probably implement the zext using SHL+LSHR, but that is two operations instead of one (SEXT+AND would also be two ops instead of one single SEXT). The target also support sextload but not zextload, so I might miss out on that as well. In general the target got better support for sext than zext (also in some scalar cases). Sext is often for free, while zext isn't. Maybe I need to teach CodegenPrepare to turn zext into sext if the upper bits are irrelevant. Undoing all these transforms changing zext into sext.
One minor problem with implementing a reverse transform for targets that prefer sext over zext, e.g. in CodeGenPrepare, is that it will be hard to restore the debug info (considering that we do the right thing here an invalidate metadata uses).
Maybe we want to have anyext already in IR (to allow targets to use either zext or sext)?
I'm out in vacations, just have my phone here. I'm surprised I forgot to
preserve de debug information, sorry about that.
Regarding use of anyext on IR, I don't know how well that would play with
all other optimizations.
The sext to zext for example already exists in instruction combine, but
just for those with a single use. Perhaps these transformations could just
check the backend if it prefers sext or zext and decide to act based on
that?
I've submitted a PR, https://bugs.llvm.org/show_bug.cgi?id=47296, regarding the debuginfo problem.
For know I simply hardcoded a condition to avoid the transform downstream for out target (when vectors are involved), as a workaround to the problem with now having any simple way to lower a vector zext.