
Details

Reviewers

dmgreen
eli.friedman
spatel
lebedev.ri
nikic

Commits

rG7bf168390fd0: [BDCE] SExt -> ZExt when no sign bits is used and instruction has multiple uses

Summary

This patch converts an SExt to a ZExt when we know that none of the extended (sign) bits are used, especially in cases where the value has multiple uses.
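The reasoning behind the summary can be checked with ordinary integer arithmetic: sign extension and zero extension produce the same low bits and differ only in the bits that the extension adds. The following is a minimal sketch in plain C++ rather than LLVM code; the helper names `sext16to32`, `zext16to32` and `extensionsAgree` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Models `sext i16 %a to i32` using only well-defined unsigned arithmetic:
// flipping bit 15 and subtracting 0x8000 replicates the sign bit upwards.
uint32_t sext16to32(uint16_t a) {
  const uint32_t v = a;
  return (v ^ 0x8000u) - 0x8000u;
}

// Models `zext i16 %a to i32`: the upper 16 bits are always zero.
uint32_t zext16to32(uint16_t a) { return static_cast<uint32_t>(a); }

// True if both extensions agree on every bit selected by `demanded`.
// When this holds for all inputs, the users of the extended value
// cannot observe which extension was used.
bool extensionsAgree(uint16_t a, uint32_t demanded) {
  return (sext16to32(a) & demanded) == (zext16to32(a) & demanded);
}
```

For a negative input such as `0x8000`, the two extensions agree under the mask `0x0000FFFF` and disagree as soon as bit 16 is included in the mask.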

Diff Detail

Repository

rL LLVM

Build Status

Buildable 31002
Build 31001: arc lint + arc unit

Event Timeline

dnsampaio created this revision. · Apr 8 2019, 10:23 AM

Herald added a project: Restricted Project. · Apr 8 2019, 10:23 AM

Herald added a subscriber: llvm-commits.

InstCombine knows how to do this, but only when there is a single use of the sext. Handling multiple uses is much more difficult. This would probably fit best in AggressiveInstCombine.

Added transformation for comparing.

Moved to AggressiveInstCombine.

dnsampaio updated this revision to Diff 194704. · Apr 11 2019, 8:59 AM

dmgreen added a subscriber: dmgreen. · Apr 11 2019, 10:12 AM

Moved conversion to the existing folding loop

Harbormaster completed remote builds in B31002: Diff 196670. · Apr 25 2019, 10:12 AM

Rebase

Herald added a subscriber: hiraditya. · Jul 7 2020, 5:25 AM

dnsampaio edited the summary of this revision. · Jul 7 2020, 5:27 AM

dnsampaio added reviewers: dmgreen, eli.friedman.

lebedev.ri retitled this revision from SExt -> ZExt when no sign bits is used with multiple uses to [AggressiveInstCombine] SExt -> ZExt when no sign bits is used with multiple uses. · Jul 7 2020, 5:29 AM

lebedev.ri added reviewers: spatel, lebedev.ri, nikic.

Why doesn't InstCombiner::SimplifyDemandedUseBits() handle this?
I would have expected this to already deal with it.

In D60413#2135831, @lebedev.ri wrote:

Why doesn't InstCombiner::SimplifyDemandedUseBits() handle this?
I would have expected this to already deal with it.

SimplifyDemandedUseBits() can only handle single-use cases, because demanded bits are only computed for a specific use. There is SimplifyMultipleUseDemandedBits(), but it can only simplify a specific use-site, not replace a whole instruction.

This patch has the right general idea, in that DemandedBits is the analysis that can determine this for the multi-use case. However, I'm not very comfortable with performing a full demanded bits calculation (on the whole function) in AggressiveInstCombine, just for this purpose. I think it would be better to repurpose BDCE (which already computes DemandedBits) to be a bit more general and also perform some demanded-bits based folds there, rather than only DCE. (This is similar to how SCCP has recently started replacing sext with zext if possible, even though that is not the primary purpose of that pass.)
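The multi-use point can be illustrated with a small model: each use demands some bits of the extended value, and the instruction as a whole can only be replaced if the union over all uses leaves the sign bits untouched. This is a hedged sketch in plain C++, not the DemandedBits implementation; the function names are hypothetical, and the masks and shift amounts in the usage note are borrowed from the `sext_multi_uses.ll` test attached to this revision.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Operand bits demanded by `and %x, mask`, given which bits of the
// result are demanded: only bits kept by the mask can matter.
uint32_t demandedThroughAnd(uint32_t mask, uint32_t resultDemanded) {
  return mask & resultDemanded;
}

// Operand bits demanded by `lshr %x, amount`: result bit i comes from
// operand bit i + amount, so the demanded set moves up by `amount`.
uint32_t demandedThroughLShr(unsigned amount, uint32_t resultDemanded) {
  assert(amount < 32 && "shift amount must be smaller than the bit width");
  return resultDemanded << amount;
}

// A value with several uses is demanded wherever any single use demands it.
uint32_t demandedByAllUses(const std::vector<uint32_t> &perUse) {
  uint32_t all = 0;
  for (uint32_t d : perUse)
    all |= d;
  return all;
}
```

With the two uses `and %ext, 65280` and `lshr %ext, 8` followed by `and ..., 255`, both uses demand only `0x0000FF00`, so the union stays within the low 16 bits. Changing the shift amount to 9 pulls bit 16 into the union, and that bit is a copy of the sign bit.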

nikic added inline comments. · Jul 7 2020, 8:58 AM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
356 ↗	(On Diff #276016)	It's quite likely that this analysis may get invalidated by some of the performed transforms. AssumptionCache should also be made a pass dependency, not constructed inline. But as mentioned, I would recommend moving this into BDCE instead.

Moved to BDCE.

Harbormaster completed remote builds in B63346: Diff 276291. · Jul 7 2020, 8:10 PM

Thanks, this looks about right to me, but please wait for @nikic.

This revision is now accepted and ready to land. · Jul 8 2020, 1:20 AM

spatel added inline comments. · Jul 8 2020, 5:02 AM

llvm/test/Transforms/AggressiveInstCombine/sext_multi_uses.ll
2 ↗	(On Diff #276291)	Run line should be more like this: `opt -S -bdce < %s`. The test file should be moved to this folder: https://github.com/llvm/llvm-project/tree/master/llvm/test/Transforms/BDCE (please also update the title of this review / commit message).

Fixed test location and command

dnsampaio retitled this revision from [AggressiveInstCombine] SExt -> ZExt when no sign bits is used with multiple uses to [BDCE] SExt -> ZExt when no sign bits is used with multiple uses. · Jul 8 2020, 7:13 AM

dnsampaio retitled this revision from [BDCE] SExt -> ZExt when no sign bits is used with multiple uses to [BDCE] SExt -> ZExt when no sign bits is used and instruction has multiple uses.

Harbormaster completed remote builds in B63414: Diff 276423. · Jul 8 2020, 8:10 AM

nikic requested changes to this revision. · Jul 8 2020, 9:25 AM

nikic added inline comments.

llvm/lib/Transforms/Scalar/BDCE.cpp
127 ↗	(On Diff #276423)	You probably need to `clearAssumptionsOfUsers()` here. Please check this test case: https://alive2.llvm.org/ce/z/caMis2
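One way to see why flags on users may need clearing is sketched below. This is an illustrative, made-up case and is not claimed to be the one behind the Alive2 link: a user `add nsw i32 %ext, 2147450880` whose result is only read through its low 16 bits. The low bits of the sum do not depend on the kind of extension, but the `nsw` guarantee does. The constant and the helper names are assumptions chosen for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical addend: INT32_MAX - 32767, the largest constant that can
// be added to any sign-extended i16 without signed overflow in i32.
constexpr int64_t kAddend = 2147450880;

int64_t sext16(uint16_t a) {
  return a >= 0x8000 ? static_cast<int64_t>(a) - 0x10000
                     : static_cast<int64_t>(a);
}
int64_t zext16(uint16_t a) { return static_cast<int64_t>(a); }

// True if the exact sum fits in i32, i.e. the `nsw` flag would hold.
bool addStaysInI32(int64_t ext) {
  const int64_t sum = ext + kAddend;
  return sum >= INT32_MIN && sum <= INT32_MAX;
}

// The low 16 bits of the sum: the only bits the later users demand.
uint16_t lowHalfOfSum(int64_t ext) {
  return static_cast<uint16_t>(static_cast<uint64_t>(ext + kAddend));
}
```

For the input `0x8000` the low half of the sum is the same with either extension, yet the sum only stays within i32 for the sign-extended operand. After the sext is replaced by a zext, a stale `nsw` on such a user would therefore turn a defined result into poison.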

This revision now requires changes to proceed. · Jul 8 2020, 9:25 AM

Clear users assumptions (and test it)

dnsampaio added inline comments. · Jul 9 2020, 2:40 AM

llvm/lib/Transforms/Scalar/BDCE.cpp
127 ↗	(On Diff #276423)	Indeed, many thanks.

Harbormaster completed remote builds in B63557: Diff 276674. · Jul 9 2020, 3:09 AM

In D60413#2139023, @dnsampaio wrote:

Fixed test location and command

The file is not moved here in the review, so something may be out-of-sync. The best thing would be to commit that file first with the current CHECK lines, then update it after applying this code patch. That way, we will highlight the test diffs.

In D60413#2141645, @spatel wrote:

In D60413#2139023, @dnsampaio wrote:

Fixed test location and command

The file is not moved here in the review, so something may be out-of-sync. The best thing would be to commit that file first with the current CHECK lines, then update it after applying this code patch. That way, we will highlight the test diffs.

@spatel Oops, sorry about that. I have two working environments now that I'm working from home, and one of them was not fixed. Will do as you recommend.

Re-fixed test file, now showing only differences

Harbormaster completed remote builds in B63590: Diff 276741. · Jul 9 2020, 8:56 AM

llvm/lib/Transforms/Scalar/BDCE.cpp
125 ↗	(On Diff #276741)	Please pass `SE->getName()` here to preserve the instruction name.

This revision is now accepted and ready to land. · Jul 9 2020, 9:46 AM

dnsampaio marked an inline comment as done. · Jul 10 2020, 12:01 AM

Preserve name

Closed by commit rG7bf168390fd0: [BDCE] SExt -> ZExt when no sign bits is used and instruction has multiple uses (authored by Diogo Sampaio <diogo.sampaio@arm.com>). · Jul 10 2020, 12:35 AM

This revision was automatically updated to reflect the committed changes.

Harbormaster completed remote builds in B63697: Diff 276926. · Jul 10 2020, 12:38 AM

Having some problems with this patch downstream:

1. The patch does not update any dbg.value that might refer to the value that is now zext:ed instead of sext:ed, so the debugging experience might become weird. It might be possible to rewrite debug uses, using some nifty debug expression to express a trunc+sext of the value. But in my case it was a <128 x i40> vector, and we have no DWARF expressions to express a vector sext afaik. Instead I guess any debug uses should be invalidated.

2. My target really prefers sext over zext when it comes to extending from <128 x i32> to <128 x i40>. Well, at the moment the zext isn't even legal, so now I get compilation failures. It is pretty hard to redo the (global) demanded bits analysis during ISel to perform the reverse transform. I'm not really sure how such problems are dealt with in general. For my use case it would have been much better to hoist the trunc that implies that the upper bits are irrelevant, thereby avoiding a <128 x i40> vector with irrelevant upper bits altogether.

At least (1) is something that I think we need to do something about.
When it comes to (2) I'd welcome any suggestions on how I could avoid this transform for my target (or ideas about how to add a reverse transform closer to codegen).

In D60413#2223866, @bjope wrote:

Having some problems with this patch downstream:

1. The patch does not update any dbg.value that might refer to the value that is now zext:ed instead of sext:ed, so the debugging experience might become weird. It might be possible to rewrite debug uses, using some nifty debug expression to express a trunc+sext of the value. But in my case it was a <128 x i40> vector, and we have no DWARF expressions to express a vector sext afaik. Instead I guess any debug uses should be invalidated.

Isn't that an issue for any of the other places that do such a transform, among other things?
This very transform also exists in InstCombine and CVP.

2. My target really prefers sext over zext when it comes to extending from <128 x i32> to <128 x i40>. Well, at the moment the zext isn't even legal, so now I get compilation failures. It is pretty hard to redo the (global) demanded bits analysis during ISel to perform the reverse transform. I'm not really sure how such problems are dealt with in general. For my use case it would have been much better to hoist the trunc that implies that the upper bits are irrelevant, thereby avoiding a <128 x i40> vector with irrelevant upper bits altogether.

You can always just do sext and then mask unneeded high bits away.

At least (1) is something that I think we need to do something about.
When it comes to (2) I'd welcome any suggestions on how I could avoid this transform for my target (or ideas about how to add a reverse transform closer to codegen).

In D60413#2223875, @lebedev.ri wrote:

In D60413#2223866, @bjope wrote:

Having some problems with this patch downstream:

1. The patch does not update any dbg.value that might refer to the value that is now zext:ed instead of sext:ed, so the debugging experience might become weird. It might be possible to rewrite debug uses, using some nifty debug expression to express a trunc+sext of the value. But in my case it was a <128 x i40> vector, and we have no DWARF expressions to express a vector sext afaik. Instead I guess any debug uses should be invalidated.

Isn't that an issue for any of the other places that do such a transform, among other things?
This very transform also exists in InstCombine and CVP.

That might be a bigger problem, of course. But if a transform doing RAUW isn't replacing with the same thing, then I guess it needs to salvage/undef the debug uses and only replace the non-debug uses. Otherwise I think the debug values will be incorrect.

2. My target really prefers sext over zext when it comes to extending from <128 x i32> to <128 x i40>. Well, at the moment the zext isn't even legal, so now I get compilation failures. It is pretty hard to redo the (global) demanded bits analysis during ISel to perform the reverse transform. I'm not really sure how such problems are dealt with in general. For my use case it would have been much better to hoist the trunc that implies that the upper bits are irrelevant, thereby avoiding a <128 x i40> vector with irrelevant upper bits altogether.

You can always just do sext and then mask unneeded high bits away.

At least (1) is something that I think we need to do something about.
When it comes to (2) I'd welcome any suggestions on how I could avoid this transform for my target (or ideas about how to add a reverse transform closer to codegen).

Oh, I wish I had a vector-AND. I can probably implement the zext using SHL+LSHR, but that is two operations instead of one (SEXT+AND would also be two ops instead of one single SEXT). The target also supports sextload but not zextload, so I might miss out on that as well. In general the target has better support for sext than zext (also in some scalar cases). Sext is often free, while zext isn't. Maybe I need to teach CodeGenPrepare to turn zext into sext if the upper bits are irrelevant, undoing all these transforms that change sext into zext.
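The two lowerings discussed here, SEXT+AND and SHL+LSHR, can be sketched on scalars. This is only an illustration in plain C++: it uses an i16 to i32 extension as a stand-in for the <128 x i32> to <128 x i40> vector case, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Models `sext i16 %a to i32` with well-defined unsigned arithmetic.
uint32_t sext16to32(uint16_t a) {
  const uint32_t v = a;
  return (v ^ 0x8000u) - 0x8000u;
}

// Zero extension expressed as SEXT followed by AND: the mask clears
// the copies of the sign bit that the sign extension introduced.
uint32_t zextViaSextAnd(uint16_t a) { return sext16to32(a) & 0x0000FFFFu; }

// Zero extension expressed as SHL followed by LSHR on a wide value
// whose upper 16 bits are arbitrary: the left shift discards them and
// the logical right shift refills them with zeros.
uint32_t zextViaShifts(uint32_t wide) { return (wide << 16) >> 16; }
```

Both variants cost two operations, which matches the concern raised above for a target where a single sext is cheap or free.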

One minor problem with implementing a reverse transform for targets that prefer sext over zext, e.g. in CodeGenPrepare, is that it will be hard to restore the debug info (assuming that we do the right thing here and invalidate metadata uses).

Maybe we want to have anyext already in IR (to allow targets to use either zext or sext)?

I'm away on vacation and only have my phone here. I'm surprised I forgot to
preserve the debug information, sorry about that.

Regarding use of anyext on IR, I don't know how well that would play with
all other optimizations.

The sext to zext for example already exists in instruction combine, but
just for those with a single use. Perhaps these transformations could just
check the backend if it prefers sext or zext and decide to act based on
that?

I've submitted a PR, https://bugs.llvm.org/show_bug.cgi?id=47296, regarding the debuginfo problem.

For now I have simply hardcoded a condition to avoid the transform downstream for our target (when vectors are involved), as a workaround for the problem of not having any simple way to lower a vector zext.

Diff 196670

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp

[... 12 lines not shown ...]
 //===----------------------------------------------------------------------===//

 #include "llvm/Transforms/AggressiveInstCombine/AggressiveInstCombine.h"
 #include "AggressiveInstCombineInternal.h"
 #include "llvm-c/Initialization.h"
 #include "llvm-c/Transforms/AggressiveInstCombine.h"
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Analysis/BasicAliasAnalysis.h"
+#include "llvm/Analysis/DemandedBits.h"
 #include "llvm/Analysis/GlobalsModRef.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/LegacyPassManager.h"
 #include "llvm/IR/PatternMatch.h"
 #include "llvm/Pass.h"
[... 217 lines not shown ...] static bool foldAnyOrAllBitsSet(Instruction &I) {
   Value *And = Builder.CreateAnd(MOps.Root, Mask);
   Value *Cmp = MatchAllBitsSet ? Builder.CreateICmpEQ(And, Mask)
                                : Builder.CreateIsNotNull(And);
   Value *Zext = Builder.CreateZExt(Cmp, I.getType());
   I.replaceAllUsesWith(Zext);
   return true;
 }

+// This pass uses demanded bits to identify SExtInst that
+// can be converted to ZExtInst, as no sign bits are used.
+
+static bool SExtToZExt(Instruction &I, DemandedBits &DB) {
+  SExtInst *SE = dyn_cast<SExtInst>(&I);
+  if (!SE)
+    return false;
+
+  APInt Demanded = DB.getDemandedBits(&I);
+  const uint32_t SrcBitSize = SE->getSrcTy()->getScalarSizeInBits();
+  const auto DstTy = SE->getDestTy();
+  const uint32_t DestBitSize = DstTy->getScalarSizeInBits();
+  if (Demanded.countLeadingZeros() >= (DestBitSize - SrcBitSize)) {
+    IRBuilder<> Builder(&I);
+    I.replaceAllUsesWith(Builder.CreateZExt(SE->getOperand(0), DstTy));
+    return true;
+  }
+  return false;
+}
+
 /// This is the entry point for folds that could be implemented in regular
 /// InstCombine, but they are separated because they are not expected to
 /// occur frequently and/or have more than a constant-length pattern match.
 static bool foldUnusualPatterns(Function &F, DominatorTree &DT) {
   bool MadeChange = false;
+  AssumptionCache AC(F);
+  DemandedBits DB(F, AC, DT);

   for (BasicBlock &BB : F) {
     // Ignore unreachable basic blocks.
     if (!DT.isReachableFromEntry(&BB))
       continue;
     // Do not delete instructions under here and invalidate the iterator.
     // Walk the block backwards for efficiency. We're matching a chain of
     // use->defs, so we're more likely to succeed by starting from the bottom.
     // Also, we want to avoid matching partial patterns.
     // TODO: It would be more efficient if we removed dead instructions
     // iteratively in this loop rather than waiting until the end.
     for (Instruction &I : make_range(BB.rbegin(), BB.rend())) {
       MadeChange |= foldAnyOrAllBitsSet(I);
       MadeChange |= foldGuardedRotateToFunnelShift(I);
+      MadeChange |= SExtToZExt(I, DB);
     }
   }

   // We're done with transforms, so remove dead instructions.
   if (MadeChange)
     for (BasicBlock &BB : F)
       SimplifyInstructionsInBlock(&BB);

[... 72 lines not shown ...]

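The test `Demanded.countLeadingZeros() >= (DestBitSize - SrcBitSize)` in `SExtToZExt` asks whether every bit added by the extension lies above the highest demanded bit. Below is a rough model of that check in plain C++, assuming a fixed 32-bit destination width; `countLeadingZeros32` and `canUseZExt` are illustrative names and do not come from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Number of zero bits above the highest set bit of a 32-bit mask
// (32 for an all-zero mask).
unsigned countLeadingZeros32(uint32_t v) {
  unsigned n = 0;
  for (uint32_t bit = 0x80000000u; bit != 0 && (v & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// Models the condition for an extension from `srcBits` to 32 bits:
// the top (32 - srcBits) bits are the ones filled with the sign bit,
// so the sext is replaceable when none of them is demanded.
bool canUseZExt(uint32_t demanded, unsigned srcBits) {
  assert(srcBits > 0 && srcBits < 32 && "source must be narrower than i32");
  return countLeadingZeros32(demanded) >= 32 - srcBits;
}
```

A demanded mask of `0x0000FF00` passes for a 16-bit source, while `0x0001FF00` and the constant 85280 (`0x14D20`) do not, because each of them includes bit 16.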
test/Transforms/AggressiveInstCombine/sext_multi_uses.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt -o - -aggressive-instcombine -dce -S %s | FileCheck %s

define i32 @ZEXT_0(i16 %a) {
; CHECK-LABEL: @ZEXT_0(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[A:%.*]] to i32
; CHECK-NEXT: [[AND:%.*]] = and i32 [[TMP0]], 65280
; CHECK-NEXT: [[LSR:%.*]] = lshr i32 [[TMP0]], 8
; CHECK-NEXT: [[AND2:%.*]] = and i32 [[LSR]], 255
; CHECK-NEXT: [[OR:%.*]] = or i32 [[AND]], [[AND2]]
; CHECK-NEXT: ret i32 [[OR]]
;
entry:
  %ext = sext i16 %a to i32
  %and = and i32 %ext, 65280
  %lsr = lshr i32 %ext, 8
  %and2 = and i32 %lsr, 255
  %or = or i32 %and, %and2
  ret i32 %or
}

define i32 @ZEXT_1(i16 %a) {
; CHECK-LABEL: @ZEXT_1(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = zext i16 [[A:%.*]] to i32
; CHECK-NEXT: [[LSR:%.*]] = lshr i32 [[TMP0]], 8
; CHECK-NEXT: [[AND2:%.*]] = and i32 [[LSR]], 255
; CHECK-NEXT: [[AND:%.*]] = or i32 [[TMP0]], -65536
; CHECK-NEXT: [[OR:%.*]] = or i32 [[AND]], [[AND2]]
; CHECK-NEXT: ret i32 [[OR]]
;
entry:
  %ext = sext i16 %a to i32
  %lsr = lshr i32 %ext, 8
  %and2 = and i32 %lsr, 255
  %and = or i32 %ext, 4294901760
  %or = or i32 %and, %and2
  ret i32 %or
}

define i16 @NOT_ZEXT_0(i16 %a) {
; CHECK-LABEL: @NOT_ZEXT_0(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[EXT:%.*]] = sext i16 [[A:%.*]] to i32
; CHECK-NEXT: [[AND:%.*]] = and i32 [[EXT]], 65280
; CHECK-NEXT: [[LSR:%.*]] = lshr i32 [[EXT]], 9
; CHECK-NEXT: [[AND2:%.*]] = and i32 [[LSR]], 255
; CHECK-NEXT: [[OR:%.*]] = or i32 [[AND]], [[AND2]]
; CHECK-NEXT: [[RET:%.*]] = trunc i32 [[OR]] to i16
; CHECK-NEXT: ret i16 [[RET]]
;
entry:
  %ext = sext i16 %a to i32
  %and = and i32 %ext, 65280
  %lsr = lshr i32 %ext, 9
  %and2 = and i32 %lsr, 255
  %or = or i32 %and, %and2
  %ret = trunc i32 %or to i16
  ret i16 %ret
}

define i32 @NOT_ZEXT_1(i16 %a) {
; CHECK-LABEL: @NOT_ZEXT_1(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[EXT:%.*]] = sext i16 [[A:%.*]] to i32
; CHECK-NEXT: [[AND:%.*]] = and i32 [[EXT]], 85280
; CHECK-NEXT: [[LSR:%.*]] = lshr i32 [[EXT]], 8
; CHECK-NEXT: [[AND2:%.*]] = and i32 [[LSR]], 255
; CHECK-NEXT: [[OR:%.*]] = or i32 [[AND]], [[AND2]]
; CHECK-NEXT: ret i32 [[OR]]
;
entry:
  %ext = sext i16 %a to i32
  %and = and i32 %ext, 85280
  %lsr = lshr i32 %ext, 8
  %and2 = and i32 %lsr, 255
  %or = or i32 %and, %and2
  ret i32 %or
}

define i32 @NOT_ZEXT_2(i16 %a) {
; CHECK-LABEL: @NOT_ZEXT_2(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[EXT:%.*]] = sext i16 [[A:%.*]] to i32
; CHECK-NEXT: [[LSR:%.*]] = lshr i32 [[EXT]], 8
; CHECK-NEXT: [[AND2:%.*]] = and i32 [[LSR]], 255
; CHECK-NEXT: [[AND:%.*]] = xor i32 [[EXT]], -65536
; CHECK-NEXT: [[OR:%.*]] = or i32 [[AND]], [[AND2]]
; CHECK-NEXT: ret i32 [[OR]]
;
entry:
  %ext = sext i16 %a to i32
  %lsr = lshr i32 %ext, 8
  %and2 = and i32 %lsr, 255
  %and = xor i32 %ext, 4294901760
  %or = or i32 %and, %and2
  ret i32 %or
}

This is an archive of the discontinued LLVM Phabricator instance.

[BDCE] SExt -> ZExt when no sign bits is used and instruction has multiple uses
ClosedPublic
