This is an archive of the discontinued LLVM Phabricator instance.

Fix for PR20059 (instcombine reorders shufflevector after instruction that may trap)
ClosedPublic

Authored by spatel on Jul 8 2014, 12:25 PM.


Details

Reviewers

aschwaighofer
hfinkel

Commits

rG58814445d4a4: Fix for PR20059 (instcombine reorders shufflevector after instruction that may…
rL212629: Fix for PR20059 (instcombine reorders shufflevector after instruction that may…

Summary

In PR20059 ( http://llvm.org/pr20059 ), instcombine eliminates shuffles that are necessary before performing an operation that can trap (srem).
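Why the reordering is unsafe can be shown with a small C++ model of the splat-of-lane-0 pattern from the pr20059.ll test. This is a minimal sketch under stated assumptions. The original IR computes srem(splat(%p1), splat(%p2)), which only ever divides by lane 0 of %p2. The reordered form, splat(srem(%p1, %p2)), divides by every lane of %p2, including lanes the program never used. V4, splatLane0, sremChecked and the other names are hypothetical stand-ins for illustration, not LLVM APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical toy model of a <4 x i32> value (not an LLVM type).
using V4 = std::array<std::int32_t, 4>;

// Models `shufflevector <4 x i32> %p, <4 x i32> undef, <4 x i32> zeroinitializer`:
// broadcast lane 0 into every lane.
V4 splatLane0(const V4 &p) { return {p[0], p[0], p[0], p[0]}; }

// Scalar srem with the conditions that would otherwise trap in hardware
// (and are undefined behavior in C++) checked up front.
std::int32_t sremChecked(std::int32_t a, std::int32_t b) {
  assert(b != 0 && "srem by zero");
  assert(!(a == std::numeric_limits<std::int32_t>::min() && b == -1) && "srem overflow");
  return a % b;
}

// Lane-wise srem of two vectors.
V4 sremLanes(const V4 &a, const V4 &b) {
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = sremChecked(a[i], b[i]);
  return r;
}

// Original IR: %retval = srem(splat(%p1), splat(%p2)); only p2[0] is a divisor.
V4 sremOriginal(const V4 &p1, const V4 &p2) {
  return sremLanes(splatLane0(p1), splatLane0(p2));
}

// Number of zero lanes in a divisor vector.
int zeroLanes(const V4 &divisor) {
  int n = 0;
  for (std::int32_t d : divisor) n += (d == 0) ? 1 : 0;
  return n;
}

// Divisions by zero that each form would perform for a given %p2.
int zeroDivisionsBeforeReorder(const V4 &p2) { return zeroLanes(splatLane0(p2)); }
int zeroDivisionsAfterReorder(const V4 &p2) { return zeroLanes(p2); }
```

For %p2 = <5, 0, 0, 0>, the original form never divides by zero. The reordered form would divide by zero in three lanes.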

This patch calls isSafeToSpeculativelyExecute() in SimplifyVectorOp() and bails out of the optimization when it returns false.
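The guard relies on isSafeToSpeculativelyExecute() rejecting integer division and remainder when the divisor might be zero. The sketch below is a deliberately simplified model of that rule. It assumes that each operand is either a known (splat) constant or unknown. Opcode, BinOp, safeToSpeculate and mayReorderShuffles are hypothetical names, not LLVM's classes. The real function also handles loads, calls and other instructions.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical, simplified stand-ins (NOT LLVM's Instruction/BinaryOperator).
enum class Opcode { Add, Mul, UDiv, URem, SDiv, SRem, FAdd, FDiv };

struct BinOp {
  Opcode op;
  std::optional<std::int32_t> lhsConst; // set if operand 0 is a known (splat) constant
  std::optional<std::int32_t> rhsConst; // set if operand 1 is a known (splat) constant
};

// Simplified model of the div/rem cases of isSafeToSpeculativelyExecute().
bool safeToSpeculate(const BinOp &I) {
  switch (I.op) {
  case Opcode::UDiv:
  case Opcode::URem:
    // x / y is undefined if y == 0.
    return I.rhsConst.has_value() && *I.rhsConst != 0;
  case Opcode::SDiv:
  case Opcode::SRem:
    // Undefined if y == 0, or if x == INT_MIN and y == -1.
    if (!I.rhsConst || *I.rhsConst == 0) return false;
    if (*I.rhsConst != -1) return true;
    return I.lhsConst.has_value() &&
           *I.lhsConst != std::numeric_limits<std::int32_t>::min();
  default:
    // Everything else, including FAdd/FDiv, is treated as safe in this model.
    return true;
  }
}

// In the spirit of the patch: SimplifyVectorOp() returns nullptr (gives up)
// before reordering shuffles when the op is not safe to speculate.
bool mayReorderShuffles(const BinOp &I) { return safeToSpeculate(I); }
```

With two unknown operands, as in the srem from PR20059, the model refuses the reordering. A constant non-zero divisor (other than -1) still allows it.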

I'm not sure whether this 'reordering of shuffles' optimization should also be disallowed for all vector FP ops (can any of those raise an exception?). There's an existing test case in test/Transforms/InstCombine/vec_shuffle.ll that would have to be removed if we change that behavior, so I'll post that as a separate patch.

Diff Detail

Event Timeline

spatel updated this revision to Diff 11167. Jul 8 2014, 12:25 PM

spatel retitled this revision to Fix for PR20059 (instcombine reorders shufflevector after instruction that may trap).

spatel updated this object.

spatel edited the test plan for this revision.

spatel added a subscriber: Unknown Object (MLST).

Forgot to add reviewers when creating patch.

I believe the optimization is safe for floating-point operations that cannot trap. For example:

undef = fadd undef, %x

produces undef, but it does not trap, so reordering the shuffle should be safe.

However:

fdiv %x, undef

can trap, because undef can be assumed to be any value, including zero, and fdiv traps on zero. So the optimization is not safe.

What does isSafeToSpeculativelyExecute() return for fdiv? If it returns true, that seems to be an omission (a bug). (Note: I have not verified this, but the switch in this function does not handle fdiv, and the default case returns true.)

At least LICM seems like it could go wrong with a conditionally executed fdiv (with loop-invariant operands):

    bool LICM::isSafeToExecuteUnconditionally(Instruction &Inst) {
      // If it is not a trapping instruction, it is always safe to hoist.
      if (isSafeToSpeculativelyExecute(&Inst))
        return true;
      // ...
    }
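The LICM concern can be shown at the source level. This is a minimal C++ sketch, and it substitutes integer division for fdiv so that there is a real trap to demonstrate. Integer division by zero is undefined behavior in C++, so the hypothetical countedDiv helper records a would-be division by zero instead of performing it. The example shows that hoisting the loop-invariant n / d out of a guarded loop body executes the division even when the guard never fires. DivCounter, countedDiv, sumOriginal and sumHoisted are illustrative names only.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instrumentation: counts divisions that would divide by zero
// instead of performing them (integer division by zero is UB in C++).
struct DivCounter {
  int zeroDivisions = 0;
};

int countedDiv(int a, int b, DivCounter &c) {
  if (b == 0) {
    ++c.zeroDivisions; // this is where real hardware would trap
    return 0;
  }
  return a / b;
}

// Original loop: the loop-invariant n / d only runs when x > 0.
int sumOriginal(const std::vector<int> &xs, int n, int d, DivCounter &c) {
  int sum = 0;
  for (int x : xs)
    if (x > 0) sum += countedDiv(n, d, c);
  return sum;
}

// After hoisting n / d to the loop preheader: the division runs once,
// unconditionally, even if no iteration would have executed it.
int sumHoisted(const std::vector<int> &xs, int n, int d, DivCounter &c) {
  int q = countedDiv(n, d, c);
  int sum = 0;
  for (int x : xs)
    if (x > 0) sum += q;
  return sum;
}
```

Both versions return the same sum. However, with d == 0 and no positive elements, only the hoisted version divides by zero. That is why hoisting must be refused unless the instruction is safe to speculate.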

Actually, now I am not sure myself: does a floating-point div trap on zero? :D

We probably get a NaN or an infinity; forget what I said.

Your patch LGTM.
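Arnold's second thought matches the IEEE-754 default behavior. With floating-point exceptions masked, division by zero does not trap. Instead it yields a signed infinity, or a NaN for 0/0. This small C++ sketch assumes an IEEE-754 implementation (std::numeric_limits<double>::is_iec559), such as GCC or Clang on mainstream targets. Strictly, the C++ standard leaves division by zero undefined, so the sketch relies on implementation behavior. divideAtRuntime is a hypothetical helper name.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes IEEE-754 doubles");

// Divide at run time; `volatile` keeps the compiler from folding the
// division away, so the hardware actually executes it.
double divideAtRuntime(double num, double den) {
  volatile double d = den;
  return num / d;
}
```

By contrast, srem by zero is undefined in LLVM IR and raises a divide error on x86. That is the case the committed fix guards against.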

Closed by commit rL212629 (authored by @spatel).

Thanks, Arnold.

I've committed the fix for integer ops, but I'm still doubtful that this is safe for FP (any FP op), because any FP op can cause an exception, right? We don't normally have exceptions enabled for FP div-by-zero, denormals, underflow, etc. (especially for vector ops), but a system or user can certainly enable those manually. If that is true and one of the unknown elements in the vector is bad in some way, then we'll generate an exception that wouldn't have occurred without this optimization. I suppose I can write a C test case that proves this by twiddling the FP exception bits.

"but a system or user can certainly enable those manually" -- No, Clang/LLVM specifically do not support this! In the context of C, this means that we don't support "#pragma STDC FENV_ACCESS on". It is not well defined for the user to fool with the fp environment without informing the compiler (in C, that pragma is the standard mechanism), and if the user tries, they'll receive an error.

spatel mentioned this in D42485: InstSimplify: If divisor element is undef simplify to undef. Feb 13 2018, 8:05 AM

Revision Contents

Path                                                  Size
lib/Transforms/InstCombine/InstructionCombining.cpp   6 lines
test/Transforms/InstCombine/pr20059.ll                16 lines

Diff 11167

lib/Transforms/InstCombine/InstructionCombining.cpp

    ...
     #include "llvm/Analysis/ConstantFolding.h"
     #include "llvm/Analysis/InstructionSimplify.h"
     #include "llvm/Analysis/MemoryBuiltins.h"
    +#include "llvm/Analysis/ValueTracking.h"
     #include "llvm/IR/CFG.h"
     #include "llvm/IR/DataLayout.h"
     #include "llvm/IR/GetElementPtrTypeIterator.h"
    ...
     Value *InstCombiner::SimplifyVectorOp(BinaryOperator &Inst) {
       if (!Inst.getType()->isVectorTy()) return nullptr;
     
    +  // It may not be safe to reorder shuffles and things like div, urem, etc.
    +  // because we may trap when executing those ops on unknown vector elements.
    +  // See PR20059.
    +  if (!isSafeToSpeculativelyExecute(&Inst)) return nullptr;
    +
       unsigned VWidth = cast<VectorType>(Inst.getType())->getNumElements();
       Value *LHS = Inst.getOperand(0), *RHS = Inst.getOperand(1);
       assert(cast<VectorType>(LHS->getType())->getNumElements() == VWidth);
    ...

test/Transforms/InstCombine/pr20059.ll

    ; RUN: opt -S -instcombine < %s | FileCheck %s

    ; In PR20059 ( http://llvm.org/pr20059 ), shufflevector operations are reordered/removed
    ; for an srem operation. This is not a valid optimization because it may cause a trap
    ; on div-by-zero.

    ; CHECK-LABEL: @do_not_reorder
    ; CHECK: %splat1 = shufflevector <4 x i32> %p1, <4 x i32> undef, <4 x i32> zeroinitializer
    ; CHECK-NEXT: %splat2 = shufflevector <4 x i32> %p2, <4 x i32> undef, <4 x i32> zeroinitializer
    ; CHECK-NEXT: %retval = srem <4 x i32> %splat1, %splat2
    define <4 x i32> @do_not_reorder(<4 x i32> %p1, <4 x i32> %p2) {
      %splat1 = shufflevector <4 x i32> %p1, <4 x i32> undef, <4 x i32> zeroinitializer
      %splat2 = shufflevector <4 x i32> %p2, <4 x i32> undef, <4 x i32> zeroinitializer
      %retval = srem <4 x i32> %splat1, %splat2
      ret <4 x i32> %retval
    }