This is an archive of the discontinued LLVM Phabricator instance.

[X86] Speculatively load operands of select instruction
Needs Revision · Public

Authored by lsaba on Aug 30 2017, 1:23 AM.

Details

Reviewers

zvi
craig.topper
aaboud
spatel
RKSimon
hfinkel

Summary

For a select instruction whose operands are the address calculations of two independent loads, the pass tries to speculate both loads and feed the loaded values into the select. This allows early, parallel execution of the loads and may later allow a load to be folded into the CMOV instruction as a memory operand.
The pass currently only handles cases where the loads are elements of the same struct.

Event Timeline

lsaba created this revision. Aug 30 2017, 1:23 AM

Herald added a subscriber: mgorny. Aug 30 2017, 1:23 AM

zvi added reviewers: spatel, RKSimon, hfinkel. Aug 30 2017, 6:34 AM

igorb added a subscriber: igorb. Aug 30 2017, 6:45 AM

I didn't look at the implementation, but why is it safe to speculate loads in these tests? I can create an example where one of the pointers in the select is unmapped, so speculating that load will crash in the general case.

aaboud added inline comments. Aug 30 2017, 3:30 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16266	Do you need this dump()? It will break compilation in release mode. Either remove it or wrap it in the DEBUG() macro.
lib/Target/X86/X86SpeculateSelectLoad.cpp
26	Maybe I missed it, but why do you need this include?
99	Remove this empty line.
121	NewSI can be of type Value*, so the cast to SelectInst is not needed. Note that the only use of NewSI is replaceAllUsesWith(), which takes a Value* argument.
test/CodeGen/X86/speculate-select-load.ll
29	Can you add a comment explaining this case (and the one above)? Something like: selecting between the addresses of two members of the same structure whose offsets are more than a cache line (64 bytes) apart, so do not load speculatively.
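
The distance rule described in the last inline comment can be sketched in C. This is only an illustration under stated assumptions: the 64-byte cache line size is taken from the comment, while `within_cache_line_distance`, `struct Near`, and `struct Far` are hypothetical names that do not appear in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed cache line size, taken from the review comment (64 bytes). */
enum { CACHE_LINE_BYTES = 64 };

/* Hypothetical helper: true when two field offsets are at most one cache
   line apart, i.e. when the heuristic would still allow speculation. */
static bool within_cache_line_distance(size_t off_a, size_t off_b) {
  size_t dist = off_a > off_b ? off_a - off_b : off_b - off_a;
  return dist <= (size_t)CACHE_LINE_BYTES;
}

/* Two neighbouring fields: the offsets differ by sizeof(int). */
struct Near {
  int a;
  int b;
};

/* The padding pushes b more than 64 bytes away from a. */
struct Far {
  int a;
  char pad[128];
  int b;
};
```

With these definitions, `within_cache_line_distance(offsetof(struct Near, a), offsetof(struct Near, b))` holds, while the same call on `struct Far` does not, so only the first pair would remain a candidate for speculation.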

In D37289#857007, @spatel wrote:

I didn't look at the implementation, but why is it safe to speculate loads in these tests? I can create an example where one of the pointers in the select is unmapped, so speculating that load will crash in the general case.

The implementation handles a specific case where both operands of the select are GEPs into elements of the same struct. Correct me if I'm wrong, but this should be safe.

spatel added a comment.

I don't see how being elements of one struct changes anything. One pointer being dereferenceable does not make the other dereferenceable. You would need 'dereferenceable' metadata or some other means to know that either load is safe to hoist ahead of the select.

You're proposing this transform as an x86-specific pass, so maybe I'm missing something. Is there some feature of x86 that makes speculating the load safe? I'm guessing not, because I tested the example I was thinking of on x86, and this transform crashes there.

lsaba added a comment.

This is the transformation I am interested in doing:

struct S {
  int a;
  int b;
};

from:

int foo(struct S *s, int x) {
  int c;
  if (x)
    c = s->a;
  else
    c = s->b;
  return c;
}

to:

int foo(struct S *s, int x) {
  int c1 = s->a;
  int c2 = s->b;
  int c = x ? c1 : c2;
  return c;
}

I am assuming this transformation is legal in C for the given struct with the given types, since the entire struct is allocated. Is my assumption wrong? The idea is to limit the pass to these cases (I am uploading a patch that limits the current implementation further).

spatel added a comment.

Yes, I think your assumption is wrong. It's contrived, but consider this possibility based on the current version of the patch:

#include <stdlib.h>
typedef struct S {
  char padding[4088]; // not necessary, but it might make it easier for GuardMalloc or valgrind to see the bug
  struct S *p1;
  struct S *p2;
} S;

S *Sptr;

void init() {
  Sptr = malloc(4096); // sorry, p2 - no space for you
  Sptr->p1 = (S *)"1239"; // crazy, but just to prove a point
}

When input to a wrapper test similar to the test in this patch, this is safe without this transform, but crashes after. You need something to tell you the loads are dereferenceable (and whatever that information is will not be x86-specific, so there's no need to make an x86 pass for it).
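
The "wrapper test" mentioned above is not shown in the thread. A hypothetical version, assuming the pattern the pass matches (a select between two field addresses followed by a single load), might look like the following sketch. `load_selected` and `load_selected_speculated` are illustrative names, not part of the patch.

```c
#include <stddef.h>

typedef struct S {
  char padding[4088];
  struct S *p1;
  struct S *p2;
} S;

/* Before: select an address, then perform exactly one load.
   Only the chosen field is ever read. */
S *load_selected(S *s, int x) {
  S **addr = x ? &s->p1 : &s->p2;
  return *addr;
}

/* After: both fields are read unconditionally, then a value is selected.
   This is only valid if both s->p1 and s->p2 are dereferenceable. */
S *load_selected_speculated(S *s, int x) {
  S *t = s->p1;
  S *f = s->p2;
  return x ? t : f;
}
```

The two functions agree whenever `s` points to a complete `S`. With the 4096-byte allocation from the example (and assuming 8-byte pointers), `load_selected(Sptr, 1)` reads only `p1`, whereas the speculated form also reads `p2`, which lies past the end of the allocation.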

Addressed aaboud's comments and limited the optimization to non-aggregate GEP accesses.

lsaba marked 5 inline comments as done. Aug 31 2017, 7:12 AM

hfinkel added a comment.

That's correct (unfortunately). We have utility functions to help with this: isDereferenceablePointer and isDereferenceableAndAlignedPointer, and also isSafeToLoadUnconditionally (which is more expensive than the first two, but more powerful). There may be more that can be done in certain cases, but you also need to be careful about race conditions: some other thread could be about to modify one of the values, and by speculating the load you would be moving it to before the synchronization point.
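
The kind of fact those utilities establish can be pictured as a range check: a load is safe to hoist only if its bytes lie inside a region already known to be dereferenceable. The C sketch below is a simplified model under that assumption, not LLVM's implementation; `load_in_known_range` and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: known_bytes is the number of bytes, starting at the
   base pointer, that are known to be dereferenceable. The load reads
   load_size bytes at byte offset `offset` from that base. */
static bool load_in_known_range(size_t known_bytes, size_t offset,
                                size_t load_size) {
  /* Written this way to avoid overflow in offset + load_size. */
  return load_size <= known_bytes && offset <= known_bytes - load_size;
}
```

For the 4096-byte allocation in the counter-example, an 8-byte load at offset 4088 passes the check (`p1`), while an 8-byte load at offset 4096 does not (`p2`). This model says nothing about the race-condition concern, which needs separate reasoning.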

lsaba added a comment.

You're right, this is problematic. Thanks, I will try to limit this to cases where it is safe to do so.

If the common pattern really is just a select of pointers, then there might be an existing IR transform pass where this can be added (with the right analysis or metadata to ensure safety)? If not, a more general hoisting solution like D37121 could solve this?

This revision now requires changes to proceed. Aug 31 2017, 7:41 AM

DavidKreitzer added a subscriber: DavidKreitzer. Sep 13 2017, 1:51 PM

andrew.w.kaylor added a subscriber: andrew.w.kaylor. Sep 13 2017, 1:52 PM

After some internal discussions, we suspect that Sanjay's counter-example may have undefined behavior according to the C standard.

typedef struct S {
  char padding[4088];
  struct S *p1;
  struct S *p2;
} S;

S* f1(S *s, int x)
{
  S *r;
  if (x)
    r = s->p1;
  else
    r = s->p2;
  return r;
}

Sanjay's example was to pass the address of an incomplete object of type S as the first argument to f1. f1 will always access that object through an lvalue of type S, which seems like a violation of the type based aliasing rules in section 6.5 paragraph 7 of the standard:

An object shall have its stored value accessed only by an lvalue expression that has one of
the following types:
— a type compatible with the effective type of the object,

I am deliberately using fuzzy wording like "may have" and "seems like". I would be happy to have the C language experts in the community weigh in on whether this is a valid optimization. But both gcc and icc speculate the loads in f1. This is the code generated by gcc 7.2 at -O2:

f1:
        movq    4096(%rdi), %rax
        testl   %esi, %esi
        cmovne  4088(%rdi), %rax
        ret

icc is admittedly more aggressive with this optimization. gcc will only speculate the loads when they are accessing adjacent fields of the same struct. But both compilers are using the same logic that accessing one field of a structure makes it safe to speculatively access another field of the same structure.
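
The adjacency condition attributed to gcc above can be illustrated with a small C predicate. This is a sketch of one possible reading of "adjacent fields" (one field starts exactly where the other ends), not gcc's actual logic; `fields_adjacent` and `struct Gap` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct S {
  char padding[4088];
  struct S *p1;
  struct S *p2;
} S;

/* A struct whose first and last fields are separated by another member. */
struct Gap {
  int first;
  char between[16];
  int last;
};

/* Hypothetical check: true when one field begins at the byte where the
   other one ends, in either order. */
static bool fields_adjacent(size_t off_a, size_t size_a,
                            size_t off_b, size_t size_b) {
  return off_a + size_a == off_b || off_b + size_b == off_a;
}
```

Under this reading, `p1` and `p2` of `S` are adjacent, so the loads in f1 would qualify, while `first` and `last` of `struct Gap` would not.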

Note that I am not trying to argue that this patch is correct. It isn't, because it doesn't guard against the case where the same base pointer is used to access structs of two different types, e.g.

typedef struct S {
  char padding[4088];
  struct S *p1;
  struct S *p2;
} S;

// T is a partial overlay of S
typedef struct T {
  char padding[4088];
  struct S *p1;
} T;

S* f1(S *s, int x)
{
  S *r;
  if (x)
    // p1 is accessed through an lvalue of type T. So it isn't safe to speculate the load of p2.
    r = ((T*)s)->p1;
  else
    r = s->p2;
  return r;
}

I'd like to pose two questions.

Is the gcc & icc logic correct? Is it valid to speculatively access a structure field in the presence of an access to a different field of the same structure?
If yes, how can we perform this optimization in LLVM? Should the TBAA information be used for this purpose, e.g. to enhance the isSafeToLoadUnconditionally utility that Hal mentioned?

In D37289#891041, @DavidKreitzer wrote:

I am deliberately using fuzzy wording like "may have" and "seems like". I would be happy to have the C language experts in the community weigh in on whether this is valid optimization.

I'd be really happy to be wrong on this one! :)
Would you reference this patch/example and ask the experts on cfe-dev?

alexey.zhikhar added a subscriber: alexey.zhikhar. Mar 22 2018, 10:36 AM

Revision Contents

Path	Size
include/llvm/Target/TargetLowering.h	6 lines
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	20 lines
lib/Target/X86/CMakeLists.txt	1 line
lib/Target/X86/X86.h	4 lines
lib/Target/X86/X86ISelLowering.h	4 lines
lib/Target/X86/X86SpeculateSelectLoad.cpp	148 lines
lib/Target/X86/X86TargetMachine.cpp	6 lines
lib/Target/X86/X86TargetTransformInfo.h	2 lines
lib/Target/X86/X86TargetTransformInfo.cpp	2 lines
test/CodeGen/X86/speculate-select-load.ll	70 lines

Diff 113398

include/llvm/Target/TargetLowering.h

@@ public:
   /// This method query the target whether it is beneficial for dag combiner to
   /// promote the specified node. If true, it should return the desired
   /// promotion type by reference.
   virtual bool IsDesirableToPromoteOp(SDValue /*Op*/, EVT &/*PVT*/) const {
     return false;
   }

+  /// Return true if it is desirable to speculatively load the operands
+  /// of a select instruction for the target.
+  virtual bool isDesirableToSpeculateSelectLoad() const {
+    return false;
+  }
+
   /// Return true if the target supports swifterror attribute. It optimizes
   /// loads and stores to reading and writing a specific register.
   virtual bool supportSwiftError() const {
     return false;
   }

   /// Return true if the target supports that a subset of CSRs for the given
   /// machine function is handled explicitly via copies.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ if (LHS.getOperand(0) != RHS.getOperand(0) ||
         // src value info, don't do the transformation if the memory
         // locations are not in the default address space.
         LLD->getPointerInfo().getAddrSpace() != 0 ||
         RLD->getPointerInfo().getAddrSpace() != 0 ||
         !TLI.isOperationLegalOrCustom(TheSelect->getOpcode(),
                                       LLD->getBasePtr().getValueType()))
       return false;

+    // Avoid combining the load if both loads are GEPs into elements of the same
+    // struct. TODO: handle cases where the GEP is bitcasted to another type.
+    if (TLI.isDesirableToSpeculateSelectLoad()) {
+      if (LLD->getMemOperand()->getValue() &&
+          RLD->getMemOperand()->getValue()) {
+        const GetElementPtrInst *GEPTrue =
+            dyn_cast<GetElementPtrInst>(LLD->getMemOperand()->getValue());
+        const GetElementPtrInst *GEPFalse =
+            dyn_cast<GetElementPtrInst>(RLD->getMemOperand()->getValue());
+        if (GEPTrue && GEPFalse) {
+          if (GEPTrue->getSourceElementType()->isStructTy() &&
+              GEPFalse->getSourceElementType()->isStructTy() &&
+              GEPTrue->getPointerOperand() == GEPFalse->getPointerOperand() &&
+              GEPTrue->hasAllConstantIndices() &&
+              GEPFalse->hasAllConstantIndices())
+            return false;
+        }
+      }
+    }
+
     // Check that the select condition doesn't reach either load. If so,
     // folding this will induce a cycle into the DAG. If not, this is safe to
     // xform, so create a select of the addresses.
     SDValue Addr;
     if (TheSelect->getOpcode() == ISD::SELECT) {
       SDNode *CondNode = TheSelect->getOperand(0).getNode();
       if ((LLD->hasAnyUseOfValue(1) && LLD->isPredecessorOf(CondNode)) ||
           (RLD->hasAnyUseOfValue(1) && RLD->isPredecessorOf(CondNode)))

lib/Target/X86/CMakeLists.txt

@@ set(sources
   X86MachineFunctionInfo.cpp
   X86MacroFusion.cpp
   X86OptimizeLEAs.cpp
   X86PadShortFunction.cpp
   X86RegisterBankInfo.cpp
   X86RegisterInfo.cpp
   X86SelectionDAGInfo.cpp
   X86ShuffleDecodeConstantPool.cpp
+  X86SpeculateSelectLoad.cpp
   X86Subtarget.cpp
   X86TargetMachine.cpp
   X86TargetObjectFile.cpp
   X86TargetTransformInfo.cpp
   X86VZeroUpper.cpp
   X86WinAllocaExpander.cpp
   X86WinEHState.cpp
   X86CallingConv.cpp

lib/Target/X86/X86.h

 /// done by replacing esp-relative movs with pushes.
 FunctionPass *createX86CallFrameOptimization();

 /// Return an IR pass that inserts EH registration stack objects and explicit
 /// EH state updates. This pass must run after EH preparation, which does
 /// Windows-specific but architecture-neutral preparation.
 FunctionPass *createX86WinEHStatePass();

+/// Return an IR pass that tries to speculatively load the operands of
+/// a select instruction when profitable.
+FunctionPass *createX86SpeculateSelectLoadPass();
+
 /// Return a Machine IR pass that expands X86-specific pseudo
 /// instructions into a sequence of actual instructions. This pass
 /// must run after prologue/epilogue insertion and before lowering
 /// the MachineInstr to MC.
 FunctionPass *createX86ExpandPseudoPass();

 /// This pass converts X86 cmov instructions into branch when profitable.
 FunctionPass *createX86CmovConverterPass();

lib/Target/X86/X86ISelLowering.h

@@ public:
     bool isTypeDesirableForOp(unsigned Opc, EVT VT) const override;

     /// Return true if the target has native support for the
     /// specified value type and it is 'desirable' to use the type. e.g. On x86
     /// i16 is legal, but undesirable since i16 instruction encodings are longer
     /// and some i16 instructions are slow.
     bool IsDesirableToPromoteOp(SDValue Op, EVT &PVT) const override;

+    /// Return true if it is desirable to speculatively load the operands
+    /// of a select instruction for the target.
+    bool isDesirableToSpeculateSelectLoad() const override { return true; }
+
     MachineBasicBlock *
     EmitInstrWithCustomInserter(MachineInstr &MI,
                                 MachineBasicBlock *MBB) const override;

     /// This method returns the name of a target specific DAG node.
     const char *getTargetNodeName(unsigned Opcode) const override;

     bool mergeStoresAfterLegalization() const override { return true; }

lib/Target/X86/X86SpeculateSelectLoad.cpp

This file was added.

//===-- X86SpeculateSelectLoad- Speculatively load operands of a select --===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===---------------------------------------------------------------------===//
// For a select instruction where the operands are address calculations
// of two independent loads, the pass tries to speculate the loads and
// feed them into the select instruction, this allows early parallel execution
// of the loads and possibly memory folding into the CMOV instructions later on.
// The pass currently only handles cases where the loads are elements of the
// same struct.
//===---------------------------------------------------------------------===//

#include "X86.h"
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Module.h"
#include "llvm/Pass.h"
#include "llvm/Support/Debug.h"

using namespace llvm;

#define DEBUG_TYPE "x86speculateload"

namespace llvm {
void initializeX86SpeculateSelectLoadPassPass(PassRegistry &);
}

namespace {

class X86SpeculateSelectLoadPass : public FunctionPass {
public:
  static char ID; // Pass identification, replacement for typeid.

  X86SpeculateSelectLoadPass() : FunctionPass(ID) {
    initializeX86SpeculateSelectLoadPassPass(*PassRegistry::getPassRegistry());
  }

  bool runOnFunction(Function &Fn) override;
  StringRef getPassName() const override {
    return "X86 Speculatively load before select instruction";
  }
  void getAnalysisUsage(AnalysisUsage &AU) const override {
    AU.addRequired<TargetTransformInfoWrapperPass>();
  }

private:
  bool OptimizeSelectInst(SelectInst *SI);
  const DataLayout *DL;
  const TargetTransformInfo *TTI;
  SmallVector<Instruction *, 2> InstrForRemoval;
};
}

FunctionPass *llvm::createX86SpeculateSelectLoadPass() {
  return new X86SpeculateSelectLoadPass();
}

char X86SpeculateSelectLoadPass::ID = 0;

INITIALIZE_PASS_BEGIN(X86SpeculateSelectLoadPass, "x86-speculateload",
                      "X86 Speculatively load before select instruction", false,
                      false)
INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
INITIALIZE_PASS_END(X86SpeculateSelectLoadPass, "x86-speculateload",
                    "X86 Speculatively load before select instruction", false,
                    false)

bool X86SpeculateSelectLoadPass::OptimizeSelectInst(SelectInst *SI) {
  GetElementPtrInst *GEPIT = dyn_cast<GetElementPtrInst>(SI->getTrueValue());
  GetElementPtrInst *GEPIF = dyn_cast<GetElementPtrInst>(SI->getFalseValue());
  if (GEPIT == nullptr || GEPIF == nullptr)
    return false;

  // The pass currently only handles cases where the loads are elements of the
  // same struct and which aren't aggregate.
  if (!GEPIT->getSourceElementType()->isStructTy() ||
      !GEPIF->getSourceElementType()->isStructTy() ||
      GEPIT->getPointerOperand() != GEPIF->getPointerOperand() ||
      !GEPIT->hasAllConstantIndices() || !GEPIF->hasAllConstantIndices() ||
      GEPIT->getNumOperands() != 3 || GEPIF->getNumOperands() != 3)
    return false;

  if (!SI->hasOneUse())
    return false;
  // Bail out if there is a good chance we'll be loading from two different
  // cache lines instead of one.
  if (StructType *STy = dyn_cast<StructType>(GEPIT->getSourceElementType())) {
    // Get the indices of the elements in the struct
    ConstantInt *Idx1 = dyn_cast<ConstantInt>(GEPIT->getOperand(2));
    ConstantInt *Idx2 = dyn_cast<ConstantInt>(GEPIF->getOperand(2));
    const StructLayout *STL = DL->getStructLayout(STy);
    if (Idx1 && Idx2 && STL) {
      signed offset1 = STL->getElementOffset(Idx1->getZExtValue());
      signed offset2 = STL->getElementOffset(Idx2->getZExtValue());
      unsigned dist = abs(offset1 - offset2);
      assert((TTI->getCacheLineSize() > 0) &&
             "CacheLineSize information is missing for X86");
      if (dist > TTI->getCacheLineSize())
        return false;
    }
  }

  for (User *U : SI->users()) {
    if (LoadInst *LI = dyn_cast<LoadInst>(U)) {
      if (!LI->isSimple())
        return false;
      IRBuilder<> Builder(SI);
      LoadInst *LT = Builder.CreateAlignedLoad(GEPIT, LI->getAlignment());
      LoadInst *LF = Builder.CreateAlignedLoad(GEPIF, LI->getAlignment());
      Value *NewSI = Builder.CreateSelect(SI->getCondition(), LT, LF);
      LI->replaceAllUsesWith(NewSI);
      InstrForRemoval.push_back(LI);
      InstrForRemoval.push_back(SI);

      return true;
    }
  }
  return false;
}

bool X86SpeculateSelectLoadPass::runOnFunction(Function &F) {
  if (skipFunction(F) || F.optForSize())
    return false;

  DL = &F.getParent()->getDataLayout();
  TTI = &getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);

  bool Changed = false;

  for (auto &BB : F)
    for (auto &I : BB)
      if (SelectInst *SI = dyn_cast<SelectInst>(&I))
        Changed |= OptimizeSelectInst(SI);

  for (auto Instr : InstrForRemoval)
    Instr->eraseFromParent();

  InstrForRemoval.clear();

  return Changed;
}

lib/Target/X86/X86TargetMachine.cpp

Show First 20 Lines • Show All 53 Lines • ▼ Show 20 Lines	static cl::opt<bool> EnableMachineCombinerPass("x86-machine-combiner",
cl::desc("Enable the machine combiner pass"),		cl::desc("Enable the machine combiner pass"),
cl::init(true), cl::Hidden);		cl::init(true), cl::Hidden);

namespace llvm {		namespace llvm {

void initializeWinEHStatePassPass(PassRegistry &);		void initializeWinEHStatePassPass(PassRegistry &);
void initializeFixupLEAPassPass(PassRegistry &);		void initializeFixupLEAPassPass(PassRegistry &);
void initializeX86ExecutionDepsFixPass(PassRegistry &);		void initializeX86ExecutionDepsFixPass(PassRegistry &);
		void initializeX86SpeculateSelectLoadPassPass(PassRegistry &);

} // end namespace llvm		} // end namespace llvm

extern "C" void LLVMInitializeX86Target() {		extern "C" void LLVMInitializeX86Target() {
// Register the target.		// Register the target.
RegisterTargetMachine<X86TargetMachine> X(getTheX86_32Target());		RegisterTargetMachine<X86TargetMachine> X(getTheX86_32Target());
RegisterTargetMachine<X86TargetMachine> Y(getTheX86_64Target());		RegisterTargetMachine<X86TargetMachine> Y(getTheX86_64Target());

PassRegistry &PR = *PassRegistry::getPassRegistry();		PassRegistry &PR = *PassRegistry::getPassRegistry();
initializeGlobalISel(PR);		initializeGlobalISel(PR);
initializeWinEHStatePassPass(PR);		initializeWinEHStatePassPass(PR);
initializeFixupBWInstPassPass(PR);		initializeFixupBWInstPassPass(PR);
initializeEvexToVexInstPassPass(PR);		initializeEvexToVexInstPassPass(PR);
initializeFixupLEAPassPass(PR);		initializeFixupLEAPassPass(PR);
initializeX86ExecutionDepsFixPass(PR);		initializeX86ExecutionDepsFixPass(PR);
		initializeX86SpeculateSelectLoadPassPass(PR);
}		}

static std::unique_ptr<TargetLoweringObjectFile> createTLOF(const Triple &TT) {		static std::unique_ptr<TargetLoweringObjectFile> createTLOF(const Triple &TT) {
if (TT.isOSBinFormatMachO()) {		if (TT.isOSBinFormatMachO()) {
if (TT.getArch() == Triple::x86_64)		if (TT.getArch() == Triple::x86_64)
return llvm::make_unique<X86_64MachoTargetObjectFile>();		return llvm::make_unique<X86_64MachoTargetObjectFile>();
return llvm::make_unique<TargetLoweringObjectFileMachO>();		return llvm::make_unique<TargetLoweringObjectFileMachO>();
}		}
[252 lines not shown]	TargetPassConfig *X86TargetMachine::createPassConfig(PassManagerBase &PM) {
return new X86PassConfig(*this, PM);		return new X86PassConfig(*this, PM);
}		}

void X86PassConfig::addIRPasses() {		void X86PassConfig::addIRPasses() {
addPass(createAtomicExpandPass());		addPass(createAtomicExpandPass());

TargetPassConfig::addIRPasses();		TargetPassConfig::addIRPasses();

if (TM->getOptLevel() != CodeGenOpt::None)		if (TM->getOptLevel() != CodeGenOpt::None) {
addPass(createInterleavedAccessPass());		addPass(createInterleavedAccessPass());
		addPass(createX86SpeculateSelectLoadPass());
		}
}		}

bool X86PassConfig::addInstSelector() {		bool X86PassConfig::addInstSelector() {
// Install an instruction selector.		// Install an instruction selector.
addPass(createX86ISelDag(getX86TargetMachine(), getOptLevel()));		addPass(createX86ISelDag(getX86TargetMachine(), getOptLevel()));

// For ELF, cleanup any local-dynamic TLS accesses.		// For ELF, cleanup any local-dynamic TLS accesses.
if (TM->getTargetTriple().isOSBinFormatELF() &&		if (TM->getTargetTriple().isOSBinFormatELF() &&
[74 lines not shown]

lib/Target/X86/X86TargetTransformInfo.h

[109 lines not shown]	public:
bool isLegalMaskedLoad(Type *DataType);		bool isLegalMaskedLoad(Type *DataType);
bool isLegalMaskedStore(Type *DataType);		bool isLegalMaskedStore(Type *DataType);
bool isLegalMaskedGather(Type *DataType);		bool isLegalMaskedGather(Type *DataType);
bool isLegalMaskedScatter(Type *DataType);		bool isLegalMaskedScatter(Type *DataType);
bool areInlineCompatible(const Function *Caller,		bool areInlineCompatible(const Function *Caller,
const Function *Callee) const;		const Function *Callee) const;
bool expandMemCmp(Instruction *I, unsigned &MaxLoadSize);		bool expandMemCmp(Instruction *I, unsigned &MaxLoadSize);
bool enableInterleavedAccessVectorization();		bool enableInterleavedAccessVectorization();
		unsigned getCacheLineSize() const;

private:		private:
int getGSScalarCost(unsigned Opcode, Type *DataTy, bool VariableMask,		int getGSScalarCost(unsigned Opcode, Type *DataTy, bool VariableMask,
unsigned Alignment, unsigned AddressSpace);		unsigned Alignment, unsigned AddressSpace);
int getGSVectorCost(unsigned Opcode, Type *DataTy, Value *Ptr,		int getGSVectorCost(unsigned Opcode, Type *DataTy, Value *Ptr,
unsigned Alignment, unsigned AddressSpace);		unsigned Alignment, unsigned AddressSpace);

/// @}		/// @}
};		};

} // end namespace llvm		} // end namespace llvm

#endif		#endif

lib/Target/X86/X86TargetTransformInfo.cpp

[2,523 lines not shown]	return getInterleavedMemoryOpCostAVX512(Opcode, VecTy, Factor, Indices,
Alignment, AddressSpace);		Alignment, AddressSpace);
if (ST->hasAVX2())		if (ST->hasAVX2())
return getInterleavedMemoryOpCostAVX2(Opcode, VecTy, Factor, Indices,		return getInterleavedMemoryOpCostAVX2(Opcode, VecTy, Factor, Indices,
Alignment, AddressSpace);		Alignment, AddressSpace);

return BaseT::getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,		return BaseT::getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,
Alignment, AddressSpace);		Alignment, AddressSpace);
}		}

		unsigned X86TTIImpl::getCacheLineSize() const { return 64; }

test/CodeGen/X86/speculate-select-load.ll

This file was added.

				; RUN: opt < %s -mcpu=x86-64 -S -x86-speculateload | FileCheck %s
				; RUN: opt < %s -mcpu=i386 -S -x86-speculateload | FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				%struct.S = type { i64, %struct.S*, %struct.S*, i64, i16, i64, i64, i64, i64, i64 }

				;; Selecting between pointers of two members of the same structure with offset
				;; smaller than a cache line's size (64 bytes) between them, load speculatively.
				; Function Attrs: norecurse nounwind readonly uwtable
				define %struct.S* @spec_load(i32 %x, %struct.S* nocapture readnone %A, %struct.S* nocapture readonly %B) local_unnamed_addr #0 {
				entry:
				; CHECK-LABEL:@spec_load
				; CHECK: getelementptr inbounds %struct.S, %struct.S* %B, i64 0, i32 1
				; CHECK: getelementptr inbounds %struct.S, %struct.S* %B, i64 0, i32 2
				; CHECK: [[A:%[0-9]+]] = load %struct.S*, %struct.S** %{{.*}}, align 8
				; CHECK: [[B:%[0-9]+]] = load %struct.S*, %struct.S** %{{.*}}, align 8
				; CHECK: select i1 %tobool, %struct.S* [[A]], %struct.S* [[B]]

				%tobool = icmp eq i32 %x, 0
				%b = getelementptr inbounds %struct.S, %struct.S* %B, i64 0, i32 1
				%c = getelementptr inbounds %struct.S, %struct.S* %B, i64 0, i32 2
				%A.addr.0.in = select i1 %tobool, %struct.S** %c, %struct.S** %b
				%A.addr.0 = load %struct.S*, %struct.S** %A.addr.0.in, align 8
				ret %struct.S* %A.addr.0
				}

				;; Selecting between pointers of two members of the same structure with offset
				;; greater than a cache line's size (64 bytes) between them, thus do not load speculatively.
				aaboud (inline comment, marked Done): Can you add a comment explaining this case (and the one above)? Something like: Selecting between address of/pointer to two members of same structure, with offset bigger than cache line (64 bytes) between them. Thus, do not load speculatively.
				; Function Attrs: norecurse nounwind readonly uwtable
				define i64 @no_spec_load(i32 %x, i64 %A, %struct.S* nocapture readonly %B) local_unnamed_addr #0 {
				entry:
				; CHECK-LABEL:@no_spec_load
				; CHECK: [[A:%[a-z]+]] = getelementptr inbounds %struct.S, %struct.S* %B, i64 0, i32 0
				; CHECK: [[B:%[a-z]+]] = getelementptr inbounds %struct.S, %struct.S* %B, i64 0, i32 9
				; CHECK: select i1 %tobool, i64* [[B]], i64* [[A]]
				; CHECK-NEXT: load i64, i64* %{{.*}}, align 8

				%tobool = icmp eq i32 %x, 0
				%a = getelementptr inbounds %struct.S, %struct.S* %B, i64 0, i32 0
				%j = getelementptr inbounds %struct.S, %struct.S* %B, i64 0, i32 9
				%A.addr.0.in = select i1 %tobool, i64* %j, i64* %a
				%A.addr.0 = load i64, i64* %A.addr.0.in, align 8
				ret i64 %A.addr.0
				}

				;; Selecting into an aggregate type of the struct, do not load speculatively
				%struct.S2 = type { i32, [10 x i32] }

				; Function Attrs: norecurse nounwind readonly uwtable
				define i32 @no_load_agg(%struct.S2* nocapture readonly %s1, %struct.S2* nocapture readnone %s2, i32 %x) local_unnamed_addr #0 {
				entry:
				; CHECK-LABEL:@no_load_agg
				; CHECK: [[A:%[a-z]+]] = getelementptr inbounds %struct.S2, %struct.S2* %s1, i64 0, i32 0
				; CHECK: [[B:%[a-z]+]] = getelementptr inbounds %struct.S2, %struct.S2* %s1, i64 0, i32 1, i64 10
				; CHECK: %retval.0.in = select i1 %tobool, i32* [[B]], i32* [[A]]

				%tobool = icmp eq i32 %x, 0
				%a = getelementptr inbounds %struct.S2, %struct.S2* %s1, i64 0, i32 0
				%arrayidx = getelementptr inbounds %struct.S2, %struct.S2* %s1, i64 0, i32 1, i64 10
				%retval.0.in = select i1 %tobool, i32* %arrayidx, i32* %a
				%retval.0 = load i32, i32* %retval.0.in, align 4
				ret i32 %retval.0
				}



				attributes #0 = { norecurse nounwind readonly uwtable "target-cpu"="core-avx2" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87"}
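
The test comments above distinguish the two struct cases by whether the members lie less than one cache line (64 bytes, the value returned by getCacheLineSize() in this patch) apart. A minimal sketch of such a check is shown below. The exact condition used by the pass is not visible in this part of the diff, so the plain distance comparison and the name `withinCacheLine` are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true when two byte offsets inside the same
// object are less than one cache line apart. The default of 64 matches the
// constant returned by X86TTIImpl::getCacheLineSize() in this patch.
bool withinCacheLine(std::uint64_t OffA, std::uint64_t OffB,
                     std::uint64_t LineSize = 64) {
  assert(LineSize > 0 && "cache line size must be non-zero");
  // Absolute distance between the two offsets, computed without underflow.
  std::uint64_t Distance = OffA > OffB ? OffA - OffB : OffB - OffA;
  return Distance < LineSize;
}
```

With the x86-64 data layout used in the test, fields 1 and 2 of %struct.S sit at byte offsets 8 and 16, so `withinCacheLine(8, 16)` holds and @spec_load is a candidate. Fields 0 and 9 sit at offsets 0 and 72, so `withinCacheLine(0, 72)` is false, which corresponds to @no_spec_load being left untouched.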