The atomic expansion pass typically runs late, after most IR optimizations have already run. The test-and-branch code it generates can often be simplified, but until now no such simplification took place. This patch adds an optional parameter to createAtomicExpandPass carrying CFG-simplification options. When the parameter is present, simplifyCFG is run on the basic blocks created in expandPartwordCmpXchg, insertRMWLLSCLoop, and expandAtomicCmpXchg.
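For reference, a minimal sketch of what the new entry point might look like, inferred from the summary above; the parameter type, its default, and the include path are assumptions, not quotes of the actual diff:

```cpp
// Sketch only -- inferred from the patch summary; the real diff may differ.
#include "llvm/CodeGen/Passes.h"
#include "llvm/Transforms/Utils/Local.h"  // SimplifyCFGOptions (assumed location)

namespace llvm {
// New optional parameter: when non-null, AtomicExpand runs simplifyCFG on
// the blocks it created in expandPartwordCmpXchg, insertRMWLLSCLoop, and
// expandAtomicCmpXchg; when null, behavior is unchanged.
FunctionPass *createAtomicExpandPass(const SimplifyCFGOptions *Opts = nullptr);
} // namespace llvm
```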
Details
- Reviewers: jyknight, ab, jfb, javed.absar, t.p.northover
Diff Detail
- Repository: rL LLVM
Event Timeline
Is Hexagon the only backend that triggers this? Seems like we want tests for at least ARM64, no?
- Removed Hexagon tests left over from the initial, incorrect diff.
- Enabled simplification on AArch64.
- Added test cases for AArch64 and Hexagon.
lib/Target/AArch64/AArch64TargetMachine.cpp:376

This already does the simplification, no? How about adding the pass for Hexagon as well? I suppose this patch limits it to a smaller set of blocks, but on the other hand it's nice not to have to deal with it in the pass.
lib/Target/AArch64/AArch64TargetMachine.cpp:376

Evidently it doesn't do a good enough job; check the test case. On Hexagon we don't want to run the full simplify-cfg: I've tried that, it broke over 100 lit tests, and I don't even know what impact it would have on performance. It would be far too big a hammer.
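To make the trade-off concrete, here is a rough sketch of the two alternatives being discussed; the option values and pipeline details are placeholders, not either backend's actual code:

```cpp
// (a) Whole-function cleanup, roughly what AArch64's addIRPasses does today:
//     every block in the function is subject to simplify-cfg.
addPass(createAtomicExpandPass());
addPass(createCFGSimplificationPass());

// (b) What this patch enables: simplification limited to the blocks that
//     the atomic expansion itself introduced, leaving the rest of the CFG
//     untouched (important for Hexagon, per the comment above).
SimplifyCFGOptions Opts;                 // placeholder defaults
addPass(createAtomicExpandPass(&Opts));
```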
lib/Target/AArch64/AArch64TargetMachine.cpp:376

Let me attach the test case outputs (I somehow thought it was more evident in the test case itself).

Without this patch:

```
f0:                                     // @f0
        .cfi_startproc
// %bb.0:                               // %b0
        ldr     w8, [x0]
        add     w9, w8, #1              // =1
        cmp     w9, #17                 // =17
        csinc   w9, wzr, w8, eq
.LBB0_1:                                // %cmpxchg.start
                                        // =>This Inner Loop Header: Depth=1
        ldaxr   w10, [x0]
        cmp     w10, w8
        b.ne    .LBB0_4
// %bb.2:                               // %cmpxchg.trystore
                                        //   in Loop: Header=BB0_1 Depth=1
        stlxr   w10, w9, [x0]
        cbnz    w10, .LBB0_1
// %bb.3:
        orr     w8, wzr, #0x1
        b       .LBB0_5
.LBB0_4:                                // %cmpxchg.nostore
        clrex
        mov     w8, wzr
.LBB0_5:                                // %cmpxchg.end
        cmp     w8, #0                  // =0
        mov     w8, #123
        mov     w9, #321
        csel    w0, w9, w8, ne          // *** The patch eliminates this
                                        // *** select (and the setup code).
        ret
.Lfunc_end0:
        .size   f0, .Lfunc_end0-f0
        .cfi_endproc
                                        // -- End function
```

With the patch:

```
f0:                                     // @f0
        .cfi_startproc
// %bb.0:                               // %b0
        ldr     w8, [x0]
        add     w9, w8, #1              // =1
        cmp     w9, #17                 // =17
        csinc   w9, wzr, w8, eq
.LBB0_1:                                // %cmpxchg.start
                                        // =>This Inner Loop Header: Depth=1
        ldaxr   w10, [x0]
        cmp     w10, w8
        b.ne    .LBB0_4
// %bb.2:                               // %cmpxchg.fencedstore
                                        //   in Loop: Header=BB0_1 Depth=1
        stlxr   w10, w9, [x0]
        cbnz    w10, .LBB0_1
// %bb.3:
        mov     w0, #321
        ret
.LBB0_4:                                // %cmpxchg.nostore
        clrex
        mov     w0, #123
        ret
.Lfunc_end0:
        .size   f0, .Lfunc_end0-f0
        .cfi_endproc
                                        // -- End function
```
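For readers following along, a hypothetical C++ equivalent of the test, reconstructed from the assembly above; the actual lit test is LLVM IR, and the memory orderings and constants here are inferred, not copied from it:

```cpp
// Hypothetical reconstruction of the f0 test case, inferred from the
// assembly listings above.
int f0(int *p) {
  int expected = *p;
  int desired = (expected + 1 == 17) ? 0 : expected + 1;
  bool ok = __atomic_compare_exchange_n(p, &expected, desired, /*weak=*/false,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  // Without the patch, this select survives codegen as the csel (plus its
  // flag setup); with it, each cmpxchg exit block returns its constant.
  return ok ? 321 : 123;
}
```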
I'd like to hear from @t.p.northover on the ARM bits. He's been paying much closer attention than I have.
While this sort of thing seems like it _should_ be okay (and possibly it even is; after all, we're already running simplifycfg on ARM), the way LLVM expands operations into separate LL and SC calls at the IR level is basically an invalid thing to do, and therefore fragile. That makes changes like this, which trigger more/different transformations, seem somewhat scary.
I'd certainly be more comfortable making such a change once we stop emitting fragile LLSC loops at the IR level.
(The idea would be to have AtomicExpandPass emit an intrinsic for the inner ll/sc loop, which would get lowered very late in the MI pipeline, in order to guarantee that it's emitted without interference. The discussion of such has been restarted recently due to some of the RISCV atomic work, since RISCV is more precise in terms of its architectural requirements than most. I hope the result will be new functionality added to AtomicExpandPass which all targets can then be converted over to use.)
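A purely illustrative sketch of that idea; every name and signature below is hypothetical, since no such intrinsic exists in tree yet:

```cpp
// Illustrative only: nothing below is in-tree code.  The point is that the
// whole LL/SC loop becomes a single opaque call that mid-level passes cannot
// rearrange, and a late MI pass expands it into the real exclusive loop.
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Module.h"
using namespace llvm;

static Value *emitOpaqueLLSCLoop(IRBuilder<> &Builder, Value *Addr,
                                 Value *NewVal) {
  Module *M = Builder.GetInsertBlock()->getModule();
  // A real implementation would declare a proper target intrinsic; a plain
  // named declaration stands in for it in this sketch.
  FunctionCallee Loop = M->getOrInsertFunction(
      "hypothetical.atomicrmw.llsc.loop", NewVal->getType(), Addr->getType(),
      NewVal->getType());
  // Mid-level passes can't peer inside the call, so they can't break the
  // exclusive-monitor invariants the way they can with open-coded LL/SC IR.
  return Builder.CreateCall(Loop, {Addr, NewVal});
}
```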