This is an archive of the discontinued LLVM Phabricator instance.

Differential D57854

[InstCombine] Optimize `atomicrmw <op>, 0` into `load atomic` when possible
ClosedPublic

Authored by qcolombet on Feb 6 2019, 3:07 PM.

Details

Reviewers

spatel
majnemer
jfb

Commits

rG96f54de8ff5d: [InstCombine] Optimize `atomicrmw <op>, 0` into `load atomic` when possible
rL353471: [InstCombine] Optimize `atomicrmw <op>, 0` into `load atomic` when possible

Summary

This commit teaches InstCombine how to replace an atomicrmw operation
with a simple load atomic.
For a given atomicrmw <op>, this is possible when:

- The ordering of that operation is compatible with a load (i.e., anything that doesn't have release semantics).
- <op> does not modify the value being stored.
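
As a rough illustration of the source-level pattern involved (a sketch, not taken from the patch): in C++, a `fetch_add` of zero on a `std::atomic` is the kind of code a front end might lower to `atomicrmw add ..., 0`. The helper names `read_via_rmw` and `read_via_load` are hypothetical. Within a single thread both return the same value and leave the object unchanged, which is the equivalence the summary relies on.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: reads the value with a no-op read-modify-write.
// Adding 0 writes back exactly what was read, so the contents do not change.
inline int read_via_rmw(std::atomic<int> &A) {
  return A.fetch_add(0, std::memory_order_acquire);
}

// Hypothetical helper: the plain atomic load that the patch would
// substitute for the read-modify-write above.
inline int read_via_load(std::atomic<int> &A) {
  return A.load(std::memory_order_acquire);
}
```

This only shows that the two forms agree on the returned value; it says nothing about how they interact with other threads.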

Diff Detail

Repository: rL LLVM

Event Timeline

qcolombet created this revision. Feb 6 2019, 3:07 PM

Herald added a project: Restricted Project. Feb 6 2019, 3:07 PM

Herald added subscribers: jfb, mgorny.

jfb added inline comments. Feb 6 2019, 3:14 PM

lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp
27 ↗	(On Diff #185651)	I don't think you should modify `volatile` this way.
32 ↗	(On Diff #185651)	This doesn't capture seq_cst. I think you want to list the valid ones: NotAtomic / Unordered / Monotonic / Acquire.
36 ↗	(On Diff #185651)	This drops SyncScope.

qcolombet marked 3 inline comments as done. Feb 6 2019, 4:20 PM

qcolombet added inline comments.

lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp
27 ↗	(On Diff #185651)	Good point, I'm killing the volatile store in the process, which is invalid.
32 ↗	(On Diff #185651)	Ok
36 ↗	(On Diff #185651)	What do you mean?

jfb added inline comments. Feb 6 2019, 4:21 PM

lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp
36 ↗	(On Diff #185651)	Oops ignore me on this one! I misread.

FWIW you might find wg21.link/n4455 relevant :)

Update:

- Switch to listing valid orderings instead of invalid ones
- Don't apply the transformation on volatile atomicrmw
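
The difference between the two approaches to the ordering check can be sketched as follows. The first diff is not reproduced on this page, so the deny-list below is an assumed reconstruction of the kind of check that "doesn't capture seq_cst"; the `Ordering` enum is a simplified stand-in for `llvm::AtomicOrdering`, and both function names are hypothetical.

```cpp
#include <cassert>

// Simplified stand-in for llvm::AtomicOrdering, restricted to the
// orderings that are legal on an atomicrmw.
enum class Ordering {
  Monotonic,
  Acquire,
  Release,
  AcquireRelease,
  SequentiallyConsistent
};

// Deny-list (assumed shape of the original check): rejects only the
// orderings that were named explicitly. SequentiallyConsistent also has
// release semantics, but it is not listed, so it is wrongly accepted.
inline bool loadCompatibleDenyList(Ordering O) {
  return O != Ordering::Release && O != Ordering::AcquireRelease;
}

// Allow-list: accepts only orderings that carry no release semantics.
// Anything not named here is rejected by default.
inline bool loadCompatibleAllowList(Ordering O) {
  return O == Ordering::Monotonic || O == Ordering::Acquire;
}
```

With the allow-list, forgetting an ordering makes the transformation too conservative rather than incorrect.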

Thanks for the link @jfb.

One fix then this LGTM

lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp
39 ↗	(On Diff #185675)	Agreed NotAtomic is bonkers (`AtomicRMWInst::Init` disallows it), but Unordered might be sensible, @reames would know.

This revision is now accepted and ready to land. Feb 6 2019, 4:46 PM

qcolombet marked an inline comment as done. Feb 7 2019, 10:00 AM

qcolombet added inline comments.

lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp
39 ↗	(On Diff #185675)	From the language reference (http://llvm.org/docs/LangRef.html#atomic-memory-ordering-constraints): "Unordered [...] This ordering cannot be specified for read-modify-write operations; it is not strong enough to make them atomic in any interesting way." And if I try to write a test case with unordered I get: `error: atomicrmw cannot be unordered`. So I think we are good :).

jfb added inline comments. Feb 7 2019, 10:31 AM

lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp
39 ↗	(On Diff #185675)	I did not know this! Ignore me then.

Closed by commit rL353471: [InstCombine] Optimize `atomicrmw <op>, 0` into `load atomic` when possible (authored by qcolombet). Feb 7 2019, 1:27 PM

This revision was automatically updated to reflect the committed changes.

reames added inline comments. Feb 8 2019, 10:05 PM

llvm/trunk/lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp
21	Any plans to handle the other obvious cases such as xor, and, etc? On the inverse side, we can do the same thing converting to a blind store. e.g. and w/zero, or max w/INT_MAX. Just curious as to what plans here are.
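
To make the two families of cases in this comment concrete, here is a small sketch for 32-bit values. It is not LLVM code: the `Op` enum and both function names are hypothetical. An "identity" operand leaves the stored value unchanged (the case this patch handles for add, sub and or), while an "absorbing" operand makes the result independent of the old value (the case that could become a blind store).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the atomicrmw operation kinds discussed here.
enum class Op { Add, Sub, Or, Xor, And, Max, Min, UMax, UMin };

// True when `X op C == X` for every 32-bit X, i.e. the RMW does not
// change the value in memory.
inline bool isIdentityOperand(Op O, int32_t C) {
  switch (O) {
  case Op::Add:
  case Op::Sub:
  case Op::Or:
  case Op::Xor:
  case Op::UMax:
    return C == 0;
  case Op::And:
  case Op::UMin:
    return C == -1; // all bits set; UINT32_MAX when viewed as unsigned
  case Op::Max:
    return C == std::numeric_limits<int32_t>::min();
  case Op::Min:
    return C == std::numeric_limits<int32_t>::max();
  }
  return false;
}

// True when `X op C == C` for every 32-bit X, i.e. the value written
// does not depend on the value read.
inline bool isAbsorbingOperand(Op O, int32_t C) {
  switch (O) {
  case Op::And:
  case Op::UMin:
    return C == 0;
  case Op::Or:
  case Op::UMax:
    return C == -1; // all bits set
  case Op::Max:
    return C == std::numeric_limits<int32_t>::max();
  case Op::Min:
    return C == std::numeric_limits<int32_t>::min();
  default:
    return false; // add, sub and xor have no absorbing operand
  }
}
```

For example, `isIdentityOperand(Op::Xor, 0)` and `isAbsorbingOperand(Op::And, 0)` both hold, matching the "xor" and "and w/zero" cases mentioned above.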

qcolombet marked an inline comment as done. Feb 12 2019, 9:52 AM

qcolombet added inline comments.

llvm/trunk/lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp
21	That would certainly make sense. I didn't see motivating examples yet, but we could just write test cases that exercise those. I'll take a look soon-ish unless you beat me to it :).

Herald added a subscriber: jdoerfert. Feb 12 2019, 9:52 AM

JonChesterfield added a subscriber: JonChesterfield. Jan 7 2023, 2:53 PM

Herald added a project: Restricted Project. Jan 7 2023, 2:53 PM

I believe that replacing `atomicrmw <op> LHS, 0` with `load atomic LHS` is not sound. There's an extensive discussion in a GitHub issue, https://github.com/llvm/llvm-project/issues/56450, which roughly concludes that this patch should be amended or reverted, but it doesn't seem to have tracked across to here.

In particular, anyone programming exclusively using thread fences, relaxed loads and stores, and relaxed atomicrmw, is going to have a bad time when this optimisation kicks in. That's roughly the model that nvptx steers one towards. I can work around this miscompilation by remembering to mark rmw operations as acquire-release instead of relaxed but I'd rather we fix it in trunk instead.
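
For concreteness, a minimal sketch of that programming style in C++ (the helper name is hypothetical and this is not code from the linked issue). In the C++ memory model, an atomic read-modify-write always reads the last value in the object's modification order before its own write, and because it also writes, a preceding release fence can synchronize through it. A plain relaxed load has neither property, which is why the two are not interchangeable in general. The specific failing scenarios are discussed in the issue and are not reproduced here.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: relaxed operations ordered only by explicit fences.
inline int observe_with_relaxed_rmw(std::atomic<int> &Flag) {
  // Release fence: orders earlier writes before the RMW's store below.
  std::atomic_thread_fence(std::memory_order_release);
  // No-op RMW: adds 0 and returns the previous value. If this were
  // rewritten into a relaxed load, there would be no store for the
  // release fence above to act through.
  int Seen = Flag.fetch_add(0, std::memory_order_relaxed);
  // Acquire fence: orders the read above before later accesses.
  std::atomic_thread_fence(std::memory_order_acquire);
  return Seen;
}
```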

Hi @JonChesterfield ,

Reading through that issue, I agree.
Reverting it in https://reviews.llvm.org/D141277.

Cheers,
-Quentin

Herald added a subscriber: StephenFan. Jan 9 2023, 5:34 AM

Revision Contents

Path	Size
llvm/trunk/lib/Transforms/InstCombine/CMakeLists.txt	1 line
llvm/trunk/lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp	48 lines
llvm/trunk/lib/Transforms/InstCombine/InstCombineInternal.h	1 line
llvm/trunk/test/Transforms/InstCombine/atomicrmw.ll	84 lines

Diff 185860

llvm/trunk/lib/Transforms/InstCombine/CMakeLists.txt

  set(LLVM_TARGET_DEFINITIONS InstCombineTables.td)
  tablegen(LLVM InstCombineTables.inc -gen-searchable-tables)
  add_public_tablegen_target(InstCombineTableGen)

  add_llvm_library(LLVMInstCombine
    InstructionCombining.cpp
    InstCombineAddSub.cpp
+   InstCombineAtomicRMW.cpp
    InstCombineAndOrXor.cpp
    InstCombineCalls.cpp
    InstCombineCasts.cpp
    InstCombineCompares.cpp
    InstCombineLoadStoreAlloca.cpp
    InstCombineMulDivRem.cpp
    InstCombinePHI.cpp
    InstCombineSelect.cpp
(11 more lines not shown)

llvm/trunk/lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp

//===- InstCombineAtomicRMW.cpp -------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file implements the visit functions for atomic rmw instructions.
//
//===----------------------------------------------------------------------===//
#include "InstCombineInternal.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

Instruction *InstCombiner::visitAtomicRMWInst(AtomicRMWInst &RMWI) {
  switch (RMWI.getOperation()) {
  default:
    break;
  case AtomicRMWInst::Add:
  case AtomicRMWInst::Sub:
  case AtomicRMWInst::Or:
    // Replace atomicrmw <op> addr, 0 => load atomic addr.

    // Volatile RMWs perform a load and a store, we cannot replace
    // this by just a load.
    if (RMWI.isVolatile())
      break;

    auto *CI = dyn_cast<ConstantInt>(RMWI.getValOperand());
    if (!CI || !CI->isZero())
      break;
    // Check if the required ordering is compatible with an
    // atomic load.
    AtomicOrdering Ordering = RMWI.getOrdering();
    assert(Ordering != AtomicOrdering::NotAtomic &&
           Ordering != AtomicOrdering::Unordered &&
           "AtomicRMWs don't make sense with Unordered or NotAtomic");
    if (Ordering != AtomicOrdering::Acquire &&
        Ordering != AtomicOrdering::Monotonic)
      break;
    LoadInst *Load = new LoadInst(RMWI.getType(), RMWI.getPointerOperand());
    Load->setAtomic(Ordering, RMWI.getSyncScopeID());
    return Load;
  }
  return nullptr;
}

llvm/trunk/lib/Transforms/InstCombine/InstCombineInternal.h

(395 preceding lines not shown)
  public:
    Instruction *SliceUpIllegalIntegerPHI(PHINode &PN);
    Instruction *visitPHINode(PHINode &PN);
    Instruction *visitGetElementPtrInst(GetElementPtrInst &GEP);
    Instruction *visitAllocaInst(AllocaInst &AI);
    Instruction *visitAllocSite(Instruction &FI);
    Instruction *visitFree(CallInst &FI);
    Instruction *visitLoadInst(LoadInst &LI);
    Instruction *visitStoreInst(StoreInst &SI);
+   Instruction *visitAtomicRMWInst(AtomicRMWInst &SI);
    Instruction *visitBranchInst(BranchInst &BI);
    Instruction *visitFenceInst(FenceInst &FI);
    Instruction *visitSwitchInst(SwitchInst &SI);
    Instruction *visitReturnInst(ReturnInst &RI);
    Instruction *visitInsertValueInst(InsertValueInst &IV);
    Instruction *visitInsertElementInst(InsertElementInst &IE);
    Instruction *visitExtractElementInst(ExtractElementInst &EI);
    Instruction *visitShuffleVectorInst(ShuffleVectorInst &SVI);
(532 following lines not shown)

llvm/trunk/test/Transforms/InstCombine/atomicrmw.ll

; RUN: opt -instcombine -S -o - %s | FileCheck %s
; Check that we can replace `atomicrmw <op> LHS, 0` with `load atomic LHS`.
; This is possible when:
; - <op> LHS, 0 == LHS
; - the ordering of atomicrmw is compatible with a load (i.e., no release semantic)

; CHECK-LABEL: atomic_add_zero
; CHECK-NEXT: %res = load atomic i32, i32* %addr monotonic, align 4
; CHECK-NEXT: ret i32 %res
define i32 @atomic_add_zero(i32* %addr) {
  %res = atomicrmw add i32* %addr, i32 0 monotonic
  ret i32 %res
}

; Don't transform volatile atomicrmw. This would eliminate a volatile store
; otherwise.
; CHECK-LABEL: atomic_sub_zero_volatile
; CHECK-NEXT: %res = atomicrmw volatile sub i64* %addr, i64 0 acquire
; CHECK-NEXT: ret i64 %res
define i64 @atomic_sub_zero_volatile(i64* %addr) {
  %res = atomicrmw volatile sub i64* %addr, i64 0 acquire
  ret i64 %res
}


; Check that the transformation properly preserve the syncscope.
; CHECK-LABEL: atomic_or_zero
; CHECK-NEXT: %res = load atomic i16, i16* %addr syncscope("some_syncscope") acquire, align 2
; CHECK-NEXT: ret i16 %res
define i16 @atomic_or_zero(i16* %addr) {
  %res = atomicrmw or i16* %addr, i16 0 syncscope("some_syncscope") acquire
  ret i16 %res
}

; Don't transform seq_cst ordering.
; By eliminating the store part of the atomicrmw, we would get rid of the
; release semantic, which is incorrect.
; CHECK-LABEL: atomic_or_zero_seq_cst
; CHECK-NEXT: %res = atomicrmw or i16* %addr, i16 0 seq_cst
; CHECK-NEXT: ret i16 %res
define i16 @atomic_or_zero_seq_cst(i16* %addr) {
  %res = atomicrmw or i16* %addr, i16 0 seq_cst
  ret i16 %res
}

; Check that the transformation does not apply when the value is changed by
; the atomic operation (non zero constant).
; CHECK-LABEL: atomic_or_non_zero
; CHECK-NEXT: %res = atomicrmw or i16* %addr, i16 2 monotonic
; CHECK-NEXT: ret i16 %res
define i16 @atomic_or_non_zero(i16* %addr) {
  %res = atomicrmw or i16* %addr, i16 2 monotonic
  ret i16 %res
}

; Check that the transformation does not apply when the value is changed by
; the atomic operation (xor operation with zero).
; CHECK-LABEL: atomic_xor_zero
; CHECK-NEXT: %res = atomicrmw xor i16* %addr, i16 0 monotonic
; CHECK-NEXT: ret i16 %res
define i16 @atomic_xor_zero(i16* %addr) {
  %res = atomicrmw xor i16* %addr, i16 0 monotonic
  ret i16 %res
}

; Check that the transformation does not apply when the ordering is
; incompatible with a load (release).
; CHECK-LABEL: atomic_or_zero_release
; CHECK-NEXT: %res = atomicrmw or i16* %addr, i16 0 release
; CHECK-NEXT: ret i16 %res
define i16 @atomic_or_zero_release(i16* %addr) {
  %res = atomicrmw or i16* %addr, i16 0 release
  ret i16 %res
}

; Check that the transformation does not apply when the ordering is
; incompatible with a load (acquire, release).
; CHECK-LABEL: atomic_or_zero_acq_rel
; CHECK-NEXT: %res = atomicrmw or i16* %addr, i16 0 acq_rel
; CHECK-NEXT: ret i16 %res
define i16 @atomic_or_zero_acq_rel(i16* %addr) {
  %res = atomicrmw or i16* %addr, i16 0 acq_rel
  ret i16 %res
}
