This is an archive of the discontinued LLVM Phabricator instance.

ARMv8.1 support for LLVM AArch64
ClosedPublic

Authored by ajasty-cavium on May 25 2017, 11:19 PM.

Details

Summary

Proposed patch for ARMv8.1 Large System Extensions support.

Currently missing support for CASP, NAND (not supported by LSE instructions), and subword SUB/AND.

Requesting comments; I'm looking into fixing subword SUB/AND and writing tests (currently tested by atomic-ops.ll with -mcpu=thunderx2t99). Also, ATOMIC_LOAD_CLR was added as a generic ISD opcode; I suspect I should move it to AArch64ISD. Feedback requested.

ATOMIC_LOAD_ADD with a discarded return value should probably be performed via a separate ISD node (ATOMIC_ADD, etc.). This is straightforward to implement and should increase performance; requesting comments on that too.

Finally, weaker memory orderings should be implemented, but even the LDX/STX path seems to use full acquire/release semantics even when __ATOMIC_RELAXED is specified. This will likely be looked into in a later patch.

Diff Detail

Repository
rL LLVM

Event Timeline

ajasty-cavium created this revision.May 25 2017, 11:19 PM
ajasty-cavium edited the summary of this revision. (Show Details)May 25 2017, 11:21 PM
ajasty-cavium edited the summary of this revision. (Show Details)May 25 2017, 11:29 PM

Hi, can you please re-share the patch with more context?

git diff -U9999

would do.

Thanks!

lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2664 ↗(On Diff #100368)

Wouldn't this be better inside SelectCMP_SWAP?

t.p.northover edited edge metadata.May 26 2017, 6:57 AM

One other thing you have to be careful of: if an instruction loads into xzr or wzr then it's actually an "STwhatever" operation and defined not to have acquire semantics regardless of its mnemonic. So you probably have to modify AArch64DeadRegisterDefinitionsPass.cpp to ignore these instructions.
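A minimal sketch of where such a check could sit in the pass, assuming a hypothetical atomicReadDroppedOnZero() predicate over machine opcodes (illustrative only, not the change that was eventually committed):

// Inside AArch64DeadRegisterDefinitions::processMachineBasicBlock(), before a
// dead definition is rewritten to use WZR/XZR:
for (MachineInstr &MI : MBB) {
  // Writing the zero register turns an acquiring LD<OP>/SWP into its store
  // form and silently drops the acquire semantics, so keep the real
  // destination register for these instructions.
  if (atomicReadDroppedOnZero(MI.getOpcode()))
    continue;
  // ... existing dead-definition replacement logic ...
}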

Also, you should add regression tests under tests/CodeGen/AArch64 to make sure the instructions you're expecting get emitted.

include/llvm/CodeGen/ISDOpcodes.h
763 ↗(On Diff #100368)

This doesn't need to be a global ISD opcode since only AArch64 ever uses it. It can be added to the list in AArch64ISelLowering.h instead.

Actually, you don't even need any special ISD handling for SUB or CLR. You can make an SDNodeXForm that inserts the appropriate - or ~ during selection. Something like

def i32not : Operand<i64>, SDNodeXForm<imm, [{
  return CurDAG->getTargetConstant(~N->getZExtValue(), SDLoc(N), MVT::i32);
}]>;

If you use that in the destination part of the Pat it'll all work automatically.

lib/Target/AArch64/AArch64ISelLowering.cpp
453 ↗(On Diff #100368)

Indentation and commented code here.

10620–10621 ↗(On Diff #100368)

How does this work with your special ISD handling above? It seems like it means it would never trigger for bytes and halfwords, but there are valid instructions there too.

lib/Target/AArch64/AArch64InstrAtomics.td
410 ↗(On Diff #100368)

It looks like this only ever emits the operations with the full barrier, it would be good to support the more relaxed access modes (acquire/release/relaxed) too.

multiclasses ought to be able to reduce a lot of the repetition, but what you'd fundamentally want to do is use PatFrags to cast the node to an AtomicSDNode and check its ordering. Maybe add an extra level in TargetSelectionDAG.td:

multiclass binary_atomic_op_ord {
  def #NAME#_monotonic : PatFrag<(ops node:$ptr, node:$val),
        (!cast<SDNode>(#NAME) node:$ptr, node:$val), [{
      return cast<AtomicSDNode>(N)->getOrdering() == AtomicOrdering::Monotonic;
    }]>;
   ...
}

then add a defm NAME#_8 : binary_atomic_op_ord; (and other sizes) to the binary_atomic_op multiclass.

After that you'd be able to check atomic_load_add_8_acquire and so on in these patterns.

Updated to a full-context diff.

One other thing you have to be careful of: if an instruction loads into xzr or wzr then it's actually an "STwhatever" operation and defined not to have acquire semantics regardless of its mnemonic. So you probably have to modify AArch64DeadRegisterDefinitionsPass.cpp to ignore these instructions.

Yeah, this was a major concern of mine; technically it's an invalid mnemonic with ldaddal x0, xzr, [p]. One solution was to custom lower all the atomic_load_X nodes where the result (operand 0) isn't used and create new AArch64ISD nodes for them. I wasn't thrilled about this for the first pass (because it seems to work anyway), but for the final version it's something to consider.

Also, you should add regression tests under tests/CodeGen/AArch64 to make sure the instructions you're expecting get emitted.

Adapting CodeGen/AArch64/atomic-ops.ll; it's partly in place and will be in the updated patch. Any other tests to look at for atomics? Also, I'm using -mcpu=thunderx2t99 for LSE since -mcpu=generic+lse doesn't work; is there another preferred mechanism?

Thank you for all the great feedback btw!

lib/Target/AArch64/AArch64ISelLowering.cpp
10620–10621 ↗(On Diff #100368)

Currently bytes and halfwords cause issues because I didn't get the bitcast right; for the moment everything that isn't cleanly covered falls back to LL/SC.

lib/Target/AArch64/AArch64InstrAtomics.td
410 ↗(On Diff #100368)

Agree completely on relaxed accesses, let me experiment with your PatFrag.

christof edited edge metadata.May 31 2017, 8:35 AM

One other thing you have to be careful of: if an instruction loads into xzr or wzr then it's actually an "STwhatever" operation and defined not to have acquire semantics regardless of its mnemonic. So you probably have to modify AArch64DeadRegisterDefinitionsPass.cpp to ignore these instructions.

Yeah, this was a major concern of mine; technically it's an invalid mnemonic with ldaddal x0, xzr, [p]. One solution was to custom lower all the atomic_load_X nodes where the result (operand 0) isn't used and create new AArch64ISD nodes for them. I wasn't thrilled about this for the first pass (because it seems to work anyway), but for the final version it's something to consider.

I don't think you can say that it works that easily. The instruction will not implement the acquire semantics, so the memory model gets broken. Not an easy thing to track down once that happens. I think ignoring these instructions in AArch64DeadRegisterDefinitionsPass.cpp as suggested is a good approach.

Also, I'm using -mcpu=thunderx2t99 for LSE since -mcpu=generic+lse doesn't work; is there another preferred mechanism?

Targeting -march=armv8.1-a should be enough. I gave your patch a spin: with -march=armv8.1-a -O1 I can see CASAL being used, but it is not used when passing -O0. I think this might be related to the comment in shouldExpandAtomicCmpXchgInIR().
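A hedged sketch of the direction hinted at here, assuming the hook still returns bool as in the tree at the time and that the patch's Subtarget->hasLSE() query is available (this is not the committed change):

bool AArch64TargetLowering::shouldExpandAtomicCmpXchgInIR(
    AtomicCmpXchgInst *AI) const {
  // At -O0 the LL/SC expansion is fragile because fast-regalloc may spill
  // between the exclusive load and store; with LSE there is no exclusive pair
  // to protect, so cmpxchg can be selected directly to CAS/CASA/CASL/CASAL.
  if (Subtarget->hasLSE())
    return false;
  return true;
}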

bmakam added a subscriber: bmakam.May 31 2017, 12:32 PM

One other thing you have to be careful of: if an instruction loads into xzr or wzr then it's actually an "STwhatever" operation and defined not to have acquire semantics regardless of its mnemonic. So you probably have to modify AArch64DeadRegisterDefinitionsPass.cpp to ignore these instructions.

Yeah, this was a major concern of mine; technically it's an invalid mnemonic with ldaddal x0, xzr, [p]. One solution was to custom lower all the atomic_load_X nodes where the result (operand 0) isn't used and create new AArch64ISD nodes for them. I wasn't thrilled about this for the first pass (because it seems to work anyway), but for the final version it's something to consider.

I don't think you can say that it works that easily. The instruction will not implement the acquire semantics, so the memory model gets broken. Not an easy thing to track down once that happens. I think ignoring these instructions in AArch64DeadRegisterDefinitionsPass.cpp as suggested is a good approach.

Understood, this gives us correctness; we will also have to ensure the dead-register pass checks the ordering and only allows WZR for release or weaker.

Also, I'm using -mcpu=thunderx2t99 for LSE since -mcpu=generic+lse doesn't work; is there another preferred mechanism?

Targeting -march=armv8.1-a should be enough. I gave your patch a spin: with -march=armv8.1-a -O1 I can see CASAL being used, but it is not used when passing -O0. I think this might be related to the comment in shouldExpandAtomicCmpXchgInIR().

I need to debug this; at -O0 there shouldn't be an issue with the split, since the reason for not splitting under fast-regalloc is to prevent a spill between the LDX/STX.

In my testing I see CASAL with '-O0 -march=armv8.1-a'; can you send me your -emit-llvm output?

include/llvm/CodeGen/ISDOpcodes.h
763 ↗(On Diff #100368)

One thing about moving this to the AArch64ISD namespace: I'm writing my own getNode support for ATOMIC_LOAD_CLR specifically; is there a less intrusive mechanism you can think of?

Understood, this gives us correctness; we will also have to ensure the dead-register pass checks the ordering and only allows WZR for release or weaker.

They're different instructions so that shouldn't be too difficult. FWIW I don't think any kind of AArch64ISD node is necessary or useful for this problem.

include/llvm/CodeGen/ISDOpcodes.h
763 ↗(On Diff #100368)

It doesn't seem like a terrible idea to relax the assert in SelectionDAG::getAtomic so that it allows target-specific nodes (by checking Opcode against FIRST_TARGET_MEMORY_OPCODE).
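Concretely, the relaxed check inside SelectionDAG::getAtomic might look something like the following sketch (abbreviated; the real assert lists more of the generic atomic opcodes):

assert((Opcode == ISD::ATOMIC_LOAD_ADD || Opcode == ISD::ATOMIC_LOAD_SUB ||
        Opcode == ISD::ATOMIC_LOAD_AND || Opcode == ISD::ATOMIC_SWAP ||
        // Let target-specific memory nodes (e.g. a target ATOMIC_LOAD_CLR)
        // through as well.
        Opcode >= ISD::FIRST_TARGET_MEMORY_OPCODE) &&
       "Invalid Atomic Op");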

Understood, this gives us correctness; we will also have to ensure the dead-register pass checks the ordering and only allows WZR for release or weaker.

After further digging, I believe that WZR can't be used for any of the LD<OP> instructions. If you do, you are actually encoding ST<OP>. The issue is that a memory barrier that synchronises on loads will synchronise on LD<OP>, but not on ST<OP>. Hence, this transformation might break the memory ordering specified by the program.

After further digging, I believe that WZR can't be used for any of the LD<OP> instructions. If you do, you are actually encoding ST<OP>.

That's pretty much true. "ST<OP> ..." is an alias for "LD<OP> [wx]zr" so you're allowed to write both and they mean the same thing, with one caveat: if the instruction has acquire semantics there is no ST<OP> alias for it. You can only write "ldaddal wzr, w1, [x0]" that way, not as "staddal w1, [x0]" for example.

The issue is that a memory barrier that synchronises on loads will synchronise on LD<OP>, but not on ST<OP>. Hence, this transformation might break the memory ordering specified by the program.

Right. And the dead definitions pass is the only thing likely to insert an XZR behind your back since the register is reserved. (Obviously an explicit pattern could do it too, but you'd only write that where it's valid).

After further digging, I believe that WZR can't be used for any of the LD<OP> instructions. If you do, you are actually encoding ST<OP>.

That's pretty much true. "ST<OP> ..." is an alias for "LD<OP> [wx]zr" so you're allowed to write both and they mean the same thing, with one caveat: if the instruction has acquire semantics there is no ST<OP> alias for it. You can only write "ldaddal wzr, w1, [x0]" that way, not as "staddal w1, [x0]" for example.

I expect you mean ldaddal w1, wzr, [x0]. Those instructions have their target register as the second operand. I dislike that instruction: there are no acquire semantics if the target register is wzr, even though the mnemonic makes it look like there are. It might be better if we diagnose such an instruction instead of using it. Also, I would not be surprised if that instruction behaves exactly the same as staddl w1, [x0].

The issue is that a memory barrier that synchronises on loads will synchronise on LD<OP>, but not on ST<OP>. Hence, this transformation might break the memory ordering specified by the program.

Right. And the dead definitions pass is the only thing likely to insert an XZR behind your back since the register is reserved. (Obviously an explicit pattern could do it too, but you'd only write that where it's valid).

That is certainly what it looks like to me. I suggest erring on the safe side and not using WZR/XZR as the target for any of these LD<OP> instructions.

That is certainly what it looks like to me. I suggest erring on the safe side and not using WZR/XZR as the target for any of these LD<OP> instructions.

I think that's a bit strong; we know exactly in what circumstances ZR can be used safely, and about the only way we can ban it in the dead-register pass is with an explicit list of instructions anyway, so just leave the ones where it's valid out of that list.

Oh, and remember this same issue applies to SWP (but not CAS).

Updated with dead-register checking, added test.

Temporarily removed SUB/AND, will commit as separate patch.

Updated with full context.

Seems like a good start for 8.1-A atomics to me. There are some things that might be improved upon, like supporting weaker orderings. Do you plan on doing that work as well?
Just a few questions inline.

lib/Target/AArch64/AArch64DeadRegisterDefinitionsPass.cpp
88 ↗(On Diff #101591)

I think this will also catch a load-exclusive. As far as I know those are OK with XZR/WZR as the target register. The only problematic ones are the atomicrmw operations. These are SWP and the LD<OP> where <OP> is ADD, CLR, EOR, SET, SMAX, SMIN, UMAX, and UMIN.

But then again, at least it is safe to overestimate.

lib/Target/AArch64/AArch64ISelLowering.cpp
10586 ↗(On Diff #101591)

Any reason for expanding cmpxchg again? It was not mentioned in your comment and the test you added will probably fail on it.

test/CodeGen/AArch64/atomic-ops-lse.ll
1–2 ↗(On Diff #101591)

Why not -mattr=+v8.1a instead of -mcpu=thunderx2t99?

ajasty-cavium marked an inline comment as done.Jun 7 2017, 11:53 AM

Seems like a good start for 8.1-A atomics to me. There are some things that might be improved upon, like supporting weaker orderings. Do you plan on doing that work as well?

Absolutely, along with SUB/AND support. Also, for atomic orderings without acquire/seq_cst semantics I'll re-enable the dead-register substitution and transform to ST(OP).

Would prefer to land this patch before adding the weaker ordering, but ST(OP) is dependent on weaker ordering. In any case this should perform equal to, or better than, the current LDX/STX on hardware that supports it.

In any case this should perform equal to, or better than, the current LDX/STX on hardware that supports it.

For seq_cst and acq_rel accesses, yes. Switching from ldxr/stxr to instructions with full barriers is rather more dubious though. As long as you're planning to support them soon though, I've got no real objections.

lib/Target/AArch64/AArch64DeadRegisterDefinitionsPass.cpp
88 ↗(On Diff #101591)

It'll also catch plain "load atomic" operations. I think a blacklist of instructions is probably best.

Would prefer to land this patch before adding the weaker ordering, but ST(OP) is dependent on weaker ordering. In any case this should perform equal to, or better than, the current LDX/STX on hardware that supports it.

For that to happen, you need to get test/CodeGen/AArch64/atomic-ops-lse.ll and shouldExpandAtomicCmpXchgInIR() to agree with each other. As it stands, the test will fail because it expects CAS while the code is now expanding it. See my previous comment.

Also, I think it is best to follow Tim's suggestion and make AArch64DeadRegisterDefinitionsPass.cpp match only the AArch64::LD<OP>[A|L|AL] instructions instead of the blanket isAtomic(). The danger here is that somebody will see a regression when this is committed; typically that means the patch that causes it is reverted if not fixed within the day. Better not to introduce that regression.

ajasty-cavium marked an inline comment as done.

Fixed the dead-register blacklist, and disabled expand-in-IR for cmp-swap. Passes make check and the individual test. Changed the test invocation to -mattr=+lse.

christof accepted this revision.Jun 15 2017, 2:37 AM

Looks good to me.
If you could still add a comment in the source on why you blacklist these instructions in the DeadRegisterDefinitionsPass, that would be appreciated.
Thanks for the nice patch. Looking forward to the patterns for the more relaxed memory models.

lib/Target/AArch64/AArch64DeadRegisterDefinitionsPass.cpp
89 ↗(On Diff #102203)

If you can leave a comment here explaining the reason for this blacklisting, that would be great for future eyes. Something along the lines of:
// The atomic instructions that are part of LSE have different memory-ordering semantics when they target the zero register.

This revision is now accepted and ready to land.Jun 15 2017, 2:37 AM
christof requested changes to this revision.Jun 15 2017, 3:45 AM

Sorry, I was too quick. That blacklist is not working properly. Also, it would be good to add some tests that show the blacklisting in operation. I just tried an example:

define void @test_atomic_load_add_i32(i32 %offset) nounwind {
  %old = atomicrmw add i32* @var32, i32 %offset seq_cst
  ret void
}

Which should not use WZR, but still uses it with the latest patch.

lib/Target/AArch64/AArch64DeadRegisterDefinitionsPass.cpp
99 ↗(On Diff #102203)

Hold on. These are SelectionDAG opcode names, but this pass actually runs just before register allocation, after all DAG lowering is done, does it not?
So you'll need to use AArch64::LDADDALs and such here, or you will never match these instructions.
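A hypothetical sketch of a blacklist keyed on the post-selection machine opcodes; the exact opcode spellings depend on how the .td definitions name the instructions, so the cases below are illustrative only:

static bool atomicReadDroppedOnZero(unsigned Opcode) {
  switch (Opcode) {
  case AArch64::LDADDALs:  // 32-bit add, acquire+release
  case AArch64::LDADDALd:  // 64-bit add, acquire+release
  case AArch64::SWPALs:
  case AArch64::SWPALd:
    // ... plus the other LD<OP>/SWP size and ordering variants ...
    return true;
  }
  return false;
}

The dead-register pass can then skip any MachineInstr whose opcode matches this list before substituting WZR/XZR.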

This revision now requires changes to proceed.Jun 15 2017, 3:45 AM

Found something else.

lib/Target/AArch64/AArch64InstrAtomics.td
445 ↗(On Diff #102203)

The CAS and CASP instructions have a different operand order. The correct order is:
CASALb $Rs, $Rt, $Rn
This also applies to the patterns below.

test/CodeGen/AArch64/atomic-ops-lse.ll
471 ↗(On Diff #102203)

I was wondering if you can check the operand order here. I think that since OLD and NEW are directly coming from the function parameters, you should be able to match on w0 and w1 directly. Meaning this could be:

casal w0, w1, [x[[ADDR]]]

That catches any unintended swapping of operands. This holds for all the cmpxchg tests.

ajasty-cavium edited edge metadata.
ajasty-cavium marked 2 inline comments as done.

Fixed CAS, renamed operands to "old/new" for clarity. Also updated test to use x0/x1 to confirm proper operand order.

Also reimplemented the blacklist using the actual AArch64 machine opcodes.

Fixed CAS, renamed operands to "old/new" for clarity. Also updated test to use x0/x1 to confirm proper operand order.

Great!

Also reimplemented the blacklist using the actual AArch64 machine opcodes.

Did you test that this blacklist now indeed works as expected? If you have such a test, can you add it? Thanks.

ajasty-cavium marked 4 inline comments as done.Jun 19 2017, 2:33 PM

Also reimplemented the blacklist using the actual AArch64 machine opcodes.

Did you test that this blacklist now indeed works as expected? If you have such a test, can you add it? Thanks.

Tested using my development code scrap; will have to make many more test cases when ST(OP)L and barriers are enabled:

78:   f8e8012a        ldaddal x8, x10, [x9]
7c:   f8e82128        ldeoral x8, x8, [x9]

__atomic_fetch_add(&i, l, __ATOMIC_RELAXED);
      j = __sync_fetch_and_xor(&i, l);
lib/Target/AArch64/AArch64DeadRegisterDefinitionsPass.cpp
88 ↗(On Diff #101591)

Did not think that LDX would be issued with an XZR target, though now I can see it happening for a seq_cst atomic store. Good catch.

88 ↗(On Diff #101591)

Point taken, will look at the blacklist.

lib/Target/AArch64/AArch64ISelLowering.cpp
10586 ↗(On Diff #101591)

Reversed this.

lib/Target/AArch64/AArch64InstrAtomics.td
445 ↗(On Diff #102203)

My first development test case used __atomic_compare_exchange_n without checking the behavior (it could see false positives from the builtin, as LLVM compares expected and desired). Reworked my test to actually run the CAS in 4 cases and confirm the returned value (not just the success bool) as appropriate; it seems to handle the cases properly.

Also renamed operands for clarity.

test/CodeGen/AArch64/atomic-ops-lse.ll
1–2 ↗(On Diff #101591)

Actually +v8.1a does not give you LSE, but I found that -mattr=+lse does. Will add it to the updated patch.

ajasty-cavium marked 2 inline comments as done.

Added additional test cases for dead-register.

christof accepted this revision.Jun 21 2017, 1:43 AM

OK, it looks good to me now. Thanks.

This revision is now accepted and ready to land.Jun 21 2017, 1:43 AM
ajasty-cavium accepted this revision.Jun 21 2017, 2:33 AM

Thank you, Chris. I do not have commit rights; should I forward this to the mailing list?

christof added a comment.EditedJun 21 2017, 2:56 AM

Thank you, Chris. I do not have commit rights; should I forward this to the mailing list?

I can commit it on your behalf if you want me to.

Thank you, Chris. I do not have commit rights; should I forward this to the mailing list?

I can commit it on your behalf if you want me to.

Would greatly appreciate it!

I can commit it on your behalf if you want me to.

Would greatly appreciate it!

You want me to use the summary as commit message?

I can commit it on your behalf if you want me to.

Would greatly appreciate it!

You want me to use the summary as commit message?

commit fca5da5b3fa75a7aff1ac622d3a1af44deb0d369
Author: Ananth Jasty <ajasty@cavium.com>
Date: Wed Jun 21 01:55:06 2017 -0700

[AARCH64][LSE] Preliminary support for ARMv8.1 LSE Atomics.

Implemented support in AArch64 codegen for the ARMv8.1 Large System Extensions atomic instructions. Where supported, these instructions can provide atomic operations with higher performance.

Currently supported operations include: fetch_add, fetch_or, fetch_xor, fetch_smin, fetch_min/max (signed and unsigned), swap, and compare_exchange.

This implementation implies sequential-consistency ordering; more relaxed orderings are under development.

Subtarget->hasLSE is currently supported for Cavium ThunderX2T99.

Differential Revision: https://reviews.llvm.org/D33586
This revision was automatically updated to reflect the committed changes.

Is it possible that the tests were excluded from the commit?

Is it possible that the tests were excluded from the commit?

Ugh, yes. It seems like I missed adding the new file after applying this patch. I'll commit it now. Thanks for flagging it.