Download Raw Diff

Details

Reviewers

joel_k_jones
fhahn
evandro
mcrosier

Commits

rG0715092c65ba: [AArch64] Enable aggressive FMA on T99 and provide AArch64 options for others.
rL323474: [AArch64] Enable aggressive FMA on T99 and provide AArch64 options for others.

Summary

This patch enables aggressive FMA by default on T99, and provides a -mllvm option to enable the same on other AArch64 micro-arch's (-mllvm -aarch64-enable-aggressive-fma).

Test case demonstrating the effects on T99 is included.

Diff Detail

Repository: rL LLVM

Event Timeline

steleman created this revision.Nov 30 2017, 5:19 PM

Herald added subscribers: kristof.beyls, javed.absar, rengolin, aemerson. · View Herald TranscriptNov 30 2017, 5:20 PM

steleman mentioned this in D40177: performance improvements for ThunderX2 T99.Nov 30 2017, 5:23 PM

My points from D40177 still stand.

In D40696#941537, @MatzeB wrote:

My points from D40177 still stand.

Your points from D40177 require unnecessary changes to too many files, and do not prevent changes to AArch64ISelLowering.cpp whenever a new micro-arch wants to enable aggressive FMA.

Furthermore, every single time a new micro-arch wants to enable aggressive FMA, they will always have to edit two TU's instead of just one: AArch64.td and AArch64ISelLowering.cpp, instead of just adding a new line to the existing switch() statement in AArch64TargetLowering::enableAggressiveFMAFusion().

In D40696#941556, @steleman wrote:

In D40696#941537, @MatzeB wrote:

My points from D40177 still stand.

Your points from D40177 require unnecessary changes to too many files, and do not prevent changes to AArch64ISelLowering.cpp whenever a new micro-arch wants to enable aggressive FMA.

Furthermore, every single time a new micro-arch wants to enable aggressive FMA, they will always have to edit two TU's instead of just one: AArch64.td and AArch64ISelLowering.cpp, instead of just adding a new line to the existing switch() statement in AArch64TargetLowering::enableAggressiveFMAFusion().

You would not need to edit AArch64ISelLowering again after adding a subtargetfeature.

fhahn added a subscriber: fhahn.Dec 1 2017, 1:27 AM

fhahn added inline comments.Dec 1 2017, 8:00 AM

test/CodeGen/AArch64/fma-aggressive.ll
1 ↗	(On Diff #125061)	I think it would be good to also add a check that aggressive FMA is not enabled for other micro architectures.
323 ↗	(On Diff #125061)	Can we drop the meta data from here to the end or is it required for the test?

steleman added inline comments.Dec 1 2017, 8:09 AM

test/CodeGen/AArch64/fma-aggressive.ll
1 ↗	(On Diff #125061)	OK I'll add another test named 'fma-not-aggressive.ll' :-)
323 ↗	(On Diff #125061)	I'll check and if it works without the metadata, I'll remove it.

fhahn added inline comments.Dec 1 2017, 9:21 AM

test/CodeGen/AArch64/fma-aggressive.ll
1 ↗	(On Diff #125061)	Thanks. You should be able just use a `check-prefix` in the same file, e.g. as in https://github.com/llvm-mirror/llvm/blob/master/test/CodeGen/AArch64/preferred-function-alignment.ll
323 ↗	(On Diff #125061)	Great thanks. On second look, I think there is no need for the test case to be that complicated, for example, do we need all those global variables, function calls, ifs, loops? AFAIK DagCombine only looks at nodes and uses and does not have a cost model tied to loop iterations. Unless I am missing something, it should be enough to have a couple of basic blocks with fadd/fmul instructions to test this, e.g. fadd/fmul with multiple users, or the more complex patterns guarded by `if (Aggressive)` in `visitFADDForFMACombine`. A much simpler test case will make it easier to debug if it fails in the future and also easier to understand when reviewing if it does the right thing.

steleman added inline comments.Dec 4 2017, 7:54 AM

test/CodeGen/AArch64/fma-aggressive.ll

323 ↗

(On Diff #125061)

It's not as simple as you think.

DAGCombiner.cpp:9636

// fold (fmul (fadd x, +1.0), y) -> (fma x, y, y)
// fold (fmul (fadd x, -1.0), y) -> (fma x, y, (fneg y))
auto FuseFADD = [&](SDValue X, SDValue Y) {
  if (X.getOpcode() == ISD::FADD && (Aggressive || X->hasOneUse())) {
    auto XC1 = isConstOrConstSplatFP(X.getOperand(1));
    if (XC1 && XC1->isExactlyValue(+1.0))
      return DAG.getNode(PreferredFusedOpcode, SL, VT, X.getOperand(0), Y, Y);
    if (XC1 && XC1->isExactlyValue(-1.0))
      return DAG.getNode(PreferredFusedOpcode, SL, VT, X.getOperand(0), Y,
                         DAG.getNode(ISD::FNEG, SL, VT, Y));
  }
  return SDValue();
};

Note:

if (X.getOpcode() == ISD::FADD && (Aggressive || X->hasOneUse()))

What we want to do here is make sure that X has more than one use. I.e. X->hasOneUse() evaluates to false. That's the only way of testing that FMA happens when Aggressive is true.

In order to make sure that X ends up having more than one use, the code needs to be complicated.

This code isn't lifted from some existing program. I wrote it specifically for this purpose. If there was no need for these loops, ifs, global variables, etc, I wouldn't have written them.

fhahn added inline comments.Dec 4 2017, 8:29 AM

test/CodeGen/AArch64/fma-aggressive.ll
323 ↗	(On Diff #125061)	You do not need complicated code to have get multiple uses. In the function below, %mul has 2 uses and will only be combined by DAGCombine, if aggressive FMA fusion is enabled (that's the pattern defined in DAGCombiner.cpp:9099) define double @test(double %x, double %y, double %z) { %mul = fmul fast double %x, %y %add = fadd fast double %mul, %z %use2 = fdiv fast double %mul, %add ret double %use2 }

On a related note, the MachineCombiner ignores enableAggressiveFMAFusion, so at -O3, the example won't be combined. I plan to put up a patch soon that updates the machine combiner to consider combining multiple uses, if profitable.

steleman added inline comments.Dec 5 2017, 9:11 AM

test/CodeGen/AArch64/fma-aggressive.ll
323 ↗	(On Diff #125061)	I will be happy to include your test case in a separate, and different test for Aggressive FMA, but I do not see how your test demonstrates Aggressive FMA as triggered by the compilation of a real program, written in a language that software developers realistically write software in. Also: your test does not provide any method of validating the results produced by enabling Aggressive FMA. Nor does it provide an easy method of detecting that some potential future changes have introduced a regression. For Aggressive FMA, I believe that it is important to have a real-life program that can be compiled and run, and whose results can be compared against the results produced by a different compiler, preferably on a different architecture.

Updated large test case to use --check-prefix={CHECK-FMA|CHECK-GENERIC}.
Included Florian Hahn's small test case for FMA.
Metadata in the large LLVM IR test case is required.

Thanks! I agree with Matthias in that a target feature would be better to enable/disable aggressive FMA. In AArch64.td there are already quite a lot of similar target features, like FeatureFuseAES or FeatureSlowSTRQro that enable certain optimizations for some micro architectures.

test/CodeGen/AArch64/fma-aggressive.ll
323 ↗	(On Diff #125061)	Also: your test does not provide any method of validating the results produced by enabling Aggressive FMA. Nor does it provide an easy method of detecting that some potential future changes have introduced a regression. Sorry, I just intended to highlight how to create a function that tests an isolated pattern that is triggered by aggressive FMA. Ideally we would have similar functions for all important patterns. Then the test should catch all relevant regressions in DAGCombine with aggressive FMA. For Aggressive FMA, I believe that it is important to have a real-life program that can be compiled and run, and whose results can be compared against the results produced by a different compiler, preferably on a different architecture. Agreed! But for that, the LLVM test-suite is probably a better place to make sure there are no performance regressions as it uses Clang directly, like users would, see http://llvm.org/docs/TestingGuide.html . `llvm/test` should contain small, isolated regression tests. I see why it is tempting to add a big test case like that to llvm/test, but 1) it still only guards against changes in Codegen, but opt or clang could mess up the IR to begin with and 2) it makes the regression test suite more fragile than it needs to be. If others are happy with the big test, I won't object though

MatzeB added inline comments.Dec 7 2017, 1:02 PM

test/CodeGen/AArch64/fma-aggressive.ll
323 ↗	(On Diff #125061)	Yes bigger tests are better of in the test-suite; you summed up llvms testing strategy nicely!

steleman added a child revision: D41810: test case for Aggressive FMA on AArch64.Jan 7 2018, 11:56 AM

Updated diff with latest changes:

Added FeatureAggressiveFMAFloat and FeatureAggressiveFMADouble in AArch64.td as micro-arch features.
Added EnableAggressiveFMA cl::opt in AArch64ISelLowering.cpp.
Implemented AArch64TargetLowering::enableAggressiveFMAFusion in terms of the changes from AArch64.td.
I would like the test in test/CodeGen/AArch64/fma-aggressive.ll to be included in this changeset, simply because the run-time test from D41810 has no way of determining whether or not AggressiveFMA was enabled, or not.

Ping!

It looks like the latest changes address the earlier concerns expressed in the review comments:

Use target feature: Present in AArch64.td, AArch64ISelLowering.h, AArch64ISelLowering.cpp, and AArch64Subtarget.h
Simplified test case: Present in test/CodeGen/AArch64/fma-simple.ll
Combined optimized/missed in .ll tests: Present in fma-aggressive.ll and fma-simple.ll

Is there anything else that is needed?

Thanks for updating the patch! I have a few suggestions about reducing fma-aggressive.ll , but besides that and removing aarch64-enable-aggressive-fma it looks quite good.

I would like the test in test/CodeGen/AArch64/fma-aggressive.ll to be included in this changeset, simply because the run-time test from D41810 has no way of determining whether or not AggressiveFMA was enabled, or not.

It's fine to have a more complex test with more patterns than test/CodeGen/AArch64/fma-simple.ll, but test/CodeGen/AArch64/fma-aggressive.ll still contains lots of stuff unrelated the FMA fusion, e.g. define double @reset(double %x, double %y) local_unnamed_addr #0 is not used, the attributes and debug metadata is not needed, the global variables are not needed, the calls to fprint/frwite and so on are not needed and the loop should have not have an impact on the DAGCombiner.

I think the following test function should still expose patterns very similar to the original fma-aggressive.ll, but only focuses on the relevant bits for fusion:

define double @test1(double %a, double %b, double %conv, i32 %rem) {
entry:
  %conv.neg = fsub fast double -0.000000e+00, %conv
  %cmp3 = icmp eq i32 %rem, 0
  %. = select i1 %cmp3, double 1.000000e+00, double -1.000000e+00
  %add = fadd fast double %a, 1.000000e+00
  %add8 = fadd fast double %add, %.
  %mul = fmul fast double %add8, %a
  %add9 = fadd fast double %mul, 1.000000e+01
  %mul10 = fmul fast double %add9, 2.000000e+00
  %add11 = fsub fast double %conv.neg, %mul
  %sub = fadd fast double %add11, %mul10
  %mul14 = fmul fast double %sub, %mul
  %mul15 = fmul fast double %mul14, %.
  %sub17 = fadd fast double %mul15, %mul14
 br i1 %cmp3, label %if.then22, label %if.else27

if.then22:
  %add23 = fadd fast double %sub17, -1.000000e+00
  %sub24 = fadd fast double %sub17, 1.000000e+00
  %mul25 = fmul fast double %add23, %sub24
  %sub26 = fsub fast double 1.000000e+00, %mul25
  br label %if.end32

if.else27:
  %sub28 = fsub fast double -1.000000e+00, %sub17
  %sub29 = fadd fast double %sub17, 1.000000e+00
  %mul30 = fmul fast double %sub28, %sub29
  %add31 = fadd fast double %mul30, 1.000000e+00
  br label %if.end32

if.end32:                                         ; preds = %if.else27, %if.then22
  %b.1 = phi double [ %sub26, %if.then22 ], [ %add31, %if.else27 ]
  %sub33 = fsub fast double 1.000000e+00, %sub17
  %mul34 = fmul fast double %sub17, %conv
  %mul35 = fmul fast double %mul34, %sub33
  %sub36 = fsub fast double %b.1, %.
  %add37 = fadd fast double %b.1, %mul35
  %add38 = fadd fast double %add37, %sub36
  ret double %add38
}

If you think this is not sufficient for some reason, please at least remove the unneeded metadata, unused functions from the test case, replace the calls to fprintf & co that are needed to have uses for the FMA results with simpler functions (or return the result) and replace the globals by adding parameters if required. Also the other ifs should not matter to the combiner either, so I would simplify the CFG there as well.

Simplified test case: Present in test/CodeGen/AArch64/fma-simple.ll

pending simplification in fma-aggressive, I would move this test function to fma-aggressive.ll too.

lib/Target/AArch64/AArch64.td
152 ↗	(On Diff #128884)	How likely is it that we aggressive FMA is beneficial for floats or double only? IMO it's probably enough to just have `-aggressive-fma` for now.
lib/Target/AArch64/AArch64ISelLowering.cpp
116 ↗	(On Diff #128884)	I think there is no need for this option any longer, `-mattr=+aggressive-fma-float+aggressive-fma-double` can be used instead.
10977 ↗	(On Diff #128884)	If we are only using subtarget features here, we can move this to the header I think.
test/CodeGen/AArch64/fma-simple.ll
2 ↗	(On Diff #128884)	Please also add a test that enables aggressive fma fusion using mattr for non-thunderX2.

steleman added inline comments.Jan 22 2018, 8:26 AM

lib/Target/AArch64/AArch64.td
152 ↗	(On Diff #128884)	Could you please clarify this? I.e. are you referring to quad-precision? T99 doesn't have quad-precision FMA. Only floats and doubles. Are there any AArch64 micro-arch'es that can do quad-precision FMA? Or are you asking about SIMD?

fhahn added inline comments.Jan 22 2018, 8:31 AM

lib/Target/AArch64/AArch64.td
152 ↗	(On Diff #128884)	Sorry, I meant I think a single option to enable aggressive FMA would be enough and separate options for float and double seems too fine grained, unless I am missing something :)

steleman added inline comments.Jan 22 2018, 8:42 AM

lib/Target/AArch64/AArch64.td
152 ↗	(On Diff #128884)	Aaah, OK. Got it. Here's my thinking: There may be some AArch64 micro-arch'es that would want to enable FMA for floats, but not doubles. Or doubles, but not floats. Because of latency, or some other constraint. So the idea is to allow this differentiation. Now that FMA is a Subtarget feature, if we don't have this differentiation between floats and doubles, then we're back to the original implementation: the micro-arch Subtarget will have to switch() on the CPU and EVT type in AArch64ISelLowering.cpp (AArch64TargetLowering::enableAggressiveFMAFusion()). That was frowned upon, which is why I created these two FMA Subtarget features. And then there's also FMLA and FMLS. For T99 it doesn't matter because all the FMA instructions have the same latency. But I'm not sure this applies to all the other AArch64 micro-arch'es.

fhahn added inline comments.Jan 22 2018, 9:06 AM

lib/Target/AArch64/AArch64.td
152 ↗	(On Diff #128884)	I see what you mean, I just think that we should make our lives more complicated only when there actually are micro-arch'es that want to enable aggressive FMA for double or float only ;) Now that FMA is a Subtarget feature, if we don't have this differentiation between floats and doubles, then we're back to the original implementation: the micro-arch Subtarget will have to switch() on the CPU and EVT type in AArch64ISelLowering.cpp (AArch64TargetLowering::enableAggressiveFMAFusion()). That was frowned upon, which is why I created these two FMA Subtarget features. I am not sure I follow. I think @MatzeB 's issue was using `getProcFamily()` and it should be fine to just check the subtarget feature in enableAggressiveFMAFusion. @MatzeB summarized some benefits of making it a subtarget feature here: https://reviews.llvm.org/D40177#936974

I am not sure I follow. I think @MatzeB 's issue was using getProcFamily() and it should be fine to just check the subtarget feature in enableAggressiveFMAFusion. @MatzeB summarized some benefits of making it a subtarget feature here: https://reviews.llvm.org/D40177#936974

If there's only one EnableAggressiveFMA boolean in AArch64.td:

What happens if a certain micro-arch wants to enable AggressiveFMA for scalar doubles and vector doubles, but not for scalar floats and vector floats? (as an example).

Won't they have to call getProcFamily() in enableAggressiveFMAFusion()?

As there is only one boolean in AArch64.td, AggressiveFMA will be either enabled for all the floating-point types, or disabled for all the floating-point types, scalars or vectors. The only way of knowing which particular floating-point type is being tested for AggressiveFMA is by testing the type of the EVT, and then making a decision based on getProcFamily() and EVT type.

Is it safe to speculate that each and every AArch64 micro-arch wants AggressiveFMA enabled or disabled for all possible floating-point types?

In D40696#984012, @steleman wrote:

I am not sure I follow. I think @MatzeB 's issue was using getProcFamily() and it should be fine to just check the subtarget feature in enableAggressiveFMAFusion. @MatzeB summarized some benefits of making it a subtarget feature here: https://reviews.llvm.org/D40177#936974

If there's only one EnableAggressiveFMA boolean in AArch64.td:

What happens if a certain micro-arch wants to enable AggressiveFMA for scalar doubles and vector doubles, but not for scalar floats and vector floats? (as an example).

Won't they have to call getProcFamily() in enableAggressiveFMAFusion()?

As there is only one boolean in AArch64.td, AggressiveFMA will be either enabled for all the floating-point types, or disabled for all the floating-point types, scalars or vectors. The only way of knowing which particular floating-point type is being tested for AggressiveFMA is by testing the type of the EVT, and then making a decision based on getProcFamily() and EVT type.

Is it safe to speculate that each and every AArch64 micro-arch wants AggressiveFMA enabled or disabled for all possible floating-point types?

My 2 cents: In general, we could think of probably thousands of subtarget features describing how different potential micro-architectures might handle specific instructions or sequences of instructions.
Implementing all of those thousands of subtarget features without knowing that it actually makes a difference for any of the micro-architectures that LLVM does know about doesn't make sense.
Following that same logic, I'd choose to have 1 subtarget feature introduced in this patch, assuming IIUC that you only really need one for the functionality you introduce in this patch.
If in the future it becomes clear there is a micro-architecture for which in LLVM we want to model different behaviour with respect to AggressiveFMA for floats, doubles, vectors, or something else, we can introduce new subtarget features then.
Introducing them now, without there being a demonstrated need for it, feels a bit like premature optimization to me.
I hope that makes sense?

Updated per latest comments:

Removed AggressiveFMAFloat and AggressiveFMADouble from AArch64.td.
Created FeatureAggressiveFMA in AArch64.td.
Merged test cases into one, including the IR test case from Florian Hahn.

LGTM with 2 small comments, I think all comments should be addressed now, thanks! Please wait with committing for another day or so, to give people some time to raise additional concerns.

When committing, please adjust the commit title & message to mention the target feature rather than the option. And it would be great if you could prefix the title again with [AArch64]

lib/Target/AArch64/AArch64.td
153 ↗	(On Diff #131192)	Just `aggressive-fma` seems more in line with the existing target feature names. `+aggressive-fma` enables the target feature, `-aggressive-fma` explicitly disables it.
409 ↗	(On Diff #131192)	nit: move it to the other FeatureXXX in the list.

This revision is now accepted and ready to land.Jan 24 2018, 1:38 AM

Updated per latest comments.

Closed by commit rL323474: [AArch64] Enable aggressive FMA on T99 and provide AArch64 options for others. (authored by joel_k_jones). · Explain WhyJan 25 2018, 1:57 PM

This revision was automatically updated to reflect the committed changes.

Diff 131507

llvm/trunk/lib/Target/AArch64/AArch64.td

Show First 20 Lines • Show All 143 Lines • ▼ Show 20 Lines	def FeatureNoNegativeImmediates : SubtargetFeature<"no-neg-immediates",
"to their negated or complemented "		"to their negated or complemented "
"equivalent when the immediate does "		"equivalent when the immediate does "
"not fit in the encoding.">;		"not fit in the encoding.">;

def FeatureLSLFast : SubtargetFeature<		def FeatureLSLFast : SubtargetFeature<
"lsl-fast", "HasLSLFast", "true",		"lsl-fast", "HasLSLFast", "true",
"CPU has a fastpath logical shift of up to 3 places">;		"CPU has a fastpath logical shift of up to 3 places">;

		def FeatureAggressiveFMA :
		SubtargetFeature<"aggressive-fma",
		"HasAggressiveFMA",
		"true",
		"Enable Aggressive FMA for floating-point.">;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Architectures.		// Architectures.
//		//

def HasV8_1aOps : SubtargetFeature<"v8.1a", "HasV8_1aOps", "true",		def HasV8_1aOps : SubtargetFeature<"v8.1a", "HasV8_1aOps", "true",
"Support ARM v8.1a instructions", [FeatureCRC, FeatureLSE, FeatureRDM]>;		"Support ARM v8.1a instructions", [FeatureCRC, FeatureLSE, FeatureRDM]>;

def HasV8_2aOps : SubtargetFeature<"v8.2a", "HasV8_2aOps", "true",		def HasV8_2aOps : SubtargetFeature<"v8.2a", "HasV8_2aOps", "true",
▲ Show 20 Lines • Show All 225 Lines • ▼ Show 20 Lines	def ProcSaphira : SubtargetFeature<"saphira", "ARMProcFamily", "Saphira",
FeaturePredictableSelectIsExpensive,		FeaturePredictableSelectIsExpensive,
FeatureZCZeroing,		FeatureZCZeroing,
FeatureLSLFast,		FeatureLSLFast,
HasV8_3aOps]>;		HasV8_3aOps]>;

def ProcThunderX2T99 : SubtargetFeature<"thunderx2t99", "ARMProcFamily",		def ProcThunderX2T99 : SubtargetFeature<"thunderx2t99", "ARMProcFamily",
"ThunderX2T99",		"ThunderX2T99",
"Cavium ThunderX2 processors", [		"Cavium ThunderX2 processors", [
		FeatureAggressiveFMA,
FeatureCRC,		FeatureCRC,
FeatureCrypto,		FeatureCrypto,
FeatureFPARMv8,		FeatureFPARMv8,
FeatureArithmeticBccFusion,		FeatureArithmeticBccFusion,
FeatureNEON,		FeatureNEON,
FeaturePostRAScheduler,		FeaturePostRAScheduler,
FeaturePredictableSelectIsExpensive,		FeaturePredictableSelectIsExpensive,
FeatureLSE,		FeatureLSE,
▲ Show 20 Lines • Show All 123 Lines • Show Last 20 Lines

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.h

Show First 20 Lines • Show All 450 Lines • ▼ Show 20 Lines	public:
void insertCopiesSplitCSR(		void insertCopiesSplitCSR(
MachineBasicBlock *Entry,		MachineBasicBlock *Entry,
const SmallVectorImpl<MachineBasicBlock *> &Exits) const override;		const SmallVectorImpl<MachineBasicBlock *> &Exits) const override;

bool supportSwiftError() const override {		bool supportSwiftError() const override {
return true;		return true;
}		}

		/// Enable aggressive FMA fusion on targets that want it.
		bool enableAggressiveFMAFusion(EVT VT) const override;

/// Returns the size of the platform's va_list object.		/// Returns the size of the platform's va_list object.
unsigned getVaListSizeInBits(const DataLayout &DL) const override;		unsigned getVaListSizeInBits(const DataLayout &DL) const override;

/// Returns true if \p VecTy is a legal interleaved access type. This		/// Returns true if \p VecTy is a legal interleaved access type. This
/// function checks the vector element type and the overall width of the		/// function checks the vector element type and the overall width of the
/// vector.		/// vector.
bool isLegalInterleavedAccessType(VectorType *VecTy,		bool isLegalInterleavedAccessType(VectorType *VecTy,
const DataLayout &DL) const;		const DataLayout &DL) const;
▲ Show 20 Lines • Show All 196 Lines • Show Last 20 Lines

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 10,976 Lines • ▼ Show 20 Lines	bool AArch64TargetLowering::isIntDivCheap(EVT VT, AttributeList Attr) const {
// integer division, leaving the division as-is is a loss even in terms of		// integer division, leaving the division as-is is a loss even in terms of
// size, because it will have to be scalarized, while the alternative code		// size, because it will have to be scalarized, while the alternative code
// sequence can be performed in vector form.		// sequence can be performed in vector form.
bool OptSize =		bool OptSize =
Attr.hasAttribute(AttributeList::FunctionIndex, Attribute::MinSize);		Attr.hasAttribute(AttributeList::FunctionIndex, Attribute::MinSize);
return OptSize && !VT.isVector();		return OptSize && !VT.isVector();
}		}

		bool AArch64TargetLowering::enableAggressiveFMAFusion(EVT VT) const {
		return Subtarget->hasAggressiveFMA() && VT.isFloatingPoint();
		}

unsigned		unsigned
AArch64TargetLowering::getVaListSizeInBits(const DataLayout &DL) const {		AArch64TargetLowering::getVaListSizeInBits(const DataLayout &DL) const {
if (Subtarget->isTargetDarwin() \|\| Subtarget->isTargetWindows())		if (Subtarget->isTargetDarwin() \|\| Subtarget->isTargetWindows())
return getPointerTy(DL).getSizeInBits();		return getPointerTy(DL).getSizeInBits();

return 3 * getPointerTy(DL).getSizeInBits() + 2 * 32;		return 3 * getPointerTy(DL).getSizeInBits() + 2 * 32;
}		}

void AArch64TargetLowering::finalizeLowering(MachineFunction &MF) const {		void AArch64TargetLowering::finalizeLowering(MachineFunction &MF) const {
MF.getFrameInfo().computeMaxCallFrameSize(MF);		MF.getFrameInfo().computeMaxCallFrameSize(MF);
TargetLoweringBase::finalizeLowering(MF);		TargetLoweringBase::finalizeLowering(MF);
}		}

llvm/trunk/lib/Target/AArch64/AArch64Subtarget.h

Show First 20 Lines • Show All 74 Lines • ▼ Show 20 Lines	protected:
bool HasRAS = false;		bool HasRAS = false;
bool HasRDM = false;		bool HasRDM = false;
bool HasPerfMon = false;		bool HasPerfMon = false;
bool HasFullFP16 = false;		bool HasFullFP16 = false;
bool HasSPE = false;		bool HasSPE = false;
bool HasLSLFast = false;		bool HasLSLFast = false;
bool HasSVE = false;		bool HasSVE = false;
bool HasRCPC = false;		bool HasRCPC = false;
		bool HasAggressiveFMA = false;

// HasZeroCycleRegMove - Has zero-cycle register mov instructions.		// HasZeroCycleRegMove - Has zero-cycle register mov instructions.
bool HasZeroCycleRegMove = false;		bool HasZeroCycleRegMove = false;

// HasZeroCycleZeroing - Has zero-cycle zeroing instructions.		// HasZeroCycleZeroing - Has zero-cycle zeroing instructions.
bool HasZeroCycleZeroing = false;		bool HasZeroCycleZeroing = false;
bool HasZeroCycleZeroingFPWorkaround = false;		bool HasZeroCycleZeroingFPWorkaround = false;

▲ Show 20 Lines • Show All 173 Lines • ▼ Show 20 Lines	public:
bool supportsAddressTopByteIgnored() const;		bool supportsAddressTopByteIgnored() const;

bool hasPerfMon() const { return HasPerfMon; }		bool hasPerfMon() const { return HasPerfMon; }
bool hasFullFP16() const { return HasFullFP16; }		bool hasFullFP16() const { return HasFullFP16; }
bool hasSPE() const { return HasSPE; }		bool hasSPE() const { return HasSPE; }
bool hasLSLFast() const { return HasLSLFast; }		bool hasLSLFast() const { return HasLSLFast; }
bool hasSVE() const { return HasSVE; }		bool hasSVE() const { return HasSVE; }
bool hasRCPC() const { return HasRCPC; }		bool hasRCPC() const { return HasRCPC; }
		bool hasAggressiveFMA() const { return HasAggressiveFMA; }

bool isLittleEndian() const { return IsLittle; }		bool isLittleEndian() const { return IsLittle; }

bool isTargetDarwin() const { return TargetTriple.isOSDarwin(); }		bool isTargetDarwin() const { return TargetTriple.isOSDarwin(); }
bool isTargetIOS() const { return TargetTriple.isiOS(); }		bool isTargetIOS() const { return TargetTriple.isiOS(); }
bool isTargetLinux() const { return TargetTriple.isOSLinux(); }		bool isTargetLinux() const { return TargetTriple.isOSLinux(); }
bool isTargetWindows() const { return TargetTriple.isOSWindows(); }		bool isTargetWindows() const { return TargetTriple.isOSWindows(); }
bool isTargetAndroid() const { return TargetTriple.isAndroid(); }		bool isTargetAndroid() const { return TargetTriple.isAndroid(); }
▲ Show 20 Lines • Show All 55 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

Enable aggressive FMA on T99 and provide AArch64 option for other micro-arch's
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 131507

llvm/trunk/lib/Target/AArch64/AArch64.td

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.h

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp

llvm/trunk/lib/Target/AArch64/AArch64Subtarget.h

This is an archive of the discontinued LLVM Phabricator instance.

Enable aggressive FMA on T99 and provide AArch64 option for other micro-arch'sClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 131507

llvm/trunk/lib/Target/AArch64/AArch64.td

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.h

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp

llvm/trunk/lib/Target/AArch64/AArch64Subtarget.h

Enable aggressive FMA on T99 and provide AArch64 option for other micro-arch's
ClosedPublic