Page MenuHomePhabricator

AArch64 : Add FastCSEL feature.
Needs RevisionPublic

Authored by flyingforyou on Feb 24 2017, 5:00 PM.

Details

Summary

Add option for some cores which are better to use csel instead of csinv or csinc.

Diff Detail

Event Timeline

flyingforyou created this revision.Feb 24 2017, 5:00 PM
MatzeB added a subscriber: MatzeB.Feb 24 2017, 5:27 PM
MatzeB added inline comments.
lib/Target/AArch64/AArch64Subtarget.cpp
72–74 ↗(On Diff #89749)

This is unnecessary, the target feature will already set it.

Addressing Matthias's comment.
Thanks.

rengolin added inline comments.Feb 27 2017, 2:55 PM
lib/Target/AArch64/AArch64.td
54

Can't this be mapped in the cost model?

rengolin requested changes to this revision.Feb 27 2017, 2:57 PM

If the core doesn't support CSEL, let's mark it as unavailable. If it does, but the cost is bad (for whatever reason), this really really should be in the cost model...

lib/Target/AArch64/AArch64InstrInfo.td
46

Wait, "prefer" or "don't use"? This is confusing...

This revision now requires changes to proceed.Feb 27 2017, 2:57 PM

Thanks for comment Renato.

This patch was inspired by below ARM's code. May I change this? What is your suggestion for the name "Don't use".?

// Some targets (e.g. Cortex-A9) prefer VMOVSR to VMOVDRR even when using NEON
// for scalar FP, as this allows more effective execution domain optimization.
def FeaturePreferVMOVSR : SubtargetFeature<"prefer-vmovsr", "PreferVMOVSR",
                                           "true", "Prefer VMOVSR">;
def UseVMOVSR : Predicate<"Subtarget->preferVMOVSR() ||"
                          "!Subtarget->useNEONForSinglePrecisionFP()">;
def DontUseVMOVSR : Predicate<"!Subtarget->preferVMOVSR() &&"
                              "Subtarget->useNEONForSinglePrecisionFP()">;
jmolloy edited edge metadata.Feb 28 2017, 1:02 AM

Hi Renato,

If the core doesn't support CSEL, let's mark it as unavailable. If it does, but the cost is bad (for whatever reason), this really really should be in the cost model...

I have a bit more background on this as Junmo has talked to me offline. The core *does* support CSINV and friends, but they aren't as efficient as CSEL. Therefore, Junmo would like to disable the patterns that generate CSINV and friends (this could increase register pressure, but for his target the payoff is great enough).

James

rengolin added a comment.EditedFeb 28 2017, 6:49 AM

This patch was inspired by below ARM's code. May I change this? What is your suggestion for the name "Don't use".?

I thought as much, but that's a bad design and shouldn't repeat.

We had similar discussions with the Samsung and Code Aurora people and we managed to work around all the issues. Shouldn't be too hard to do the same here.

We have actually removed a lot of code that was doing similar things, so I don't think we should add it back again.

cheers,
-renato

evandro edited edge metadata.Feb 28 2017, 8:36 AM

If the core doesn't support CSEL, let's mark it as unavailable. If it does, but the cost is bad (for whatever reason), this really really should be in the cost model...

I agree.

evandro requested changes to this revision.Feb 28 2017, 8:37 AM

Hi Renato,

If it does, but the cost is bad (for whatever reason), this really really should be in the cost model...

For my own understanding - how does the cost model impact the choices ISel makes? I didn't think it could. Or are you saying *in addition to* disabling the relevant Pat<>s with a feature, this should *also* be reflected in the cost model?

James

flyingforyou edited edge metadata.
flyingforyou retitled this revision from AArch64 : Add PreferCSEL feature for Exynos-M3. to AArch64 : Add PreferCSEL feature..
flyingforyou edited the summary of this revision. (Show Details)

After internal(Samsung only) discussion, revert patch about Exynos-M3 part.

Renato, before we add new feature for core, we use the code likes

IsExynosM1 or IsCortexA57 ...

But after discussion, we move to use Feature things instead of above code.

I am not sure that cost model can cover all the things what uArchitecture's all the specific details.

Renato, before we add new feature for core, we use the code likes

IsExynosM1 or IsCortexA57 ...

But after discussion, we move to use Feature things instead of above code.

This is correct.

I am not sure that cost model can cover all the things what uArchitecture's all the specific details.

I have to agree with you and James. We have had discussions about this in the past and in some cases, it's possible, in others, it's not.

This is because the current ISel has some thresholds which can be used as very rudimentary cost models, but unless we have a way to do this generically, any work around will be as hacky as a feature.

However, this patch goes one extra step: It changes the table-gen definitions to *forbid* CSEL based on a feature called "PreferCSEL". This is confusing and a misrepresentation of what the flag actually means.

IIRC, other flags have C++ code in the style:

if (FeaturePreferFoo)
  emitFOO();
else
  emitBAR();

which, for ISel purposes, it's the same effect as your table-gen solution. However, this is not true for the assembler. CSEL is valid on that core, and if I try to assemble files with it in this new table-gen, it will bail saying that the instruction requires "DontUseCSEL", which in itself is quite opaque and meaningless to the user.

So, using the flag as a dynamic choice during ISel only should be fine for the time being, but not adding it to the table-gen as a hard requirement.

Makes sense?

--renato

Thanks for your opinion & example Renato.

When I asked about this issue to James, actually he suggested to use term "SlowCSINV".
In ARMInstrInfo.td, there is definition use "DontUse" or "Use" even if they support instructions as I mentioned before.

def UseVMOVSR : Predicate<"Subtarget->preferVMOVSR() ||"
                          "!Subtarget->useNEONForSinglePrecisionFP()">;
def DontUseVMOVSR : Predicate<"!Subtarget->preferVMOVSR() &&"
                              "Subtarget->useNEONForSinglePrecisionFP()">;

So, are they also changed to "HasFastVMOVSR" and "HasSlowVMOVSR" ?

And

def DontUseCSEL : Predicate<"!Subtarget->preferCSEL()">;

And Is this ok that change the definition from DontUseCSEL to HasFastCSEL?

And Is this ok that change the definition from DontUseCSEL to HasFastCSEL?

That'd be much better, yes. Thanks!

mcrosier resigned from this revision.Mar 3 2017, 8:24 AM

Addressing Renato, James's comments.
Thanks.

flyingforyou retitled this revision from AArch64 : Add PreferCSEL feature. to AArch64 : Add FastCSEL feature..

Change the feature name PreferCSEL to FastCSEL.

kristof.beyls added inline comments.
lib/Target/AArch64/AArch64InstrInfo.td
46

I think that the proposed name change would make this closer to the intended meaning:

def HasFastCSINCV : Predicate<"!Subtarget->HasSlowCSINCV()">;
1137–1138

With the name change I'm proposing, I think this reads more natural and is easier to understand:

def : Pat<(AArch64csel (i32 0), (i32 1), (i32 imm:$cc), NZCV),
          (CSINCWr WZR, WZR, (i32 imm:$cc))>, Requires<[HasFastCSINCV]>;
lib/Target/AArch64/AArch64Subtarget.h
79–80

Sorry to be late in my review feedback here.

Wouldn't "HasSlowCSINCV" be more natural and make the intent a tiny bit easier to understand?

e.g.

// If true, CSINV or CSINC will not be generated, using CSEL instead.
bool HasSlowCSINCV = false;
evandro requested changes to this revision.Mar 6 2017, 8:22 AM
This comment was removed by evandro.
This revision now requires changes to proceed.Mar 6 2017, 8:22 AM

Why add a feature that no target uses?

@evandro
This is code review about the case "Some targets using CSEL is more prefer than CSINV, CSINC.".

And there were good discussions which way is better & correct for handling the issue. As you said, there is no specific target for this feature,now.
I just wanted to get a confirm from other members, How to handle when we want to do this.

@kristof.beyls
Thanks for comments. Your suggestion looks better.

@evandro
This is code review about the case "Some targets using CSEL is more prefer than CSINV, CSINC.".

And there were good discussions which way is better & correct for handling the issue. As you said, there is no specific target for this feature,now.
I just wanted to get a confirm from other members, How to handle when we want to do this.

There may be more cases resulting in CSET that perhaps should remain CSEL. In order to deal with all such situations, instead of just these matched by these Pats, It seems to me that TargetLowering would be more flexible, whatever the reasons a target has for this.

A few other things missing:

  • A core where this is set by default? Otherwise, why do we have this?
  • Tests. Regardless of cores, tests with and without that feature must exist.

cheers,
--renato

lib/Target/AArch64/AArch64InstrInfo.td
1137–1138

Yup.

lib/Target/AArch64/AArch64Subtarget.h
79–80

I agree with Kristof here.