This is an archive of the discontinued LLVM Phabricator instance.

[globalisel] Sort RuleMatchers by priority.
ClosedPublic

Authored by dsanders on Feb 8 2017, 4:15 AM.

Details

Summary

This makes more important rules have priority over less important rules.
For example, '%a = G_ADD $b:s64, $c:s64' has priority over
'%a = G_ADD $b:s32, $c:s32'. Previously these rules were emitted in the
correct order by chance.

NFC in this patch but it is required to make the next patch work correctly.

Depends on D29710

Diff Detail

Repository
rL LLVM

Event Timeline

dsanders created this revision.Feb 8 2017, 4:15 AM
ab added inline comments.Feb 8 2017, 6:41 PM
utils/TableGen/GlobalISelEmitter.cpp
224–230 ↗(On Diff #87630)

Huh, how can this be a problem?

dsanders added inline comments.Feb 9 2017, 1:01 AM
utils/TableGen/GlobalISelEmitter.cpp
224–230 ↗(On Diff #87630)

It works fine for s32 and s64 since "s32" < "s64" but the overall ordering is wrong when the number of digits vary:

"s128" < "s16" < "s32" < "s64" < "s8"

Once I've dealt with the LLT layering issue we'll be able to do a numeric comparison for the size such that the order is:

s8 < s16 < s32 < s64 < s128
rovka edited edge metadata.Feb 9 2017, 6:12 AM

Hi Daniel,

Is this NFC in general, or just NFC with regards to the existing AArch64 description? More to the point, is it possible to concoct a useful test case that would fail without this ordering?

Also, will the priorities always be the same for all targets, or is it going to be possible to tweak things?

Thanks,
Diana

utils/TableGen/GlobalISelEmitter.cpp
169 ↗(On Diff #87630)

I think this comment could be fleshed out a bit. The way it is phrased now, it isn't clear what other distinctions between rules are more important (so maybe an example would help). It also sounds like a last resort, when in fact we can have more specific criteria for predicates of the same kind (e.g. in LLTOperandMatcher::isHigherPriorityThan).

171 ↗(On Diff #87630)

OPM_Int doesn't exist yet (you're introducing it in the next patch)

227 ↗(On Diff #87630)

Just curious: have you given any thought to how this would work for vectors?

231 ↗(On Diff #87630)

Nitpick: I would replace this with

if (B.OperandPredicateMatcher::isHigherPriorityThan(*this))
  return false;

and then compare Ty's. I think that's more robust against changes to OPM::isHigherPriorityThan.

299 ↗(On Diff #87630)

Shouldn't you also return false if predicates_size() < B.predicates_size() ?

417 ↗(On Diff #87630)

Same as above, return false when it's the other way around.

538 ↗(On Diff #87630)

Same as above.

724 ↗(On Diff #87630)

I'd add an assert here on the asymmetry of isHigherPriorityThan.

rovka added a comment.Feb 9 2017, 6:22 AM

Is this NFC in general, or just NFC with regards to the existing AArch64 description? More to the point, is it possible to concoct a useful test case that would fail without this ordering?

Scratch that, this one isn't marked NFC :) I probably imagined it from the other patches...

ab added inline comments.Feb 9 2017, 7:15 AM
utils/TableGen/GlobalISelEmitter.cpp
224–230 ↗(On Diff #87630)

Sorry; I'm asking why does the ordering matter in the first place?

dsanders added inline comments.Feb 9 2017, 7:37 AM
utils/TableGen/GlobalISelEmitter.cpp
169 ↗(On Diff #87630)

I see your point here. I'll see if I can think of a better explanation

224–230 ↗(On Diff #87630)

Ah ok. Thinking about it again, I'm not sure it does. My original thinking was partly based on a comment from D26878 that talked about the order being correct by luck but we don't have the widenable concept at the moment. The other part of the thinking was based on the order I've seen the matcher tables be emitted but I think that order stems more from additional Predicates<> than from the type.

227 ↗(On Diff #87630)

Not yet. I'm not sure they need to have a priority ordering at the moment.

299 ↗(On Diff #87630)

I think that makes sense but I don't think it has a functional effect on the matcher. I'll make this change (and the others like it).

724 ↗(On Diff #87630)

I'm not sure what you mean here.

rovka added inline comments.Feb 9 2017, 8:11 AM
utils/TableGen/GlobalISelEmitter.cpp
224–230 ↗(On Diff #87630)

Ha, I had the same comment in mind!

299 ↗(On Diff #87630)

It might have a functional effect on the sort, but I'll admit I haven't thought enough about it to be sure. Anyway, it looks like a simple way to make sure we're preserving asymmetry at least in a few extra cases.

724 ↗(On Diff #87630)

Sorry, I wasn't being very specific :)

I meant assert that if A.isHigherPriorityThan(B) then !B.isHigherPriorityThan(A). I think this is important because we're using a pretty complex function for sorting and it's a good idea to check that it's a strict weak ordering, so we don't have trouble with the sort. I don't know if it's worth worrying about transitivity, but asymmetry (which implies irreflexivity) is easy enough to verify.

dsanders added inline comments.Feb 9 2017, 8:41 AM
utils/TableGen/GlobalISelEmitter.cpp
724 ↗(On Diff #87630)

That makes sense to me. Thanks

ab added inline comments.Feb 9 2017, 7:47 PM
utils/TableGen/GlobalISelEmitter.cpp
224–230 ↗(On Diff #87630)

Ah yeah you're right, if we want to reinstate the isWidenable logic, we'd need this ordering. But one of the reasons why I'm more and more convinced it's a bad idea is precisely the need for an ordering: I have a gut feeling the patterns should never require that for correctness, letting us order them as we please for performance. In which case, the types should never be a distinguishing factor (as we're only looking at performance indicators for breaking "ties" between "equivalent" patterns). Does that sound right? If so, I guess the type should just be ignored entirely.

302–303 ↗(On Diff #87630)

Ooh, nice! It's unfortunate that we still have to deal with std::get<> =(

536 ↗(On Diff #87630)

This should maybe take into account the AddedComplexity somehow, but that's not really compatible with the isHigherPriorityThan scheme :/

I'm not sure using AddedComplexity is a good idea in the first place: it's very magic-number-y and totally SDAG implementation specific. Still, care has been put into that, so if there's any value we can easily extract from that, let's do it.

Maybe instead sum up the "complexity" of each Matcher? We'd want to make sure to prioritize significant metrics like more instructions over operand kinds and such.

Assuming you don't have another usage in mind for the *Kind enums, that might let you simplify that to virtual methods in each Matcher returning its cost.

kristof.beyls added inline comments.Feb 10 2017, 1:59 AM
utils/TableGen/GlobalISelEmitter.cpp
224–230 ↗(On Diff #87630)

I'm trying to follow the discussion here, but find it a bit hard to understand in depth.
I'm wondering if priority of patterns/rules is written up somewhere already?
(This comment seems to suggest it's still very much an in-progress design discussion, so isn't written up anywhere?)
I'm assuming the priority of which pattern to try and match first will remain the same as under DAGISel? Or do we consider making some tweaks?
If it remains the same, what further ordering rules are being discussed here, and why does more ordering need to be defined compared to DAGISel?

Please don't let my comment slow you down in getting to a solution, but I think it'd be useful for this area to be explained a bit better in the end somewhere, e.g. in the comment at the top of this file.
Doesn't necessarily need to be as part of this patch.

dsanders added inline comments.Feb 10 2017, 9:33 AM
utils/TableGen/GlobalISelEmitter.cpp
224–230 ↗(On Diff #87630)

Ah yeah you're right, if we want to reinstate the isWidenable logic, we'd need this
ordering. But one of the reasons why I'm more and more convinced it's a bad idea is
precisely the need for an ordering: I have a gut feeling the patterns should never
require that for correctness, letting us order them as we please for performance. In
which case, the types should never be a distinguishing factor (as we're only looking at
performance indicators for breaking "ties" between "equivalent" patterns). Does that
sound right? If so, I guess the type should just be ignored entirely.

I think a certain amount of ordering is unavoidable because we have a mixture of special cases that overlap with general cases. Consider these two patterns:

(set GPR32:$rd, (add GPR32:$rs1, GPR32:$rs2))
(set GPR32:$rd, (add (mul GPR32:$rs1, GPR32:$rs2), GPR32:$rs3))

If we check them in the order I've written them then we can never match a multiply-accumulate but the other way around will always match the multiply accumulate when it's possible to do so. An alternative is to pick the best match at run time but that prevents early exits from the matcher.

Another example is:

(set GPR32:$rd, (add GPR32:$rs1, uimm16:$imm))
(set GPR32:$rd, (add GPR32:$rs1, uimm8:$imm))

On a variable-length instruction set, there may be code-size and/or performance gains from preferring the uimm8 version.

I'm trying to follow the discussion here, but find it a bit hard to understand in depth.
I'm wondering if priority of patterns/rules is written up somewhere already?
(This comment seems to suggest it's still very much an in-progress design discussion, so
isn't written up anywhere?)

That's right, we haven't decided what kind of ordering GlobalISel really needs yet. At the moment I'm trying to stick to the minimum needed for the matcher to work correctly. By the time we start to optimise the generated matcher I'm hoping to have a better idea of what the ordering issues are.

For SelectionDAG it's defined in PatternSortingPredicate from utils/TableGen/DAGISelEmitter.cpp.

I'm assuming the priority of which pattern to try and match first will remain the same
as under DAGISel? Or do we consider making some tweaks?

Maybe. Ideally, the imported rules have the same overall behaviour as they do in SelectionDAG so that we produce the same or better code out of the box but I'm not sure that's possible due to the differences in representation in GlobalISel. For example, LLT can't distinguish floating point from integer like MVT can so some of SelectionDAG's ordering rules cannot be transferred to GlobalISel.

kristof.beyls added inline comments.Feb 10 2017, 10:00 AM
utils/TableGen/GlobalISelEmitter.cpp
224–230 ↗(On Diff #87630)

Brilliant explanation, thanks! Much appreciated!

utils/TableGen/GlobalISelEmitter.cpp
536 ↗(On Diff #87630)

Our backend has scripts that takes into account ISA information and auto generates various DAGPatterns and also calculates the added complexity for each pattern and emits them (not in sorted order by complexity).
If we try to reprioritize patterns and come up with newer metrics/aggregates for matching, it may not be accurate/expected (wrt to pattern authors/scripts knowing about costs).

utils/TableGen/GlobalISelEmitter.cpp
536 ↗(On Diff #87630)

Perhaps there needs to be a "-respect-added-complexity" option so backends can rely on GlobalISel not re-ordering patterns it thinks is better?

Would it be possible to deal with the smaller changes to the sort order in this patch and come back to the bigger issues in a later one? I'd like to unblock the other patches and I'll have a clearer picture of the bigger issues once there are more importable rules.

utils/TableGen/GlobalISelEmitter.cpp
536 ↗(On Diff #87630)

This should maybe take into account the AddedComplexity somehow, but that's not
really compatible with the isHigherPriorityThan scheme :/

This isn't really blocked by isHigherPriorityThan(). One of the higher level objects can always weigh complexity into its result.

I'm not sure using AddedComplexity is a good idea in the first place: it's very magic
number-y and totally SDAG implementation specific. Still, care has been put into that,
so if there's any value we can easily extract from that, let's do it.

I'd prefer not to add AddedComplexity for the same reasons. If possible, I'd prefer a more intuitive metric but I'm not sure what that metric would be yet.

I did some digging today and quite a few of the cases where AddedComplexity affects the order (determined by checking if the less-than operator gave different results with and without the AddedComplexity adjustment during std::sort) would be unaffected by the removal of AddedComplexity if sorting based on the root operator replaced it. For example:

(ld:f16 (am_indexed16:iPTR GPR64sp:i64:$Rn, uimm12s2:i64:$offset))<<P:Predicate_unindexedload>><<P:Predicate_load>>
(fadd:f32 (vector_extract:f32 FPR128:v4f32:$Rn, 1:i64), (vector_extract:f32 FPR128:v4f32:$Rn, 0:i64))
 ## 23 (10 added) vs 19 (0 added)

Once you exclude those obvious cases, there's 58 checks of which 26 are of the form:

(add:i32 addsub_shifted_imm32:i32:$imm, GPR32sp:i32:$Rn)
(add:i64 (mul:i64 (zext:i64 GPR32:i32:$Rn), (imm:i64)<<P:Predicate_i64imm_32bit>>:$C), GPR64:i64:$Ra)
 ## 18 (6 added) vs 18(5 added) ##

and are separated by the tree structure. In this example, the addsub_shifted_imm32 one gets priority in the end.

The other 32 are of the form:

(add:i32 arith_shifted_reg32:i32:$Rm, GPR32:i32:$Rn)
(add:i64 neg_addsub_shifted_imm64:i64:$imm, GPR64:i64:$Rn)
 ## 12 (0 added) vs 13(1 added) ##
(st GPR64:i64:$Rt, (am_indexed8:iPTR GPR64sp:i64:$Rn, uimm12s1:i64:$offset))<<P:Predicate_unindexedstore>><<P:Predicate_truncstore>><<P:Predicate_truncstorei8>>
(st FPR128:v16i8:$Rt, (am_indexed7s64:iPTR GPR64sp:i64:$Rn, simm7s8:i32:$offset))<<P:Predicate_unindexedstore>><<P:Predicate_store>><<P:Predicate_nontemporalstore>>
 ## 23 (10 added) vs 28(15 added) ##

and will have a direct effect on the output when we can import them all.

Maybe instead sum up the "complexity" of each Matcher? We'd want to make sure to
prioritize significant metrics like more instructions over operand kinds and such.

Assuming you don't have another usage in mind for the *Kind enums, that might let
you simplify that to virtual methods in each Matcher returning its cost.

Our backend has scripts that takes into account ISA information and auto generates
various DAGPatterns and also calculates the added complexity for each pattern and emits
them (not in sorted order by complexity).

As far as I can tell, the AddedComplexity adjustment is a constant that's added in certain manually-chosen cases. We'll need to take this discussion off-list to be able to talk details.

If we try to reprioritize patterns and come up with newer metrics/aggregates for
matching, it may not be accurate/expected (wrt to pattern authors/scripts knowing
about costs).

I agree, the aim is to avoid re-prioritizing the patterns if at all possible. However, SelectionDAG's metrics are based on the SDNode and MVT representations that do not exist in GlobalISel. Purely GlobalISel rules will not be able integrate with this scheme.

I think we should aim to use something that makes sense for GlobalISel so that we have one ordering that applies to all GlobalISel rules regardless of whether they were imported from SelectionDAG or not. My hope is that the order resulting from a GlobalISel-based scheme will be functionally the same, or at least very close to the SelectionDAG one.

dsanders updated this revision to Diff 88542.Feb 15 2017, 8:13 AM
dsanders marked 25 inline comments as done.

Addressed most of the review comments.

The big one about complexity has not been done and I'm hoping to defer that to a later patch.

dsanders added inline comments.Feb 15 2017, 8:16 AM
utils/TableGen/GlobalISelEmitter.cpp
169 ↗(On Diff #87630)

I've dropped the majority of this comment and will re-introduce a better version of it in the patch that adds OPM_Int

224–230 ↗(On Diff #87630)

I've dropped this override for now since the lack of Widenable means we don't really need it yet.

231 ↗(On Diff #87630)

Dropping the function (see above) removed the need for this change

dsanders added inline comments.Feb 15 2017, 11:13 AM
utils/TableGen/GlobalISelEmitter.cpp
299 ↗(On Diff #87630)

While updating the next patch, I've spotted a bug that the previous version of this patch was hiding.

We shouldn't be checking the size of the predicates list first because IntOperandMatcher needs priority over the RegisterBankOperandMatcher+LLTOperandMatcher pair used by register operands.

I'll post an update that fixes this tomorrow (either by adding an LLTOperandMatcher, or moving the predicate list size check down).

dsanders added inline comments.Feb 16 2017, 7:51 AM
utils/TableGen/GlobalISelEmitter.cpp
299 ↗(On Diff #87630)

I decided to fix this by adding an LLTOperandMatcher in D29712

Just to double-check, is this ok to commit?

rovka accepted this revision.Feb 23 2017, 8:13 AM

I don't see anything wrong with it, I think it's ok to iterate on the ordering and other tricky parts in tree.

This revision is now accepted and ready to land.Feb 23 2017, 8:13 AM
ab accepted this revision.Feb 23 2017, 9:52 AM

I suppose it's always easier to iterate in-tree, and, setting aside the tricky bits, this is useful; LGTM

utils/TableGen/GlobalISelEmitter.cpp
224–230 ↗(On Diff #87630)

I think a certain amount of ordering is unavoidable because we have a mixture of special cases that overlap with general cases. Consider these two patterns:

(set GPR32:$rd, (add GPR32:$rs1, GPR32:$rs2))
(set GPR32:$rd, (add (mul GPR32:$rs1, GPR32:$rs2), GPR32:$rs3))
If we check them in the order I've written them then we can never match a multiply-accumulate but the other way around will always match the multiply accumulate when it's possible to do so. An alternative is to pick the best match at run time but that prevents early exits from the matcher.

Right, that's what I mean by "performance": in this case, ignoring the second pattern should not cause any selection failure: we never *have to* obey any ordering.

536 ↗(On Diff #87630)

Once you exclude those obvious cases, there's 58 checks of which 26 are of the form:

(add:i32 addsub_shifted_imm32:i32:$imm, GPR32sp:i32:$Rn)
(add:i64 (mul:i64 (zext:i64 GPR32:i32:$Rn), (imm:i64)<<P:Predicate_i64imm_32bit>>:$C), GPR64:i64:$Ra)
 ## 18 (6 added) vs 18(5 added) ##

and are separated by the tree structure. In this example, the addsub_shifted_imm32 one gets priority in the end.

This sounds like the interesting pattern that will need some form of manual override.

(add:i32 arith_shifted_reg32:i32:$Rm, GPR32:i32:$Rn)
(add:i64 neg_addsub_shifted_imm64:i64:$imm, GPR64:i64:$Rn)
 ## 12 (0 added) vs 13(1 added) ##
(st GPR64:i64:$Rt, (am_indexed8:iPTR GPR64sp:i64:$Rn, uimm12s1:i64:$offset))<<P:Predicate_unindexedstore>><<P:Predicate_truncstore>><<P:Predicate_truncstorei8>>
(st FPR128:v16i8:$Rt, (am_indexed7s64:iPTR GPR64sp:i64:$Rn, simm7s8:i32:$offset))<<P:Predicate_unindexedstore>><<P:Predicate_store>><<P:Predicate_nontemporalstore>>
 ## 23 (10 added) vs 28(15 added) ##

and will have a direct effect on the output when we can import them all.

Huh, why is there an effect? Aren't these incompatible because of the types? Or am I misreading this?

This revision was automatically updated to reflect the committed changes.
dsanders added inline comments.Feb 24 2017, 8:40 AM
utils/TableGen/GlobalISelEmitter.cpp
536 ↗(On Diff #87630)

You're correct in both cases, sorry. I'll double check whether there were any genuine examples amongst the rules.