This patch fixes a bug which skipped
adding predicate matcher for a pattern in many cases.
For example, if predicate is Load and
its memoryVT is non-null then the loop
continues and never reaches to the end which
adds the predicate matcher. This patch moves the
matcher addition to the top of the loop
so that it gets added regardless of contextual checks
later in the loop.
Other way to fix this issue is to remove all "continue" statements
in checks and let the loop continue till end.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test | |
---|---|---|
180 ms | linux > Polly.ScopInfo::Unknown Unit Message ("") |
Event Timeline
Could you provide a more concrete example that goes wrong without this change? I believe that could help understanding, would be good to add to the commit message for this.
Could you add a test case or point us at a example of an existing test that goes wrong? The description in the commit message is unlikely to be the full story as we already cover loads with non-null MemoryVT's. My best guess is you are attempting to use builtin predicates and custom predicates together. I don't see a reason why that shouldn't be allowed but it's not something that was intended as the goal was aiming to fully remove the custom C++ from the tablegen headers so that tablegen could do some transformations on sextload/zextload and similar to fix DAGISel vs GlobalISel modelling mismatches.
llvm/utils/TableGen/GlobalISelEmitter.cpp | ||
---|---|---|
3637–3647 | These two cases covered all of the loads with predicates that were moved to built-in predicates. |
I think the problem is broader than just combining custom predicates and builtin. The emitter here implicitly assumes all of these builtin PatFrag predicates are used exactly as the hierarchy used to define the default/generic load/store patterns. However, the PatFrags are much more free form and you can define a patfrag that combines multiple of these predicates in the same "layer" of the load/store hierarchy. The AMDGPU patterns redefine the entire set of load/store patterns, and in some cases it makes sense to combine all the predicates at once. I had to somewhat artificially add new layers of PatFrags to only apply a single predicate at a time. This also isn't consistently applied (for example, it does work to combine the AddressSpaces predicate and alignment)
Yes in addition to what Matt explained here, I believe the custom predicates should be allowed to use on any patterns because we cannot always guarantee that everything can be covered in other checks. Sometimes, we may want to do complicated checks for which custom predicates makes more sense that adding a builtin for each sub-checks. FWIW, the function needs refactoring because it does checks like isAtomic() and isLoad() several times. We can really do these checks once and then add an appropriate checker. The function seems tailored to a class of instructions but the name of the function seems generic.
I have added a test.
llvm/utils/TableGen/GlobalISelEmitter.cpp | ||
---|---|---|
3764 | Well, then we have to remove all "continue" statements. Are we fine with it? I always thought that we'll use custom predicates for things which are *not* covered in other checks. I am fine with keeping it at the end, just looking for consequences. |
That's because the builtin ones implement the predicates from the PatFrag hierarchy that was in the Target.td loads. Those PatFrags resulted in three distinct predicates applied to the same node on most load/stores. That's what I mean when I say mixing builtin/custom wasn't intended. The builtins were only intended to replace that specific hierarchy. Once we had C++ predicates there wasn't an apparent need to extend the builtins any further.
However, the PatFrags are much more free form and you can define a patfrag that combines multiple of these predicates in the same "layer" of the load/store hierarchy. The AMDGPU patterns redefine the entire set of load/store patterns, and in some cases it makes sense to combine all the predicates at once. I had to somewhat artificially add new layers of PatFrags to only apply a single predicate at a time.
Yep, I don't see a reason why you shouldn't be allowed to use these in different ways. I'm just saying that wasn't intended when they were added. The expectation was that other predicates would continue to use C++ predicates as they did before. There are some benefits to factoring out the portions of the C++ that are equivalent to these predicates though (the generated state machine can make use of this information when it's optimized) which is partly why I think it ought to be allowed.
This also isn't consistently applied (for example, it does work to combine the AddressSpaces predicate and alignment)
IIRC, that's because there was originally a C++ predicate that checked both together. I think it was target specific rather than Target.td though
I'm not sure I'm following this. Custom predicates can be used on any patterns. What I'm saying is that you can't (currently) have a single PatFrag that is simultaneously a builtin and a custom predicate. They have to be distinct PatFrags at the moment (i.e. a PatFrag with a custom predicate that matches another PatFrag with a builtin predicate) like how sextloadi32 contains sextload which contains unindexedload. These three add distinct predicates to form the three predicates on one node that we had in the internal representation of the pattern.
FWIW, the function needs refactoring because it does checks like isAtomic() and isLoad() several times. We can really do these checks once and then add an appropriate checker. The function seems tailored to a class of instructions but the name of the function seems generic.
The alternatives I tried at the time generally ended up more confusing. The current builtins code is generally structured as:
if (isBuiltin1()) { addBuiltin1(); continue; } if (isBuiltin2()) { addBuiltin2(); continue; }
but the isBuiltinN()'s were split into subpredicates because the MemVT's made the number of specific predicates excessive and because the predicates left over after extracting that left a fair bit of code duplication.
I do think it should be hoisted into a function though.
I have added a test.
Thanks. That confirmed my understanding of the issue
llvm/test/TableGen/ContextlessPredicates.td | ||
---|---|---|
14–16 | Thanks, this confirms what I was thinking. This PatFrag is simultaneously a builtin and a custom which wasn't really intended but should probably be allowed (even if it wasn't at the very least it should error, silently dropping it is not ok). The equivalent that fits into the original intent is: def atomic_swap_i32 : PatFrag<(ops node:$ptr, node:$val), (atomic_swap node:$ptr, node:$val)> { let IsAtomic = 1; let MemoryVT = i32; } def test_atomic_op_frag : PatFrag<(ops node:$ptr, node:$val), (atomic_swap_i32 node:$ptr, node:$val)> { let GISelPredicateCode = [{ dbgs() << return !MRI.use_nodbg_empty(MI.getOperand(0).getReg()); }]; } | |
llvm/utils/TableGen/GlobalISelEmitter.cpp | ||
3764 | I'd be inclined to factor the builtins out into a function instead so they can return and then you resume at the code to check+add for custom predicates |
llvm/utils/TableGen/GlobalISelEmitter.cpp | ||
---|---|---|
3764 | Also, just to confirm I agree it should be added after the builtins. Otherwise your C++ can't assume anything covered by the builtins. |
llvm/utils/TableGen/GlobalISelEmitter.cpp | ||
---|---|---|
3764 | Is it just for "assumption" or it would affect the functionality if custom predicates are added later? | |
3764 | Sure, I am up for refactoring, but can you please explain a bit more, how would you that refactoring look? I was more inclined to rewriting the loop which iterates over PredicateCalls and add matcher. One of the many ways in which we can refactor the loop is like if (isAtomic()) if (isLoad()) add all matchers ... but this may change the order in which matchers are added (which brings me back to the question - is the order important?). I can further outline this in a static function. So, I'd like to see what you're imagining in a form of pseudo code :) |
llvm/utils/TableGen/GlobalISelEmitter.cpp | ||
---|---|---|
3764 | Suppose the builtin checks it's a i32 non-extending load and the custom predicate wants to check the address generation. If the custom predicate is first then it needs to check the following pseudo-code: isLoad() && isNonExtending() && isMMO32bit() && checkAddrGen(...) whereas if the builtin is first then the first three components are tested by the builtin first so the custom predicate is: checkAddrGen(...) | |
3764 |
What I had in mind was:
Then after those predicates are added it will resume execution from line 3709 and do the custom predicates
In general, the order predicates are tested does matter. For example, testing an MMO property (e.g. the MemoryVT) without knowing that an MMO is there will lead to a crash. Specific pairs might be reorderable but unless there's a strong need to change the current order and re-verify the new order works for all targets, I'd preserve it. |
llvm/utils/TableGen/GlobalISelEmitter.cpp | ||
---|---|---|
3764 | With your suggested way, we'd be extracting few statements of loop in the function but for rest of the loop body, we'll still have to run the loop in caller i.e. current function. So, we'd be running the loop twice which seems time consuming to me. Line 3707 belongs to this loop. The loop I am referring to is for (const TreePredicateCall &Call : Src->getPredicateCalls()) { at line 3604 in the original code. Are you sure about this? I was thinking of just refactoring the loop something like for (const TreePredicateCall &Call : Src->getPredicateCalls()) { addBuiltinPredicate(); // new function if (Predicate.hasGISelPredicateCode()) { InsnMatcher.addPredicate<GenericInstructionPredicateMatcher>(Predicate); continue; }} |
llvm/utils/TableGen/GlobalISelEmitter.cpp | ||
---|---|---|
3764 | I believe we're describing the same thing. My suggestion left it at: for (const TreePredicateCall &Call : Src->getPredicateCalls()) { const TreePredicateFn &Predicate = Call.Fn; if (Predicate.isAlwaysTrue()) continue; if (Predicate.isImmediatePattern()) { InsnMatcher.addPredicate<InstructionImmPredicateMatcher>(Predicate); continue; } addBuiltinPredicate(...); if (Predicate.hasGISelPredicateCode()) { InsnMatcher.addPredicate<GenericInstructionPredicateMatcher>(Predicate); continue; } return failedImport("Src pattern child has predicate (" + explainPredicates(Src) + ")"); } the code that was extracted into addBuiltinPredicate() then does return wherever it currently does continue.
I'm not sure we're using the same line numbers here. 3604 is: continue; in the original code and 0, MemoryVsLLTSizePredicateMatcher::EqualTo, 0); in the new code according to phabricator. |
llvm/utils/TableGen/GlobalISelEmitter.cpp | ||
---|---|---|
3764 | Yeah, we're not referring to same line numbers :) |
llvm/utils/TableGen/GlobalISelEmitter.cpp | ||
---|---|---|
3665 | It is returning to the parent createAndImportSelDAGMatcher which further adds custom predicate. |
LGTM. I think this is only a half solution, and it's still a problem that multiple builtin predicates will be silently ignored
llvm/utils/TableGen/GlobalISelEmitter.cpp | ||
---|---|---|
3665 | OK, so this is *only* fixing custom predicate + builtin predicate. It is not fixing adding multiple builtin predicates at once. I think that can be a separate patch, but is just as broken |
Don't need dbgs() << here (not that you really need anything)