This makes selecting immediate instructions simpler. At -O1 or higher this already happens at the IR level, but for -O0 we end up with arbitrary order.
Diff Detail
- Repository
- rL LLVM
Event Timeline
Hi Amara,
That patch worries me, because I feel that we are going to pull all the LLVM IR code that does canonicalization.
The way I see it, is the input IR should already be in a canonical form and thus we don't have to do that.
If that's not the case, I would argue that bad output code is fine (garbage in, garbage out).
What do you think?
Cheers,
-Quentin
GISel is currently causing some significant code size regressions, and while this isn't the biggest issue, it looked like something that one would expect even -O0 to do. In fastisel we don't select immediate forms, however if fast-isel falls back to SDISel then the comparison between GISel enabled/disabled starts to look much worse, as SDISel selects immediates, resulting in less code and register pressure.
If we don't do this here, then I'll extend the manual G_ICMP AArch64 instruction selector code because it's extremely cheap/free. I'm ok with that at -O0, I just think that other targets will end up doing the same thing so this would be an overall code/complexity saver if we canonicalized earlier. Other optimizations will be on the way even at -O0 (like jump table creation), because being 30% larger in code size on some CTMark benchmarks is not an acceptable state at the moment.
I agree with the motivation but I don’t think this is the right approach. I would prefer we run a canonicalization step even at O0 before GISel rather than duplicating this logic between IR and MI.
Jump tables are a different beast and have their place in here because they don’t really make sense to do on IR.
What I am saying is code duplication is bad :)
I'm not against a separate IR canonicalization stage, even though it's a bit overkill for this particular case. At the moment it looks like InstCombine is doing the canonicalizaton for this case, so re-running that is out of the question. A new pass looks to be on the cards. Anyone else have opinions on this?
I had a similar discussion with Quentin a while ago and it seemed there are two solutions:
- Add a pass to canonicalize MIR
- Extend MIRBuilder to canonicalize instructions
I think it makes sense to handle this in MIRBuilder because the other passes may insert instructions that are not in canonical form after the canonicalization.
Does that mean that something after the last InstCombine produces icmp's with the non-canonical operand order? If so maybe that something needs to be chased down and fixed instead.
No, I meant that with -O1 instcombine is the pass that's currently doing the transformation. At -O0 IC doesn't run at all, it's codegen that's handling it for fastisel/sdisel.
I like the idea of putting this into MIRBuilder then.
If IRTranslator itself uses canonicalizing MIRBuilder, it will translate IR into canonicalized MIR seamlessly w/ no evident increase in IRTranslator's complexity as it's all encapsulated within the builder.
Then using the same builder everywhere else, including combines, will make sure that the canonical form is maintained across GlobalISel pipeline. For instance, one combine doesn't need to explicitly worry about not breaking the form, relying on the MIRBuilder, while the very next one can rely on MIR being in canonical form.
I guess first we'll need to define what exactly canonicalization means for this - are we talking beyond just putting constants at the end?
Does that mean that something after the last InstCombine produces icmp's with the non-canonical operand order? If so maybe that something needs to be chased down and fixed instead.
That's exactly the point :).
I understand that O0 doesn't run instcombine, thus the problem. It may be a good idea to identify what is canonicalization and what is optimization in instcombine and separate both so that it can be run at O0.
Regarding the solution of doing that in the MIRBuilder, this has preference given all MIR pass can benefit from it. That said, I would like we make a conscious choice of what we put here again to avoid code duplication as much as possible.
Basically what I am saying is we should ask ourselves, assuming the input IR is canonicalized, is this a pattern we may introduce and therefore that we need to clean-up one way or another. I would tend to think that when we create instruction with constant we can discipline ourselves for not putting them on the LHS.
Ideally, we could express the canonicalization rules in a higher-level language and use them both in IR and MIR. Though, we don't want to add such dependency to the progress of GISel.
I would tend to think that when we create instruction with constant we can discipline ourselves for not putting them on the LHS.
I'm not sure about this bit. Say, it's a mid-GlobalISel combine we're talking about. It matches a number of vregs along the boundary of a pattern it's supposed to match, and then uses them to generate an equivalent sequence. All it matched is a bunch of vregs from different places, it may not know what's constant and what's not, so how would it decide in which order to put those vregs in the generated instructions w/o explicitly checking for constants?
how would it decide in which order to put those vregs in the generated instructions w/o explicitly checking for constants?
Fair point.
I was more thinking of target specific node, where you'll have to check anyway otherwise you cannot match the imm variant.
Going back to a higher-level description for all that stuff, I wonder if ISel could already handle this.
Indeed, it seems easy, at least conceptually, to teach tablegen to check for the commuted patterns to get the immediate version whenever possible before trying the non-immediate variant.
It can't do that all the time, with these G_ICMPs for example it also has to swap the predicate (not that we even have imported patterns that could possibly match for AArch64).
So overall I think we should have some form of IR canonicalization before translation for the majority of cases to be handled for -O0.
If the legalizer or combiner is causing issues with introducing non-canonical forms then I think we should either fix them, or if that's not feasible, we accept the worse codegen at -O0 and introduce re-canonicalization either through MIRBuilder or a separate pass when optimizations are enabled.
It can't do that all the time, with these G_ICMPs for example it also has to swap the predicate (not that we even have imported patterns that could possibly match for AArch64).
Why not?
I understand G_ICMPs are special, but we could come up with whatever complicated logic in TableGen.
So overall I think we should have some form of IR canonicalization before translation for the majority of cases to be handled for -O0.
If the legalizer or combiner is causing issues with introducing non-canonical forms then I think we should either fix them, or if that's not feasible, we accept the worse codegen at -O0 and introduce re-canonicalization either through MIRBuilder or a separate pass when optimizations are enabled.
Agreed.
Some forms of an instruction, like certain I_CMP predicates, may not be legal and the target might want to transform them into a sequence of compares for example.
If we agree that an input canonicalization pass is needed then doing this there adds much less complexity than teaching tablegen that G_ICMPs are special (or somehow adding legalisation rules to tablegen?).
If we agree that an input canonicalization pass is needed then doing this there adds much less complexity than teaching tablegen that G_ICMPs are special (or somehow adding legalisation rules to tablegen?).
Agree.
I'd like to revive this patch, as the other IR canonicalization pass is much more controversial.
I still believe we shouldn't do that here and more generally O0 should be kept out of doing things on the code.
My main concern is if we allow this kind of combine here, where do we draw the line for future combines?
Thus, I would rather keep this pass dumb and bite the bullet at O0.