This change adds a pass (which will be currently turned off by default) that can change
byte and word instructions into zero extending instructions. This is an attempt to improve
performance by eliminating some penalties associated with the byte and word versions
of load instructions only at this time. My vision, however, is that this should be able to
be extended over time to provide much more optimization of byte and word instructions
into more efficient forms.
This is intended to eventually solve: https://llvm.org/bugs/show_bug.cgi?id=23155
but won't until turned on by default.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Hi Kevin -
Thanks for working on this! Given that the motivation / direction has already been discussed and that this is off by default, I see no reason not to commit this ASAP. We can test / benchmark / evolve it faster in trunk vs. privately.
I'm cc'ing some of the people from the bug report discussions just so they are aware of this pass.
Quick question: Is this going to get rid of some of the partial reg stall code that we've already got in the source base?
Sanjay - Thanks for the encouraging words. I'll be happy to commit whenever folks think that is appropriate.
Eric - My expectation is this could be used both to get rid or partial stall code in other places, but I really don't know where those other places are at this time. If you have suggestions for places to look at for that, I'd like to look into fixing them using this. It may be a bit before it can fix all issues. The legality proving needs some improvement, not live really isn't quite the right proof, but it does catch important cases, and it uses existing LLVM infrastructure.
There are a few spots that I know of:
- X86TargetLowering::EmitCmp() has the change introduced with http://reviews.llvm.org/rL195496 .
- The ExecutionDepsFix pass uses getPartialRegUpdateClearance() and getUndefRegClearance() to insert xor insts ( https://llvm.org/bugs/show_bug.cgi?id=22024 ).
- PerformTargetShuffleCombine() has a comment about movsd vs. blendpd.
There are probably other cases in X86ISelLowering. It would be good to consolidate those kinds of transforms in a machine pass. So it may be worth renaming this to something more general (FixupPartialRegStalls?) before checking it in just to save on some naming churn.
I named it X86FixupBWInsts because the analysis needed to prove safety was going to be specific to the byte and word insts; that is to prove that only the specific byte/word reg that was the destination of the instruction was going to be "live" out of the instruction, so that these could be changed in any way that was useful. One example would be to change addw reg16,small_immediate into add reg32,small_immediate, as this could provide a code size saving. And I hadn't intended for it to be able to resolve partial updates of XMM or other SIMD types instructions, as the analysis needed for that is different. Given those considerations, I thought X86FixupBWInsts was a more appropriate name. But if after hearing this explanation, you (or others) feel strongly about using a different file name, I will change it.
Naming is hard and shouldn't block anything here. That said, what do you think about using this to get rid of some of the reg stall code that was listed? :)
I am in favor of it. I think the staging should be:
- Get this in.
- Get it turned on by default.
- Pull the other changes out, 1 by 1, making sure that this catches those cases, improving the safety analysis, or heuristics as necessary to make sure that performance is maintained/improved.
Each of those stages seems like it ought to be a separate change, following on to this.
Seems reasonable to me. Note that I haven't looked at it much more than the few inline comments I just gave, but it seems good to me - especially as it isn't turned on by default.
lib/Target/X86/X86FixupBWInsts.cpp | ||
---|---|---|
137–138 ↗ | (On Diff #47322) | Can you elaborate on this as far as what kind of interface you'd need? |
184 ↗ | (On Diff #47322) | Comment here as to why this is a sufficient check please? I've seen the comment above, but... |
189 ↗ | (On Diff #47322) | Extra whitespace. |
205 ↗ | (On Diff #47322) | Delete. |
216 ↗ | (On Diff #47322) | Delete. |
lib/Target/X86/X86FixupBWInsts.cpp | ||
---|---|---|
10 ↗ | (On Diff #47322) | The explanation should use doxygen style comments. |
18 ↗ | (On Diff #47322) | Please explain why only those instructions can be affected. |
29 ↗ | (On Diff #47322) | I have to admit I don’t remember what the documentation says, but when we write a 16-bit register we may already have partial register dependencies, right? What I want to point out is that this comment seems to go against one of the initial statement that this pass will avoid false-dependencies on the upper portions of the registers. I think this deserves more explanation like when we do kill those false-dependencies and when we might introduce new ones. No need to be fancy here, I believe something simply stating that some instructions set the upper bits and some don’t and thus have false-dependencies would be sufficient. |
73 ↗ | (On Diff #47322) | \p SuerDestReg |
77 ↗ | (On Diff #47322) | I am assuming we want something like that as well |
90 ↗ | (On Diff #47322) | Either don’t put this comment or explain why we need it. |
101 ↗ | (On Diff #47322) | We usually write the comment with doxygen style and before the field. |
132 ↗ | (On Diff #47322) | Don’t put doxygen comments on both the declaration and definition. |
197 ↗ | (On Diff #47322) | We need to copy the implicit operands as well. |
Oh, and don't take my approval as enough. This touches enough that a cursory glance is insufficient, please wait for someone like Quentin to look at it.
Thanks.
Thanks for the comments. I'll work on fixing all those, and have a new version early tomorrow.
Addressed inline comments.
lib/Target/X86/X86FixupBWInsts.cpp | ||
---|---|---|
198 ↗ | (On Diff #47670) | I think the loop in lines 217-219 already gets all the implicit ops. I looked at the code for copyImplicitOps, which looks like this: i != e; ++i) { const MachineOperand &MO = MI->getOperand(i); if ((MO.isReg() && MO.isImplicit()) || MO.isRegMask()) addOperand(MF, MO); } |
LGTM.
Thanks,
-Quentin
lib/Target/X86/X86FixupBWInsts.cpp | ||
---|---|---|
198 ↗ | (On Diff #47670) | Thanks for double checking. |