Page MenuHomePhabricator

[X86FixupLEAs] Turn optIncDec into a generic two address LEA optimizer. Support LEA64_32r properly.
ClosedPublic

Authored by craig.topper on May 2 2019, 4:24 PM.

Details

Summary

INC/DEC is really a special case of a more generic issue. We should also turn leas into add reg/reg or add reg/imm regardless of the slow lea flags.

This also supports LEA64_32 which has 64 bit input registers and 32 bit output registers. So we need to convert the 64 bit inputs to their 32 bit equivalents to check if they are equal to base reg.

One thing to note, the original code preserved the kill flags by adding operands to the new instruction instead of using addReg. But I think tied operands aren't supposed to have the kill flag set. I dropped the kill flags, but I could probably try to preserve it in the add reg/reg case if we think its important. Not sure which operand its supposed to go on for the LEA64_32r instruction due to the super reg implicit uses. Though I'm also not sure those are needed since they were probably just created by an INSERT_SUBREG from a 32-bit input.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision.May 2 2019, 4:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 2 2019, 4:24 PM
Herald added a subscriber: hiraditya. · View Herald Transcript

Overall, the patch looks good to me.

I also don't think that those tied operands should have had the kill flag set. But then, I don't know how important is to accurately preserve kill flags in FixupLEAs; my understanding is that FixupLEAs runs quite late in the codegen pipeline (since it is a pre-emit pass). So, I don't know if passes running after it still try to check for the presence of those flags.

Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2019, 10:08 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Where did all the clang diffs come from?

Not sure. I'll fix it.

Get rid of clang stuff.

I haven't looked closely at the series of transforms that gets us here, so let me ask: would it be more efficient to produce the add/inc/dec machine instructions directly rather than LEA? Or do we do this because the 3-address opportunity helps register allocation, so this late reversal is a special-case of the more generally useful transforms that produce LEA in the 1st place?

llvm/lib/Target/X86/X86FixupLEAs.cpp
11 ↗(On Diff #198303)

Should update this comment to better match the logic:
"It replaces LEAs with ADD/INC/DEC when that is better for size/speed." ?

I haven't looked closely at the series of transforms that gets us here, so let me ask: would it be more efficient to produce the add/inc/dec machine instructions directly rather than LEA? Or do we do this because the 3-address opportunity helps register allocation, so this late reversal is a special-case of the more generally useful transforms that produce LEA in the 1st place?

I believe a lot of these are caused TwoAddressInstructionPass converting to three address sometimes when it maybe shouldn't. I think there's some overlap with the test diffs in D52109. But I don't think the change I made in that diff is correct. There appear to be some test changes here that aren't affected by that patch either. So maybe there are other issues in TwoAddressInstruction heuristics.

spatel accepted this revision.May 24 2019, 1:46 PM

I haven't looked closely at the series of transforms that gets us here, so let me ask: would it be more efficient to produce the add/inc/dec machine instructions directly rather than LEA? Or do we do this because the 3-address opportunity helps register allocation, so this late reversal is a special-case of the more generally useful transforms that produce LEA in the 1st place?

I believe a lot of these are caused TwoAddressInstructionPass converting to three address sometimes when it maybe shouldn't. I think there's some overlap with the test diffs in D52109. But I don't think the change I made in that diff is correct. There appear to be some test changes here that aren't affected by that patch either. So maybe there are other issues in TwoAddressInstruction heuristics.

Ok - these diffs look alright, so LGTM. Maybe we can adjust the earlier transforms to avoid some back-and-forth though as follow-up work.

This revision is now accepted and ready to land.May 24 2019, 1:46 PM
This revision was automatically updated to reflect the committed changes.