This is an archive of the discontinued LLVM Phabricator instance.

[X86] Don't disable slow INC/DEC if optimizing for size
ClosedPublic

Authored by craig.topper on Aug 26 2017, 12:12 AM.

Details

Summary

Just because INC/DEC is a little slow on some processors doesn't mean we shouldn't prefer it when optimizing for size.

This appears to match gcc behavior.

Diff Detail

Event Timeline

craig.topper created this revision.Aug 26 2017, 12:12 AM
spatel edited edge metadata.Aug 26 2017, 8:19 AM

This patch intersects with a topic that @RKSimon was hoping to discuss at this year's dev meeting (it was proposed as a BOF for all targets, but I think we were too late to get an official BOF; also, I think Simon is out of the office for a week, so I don't expect any immediate comments from him):
Do we need fake "features" like FeatureSlowIncDec in isel? Can we just uniformly isel inc/dec and then transform it to add/sub in MI based on the CPU's scheduler model? Or maybe the opposite: we isel add/sub as the default and then convert to inc/dec?

This could be a MachineCombiner transform, or it might be easier to have a single-purpose inc/dec conversion pass that uses properties of the scheduler model to drive the decision.

test/CodeGen/X86/slow-incdec.ll
2–5

I don't know why these tests are this complicated. We can check for inc/dec output with tests as simple as:

$ cat incdec.ll 
define i32 @inc(i32 %x) {
  %r = add i32 %x, 1
  ret i32 %r
}
define i32 @dec(i32 %x) {
  %r = add i32 %x, -1
  ret i32 %r
}

$ ./llc -o - incdec.ll -mtriple=i386-unknown-unkown -mattr=-slow-incdec | grep eax | grep -v mov
incl %eax
decl %eax
$ ./llc -o - incdec.ll -mtriple=i386-unknown-unkown -mattr=+slow-incdec | grep eax | grep -v mov
addl $1, %eax
addl $-1, %eax

Use a simpler test case.

I'd also like to see less feature flags. Particularly the ones that encode the CPU name into a flag.

Is there anything in the CPU scheduler model that captures INC/DEC being slow. I think its only "slow" because it doesn't update the carry flag and creates false dependencies later.

I also think it should be disabled on earlier CPUs than Haswell. I don't think anything change in microarchitecture at Haswell that made it different. gcc disables INC/DEC at least back to "core2".

I'd also like to see less feature flags. Particularly the ones that encode the CPU name into a flag.

Yes, those are extra wrong. :)

Is there anything in the CPU scheduler model that captures INC/DEC being slow. I think its only "slow" because it doesn't update the carry flag and creates false dependencies later.

There's nothing in there currently that I can see. We could transfer fake feature bits into the SchedMachineModel (a rename of that object might be due at that point) or something off to the side of it?

I also think it should be disabled on earlier CPUs than Haswell. I don't think anything change in microarchitecture at Haswell that made it different. gcc disables INC/DEC at least back to "core2".

Yes, I doubt we'll ever get a complete 1-to-1 mapping from uarch characteristic to the output code we want to see, so we'll end up fudging it. But at least if it's at the MI layer, we'll have a more principled approach about when it gets activated?

Side note specifically about this one: the goal of -Os (optsize) is fuzzy. Where to draw the line between size and perf is never clear to me. I thought -Oz was defined more concretely (reduce size no matter what it does to perf), but even that is not clear based on the current text ("Like -Os but reduces code size further"). So I don't object to this patch if it makes llvm behavior less suprising vs. gcc (or there's some measurable win somewhere?), but if we can move this case to MI just as easily and/or create some infrastructure to get all of the fake features moved over, I think that's preferable.

The surprising thing to me, and the reason I even looked at this is that we gave up the size optimization of MOV32r1/MOV32r_1 at -Os when the SlowIncDec flag is set.

RKSimon edited edge metadata.Sep 8 2017, 6:36 AM

Submit your changes to the slow-incdec.ll tests so this patch shows the diff?

test/CodeGen/X86/slow-incdec.ll
3

Add a common -check-prefix=CHECK check

Test has been pre-commited so now the test diff is relative to just this patch.

chandlerc edited edge metadata.Sep 8 2017, 11:08 AM

I generally like the direction, peanut gallery comment. No need to wait for me to land or anything...

lib/Target/X86/X86InstrInfo.td
911

Maybe name this AvoidIncDec or NoIncDec or DisableIncDec or ... rather than explaining the rationale in the name?

RKSimon accepted this revision.Sep 9 2017, 3:54 AM

LGTM - Chandler's suggestion of renaming NotSlowIncDec_Or_OptForSize makes sense as well

This revision is now accepted and ready to land.Sep 9 2017, 3:54 AM

I'll rename, but the polarity of those suggestions is backwards. It needs to be something like UseIncDec or DontAvoidIncDec.

This revision was automatically updated to reflect the committed changes.