Just because INC/DEC is a little slow on some processors doesn't mean we shouldn't prefer it when optimizing for size.
This appears to match gcc behavior.
Differential D37177
[X86] Don't disable slow INC/DEC if optimizing for size craig.topper on Aug 26 2017, 12:12 AM. Authored by
Details Just because INC/DEC is a little slow on some processors doesn't mean we shouldn't prefer it when optimizing for size. This appears to match gcc behavior.
Diff Detail Event TimelineComment Actions This patch intersects with a topic that @RKSimon was hoping to discuss at this year's dev meeting (it was proposed as a BOF for all targets, but I think we were too late to get an official BOF; also, I think Simon is out of the office for a week, so I don't expect any immediate comments from him): This could be a MachineCombiner transform, or it might be easier to have a single-purpose inc/dec conversion pass that uses properties of the scheduler model to drive the decision.
Comment Actions Use a simpler test case. I'd also like to see less feature flags. Particularly the ones that encode the CPU name into a flag. Is there anything in the CPU scheduler model that captures INC/DEC being slow. I think its only "slow" because it doesn't update the carry flag and creates false dependencies later. I also think it should be disabled on earlier CPUs than Haswell. I don't think anything change in microarchitecture at Haswell that made it different. gcc disables INC/DEC at least back to "core2". Comment Actions Yes, those are extra wrong. :)
There's nothing in there currently that I can see. We could transfer fake feature bits into the SchedMachineModel (a rename of that object might be due at that point) or something off to the side of it?
Yes, I doubt we'll ever get a complete 1-to-1 mapping from uarch characteristic to the output code we want to see, so we'll end up fudging it. But at least if it's at the MI layer, we'll have a more principled approach about when it gets activated? Side note specifically about this one: the goal of -Os (optsize) is fuzzy. Where to draw the line between size and perf is never clear to me. I thought -Oz was defined more concretely (reduce size no matter what it does to perf), but even that is not clear based on the current text ("Like -Os but reduces code size further"). So I don't object to this patch if it makes llvm behavior less suprising vs. gcc (or there's some measurable win somewhere?), but if we can move this case to MI just as easily and/or create some infrastructure to get all of the fake features moved over, I think that's preferable. Comment Actions The surprising thing to me, and the reason I even looked at this is that we gave up the size optimization of MOV32r1/MOV32r_1 at -Os when the SlowIncDec flag is set. Comment Actions Submit your changes to the slow-incdec.ll tests so this patch shows the diff?
Comment Actions I generally like the direction, peanut gallery comment. No need to wait for me to land or anything...
Comment Actions LGTM - Chandler's suggestion of renaming NotSlowIncDec_Or_OptForSize makes sense as well Comment Actions I'll rename, but the polarity of those suggestions is backwards. It needs to be something like UseIncDec or DontAvoidIncDec. |
Maybe name this AvoidIncDec or NoIncDec or DisableIncDec or ... rather than explaining the rationale in the name?