This is an archive of the discontinued LLVM Phabricator instance.

[X86] Default to -x86-pad-for-align=false to drop assembler difference with or w/o -g
ClosedPublic

Authored by MaskRay on Jan 12 2021, 12:15 PM.

Details

Summary

Fix PR48742: the D75203 assembler optimization
locates MCRelaxableFragment's within two MCSymbol's and relaxes some
MCRelaxableFragment's to reduce the size of a MCAlignFragment.
A -g build has more MCSymbol's and therefore may have different assembler output
(e.g. a MCRelaxableFragment (jmp) may have 5 bytes with -O1 while 2 bytes with -O1 -g).

.p2align 4, 0x90 is common due to loops. For a larger program, with a
lot of temporary labels, the assembly output difference is somewhat
destined. The cost seems to overweigh the benefits so we default to
-x86-pad-for-align=false until the heuristic is improved.

Diff Detail

Event Timeline

MaskRay created this revision.Jan 12 2021, 12:15 PM
MaskRay requested review of this revision.Jan 12 2021, 12:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2021, 12:15 PM

Both the align and branch handling are "optimizations". I object to one being enabled and the other disabled. If you want them both on by default, fine. If you want them both off by default, fine. Having one off and one on is confusing.

I also ask that a bit more background be given to justify this change. I found the bug (https://bugs.llvm.org/show_bug.cgi?id=42138#c13), but that gives no information about the cause of the assembly difference. Has anyone examined the cause of the labels being emitted in debug mode to see if they're necessary/useful?

Both the align and branch handling are "optimizations". I object to one being enabled and the other disabled. If you want them both on by default, fine. If you want them both off by default, fine. Having one off and one on is confusing.

I can disable the other one, too. Due to if (!X86PadForAlign && !X86PadForBranchAlign), disabling one suffices.

I also ask that a bit more background be given to justify this change. I found the bug (https://bugs.llvm.org/show_bug.cgi?id=42138#c13), but that gives no information about the cause of the assembly difference. Has anyone examined the cause of the labels being emitted in debug mode to see if they're necessary/useful?

I explained the cause in the description: "A -g build has more MCSymbol's and therefore may have different assembler output (e.g. a MCRelaxableFragment (jmp) may have 5 bytes with -O1 while 2 bytes with -O1 -g)."

I also ask that a bit more background be given to justify this change. I found the bug (https://bugs.llvm.org/show_bug.cgi?id=42138#c13), but that gives no information about the cause of the assembly difference. Has anyone examined the cause of the labels being emitted in debug mode to see if they're necessary/useful?

I explained the cause in the description: "A -g build has more MCSymbol's and therefore may have different assembler output (e.g. a MCRelaxableFragment (jmp) may have 5 bytes with -O1 while 2 bytes with -O1 -g)."

This doesn't get at the root cause though. Are those labels doing anything in the debug build? Why are they emitted? Can they be reasonably removed?

skan added a comment.Jan 12 2021, 6:26 PM

Both the align and branch handling are "optimizations". I object to one being enabled and the other disabled. If you want them both on by default, fine. If you want them both off by default, fine. Having one off and one on is confusing.

I can disable the other one, too. Due to if (!X86PadForAlign && !X86PadForBranchAlign), disabling one suffices.

It looks good to me either to turn off both options by default or just turn off x86-pad-for-align, since x86-pad-for-branch-align doesn't work
indeed if x86-align-branch-boundary or x86-align-branch is disabled. If you prefer to turn off both, remember to set x86-pad-for-branch-align to true when both x86-align-branch-boundary and x86-align-branch have valid non-zero value.

skan accepted this revision.Jan 13 2021, 5:53 PM

LGTM

This revision is now accepted and ready to land.Jan 13 2021, 5:53 PM
MaskRay edited the summary of this revision. (Show Details)Jan 14 2021, 2:50 PM
MaskRay added a subscriber: jyknight.

When -mbranches-within-32B-boundaries (to mitigate microcode update for Intel JCC Erratum) is used, there are many alignment fragments. I think D75203 has advantage in that case.
In the absence of -mbranches-within-32B-boundaries, I don't think there is demonstrable improvement. I think the assembler relaxation does not pull its weight.
I'll wait another two days.

This doesn't get at the root cause though. Are those labels doing anything in the debug build? Why are they emitted? Can they be reasonably removed?

The debug sections refer to offsets in the code, and thus need labels. They can't be removed.

But (as per my comment on the old review), the optimization COULD have code added to allow those debug labels to be moved, thus allowing padding of instructions not to stop at such a label, thus fixing the codegen difference with -g. It just doesn't look trivial to figure out when that should be allowed, though.

I'm not convinced that disabling these optimizations is warranted. I'm not actively opposing the patch, just want to list my concerns for the record.

As a general matter, -O3 -g means "perform optimization while preserving the best debug experience we can", not "strictly w/o reducing debug quality optimize as best as possible". We have numerous places in the optimizer that we perform optimizations that destroy debug information. We also have lots of places - one use restrictions which haven't been audited for instance - where the presence of debug info restricts optimization. We treat the later as bugs, not the former.

It's not clear to be me why this particular optimization should be disabled just because it happens to emit different code w/ and w/o debuging enabled.

The bug (https://bugs.llvm.org/show_bug.cgi?id=42138) that triggered this review appears to be an automated testing framework. I fully understand the value of that type of automatic bug discovery, but when the framework reports something which is not a bug, the framework should be fixed. I believe this is a bug in the reporting framework, not the assembler.

I'm not convinced that disabling these optimizations is warranted. I'm not actively opposing the patch, just want to list my concerns for the record.

As a general matter, -O3 -g means "perform optimization while preserving the best debug experience we can", not "strictly w/o reducing debug quality optimize as best as possible". We have numerous places in the optimizer that we perform optimizations that destroy debug information. We also have lots of places - one use restrictions which haven't been audited for instance - where the presence of debug info restricts optimization. We treat the later as bugs, not the former.

It's not clear to be me why this particular optimization should be disabled just because it happens to emit different code w/ and w/o debuging enabled.

The bug (https://bugs.llvm.org/show_bug.cgi?id=42138) that triggered this review appears to be an automated testing framework. I fully understand the value of that type of automatic bug discovery, but when the framework reports something which is not a bug, the framework should be fixed. I believe this is a bug in the reporting framework, not the assembler.

Admittedly I don't fully understand this bug (I'd /love/ a concrete, small example (short assembly file, few command lines) showing how an extra label (happy for the test not to contain full/complete DWARF, I'll happily believe the extra labels needed to describe scopes, variable locations, etc)), but it is pretty important that turning on debug info does not change the generated machine code - otherwise there's the chance of heisenbugs.

If this is a case where enabling debug info changes the selected machine code, that is a bug and one worth fixing one way or another. If the optimization can't be performed in the presence of extra labels, that's something we should figure out - perhaps there's some way to allow the relaxation even in the presence of the labels & cause the labels to be elided/removed, degrading debugging in some way. But that sounds... complicated. If those backing the optimization aren't willing to sign up to do that work, then I think it's plausible to suggest that the optimization isn't feasible to add at this time.

... is pretty important that turning on debug info does not change the generated machine code - otherwise there's the chance of heisenbugs.

If this is a case where enabling debug info changes the selected machine code, that is a bug and one worth fixing one way or another.

David, thanks for chiming in here. Reading your wording made me realize I was wrong in my take here. I had been thinking of this as a debug info quality problem, but you're right, that's not what is actually going on here.

If those backing the optimization aren't willing to sign up to do that work, then I think it's plausible to suggest that the optimization isn't feasible to add at this time.

I don't have the bandwidth to take this on.

This does leave us in an unfortunate place where to mitigate the JCC errata performance issues you need to enable a flag which introduces the heisenbugs David mentions, but I really don't see a way around that. That's not an argument to leave all users effected by the issue.

MaskRay added a comment.EditedJan 14 2021, 9:58 PM

https://bugs.llvm.org/show_bug.cgi?id=42138 was a BranchFolding bug which was probably found by an automating framework. @condy reopened it in #c13 because the symptom looks similar (his case is from real applications. The C example in the first comment can be used to observe -g vs non-g difference).
I closed 42138 in favor of the X86AsmBackend dedicated https://bugs.llvm.org//show_bug.cgi?id=48742 .

I think it is possible to repair the optimization. But as https://reviews.llvm.org/D75203#2496082 and the previous few comments said, doing it without causing -g vs non-g difference is difficult.
When the optimization gets improved, I have no problem re-enabling -x86-pad-for-align=true. The value of -mbranches-within-32B-boundaries also dilutes over time (it mitigates issues for some Skylake architectures).
(Surprising to me, this feature has been available for one year but I don't see a lot of adoption).

The value of -mbranches-within-32B-boundaries also dilutes over time (it mitigates issues for some Skylake architectures).
(Surprising to me, this feature has been available for one year but I don't see a lot of adoption).

The performance impact of the microcode fix were highly variable depending on the exact details of each workload. Mostly barely moved, some really suffered. I know of a couple of organizations using the mitigation, but I agree with your general point about this being something that decays in value with time. Hopefully, in a few years, we can stop talking about this.

The value of -mbranches-within-32B-boundaries also dilutes over time (it mitigates issues for some Skylake architectures).
(Surprising to me, this feature has been available for one year but I don't see a lot of adoption).

The performance impact of the microcode fix were highly variable depending on the exact details of each workload. Mostly barely moved, some really suffered. I know of a couple of organizations using the mitigation, but I agree with your general point about this being something that decays in value with time. Hopefully, in a few years, we can stop talking about this.

Yeah hopefully...

I'll push this tomorrow.

fhahn added a subscriber: fhahn.Jun 17 2021, 4:39 AM

We also had user reports hitting the issue fixed by the change in the default. I added a relatively small test case that shows the binary differences caused by a single .loc: 0bd5bbb31e03 Perhaps this is helpful for anyone wanting to take a closer look at the underlying issue.