This is an archive of the discontinued LLVM Phabricator instance.

Differential D94542

[X86] Default to -x86-pad-for-align=false to drop assembler difference with or w/o -g
ClosedPublic

Authored by MaskRay on Jan 12 2021, 12:15 PM.

Details

Reviewers

reames
skan

Commits

rGa048ce13e32d: [X86] Default to -x86-pad-for-align=false to drop assembler difference with or…

Summary

Fix PR48742: the D75203 assembler optimization
locates MCRelaxableFragment's within two MCSymbol's and relaxes some
MCRelaxableFragment's to reduce the size of a MCAlignFragment.
A -g build has more MCSymbol's and therefore may have different assembler output
(e.g. a MCRelaxableFragment (jmp) may have 5 bytes with -O1 while 2 bytes with -O1 -g).

.p2align 4, 0x90 is common due to loops. For a larger program with a
lot of temporary labels, an assembly output difference is almost
inevitable. The cost seems to outweigh the benefits, so we default to
-x86-pad-for-align=false until the heuristic is improved.
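
The dependence on labels described in the summary can be illustrated with a small model. This is only a sketch under stated assumptions, not the code in X86AsmBackend.cpp: `Inst`, `Layout` and `layoutBeforeAlign` are hypothetical names, and the model assumes that padding is absorbed by walking backwards from the alignment point and that an instruction cannot grow if a label sits between it and that point.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an assembler fragment; not LLVM's MCFragment.
struct Inst {
  unsigned shortSize; // size of the compact encoding
  unsigned longSize;  // size after relaxation (== shortSize if not relaxable)
  bool labeled;       // true if a symbol points at the start of this instruction
};

struct Layout {
  std::vector<unsigned> sizes; // final size of each instruction
  unsigned nopBytes;           // padding still emitted as nops
};

// Lay out `insts` followed by an alignment directive, trying to replace nop
// padding by relaxing earlier instructions (assumed model, see lead-in).
Layout layoutBeforeAlign(const std::vector<Inst> &insts, unsigned align) {
  assert(align != 0);
  Layout out;
  unsigned offset = 0;
  for (const Inst &i : insts) {
    out.sizes.push_back(i.shortSize);
    offset += i.shortSize;
  }
  unsigned remaining = (align - offset % align) % align;
  // Walk backwards from the alignment point.
  for (std::size_t k = insts.size(); k-- > 0 && remaining != 0;) {
    unsigned delta = insts[k].longSize - insts[k].shortSize;
    if (delta != 0 && delta <= remaining) {
      out.sizes[k] = insts[k].longSize;
      remaining -= delta;
    }
    // Growing anything earlier would move this label, so stop here.
    if (insts[k].labeled)
      break;
  }
  out.nopBytes = remaining;
  return out;
}
```

In this model, a 2/5-byte jmp followed by a 3-byte instruction and an 8-byte alignment yields a 5-byte jmp and no nops. Marking the second instruction as labeled, as an extra symbol in a -g build might, leaves the jmp at 2 bytes and emits 3 bytes of nops instead.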

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

MaskRay created this revision. Jan 12 2021, 12:15 PM

Herald added subscribers: pengfei, hiraditya. Jan 12 2021, 12:15 PM

MaskRay requested review of this revision. Jan 12 2021, 12:15 PM

Herald added a project: Restricted Project. Jan 12 2021, 12:15 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B84896: Diff 316195. Jan 12 2021, 12:58 PM

Both the align and branch handling are "optimizations". I object to one being enabled and the other disabled. If you want them both on by default, fine. If you want them both off by default, fine. Having one off and one on is confusing.

I also ask that a bit more background be given to justify this change. I found the bug (https://bugs.llvm.org/show_bug.cgi?id=42138#c13), but that gives no information about the cause of the assembly difference. Has anyone examined the cause of the labels being emitted in debug mode to see if they're necessary/useful?

In D94542#2494774, @reames wrote:

Both the align and branch handling are "optimizations". I object to one being enabled and the other disabled. If you want them both on by default, fine. If you want them both off by default, fine. Having one off and one on is confusing.

I can disable the other one, too. Due to if (!X86PadForAlign && !X86PadForBranchAlign), disabling one suffices.

I also ask that a bit more background be given to justify this change. I found the bug (https://bugs.llvm.org/show_bug.cgi?id=42138#c13), but that gives no information about the cause of the assembly difference. Has anyone examined the cause of the labels being emitted in debug mode to see if they're necessary/useful?

I explained the cause in the description: "A -g build has more MCSymbol's and therefore may have different assembler output (e.g. a MCRelaxableFragment (jmp) may have 5 bytes with -O1 while 2 bytes with -O1 -g)."

In D94542#2494838, @MaskRay wrote:

In D94542#2494774, @reames wrote:

I also ask that a bit more background be given to justify this change. I found the bug (https://bugs.llvm.org/show_bug.cgi?id=42138#c13), but that gives no information about the cause of the assembly difference. Has anyone examined the cause of the labels being emitted in debug mode to see if they're necessary/useful?

I explained the cause in the description: "A -g build has more MCSymbol's and therefore may have different assembler output (e.g. a MCRelaxableFragment (jmp) may have 5 bytes with -O1 while 2 bytes with -O1 -g)."

This doesn't get at the root cause though. Are those labels doing anything in the debug build? Why are they emitted? Can they be reasonably removed?

In D94542#2494838, @MaskRay wrote:

In D94542#2494774, @reames wrote:

Both the align and branch handling are "optimizations". I object to one being enabled and the other disabled. If you want them both on by default, fine. If you want them both off by default, fine. Having one off and one on is confusing.

I can disable the other one, too. Due to if (!X86PadForAlign && !X86PadForBranchAlign), disabling one suffices.

Either turning off both options by default or just turning off x86-pad-for-align looks good to me, since x86-pad-for-branch-align has no effect if x86-align-branch-boundary or x86-align-branch is disabled. If you prefer to turn off both, remember to set x86-pad-for-branch-align to true when both x86-align-branch-boundary and x86-align-branch have valid non-zero values.
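
The interplay between the options discussed above can be written down as a few predicates. This is a sketch of the reasoning in the comments and of the guard shown in the diff, not LLVM's implementation: `PadFlags` and the three functions are hypothetical names, and representing x86-align-branch as a bitmask is an assumption.

```cpp
// Hypothetical snapshot of the four options mentioned in this thread.
struct PadFlags {
  bool padForAlign;             // -x86-pad-for-align
  bool padForBranchAlign;       // -x86-pad-for-branch-align
  unsigned alignBranchBoundary; // -x86-align-branch-boundary, 0 = disabled
  unsigned alignBranchKinds;    // -x86-align-branch as a bitmask, 0 = none (assumed encoding)
};

// Mirrors the guard in finishLayout: the padding logic is skipped only
// when both options are off.
bool paddingPassRuns(const PadFlags &f) {
  return f.padForAlign || f.padForBranchAlign;
}

// Padding for .p2align-style directives happens only with padForAlign.
bool alignPaddingHasEffect(const PadFlags &f) { return f.padForAlign; }

// Per the comment above, padding for branch alignment only matters when
// branch alignment itself is enabled.
bool branchPaddingHasEffect(const PadFlags &f) {
  return f.padForBranchAlign && f.alignBranchBoundary != 0 &&
         f.alignBranchKinds != 0;
}
```

With padForAlign off, padForBranchAlign on, and branch alignment disabled, the guard does not return early, yet neither kind of padding has any effect. That is one way to read the remark that disabling one option suffices.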

LGTM

This revision is now accepted and ready to land. Jan 13 2021, 5:53 PM

MaskRay edited the summary of this revision. Jan 14 2021, 2:50 PM

MaskRay added a subscriber: jyknight.

When -mbranches-within-32B-boundaries (which mitigates the performance impact of the microcode update for the Intel JCC erratum) is used, there are many alignment fragments. I think D75203 has an advantage in that case.
In the absence of -mbranches-within-32B-boundaries, I don't think there is a demonstrable improvement. I think the assembler relaxation does not pull its weight.
I'll wait another two days.
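
For readers unfamiliar with the option: -mbranches-within-32B-boundaries asks the assembler to keep selected branches from crossing, or ending at, a 32-byte boundary, which is why it creates so many alignment fragments. The check itself is simple arithmetic; the sketch below uses a hypothetical helper name and is not taken from LLVM.

```cpp
#include <cassert>
#include <cstdint>

// True if an instruction occupying bytes [start, start + size) either crosses
// a `boundary`-byte line or has its last byte immediately before one.
// Hypothetical helper for illustration only.
bool touchesBoundary(std::uint64_t start, std::uint64_t size,
                     std::uint64_t boundary = 32) {
  assert(size > 0 && boundary > 0);
  // Compare the line holding the first byte with the line holding the byte
  // just past the end; they differ exactly in the two cases above.
  return start / boundary != (start + size) / boundary;
}
```

A 2-byte jump at offset 0 is fine, while a 4-byte instruction at offset 30 crosses the boundary and a 2-byte jump at offset 30 ends on it; both of the latter would need padding in front of them.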

This doesn't get at the root cause though. Are those labels doing anything in the debug build? Why are they emitted? Can they be reasonably removed?

The debug sections refer to offsets in the code, and thus need labels. They can't be removed.

But (as per my comment on the old review), the optimization COULD have code added to allow those debug labels to be moved, thus allowing padding of instructions not to stop at such a label, thus fixing the codegen difference with -g. It just doesn't look trivial to figure out when that should be allowed, though.

I'm not convinced that disabling these optimizations is warranted. I'm not actively opposing the patch, just want to list my concerns for the record.

As a general matter, -O3 -g means "perform optimization while preserving the best debug experience we can", not "strictly w/o reducing debug quality optimize as best as possible". We have numerous places in the optimizer where we perform optimizations that destroy debug information. We also have lots of places - one-use restrictions which haven't been audited, for instance - where the presence of debug info restricts optimization. We treat the latter as bugs, not the former.

It's not clear to me why this particular optimization should be disabled just because it happens to emit different code w/ and w/o debugging enabled.

The bug (https://bugs.llvm.org/show_bug.cgi?id=42138) that triggered this review appears to be an automated testing framework. I fully understand the value of that type of automatic bug discovery, but when the framework reports something which is not a bug, the framework should be fixed. I believe this is a bug in the reporting framework, not the assembler.

In D94542#2500058, @reames wrote:

I'm not convinced that disabling these optimizations is warranted. I'm not actively opposing the patch, just want to list my concerns for the record.

As a general matter, -O3 -g means "perform optimization while preserving the best debug experience we can", not "strictly w/o reducing debug quality optimize as best as possible". We have numerous places in the optimizer where we perform optimizations that destroy debug information. We also have lots of places - one-use restrictions which haven't been audited, for instance - where the presence of debug info restricts optimization. We treat the latter as bugs, not the former.

It's not clear to me why this particular optimization should be disabled just because it happens to emit different code w/ and w/o debugging enabled.

The bug (https://bugs.llvm.org/show_bug.cgi?id=42138) that triggered this review appears to be an automated testing framework. I fully understand the value of that type of automatic bug discovery, but when the framework reports something which is not a bug, the framework should be fixed. I believe this is a bug in the reporting framework, not the assembler.

Admittedly I don't fully understand this bug. I'd /love/ a concrete, small example (short assembly file, a few command lines) showing how an extra label changes the output. I'm happy for the test not to contain full/complete DWARF; I'll happily believe the extra labels are needed to describe scopes, variable locations, etc. But it is pretty important that turning on debug info does not change the generated machine code - otherwise there's the chance of heisenbugs.

If this is a case where enabling debug info changes the selected machine code, that is a bug and one worth fixing one way or another. If the optimization can't be performed in the presence of extra labels, that's something we should figure out - perhaps there's some way to allow the relaxation even in the presence of the labels & cause the labels to be elided/removed, degrading debugging in some way. But that sounds... complicated. If those backing the optimization aren't willing to sign up to do that work, then I think it's plausible to suggest that the optimization isn't feasible to add at this time.

In D94542#2500116, @dblaikie wrote:

In D94542#2500058, @reames wrote:

... is pretty important that turning on debug info does not change the generated machine code - otherwise there's the chance of heisenbugs.

If this is a case where enabling debug info changes the selected machine code, that is a bug and one worth fixing one way or another.

David, thanks for chiming in here. Reading your wording made me realize I was wrong in my take here. I had been thinking of this as a debug info quality problem, but you're right, that's not what is actually going on here.

If those backing the optimization aren't willing to sign up to do that work, then I think it's plausible to suggest that the optimization isn't feasible to add at this time.

I don't have the bandwidth to take this on.

This does leave us in an unfortunate place where, to mitigate the JCC erratum performance issues, you need to enable a flag which introduces the heisenbugs David mentions, but I really don't see a way around that. That's not an argument to leave all users affected by the issue.

https://bugs.llvm.org/show_bug.cgi?id=42138 was a BranchFolding bug which was probably found by an automated framework. @condy reopened it in #c13 because the symptom looks similar (his case is from real applications; the C example in the first comment can be used to observe the -g vs non-g difference).
I closed 42138 in favor of the X86AsmBackend dedicated https://bugs.llvm.org//show_bug.cgi?id=48742 .

I think it is possible to repair the optimization. But as https://reviews.llvm.org/D75203#2496082 and the previous few comments said, doing it without causing -g vs non-g difference is difficult.
When the optimization gets improved, I have no problem re-enabling -x86-pad-for-align=true. The value of -mbranches-within-32B-boundaries also dilutes over time (it mitigates issues for some Skylake architectures).
(Surprising to me, this feature has been available for one year but I don't see a lot of adoption).

In D94542#2500145, @MaskRay wrote:

The value of -mbranches-within-32B-boundaries also dilutes over time (it mitigates issues for some Skylake architectures).
(Surprising to me, this feature has been available for one year but I don't see a lot of adoption).

The performance impact of the microcode fix was highly variable depending on the exact details of each workload. Most barely moved, some really suffered. I know of a couple of organizations using the mitigation, but I agree with your general point about this being something that decays in value with time. Hopefully, in a few years, we can stop talking about this.

In D94542#2501267, @reames wrote:

In D94542#2500145, @MaskRay wrote:

The value of -mbranches-within-32B-boundaries also dilutes over time (it mitigates issues for some Skylake architectures).
(Surprising to me, this feature has been available for one year but I don't see a lot of adoption).

The performance impact of the microcode fix was highly variable depending on the exact details of each workload. Most barely moved, some really suffered. I know of a couple of organizations using the mitigation, but I agree with your general point about this being something that decays in value with time. Hopefully, in a few years, we can stop talking about this.

Yeah hopefully...

I'll push this tomorrow.

Closed by commit rGa048ce13e32d: [X86] Default to -x86-pad-for-align=false to drop assembler difference with or… (authored by MaskRay). Jan 16 2021, 4:40 PM

This revision was automatically updated to reflect the committed changes.

MaskRay added a commit: rGa048ce13e32d: [X86] Default to -x86-pad-for-align=false to drop assembler difference with or….

fhahn mentioned this in rG0bd5bbb31e03: [X86] Add test showing binary differences with -x86-pad-for-align. Jun 17 2021, 4:38 AM

We also had user reports hitting the issue fixed by the change in the default. I added a relatively small test case that shows the binary differences caused by a single .loc: 0bd5bbb31e03. Perhaps this is helpful for anyone wanting to take a closer look at the underlying issue.

Revision Contents

Path                                                 Size
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp   5 lines
llvm/test/MC/X86/align-via-padding-corner.s          2 lines
llvm/test/MC/X86/align-via-padding.s                 2 lines
llvm/test/MC/X86/align-via-relaxation.s              16 lines
llvm/test/MC/X86/prefix-padding-32.s                 2 lines
llvm/test/MC/X86/prefix-padding-64.s                 2 lines

Diff 316195

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp

(103 lines not shown) cl::desc(
      "assumptions about labels corresponding to particular instructions, "
      "and should be used with caution."));

  cl::opt<unsigned> X86PadMaxPrefixSize(
      "x86-pad-max-prefix-size", cl::init(0),
      cl::desc("Maximum number of prefixes to use for padding"));

  cl::opt<bool> X86PadForAlign(
-     "x86-pad-for-align", cl::init(true), cl::Hidden,
+     "x86-pad-for-align", cl::init(false), cl::Hidden,
      cl::desc("Pad previous instructions to implement align directives"));

  cl::opt<bool> X86PadForBranchAlign(
      "x86-pad-for-branch-align", cl::init(true), cl::Hidden,
      cl::desc("Pad previous instructions to implement branch alignment"));

  class X86ELFObjectWriter : public MCELFObjectTargetWriter {
  public:
(831 lines not shown) void X86AsmBackend::finishLayout(MCAssembler const &Asm,
    // nop bytes required for code alignment. The actual win is in reducing
    // instruction count, not number of bytes. Modern X86-64 can easily end up
    // decode limited. It is often better to reduce the number of instructions
    // (i.e. eliminate nops) even at the cost of increasing the size and
    // complexity of others.
    if (!X86PadForAlign && !X86PadForBranchAlign)
      return;

+   // The processed regions are delimitered by LabeledFragments. -g may have more
+   // MCSymbols and therefore different relaxation results. X86PadForAlign is
+   // disabled by default to eliminate the -g vs non -g difference.
    DenseSet<MCFragment *> LabeledFragments;
    for (const MCSymbol &S : Asm.symbols())
      LabeledFragments.insert(S.getFragment(false));

    for (MCSection &Sec : Asm) {
      if (!Sec.getKind().isText())
        continue;

(633 lines not shown)

llvm/test/MC/X86/align-via-padding-corner.s

- # RUN: llvm-mc -mcpu=skylake -filetype=obj -triple x86_64-pc-linux-gnu %s -x86-pad-max-prefix-size=5 | llvm-objdump -d - | FileCheck %s
+ # RUN: llvm-mc -mcpu=skylake -filetype=obj -triple x86_64-pc-linux-gnu %s -x86-pad-max-prefix-size=5 -x86-pad-for-align=1 | llvm-objdump -d - | FileCheck %s

  # The first test check the correctness cornercase - can't add prefixes on a
  # instruction following by a prefix.
  .globl labeled_prefix_test
  labeled_prefix_test:
  # CHECK: 0: 2e 2e 2e 2e 2e e9 06 00 00 00 jmp
  # CHECK: a: 3e e9 00 00 00 00 jmp
(20 lines not shown)

llvm/test/MC/X86/align-via-padding.s

- # RUN: llvm-mc -mcpu=skylake -filetype=obj -triple x86_64-pc-linux-gnu %s -x86-pad-max-prefix-size=5 | llvm-objdump -d --section=.text - | FileCheck %s
+ # RUN: llvm-mc -mcpu=skylake -filetype=obj -triple x86_64-pc-linux-gnu %s -x86-pad-max-prefix-size=5 -x86-pad-for-align=1 | llvm-objdump -d - | FileCheck %s

  # This test file highlights the interactions between prefix padding and
  # relaxation padding.

  .file "test.c"
  .text
  .section .text
  # We can both relax and prefix for padding purposes, but the moment, we
(67 lines not shown)

llvm/test/MC/X86/align-via-relaxation.s

- # RUN: llvm-mc -mcpu=skylake -filetype=obj -triple x86_64-pc-linux-gnu -x86-pad-max-prefix-size=0 %s | llvm-objdump -d --section=.text - | FileCheck %s
+ # RUN: llvm-mc -mcpu=skylake -filetype=obj -triple x86_64-pc-linux-gnu -x86-pad-max-prefix-size=0 %s | llvm-objdump -d - | FileCheck %s --check-prefix=NOPAD
+ # RUN: llvm-mc -mcpu=skylake -filetype=obj -triple x86_64-pc-linux-gnu -x86-pad-max-prefix-size=0 -x86-pad-for-align=1 %s | llvm-objdump -d - | FileCheck %s

  # This test exercises only the padding via relaxation logic. The interaction
  # etween prefix padding and relaxation logic can be seen in align-via-padding.s

  .file "test.c"
  .text
  .section .text

+ # NOPAD-LABEL: <.text>:
+ # NOPAD-NEXT: 0: eb 1f jmp 0x21 <foo>
+ # NOPAD-NEXT: 2: eb 1d jmp 0x21 <foo>
+ # NOPAD-NEXT: 4: eb 1b jmp 0x21 <foo>
+ # NOPAD-NEXT: 6: eb 19 jmp 0x21 <foo>
+ # NOPAD-NEXT: 8: eb 17 jmp 0x21 <foo>
+ # NOPAD-NEXT: a: eb 15 jmp 0x21 <foo>
+ # NOPAD-NEXT: c: eb 13 jmp 0x21 <foo>
+ # NOPAD-NEXT: e: 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)
+ # NOPAD-NEXT: 1d: 0f 1f 00 nopl (%rax)
+ # NOPAD-NEXT: 20: cc int3
+
  # Demonstrate that we can relax instructions to provide padding, not
  # just insert nops. jmps are being used for ease of demonstration.
  # CHECK: .text
  # CHECK: 0: eb 1f jmp 0x21 <foo>
  # CHECK: 2: e9 1a 00 00 00 jmp 0x21 <foo>
  # CHECK: 7: e9 15 00 00 00 jmp 0x21 <foo>
  # CHECK: c: e9 10 00 00 00 jmp 0x21 <foo>
  # CHECK: 11: e9 0b 00 00 00 jmp 0x21 <foo>
(59 lines not shown)

llvm/test/MC/X86/prefix-padding-32.s

- # RUN: llvm-mc -filetype=obj -triple i386-pc-linux-gnu %s -x86-pad-max-prefix-size=15 | llvm-objdump -d --section=.text - | FileCheck %s
+ # RUN: llvm-mc -filetype=obj -triple i386-pc-linux-gnu %s -x86-pad-max-prefix-size=15 -x86-pad-for-align=1 | llvm-objdump -d - | FileCheck %s

  # Check prefix padding generation for all cases on 32 bit x86.

  # CHECK: 1: 3e 3e 3e 3e 3e 3e 3e 3e 3e 81 e1 01 00 00 00 andl $1, %ecx
  # CHECK: 10: 3e 3e 3e 3e 3e 3e 3e 3e 3e 81 21 01 00 00 00 andl $1, %ds:(%ecx)
  # CHECK: 1f: 2e 2e 2e 2e 2e 2e 2e 2e 2e 81 21 01 00 00 00 andl $1, %cs:(%ecx)
  # CHECK: 2e: 3e 3e 3e 3e 3e 3e 3e 3e 3e 81 21 01 00 00 00 andl $1, %ds:(%ecx)
  # CHECK: 3d: 26 26 26 26 26 26 26 26 26 81 21 01 00 00 00 andl $1, %es:(%ecx)
(41 lines not shown)

llvm/test/MC/X86/prefix-padding-64.s

- # RUN: llvm-mc -mcpu=skylake -filetype=obj -triple x86_64-pc-linux-gnu %s -x86-pad-max-prefix-size=15 | llvm-objdump -d --section=.text - | FileCheck %s
+ # RUN: llvm-mc -mcpu=skylake -filetype=obj -triple x86_64-pc-linux-gnu %s -x86-pad-max-prefix-size=15 -x86-pad-for-align=1 | llvm-objdump -d - | FileCheck %s

  # Check prefix padding generation for all cases on 64 bit x86.

  # CHECK: 0: 2e 2e 2e 2e 2e 2e 2e 2e 48 81 e1 00 00 00 00 andq $0, %rcx
  # CHECK: f: 2e 2e 2e 2e 2e 2e 2e 2e 48 81 21 00 00 00 00 andq $0, %cs:(%rcx)
  # CHECK: 1e: 2e 2e 2e 2e 2e 2e 2e 2e 48 81 21 00 00 00 00 andq $0, %cs:(%rcx)
  # CHECK: 2d: 3e 3e 3e 3e 3e 3e 3e 3e 48 81 21 00 00 00 00 andq $0, %ds:(%rcx)
  # CHECK: 3c: 26 26 26 26 26 26 26 26 48 81 21 00 00 00 00 andq $0, %es:(%rcx)
(44 lines not shown)
