Page MenuHomePhabricator

[MC] Resolve the difference of symbols in consecutive MCDataFragements
ClosedPublic

Authored by jcai19 on Oct 24 2019, 3:03 PM.

Details

Summary

Try to resolve the difference of two symbols in consecutive MCDataFragments.
This is important for an idiom like "foo:instr; .if . - foo; instr; .endif"
(https://bugs.llvm.org/show_bug.cgi?id=43795).

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jcai19 edited the summary of this revision. (Show Details)Nov 11 2019, 10:54 AM
jcai19 updated this revision to Diff 228746.Nov 11 2019, 11:55 AM
jcai19 marked an inline comment as done.
jcai19 edited the summary of this revision. (Show Details)

Fix test cases.

jcai19 updated this revision to Diff 228751.Nov 11 2019, 12:06 PM

Fix a typo.

jcai19 marked an inline comment as done.Nov 12 2019, 12:08 AM
jcai19 added inline comments.
llvm/test/MC/AsmParser/directive_if_offset.s
5 ↗(On Diff #226839)

This line is required to reproduce this issue, as it changes the subtarget and forces the creation of a new MCDataSegment (https://llvm.org/doxygen/MCObjectStreamer_8cpp_source.html#l00157). I found a similar directive for i386, .arch + cpu_type (https://sourceware.org/binutils/docs/as/i386_002dArch.html#i386_002dArch), although the parser does not support it. To have a similar test case on x86, it seems we would have to first support that directive. However, I could not find any uses of this directive in llvm or Linux kernel, so I assume it is not commonly used. Will keep the test case for arm for now.

To summarise where I think we are right now.

  • D76002 fixes a problem affecting within MCDataFragment eager evaluation of .if expressions. Specifically when there is a label that would be inserted into the MCDataFragment, but at the time of encountering the label the fragment hadn't been created. MC will attempt to reuse an MCDataFragment for new instructions, see CanReuseDataFragment() in MCObjectStreamer.cpp. As noted earlier when the Subtarget changes a new MCDataFragment is created. In the majority of cases this is done via a .arch or .arch_extension directive.
  • This patch extends the eager evaluation to cope with two adjacent MCDataFragments. My understanding is that this only occurs in the following circumstances:
    • When bundle locking and --mc-relax-all is used, there is a complicated 1 instruction per fragment + fragment merging that goes on here. This is only used in NaCl which I'm not sure what the status of in Chrome is. I think it is at best deprecated.
    • When there is a Subtarget change between MCDataFragments.
  • In all other cases such as .align, .space and a relaxable instruction there is a separate non MCDataFragment created so we cannot fix these up.
  • The patch restricts the eager evaluation to .if as some asm backends do not want the expressions between fragments evaluated eagerly in some cases.

From the perspective of the linux kernel, is D76002 sufficient? For example if the Subtarget changing directives are used in such a way that they don't create new MCDataFragments in a sensitive location then we may not need this. For example the following will assemble with D76002

        .arch_extension sec // Outside of .text
        
        .text
9997:
        nop
.if . - 9997b == 0
// CHECK-NOT: error: expected absolute expression
orr r1, r1, #1
.else
orr r1, r1, #2
.endif

As will:

        .text
        .arch_extension sec                
        nop
9997:
        nop
.if . - 9997b == 0
// CHECK-NOT: error: expected absolute expression
orr r1, r1, #1
.else
orr r1, r1, #2
.endif

Could the examples in the kernel be altered to not require this? It does seem like we are writing a lot of code for a small number of easily resolved cases.

I note that it is possible to write a contrived example that this patch can't handle (needs --triple=armv8a-linux-gnu as crypto and crc are V8 extensions) although I'm not suggesting implementing support for it.

        .text

        nop
9997:
        .arch_extension crypto
        nop
        .arch_extension crc
        nop
.if . - 9997b == 0
// CHECK-NOT: error: expected absolute expression
orr r1, r1, #1
.else
orr r1, r1, #2
.endif        .text

        nop
9997:
        .arch_extension crypto
        nop
        .arch_extension crc
        nop
.if . - 9997b == 0
// CHECK-NOT: error: expected absolute expression
orr r1, r1, #1
.else
orr r1, r1, #2
.endif

if.s:11:5: error: expected absolute expression
.if . - 9997b == 0

llvm/lib/MC/MCExpr.cpp
627

With the new information coming from D70062 I think we need to make this comment more specific. For example:

When the Subtarget is changed a new MCDataFragment is created. This handles the case of 
foo: instr; .arch_extension ext; instr .if . - foo
llvm/test/MC/ARM/directive_if_offset.s
5 ↗(On Diff #228751)

Can we add a comment to explain the importance of .arch_extension, such as:
// Create a new MCDataFragment due to Subtarget change

Thanks for the summary.

To summarise where I think we are right now.

  • D76002 fixes a problem affecting within MCDataFragment eager evaluation of .if expressions. Specifically when there is a label that would be inserted into the MCDataFragment, but at the time of encountering the label the fragment hadn't been created. MC will attempt to reuse an MCDataFragment for new instructions, see CanReuseDataFragment() in MCObjectStreamer.cpp. As noted earlier when the Subtarget changes a new MCDataFragment is created. In the majority of cases this is done via a .arch or .arch_extension directive.
  • This patch extends the eager evaluation to cope with two adjacent MCDataFragments. My understanding is that this only occurs in the following circumstances:
    • When bundle locking and --mc-relax-all is used, there is a complicated 1 instruction per fragment + fragment merging that goes on here. This is only used in NaCl which I'm not sure what the status of in Chrome is. I think it is at best deprecated.

I'll follow up on this. There's likely a lot of code that can be dropped if that's the case.

    • When there is a Subtarget change between MCDataFragments.
  • In all other cases such as .align, .space and a relaxable instruction there is a separate non MCDataFragment created so we cannot fix these up.
  • The patch restricts the eager evaluation to .if as some asm backends do not want the expressions between fragments evaluated eagerly in some cases.

From the perspective of the linux kernel, is D76002 sufficient?

I can still reproduce https://github.com/ClangBuiltLinux/linux/issues/742 with https://reviews.llvm.org/D70062 applied.

For example if the Subtarget changing directives are used in such a way that they don't create new MCDataFragments in a sensitive location then we may not need this. For example the following will assemble with D76002

        .arch_extension sec // Outside of .text
        
        .text
9997:
        nop
.if . - 9997b == 0
// CHECK-NOT: error: expected absolute expression
orr r1, r1, #1
.else
orr r1, r1, #2
.endif

As will:

        .text
        .arch_extension sec                
        nop
9997:
        nop
.if . - 9997b == 0
// CHECK-NOT: error: expected absolute expression
orr r1, r1, #1
.else
orr r1, r1, #2
.endif

Could the examples in the kernel be altered to not require this? It does seem like we are writing a lot of code for a small number of easily resolved cases.

The case from the kernel is testing that a single instruction is 2B rather than 4B (narrow vs wide). See https://github.com/ClangBuiltLinux/linux/blob/de620fb99ef2bd52b2c5bc52656e89dcfc0e223a/arch/arm/include/asm/assembler.h#L255-L270.

I don't see any subarch directives there.

jcai19 added a comment.EditedNov 12 2019, 11:08 AM

I note that it is possible to write a contrived example that this patch can't handle (needs --triple=armv8a-linux-gnu as crypto and crc are V8 extensions) although I'm not suggesting implementing support for it.

        .text

        nop
9997:
        .arch_extension crypto
        nop
        .arch_extension crc
        nop
.if . - 9997b == 0
// CHECK-NOT: error: expected absolute expression
orr r1, r1, #1
.else
orr r1, r1, #2
.endif        .text

        nop
9997:
        .arch_extension crypto
        nop
        .arch_extension crc
        nop
.if . - 9997b == 0
// CHECK-NOT: error: expected absolute expression
orr r1, r1, #1
.else
orr r1, r1, #2
.endif

if.s:11:5: error: expected absolute expression
.if . - 9997b == 0

If we need to support cases like this, we can probably add a while loop and check if the fragments between two symbols are all MCDataFragments and sum their sizes up.

The case from the kernel is testing that a single instruction is 2B rather than 4B (narrow vs wide). See https://github.com/ClangBuiltLinux/linux/blob/de620fb99ef2bd52b2c5bc52656e89dcfc0e223a/arch/arm/include/asm/assembler.h#L255-L270.

I don't see any subarch directives there.

This issue happens when the ALT_UP macro is expanded in files like arch/arm/mm/proc-v7.S. This is how I reproduce the code of interest
clang -I./arch/arm/include -I./arch/arm/include/generated -I./include -I./arch/arm/include/uapi -I./arch/arm/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -DASSEMBLY --target=arm-linux-gnueabihf -DLINUX_ARM_ARCH=7 -march=armv7-a -include asm/unified.h -S arch/arm/mm/proc-v7.S &> clang-proc-v7.s

Thanks for the summary.

To summarise where I think we are right now.

  • D76002 fixes a problem affecting within MCDataFragment eager evaluation of .if expressions. Specifically when there is a label that would be inserted into the MCDataFragment, but at the time of encountering the label the fragment hadn't been created. MC will attempt to reuse an MCDataFragment for new instructions, see CanReuseDataFragment() in MCObjectStreamer.cpp. As noted earlier when the Subtarget changes a new MCDataFragment is created. In the majority of cases this is done via a .arch or .arch_extension directive.
  • This patch extends the eager evaluation to cope with two adjacent MCDataFragments. My understanding is that this only occurs in the following circumstances:
    • When bundle locking and --mc-relax-all is used, there is a complicated 1 instruction per fragment + fragment merging that goes on here. This is only used in NaCl which I'm not sure what the status of in Chrome is. I think it is at best deprecated.

I'll follow up on this. There's likely a lot of code that can be dropped if that's the case.

    • When there is a Subtarget change between MCDataFragments.
  • In all other cases such as .align, .space and a relaxable instruction there is a separate non MCDataFragment created so we cannot fix these up.
  • The patch restricts the eager evaluation to .if as some asm backends do not want the expressions between fragments evaluated eagerly in some cases.

From the perspective of the linux kernel, is D76002 sufficient?

I can still reproduce https://github.com/ClangBuiltLinux/linux/issues/742 with https://reviews.llvm.org/D70062 applied.

That's unfortunate. I've found out how to reproduce the original issue:

clang -I./arch/arm/include -I./arch/arm/include/generated -I./include -I./arch/arm/include/uapi -I./arch/arm/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -D__ASSEMBLY__ --target=arm-linux-gnueabihf -D__LINUX_ARM_ARCH__=7 -march=armv7-a -include asm/unified.h -c -o /tmp/proc-v7.o arch/arm/mm/proc-v7.Sclang -I./arch/arm/include -I./arch/arm/include/generated -I./include -I./arch/arm/include/uapi -I./arch/arm/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -D__ASSEMBLY__ --target=arm-linux-gnueabihf -D__LINUX_ARM_ARCH__=7 -march=armv7-a -include asm/unified.h -c -o /tmp/proc-v7.o arch/arm/mm/proc-v7.S

If I extract out just the small part of the test case it works even without 76002.

The case from the kernel is testing that a single instruction is 2B rather than 4B (narrow vs wide). See https://github.com/ClangBuiltLinux/linux/blob/de620fb99ef2bd52b2c5bc52656e89dcfc0e223a/arch/arm/include/asm/assembler.h#L255-L270.

I don't see any subarch directives there.

They are there, I preprocessed to get a rather large output file including:

.globl cpu_v7_dcache_clean_area ; .align 0 ; cpu_v7_dcache_clean_area:
 9998: nop @ MP extensions imply L1 PTW
 .equ up_b_offset, 1f - 9998b ; .pushsection ".alt.smp.init", "a" ; .long 9998b ; b . + up_b_offset ; .popsection
 ret lr
1: dcache_line_size r2, r3
2: mcr p15, 0, r0, c7, c10, 1 @ clean D entry
 add r0, r0, r2
 subs r1, r1, r2
 bhi 2b
 dsb ishst
 ret lr
.type cpu_v7_dcache_clean_area, %function; .size cpu_v7_dcache_clean_area, .-cpu_v7_dcache_clean_area


 .arch_extension sec
.globl cpu_v7_smc_switch_mm ; .align 0 ; cpu_v7_smc_switch_mm:
 stmfd sp!, {r0 - r3}
 movw r0, #:lower16:(((1) << 31) | ((0) << 30) | (((0) & 0x3F) << 24) | ((0x8000) & 0xFFFF))
 movt r0, #:upper16:(((1) << 31) | ((0) << 30) | (((0) & 0x3F) << 24) | ((0x8000) & 0xFFFF))
 smc #0
 ldmfd sp!, {r0 - r3}
 b cpu_v7_switch_mm
.type cpu_v7_smc_switch_mm, %function; .size cpu_v7_smc_switch_mm, .-cpu_v7_smc_switch_mm

...
...

 9998: orr r1, r1, #((0 << 0) | (1 << 6))|(1 << 1)|(1 << 5)|(1 << 3)
 .pushsection ".alt.smp.init", "a" ; .long 9998b ;9997: orr r1, r1, #((1 << 0) | (1 << 6))|(3 << 3) ; .if . - 9997b == 2 ; nop ; .endif ; .if . - 9997b != 4 ; .error "ALT_UP() content must assemble to exactly 4 bytes"; .endif ; .popsection

This confused me for a bit, I think something like the following is happening.
.pushsection ".alt.smp.init"
// some instructions with Subtarget X
.popsection

.arch_extension sec or anything to change the subtarget
more stuff

.pushsection ".alt.smp.init"
// we are returning to .alt.smp.init but our subtarget is now X +sec
next instruction will start a new fragment
.popsection

It is getting late over here so I need to go home. Would jcai19 or yourself be able to investigate to confirm, check further? We would need to start a new MCDataFragment within .alt.smp.init for the new SubTarget, but I'd expect it all to happen before the 9997: in the failing case.

Making this work only on ".if" is IMO a non-starter, at least without understanding why. So I looked into what broke with llvm/test/MC/MachO/reloc-diff.s.

Basically, with LLVM's current representation, assuming that consecutive fragments are never moved with respect to each-other is invalid. In both ELF and Mach-O. In ELF, we have the ability to use numbered subsections that can insert new fragments between existing fragments. In Mach-O, fragments can effectively be turned into subsections with the 'subsections_via_symbols' directive. Committing this patch as-is would both be ugly and wrong, since we'd allow computing nonsense offsets -- even if we only restrict it to ".if".

I think it's a mistake that it works like that, and I'm going to spend a little bit of time to see if it'll be easy to fix this representational mistake in LLVM (in a separate patch), so that we have fragment lists which _are_ guaranteed to be kept exactly in the order they appear.

For this review, I have two requests:

  1. Undo all the ".if"-only hacks. (understanding that it will cause some tests to fail, for now).
  2. Fix the code to support arbitrary numbers of fragments between the symbols, of kinds MCFillFragment and MCDataFragment (and maybe others if they are fixed size -- I haven't looked through all of the kinds). Probably best would be to introduce a helper function to calculate the delta between two arbitrary symbols, given an optional layout (and either return an answer or a failure indication). This should make AttemptToFoldSymbolOffsetDifference significantly more straightforward.

I believe these examples should work, and will after making the latter change:

1:
.arch armv7a
nop
.arch armv4
nop
.arch armv7a
nop
.if . - 1b != 0
.word 0x12345678
.endif

2:
nop
.zero 0x10000
nop
.if . - 2b == 4
.word 0x12345678
.endif

Making this work only on ".if" is IMO a non-starter, at least without understanding why. So I looked into what broke with llvm/test/MC/MachO/reloc-diff.s.

Basically, with LLVM's current representation, assuming that consecutive fragments are never moved with respect to each-other is invalid. In both ELF and Mach-O. In ELF, we have the ability to use numbered subsections that can insert new fragments between existing fragments. In Mach-O, fragments can effectively be turned into subsections with the 'subsections_via_symbols' directive. Committing this patch as-is would both be ugly and wrong, since we'd allow computing nonsense offsets -- even if we only restrict it to ".if".

I think it's a mistake that it works like that, and I'm going to spend a little bit of time to see if it'll be easy to fix this representational mistake in LLVM (in a separate patch), so that we have fragment lists which _are_ guaranteed to be kept exactly in the order they appear.

For this review, I have two requests:

  1. Undo all the ".if"-only hacks. (understanding that it will cause some tests to fail, for now).
  2. Fix the code to support arbitrary numbers of fragments between the symbols, of kinds MCFillFragment and MCDataFragment (and maybe others if they are fixed size -- I haven't looked through all of the kinds). Probably best would be to introduce a helper function to calculate the delta between two arbitrary symbols, given an optional layout (and either return an answer or a failure indication). This should make AttemptToFoldSymbolOffsetDifference significantly more straightforward.

I believe these examples should work, and will after making the latter change:

1:
.arch armv7a
nop
.arch armv4
nop
.arch armv7a
nop
.if . - 1b != 0
.word 0x12345678
.endif

2:
nop
.zero 0x10000
nop
.if . - 2b == 4
.word 0x12345678
.endif

Thanks for the clarification. It will be great if we can remove the restriction to ".if". I will making changes accordingly while the representation of subsections are being changed.

jcai19 updated this revision to Diff 229225.Nov 13 2019, 9:04 PM

Support arbitrary numbers of fragments between the symbols of kind
MCDataFragment. Getting the size of MCFillFragments seems to be less
straightforward https://llvm.org/doxygen/MCAssembler_8cpp_source.html#l00287.

jcai19 added a comment.EditedAug 5 2020, 4:58 PM

@jyknight while working on a different issue, I happened to take a look at the implementation of subsection (MCSection::getSubsectionInsertionPoint) and from what I understood, fragments in subsections were always placed after fragments not in any subsection., and fragments in the same subsection (or not in any subsection) were always inserted in order and separate from fragments in other subsections. Is it safe to assume what between two fragments either not in any subsections or in the same subsection will be remain unchanged once they are inserted to the fragment list of a section? If so, can we resolve differences of two labels in fragments with such restriction?

jcai19 updated this revision to Diff 286172.Aug 17 2020, 4:13 PM

Limit the scope to fragments in the same subsection (fragments not in any subsections are placed in subsection 0). Fragments are inserted into each subsection in order and the difference of their offsets should be fixed once they are inserted. This is experimental as it breaks many tests. Will address the test failure if this idea is proven correct.

jcai19 retitled this revision from [MC] Parse .if conditions with symbols in consecutive MCDataFragements to [MC] Resolve the difference of symbols in consecutive MCDataFragements.Aug 17 2020, 4:19 PM
jcai19 edited the summary of this revision. (Show Details)
jcai19 updated this revision to Diff 286441.Aug 18 2020, 5:48 PM

(1) Fixed a bug while rebasing the patch last time. The failed unit tests dropped to 2 after the fix.
(2) Updated the 2 failed unit tests.

jcai19 updated this revision to Diff 286445.Aug 18 2020, 6:05 PM

Set the subsection number for fragments in subsecitons.

Worth mentioning that this is needed by (this assembly file does not assemble with the integrated assembler as of today):

make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 LLVM_IAS=1 defconfig arch/arm/mm/proc-v7.o
llvm/include/llvm/MC/MCFragment.h
78

Move below LayoutOrder. sizeof(MCDataFragment) will not change

Alternatively, disable the folding for Mach-O because this is too subtle (if getSubsectionViaSymbols())

llvm/lib/MC/MCExpr.cpp
629

Consider swapping then and else branches to avoid !

llvm/test/MC/ARM/directive_if_offset.s
1 ↗(On Diff #286445)

In this directory file-name.s is more common. What about directive-if-sub.s? sub is more meaningful than offset.

Add a llvm-mc -triple armv7a-linux-gnueabihf %s -o /dev/null 2>&1 test to show that -filetype=asm does not work (MCAssembler * is null so, but this is less of an issue)

llvm/test/MC/ARM/directive_if_offset_error.s
1 ↗(On Diff #286445)

See ELF/reloc-directive.s

You can use --defsym=ERR=1 to merge the tests into directive-if-sub.s

llvm/test/MC/ARM/thumb2_directive_if_offset_error.s
1 ↗(On Diff #286445)

ditto

jcai19 updated this revision to Diff 287545.Aug 24 2020, 6:47 PM

Address some of @MaskRay's comments.

jcai19 added a comment.EditedAug 24 2020, 6:55 PM
This comment has been deleted.
llvm/include/llvm/MC/MCFragment.h
78

Thanks for all the comments! Can you please elaborate this one? Are you suggesting to move SubsectionNumber right after LayoutOrder?

llvm/test/MC/ARM/directive_if_offset_error.s
1 ↗(On Diff #286445)

--defsym=ERR=1 does not seem to work if I move the code from this file into directive-if-subtraction.s, as the run commands still fail. Maybe I am missing something? Also directive-if-subtraction.s requires armv7a so this run command will fail once I make the move.

MaskRay added inline comments.Aug 27 2020, 10:43 AM
llvm/test/MC/ARM/directive_if_offset_error.s
1 ↗(On Diff #286445)

See ELF/reloc-directive.s for an example.

# RUN: not llvm-mc ....... | FileCheck %s --check-prefix=ERR
normal assembly

.ifdef ERR
# ERR: {{.*}}.s:[[#@LINE+1]]:10: error: expected comma
error line
.endif

I think it is clearer to place working and non-working examples in one file.

jcai19 updated this revision to Diff 288471.Aug 27 2020, 3:36 PM

Combine test cases into one file.

MaskRay added inline comments.Aug 27 2020, 4:09 PM
llvm/include/llvm/MC/MCFragment.h
78

SubsectionNumber looks too subtle. I'd hope we just remove the variable, and avoid the computation for Mach-O.

Happy to hear what @jyknight will say, and whether this is a reasonable (imperfect) approach. Personally I think it is mostly good if SubsectionNumber is removed.

MaskRay edited reviewers, added: psmith; removed: peter.smith.Aug 27 2020, 4:09 PM
llvm/include/llvm/MC/MCFragment.h
77

Second line needs three slashes and punctuation.

llvm/lib/MC/MCExpr.cpp
605

Could this be a static function accepting (const MCAssembler *Asm, const MCSymbol &SA, const MCSymbolRefExpr *&A, const MCSymbolRefExpr *&B), rather than a closure?

612

drop extra parens?

613–614

return FinalizeFolding();

638

Does the subexpression (SA.getOffset() - SB.getOffset()) change throughout the loop? If not, consider initializing Offset to that value.

639–640

return FinalizeFolding();

658–663

This can be 3 lines rather than four by swapping the condition:

if (... != ...)
  return
Offset ...

Probably can drop the extra parens around the cast, too.

669

Check spelling and punctuation here. :set spell in vim.

jcai19 updated this revision to Diff 288724.Aug 28 2020, 3:57 PM
jcai19 marked 6 inline comments as done.

Address @nickdesaulniers' comments.

llvm/include/llvm/MC/MCFragment.h
77

Punctuation. (Period at end of sentence in comment).

llvm/lib/MC/MCExpr.cpp
629

I agree; prefer:

if x:
  foo()
else:
  bar()

to:

if !x:
  bar()
else:
  foo()

if with a negated conditional is ok when there is only a then-clause. If there's an else-clause, then it's a code smell.

669

Punctuation (Period at end of sentence in comment).

jcai19 updated this revision to Diff 289062.Aug 31 2020, 7:22 PM
jcai19 marked an inline comment as done.

Address more comments.

psmith added a comment.Sep 1 2020, 2:47 PM

Thanks for the update and sorry to take so long to comment. I can't see anything immediately wrong and I think limiting this to fragments in the same subsection makes sense.

llvm/lib/MC/MCExpr.cpp
649

It would be good to have a comment here as we have Offset and getOffset() meaning two different things. IIUC getOffset() is really getOffsetWithinFragment(). Perhaps use Displacement instead of Offset as the accumulating variable name.

For example:
// Try to find a constant displacement from FA to FB, add the displacement between the offset in FA of SA and the offset in FB of SB.

jcai19 updated this revision to Diff 289320.Sep 1 2020, 4:47 PM

Address @psmith's comments.

nickdesaulniers accepted this revision.Sep 2 2020, 12:21 PM

With this patch applied, I no longer observe the error from https://github.com/ClangBuiltLinux/linux/issues/742 for 32b ARM, though I can't do a full kernel build with Clang's IA yet due to missing support for the adrl pseudo instruction. (https://github.com/ClangBuiltLinux/linux/issues/430, https://bugs.llvm.org/show_bug.cgi?id=24350) in order to boot test. I did boot test x86 (32b and 64b) and arm64 Linux kernels with this change just fine (using Clang's integrated assembler).

It looks like there may have been unresolved comments from @MaskRay . I'm also curious whether @jyknight or @psmith had parting thoughts? (@jcai19 maybe wait a week for their feedback?)

This revision is now accepted and ready to land.Sep 2 2020, 12:21 PM

I don't have any objections to the approval. Thanks for updating the comment.

jcai19 added a subscriber: nick.Sep 2 2020, 1:37 PM

It looks like there may have been unresolved comments from @MaskRay . I'm also curious whether @jyknight or @psmith had parting thoughts? (@jcai19 maybe wait a week for their feedback?)

@nick sounds good! Thanks for the verification.

I don't have any objections to the approval. Thanks for updating the comment.

Thanks @psmith.

nick removed a subscriber: nick.Sep 2 2020, 1:41 PM

With this patch applied, I no longer observe the error from https://github.com/ClangBuiltLinux/linux/issues/742 for 32b ARM, though I can't do a full kernel build with Clang's IA yet due to missing support for the adrl pseudo instruction. (https://github.com/ClangBuiltLinux/linux/issues/430, https://bugs.llvm.org/show_bug.cgi?id=24350) in order to boot test. I did boot test x86 (32b and 64b) and arm64 Linux kernels with this change just fine (using Clang's integrated assembler).

It looks like there may have been unresolved comments from @MaskRay . I'm also curious whether @jyknight or @psmith had parting thoughts? (@jcai19 maybe wait a week for their feedback?)

I am waiting on @jyknight's opinion about the potentially subtle unsigned SubsectionNumber = 0; (at the very least, if we want to keep it, it should be reordered as my comment says, and don't repeat the name of the variable)

jcai19 added a comment.Sep 2 2020, 3:44 PM

at the very least, if we want to keep it, it should be reordered as my comment says, and don't repeat the name of the variable

Hi @MaskRay, thanks for the comment. By reordering, do you mean moving the definition of SubsectionNumber right after LayoutOrder? I do not understand what difference that would make, would you care to explain a little bit more? Also what by not repeating the name of the variable, I assume you were referring to the comment right above its definition?

jcai19 added a comment.Sep 2 2020, 4:02 PM

at the very least, if we want to keep it, it should be reordered as my comment says, and don't repeat the name of the variable

Hi @MaskRay, thanks for the comment. By reordering, do you mean moving the definition of SubsectionNumber right after LayoutOrder? I do not understand what difference that would make, would you care to explain a little bit more? Also what by not repeating the name of the variable, I assume you were referring to the comment right above its definition?

NVM. I realized moving it after LayoutOrder would not introduce extra padding.

jcai19 updated this revision to Diff 289600.Sep 2 2020, 4:09 PM

Moved the definition of SubsectionNumber to avoid padding bytes. Also updated the comments.

With this patch applied, I no longer observe the error from https://github.com/ClangBuiltLinux/linux/issues/742 for 32b ARM, though I can't do a full kernel build with Clang's IA yet due to missing support for the adrl pseudo instruction. (https://github.com/ClangBuiltLinux/linux/issues/430, https://bugs.llvm.org/show_bug.cgi?id=24350) in order to boot test. I did boot test x86 (32b and 64b) and arm64 Linux kernels with this change just fine (using Clang's integrated assembler).

It looks like there may have been unresolved comments from @MaskRay . I'm also curious whether @jyknight or @psmith had parting thoughts? (@jcai19 maybe wait a week for their feedback?)

I am waiting on @jyknight's opinion about the potentially subtle unsigned SubsectionNumber = 0; (at the very least, if we want to keep it, it should be reordered as my comment says, and don't repeat the name of the variable)

@jyknight What do you think of MCFragment::SubsectionNumber ?