Page MenuHomePhabricator

[MC] Parse .if conditions with symbols in consecutive MCDataFragements
Needs ReviewPublic

Authored by jcai19 on Oct 24 2019, 3:03 PM.

Details

Summary

Parse .if conditions when comparing two symbol in consecutive MCDataFragments.
This is important for an idiom like "foo:instr; .if . - foo; instr; .endif"
(https://bugs.llvm.org/show_bug.cgi?id=43795).

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
peter.smith added inline comments.Oct 29 2019, 6:59 AM
llvm/lib/MC/MCExpr.cpp
575

I think it would be useful to add a comment explaining this line and what kind of expressions it is expected to support. It isn't intuitive given
how AttemptToFoldSymbolOffsetDifference is called.

llvm/test/MC/AsmParser/directive_if_offset.s
1 ↗(On Diff #226839)

The test fails when there is asm output from llvm-mc.

llvm-mc -filetype=obj -triple=armv7a-linux-gnueabihf directive_if_offset.s -o /dev/null
echo $?
0

However when using the assembler output, this fails:

llvm-mc --triple armv7a-linux-gnueabihf directive_if_offset.s -filetype=asm -o /dev/null
directive_if_offset.s:7:5: error: expected absolute expression
.if . - 9997b == 4 ;

It may be that there is no way to resolve this in the asm output as the data structures might not exist. It will be worth a check to see though. I think that differences between the asm and obj outputs of llvm-mv are frowned on, but I don't think that they are necessarily fatal if coming from a assembler file and not something generated by clang. This might be a problem if the .if exists in inline assembler, not that I'd recommend anyone do that.

10 ↗(On Diff #226839)

You could use llvm-objdump -d on the ELF output to test that the correct instruction had been generated. Normally we'd use llvm-mc -filetype=asm and use FileCheck on that but that isn't working in this case.

When I run check-llvm with this patch applied I get 7 failures:

LLVM :: DebugInfo/Mips/delay-slot.ll
LLVM :: DebugInfo/Mips/dsr-fixed-objects.ll
LLVM :: DebugInfo/Mips/dsr-non-fixed-objects.ll
LLVM :: DebugInfo/Mips/fn-call-line.ll
LLVM :: MC/AsmParser/directive_fill_2.s
LLVM :: MC/MachO/reloc-diff.s
LLVM :: MC/X86/expand-var.s

This is applied to master revision 12c9ffd108345f643df98dfa8653af1a4311ed86 and the tests don't fail with master. Is it possible your most recent changes have broken something? I'm testing a release build of LLVM with clang, assertions enabled.

There are also some test cases for other ISA's in https://bugs.llvm.org/show_bug.cgi?id=41825.

When I run check-llvm with this patch applied I get 7 failures:

LLVM :: DebugInfo/Mips/delay-slot.ll
LLVM :: DebugInfo/Mips/dsr-fixed-objects.ll
LLVM :: DebugInfo/Mips/dsr-non-fixed-objects.ll
LLVM :: DebugInfo/Mips/fn-call-line.ll
LLVM :: MC/AsmParser/directive_fill_2.s
LLVM :: MC/MachO/reloc-diff.s
LLVM :: MC/X86/expand-var.s

This is applied to master revision 12c9ffd108345f643df98dfa8653af1a4311ed86 and the tests don't fail with master. Is it possible your most recent changes have broken something? I'm testing a release build of LLVM with clang, assertions enabled.

Yes, I am aware of failure on LLVM :: MC/MachO/reloc-diff.s and working on it, but my build does not have assertions on so I haven't seen the other failures. Thanks for verifying, I will take a look on them.

jcai19 updated this revision to Diff 227537.Nov 1 2019, 2:26 PM

Made sure the patch pass check-clang and check-llvm.

jcai19 added a comment.Nov 1 2019, 2:28 PM

I'd like to see some more test cases for the error cases. Off the top of my head I can think of these cases:

// Align fragment in between MCDataFragments
9997: nop ;
      .align 4
      nop
.if . - 9997b == 4 ;
// Fill (I think) fragment in between MCDataFragments
9997: nop ;
      .space 4
      nop
.if . - 9997b == 4 ;
// Relaxable (in Thumb) fragment in between MCDataFragments
9997: nop ;
      b external
      nop
.if . - 9997b == 4 ;
// Constant pool between MCDataFragments,
9997:
      ldr r0,=0x12345678 ;
      .ltorg
      nop  
.if . - 9997b == 4 ;

There is also something called a MCCompactEncodingInstFragment that would cause this to fail, but I think that only exists as part of NaCl bundle locking, which I think isn't too much of a concern.

Thanks for providing these test cases! It seems the third and fourth case do not produce any errors as expected (even with arm gcc). Will dig more.

jcai19 added a comment.Nov 1 2019, 2:40 PM

When I run check-llvm with this patch applied I get 7 failures:

LLVM :: DebugInfo/Mips/delay-slot.ll
LLVM :: DebugInfo/Mips/dsr-fixed-objects.ll
LLVM :: DebugInfo/Mips/dsr-non-fixed-objects.ll
LLVM :: DebugInfo/Mips/fn-call-line.ll
LLVM :: MC/AsmParser/directive_fill_2.s
LLVM :: MC/MachO/reloc-diff.s
LLVM :: MC/X86/expand-var.s

This is applied to master revision 12c9ffd108345f643df98dfa8653af1a4311ed86 and the tests don't fail with master. Is it possible your most recent changes have broken something? I'm testing a release build of LLVM with clang, assertions enabled.

The latest patch should address these issues. It seems other than .if conditions, llvm::MCExpr::evaluateAsAbsolute is allowed to not immediately resolve the difference of two symbols in adjacent fragments, such as ".long _external_def - _local_def" in llvm/test/MC/MachO/reloc-diff.s. They will be resolved later when finalizing the layout of the object file. The newer version of the code essentially passes an additional flag to tell the function whether the substraction to evaluate is an if condition.

jcai19 added a comment.EditedNov 1 2019, 2:42 PM

There are also some test cases for other ISA's in https://bugs.llvm.org/show_bug.cgi?id=41825.

The cause of 41825 is different from the one this patch tries to solve. But their solution looks similar enough so I have added code to handle those as well.

jcai19 updated this revision to Diff 227542.Nov 1 2019, 2:45 PM

Fix format.

MaskRay added a subscriber: MaskRay.Nov 1 2019, 4:11 PM
jcai19 updated this revision to Diff 227549.Nov 1 2019, 4:21 PM

Update test cases and move their location.

jcai19 updated this revision to Diff 227552.Nov 1 2019, 4:38 PM
jcai19 marked 5 inline comments as done.

Update test cases.

jcai19 added inline comments.Nov 1 2019, 4:45 PM
llvm/test/MC/AsmParser/directive_if_offset.s
1 ↗(On Diff #226839)

I have moved the tests to MC/ARM directory.

10 ↗(On Diff #226839)

I did verify myself with -filetype=obj and llvm-objdump -disassemble, although I can't seem to trigger

if (Asm->isThumbFunc(&SA))
  Addend |= 1;

and therefore cannot the else branch here. Any thoughts on this? Thanks.

jcai19 updated this revision to Diff 227553.Nov 1 2019, 4:49 PM

Update tests.

I think that this needs some wider review for a couple of reasons. It looks like this is heading towards a special case evaluation just for .if. I'm not comfortable going too much further in the approval process as this is exceeding my knowledge of MC, and I'd like see some supporting opinions on whether this is the right thing to do. Will be worth adding some more reviewers, particularly someone familiar with Mach-O. You may need an RFC to the llvm-dev mailing list to draw some attention.

I'd also like to see if there is a way of implementing this without special casing .if. That is a pragmatic solution for a specific problem, but it does make the code harder to understand so it has a cost. Looking at the reloc-diff.s test case the commit log says:

MC/Mach-O: Use the SECTDIFF relocation type for (A - B + constant) where A is external.
 - I'm not sure why, but this is what 'as' does.

I've not been able to easily find a definition of what SECTDIFF is, but it appears to be a relocation that supports subtraction, I don't think that there is an equivalent in ELF. It maybe that the case can be resolved by not doing the folding for MachO.

some other specific comments.

  • Some of the examples I gave would only fail in Thumb as Arm instructions are all the same size, whereas some Thumb-2 instructions are assembled initially as the 2-byte narrow form, and are relaxed to the wider 4-byte form if the instruction is out of range. You'll need something like -triple=thumbv7a-linux-gnueabihf to see the errors. I'd not expect some of these to fail on GNU as (that is a 2-pass assembler).
  • If you can make an X86_64 test I recommend doing so as few contributors will have this optionally removed. Substantially more have ARM optionally removed, as do several buildbots.
llvm/include/llvm/MC/MCExpr.h
60

There is a comment in evaluateAsAbsolute

// Setting InSet causes us to absolutize differences across sections and that
// is what the MachO writer uses Addrs for.

I think it would be useful to have something similar for IsCond.

llvm/lib/MC/MCExpr.cpp
495

When there is more than one boolean parameter it can get difficult to track which is which, can you annotate the call sites with literal values with a comment? I've left a comment on the ones that I've noticed.

/* InSet */ true, /* IsCond */ false.

523

/* IsCond */ false.

579

If I've understood the code correctly I think we could be a bit more specific here:

// When there is no layout our ability to resolve differences between symbols is
// limited. In specific cases where the symbols are both defined in consecutive
// MCDataFragments the difference can be calculated. This is important for an
// idiom like foo:instr; .if . - foo; instr; .endif
// We cannot handle any kind of case where the difference may change due to
// layout.
581

It is difficult to see how it would be possible to resolve .if conditions at layout time in a single pass assembler. In theory the assembler could evaluate all conditional blocks and select between them at layout time, if such a layout could be converged on.

721

/* InSet */ false, /* IsCond */ false.

727

/* InSet */ true, /* IsCond */ false.

768

/* IsCond */ false.

768

We have IsCond passed in as a parameter to evaluateAsRelocatableImpl, so even if it is passed in true, we set it to false here? Is it important for the value to be false here? If so then it implies that IsCond might not be specific enough a name. If it just doesn't matter then can we pass in IsCond here?

798

/* IsCond */ false.

jcai19 updated this revision to Diff 227937.Nov 5 2019, 11:27 AM
jcai19 marked 11 inline comments as done.

Address some concerns.

llvm/lib/MC/MCExpr.cpp
575

This is inherited from the original code, except it was before the check for Layout originally. I am not exactly sure why this check is needed in the first place. In my opinion this is redundant with the check for Layout, as we can always calculate the differences of two symbols based on the order of the sections (MCAsmLayout::getSectionOrder ()) they are in and their offset within its own section respectively if the layout is provided., but I do need this check anyway for our case. Any thoughts on why this check is necessary?

581

Thanks for the clarification, although I am not sure I follow. The code looks iterative to me https://llvm.org/doxygen/MCAssembler_8cpp_source.html#l00785. I was thinking as the loop iterates and calls layoutOnce, we can relax (if needed) and calculate the sizes of the fragments before a .if statement and resolve the condition there. But I am not completely convinced by myself that it is doable. Also there some cases like the one below will create more complexity.

foo: jump to bar
...
.if . - foo = ${constant integer}
instr1
.else.
instr2
.endif
...
bar:

This creates a loop of dependency as depending on the instruction selected in the if-else block, the size of the jump instruction may change due to the number of bits it need to specify the offset, which in turn affects which instruction should be chosen.

768

Yes, thanks for the catching this. IsCond is never used here so I simply replaced it false. But passing in IsCond is a better choice.

jcai19 updated this revision to Diff 227980.Nov 5 2019, 3:59 PM

Add test cases.

jcai19 added a comment.Nov 5 2019, 4:23 PM

I think that this needs some wider review for a couple of reasons. It looks like this is heading towards a special case evaluation just for .if. I'm not comfortable going too much further in the approval process as this is exceeding my knowledge of MC, and I'd like see some supporting opinions on whether this is the right thing to do. Will be worth adding some more reviewers, particularly someone familiar with Mach-O. You may need an RFC to the llvm-dev mailing list to draw some attention.

I have sent an RFC and am waiting for comments.

I'd also like to see if there is a way of implementing this without special casing .if. That is a pragmatic solution for a specific problem, but it does make the code harder to understand so it has a cost. Looking at the reloc-diff.s test case the commit log says:

MC/Mach-O: Use the SECTDIFF relocation type for (A - B + constant) where A is external.
 - I'm not sure why, but this is what 'as' does.

I've not been able to easily find a definition of what SECTDIFF is, but it appears to be a relocation that supports subtraction, I don't think that there is an equivalent in ELF. It maybe that the case can be resolved by not doing the folding for MachO.

some other specific comments.

  • Some of the examples I gave would only fail in Thumb as Arm instructions are all the same size, whereas some Thumb-2 instructions are assembled initially as the 2-byte narrow form, and are relaxed to the wider 4-byte form if the instruction is out of range. You'll need something like -triple=thumbv7a-linux-gnueabihf to see the errors. I'd not expect some of these to fail on GNU as (that is a 2-pass assembler).

Thanks for the clarification. I have included all the four test cases.

  • If you can make an X86_64 test I recommend doing so as few contributors will have this optionally removed. Substantially more have ARM optionally removed, as do several buildbots.

Yes, I agree. Will try to make test cases for x64.

Thanks for the update. I'll wait a few days to see what comments we get. If there are none then I guess there aren't any strong objections.

To clarify the remarks about resolving .if at layout time. I think that there are two major obstacles.

  • MC assembles instructions once, if the condition in the .if is not satisfied the contents of the block aren't even parsed. For example, the following will assemble if the .if condition fails, but will fail to parse if the .if condition passes. We'd need to change MC to either reparse (effectively making it a multipass assembler), or to parse and remember all parts. This is doable but it isn't a small change.
.text
.if 0
You aren't parsing me are you?
.else
nop
.endif
  • The second problem, and not one unique to llvm-mc, is that it is possible to write a program that doesn't converge, something like (not testable as it needs relaxation and late evaluation of .if):
label:
 beq after // In thumb 2 this 2-byte branch will be relaxed to a 4-byte branch if out of range. 
 .if . - label == 2 
 .space 1024 * 1024 // sufficient to make after out of range
 .endif
after:
 nop

In pass 1, beq is 2-bytes in size, the .if passes, beq is then relaxed to 4-bytes, which would make the .if fail, which then makes beq 2-bytes ...
The interaction with relaxation could be fixed by making all size increases permanent. However I think it is possible to write multiple .if conditions that conflict. In Arm's old 2 - pass assembler (pass -1 find all sizes so layout is known, pass-2 encode instructions knowing layout), we had an error message when the equivalent of .if evaluated differently in subsequent passes. In summary, I think it would be a major change to MC to support something like late evaluation of .if in a reliable way.

nickdesaulniers added subscribers: sunfish, dschuff.
llvm/lib/MC/MCExpr.cpp
581

See also b/132538429.

In-order to evaluate the if statement truthfully, it will have to be done after (well, during, since the result could change the relaxation) relaxation, and LLVM MC is just not set up for this currently. Even then, you could probably construct an if statement that could create a paradox, and never converge on a valid relaxation.

llvm/lib/MC/MCExpr.cpp
486

You can move this declaration down closer to where it is used. No need to construct if the below conditional returns early.

jcai19 marked an inline comment as done.Nov 6 2019, 11:38 AM

Thanks for the update. I'll wait a few days to see what comments we get. If there are none then I guess there aren't any strong objections.

To clarify the remarks about resolving .if at layout time. I think that there are two major obstacles.

  • MC assembles instructions once, if the condition in the .if is not satisfied the contents of the block aren't even parsed. For example, the following will assemble if the .if condition fails, but will fail to parse if the .if condition passes. We'd need to change MC to either reparse (effectively making it a multipass assembler), or to parse and remember all parts. This is doable but it isn't a small change. ` .text .if 0 You aren't parsing me are you? .else nop .endif `
  • The second problem, and not one unique to llvm-mc, is that it is possible to write a program that doesn't converge, something like (not testable as it needs relaxation and late evaluation of .if): ` label: beq after In thumb 2 this 2-byte branch will be relaxed to a 4-byte branch if out of range. .if . - label == 2 .space 1024 * 1024 sufficient to make after out of range .endif after: nop ` In pass 1, beq is 2-bytes in size, the .if passes, beq is then relaxed to 4-bytes, which would make the .if fail, which then makes beq 2-bytes ... The interaction with relaxation could be fixed by making all size increases permanent. However I think it is possible to write multiple .if conditions that conflict. In Arm's old 2 - pass assembler (pass -1 find all sizes so layout is known, pass-2 encode instructions knowing layout), we had an error message when the equivalent of .if evaluated differently in subsequent passes. In summary, I think it would be a major change to MC to support something like late evaluation of .if in a reliable way.

Thanks for all the clarification and examples. I was wondering if there would be an easier way for late evaluate of .if condition so we could avoid the current implementation, which I totally agree is not straightforward and somehow ad-hoc. But I guess a major haul to MC seems to be the only alternative based on what you and Nick said, which is probably an overkill for the particular issue this patch tries to solve. However, this patch will not solve the second case you brought up, so how likely do you think we would encounter such cases, and should we consider the multiple-pass solution to future-proof these cases if they happen frequently enough, or maybe we could rewrite the assembly instead to avoid such complexity?

llvm/lib/MC/MCExpr.cpp
581

Yeah, thanks for the explanation. I wonder if GAS would be able to support all these paradoxes.

jcai19 retitled this revision from [MC] Calculate difference of symbols in two fragments when possible to [MC] Enhance parsing of .if conditions.Nov 9 2019, 10:42 AM
jcai19 edited the summary of this revision. (Show Details)
llvm/lib/MC/MCExpr.cpp
588

This might not be needed with D70062 as the !SA.isUnset() and !SB.isUnset() on line 568 will both evaluate true after it.

jcai19 updated this revision to Diff 228740.Nov 11 2019, 10:49 AM
jcai19 marked an inline comment as done.

Remove the code redundant to D70062.

jcai19 marked 2 inline comments as done.Nov 11 2019, 10:50 AM
jcai19 added inline comments.
llvm/lib/MC/MCExpr.cpp
588

Thanks for the clarification. I have adjusted my code accordingly.

jcai19 retitled this revision from [MC] Enhance parsing of .if conditions to [MC] Parse .if conditions with symbols in consecutive MCDataFragements.Nov 11 2019, 10:54 AM
jcai19 edited the summary of this revision. (Show Details)
jcai19 updated this revision to Diff 228746.Nov 11 2019, 11:55 AM
jcai19 marked an inline comment as done.
jcai19 edited the summary of this revision. (Show Details)

Fix test cases.

jcai19 updated this revision to Diff 228751.Nov 11 2019, 12:06 PM

Fix a typo.

jcai19 marked an inline comment as done.Nov 12 2019, 12:08 AM
jcai19 added inline comments.
llvm/test/MC/AsmParser/directive_if_offset.s
5 ↗(On Diff #226839)

This line is required to reproduce this issue, as it changes the subtarget and forces the creation of a new MCDataSegment (https://llvm.org/doxygen/MCObjectStreamer_8cpp_source.html#l00157). I found a similar directive for i386, .arch + cpu_type (https://sourceware.org/binutils/docs/as/i386_002dArch.html#i386_002dArch), although the parser does not support it. To have a similar test case on x86, it seems we would have to first support that directive. However, I could not find any uses of this directive in llvm or Linux kernel, so I assume it is not commonly used. Will keep the test case for arm for now.

To summarise where I think we are right now.

  • D76002 fixes a problem affecting within MCDataFragment eager evaluation of .if expressions. Specifically when there is a label that would be inserted into the MCDataFragment, but at the time of encountering the label the fragment hadn't been created. MC will attempt to reuse an MCDataFragment for new instructions, see CanReuseDataFragment() in MCObjectStreamer.cpp. As noted earlier when the Subtarget changes a new MCDataFragment is created. In the majority of cases this is done via a .arch or .arch_extension directive.
  • This patch extends the eager evaluation to cope with two adjacent MCDataFragments. My understanding is that this only occurs in the following circumstances:
    • When bundle locking and --mc-relax-all is used, there is a complicated 1 instruction per fragment + fragment merging that goes on here. This is only used in NaCl which I'm not sure what the status of in Chrome is. I think it is at best deprecated.
    • When there is a Subtarget change between MCDataFragments.
  • In all other cases such as .align, .space and a relaxable instruction there is a separate non MCDataFragment created so we cannot fix these up.
  • The patch restricts the eager evaluation to .if as some asm backends do not want the expressions between fragments evaluated eagerly in some cases.

From the perspective of the linux kernel, is D76002 sufficient? For example if the Subtarget changing directives are used in such a way that they don't create new MCDataFragments in a sensitive location then we may not need this. For example the following will assemble with D76002

        .arch_extension sec // Outside of .text
        
        .text
9997:
        nop
.if . - 9997b == 0
// CHECK-NOT: error: expected absolute expression
orr r1, r1, #1
.else
orr r1, r1, #2
.endif

As will:

        .text
        .arch_extension sec                
        nop
9997:
        nop
.if . - 9997b == 0
// CHECK-NOT: error: expected absolute expression
orr r1, r1, #1
.else
orr r1, r1, #2
.endif

Could the examples in the kernel be altered to not require this? It does seem like we are writing a lot of code for a small number of easily resolved cases.

I note that it is possible to write a contrived example that this patch can't handle (needs --triple=armv8a-linux-gnu as crypto and crc are V8 extensions) although I'm not suggesting implementing support for it.

        .text

        nop
9997:
        .arch_extension crypto
        nop
        .arch_extension crc
        nop
.if . - 9997b == 0
// CHECK-NOT: error: expected absolute expression
orr r1, r1, #1
.else
orr r1, r1, #2
.endif        .text

        nop
9997:
        .arch_extension crypto
        nop
        .arch_extension crc
        nop
.if . - 9997b == 0
// CHECK-NOT: error: expected absolute expression
orr r1, r1, #1
.else
orr r1, r1, #2
.endif

if.s:11:5: error: expected absolute expression
.if . - 9997b == 0

llvm/lib/MC/MCExpr.cpp
582

With the new information coming from D70062 I think we need to make this comment more specific. For example:

When the Subtarget is changed a new MCDataFragment is created. This handles the case of 
foo: instr; .arch_extension ext; instr .if . - foo
llvm/test/MC/ARM/directive_if_offset.s
6

Can we add a comment to explain the importance of .arch_extension, such as:
// Create a new MCDataFragment due to Subtarget change

Thanks for the summary.

To summarise where I think we are right now.

  • D76002 fixes a problem affecting within MCDataFragment eager evaluation of .if expressions. Specifically when there is a label that would be inserted into the MCDataFragment, but at the time of encountering the label the fragment hadn't been created. MC will attempt to reuse an MCDataFragment for new instructions, see CanReuseDataFragment() in MCObjectStreamer.cpp. As noted earlier when the Subtarget changes a new MCDataFragment is created. In the majority of cases this is done via a .arch or .arch_extension directive.
  • This patch extends the eager evaluation to cope with two adjacent MCDataFragments. My understanding is that this only occurs in the following circumstances:
    • When bundle locking and --mc-relax-all is used, there is a complicated 1 instruction per fragment + fragment merging that goes on here. This is only used in NaCl which I'm not sure what the status of in Chrome is. I think it is at best deprecated.

I'll follow up on this. There's likely a lot of code that can be dropped if that's the case.

    • When there is a Subtarget change between MCDataFragments.
  • In all other cases such as .align, .space and a relaxable instruction there is a separate non MCDataFragment created so we cannot fix these up.
  • The patch restricts the eager evaluation to .if as some asm backends do not want the expressions between fragments evaluated eagerly in some cases.

    From the perspective of the linux kernel, is D76002 sufficient?

I can still reproduce https://github.com/ClangBuiltLinux/linux/issues/742 with https://reviews.llvm.org/D70062 applied.

For example if the Subtarget changing directives are used in such a way that they don't create new MCDataFragments in a sensitive location then we may not need this. For example the following will assemble with D76002

        .arch_extension sec // Outside of .text
        
        .text
9997:
        nop
.if . - 9997b == 0
// CHECK-NOT: error: expected absolute expression
orr r1, r1, #1
.else
orr r1, r1, #2
.endif

As will:

        .text
        .arch_extension sec                
        nop
9997:
        nop
.if . - 9997b == 0
// CHECK-NOT: error: expected absolute expression
orr r1, r1, #1
.else
orr r1, r1, #2
.endif

Could the examples in the kernel be altered to not require this? It does seem like we are writing a lot of code for a small number of easily resolved cases.

The case from the kernel is testing that a single instruction is 2B rather than 4B (narrow vs wide). See https://github.com/ClangBuiltLinux/linux/blob/de620fb99ef2bd52b2c5bc52656e89dcfc0e223a/arch/arm/include/asm/assembler.h#L255-L270.

I don't see any subarch directives there.

jcai19 added a comment.EditedNov 12 2019, 11:08 AM

I note that it is possible to write a contrived example that this patch can't handle (needs --triple=armv8a-linux-gnu as crypto and crc are V8 extensions) although I'm not suggesting implementing support for it.

        .text

        nop
9997:
        .arch_extension crypto
        nop
        .arch_extension crc
        nop
.if . - 9997b == 0
// CHECK-NOT: error: expected absolute expression
orr r1, r1, #1
.else
orr r1, r1, #2
.endif        .text

        nop
9997:
        .arch_extension crypto
        nop
        .arch_extension crc
        nop
.if . - 9997b == 0
// CHECK-NOT: error: expected absolute expression
orr r1, r1, #1
.else
orr r1, r1, #2
.endif

if.s:11:5: error: expected absolute expression
.if . - 9997b == 0

If we need to support cases like this, we can probably add a while loop and check if the fragments between two symbols are all MCDataFragments and sum their sizes up.

The case from the kernel is testing that a single instruction is 2B rather than 4B (narrow vs wide). See https://github.com/ClangBuiltLinux/linux/blob/de620fb99ef2bd52b2c5bc52656e89dcfc0e223a/arch/arm/include/asm/assembler.h#L255-L270.

I don't see any subarch directives there.

This issue happens when the ALT_UP macro is expanded in files like arch/arm/mm/proc-v7.S. This is how I reproduce the code of interest
clang -I./arch/arm/include -I./arch/arm/include/generated -I./include -I./arch/arm/include/uapi -I./arch/arm/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -DASSEMBLY --target=arm-linux-gnueabihf -DLINUX_ARM_ARCH=7 -march=armv7-a -include asm/unified.h -S arch/arm/mm/proc-v7.S &> clang-proc-v7.s

Thanks for the summary.

To summarise where I think we are right now.

  • D76002 fixes a problem affecting within MCDataFragment eager evaluation of .if expressions. Specifically when there is a label that would be inserted into the MCDataFragment, but at the time of encountering the label the fragment hadn't been created. MC will attempt to reuse an MCDataFragment for new instructions, see CanReuseDataFragment() in MCObjectStreamer.cpp. As noted earlier when the Subtarget changes a new MCDataFragment is created. In the majority of cases this is done via a .arch or .arch_extension directive.
  • This patch extends the eager evaluation to cope with two adjacent MCDataFragments. My understanding is that this only occurs in the following circumstances:
    • When bundle locking and --mc-relax-all is used, there is a complicated 1 instruction per fragment + fragment merging that goes on here. This is only used in NaCl which I'm not sure what the status of in Chrome is. I think it is at best deprecated.

I'll follow up on this. There's likely a lot of code that can be dropped if that's the case.

    • When there is a Subtarget change between MCDataFragments.
  • In all other cases such as .align, .space and a relaxable instruction there is a separate non MCDataFragment created so we cannot fix these up.
  • The patch restricts the eager evaluation to .if as some asm backends do not want the expressions between fragments evaluated eagerly in some cases.

    From the perspective of the linux kernel, is D76002 sufficient?

I can still reproduce https://github.com/ClangBuiltLinux/linux/issues/742 with https://reviews.llvm.org/D70062 applied.

That's unfortunate. I've found out how to reproduce the original issue:

clang -I./arch/arm/include -I./arch/arm/include/generated -I./include -I./arch/arm/include/uapi -I./arch/arm/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -D__ASSEMBLY__ --target=arm-linux-gnueabihf -D__LINUX_ARM_ARCH__=7 -march=armv7-a -include asm/unified.h -c -o /tmp/proc-v7.o arch/arm/mm/proc-v7.Sclang -I./arch/arm/include -I./arch/arm/include/generated -I./include -I./arch/arm/include/uapi -I./arch/arm/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -D__ASSEMBLY__ --target=arm-linux-gnueabihf -D__LINUX_ARM_ARCH__=7 -march=armv7-a -include asm/unified.h -c -o /tmp/proc-v7.o arch/arm/mm/proc-v7.S

If I extract out just the small part of the test case it works even without 76002.

The case from the kernel is testing that a single instruction is 2B rather than 4B (narrow vs wide). See https://github.com/ClangBuiltLinux/linux/blob/de620fb99ef2bd52b2c5bc52656e89dcfc0e223a/arch/arm/include/asm/assembler.h#L255-L270.

I don't see any subarch directives there.

They are there, I preprocessed to get a rather large output file including:

.globl cpu_v7_dcache_clean_area ; .align 0 ; cpu_v7_dcache_clean_area:
 9998: nop @ MP extensions imply L1 PTW
 .equ up_b_offset, 1f - 9998b ; .pushsection ".alt.smp.init", "a" ; .long 9998b ; b . + up_b_offset ; .popsection
 ret lr
1: dcache_line_size r2, r3
2: mcr p15, 0, r0, c7, c10, 1 @ clean D entry
 add r0, r0, r2
 subs r1, r1, r2
 bhi 2b
 dsb ishst
 ret lr
.type cpu_v7_dcache_clean_area, %function; .size cpu_v7_dcache_clean_area, .-cpu_v7_dcache_clean_area


 .arch_extension sec
.globl cpu_v7_smc_switch_mm ; .align 0 ; cpu_v7_smc_switch_mm:
 stmfd sp!, {r0 - r3}
 movw r0, #:lower16:(((1) << 31) | ((0) << 30) | (((0) & 0x3F) << 24) | ((0x8000) & 0xFFFF))
 movt r0, #:upper16:(((1) << 31) | ((0) << 30) | (((0) & 0x3F) << 24) | ((0x8000) & 0xFFFF))
 smc #0
 ldmfd sp!, {r0 - r3}
 b cpu_v7_switch_mm
.type cpu_v7_smc_switch_mm, %function; .size cpu_v7_smc_switch_mm, .-cpu_v7_smc_switch_mm

...
...

 9998: orr r1, r1, #((0 << 0) | (1 << 6))|(1 << 1)|(1 << 5)|(1 << 3)
 .pushsection ".alt.smp.init", "a" ; .long 9998b ;9997: orr r1, r1, #((1 << 0) | (1 << 6))|(3 << 3) ; .if . - 9997b == 2 ; nop ; .endif ; .if . - 9997b != 4 ; .error "ALT_UP() content must assemble to exactly 4 bytes"; .endif ; .popsection

This confused me for a bit, I think something like the following is happening.
.pushsection ".alt.smp.init"
// some instructions with Subtarget X
.popsection

.arch_extension sec or anything to change the subtarget
more stuff

.pushsection ".alt.smp.init"
// we are returning to .alt.smp.init but our subtarget is now X +sec
next instruction will start a new fragment
.popsection

It is getting late over here so I need to go home. Would jcai19 or yourself be able to investigate to confirm, check further? We would need to start a new MCDataFragment within .alt.smp.init for the new SubTarget, but I'd expect it all to happen before the 9997: in the failing case.

Making this work only on ".if" is IMO a non-starter, at least without understanding why. So I looked into what broke with llvm/test/MC/MachO/reloc-diff.s.

Basically, with LLVM's current representation, assuming that consecutive fragments are never moved with respect to each-other is invalid. In both ELF and Mach-O. In ELF, we have the ability to use numbered subsections that can insert new fragments between existing fragments. In Mach-O, fragments can effectively be turned into subsections with the 'subsections_via_symbols' directive. Committing this patch as-is would both be ugly and wrong, since we'd allow computing nonsense offsets -- even if we only restrict it to ".if".

I think it's a mistake that it works like that, and I'm going to spend a little bit of time to see if it'll be easy to fix this representational mistake in LLVM (in a separate patch), so that we have fragment lists which _are_ guaranteed to be kept exactly in the order they appear.

For this review, I have two requests:

  1. Undo all the ".if"-only hacks. (understanding that it will cause some tests to fail, for now).
  2. Fix the code to support arbitrary numbers of fragments between the symbols, of kinds MCFillFragment and MCDataFragment (and maybe others if they are fixed size -- I haven't looked through all of the kinds). Probably best would be to introduce a helper function to calculate the delta between two arbitrary symbols, given an optional layout (and either return an answer or a failure indication). This should make AttemptToFoldSymbolOffsetDifference significantly more straightforward.

I believe these examples should work, and will after making the latter change:

1:
.arch armv7a
nop
.arch armv4
nop
.arch armv7a
nop
.if . - 1b != 0
.word 0x12345678
.endif

2:
nop
.zero 0x10000
nop
.if . - 2b == 4
.word 0x12345678
.endif

Making this work only on ".if" is IMO a non-starter, at least without understanding why. So I looked into what broke with llvm/test/MC/MachO/reloc-diff.s.

Basically, with LLVM's current representation, assuming that consecutive fragments are never moved with respect to each-other is invalid. In both ELF and Mach-O. In ELF, we have the ability to use numbered subsections that can insert new fragments between existing fragments. In Mach-O, fragments can effectively be turned into subsections with the 'subsections_via_symbols' directive. Committing this patch as-is would both be ugly and wrong, since we'd allow computing nonsense offsets -- even if we only restrict it to ".if".

I think it's a mistake that it works like that, and I'm going to spend a little bit of time to see if it'll be easy to fix this representational mistake in LLVM (in a separate patch), so that we have fragment lists which _are_ guaranteed to be kept exactly in the order they appear.

For this review, I have two requests:

  1. Undo all the ".if"-only hacks. (understanding that it will cause some tests to fail, for now).
  2. Fix the code to support arbitrary numbers of fragments between the symbols, of kinds MCFillFragment and MCDataFragment (and maybe others if they are fixed size -- I haven't looked through all of the kinds). Probably best would be to introduce a helper function to calculate the delta between two arbitrary symbols, given an optional layout (and either return an answer or a failure indication). This should make AttemptToFoldSymbolOffsetDifference significantly more straightforward.

    I believe these examples should work, and will after making the latter change: ` 1: .arch armv7a nop .arch armv4 nop .arch armv7a nop .if . - 1b != 0 .word 0x12345678 .endif

    2: nop .zero 0x10000 nop .if . - 2b == 4 .word 0x12345678 .endif `

Thanks for the clarification. It will be great if we can remove the restriction to ".if". I will making changes accordingly while the representation of subsections are being changed.

jcai19 updated this revision to Diff 229225.Nov 13 2019, 9:04 PM

Support arbitrary numbers of fragments between the symbols of kind
MCDataFragment. Getting the size of MCFillFragments seems to be less
straightforward https://llvm.org/doxygen/MCAssembler_8cpp_source.html#l00287.