This is an archive of the discontinued LLVM Phabricator instance.

Fix inline assembly that switches between ARM and Thumb modes
ClosedPublic

Authored by garious on Nov 22 2013, 12:37 PM.

Details

Summary
This patch restores the ARM mode if the user's inline assembly
does not.  In the object streamer, it ensures that instructions
following the inline assembly are encoded correctly and that
correct mapping symbols are emitted.  For the asm streamer, it
emits a .arm or .thumb directive.

This patch does not ensure that the inline assembly contains
the ADR instruction to switch modes at runtime.

The problem we need to solve is code like this:

  int foo(int a, int b) {
    int r = a + b;
    asm volatile(
        ".align 2     \n"
        ".arm         \n"
        "add r0,r0,r0 \n"
    : : "r"(r));
    return r+1;
  }

If we compile this function in thumb mode then the inline assembly
will switch to arm mode. We need to make sure that we switch back to
thumb mode after emitting the inline assembly or we will incorrectly
encode the instructions that follow (i.e. the assembly instructions
for return r+1).

Based on patch by David Peixotto

Diff Detail

Event Timeline

Hi Greg,

I'm assuming the STI already dealt with context, but no one cared to change this. If that's so, LGTM.

Would be better if you gave a bit more context on the test's CHECK-NEXT dump.

I'm starting to think that fixing this bug is a bad idea and that reporting an error might be more appropriate. Fixing the bug opens up a can of worms. Say the user's inline assembly doesn't return to thumb mode. Should the compiler inject the $t mapping symbol? Should it inject a ".code 16" directive? Should it inject a label and adr to switch back to thumb mode? Where do we draw the line?

Also, my test is fragile. That $a shouldn't be displayed by llvm-objdump (and isn't by GNU's objdump). Also, the CHECK-NEXT assertions are checking those bytes in the context of thumb instructions. Instead there should just be one CHECK that matches all 4 bytes on one line.

Thoughts?

garious updated this revision to Unknown Object (????).Dec 2 2013, 12:43 PM

The previous patch attempted to fix mode-switching in inline assembly, but the patch was incomplete. While it would generate correct machine code if compiled for Thumb, compiling for armv7 would leave the compiler in a broken state. If we want to support mode-switching, the streamer needs to restore the original mode after the inline assembly (and preferably not circumvent the type system with const_cast).

Hi Greg,

The two problems I mapped from our discussion were:

  1. The compiler must go back to its own state after inline assembly, and err if the user makes it hard.
  2. The compiler must detect changes in inline assembly, and record its own state, to be able to do so.

Forbidding state change altogether is a first approach on correctness, and I'm fine with this patch as it is. For the future, do you plan on implementing a more fine grained approach?

include/llvm/MC/MCTargetAsmParser.h
101 ↗(On Diff #5859)

I don't understand why you need this...

The compiler must go back to its own state after inline assembly, and err if the user makes it hard.

Agreed. It's easy to detect a missing .arm or .thumb directive. Not so obvious to me how to detect the missing adr instruction (or equivalent?). I think a good fix would be to error out if the compiler notices the missing directive (rather than inject it), under the assumption that the missing directive suggests the adr instruction is missing as well.

The compiler must detect changes in inline assembly, and record its own state, to be able to do so.

Agreed. SubtargetInfo's FeatureBits looks good for this.

Forbidding state change altogether is a first approach on correctness, and
I'm fine with this patch as it is. For the future, do you plan on implementing
a more fine grained approach?

I'm not sure. It's easy to implement but difficult to test. llvm-objdump needs to respect the ARM-ELF mapping symbols. I'll take a shot at it today. I'll report back in a few hours. If it doesn't go well, I think we should merge this patch, which is about as conservative as it gets.

include/llvm/MC/MCTargetAsmParser.h
101 ↗(On Diff #5859)

Because the name 'ParsingInlineAsm' was taken for MS-style assembly.

Renato/Jim, as Jim had mentioned earlier, trying to fully support switching ARM modes in inline asm requires some restructuring. I think it would also help to first fix llvm-objdump to switch modes in response to the ARM-ELF mapping symbols, so that we can write readable unit-tests. Therefore, as a stopgap, I'd like to submit this patch as-is. That okay?

Thanks,
Greg

grosbach added inline comments.Dec 10 2013, 12:52 PM
include/llvm/MC/MCTargetAsmParser.h
101 ↗(On Diff #5859)

Wouldn't this problem apply no matter what the syntax of the inline asm was in the user's source code?

Seems the flag should be generalized to be an enumeration of asm kind. Something like, "enum InlineAsmKind { ASM_RAW, ASM_MSINLINE, ASM_GNUINLINE };" except with better names. The various predicate checks would be changed to reference that.

Hi Greg,

I agree with your strategy. I'm happy when Jim is. :)

garious updated this revision to Unknown Object (????).Dec 11 2013, 10:41 AM

This patch includes the same tests as the previous revision. The only difference is that instead of adding a second boolean member variable to indicate we're parsing GNU Inline Asm, an enum was added. The functions that accessed the boolean have been updated to access the enum variable.

The X86 code that was touched in this patch was added by Chad Rosier. I've added him to the CC list.

And per Jim, the .arm/.thumb directives have been disabled for any type of inline assembly.

garious updated this revision to Unknown Object (????).Dec 12 2013, 1:36 PM

Same as the last patch, but includes some code cleanup. Per David Peixotto, renamed isParsingInlineAsm() to isParsingMSInlineAsm(), and added isParsingAnyInlineAsm().

While commenting on PR18231, I had a horrible realization. This patch will cause diagnostics to be generated when compiling with "-c" that will not be generated when compiling with "-S". That's a problem. There shouldn't be any diagnostics that depend on whether we're doing direct-to-object or going through a .s file first. When generating a .s file, the inline assembly is just passed through as a textual blob with no interpretation done.

This revision is broken - please ignore.

garious updated this revision to Unknown Object (????).Dec 12 2013, 2:20 PM

bang. make check-all now passes for llvm+clang

Hum, that's not good indeed... I'm not sure how to fix that, at least not
within the context of this patch.

Is there a technical reason why we don't parse the inline asm when
outputting asm? It seems like we should, at least to make sure it's well
formed, or it contains the same instruction set, to give users a better
error...

garious updated this revision to Unknown Object (????).Dec 12 2013, 3:23 PM

Taking another shot at fixing the integrated assembler instead of disabling mode switching entirely. This is the same solution that started this review, but in this revision, the compiler checks to ensure the inline assembly restored the ARM mode.

Hi Greg,

I think the simplest solution to this would be, as Jim said, to *always* revert the mode back to what it was before the inline asm, regardless of what the inlined piece does.

Renato, reverting the mode correctly is not too easy. It isn't enough to set the feature bits back to what they were. We'd need to detect if the .arm/.thumb directive is missing and if the adr and its label are missing. And for anything missing, inject it.

I think the latest patch here is an improvement over the current situation. Rather than generate broken machine code, it allows switching between thumb and arm modes and vice versa, and it detects when the user code failed to switch back.

I don't disagree that there's more that could be done here, but I think Step 1 is to fix the correctness bug with the least invasive change, and that's the aim of this patch.

Hi Greg,

As Jim said, this works for binary output:

if (OutStreamer.hasRawTextSupport()) {
  OutStreamer.EmitRawText(Str);
  return;
}

In the case of textual output, not only you're not checking for the error, but also you're not fixing anything. So, in that context, in half of the cases where this patch would be useful, it's not.

Jim's proposal was to change the FeatureBits to its original state (and I add, emit a warning with the new inline asm warning system if changed), so that, at least, the common case is caught.

I imagine there are cases (which would give anyone nightmares) for valid .arm/.thumb usage inside inline asm, but I think those cases are unstable, and a lot worse than the common case, so breaking them to fix this would make a bit of sense. I wouldn't cry over it, myself...

The (better) alternative would be to parse the inline ASM on both output formats, but only add the instructions to the streamer on the binary one. Parsing inline ASM without output (in ASM mode) would serve two purposes:

  1. Validation of inline ASM, as an early warning system when the user included the wrong instruction set, or added invalid instructions, etc.
  2. To identify changes that could impact further code generation, such as .code .text etc. and revert the changes before continue.

This should make use of the new warning call back mechanism.

Hi Greg,

I don’t follow which case you’re concerned about. It’s never correct for the inline assembly to exit in a different mode (arm vs. thumb) than how it entered, so it will always be correct to just reset the mode at the end.

-Jim

He Renato,

It’s both a bit more complicated and a bit simpler than that. We don’t need hasRawTextSupport() specialization or explicit touching of the sub target feature bits. We’ll be added a target hook that will be called in AsmPrinterInlineAsm.cpp right after the inline asm blob is parsed and emitted (EmitInlineAsm()). exitInlineAsm() or something like that. In that hook, we check if the current sub target mode matches the function attribute or not. If it doesn’t, emit a .thumb or .arm directive, whichever will get the assembler back to the right mode.

That is, we care whether the assembler and encoder are set up to handle the right instruction mode, not whether the executing code will be in the right mode when it gets there. The latter is not something we can realistically try to detect.

We very explicitly don’t want to add (yet?) parsing of the inline asm for the asm streamer. That’s an escape hatch for inline asm that uses constructs the integrated assembler can’t handle.

-Jim

garious updated this revision to Unknown Object (????).Dec 16 2013, 1:35 PM

This patch restores the ARM mode by [only] injecting any missing .arm/.thumb directives after parsing. It will ensure the mapping symbols are generated and that any code following the inline assembly is encoded correctly.

Jim, is this what you had in mind?

Hi Greg,

Couldn't you have used the first implementation you had to restore the feature bits?

Also, you need to make sure this reverts when the output is text, too (so after dumping raw text), and get a test to check for it.

rengolin requested changes to this revision.Dec 16 2013, 1:41 PM

Hi Renato,

Couldn't you have used the first implementation you had to restore the feature bits?

That would work, except that the object file would be missing the mapping symbol ($a/$t). objdump would print garbage after the inline assembly. Need that call to EmitAssemblerFlag(MCAF_Code16)

Also, you need to make sure this reverts when the output is text, too
(so after dumping raw text), and get a test to check for it.

As Jim pointed out, we can't implement this without closing our GCC escape hatch. When emitting raw text, we don't run the Asm parser.

-Greg

garious updated this revision to Unknown Object (????).Jan 15 2014, 3:43 PM

New in this patch:

  • adds .thumb/.arm directives after inline asm (because we don't parse it, and user might have switched modes and not switched back)
  • adds tests for inline asm that enters ARM or thumb mode and not going back
  • rebased on the latest code
rengolin accepted this revision.Jan 16 2014, 2:38 AM

Hi Greg,

Thanks for doing this, it looks good to me. Jim, is that what you had in mind?

I think we should add a check, on a later patch, to only emit the arm/thumb symbols in the object file if we have parsed a mode change. Maybe now just add a FIXME when calling finishInlineAsm() to remember where to add the feature.

cheers,
--renato

I've been chatting with David Peixotto about this patch. He came up with a way to implement the same features that shares more code and doesn't modify the asm parser. New patch coming soon!

garious updated this revision to Unknown Object (????).Jan 16 2014, 5:34 PM

This is a patch based on feedback from David Peixotto. It uses the same API to emit the inline asm footer when targeting either assembly or an object file.

Hi Greg,

Thanks for refining this, it definitely looks better (Thanks David!). Apart from my comment on the test, LGTM.

cheers,
--renato

test/CodeGen/ARM/inlineasm-switch-mode.ll
16

adding the instruction here will make it more clear, like others

grosbach accepted this revision.Jan 22 2014, 9:49 AM

LGTM. Thanks!

Thanks everyone. Committed under r199818

rengolin closed this revision.Jan 24 2014, 4:32 AM