Page MenuHomePhabricator

[RFC] Checking inline assembly for validity
Needs ReviewPublic

Authored by olista01 on Nov 26 2018, 2:50 AM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

GCC-style inline assembly is notoriously hard to write correctly, because it is
the user's responsibility to tell the compiler about the requirements of the
assembly (inputs, output, modified registers, memory access), and getting this
wrong results in silently generating incorrect code. This is also dependent on
register allocation and scheduling decisions made by the compiler, so an inline
assembly statement may appear to work correctly, then silently break when
another change to the code or compiler upgrade causes those decisions to
change.

This patch tries to improve this situation by emitting diagnostics when the
instructions inside the inline assembly string do not match the operands to the
inline assembly statement. We can do this because we parse the assembly in the
same process as the compiler, and the MC layer has some knowledge of which
registers are read and written by an assembly instruction.

For example, this C code, which tries to add 3 integers together, looks OK at
first glance:

int add3(int a, int b, int c) {
  int x;
  asm volatile(
      "add %0, %1, %2; add %0, %0, %3"
    : "=r" (x)
    : "r" (a), "r" (b), "r" (c));
  return x;
}

However, the compiler is allowed to allocate x (the output operand) and c (an
input operand) to the same register, as it assumes that c will be read before x
is written. This code might happen to work correctly when first written, but a
later change (either in the source code or compiler) could cause register
allocation to change, and this will start silently generating the "wrong" code.

With this patch, the compiler emits this warning for the above code:

test.cpp:4:7: warning: read from an inline assembly operand after first
                       output operand written, suggest adding early-clobber
                       modifier to output operand [-Winline-asm]
      "add %0, %1, %2; add %0, %0, %3"
      ^
<inline asm>:1:30: note: instantiated into assembly here
        add r0, r0, r1; add r0, r0, r2
                                    ^
test.cpp:4:7: note: output operand written here
      "add %0, %1, %2; add %0, %0, %3"
      ^
<inline asm>:1:6: note: instantiated into assembly here
        add r0, r0, r1; add r0, r0, r2
            ^

The warning is a bit noisy because it prints both the original source code and
final assembly code, and the carets for the original source just point to the
start of the line, but that's a separate issue.

I've designed this to be independent of register allocation decisions, so the
above diagnostic is always reported, regardless of whether x and c were
allocated in the same register or not. This allows it to catch code which
currently works but might break in future.

The way this works is:

  • When the AsmParser reaches an INLINEASM MachineInstr, it creates an InlineAsmDataflowChecker object, which tracks all of the information we need to know about one inline assembly statement.
  • The AsmPrinter examines the operands to the MachineInstruction, and calls functions on the tracking object to record information about each operand to the inline assembly block, as provided by the user and relied on by the compiler's optimisations and code-generation.
  • While the AsmParser is generating the final assembly string (which involves expanding operand templates like "$0" into physical register names), it records the offset from the start of the (output) string at which each operand expansion appeared.
  • The table-generated assembly matcher is modified to record the index of the MCParsedAsmOperand which resulted in in the creation of each MCOperand. An MCParsedAsmOperand can create multiple MCOperands (for example, a memory operand with base and offset), but not the other way round, so this information is stored in the MCOperand.
  • When the AsmParser is running and a tracking object is present (it is only present if we are parsing inline assembly), it records in each ParsedOperand the indexes of the inline assembly statement operands which overlap with it. This is done using the source location of the just-parsed MCParsedAsmOperand and the string offsets recored in the tracking object by the AsmPrinter.
  • Finally, after an instruction has been completely parsed, the AsmParser calls into the tracking object with the final MCInst and list of MCParsedAsmOperands. With all of this information, we can match up inline assembly operands to the MCOperands that were created for them, and check that they match.

The reason that I'm posting this as an RFC is that it adds a lot of coupling
between different parts of the backend. Do people think this is an acceptable
cost for making a quite user-hostile feature a bit safer? If people agree that
this is worthwhile, there are some things I still need to do before the patch
is ready for a proper review:

  • Check the memory overhead (during regular compilation) of storing the parsed operand number in the MCOperand. If this is too high, it could be moved to a separate data structure only created when parsing inline assmebly.
  • There are three different concepts of "operand" here (inline assembly operand, MCParsedAsmOperand and MCOperand), tidy up the naming so that these are a bit clearer.
  • For checks which depend on the order of instructions (for example, checking that a clobbered register is only read from if it has previously been written to), these could give false positives if there is control-flow in the assembly. We could check if an MCInst affects control flow, buffer these diagnostics until the end of the assembly block, and only emit them if there are no control-flow instructions.

Diff Detail

Repository
rL LLVM

Event Timeline

olista01 created this revision.Nov 26 2018, 2:50 AM
glandium added inline comments.
include/llvm/MC/MCInstrDesc.h
612

Where does variadicOpsAreDefs come from?

Assuming removing the variadicOpsAreDefs test completely (since I couldn't figure where it came from) still yields something that mostly works, it appears not to catch when the assembly writes to input operands. Examples of real code that had the problem:

olista01 marked an inline comment as done.Nov 27 2018, 3:46 AM

For the first example, it looks like we're missing the case where a memory instruction with writeback modifies the address register. I'll have a look and see if there's a way to fix that.

For the second example, I've only made the assembly parser modifications for ARM so far, so I would't expect this to work for X86 yet.

include/llvm/MC/MCInstrDesc.h
612

That's added by D54853.

Correction, the mpi_arm.c ones seem to be caught. However, warnings are also being emitted for assembly post substitution. Example:

/builds/worker/workspace/build/src/security/nss/lib/freebl/mpi/mpi_arm.c:31:10: warning: write to an input-only inline assembly operand [-Winline-asm]
        "ldr     r6, [%0], #4\n"
         ^
<inline asm>:5:13: note: instantiated into assembly here
ldr     r6, [r0], #4
            ^
/builds/worker/workspace/build/src/security/nss/lib/freebl/mpi/mpi_arm.c:31:10: warning: read from R0, which is not an inline assembly operand or clobber [-Winline-asm]
        "ldr     r6, [%0], #4\n"
         ^
<inline asm>:5:19: note: instantiated into assembly here
ldr     r6, [r0], #4
                  ^

It also seems there are false positives. Possibly because MC doesn't have all the right information about some instructions. For example:

/builds/worker/workspace/build/src/modules/freetype2/src/truetype/ttinterp.c:1261:31: warning: instruction reads from an output inline assembly operand (without having written to it first) [-Winline-asm]
      "smull  %1, %2, %4, %3\n\t"       /* (lo=%1,hi=%2) = a*b */
                              ^

The corresponding code is: https://hg.mozilla.org/try/file/8ce89b7f2cf06a5ad743c5a593502c484e73b5b7/modules/freetype2/src/truetype/ttinterp.c#l1261
Operands %3 and %4 *are* inputs, but it seems to be complaining about %2 which is indeed an output, but is not actually an input of the instruction.

I can provide a full log of the 4831 warnings marked -Winline-asm during a Firefox for ARM Android build, if you're interested.

mpi_arm.c:

  • For the warnings which are emitted, some register MCOperands are being matched against the wrong parsed operand.
  • For post-increment instructions, the assembly matcher is not setting the writeback register in the MCInst (leaving it as noreg), because it is tied to one part of a complex operand, which it doesn't support. The fix for this might be to make a similar change to rL209425 in the ARM backend, which would give other advantages (better diagnostics for memory operands).

ttinterp.c:

  • The caret on the "instantiated into assembly here" is the accurate one (pointing to the mov instruction after the smull), the mapping back to C/C++ source locations isn't currently accurate. Improving that is on my TODO list, but I haven't looked into it yet.
  • There are ~10 t2AsmPseudo records in the Thumb2 tablegen file which have every operand marked as input, this looks like a simple fix. I expect we'll uncover more issues like this, because we are using information which hasn't been used for anything in the MC library before.
olista01 updated this revision to Diff 175655.Nov 28 2018, 3:42 AM
  • Make the "cc" inline assembly clobber affect all status registers
  • Handle tied operands correctly in the AsmMatcher (fixes post-increment ldr/str instructions)
  • Fix outputs of t2AsmPseudo records (fixes Thumb2 mov with shifted right-hand operand)
  • Call transferInlineAsmOps in cases where we parse more than one operand at once

I can provide a full log of the 4831 warnings marked -Winline-asm during a Firefox for ARM Android build, if you're interested.

Yes, it would be helpful if you could send me that output (with the latest patch applied).

dmajor added a subscriber: dmajor.Nov 28 2018, 4:32 PM

I can provide a full log of the 4831 warnings marked -Winline-asm during a Firefox for ARM Android build, if you're interested.

Yes, it would be helpful if you could send me that output (with the latest patch applied).

I extracted the warnings from the log and processed them to link to the corresponding code:
https://glandium.org/files/llvm-D54891-log-20181129.html

There are much less warnings with the new patch than before. And there are some interesting things in there. Like cases where the assembly pushes a register value to the stack, does stuff with it, and restores it ; with no annotation, and I think it's right doing so.

(Note that the code that was built has both fixes I linked earlier reverted, which is why the mpi_arm.c code is full of warnings ; I should do another run with the fix re-applied.)