This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objdump] avoid crash disassembling unknown instruction
ClosedPublic

Authored by SjoerdMeijer on Jan 28 2020, 1:07 AM.

Details

Summary

llvm-obj dump can crash when it can't disassemble an instruction:

llvm-objdump: llvm/include/llvm/ADT/SmallVector.h:153: llvm::SmallVectorTemplateCommon::const_reference llvm::SmallVectorTemplateCommon<llvm::MCOperand, void>::operator[](llvm::SmallVectorTemplateCommon::size_type) const [T = llvm::MCOperand]: Assertion `idx < size()' failed.

This can happen when the source is compiled with:

clang -march=..+ext1+ext2

and disassembly is attempted with:

llvm-objdump -mattr=+ext1

I.e., it isn't given the same architecture extensions. I have tried adding a test case, by taking our proprietary disassembler and generate the assembly code for the offending assembly sequence, let llvm-mc assemble that, and then llvm-objdump disassemble it again:

llvm-mc -assemble  ... | llvm-objdump -disassemble ..

but I just can't reproduce it in this way; llvm-mc or llvm-objdump is perhaps behaving slightly differently, perhaps there's some state somewhere. I have visually inspected the code though: first of all, it no longer crashes, and also it looks okay (and the test suite passes).

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Jan 28 2020, 1:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 28 2020, 1:07 AM
Herald added a subscriber: rupprecht. · View Herald Transcript

You should be able to use yaml2obj tool to create an object for a test case.
I think you can just create a section .text with a content and place your instruction there:

Sections:
  - Name: .text
    Type: SHT_PROGBITS
    Content: "AABB" ## instruction

It's not clear to me how the fix is solving a problem. Could you explain more where the actual problem is, please? That would help identify what is important for testing this issue.

SjoerdMeijer added a comment.EditedJan 28 2020, 2:53 AM

Thanks for your speedy responses!

For more context, this is my problem, the dissassembler is unable to disassemble some/most instructions:

fca8: b8 bf                        	it	lt
fcaa: b0 ee 42 4b                       vmovlt.f64	d4, d2
fcae: b4                           	<unknown>
fcaf: ee 4c                        	ldr	r4, [pc, #952]
fcb1: 4b f1 ee 10                        adc	r0, r11, #15597806
fcb5: fa 33                        	adds	r3, #250
fcb7: fe 0c                        	lsrs	r6, r7, #19
fcb9: cb                           	<unknown> llvm-objdump: <crash>

At a point, it is just disassembling garbage, for example, here where the crash happens instruction with opcode cb here (just a byte, which is nonsense), and is feeding this to function evaluateBranch() here:

// Try to resolve the target of a call, tail call, etc. to a specific
// symbol.
if (MIA && (MIA->isCall(Inst) || MIA->isUnconditionalBranch(Inst) ||
            MIA->isConditionalBranch(Inst))) {
  uint64_t Target;
  if (MIA->evaluateBranch(Inst, SectionAddr + Index, Size, Target)) {

The garbage value maps to some existing/random opcode, it tries to get an instruction operand, but then crashes trying to do so because the MCInst hasn't been created.

I think I understand. Let me restate my understanding to check:

  1. A disassembler doesn't understand some part of the machine code and consequently generates an <unknown> response in the disassembly.
  2. The point at which it tries to carry on is incorrect (presumably because the amount of data it read for the operands or whatever wasn't correct).
  3. Consequently, something later is parsed incorrectly (because we're no longer parsing the real instructions), causing it to be treated as a call instruction or whatever, but due to the mis-parse, llvm-objdump tries to load non-existent operands and crashes.

Assuming that's correct, it seems like there are potentially two separate issues here. 1) The disassembler should have read the instruction right in the first place, assuming it was a valid instruction. 2) If it was an invalid instruction, leading to later mis-parses, there needs to be safer error checking (i.e. not assertions or crashes), which can handle things potentially being wrong.

One question from your example: if the cb byte isn't a valid instruction, why does it get treated as a call/branch instruction?

I think I understand. Let me restate my understanding to check:

  1. A disassembler doesn't understand some part of the machine code and consequently generates an <unknown> response in the disassembly.
  2. The point at which it tries to carry on is incorrect (presumably because the amount of data it read for the operands or whatever wasn't correct).
  3. Consequently, something later is parsed incorrectly (because we're no longer parsing the real instructions), causing it to be treated as a call instruction or whatever, but due to the mis-parse, llvm-objdump tries to load non-existent operands and crashes.

This is an excellent summary of my understanding of the problem(s).

Assuming that's correct, it seems like there are potentially two separate issues here. 1) The disassembler should have read the instruction right in the first place, assuming it was a valid instruction.

Agreed, but in practice difficult to achieve I think. When you don't give the disassembler the right architecture information, we can't really blame if for not being able to disassemble everything correctly. But the least thing we should achieve is to be a bit more resilient to this, and avoid a crash. But in my example, it looks like it is trying disassembling 1 byte instructions which is nonsense in the Thumb2 ISA which has 16-bit and 32-bit instructions. So I guess Size increment is wrong somewhere.

  1. If it was an invalid instruction, leading to later mis-parses, there needs to be safer error checking (i.e. not assertions or crashes), which can handle things potentially being wrong.

Agreed. This is essentially what this patch is proposing: I don't see the point of trying to evaluate a branch when disassembly has failed before that.

One question from your example: if the cb byte isn't a valid instruction, why does it get treated as a call/branch instruction?

I haven't checked the instructions encodings yet, but it is just a random value mapping to some opcode, a branch in this case.

Testing this should be possible as suggested by @grimar, by creating a small text section using yaml2obj with explicit instructions that mimic the reproducible. I'm guessing you might need to have an odd number of bytes, but I'm not sure. If that doesn't do it on its own, you may need to dig into the crash point to identify where exactly the crash is occurring, and what inputs are needed to trigger it.

llvm/tools/llvm-objdump/llvm-objdump.cpp
1434

Let's add a comment here explaining why we don't just let it fall through to the following code. Something like "Continue early, if the disassembler failed to decode the instruction, so that llvm-objdump doesn't try to load other information for that instruction, which may be invalid or incomplete." Happy for alternative suggestions though.

Cool, cheers, and I will now look into yaml2obj (never used that before) and try to create a reproducer.

Now with a test case (and a comment)!

yaml2obj is a cool tool (I didn't know about but will be very useful)

jhenderson added inline comments.Jan 31 2020, 1:38 AM
llvm/test/tools/llvm-objdump/ARM/unknown-instr.test
2

To be clear, does this test result in a crash without your fix?

It would be a good idea to add a comment to the top of this test to explain what is going on, why you use the switches you do, and why you use the content you do.

6–7

Are either of these two instructions important for the crash, or can you get rid of them?

13–16

In newer tests, we usually try to avoid excessive whitespace as it makes things harder to read. I'd reduce it so there's only a single space between the longest key and it's value, and then line up all the other values as shown:

Class:   ELFCLASS32
Data:    ELFDATA2LSB
Type:    ET_REL
Machine: EM_ARM
18–20

Ditto, but this time adding spaces.

llvm/tools/llvm-objdump/llvm-objdump.cpp
1434

Tip: when you address a comment, click the "Done" checkbox on the comment. This causes the colouring to be obviosuly different. If you do it after you've uploaded your diff, make sure to submit too.

Thanks for checking, and I have reduced the test case further:

  • removed 2 instructions, and it indeed crashes on this very first byte cb. This triggers the assert listed in the description.
  • I have also removed the mve target attribute from the command line, which is not necessary.

Added some extra comments to the test file.

jhenderson added inline comments.Jan 31 2020, 2:07 AM
llvm/test/tools/llvm-objdump/ARM/unknown-instr.test
2

I'd still like a comment explaining what this test is doing.

SjoerdMeijer marked an inline comment as done.Jan 31 2020, 2:09 AM
SjoerdMeijer added inline comments.
llvm/test/tools/llvm-objdump/ARM/unknown-instr.test
2

yep, forgot that, but did that in a follow up just before you added this comment.

jhenderson added inline comments.Jan 31 2020, 2:09 AM
llvm/test/tools/llvm-objdump/ARM/unknown-instr.test
2

Oops, you got in as I posted this :)

6

I'd change "doesn't crash" -> handles such instructions cleanly.

Use two '#' characters for comments (i.e. "## This is...") to make them more clearly stand out as comments instead of lit/FileCheck directives.

Addressed comments

This revision is now accepted and ready to land.Jan 31 2020, 3:23 AM

Thanks for reviewing

This revision was automatically updated to reflect the committed changes.