This is an archive of the discontinued LLVM Phabricator instance.

The issues with X86 prefixes: step 2
ClosedPublic

Authored by avt77 on Aug 29 2017, 6:46 AM.

Details

Summary

This patch includes everything we have in D36788 plus it covers PR19251. I was forced to refactor and to re-design the current implementation because it was not adopted for various use cases of prefixes. Now it should fix all known issues.

Diff Detail

Event Timeline

avt77 created this revision.Aug 29 2017, 6:46 AM
craig.topper added inline comments.Sep 1 2017, 10:40 PM
include/llvm/MC/MCInst.h
165

sence->sense

RKSimon added inline comments.Sep 4 2017, 10:17 AM
lib/Target/X86/Disassembler/X86Disassembler.cpp
243

Flags |= 1 ?

251

These hard coded values for the flag bits are going to be tricky to maintain. Perhaps an enum ?

enum OpPrefixFlag {
  OPF_OpSize = 1,
  OPF_AdSize = 2,
  OPF_REP = 4,
  OPF_REP = 8,
};

void clearFlags() { Flags = 0; }
void setFlag(OpPrefixFlag F) { Flags |= (unsigned)F; }
bool isFlagSet(OpPrefixFlag F) const { return !!(Flags & (unsigned)F); }
avt77 updated this revision to Diff 115822.Sep 19 2017, 3:57 AM

I fixed issues raised by Craig and Simon.

avt77 added a comment.Sep 25 2017, 1:30 AM

Craig,
You reviewed https://reviews.llvm.org/D36788 which was the first step of fixing issues with prefixes: could you review this new patch as well? In fact it's simply an extended version of D36788 and covers issues raised by echristo.

craig.topper edited edge metadata.Sep 25 2017, 10:45 AM

I'm not sure I can approve growing the size of MCInst. Though I can't see anyway around it. @rafael what do you think?

craig.topper requested changes to this revision.Sep 25 2017, 12:03 PM

The new test cases don't test anything because they aren't running FileCheck.

test/MC/Disassembler/X86/prefixes-i386.s
1

This needs to be named .txt not .s

You also need to pass the output to FileCheck

test/MC/Disassembler/X86/prefixes-x86_64.s
1

This file needs be named .txt not .s

You also need to pass the output to FileCheck

This revision now requires changes to proceed.Sep 25 2017, 12:03 PM
avt77 updated this revision to Diff 116672.Sep 26 2017, 8:51 AM
avt77 edited edge metadata.

I fixed tests mentioned by Craig. And about extension of MCInst: this opaque data (not simple flag) could be really useful for other components (not only for disassembler).

avt77 added a comment.Sep 27 2017, 4:37 AM

There was a email thread about the issues in this patch. To keep track of those emails I'm putting them here:

It was added when I’ve start poking around with prefixes, to implement the proper recognition of xaquire/xrelease (rL311309).
I can suggest some additional views to the matter at hand:
Enhancing prefix digestion by the parser is highly recommended – aforementioned FIXME note describes those issues I’ve found on a brief exploring, surly there’s more.
Currently it is emitted as a standalone instruction, which I don’t see much sense in.
IMHO, we should aggregate prefixes till we have consumed an ‘actual’ instruction, and then make queries about whether they form a legal combination, and I deem it to be the right course for the disassembler as well, unless we don’t care about tolerating nonsense in disassembly (how does others disassemblers handle such phenomena?)
Personally, I see prefixes as a part of a particular instruction, so at least on the concept level I’m in favor of ‘Flags’.
More generally, whether producing multiple MCInsts, using flag or whatever other approach – it’s technicalities.
Agreement should be first reached on what would be considered as a proper handling for both ends (parser, disassembler).
p.s. can you guyz please use Phab for further discussion? Hard (for me) to keep track on mail correspondence.

From: Andrew Tischenko [mailto:tishenandr@xenzu.com]
BTW, JFYI, I found the following comment in the source:

// FIXME:
// Enhace prefixes integrity robustness. for example, following forms
// are currently tolerated:
// repz repnz <insn>    ; GAS errors for the use of two similar prefixes
// lock addq %rax, %rbx ; Destination operand must be of memory type
// xacquire <insn>      ; xacquire must be accompanied by 'lock'

The approach with Flag will allow to implement it.
Andrew

On 27.09.2017 12:18, Andrew Tischenko wrote:

OK, I'll try to change the assembler properly but there are some questions:

    Should I do it in the same patch?
    Currently if we have:

repz repnz repe cmpsb

    then we produce with 'llvm-mc -triple x86_64-unknown-unknown -x86-asm-syntax=intel -show-encoding intel-syntax.s':

        rep                             # encoding: [0xf3]
        repne                           # encoding: [0xf2]
        rep                             # encoding: [0xf3]
        cmpsb   %es:(%rdi), (%rsi)      # encoding: [0xa6]

    but after the change we'll get only the following:

        rep                             # encoding: [0xf3]
        cmpsb   %es:(%rdi), (%rsi)      # encoding: [0xa6]

    Is it OK? (IMHO, Yes.)

    If YES, do we need any warnings here? (IMHO, No.)

Andrew

On 26.09.2017 22:31, Rafael Avila de Espindola wrote:

    The assembler and disassembler should use the same path.
    I would be OK with always producing 1 or N instructions, as long as both
    the assembler and disassembler do the same. That is, it is OK to have
    Flags, as long as the assembler uses that instead of creating a separate
    instruction for prefixes.
    It seems that allowing the disassembler to create multiple instructions
    would have the advantage of not needing Flags, but that is secondary
    IMHO.
    Cheers,
    Rafael

    Craig Topper <craig.topper@gmail.com> writes:

        Here's my understanding of what I think happens today.
        -For a very select few instructions if the AsmParser sees a repne/repe
        prefix it creates a special version of the instruction that has the REP
        bits set in TSFlags. For any other instruction it emits the repne/rep/repe
        as a separate MCInst.
        -For the disassembler if it sees a repne/repe byte at the start that it
        doesn't think goes with an instruction it will emit a MCInst containing
        just the REP.
        -If the disassembler encounters a repne/repe byte not at the start of the
        instruction that doesn't go with the instruction we drop it and don't print
        anything. The disassembler interface only allows us to return one
        instruction. So we can't return a separate repne/repe instruction and a
        real instruction from the same byte sequence. I don't believe the assembler
        can ever produce a byte sequence that hits this case, but that doesn't mean
        some binary couldn't contain that string of bytes created by hand. So this
        patch is trying to preserve the extra prefix information in the one MCInst
        we're allowed to emit. Maybe another option would be to allow creating
        multiple MCInsts from the disassembler?
        ~Craig

        On Tue, Sep 26, 2017 at 10:37 AM, Rafael Avila de Espindola <
        rafael.espindola@gmail.com> wrote:
            The question is why it is different for disassembler than for the
            assembler?
            How does the assembler handle trepne?
          Cheers,
           Rafael

            Andrew Tischenko <tishenandr@xenzu.com> writes:
                It is not a simple flag, it's some data. And this data could be useful
                for any other component because it's some opaque info which could be
                send via MCInst from one low level target component to another one.
                Without this (additional) data MCInst loosing (potentially very useful)
                info about the given instruction.

                Andrew

                On 25.09.2017 22:05, Rafael Avila de Espindola wrote:

                    Having a flag field that is used only on disassembly seems wrong.
                    Don't we support parsing our own output? I don't see trepne in any .s
                    test for example.

                    Cheers,
                    Rafael
avt77 added a comment.Sep 28 2017, 2:10 AM

Rafael, Craig,

After of some investigation of AsmParser I realised that assembler modification to support Flags for proper prefixes elaboration will require signature change of at least 2 virtual functions: ParseInstruction and MatchAndEmitInstruction. The reason of such change is very simple: ParseInstruction does real parsing (and as result could track prefixes) but does not deal with MCInst while MatchAndEmitInstruction create MCInst. This signature modifications will force massive changes in the sources that's why it will be difficult to review.

The question is: could we accept the current patch as it is (only disassembler is being supported) and follow up with a new patch which will include the corresponding assembler changes?
Or the big patch is OK?

avt77 added a comment.Sep 28 2017, 2:40 AM

Alternatively, I could add a new X86Operand type (e.g. Prefix): as result I will not change any signature but it will be X86-specific only. What's better from your point of view?

avt77 updated this revision to Diff 117520.Oct 3 2017, 7:25 AM

I re-implemented assembler in case of working with X86-prefixes. Now both X86-assembler and X86-disassembler use the new Flags field from MCInst. As result now it's possible to track several prefixes for one instr and now one prefix is not a separate instruction but only is the parameter of the one. I tried to keep the current tests unchanged where it's possible. And I did not extend/change any diagnostic related to prefixes: it should be done in the follow up patches.

avt77 updated this revision to Diff 118602.Oct 11 2017, 6:25 AM

I added a test to cover PR32809.
Guys, could you speed up the review? In fact you already reviewed everything except X86-ASM changes: I did it to have one path for both assembler and disassembler as we agreed in our previuos discussions.

craig.topper added inline comments.Oct 11 2017, 9:58 AM
lib/Target/X86/AsmParser/X86AsmParser.cpp
2865 ↗(On Diff #118602)

Use isPrefix()?

2866 ↗(On Diff #118602)

I'm not sure this isn't a use after free. pop_back_val is going to return a std::unique_ptr which you dereferenced, but I'm not convinced anything is keeping the std::unique_ptr alive.

lib/Target/X86/AsmParser/X86Operand.h
395 ↗(On Diff #118602)

isPrefix*

avt77 added inline comments.Oct 12 2017, 1:02 AM
lib/Target/X86/AsmParser/X86AsmParser.cpp
2866 ↗(On Diff #118602)

It seems that everything is OK here because

C++  Utilities library  Dynamic memory management std::unique_ptr

typename std::add_lvalue_reference<T>::type operator*() const; (1) (since C++11)
pointer operator->() const noexcept; (2) (since C++11)

operator* and operator-> provide access to the object owned by *this.
The behavior is undefined if get() == nullptr

Parameters
(none)

Return value

  1. Returns the object owned by *this, equivalent to *get().
  2. Returns a pointer to the object owned by *this, i.e. get().

Am I right?

avt77 updated this revision to Diff 118758.Oct 12 2017, 2:33 AM

I added usage of isPrefix() accordingly to Craig requirement.

avt77 updated this revision to Diff 118776.Oct 12 2017, 5:47 AM

I added tests covering PR21640. Now this patch covers PR7709, PR17697, PR19251, PR32809 and PR21640.

craig.topper added inline comments.Oct 12 2017, 11:50 AM
lib/Target/X86/AsmParser/X86AsmParser.cpp
2866 ↗(On Diff #118602)

The problem isn't with the operator*. It's about the ordering of when the destructor for the std::unqiue_ptr returned by pop_back_val is executed. I believe it will happen as soon as that line ends since it's an unnamed temporary. So the destructor will run and delete the object the unique_ptr is pointing at. But you're still holding a reference to that object. So it's a use after free.

I think you should assign to Prefixes while the object is still in the vector using .back(), and then once you're done with that just call pop_back() on the vector.

avt77 added inline comments.Oct 13 2017, 12:50 AM
lib/Target/X86/AsmParser/X86AsmParser.cpp
2866 ↗(On Diff #118602)

I'm not sure you're right because they say about "end of scope" to invoke destructor and here the "scope" should be "if" statement. But to be absolutely safe I'll do it. Tnx.

avt77 updated this revision to Diff 118890.Oct 13 2017, 1:42 AM

Safe implementation for std::unique_ptr usage was done (raised by Craig).

craig.topper accepted this revision.Oct 13 2017, 8:28 AM

Thanks for fixing that. Scope only applies to named objects, the unique_ptr here has no name. https://stackoverflow.com/questions/2298781/why-do-un-named-c-objects-destruct-before-the-scope-block-ends.

LGTM other than the one comment.

This revision is now accepted and ready to land.Oct 13 2017, 8:28 AM
avt77 closed this revision.Oct 19 2017, 2:33 AM

The patch was committed as rL315899.