This is an archive of the discontinued LLVM Phabricator instance.

Adding support for missing variations of X86 string related instructions
ClosedPublic

Authored by myatsina on Nov 19 2015, 9:13 AM.

Details

Summary

The following are legal according to X86 spec:

ins mem, DX
outs DX, mem
lods mem
stos mem
scas mem
cmps mem, mem
movs mem, mem

Diff Detail

Repository
rL LLVM

Event Timeline

myatsina updated this revision to Diff 40657.Nov 19 2015, 9:13 AM
myatsina retitled this revision from to Adding support for missing variations of X86 string related instructions.
myatsina updated this object.
myatsina added a reviewer: rnk.
myatsina set the repository for this revision to rL LLVM.
myatsina added a subscriber: llvm-commits.
rnk edited edge metadata.Nov 23 2015, 2:14 PM

Can you add tests to cover the new error paths you are adding?

test/MC/X86/intel-syntax.s
740–741 ↗(On Diff #40657)

I'm confused. Why are these two assembly instructions equivalent? Ditto for the rest, they don't look equivalent.

myatsina added a comment.EditedNov 24 2015, 7:08 AM
In D14827#295382, @rnk wrote:

Can you add tests to cover the new error paths you are adding?

Do you mean the "mismatching source and destination index registers" error?
If so, then this code is just a refactoring of the old logic that used "doSrcDstMatch" and returned this error if there was no match.
It should be covered by existent tests, but I can add one more just in case.

test/MC/X86/intel-syntax.s
740–741 ↗(On Diff #40657)

If you look at the X86 spec of these instructions:

INS m8, DX - Input byte from I/O port specified in DX into memory location specified in ES:(E)DI or RDI
which is actually the same as the definition of the "insb" instruction.
The mem operand has no meaning other than distinguishing if it's insb, insw or insl (in this case it's mem8, so it should be insb).

"insb" and "ins m8, DX" are equivalent and should map to the same instruction.

Does this answer your question?

rnk added a comment.Nov 24 2015, 1:36 PM
In D14827#295382, @rnk wrote:

Can you add tests to cover the new error paths you are adding?

Do you mean the "mismatching source and destination index registers" error?
If so, then this code is just a refactoring of the old logic that used "doSrcDstMatch" and returned this error if there was no match.
It should be covered by existent tests, but I can add one more just in case.

Ah, you're right. Nevermind.

rnk added inline comments.Nov 24 2015, 2:15 PM
lib/Target/X86/AsmParser/X86AsmParser.cpp
1057–1058 ↗(On Diff #40657)

Please use a static_cast here

test/MC/X86/intel-syntax.s
740 ↗(On Diff #40657)

You should test what happens with different pointer sizes. Since this is a 64-bit test, can you add a test like this:

ins byte ptr [rax], dx

GAS assembles that to this:

insb   (%dx),%es:(%rdi)

And then again in a separate 32-bit test file, you should do something like:

insb byte ptr [ax], dx
insb byte ptr [eax], dx
// CHECK: insb   (%dx),%es:(%di)
// CHECK: insb   (%dx),%es:(%edi)
740–741 ↗(On Diff #40657)

OK, so the register eax is essentially discarded and is only used to size the implicit memory operand. binutils 'as' issues a warning in this case, but MSVC does not. What does Intel's assembler do? I would like it if we issued a warning that EDI or ESI was expected in place of the user's register.

myatsina updated this revision to Diff 41409.Nov 30 2015, 10:00 AM
myatsina edited edge metadata.

Added tests as asked

myatsina added inline comments.Nov 30 2015, 10:03 AM
test/MC/X86/intel-syntax.s
740 ↗(On Diff #41409)

Added test cases in index-operations.s
I've added only for insw, I believe these cases with the case here have enough coverage

740–741 ↗(On Diff #40657)

Intel assembler does not issue a warning, same as MSVC.
I think we shouldn't issue a warning either because according to x86 spec this is a completely legal instruction which has the semantics of looking at ESI/EDI.
The spec does not have any restrictions about the mem operand and does not "expect" ESI/EDI.

rnk added a comment.Nov 30 2015, 11:18 AM

Craig, can you give us a second opinion here?

These string instructions need a memory operand size, but the encodings only support [esi]/[edi] as a memory operand. If the users asks for ins byte ptr [eax], dx and we assemble it to ins byte ptr es:[edi], dx, it seems highly likely that the user made a mistake.

I'm trying to insist that we should either error or warn when the instruction that the user asked for is not encodable.

silvas added a subscriber: silvas.Nov 30 2015, 8:03 PM
In D14827#298742, @rnk wrote:

Craig, can you give us a second opinion here?

These string instructions need a memory operand size, but the encodings only support [esi]/[edi] as a memory operand. If the users asks for ins byte ptr [eax], dx and we assemble it to ins byte ptr es:[edi], dx, it seems highly likely that the user made a mistake.

I'm trying to insist that we should either error or warn when the instruction that the user asked for is not encodable.

The intel manual even has a paragraph about this being misleading:

"""
At the assembly-code level, two forms of this instruction are allowed: the “explicit-operands” form and the “no- operands” form. The explicit-operands form (specified with the INS mnemonic) allows the source and destination operands to be specified explicitly. Here, the source operand must be “DX,” and the destination operand should be a symbol that indicates the size of the I/O port and the destination address. This explicit-operands form is provided to allow documentation; *however, note that the documentation provided by this form can be misleading*. That is, the destination operand symbol must specify the correct type (size) of the operand (byte, word, or doubleword), but it does not have to specify the correct location. The location is always specified by the ES:(E)DI registers, which must be loaded correctly before the INS instruction is executed.

"""

(emphasis mine)

Sean, if I understand correctly, you, as well as Reid, support producing a warning, though the spec explicitly mentions that the memory operand is ignored, and though MSVC and ICC do not produce such warnings, right?
If so, how about some kind of warning like "Memory operand ignored, assuming ES:(R|E)SI" or something of that sort?

myatsina updated this revision to Diff 42666.Dec 13 2015, 8:22 AM

Added warning regarding the fact that these instructions use (R|E)SI or (R|E)DI instead of the specified memory operand

I've added the warning as you suggested, can you please review the new version?

rnk accepted this revision.Jan 13 2016, 11:21 AM
rnk edited edge metadata.

lgtm

This revision is now accepted and ready to land.Jan 13 2016, 11:21 AM
This revision was automatically updated to reflect the committed changes.
sberg added a subscriber: sberg.Jan 20 2016, 5:54 AM

Building LibreOffice (which has some asm blocks) started to fail now with

error: memory operand is only for determining the size, ES:(R|E)SI will be used for the location [-Werror,-Winline-asm]
<inline asm>:2:10: note: instantiated into assembly here

movsd   (%rax), %xmm0

Committed a fix in revision 258312.
Please let me know if you encounter additional issues.

sberg added a comment.Jan 20 2016, 7:04 AM

r258312 does not make that -Werror,-Winline-asm go away for me

yes you are right, I should not produce the warning in your case, working on a fix.