This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add support for vex, vex2, vex3, and evex for MASM
ClosedPublic

Authored by LiuChen3 on Oct 29 2020, 8:28 PM.

Details

Summary

For MASM syntax, the prefixes are not enclosed in braces.
The assembly code should like:

"evex vcvtps2pd xmm0, xmm1"

There are still some avx512 tests need to be improved with 'evex'
prefix. But I think it's better to discuss this syntax first.

Diff Detail

Event Timeline

LiuChen3 created this revision.Oct 29 2020, 8:28 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptOct 29 2020, 8:28 PM
LiuChen3 requested review of this revision.Oct 29 2020, 8:28 PM
pengfei added inline comments.Oct 29 2020, 8:54 PM
clang/test/CodeGen/X86/ms-inline-asm-prefix.c
2

Maybe need // REQUIRES: x86-registered-target

2

You may need add att check too since you modified the att code.

2

Should it be avalible only when -fms-compatibility

epastor added inline comments.Oct 30 2020, 5:57 AM
clang/test/CodeGen/X86/ms-inline-asm-prefix.c
2

The triple is misspelled; it should be x86_64-pc-windows-msvc (the "n" in windows is missing)

2

A broader question: As written, this applies to anything in Intel syntax. Is this an Intel syntax feature, or a MASM feature?

LiuChen3 added inline comments.Nov 2 2020, 4:44 PM
clang/test/CodeGen/X86/ms-inline-asm-prefix.c
2

Thanks for your review. After checking with the people of MSVC, I found that prefix without braces is not intel syntax. Actually, we don't know if there any document says what the prefix should be. At least, gcc does have the "{}" in intel syntax, so does clang. We currently decide to only support parsing the prefix without MSVC.

LiuChen3 updated this revision to Diff 302769.Nov 4 2020, 12:56 AM
  1. Address comments;
  2. Only support parsing vex/vex2/vex3/evex prefix for MASM
LiuChen3 added inline comments.Nov 4 2020, 1:18 AM
clang/test/CodeGen/X86/ms-inline-asm-prefix.c
2

I am not sure. But I think -fasm-blocks, -fms-extensions also support MASM.

llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
3068

I tried to limit to MASM. But I found that the 'isParsingMSInlineAsm()' is not accurate. And then I tried to transmit 'ParsingMSInlineAsm' information correctly in AsmPrinterInlineAsm.cpp (according to the '-fasm-blocks' option). But I was surprised to find that isParsingMSInlineAsm() is actually used as the argument of 'MatchingInlineAsm' in 'MatchAndEmitInstruction()'. This makes me confused. Should that 'MatchingInlineAsm' be 'MatchingMSInlineAsm' ?Is this MatchingInlineAsm only used by llvm-ml.
It difficult to limit this to MASM at the moment.

epastor added inline comments.Nov 4 2020, 5:38 AM
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
3068

llvm-ml attempts not to touch anything involving inline assembly so far. The signal that MasmParser.cpp is involved is Parser.isParsingMasm(). However... while I can't answer the majority of this without more research, I suspect you're correct that MatchingInlineAsm is misnamed. We need to check this, and if so, we should rename it to avoid confusion.

craig.topper added inline comments.Nov 4 2020, 8:34 AM
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
3068

MS inline assembly is parsed twice. Once by clang to find names of C/C++ variables. And again in the backend. GNU inline assembly is only parsed in the backend since variable names are bound explicitly and not referenced in the assembly text.

IsParsingInlineAsm is set during the clang parsing.

LiuChen3 added inline comments.Nov 4 2020, 5:15 PM
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
3068

Thanks. That's make sense.
So 'MatchingInlineAsm' in MatchAndEmitInstruction() more like 'MatchingMSInlineAsm' and can only be set when parsing the instructions first time. And the second time parser can not set 'setParsingMSInlineAsm(true)'. That's make difficult to limit the scope to MASM for now.

LiuChen3 updated this revision to Diff 304730.Nov 11 2020, 11:49 PM

Rebase.
Adding the '{}' to prefix when generate IR.

LiuChen3 added inline comments.Nov 11 2020, 11:53 PM
clang/lib/AST/Stmt.cpp
801

From X86AsmParser, the vex/evex prefix must be the begin of the instructions.

pengfei added inline comments.Nov 17 2020, 4:36 AM
clang/lib/AST/Stmt.cpp
795

Can we always assume the separator is \n\t?

llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
3083

You just need to check ForcedVEXEncoding != VEXEncoding_Default.

3084

Unused assignment. It may suppose to be used on line 3086.

LiuChen3 added inline comments.Nov 17 2020, 5:20 PM
clang/lib/AST/Stmt.cpp
795

I think so. From the code, we can see '\n\t' will be added to each end of statement:

case AOK_EndOfStatement:
      OS << "\n\t";
      break;
LiuChen3 updated this revision to Diff 305959.Nov 17 2020, 7:55 PM
  1. Check prefix, ignoring case
  2. Delete IsPrefix parameter, and delete 'break', so that we won't check prefix again. I am not sure if this is right. Att format can allow two prefix and using the last one as the finally encoding prefix. I think this may not be the original intention of the design.
  3. Change the test: checking the IR istead of checking the assembly.
  4. Made some format adjustments.
LiuChen3 marked an inline comment as done.Nov 17 2020, 9:03 PM
  1. Delete IsPrefix parameter, and delete 'break'

It should be 'continue'. Sorry for this mistake.

llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
3083

I think this is better. Multi vex/evex prefix doesn't make sense.

3084

This would be used later. However, this should only be updated when there is prefix. I put it in wrong place.

  1. Delete IsPrefix parameter, and delete 'break', so that we won't check prefix again. I am not sure if this is right. Att format can allow two prefix and using the last one as the finally encoding prefix. I think this may not be the original intention of the design.

It allows more than two, right? like {vex}{vex2}{vex3} instruction. I think it should be a bug for att.

llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
3069

Do you need to eat the prefix here?

LiuChen3 marked an inline comment as done.Nov 17 2020, 9:35 PM

It allows more than two, right? like {vex}{vex2}{vex3} instruction. I think it should be a bug for att.

Yes, My previous statement is incorrect, it should be ‘two more’. Thanks for your correction.
We might need another patch to fix it.

llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
3069

No. The prefix has been eat.
For example: vex vcvtps2pd xmm0, xmm1 .
Current token is 'vex' and the rest is 'vcvtps2pd xmm0, xmm1'. 'vcvtps2pd' is the next token which will be eat in line 3082.

pengfei accepted this revision.Nov 17 2020, 9:39 PM

LGTM. Thanks.
You'd better wait one or two days to see if other people objects.

This revision is now accepted and ready to land.Nov 17 2020, 9:39 PM
This revision was landed with ongoing or failed builds.Nov 20 2020, 12:22 AM
This revision was automatically updated to reflect the committed changes.