This is an archive of the discontinued LLVM Phabricator instance.

RFC: Implementing new mechanism for hard register operands to inline asm as a constraint.
Needs ReviewPublic

Authored by anirudhp on Jun 29 2021, 12:18 PM.

Details

Summary
  • Relevant RFCs posted here

https://lists.llvm.org/pipermail/llvm-dev/2021-June/151370.html
https://gcc.gnu.org/pipermail/gcc/2021-June/236269.html

  • This is put up as an RFC patch to get feedback about the introduction of a new inline asm constraint which supports hard register operands
  • This is mostly a clang change, since the LLVM IR for the inline assembly already supports the {...} syntax which the backend recognizes. This change merely completes the loop in terms of introducing a user facing constraint which maps to it.

The following design decisions were taken for this patch:

For the Sema side:

  • We validate the "{" constraint using a two-phase validation approach. Firstly, we check if there is a target dependent implementation to handle the {....} constraint. If it fails, then we move on to a target agnostic check where we parse and validate the {....} constraint.
  • Why do we do this? Well, there are some targets which already process the {...} as a user facing constraint. For example the AMDGPU target. It supports syntax of the form {register-name} as well as {register-name[...]}. Moving this implementation to the target agnostic side seems to set a precedent, that we can keep extending the target agnostic implementation based on new cases for certain targets, which would be better served of moving it to the respective target.
  • In terms of the target agnostic validation, we simply check for the following syntax {.*}. The parsed out content within the two curly braces is checked to see whether its a "valid GCC register".

For the Clang CodeGen side:

  • Most of the work is done in the AddVariableConstraints function in CGStmt.cpp, which is responsible for emitting the LLVM IR corresponding to the usage of an actual register that the backend can use. Coincidentally, the LLVM IR is also {...}. As mentioned above, this is essentially mapping the LLVM Inline Assembly IR back to a user facing inline asm constraint.
  • Within this function we add in logic to check if the constraint is of the form [&]{...} in addition to the "register asm" construct.
  • A scenario where it was applicable to apply both the "register asm" construct and the "hard register inline asm constraint" was diagnosed as an unsupported error because there's no way the compiler will know which register the user meant. The safest option here is to error out explicitly, and put onus back on the user.
  • To achieve this, and refactor it in a nice way, most of the logic pertaining to the "register asm" construct has been moved into a separate function called ShouldApplyRegisterVariableConstraint which deduces whether to apply the "register asm" construct or not.
  • Furthermore, the GCCReg field is set with the "Register" only if the register is validated to be a "valid GCC Register" type. To me it doesn't make a lot of sense to set GCC Reg to something that might not necessarily be a "valid GCC register" as defined by respective targets. Would we have a case where we could have a {.*} constraint where the contents inside the curly brace is not a valid register? Yes. For example, The x86 target supports the "@cca" constraint, which is validated on the Sema side as a valid constraint, and before processing is converted to "{@cca}" (via the convertAsmConstraint function. "@cca" is not a valid "GCC register name". So in these case, we'll simply emit the constraint without setting GCCReg (with the assumption that the respective backends deal with it appropriately)

Tests:

  • Various tests were updated to account for the new behaviour.
  • I added a few SystemZ tests because we work on the Z backend, but I have no issues adding testing for multiple targets.
  • There were a few mentions of "waiting" for the GCC implementation of the same to land before the Clang side lands. As mentioned above, the intent to implement it on the GCC side has already been put forward via the RFC. I have no issues "parking" this implementation until its ready to be merged in. However, it might be good to hash out any open questions/concerns in the interim.

Diff Detail

Event Timeline

anirudhp created this revision.Jun 29 2021, 12:18 PM
anirudhp requested review of this revision.Jun 29 2021, 12:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2021, 12:18 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
anirudhp updated this revision to Diff 355332.Jun 29 2021, 12:22 PM

The code looks fine but it would be good to see some docs along with it. We're currently missing docs on inline assembly entirely and the GCC ones are somewhat... opaque when it comes to describing how constraints work.

The code looks fine but it would be good to see some docs along with it. We're currently missing docs on inline assembly entirely and the GCC ones are somewhat... opaque when it comes to describing how constraints work.

Thank you for your feedback! By docs do you mind updating/adding some information to the existing LLVM docs(like the langref https://llvm.org/docs/LangRef.html for example), or more comments to the code?

This is great.

unsigned long foo(unsigned long addr, unsigned long a0,
                  unsigned long a1, unsigned long a2,
                  unsigned long a3, unsigned long a4,
                  unsigned long a5) {
  unsigned long result asm("rax");
  unsigned long b2 asm("rdx") = a2;
  unsigned long b3 asm("rcx") = a3;
  unsigned long b4 asm("r8") = a4;
  unsigned long b5 asm("r9") = a5;
  asm("call *%1" : "=r" (result) : "{rax}"(addr), "{rdi}"(a0), "{rsi}"(a1), "r"(b2), "r"(b3), "r"(b4), "r"(b5));
  return result;
}

this compiles to`%0 = tail call i64 asm "call *$1", "=r,{r{ax}x},{r{dx}i},{rsi},r,r,r,r,~{dirflag},~{fpsr},~{flags}"(i64 %addr, i64 %a0, i64 %a1, i64 %a2, i64 %a3, i64 %a4, i64 %a5) #1, !srcloc !3`
(note {r{ax}x},{r{dx}i}) which will cause a backend failure error: couldn't allocate input reg for constraint '{r{dx}'.
Can you investigate it?

For example the AMDGPU target. It supports syntax of the form {register-name} as well as {register-name[...]}.

CC @arsenm @rampitec for AMDGPU thoughts.

This is great.

unsigned long foo(unsigned long addr, unsigned long a0,
                  unsigned long a1, unsigned long a2,
                  unsigned long a3, unsigned long a4,
                  unsigned long a5) {
  unsigned long result asm("rax");
  unsigned long b2 asm("rdx") = a2;
  unsigned long b3 asm("rcx") = a3;
  unsigned long b4 asm("r8") = a4;
  unsigned long b5 asm("r9") = a5;
  asm("call *%1" : "=r" (result) : "{rax}"(addr), "{rdi}"(a0), "{rsi}"(a1), "r"(b2), "r"(b3), "r"(b4), "r"(b5));
  return result;
}

this compiles to`%0 = tail call i64 asm "call *$1", "=r,{r{ax}x},{r{dx}i},{rsi},r,r,r,r,~{dirflag},~{fpsr},~{flags}"(i64 %addr, i64 %a0, i64 %a1, i64 %a2, i64 %a3, i64 %a4, i64 %a5) #1, !srcloc !3`
(note {r{ax}x},{r{dx}i}) which will cause a backend failure error: couldn't allocate input reg for constraint '{r{dx}'.
Can you investigate it?

Definitely!

I'm assuming this is the X86 target from the usage of the registers in the example. So the issue here seems to be that before a constraint is emitted into the IR, the SimplifyConstraint (in CGStmt.cpp) /TargetInfo::ConvertConstraint (Target overridden) function(s) is/are called to "convert" the constraint into a simpler form if it exists. So, in this particular case, {rax}, gets "simplified" to {r{ax}} because:

{ -> no simplification rule exists -> simply emit it
r -> no simplification rule exists -> simply emit it
a -> simplified to {ax}            -> {ax} is emitted
x -> no simplification rule exists -> simply emit it
} -> no simplification rule exists -> simply emit it

Thanks for bringing up this point because I missed it.

Looking into it in more detail, I think we can just forego the "simplification"/"conversion" of the constraint while emitting, if we have the [&]{.*} form, since this already maps to the lower level LLVM inline assembly IR which is responsible for telling the backend to allocate a specific register. All validation is already performed in the Sema stage, and/or in the AddVariableConstraints function. So what can be done imo, is to simply early return in the SimplifyConstraint function in CGStmt.cpp, when you have the constraint of the form [&]{.*}. What do you think?

anirudhp updated this revision to Diff 356961.Jul 7 2021, 8:06 AM
  • Disable constraint simplification when you already have a constraint of the form {...}. Constraint simplification is usually done character by character, with different targets having different implementations.
  • Furthermore, a constraint of the form {...} already maps to the LLVM inline assembly IR that tells the backend to allocate a suitable physical register.
anirudhp updated this revision to Diff 356968.Jul 7 2021, 8:12 AM
  • Something went wrong with the previous time updating the diff. I'm not too sure, but I'm just doing it again, and this time the it looks a lot better.

@MaskRay Could you please look at the latest changeset, I have added your example as a separate test case for the x86 target.

This code doesn't handle multiple alternatives in a constraint.

E.g. "={eax}{ebx}" or "={eax}{ebx},m".

See the GCC docs for the C-level syntax
https://gcc.gnu.org/onlinedocs/gcc/Multi-Alternative.html#Multi-Alternative
and LLVM IR docs for the IR syntax:
https://llvm.org/docs/LangRef.html#constraint-codes

LLVM doesn't handle alternatives very well in the backend, but Clang at least should parse and properly generate LLVM asm strings for these cases I think.

The code looks fine but it would be good to see some docs along with it. We're currently missing docs on inline assembly entirely and the GCC ones are somewhat... opaque when it comes to describing how constraints work.

Thank you for your feedback! By docs do you mind updating/adding some information to the existing LLVM docs(like the langref https://llvm.org/docs/LangRef.html for example), or more comments to the code?

I meant user-facing clang docs. This is not an IR change, so it does not belong in LangRef, but the only reference to inline assembly in clang's documentation is a reference to the GCC docs (which are almost incomprehensible in general because they were very x86-specific and were then tweaked a bit to be portable, and specifically don't mention this feature). If we are adding a new user-facing feature, we need to provide user-facing documentation for it. Ideally this would provide complete documentation of inline assembly supported by clang, but at least we should document this feature as an extension.

This code doesn't handle multiple alternatives in a constraint.

E.g. "={eax}{ebx}" or "={eax}{ebx},m".

See the GCC docs for the C-level syntax
https://gcc.gnu.org/onlinedocs/gcc/Multi-Alternative.html#Multi-Alternative
and LLVM IR docs for the IR syntax:
https://llvm.org/docs/LangRef.html#constraint-codes

LLVM doesn't handle alternatives very well in the backend, but Clang at least should parse and properly generate LLVM asm strings for these cases I think.

You're right, when it came to the {...} constraint, initially I purposely errored out when there were multiple constraints after {.*}. To me, it didn't make sense to issue another constraint when you're already being specific enough by asking the backend to use a specific register. For example, the AMDGPU target, currently errors out when you have something like "{reg-name}a". Ideally if its an option, I'd like to make it even more strict to ensure that there is no other constraint applied with {.*}. Ie. if a user specifies a hard register constraint, only emit that, otherwise diagnose as an error, because a hard register constraint is quite specific to begin with. Maybe this could be documented as a special feature (a standalone constraint, can't be used as a multi-alternative constraint) since this is something new? (From the backend's perspective, is it a valid scenario to try to allocate two registers for an operand? will the last one occurring be the one allocated? what happens if the user wanted the "first" register?)

If not, I have no issue with changing it to emit multiple constraints in the IR, but I'd like to get your thoughts on the above first.

What's the status of this? Did the GCC proposal go anywhere? I'd be happy to see this move forward if you're also pushing it on the GCC side.

clang/lib/CodeGen/CGStmt.cpp
2077

I don't think the FIXME seems useful. I don't see any reason why anyone should validly use both at the same time.

Herald added a project: Restricted Project. · View Herald TranscriptApr 7 2022, 8:22 AM
Herald added a subscriber: StephenFan. · View Herald Transcript