This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Allow to make libcalls that are defined in the current module.
ClosedPublic

Authored by denzp on Jun 27 2017, 1:39 PM.

Details

Summary

The patch adds a possibility to make library calls.
An important thing about library functions - they must be defined within the current module. This basically should guarantee that we produce a valid PTX assembly (without calls to not defined functions). The one who wants to use the libcalls is probably will have to link against compiler-rt or any other implementation.

Currently, it's completely impossible to make library calls because of error LLVM ERROR: Cannot select: i32 = ExternalSymbol '...'. But we can lower ExternalSymbol to TargetExternalSymbol and verify if the function definition is available.

Also, there was an issue with a DAG during legalisation. When we expand instruction into libcall, the inner call-chain isn't being "integrated" into outer chain. Since the last "data-flow" (call retval load) node is located in call-chain earlier than CALLSEQ_END node, the latter becomes a leaf and therefore a dead node (and is being removed quite fast). Proposed here solution relies on another data-flow pseudo nodes (ProxyReg) which purpose is only to keep CALLSEQ_END at legalisation and instruction selection phases - we remove the pseudo instructions before register scheduling phase.

Diff Detail

Repository
rL LLVM

Event Timeline

denzp created this revision.Jun 27 2017, 1:39 PM
jlebar edited edge metadata.Jun 27 2017, 1:53 PM

It's been a while since I've looked at this code, so forgive me, but isn't there a big file somewhere that lists which libcalls each target supports? Is LLVM going to be unhappy with us if sometimes NVPTX allows libcalls and sometimes it doesn't?

arsenm added a subscriber: arsenm.Jun 27 2017, 2:02 PM

This is something I've thought about doing before. Does this work with the TLI lib call simplification as is, and this is the only required codegen part?

lib/Target/NVPTX/NVPTXISelLowering.cpp
42–44

Can you move this to generic DAG code?

42–49

This is probably the wrong way to do this. You probably want to do a getMergeValues with the returned value and the entry chain

denzp added a comment.Jun 28 2017, 2:21 AM

It's been a while since I've looked at this code, so forgive me, but isn't there a big file somewhere that lists which libcalls each target supports? Is LLVM going to be unhappy with us if sometimes NVPTX allows libcalls and sometimes it doesn't?

Depending on "operation action" some instructions are legal and some are not for the target. Illegal instructions should be expanded into other (probably legal ones) or lowered to libcalls. For example, for i128 values %a = mul i128 %0, %1 can be expanded into sequence of mul.lo and mul.hi instructions, but %a = sdiv i128 %0, %1 can only be lowered into libcall to __divti3. And without patch it's completely impossible to perform sdiv i128. But now users are able to do it when they provide __divti3 implementation.

And also some targets has restrictions on libcalls, disabling some of them. But in our case, we don't have any runtime lib (don't we?) so it's up to a user to define the implementations.

denzp added a comment.Jun 28 2017, 2:42 AM

This is something I've thought about doing before. Does this work with the TLI lib call simplification as is, and this is the only required codegen part?

As far as I understood, we can make libcall either with:

And I both should work. I'm going to add one more test case for TargetLowering::makeLibCall path.

lib/Target/NVPTX/NVPTXISelLowering.cpp
42–44

It has quite specific for NVPTX logic, usually other backends don't have restrictions on presence of function definition in current module. Are there another use cases for this?

42–49

Thanks, it's a good idea and a right way. But... we are still losing reference during legalization:

Legalizing: t28: f64,ch = merge_values t25, t27
t28: f64,ch = merge_values t25, t27 ... replacing: t28: f64,ch = merge_values t25, t27
     with:      t25: f64,ch,glue = NVPTXISD::LoadParam<LD8[<unknown>]> t24, Constant:i32<1>, Constant:i32<0>, t24:1
      and:      t27: ch,glue = callseq_end t25:1, TargetConstant:i32<0>, TargetConstant:i32<1>, t25:2

Unfortunately, nobody is referencing t27 and it's being removed as a dead node.

tra added inline comments.Jul 5 2017, 2:54 PM
lib/Target/NVPTX/NVPTXISelLowering.cpp
42–43

I'd move it down to where isIndirectCall is used.

Also, I'd rephrase the comment along the lines of

Both indirect calls and libcalls have nullptr Func. In order to distinguish between them we must rely on the call site value which is valid for indirect calls but is always null for libcalls.

I wonder if there's any way to explicitly mark the call as libcall, so we don't have to rely on callsite for that.

42–49

I guess we could modify NVPTXISD::StoreRetval so it takes additional 'glue' argument and explicitly make it depend on glue returned by callseq_end. Perhaps in a separate patch.

47–48

Nit. these should be on the same line.

denzp updated this revision to Diff 178510.Dec 17 2018, 12:43 PM
denzp edited the summary of this revision. (Show Details)

My apologies, it has been a while since the revision was created. I've finally changed the approach - from a hack with keeping a reference to a CALLSEQ_END to a better one (at least by a personal feeling) with pseudo instructions to keep the nodes alive. The summary was changed to reflect the current approach. Also, the code is rebased on top of the fresh source tree.

denzp marked 2 inline comments as done.Dec 17 2018, 12:45 PM
arsenm added inline comments.Dec 18 2018, 1:30 AM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
8420

This needs to preserve the address space of the function

denzp updated this revision to Diff 178748.Dec 18 2018, 11:56 AM
denzp marked an inline comment as done.Dec 18 2018, 11:59 AM

@arsenm thanks for pointing out. Is this the right way to do it?
TLI->getPointerTy(getDataLayout(), Function->getAddressSpace());

tra added a comment.Dec 18 2018, 2:20 PM

You probably want to remove libcalls.patch from the patch diff.

lib/Target/NVPTX/NVPTXInstrInfo.td
2257–2258

If we ever get here, I think correct asm to generate here should be a move from one register to another.

If we do not expect ProxyReg to ever make it to this point, then we should make the error visible. E.g. fail to lower ProxyReg at all which will cause llvm error.

test/CodeGen/NVPTX/libcall-instruction.ll
2

Typo/grammar nits:
asse*r*tion
a*n* "Undefined..." (same issue in in the libcall-intrinsic.ll)

denzp updated this revision to Diff 178971.Dec 19 2018, 3:08 PM

@tra Thank you, I've addressed the mentioned issues.

You are right, the implementation is more reliable when ProxyReg produces a correct PTX assembly even when it's still present.

Also, I noticed missing function declarations for libcall implementors. I didn't find a common way how to dynamically add "usage" to implementors, so I ended up assigning nvptx-libcall-callee attribute and then forcefully emitting declarations in NVPTXAsmPrinter.

denzp marked 2 inline comments as done.Dec 19 2018, 3:08 PM
tra added a comment.Dec 19 2018, 3:50 PM

The tests you have only check the end result and do not directly verify the new functionality the patch adds.

Ideally, it would be great if we could verify that your changes that add and remove RegProxy do the right thing.
Could you check if it would make sense to add couple of MIR tests: https://llvm.org/docs/MIRLangRef.html#testing-individual-code-generation-passes

I can think of three interesting tests:

  • verify that RegProxy is inserted in the right places.
  • verify that RegProxy gets removed by NVPTXProxyRegErasure
  • verify that we correctly lower libcalls with and without NVPTXProxyRegErasure

Other than that the patch looks good to me.

lib/Target/NVPTX/NVPTXInstrInfo.td
2258

Couple of nits:

  • !strconcat accepts multiple arguments, so you only need one !strconcat to do the job.
  • There's also a # which is a very convenient shortcut for concatenating strings. "a" # "b" is the same as !strconcat("a", "b")

Either of those could be used here.

denzp updated this revision to Diff 179375.Dec 21 2018, 3:50 PM

Thanks, @tra that was a great idea!

I've added thorough PTX tests with and without the pass. And MIR test before and after the pass.

denzp marked an inline comment as done.Dec 21 2018, 3:50 PM
tra accepted this revision.Dec 21 2018, 3:59 PM

Nice. Thank you for adding the tests.
LGTM.

This revision is now accepted and ready to land.Dec 21 2018, 3:59 PM
This revision was automatically updated to reflect the committed changes.
test/CodeGen/NVPTX/proxy-reg-erasure-ptx.ll