This is an archive of the discontinued LLVM Phabricator instance.

Fix for NVPTX module asm regression
ClosedPublic

Authored by jdtatz on Jun 21 2020, 10:16 AM.

Details

Summary

When the new NVPTX backend was added in commit ae556d3ef72dfe5f40a337b7071f42b7bf5b66a4 the NVPTXAsmPrinter::doInitialization method did not call it's parent AsmPrinter::doInitialization, which would emit module-level inline asm if it existed.

Emitting of module-level inline asm was then added in commit d2bbdf05e0b88524226589d89ffb2bfdc53ef3c8 but this was done without calling AsmPrinter::doInitialization and instead fixed by copying the module-level asm emitting code into NVPTXAsmPrinter::doInitialization after calling NVPTXAsmPrinter::emitHeader

Calling AsmPrinter::doInitialization was then added to NVPTXAsmPrinter::doInitialization in commit 68f2218e1efdb74442987a2040876e2e41f7d90e but NVPTXAsmPrinter::emitHeader was called afterwards, which meant that any module-level inline asm was added before the header which results in invalid ptx.
Also the copied module-level inline asm emitting code was left in, so module-level inline asm was added to the final ptx file twice, once before the header during AsmPrinter::doInitialization in an invalid position and once again after the header during NVPTXAsmPrinter::doInitialization in the correct position.

I have fixed this error by removing the copied module-level inline asm emitting code and adding the NVPTXAsmPrinter::emitHeader call to a new NVPTXAsmPrinter instantiation of the virtual method AsmPrinter::emitStartOfAsmFile which was designed for this use case. I did not add a unit test as one was added in commit d2bbdf05e0b88524226589d89ffb2bfdc53ef3c8 but without calling ptxas -v on every emitted ptx file we can't not verify the correctness, so I'm not sure of the usefulness of the unit test other than to ensure it doesn't cause any internal llvm issues.

Diff Detail

Event Timeline

jdtatz created this revision.Jun 21 2020, 10:16 AM
jdtatz added a reviewer: tra.
tra added a comment.Jun 22 2020, 9:21 AM

Could you post a trivial example of where the module-level inline assembly is emitted before/after this patch?

Testing things that require ptxas would need to be done in the LLVM test-suite. CUDA buildbot is offline at the moment, but I'm working on getting a new one running soon.

Using the unit test

target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx-nvidia-cuda"

module asm ".global .b32 val;"

define void @foo() {
  ret void
}

with llvm-10 invalid ptx is generated

                                        // Start of file scope inline assembly
.global .b32 val;

                                        // End of file scope inline assembly
//
// Generated by LLVM NVPTX Back-End
//

.version 3.2
.target sm_20
.address_size 32

                                        // Start of file scope inline assembly
.global .b32 val;

                                        // End of file scope inline assembly
	// .globl	foo             // -- Begin function foo
                                        // @foo
.visible .func foo()
{


// %bb.0:
	ret;
                                        // -- End function
}

with this patch valid ptx is generated

//
// Generated by LLVM NVPTX Back-End
//

.version 3.2
.target sm_20
.address_size 32

                                        // Start of file scope inline assembly
.global .b32 val;

                                        // End of file scope inline assembly
	// .globl	foo             // -- Begin function foo
                                        // @foo
.visible .func foo()
{


// %bb.0:
	ret;
                                        // -- End function
}
tra accepted this revision.Jun 23 2020, 3:51 PM

Thank you for providing the examples.

with llvm-10 invalid ptx is generated

                                        // Start of file scope inline assembly
.global .b32 val;

                                        // End of file scope inline assembly
//
// Generated by LLVM NVPTX Back-End
//

.version 3.2
.target sm_20
.address_size 32

                                        // Start of file scope inline assembly
.global .b32 val;

Ouch. We've generated it twice, with one of the instances being in the wrong place. That's indeed not good.

with this patch valid ptx is generated

//
// Generated by LLVM NVPTX Back-End
//

.version 3.2
.target sm_20
.address_size 32

                                        // Start of file scope inline assembly
.global .b32 val;

I like this much better. :-)

This revision is now accepted and ready to land.Jun 23 2020, 3:51 PM

I don't have write access, could you please commit it for me?

tra added a comment.Jun 24 2020, 10:06 AM

I did not add a unit test as one was added in commit d2bbdf05e0b88524226589d89ffb2bfdc53ef3c8 but without calling ptxas -v on every emitted ptx file we can't not verify the correctness

Before I land the test, we do need to fix test/CodeGen/NVPTX/module-inline-asm.ll as it did not detect the current issue.
The test currently only verifies the presence of the module-level inline assembly in the output, but it does not verify that it's in the right place or that it's only emitted once.

We heed few more checks that the inline assembly is only emitted after the initial PTX directives and before the functions.

tra updated this revision to Diff 273107.Jun 24 2020, 11:01 AM

Updated the test.

This revision was automatically updated to reflect the committed changes.