This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Use the mask() operator to initialize packed structs with pointers
ClosedPublic

Authored by ikudrin on Jun 10 2022, 8:27 AM.

Details

Summary

The current implementation assumes that all pointers used in the initialization of an aggregate are aligned according to the pointer size of the target; that might not be so if the object is packed. In that case, an array of .u8 should be used and pointers should be decorated with the mask() operator.

The operator was introduced in PTX ISA 7.1, so an error is issued if the case is detected for an earlier version.

Diff Detail

Event Timeline

ikudrin created this revision.Jun 10 2022, 8:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 10 2022, 8:27 AM
ikudrin requested review of this revision.Jun 10 2022, 8:27 AM
tra added inline comments.Jun 10 2022, 3:14 PM
llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
1170–1171

What will happen if we have an aggregate with a mix of packed and unpacked fields and we only want to init with a pointer located in the unpacked part of it? Will something like this trigger the error?

%ti = type <{ i8, i32 }>
%to = type { i8, %ti, i32, void ()* }
@n = addrspace(1) global %to {
    i8 12,
    %ti <{ i8 56, i32 78 }>,
    i32 34,
    void()* @func
    }

It's one of the test cases below with another pointer added to non-packed top-level struct.

llvm/test/CodeGen/NVPTX/packed-aggr.ll
51

Interesting. Right now we're printing '0' for the function pointer, which is indeed not good at all. https://godbolt.org/z/KEexEar5o

ikudrin added inline comments.Jun 11 2022, 7:09 AM
llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
1170–1171

Unfortunately, yes, that would trigger the error. On the other hand, supporting that case would require a deeper analysis of the initialization expression itself, not only the type, because there can be ptrtoint operators. I'm not sure if we need that complication, considering that CUDA 11.1 is almost 2 years old already.

llvm/test/CodeGen/NVPTX/packed-aggr.ll
51

That's true. Currently, the implementation does not expect pointers to be unaligned.

Hi @tra Artem, do you think the said complicated deeper analysis is profitable?

llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
1170–1171

Unfortunately, yes, that would trigger the error. On the other hand, supporting that case would require a deeper analysis of the initialization expression itself, not only the type, because there can be ptrtoint operators. I'm not sure if we need that complication, considering that CUDA 11.1 is almost 2 years old already.

tra added inline comments.Jul 11 2022, 1:11 PM
llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
1170–1171

I think we need to fix this. I'm not convinced that recursively checking fields' 'packed'-ness makes sense here.

I think we only care whether the structure with the field we need to init is packed or not. If it's not packed, then per-field alignment guarantees still apply and that's all we need.
E.g. in the example above, it does not matter whether any of the fields itself are packed or not. The pointer field alignment is only affected by its containing aggregate itself.

Perhaps I'm missing something. If we do need recursive check, could you give me an example where that would make a difference?

ikudrin updated this revision to Diff 444648.Jul 14 2022, 7:14 AM
ikudrin edited the summary of this revision. (Show Details)

Rework the implementation so that the mask() operator is used only if it is indeed required.

tra added a comment.Jul 14 2022, 11:23 AM

Nice. I like this implementation better.

llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
1177

We should probably have a test case triggering this.

Also, we may want to move the check into printInBytes, as otherwise it may be possible to invoke it when mask() is not supported.

1180

requires at least

1234–1235

It may be a bit easier to read this way:

bool isGenericPointer = PTy && PTy->getAddressSpace() == 0;
  if (EmitGeneric  && isGenericPointer && !isa<Function>(v)) {
     // wrap in generic();
 } else {
     ...
 }

Comment above would need to be updated, too.

1264

I'd add a comment that we're generating a per-byte mask operator here with an example illustrating it.
PTX's mask() is weird due to the fact that it actually has no mask in it, so an explanation would be useful here.

llvm/lib/Target/NVPTX/NVPTXAsmPrinter.h
84–87

I'd suggest rephrasing it as "all are aligned", which would closer match our intent here vs "none are non-aligned" we have now which is equivalent, but is a bit harder to read.

std::all_of( ... return pos % ptrSize == 0; )

ikudrin updated this revision to Diff 445021.Jul 15 2022, 9:30 AM
ikudrin marked 4 inline comments as done.

Thanks!

  • Add a comment about the mask() operator
  • Update the error message
  • std::none_of() -> std::all_of()
ikudrin updated this revision to Diff 445024.Jul 15 2022, 9:34 AM
  • 0xff -> 0xFF in the comment
llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
1177

We should probably have a test case triggering this.

The first test in packed-aggr.ll checks the error.

Also, we may want to move the check into printInBytes, as otherwise it may be possible to invoke it when mask() is not supported.

That would require passing GVar and STI there, only to check the requirements and to print the error message. Not sure it's worth it.

1234–1235

Done in D129773

tra accepted this revision.Jul 15 2022, 11:43 AM

LGTM with few nits.

llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
1272

Nit: splitting pos incrementing between the for loop and the body looks a bit odd.

It may be a bit better to increment pos where we actually produce output:
+=1 where we do byte-by-byte output and += ptrSize where we output whole pointer.

llvm/lib/Target/NVPTX/NVPTXAsmPrinter.h
85

Nit: we would use llvm::all_of(symbolPosInBuffer, ...) and make it a bit more concise.

llvm/test/CodeGen/NVPTX/packed-aggr.ll
4

I've missed this test case when I reviewed previous revision of this patch. ERR and RUN kind of blend together.
Maybe change it to ERROR to make it easier to see.
I'd also move it down to where the s1 is defined.

This revision is now accepted and ready to land.Jul 15 2022, 11:43 AM
ikudrin updated this revision to Diff 445105.Jul 15 2022, 12:37 PM
ikudrin marked 3 inline comments as done.

Thanks! I'm going to land that on Monday.

  • Fix nits
tra added a comment.Jul 20 2022, 10:37 AM

Looks like the patch got reverted and reapplied. Could you tell me what went wrong and what was changed to fix it?

Looks like the patch got reverted and reapplied. Could you tell me what went wrong and what was changed to fix it?

The test failed on BE bots. Direct reading u32 and u64 values through a C-style pointer cast made unexpected results, thus I had to replace them with support::endian::read32le() and read64le().