This is an archive of the discontinued LLVM Phabricator instance.

[IR] Add nocapture to pointer parameters of masked stores/loads
ClosedPublic

Authored by benmxwl-arm on Oct 11 2022, 3:25 AM.

Details

Summary

The lack of this attribute (particularly on the load intrinsics)
prevented InstCombine from optimizing away allocas and memcpys
for arrays that could be read directly from rodata.

Diff Detail

Event Timeline

benmxwl-arm created this revision.Oct 11 2022, 3:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 11 2022, 3:25 AM
benmxwl-arm requested review of this revision.Oct 11 2022, 3:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 11 2022, 3:25 AM
peterwaller-arm added a comment.EditedOct 11 2022, 3:36 AM

(Full disclosure: Ben is a new colleague of mine)

Welcome to LLVM! Thanks for your first patch. The change looks good to me, but you'll need to include unit tests that defend this behaviour before this can be accepted. This has the dual benefit of making it clear in the patch what this is trying to achieve and prevents someone from breaking the code you're trying to improve in the future. Tests are found in test/Transforms/InstCombine. There are a few files in that directory with the name 'alloca' in them, and you can probably add to one of those. Please take a look.

Adds a new test for attributes on the @llvm.masked.load/store intrinsics, along with updates previous tests to reflect the new changes.

The changes to test/Transforms/InstCombine/load-store-masked-constant-array.ll show the improved codegen.

Herald added a reviewer: dcaballe. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
MattDevereau added inline comments.Oct 14 2022, 7:41 AM
llvm/include/llvm/IR/Intrinsics.td
1802–1803

Trim line to be less than 80 chars. Breaking the 80+ character limit per line isn't enforced in .td files but similar def's near here seem to follow it.

1834–1835

Put NoCapture<ArgIndex<1>>]>; on a new line

llvm/test/Transforms/InstCombine/load-store-masked-constant-array.ll
21–34

Do we not need an equivalent test for expandload and compressstore as we've added nocapture to those intrisics too? For example:

define void @combine_masked_expandload_compressstore_from_constant_array_2(ptr %ptr) {
  %1 = alloca [10 x i64]
  call void @llvm.memcpy.p0.p0.i64(ptr %1, ptr @contant_int_array, i64 80, i1 false)
  %2 = call <10 x i64> @llvm.masked.expandload.v10i64(ptr nonnull %1, <10 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>, <10 x i64> zeroinitializer)
  call void @llvm.masked.compressstore.nxv10i64.p0(<10 x i64> %2, ptr %ptr, <10 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>)
  ret void
}

Can you verify this test does not optimize away alloca and memcpy when nocapture is missing from the intrinsic definitions, and that it does optimize them away when nocapture is present from the intrisic definitions? Can you please also verify the correctness of optimizing alloca and memcpy away for this test?

benmxwl-arm added inline comments.Oct 17 2022, 4:09 AM
llvm/test/Transforms/InstCombine/load-store-masked-constant-array.ll
21–34

I get the same results for that test case (not optimized before, optimized after).

How would you like me to verify the correctness? The resulting code looks correct to me:

define void @combine_masked_expandload_compressstore_from_constant_array_2(ptr %ptr) {
  %1 = call <10 x i64> @llvm.masked.expandload.v10i64(ptr nonnull @contant_int_array, <10 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <10 x i64> zeroinitializer)
  call void @llvm.masked.compressstore.v10i64(<10 x i64> %1, ptr %ptr, <10 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>)
  ret void
}

Assuming adding nocapture to these intrinsics is valid and the existing optimization is correct, I see no reason this change could generate invalid code.

MattDevereau added inline comments.Oct 17 2022, 5:03 AM
llvm/test/Transforms/InstCombine/load-store-masked-constant-array.ll
21–34

If you agree the test is correct then feel free to go ahead and add it.

Addresses some formatting issues in the tablegen, and adds a test case for expandload/compressstore.

benmxwl-arm marked 3 inline comments as done.Oct 17 2022, 5:32 AM

Looks good to me but it's probably wise to get approval from someone with more authority than me. @peterwaller-arm can you have a look please?

peterwaller-arm accepted this revision.Oct 18 2022, 5:00 AM

Looks good to me, with one minor suggestion.

llvm/test/Transforms/InstCombine/load-store-masked-constant-array.ll
2

Opaque pointers is the default, so you can drop this argument.

This revision is now accepted and ready to land.Oct 18 2022, 5:00 AM
This revision was automatically updated to reflect the committed changes.