This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Emit .actual_access metadata
ClosedPublic

Authored by cfang on Aug 8 2023, 3:58 PM.

Details

Summary

Emit .actual_access metadata for the deduced argument access qualifier,
and .access for kernel_arg_access_qual.

Diff Detail

Event Timeline

cfang created this revision.Aug 8 2023, 3:58 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 8 2023, 3:58 PM
cfang requested review of this revision.Aug 8 2023, 3:58 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 8 2023, 3:58 PM
Herald added a subscriber: wdng. · View Herald Transcript
cfang added a reviewer: Restricted Project.Aug 9 2023, 10:07 AM
arsenm added inline comments.Aug 9 2023, 10:46 AM
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll
26 ↗(On Diff #548385)

This looks like lost test coverage, need to test the access of the metadata

cfang added inline comments.Aug 9 2023, 11:25 AM
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll
26 ↗(On Diff #548385)

Under V3, if the access is default, it won't print out "default", (will not print out the access qualifier at all).

But I am curious because "noalias readonly %in", why the access is not "readonly"? Does it mean the original access qualifier got lost?

arsenm added inline comments.Aug 9 2023, 11:26 AM
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll
26 ↗(On Diff #548385)

the point of this is introspection of the source program, not the actual properties. if someone defines something as readwrite but is only ever read, the optimizer can conclude it's readonly even though it wasn't declared that way

cfang added inline comments.Aug 9 2023, 11:41 AM
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll
26 ↗(On Diff #548385)

Under V3 or above, it is designed NOT to print out the access qualifier if it is "default".
Do you want to CHECK-NOT for :access" for the default?

arsenm added inline comments.Aug 9 2023, 11:44 AM
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll
2 ↗(On Diff #548385)

From the test name, was this supposed to be inferred? Are there non-introspection uses for the access metadata?

26 ↗(On Diff #548385)

-NEXT checks are safer than -NOT

cfang updated this revision to Diff 548730.Aug 9 2023, 12:16 PM

Update the LIT tests with kernel_arg_access_qual specified to "read_only: for the first argument.

cfang added inline comments.Aug 9 2023, 12:21 PM
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll
2 ↗(On Diff #548385)

I am not sure about the actual uses. But apparently the conformance test get_kernel_arg_info checks that.
Do you want to change the name (remove deduce? ) of these two tests?

arsenm added a reviewer: kzhuravl.
arsenm added a reviewer: rampitec.

The first iteration of this test in D32091 suggests this actually does something, so it seems unfortunate if we're limited by introspection tests

rampitec added inline comments.Aug 9 2023, 12:52 PM
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll
26 ↗(On Diff #548385)

Right, this can be deduced and had quite positive performance impact on some kernels.

cfang added inline comments.Aug 9 2023, 3:45 PM
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll
26 ↗(On Diff #548385)

It seems we are using the same AccQual field for two purposes: 1) optimization, 2) get_kernel_arg_info.AccessQualifier, and
the second one is expected from getMetadata("kernel_arg_access_qual")

rampitec added inline comments.Aug 9 2023, 3:53 PM
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll
26 ↗(On Diff #548385)

This is the same thing, one is because developer marked it (and language allowed this marking), another because we were able to prove it from compiler. Net impact is still the same, RT may omit copying memory before every dispatch.

cfang added inline comments.Aug 9 2023, 4:07 PM
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll
26 ↗(On Diff #548385)

We are seeing a conformance test failure with test_api get_kernel_arg_info.
The test expects access qualifier to be access_none for any kernels other than image and pipes.
Do you think the checks from this test are not reasonable?

cfang added inline comments.Aug 9 2023, 4:09 PM
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll
26 ↗(On Diff #548385)

I meant if the language allows us to change, it should not check for the original ones.

rampitec added inline comments.Aug 9 2023, 4:11 PM
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll
26 ↗(On Diff #548385)

Sigh. I do not see why the same qualifier cannot apply to other types of memory. Readonly is readonly, and the optimization is useful. I understand that the standard did not say it is allowed, but conformance did pass for a long time this optimization is there.

yaxunl added inline comments.Aug 9 2023, 9:43 PM
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll
26 ↗(On Diff #548385)

There is a kernel arg metadata “.actual_access” which may be used instead:

“.actual_access” string
The actual memory accesses performed by the kernel on the kernel argument. Only present if “.value_kind” is “global_buffer”, “image”, or “pipe”. This may be more restrictive than indicated by “.access” to reflect what the kernel actual does. If not present then the runtime must assume what is implied by “.access” and “.is_const” . Values are:

“read_only”
“write_only”
“read_write”

https://llvm.org/docs/AMDGPUUsage.html#code-object-v3-metadata

rampitec added inline comments.Aug 10 2023, 12:01 AM
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll
26 ↗(On Diff #548385)

Makes senses as long as agreed with RT. However, I would argue this shall be set in the same patch.

cfang updated this revision to Diff 549132.Aug 10 2023, 12:17 PM
cfang retitled this revision from [AMDGPU] Do not deduce access qualifiers from IR attributes to [AMDGPU] Emit .actual_access metadata.
cfang edited the summary of this revision. (Show Details)

implement .actual_access metadata emission.

arsenm added inline comments.Aug 10 2023, 1:47 PM
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll
28 ↗(On Diff #548385)

Don't see new actual_access?

llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg.ll
2–3

Seems like the mirror write only deduction test is missing

cfang added inline comments.Aug 10 2023, 5:55 PM
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg.ll
2–3

We currently only emit read_only for .actual_access. Do you know what are the exact
conditions to emit "write_only" or "read_write"? Thanks.

if (Arg.getType()->isPointerTy() && Arg.onlyReadsMemory() &&
    Arg.hasNoAliasAttr()) {
  ActAccQual = "read_only";
}
cfang added inline comments.Aug 10 2023, 5:57 PM
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg-v3.ll
28 ↗(On Diff #548385)

This is the test for code object version 2. Do you think I should still update for V2?

arsenm added inline comments.Aug 11 2023, 3:39 PM
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg.ll
2–3

Arg.onlyReadsMemory and Arg.hasAttribute(Attribute::WriteOnly) (don't know if you really need the noalias part here, it's probably a runtime workaround if you do)

cfang updated this revision to Diff 549551.Aug 11 2023, 5:29 PM

add the case of deduced "write_only" emission.

cfang added inline comments.Aug 11 2023, 5:31 PM
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg.ll
2–3

@rampitec: Do you remember exactly why noalias is needed?

rampitec added inline comments.Aug 11 2023, 11:35 PM
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg.ll
2–3

The fact that you do not write through this pointer and pointers derived from it does not tell that you do not write this memory. You can tell you don't if there can be no other pointers to that memory as well.

rampitec added inline comments.Aug 11 2023, 11:44 PM
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg.ll
2–3

Like:

void foo(const int *p1, int *p2) {
  *p2 = p1[1];
}

This is wrong to assume that memory under p1 was not modified because you can call it like this:

foo(p, p);
arsenm added inline comments.Aug 14 2023, 2:16 PM
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg.ll
27

also should test without the noaliases

cfang updated this revision to Diff 550122.Aug 14 2023, 4:19 PM

Add tests without noalias attribute.

Also plan to delete the following test in commit because all cases have been included in the added file.
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduce-ro-arg.ll

arsenm accepted this revision.Aug 14 2023, 4:33 PM
arsenm added inline comments.
llvm/test/CodeGen/AMDGPU/hsa-metadata-deduced-arg-attr.ll
13

Reduce this down to the read_only, best to have -NOT checks as loose as possible if you aren't going to CHECK-NEXT everything

27

Ditto

This revision is now accepted and ready to land.Aug 14 2023, 4:33 PM
This revision was landed with ongoing or failed builds.Aug 23 2023, 12:58 PM
This revision was automatically updated to reflect the committed changes.