This is an archive of the discontinued LLVM Phabricator instance.

[clang][AArch64][SVE] Handle PRValue under VLAT <-> VLST cast
ClosedPublic

Authored by junparser on Jun 29 2021, 3:09 AM.

Details

Summary

This change fixes the crash that PRValue cannot be handled by
EmitLValue.

Diff Detail

Event Timeline

junparser created this revision.Jun 29 2021, 3:09 AM
junparser requested review of this revision.Jun 29 2021, 3:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2021, 3:09 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
efriedma added inline comments.Jun 29 2021, 12:01 PM
clang/lib/CodeGen/CGExprScalar.cpp
2103

I don't think it's legal to use EmitLValue here at all; the emitted IR could have side-effects.

Since we have the vector insert/extract intrinsics now, can we just use them here instead of going through the load/store dance?

junparser added inline comments.Jun 29 2021, 7:13 PM
clang/lib/CodeGen/CGExprScalar.cpp
2103

I don't think it's legal to use EmitLValue here at all; the emitted IR could have side-effects.

I agree since we have already visited E.

Since we have the vector insert/extract intrinsics now, can we just use them here instead of going through the load/store dance?

we have already use insert/extract intrinsics for same element type, we can only handle predicate cast through memory.
One of idea here is always use store + load. what do you think?

junparser added inline comments.Jun 29 2021, 7:40 PM
clang/lib/CodeGen/CGExprScalar.cpp
2103

@efriedma I'm also working on a patch that optimize such store + bitcast + load pattern with constant vector, so maybe it is ok to always use alloca + load

efriedma added inline comments.Jun 29 2021, 9:07 PM
clang/lib/CodeGen/CGExprScalar.cpp
2103

I'd be happy to accept just unconditionally doing the alloca+store+load thing for now.

Not sure I understand why predicates are special here. Even if we can't handle predicates directly in insert/extract intrinsics, we can always just zero-extend to a bigger integer type, do the cast, then truncate the result.

junparser added inline comments.Jun 29 2021, 11:28 PM
clang/lib/CodeGen/CGExprScalar.cpp
2103

I'm not sure whether zext+cast+trunc to vxi1 can generate better code. I think it is better to handle this in llvm ir without extend and truncate, only bitcast and insert/extract

address comment.

efriedma added inline comments.Jun 30 2021, 1:17 AM
clang/lib/CodeGen/CGExprScalar.cpp
2103

I don't think we need to call getCallReturnType() here. A call that returns a reference is an lvalue, and the code here expects an rvalue. So CE->getCallReturnType() is going to be the same as E->getType().

junparser added inline comments.Jun 30 2021, 2:13 AM
clang/lib/CodeGen/CGExprScalar.cpp
2103

make sense to me, thanks!

junparser updated this revision to Diff 355484.Jun 30 2021, 2:37 AM

address comment

efriedma accepted this revision.Jun 30 2021, 10:19 AM

LGTM

clang/test/CodeGen/attr-arm-sve-vector-bits-globals.c
108

Oh, hmm, this is the case where we can't optimize.

We could probably teach instcombine to convert this pattern into a load directly from the global, if it matters.

This revision is now accepted and ready to land.Jun 30 2021, 10:19 AM
Matt added a subscriber: Matt.Jun 30 2021, 12:58 PM
junparser added inline comments.Jun 30 2021, 7:40 PM
clang/test/CodeGen/attr-arm-sve-vector-bits-globals.c
108

yep, we can also convert to bitcast + vector.insert when without vscale_range . I'll check the codegen of rvv see whether it has any difference.