This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Add index check before lowerInterleavedStore() uses ShuffleVectorInst's mask
ClosedPublic

Authored by Peter on Aug 24 2022, 10:49 PM.

Details

Summary

This commit fixes https://github.com/llvm/llvm-project/issues/57326.

Currently we would take a Mask out and directly use it by doing auto Mask = SVI->getShuffleMask();
However, if the mask is undef, this Mask is not initialized. It might be a vector of -1 or random integers.
This would cause an Out-of-bound read laterwhen trying to find a StartMask.

This change checks if all indices in the Mask is in the allowed range. If not, we would early abort.

Diff Detail

Event Timeline

Peter created this revision.Aug 24 2022, 10:49 PM
Peter requested review of this revision.Aug 24 2022, 10:49 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 24 2022, 10:49 PM
Peter retitled this revision from [AArch64 SelectionDAG] Add index check before lowerInterleavedStore() uses ShuffleVectorInst's mask to [AArch64] Add index check before lowerInterleavedStore() uses ShuffleVectorInst's mask.Aug 24 2022, 11:06 PM
fhahn added a subscriber: fhahn.Aug 25 2022, 10:43 AM
fhahn added inline comments.
llvm/test/CodeGen/AArch64/aarch64-shuffle-undef.ll
24

can the vectors be shortened here? Does it also crash with a poison mask?

Peter added inline comments.Aug 25 2022, 11:06 AM
llvm/test/CodeGen/AArch64/aarch64-shuffle-undef.ll
24

From my experience, it seems only lengths of 32 will trigger the bug.

I just tested again. undef crashes both Debug and Release build. poison only crashes Debug build. This just adds the mystery for me...

I checked the failed testcases in Transform, it seems they only considered when part of the Mask is undef or poison, not the whole Mask.

Peter added inline comments.Aug 25 2022, 11:16 AM
llvm/test/CodeGen/AArch64/aarch64-shuffle-undef.ll
24

Ok I managed to find a testcase with only vector length of 16.

; ModuleID = 'PoC.ll'
source_filename = "M"

define void @f() {
BB:
  %A2 = alloca <16 x i32>, align 64
  %B = urem <16 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
  %S = shufflevector <16 x i32> %B, <16 x i32> %B, <16 x i32> undef
  store <16 x i32> %S, <16 x i32>* %A2, align 64
  ret void
}

What I don't understand is, if we change this case's integer type from i32 to i16, llc will complaint, even though the instruction seems legit to me

./build-debug/bin/llc: error: ./build-debug/bin/llc: PoC.ll:8:22: error: invalid shufflevector operands
  %S = shufflevector <16 x i16> %B, <16 x i16> %B, <16 x i16> undef

But when switching to <16 x i32>, it crashes...

fhahn added inline comments.Aug 25 2022, 1:37 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13369

This is probably too restrictive and not really matches the comment.

There are a few test failures with the patch, including LLVM.Transforms/InterleavedAccess/AArch64::interleaved-accesses.ll

llvm/test/CodeGen/AArch64/aarch64-shuffle-undef.ll
24

Could be a bit simplified further: https://llvm.godbolt.org/z/663P9z73G

Yeah, the vector of indices needs to have i32 as element type I think.

Peter added inline comments.Aug 25 2022, 1:45 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13369

I didn't know if undef or poison are part of the Mask is legal. That' s why I put a restrict check here.

If they are legal, should we just check the index every time before Mask is indexed? I am afraid that code won't be pretty, but that' s the only thing I can think of.

llvm/test/CodeGen/AArch64/aarch64-shuffle-undef.ll
24

That example is awesome. I though it needs to be a constant for it to work.

fhahn requested changes to this revision.Aug 25 2022, 1:50 PM

Marking this as changes requested as there are some test failures.

This revision now requires changes to proceed.Aug 25 2022, 1:50 PM
Peter updated this revision to Diff 455739.Aug 25 2022, 3:27 PM

Use new checks to sanitize Mask. Adding new tests and fixed test failures.

Previous check abort if anyone index is invalid, which is too strict.
A Mask can have undef/poison values in some of the indeces and we shouldn't abort in that case.
Thus we have failed testcases.

All we need to look out for is when the whole mask is undef, thus -1.
Therefore, we changed the checks from "abort if ANY index is invalid" to "abort if ALL index is invalid"

After discussion with Fhahn, we find that poison can also crash the code like undef, thus a poison testcase is added.

Peter updated this revision to Diff 455743.EditedAug 25 2022, 3:45 PM

Fixed a formatting error

Peter updated this revision to Diff 456015.Aug 26 2022, 1:51 PM

Fixed format issue

fhahn added inline comments.Aug 30 2022, 12:27 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13369

LLVM provides any_of/all_of helpers that take a iterator range. It would be good to use them here.

// If mask is `undef` or `poison`, `Mask` may be a vector of -1s.

There's a constexpr for UndefMaskElem: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/IR/Instructions.h#L1996

It would probably be better to check for that. IIUC this is the only case where an index could be out-of-range; other cases would be already rejected by the verifier I think.

Peter updated this revision to Diff 456719.Aug 30 2022, 10:35 AM

Using llvm::all_of and UndefMaskElem to do the testing

Peter updated this revision to Diff 456721.Aug 30 2022, 10:39 AM

Fixed format issue.

fhahn added inline comments.Aug 31 2022, 7:47 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13369

Don't we want any_of here? Won't a single -1 create an issue in the mask?

Peter added inline comments.Sep 1 2022, 1:23 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13369

No, It seems a few UndefMaskElem won't affect how it works. See test case https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/InterleavedAccess/AArch64/interleaved-accesses-inseltpoison.ll.

Only a whole vector of UndefMaskElem will crash it.

Peter added inline comments.Sep 1 2022, 1:25 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13369

Correction, see test case here: https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/InterleavedAccess/AArch64/interleaved-accesses-inseltpoison.ll#L246

Seems they expected a few elements to be undef or poison when they invented this feature, but never thought if the whole mask is undef.

Peter marked 2 inline comments as not done.Sep 8 2022, 11:52 AM
fhahn added inline comments.Sep 9 2022, 3:13 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13369

Ah right, that makes me think that the patch doesn't fully fix the underlying issue.

E.g. the following below should also crash

define void @foo(<8 x i64> %a, ptr %dst) {
BB:
  %S = shufflevector <8 x i64> %a, <8 x i64> %a, <16 x i32> <i32 0, i32 undef, i32 undef, i32 undef,i32 undef, i32 undef, i32 undef, i32 undef,i32 undef, i32 undef, i32 undef, i32 undef,i32 undef, i32 undef, i32 undef, i32 undef>
  store <16 x i64> %S, ptr %dst, align 64
  ret void
}
fhahn added inline comments.Sep 9 2022, 3:19 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13369

If some lanes are undef, that's fine and there's already some support, the issue is that we compute an invalid index to find the start mask. But if all elements we need the select have undef indices, that's would still be fine.

There are more variations with only some undef/poison lanes that crash.

Peter added inline comments.Sep 12 2022, 11:36 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13369

You are right, the example you proposed is crashing.
Now I am lostdon't think I fully grasp the reason for the crashing.

This fails too: https://godbolt.org/z/f91z97cPY
The index it picks when the the first item in the mask is undef for a part of the split store.

Peter added a comment.Sep 29 2022, 3:06 PM

I spent another morning trying to figure out how to fix this and failed.
I am confused why index is calculated as IdxJ * Factor + IdxI, isn't IdxJ = StoreCount * LaneLen * Factor + j; already multiplied by Factor? Why do it twice?
It seems to me there is no better way than inserting an

if (IdxJ * Factor + IdxI > Mask.size()) { 
  return true;
}

before every access.

If you agree with me I'll upload it.

Peter added a comment.Sep 29 2022, 3:21 PM

Another question is, what is the expected outcome when the start index is undef?
It seems like when NEON is turned off, the whole shuffle will be ignored. Then we can check all start indices and abort if any of them is undef, but I am not sure if that's desired.

I am confused why index is calculated as IdxJ * Factor + IdxI, isn't IdxJ = StoreCount * LaneLen * Factor + j; already multiplied by Factor? Why do it twice?

Yeah - I think that logic is just wrong, so it is going out of range. Mask elements are almost never undef, and the combination of an undef first elements and wider-than-legal vectors is rare enough that no-one has noticed it is incorrect before. It would be good to support it though, if we can get the logic correct.

The Mask[IdxJ * Factor + IdxI] index into Mask should never be out-of-range. It needs to find the first element that isn't undef (-1) in that store-chunk, for elements in that Factor, and use that to find the equivalent StartMask. I think it should just be StoreCount * LaneLen * Factor + j * Factor + i, but I may have the LaneLen and Factor backwards. There is a comment that suggests that at least one of the elements shouldn't be undef, but it appears from your test if it is getting there then that might not be true.

Peter updated this revision to Diff 464107.Sep 29 2022, 5:23 PM

fixed index calculation error.

Peter added a comment.Sep 29 2022, 5:24 PM

I am confused why index is calculated as IdxJ * Factor + IdxI, isn't IdxJ = StoreCount * LaneLen * Factor + j; already multiplied by Factor? Why do it twice?

Yeah - I think that logic is just wrong, so it is going out of range. Mask elements are almost never undef, and the combination of an undef first elements and wider-than-legal vectors is rare enough that no-one has noticed it is incorrect before. It would be good to support it though, if we can get the logic correct.

The Mask[IdxJ * Factor + IdxI] index into Mask should never be out-of-range. It needs to find the first element that isn't undef (-1) in that store-chunk, for elements in that Factor, and use that to find the equivalent StartMask. I think it should just be StoreCount * LaneLen * Factor + j * Factor + i, but I may have the LaneLen and Factor backwards. There is a comment that suggests that at least one of the elements shouldn't be undef, but it appears from your test if it is getting there then that might not be true.

I am more confident to fix it once you feel like it's a bug too. I just submitted latest change.

Thanks - this looks OK to me. Can you add these two extra tests to the new test file, both of which will produce the same output. Then this LGTM.

define void @noundefs(<8 x i32> %a, <8 x i32> %b, ptr %dst) {
BB:
  %S = shufflevector <8 x i32> %a, <8 x i32> %b, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 undef, i32 14, i32 7, i32 15>
  store <16 x i32> %S, ptr %dst, align 64
  ret void
}

define void @undefs(<8 x i32> %a, <8 x i32> %b, ptr %dst) {
BB:
  %S = shufflevector <8 x i32> %a, <8 x i32> %b, <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 3, i32 11, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 7, i32 15>
  store <16 x i32> %S, ptr %dst, align 64
  ret void
}
llvm/test/CodeGen/AArch64/aarch64-shufflevector.ll
1 ↗(On Diff #464107)

aarch64-shufflevector.ll is a bit of a generic name for this test file, considering what it is testing. Maybe change it to be something that mentions interleave stores with undef elements?

Peter updated this revision to Diff 464327.Sep 30 2022, 10:55 AM

add more tests. change file name to indicate interleaved access w under

dmgreen accepted this revision.Oct 3 2022, 12:58 AM

Thanks. LGTM

Peter added a comment.Oct 4 2022, 3:28 PM

@fhahn Hi Florian, when you have time can you review this and approve it when you have time?

I don't have push access to LLVM upstream, I'm hoping if you can push the diff for me or grant me access.

I can commit it, so long as you can provide a "name <email@example.com>" to use for attribution of the git commit.

Peter added a comment.Oct 9 2022, 1:22 PM

If you can do it that's the best.
Please use "Peter Rong, PeterRong96@gmail.com".

Thanks.

fhahn accepted this revision.Oct 10 2022, 3:03 AM

Thanks for the latest changes, LGTM!

This revision is now accepted and ready to land.Oct 10 2022, 3:03 AM
This revision was automatically updated to reflect the committed changes.