This is an archive of the discontinued LLVM Phabricator instance.

[ARM][AArch64] Introduce qrdmlah and qrdmlsh intrinsics
ClosedPublic

Authored by dmgreen on Jan 18 2022, 11:29 AM.

Details

Summary

Since it's introduction, the qrdmlah has been represented as a qrdmulh and a sadd_sat. This doesn't produce the same result for all input values though. This patch fixes that by introducing a qrdmlah (and qrdmlsh) intrinsic specifically for the vqrdmlah and sqrdmlah instructions. The old test cases will now produce a qrdmulh and sqadd, as expected.

Fixes #53120 and #50905 and #51761.

Diff Detail

Event Timeline

dmgreen created this revision.Jan 18 2022, 11:29 AM
dmgreen requested review of this revision.Jan 18 2022, 11:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 18 2022, 11:29 AM

Thanks for working on this!

qrdmilh -> qrdmulh in the commit message.

I'm out of my depth math wise here so can't approve overall but perhaps you can clarify the aim here.

Since it's introduction, the qrdmlah has been represented as a qrdmilh and a sadd_sat. This doesn't produce the same result for all input values though. This patch fixes that by introducing a qrdmlah (and qrdmlsh) intrinsic specifically for the vqrdmlah and sqrdmlah instructions. The old test cases will now produce a qrdmulh and sqadd, as expected.

  • We had qrdmlah which output qrdmulh then sadd_sat. This didn't behave as expected for one set of inputs.
  • You've introduced an intrinsic for qrdmlah. So now it'll output the one vqrdmlah instruction. This behaves as expected for all inputs.
  • The existing test cases will now (have always?) output qrdmulh and sqadd.

That last bit I don't understand. Wouldn't those test cases output the new single instruction? I think I am misunderstanding which cases these are and what they test.

llvm/test/CodeGen/ARM/neon-v8.1a.ll
31–32

Should this refer to vqadd etc instead?

dmgreen updated this revision to Diff 401151.Jan 19 2022, 2:51 AM
dmgreen edited the summary of this revision. (Show Details)

Thanks for working on this!

qrdmilh -> qrdmulh in the commit message.

I'm out of my depth math wise here so can't approve overall but perhaps you can clarify the aim here.

Since it's introduction, the qrdmlah has been represented as a qrdmilh and a sadd_sat. This doesn't produce the same result for all input values though. This patch fixes that by introducing a qrdmlah (and qrdmlsh) intrinsic specifically for the vqrdmlah and sqrdmlah instructions. The old test cases will now produce a qrdmulh and sqadd, as expected.

  • We had qrdmlah which output qrdmulh then sadd_sat. This didn't behave as expected for one set of inputs.
  • You've introduced an intrinsic for qrdmlah. So now it'll output the one vqrdmlah instruction. This behaves as expected for all inputs.
  • The existing test cases will now (have always?) output qrdmulh and sqadd.

That last bit I don't understand. Wouldn't those test cases output the new single instruction? I think I am misunderstanding which cases these are and what they test.

Yeah - The vqrdmlah ACLE intrinsic will produce a single vqrdmlah (or sqrdmlah). That should all be fine. The test cases with sadd.sat + arm.neon.vqrdmulh will now produce two instructions (as you would expect).

The test cases with sadd.sat + arm.neon.vqrdmulh will now produce two instructions (as you would expect).

Where before they would have been merged into one instruction, got it.

dmgreen edited the summary of this revision. (Show Details)Jan 19 2022, 5:45 AM
samtebbs added inline comments.Jan 27 2022, 7:16 AM
llvm/test/CodeGen/ARM/neon-v8.1a.ll
33

What is it about these tests that makes the vqrdmlsh invalid for the inputs? (referring to the commit message line that says the existing test cases produce the two instructions, which are more correct than the single vqrdmlsh)

dmgreen added inline comments.Jan 27 2022, 7:48 AM
llvm/test/CodeGen/ARM/neon-v8.1a.ll
33

It produces different output for some of the inputs. https://github.com/llvm/llvm-project/issues/53120 has some details and an example that is probably better than I can put here. The simplified version is that for the same input the two bits of assembly (vqrdmulh;vqadd vs vqrdmlah) do not produce the same output, which makes it invalid as a compiler transform.

Ah I see, makes sense. LGTM

This revision was not accepted when it landed; it landed in state Needs Review.Jan 27 2022, 11:19 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJan 27 2022, 11:19 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript