This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Implemnt MSVC __mulh and __umulh builtins and corresponding IR level intrinsics
ClosedPublic

Authored by mstorsjo on Jul 23 2021, 4:01 PM.

Details

Summary

This seems to work, but would it need more testing for anything else
than just the most trivial happy path cases, and any other tests than
these? (I tried to look for some of the existing tests for MSVC ARM64
intrinsics, but they're all very sparingly tested.)

This should fix PR51128.

Diff Detail

Event Timeline

mstorsjo created this revision.Jul 23 2021, 4:01 PM
mstorsjo requested review of this revision.Jul 23 2021, 4:01 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 23 2021, 4:01 PM

Do we need LLVM intrinsics for these? For the x86 equivalents, we just generate mul i128.

rnk added a comment.Jul 23 2021, 7:57 PM

Do we need LLVM intrinsics for these? For the x86 equivalents, we just generate mul i128.

I worry that LLVM will end up generating a call to compiler-rt to implement i128 arithmetic, especially in unoptimized builds, and currently nothing autolinks clang_rt.builtins-${arch}.lib in an MSVC environment. We can do that, I filed an issue for it, it just needs to get done.

I also worry that without the intrinsic, LLVM will more often than not fail to match the pattern here. The issues users filed about x86 rotate instructions come to mind, LLVM failed to produce the desired rot instructions sometimes.

We won't call compiler-rt for i128 multiply on aarch64. It's not worthwhile under any circumstances.

Pattern-matching 64/64->128 multiply in SelectionDAG legalization has been reliable in practice, as far as I know. And even it misses due to weird behavior in instcombine or something like that, we only end up with one or two extra mla instructions, so not a big deal.

rnk added a comment.Jul 26 2021, 12:21 PM

We won't call compiler-rt for i128 multiply on aarch64. It's not worthwhile under any circumstances.

Pattern-matching 64/64->128 multiply in SelectionDAG legalization has been reliable in practice, as far as I know. And even it misses due to weird behavior in instcombine or something like that, we only end up with one or two extra mla instructions, so not a big deal.

Sounds reasonable to use i128 to me then.

mstorsjo updated this revision to Diff 366839.Aug 17 2021, 2:09 AM

Reimplemented the builtins by expanding to the corresponding i128 multiplication in IR, just like the corresponding existing __mulh and __umulh builtins on x86.

rnk accepted this revision.Aug 18 2021, 2:14 PM

lgtm

This revision is now accepted and ready to land.Aug 18 2021, 2:14 PM
This revision was landed with ongoing or failed builds.Aug 19 2021, 1:30 AM
This revision was automatically updated to reflect the committed changes.