This is an archive of the discontinued LLVM Phabricator instance.

[ARM][AArch64] Implement __cls, __clsl and __clsll intrinsics from ACLE
ClosedPublic

Authored by vhscampos on Oct 21 2019, 5:02 AM.

Details

Summary

Writing support for three ACLE functions:

unsigned int __cls(uint32_t x)
unsigned int __clsl(unsigned long x)
unsigned int __clsll(uint64_t x)

CLS stands for "Count number of leading sign bits".

In AArch64, these two intrinsics can be translated into the 'cls'
instruction directly. In AArch32, on the other hand, this functionality
is achieved by implementing it in terms of clz (count number of leading
zeros).

Diff Detail

Event Timeline

vhscampos created this revision.Oct 21 2019, 5:02 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptOct 21 2019, 5:02 AM
compnerd added inline comments.Oct 21 2019, 11:40 AM
clang/lib/Headers/arm_acle.h
150

I don't see a pattern match for the cls64 on ARM32, would that not fail to lower?

155

Should we have a __clsll extension, otherwise these two are the same in LLP64? I'm thinking about the LLP64 environments, where long and long long are different (32-bit vs 64-bit).

vhscampos marked 4 inline comments as done.Oct 22 2019, 2:54 AM
vhscampos added inline comments.
clang/lib/Headers/arm_acle.h
150

Yes. However, for now, I am not enabling support for cls64 on ARM32 as it is not done yet.

155

ACLE does provide a long long version of cls called __clsll. But since the support for cls64 on Arm32 is not done yet, I decided not to write support for __clsll. If I did, it would work for 64-bit but not for 32-bit.

Please let me know what you think.

compnerd added inline comments.Oct 22 2019, 10:18 PM
clang/lib/Headers/arm_acle.h
150

Is the difference not just the parameter type? I think that implementing it should be a trivial change to the existing implementation. Is there a reason that you are not implementing that?

155

clang supports Windows where long is 4-bytes even on 64-bit targets, and this means that this doesn't work for that target. I think that we need to add __clsll so that 64-bit ARM at least is covered.

vhscampos marked 2 inline comments as done.Oct 23 2019, 5:29 AM
vhscampos added inline comments.
clang/lib/Headers/arm_acle.h
150

At clang's side, yes, but not in the backend: Arm32 does not have a cls instruction, thus the CLS operations need to be custom lowered. In the llvm.arm.cls(i32) case, lowering is quite simple, and it's been included in this patch. For llvm.arm.cls64(i64), on the other hand, it is not as trivial since it's necessary to break its logic into 32-bit instructions.

So the reason not to implement that (yet) is just to split work in two different efforts.

155

I'm not sure if I am following you. On AArch64-Windows, __clsl will be lowered to llvm.aarch64.cls(i32) which will then be custom lowered correctly. Let me know if I am thinking this wrong.

compnerd added inline comments.Oct 23 2019, 7:27 PM
clang/lib/Headers/arm_acle.h
150

Would it not be sufficient to do the top half (after a shift right of 32-bits), and if it is exactly 32, then do the bottom 32-bits, otherwise, you're done?

155

Windows is LLP64, __cls and __clsl are the same thing. It needs a __clsll for the 64-bit value.

vhscampos updated this revision to Diff 226430.Oct 25 2019, 8:06 AM

Add support for __clsll.

vhscampos retitled this revision from [ARM][AArch64] Implement __cls and __clsl intrinsics from ACLE to [ARM][AArch64] Implement __cls, __clsl and __clsll intrinsics from ACLE.Oct 25 2019, 8:08 AM
vhscampos edited the summary of this revision. (Show Details)
vhscampos edited the summary of this revision. (Show Details)
vhscampos marked an inline comment as done.Oct 25 2019, 8:13 AM

Added support for __clsll as requested.

clang/lib/Headers/arm_acle.h
150

Sort of. How we interpret the bottom half depends on the value of the top half. I've added this custom lowering in the latest revision.

compnerd accepted this revision.Oct 25 2019, 8:39 AM

Thanks for all the adjustments, this looks good.

This revision is now accepted and ready to land.Oct 25 2019, 8:39 AM
This revision was automatically updated to reflect the committed changes.