This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Fix incorrect LD1 of 16-bit FP vectors in big endian
ClosedPublic

Authored by pbarrio on Jan 9 2018, 7:25 AM.

Details

Summary

Loading a vector of 4 half-precision FP sometimes results in an LD1
of 2 single-precision FP + a reversal. This results in an incorrect
byte swap due to the conversion from little endian to big endian.

In order to generate the correct byte swap, it is easier to
generate the correct LD1 of 4 half-precision FP, thus avoiding the
subsequent reversal.

Diff Detail

Repository
rL LLVM

Event Timeline

pbarrio created this revision.Jan 9 2018, 7:25 AM
craig.topper resigned from this revision.Jan 9 2018, 9:54 AM
SjoerdMeijer added inline comments.
lib/Target/AArch64/AArch64ISelLowering.cpp
736 ↗(On Diff #129083)

How about MVT::v8f16? Does this one needs to get a similar treatment?

lib/Target/AArch64/AArch64InstrInfo.td
5852 ↗(On Diff #129083)

Don't think I understand why this is now a special case. Why is this one different from the other patterns here?

Hi Sjoerd, thanks for the review. I have attached some thoughts on your comments and I will upload a new patch soon.

lib/Target/AArch64/AArch64ISelLowering.cpp
736 ↗(On Diff #129083)

I could do the same for MVT::v8f16, but this would be an optimization rather than a bugfix, since LLVM generates correct code in this case:

ld1     { v0.2d }, [x0]
rev64   v0.8h, v0.8h

which does the correct byte reversal after the load. For the MVT::v4f16 case, LLVM currently generates the following incorrect code:

ld1     { v0.2s }, [x0]
rev64   v0.4h, v0.4h

I will give the optimization a try and upload an updated patch soon.

lib/Target/AArch64/AArch64InstrInfo.td
5852 ↗(On Diff #129083)

Earlier in the file there is an explanation about why we need to insert REVs when we do bitcasts in big endian (~line 5600). At the end, the following comment suggests that identity conversions (e.g. same-size float-to-int conversions) do not need it:

// Most bitconverts require some sort of conversion. The only exceptions are:
//   a) Identity conversions -  vNfX <-> vNiX

It makes sense to me that a type conversion in big endian between vectors with elements of the same size does not need a byte reversal. This reversal should have been done right before, otherwise the original vector (in this case, with type v4f16) would have also been incorrect.

efriedma added inline comments.
lib/Target/AArch64/AArch64InstrInfo.td
5852 ↗(On Diff #129083)

There should be a testcase specifically for this change, then. Maybe something like this, to force the conversion:

%x = add <4 x i16> %i, 1
%y = bitcast <4 x i16> %x to <4 x half>
%z = fpext <4 x half> %y tp <4 x float>

Please put this in a separate patch from the AArch64TargetLowering::addTypeForNEON changes. Also, please move the pattern out of the IsBE predicate.

olista01 added inline comments.Jan 16 2018, 2:03 AM
lib/Target/AArch64/AArch64ISelLowering.cpp
736 ↗(On Diff #129083)

Why is this change needed for correctness of v4f16, but only an optimisation for v8f16? If there's a difference between the treatment of these two types elsewhere in the compiler, would it not be better to handle them consistently?

lib/Target/AArch64/AArch64InstrInfo.td
5849 ↗(On Diff #129083)

I suspect that the problem is actually in these patterns - the pattern on line 5823 (v2i32 -> v4i16) used REV32, but this one (v2i32 -> v4f16) uses REV64. I would assume that these patterns should be the same regardless of whether the lanes are integers or floating-point.

pbarrio updated this revision to Diff 129953.Jan 16 2018, 7:05 AM

I have fixed the bug as olista01 suggested, which is more straightforward than
my previous fix.

There is still the question about why we require reversals in identity
conversions, but I believe that is affecting other conversions apart from the
v4i16->v4f16 ones. This is an optimization and can be handled in another patch,
as efriedma suggested.

olista01 added inline comments.Jan 16 2018, 7:26 AM
lib/Target/AArch64/AArch64InstrInfo.td
5852 ↗(On Diff #129083)

(discussed in person with Pablo, noting here for the record)

We think some of the other patterns here are also wrong, especially the vNf16<->vNi16 ones, which obviously don't need REV instructions. We should fix all of them at once.

pbarrio updated this revision to Diff 130140.Jan 17 2018, 4:49 AM

Fixed big-endian bitconvert patterns and extensive testing for half-float vectors.

olista01 accepted this revision.Jan 17 2018, 5:02 AM

LGTM, thanks.

I've spotted a missed optimisation in the tests, but that should be done as a separate patch.

test/CodeGen/AArch64/arm64-big-endian-bitconverts.ll
70 ↗(On Diff #130140)

It looks like we generate more efficient code for the v4i16 case than the v4f16 case above. Is there a way we could get better code here? I think this is just an optimisation, so makes sense to do it as a separate patch (or raise a ticket if you don't have time to do it now).

This revision is now accepted and ready to land.Jan 17 2018, 5:02 AM
pbarrio added inline comments.Jan 17 2018, 5:14 AM
test/CodeGen/AArch64/arm64-big-endian-bitconverts.ll
70 ↗(On Diff #130140)

Thanks, I will post a patch soon. I can fix it by doing something similar to the first iteration of this patch.

This revision was automatically updated to reflect the committed changes.