This is an archive of the discontinued LLVM Phabricator instance.

[X86] Fix non-intrinsic roundss/roundsd to not read the destination register
ClosedPublic

Authored by mkuper on Dec 1 2016, 3:07 PM.

Details

Summary

This changes the scalar non-intrinsic non-avx roundss/sd instruction definitions not to read their destination register - allowing partial dependency breaking.

This fixes PR31143.

Diff Detail

Repository
rL LLVM

Event Timeline

mkuper updated this revision to Diff 79989.Dec 1 2016, 3:07 PM
mkuper retitled this revision from to [X86] Fix non-intrinsic roundss/roundsd to not read the destination register.
mkuper updated this object.
mkuper added reviewers: craig.topper, zvi.
mkuper added a subscriber: llvm-commits.
craig.topper edited edge metadata.Dec 1 2016, 5:04 PM

Should these rows in X86InstrInfo.cpp be moved to MemoryFoldTable1 from MemoryFoldTable2?

{ X86::ROUNDSDr,        X86::ROUNDSDm,      0 },
{ X86::ROUNDSSr,        X86::ROUNDSSm,      0 },

And should ROUNDSSr_Int and ROUNDSDr_Int be added to MemoryFoldTable2? And related to that should RSQRTSSr_Int, RCPSSr_Int, SQRTSDr_Int, and SQRTSSr_Int actually be in MemoryFoldTable2 instead of MemoryFoldTable1?

Nevermind the question about RSQRTSSr_Int, RCPSSr_Int, SQRTSDr_Int, and SQRTSSr_Int. Didn't realize they don't take two arguments.

And ROUNDSSr_Int is already in the folding table I dont' know how i missed that earlier.

So only my question about moving ROUNDSSr and ROUNDSDr to the operand 1 table stands.

Yes, thanks a lot for catching this!

The test change I made for folding was wrong, I didn't notice those are minsize test, where we *should* be folding, the only reason we stopped folding was because it was in the wrong table.

mkuper updated this revision to Diff 80105.Dec 2 2016, 11:04 AM
mkuper edited edge metadata.

Fixed folding to use the right operand.

craig.topper accepted this revision.Dec 3 2016, 11:05 AM
craig.topper edited edge metadata.

LGTM

This revision is now accepted and ready to land.Dec 3 2016, 11:05 AM

As Michael have already noticed, I also have a solution for this false dependency bug (https://reviews.llvm.org/D27391).
What happens is that some instructions (like the ROUNDSSr) read an undef value from one of it's source operands.
Part of the logic that searches for false dependencies decides if the dependency can be broken only if the instruction doesn't read the operand.
I think that a read of an undef value should not be considered as a real read, and this is the fix in my patch. I believe this approach will catch more cases.

Thanks,
Marina

Worth adding an equivalent roundsd test to pr31143.ll?

As Michael have already noticed, I also have a solution for this false dependency bug (https://reviews.llvm.org/D27391).
What happens is that some instructions (like the ROUNDSSr) read an undef value from one of it's source operands.
Part of the logic that searches for false dependencies decides if the dependency can be broken only if the instruction doesn't read the operand.
I think that a read of an undef value should not be considered as a real read, and this is the fix in my patch. I believe this approach will catch more cases.

Thanks,
Marina

To reiterate what I wrote on the other patch - the only reason we end up with reading an undef value for the ROUNDSS/Drr is because we have a source operand tied with the destination, and then all patterns that match these instructions shove an IMPLICIT_DEF into that operand - making it, in practice, a dummy operand. This is in contrast to the other SSE instructions that have a false dependence on the high lanes, that simply don't have this operand.

I think we should be modeling all these instructions in the same way.
One option is to sync ROUNDSS/Drr with the rest, by removing the operand, like this patch does.
The other is to add the operand to all relevant instructions (making them binary instead of unary), and put something like D27391 in place.
Or am I missing a reason why ROUND should be modeled differently from the rest?

Regarding D27391 catching more cases - Marina, do you have anything specific in mind? Which instructions would that apply to?

Worth adding an equivalent roundsd test to pr31143.ll?

Sure, will do.

mkuper updated this revision to Diff 80298.Dec 5 2016, 11:11 AM
mkuper edited edge metadata.

Updated with SD testcase.

Craig, does your LGTM still hold?

As Michael have already noticed, I also have a solution for this false dependency bug (https://reviews.llvm.org/D27391).
What happens is that some instructions (like the ROUNDSSr) read an undef value from one of it's source operands.
Part of the logic that searches for false dependencies decides if the dependency can be broken only if the instruction doesn't read the operand.
I think that a read of an undef value should not be considered as a real read, and this is the fix in my patch. I believe this approach will catch more cases.

Thanks,
Marina

To reiterate what I wrote on the other patch - the only reason we end up with reading an undef value for the ROUNDSS/Drr is because we have a source operand tied with the destination, and then all patterns that match these instructions shove an IMPLICIT_DEF into that operand - making it, in practice, a dummy operand. This is in contrast to the other SSE instructions that have a false dependence on the high lanes, that simply don't have this operand.

I think we should be modeling all these instructions in the same way.
One option is to sync ROUNDSS/Drr with the rest, by removing the operand, like this patch does.
The other is to add the operand to all relevant instructions (making them binary instead of unary), and put something like D27391 in place.
Or am I missing a reason why ROUND should be modeled differently from the rest?

Regarding D27391 catching more cases - Marina, do you have anything specific in mind? Which instructions would that apply to?

You are right, ROUND shouldn't be modeled differently than the other cases, so your change should go in.
I found other instructions and intrinsics that look for uses and find uses of undef values:
%XMM0<def,tied1> = RCPSSm_Int %XMM0<undef,tied0>, %RDI<kill>, 1, %noreg, 0, %noreg
So there is a deeper problem here than just ROUND

As Michael have already noticed, I also have a solution for this false dependency bug (https://reviews.llvm.org/D27391).
What happens is that some instructions (like the ROUNDSSr) read an undef value from one of it's source operands.
Part of the logic that searches for false dependencies decides if the dependency can be broken only if the instruction doesn't read the operand.
I think that a read of an undef value should not be considered as a real read, and this is the fix in my patch. I believe this approach will catch more cases.

Thanks,
Marina

To reiterate what I wrote on the other patch - the only reason we end up with reading an undef value for the ROUNDSS/Drr is because we have a source operand tied with the destination, and then all patterns that match these instructions shove an IMPLICIT_DEF into that operand - making it, in practice, a dummy operand. This is in contrast to the other SSE instructions that have a false dependence on the high lanes, that simply don't have this operand.

I think we should be modeling all these instructions in the same way.
One option is to sync ROUNDSS/Drr with the rest, by removing the operand, like this patch does.
The other is to add the operand to all relevant instructions (making them binary instead of unary), and put something like D27391 in place.
Or am I missing a reason why ROUND should be modeled differently from the rest?

Regarding D27391 catching more cases - Marina, do you have anything specific in mind? Which instructions would that apply to?

You are right, ROUND shouldn't be modeled differently than the other cases, so your change should go in.

Ok, great, then I'll commit this.

I found other instructions and intrinsics that look for uses and find uses of undef values:
%XMM0<def,tied1> = RCPSSm_Int %XMM0<undef,tied0>, %RDI<kill>, 1, %noreg, 0, %noreg
So there is a deeper problem here than just ROUND

This looks extremely weird.
(And now I understand what Craig's email was about! :-) )

This revision was automatically updated to reflect the committed changes.