This changes the scalar non-intrinsic non-avx roundss/sd instruction definitions not to read their destination register - allowing partial dependency breaking.
This fixes PR31143.
Paths
| Differential D27323
[X86] Fix non-intrinsic roundss/roundsd to not read the destination register ClosedPublic Authored by mkuper on Dec 1 2016, 3:07 PM.
Details Summary This changes the scalar non-intrinsic non-avx roundss/sd instruction definitions not to read their destination register - allowing partial dependency breaking. This fixes PR31143.
Diff Detail Event TimelineComment Actions Should these rows in X86InstrInfo.cpp be moved to MemoryFoldTable1 from MemoryFoldTable2? { X86::ROUNDSDr, X86::ROUNDSDm, 0 }, { X86::ROUNDSSr, X86::ROUNDSSm, 0 }, And should ROUNDSSr_Int and ROUNDSDr_Int be added to MemoryFoldTable2? And related to that should RSQRTSSr_Int, RCPSSr_Int, SQRTSDr_Int, and SQRTSSr_Int actually be in MemoryFoldTable2 instead of MemoryFoldTable1? Comment Actions Nevermind the question about RSQRTSSr_Int, RCPSSr_Int, SQRTSDr_Int, and SQRTSSr_Int. Didn't realize they don't take two arguments. Comment Actions And ROUNDSSr_Int is already in the folding table I dont' know how i missed that earlier. So only my question about moving ROUNDSSr and ROUNDSDr to the operand 1 table stands. Comment Actions Yes, thanks a lot for catching this! The test change I made for folding was wrong, I didn't notice those are minsize test, where we *should* be folding, the only reason we stopped folding was because it was in the wrong table. This revision is now accepted and ready to land.Dec 3 2016, 11:05 AM Comment Actions As Michael have already noticed, I also have a solution for this false dependency bug (https://reviews.llvm.org/D27391). Thanks, Comment Actions
To reiterate what I wrote on the other patch - the only reason we end up with reading an undef value for the ROUNDSS/Drr is because we have a source operand tied with the destination, and then all patterns that match these instructions shove an IMPLICIT_DEF into that operand - making it, in practice, a dummy operand. This is in contrast to the other SSE instructions that have a false dependence on the high lanes, that simply don't have this operand. I think we should be modeling all these instructions in the same way. Regarding D27391 catching more cases - Marina, do you have anything specific in mind? Which instructions would that apply to? Comment Actions
Sure, will do. mkuper edited edge metadata. Comment ActionsUpdated with SD testcase. Craig, does your LGTM still hold? Comment Actions
You are right, ROUND shouldn't be modeled differently than the other cases, so your change should go in. Comment Actions
Ok, great, then I'll commit this.
This looks extremely weird. Closed by commit rL288703: [X86] Fix non-intrinsic roundss/roundsd to not read the destination register (authored by mkuper). · Explain WhyDec 5 2016, 1:07 PM This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 79989 lib/Target/X86/X86InstrSSE.td
test/CodeGen/X86/pr31143.ll
test/CodeGen/X86/stack-folding-fp-sse42.ll
|