Fri, Dec 13
It's good that people are looking at achieving better modeling for the x86 backend, but we need to have a plan that doesn't require heroic effort just to get basic correctness.
Do you mean in the backend? If so, I don't think that's possible. The backends just don't have any sort of feature that could be used to get conservatively correct behavior for cheap the way intrinsics give it to us in the middle end. Once you go into instruction selection things get very low level in a hurry.
I'm looking for simple ways to model the X86 intrinsics, but I haven't found a better one than modeling them one by one.
I would suggest that we need a function/call attribute roughly on the level of readonly / readnone, maybe readfponly, that says that a function has no side-effects and no dependencies on anything *except* the FP state.
Do you mean marking it at the declaration of the intrinsics? Is it reasonable to put the except marking on the dependent intrinsics?
Basic queries like Instruction::mayReadFromMemory() that are supposed to be used generically in code-motion transforms would then return true for calls marked that way only if they're FP-constrained functions.
Middle end or back end? I think in the middle end you may need to change all related passes to use such information to prevent optimization. And in the back end, I think we can simply chain intrinsics marked except with other FP nodes, the way common code does.
The bug with __builtin_isless should be a really easy fix; the builtin just needs to be flagged as having custom type-checking, and then we need to make sure we do appropriate promotions on the arguments (but we probably do).
Thu, Dec 12
Wed, Dec 11
I was looking into whether we could store this in APFloat, and I just noticed that we mark pseudo-NaNs, pseudo-infinities, and unnormals all as fcNaN and copy their original significand from the APInt. I believe that since the integer bit was 0, this makes APFloat encode them as a pseudo-NaN. Then I think arithmetic operations with one of these as an operand will propagate the pseudo-NaN. Should we have forced the integer bit of the significand when we created it, to make it a true NaN?
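To make the encoding question concrete, here is a minimal sketch of how the x87 80-bit extended format separates these classes by the exponent field and the explicit integer bit (bit 63 of the significand). The names `X87Class`, `classify_x87`, and `force_integer_bit` are hypothetical and are not part of APFloat; the sketch only illustrates the bit layout, not how APFloat stores values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of an x87 80-bit extended value.
enum class X87Class {
    Zero, Denormal, PseudoDenormal, Normal, Unnormal,
    Infinity, NaN, PseudoInfinity, PseudoNaN
};

// sign_exp: sign bit (bit 15) plus 15-bit biased exponent.
// significand: 64 bits; bit 63 is the explicit integer bit.
X87Class classify_x87(uint16_t sign_exp, uint64_t significand) {
    const uint16_t exponent = sign_exp & 0x7fff;
    const bool integer_bit = ((significand >> 63) & 1) != 0;
    const uint64_t fraction = significand & 0x7fffffffffffffffULL;

    if (exponent == 0) {
        if (integer_bit) return X87Class::PseudoDenormal;
        return fraction == 0 ? X87Class::Zero : X87Class::Denormal;
    }
    if (exponent == 0x7fff) {
        if (integer_bit)
            return fraction == 0 ? X87Class::Infinity : X87Class::NaN;
        // Integer bit clear with an all-ones exponent: unsupported encodings.
        return fraction == 0 ? X87Class::PseudoInfinity : X87Class::PseudoNaN;
    }
    return integer_bit ? X87Class::Normal : X87Class::Unnormal;
}

// Sets the integer bit, leaving the fraction bits untouched.
uint64_t force_integer_bit(uint64_t significand) {
    return significand | (1ULL << 63);
}
```

Under this layout, forcing the integer bit turns a pseudo-NaN into a regular NaN encoding, but a pseudo-infinity (zero fraction) would become an infinity encoding, so a fraction bit would presumably also need to be set in that case.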
Is it possible to add a test for this?
In my experience, extra iterations are often caused by bad worklist management when updating nodes. For example, https://reviews.llvm.org/D50990 was supposed to fix one case of that, but I never got around to finding the original test where I observed the issue. If that patch still applies, does it have any effect on any of the extra iterations in the table?
Tue, Dec 10
Seems like we should maybe teach InstCombiner::FoldOpIntoSelect to handle cases where one of the select operands becomes a constant if we fold. But we might only call FoldOpIntoSelect when one of the operands of the binop is a constant today.
Seems to be what gcc really implements https://godbolt.org/z/ZBooz9
Isn't it sufficient to just check that the true/false values of the select are x and -x? The condition itself doesn't matter: x / (select c, x, -x) -> select c, 1, -1
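A small sketch of that fold in C++ terms, assuming x != 0 and x != INT_MIN so that both the negation and the division are well defined. The function names are hypothetical and only serve to compare the unfolded and folded forms.

```cpp
#include <cassert>
#include <climits>
#include <initializer_list>

// Unfolded form: the divisor is either x or -x, chosen by c.
// Assumed precondition: x != 0 and x != INT_MIN.
int div_unfolded(bool c, int x) {
    assert(x != 0 && x != INT_MIN);
    return x / (c ? x : -x);
}

// Folded form: the quotient depends only on the condition.
int div_folded(bool c) {
    return c ? 1 : -1;
}

// Hypothetical helper: checks that both forms agree for every
// nonzero x in [-limit, limit] and for both values of c.
bool forms_agree(int limit) {
    for (int x = -limit; x <= limit; ++x) {
        if (x == 0) continue;
        for (bool c : {false, true}) {
            if (div_unfolded(c, x) != div_folded(c)) return false;
        }
    }
    return true;
}
```

The result never depends on how c was computed, only on which of x and -x ends up as the divisor.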
Isn't abs(INT_MIN) undefined?
expensive div is removed:
_Z4fooai:                               # @_Z4fooai
        mov     eax, edi
        mov     ecx, edi
        neg     ecx
        cmovl   ecx, edi
        cdq
        idiv    ecx
        ret
_Z5fooari:                              # @_Z5fooari
        mov     eax, edi
        sar     eax, 31
        or      eax, 1
        ret
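Read back into C++, the two instruction sequences above appear to compute x / abs(x) and (x >> 31) | 1 respectively. The sketch below is a reconstruction under that reading, not the original source of the test functions. The names are hypothetical, and the shift form assumes a 32-bit int with an arithmetic right shift (guaranteed since C++20, implementation-defined but near-universal before that).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Division form, matching the neg/cmovl/cdq/idiv sequence.
// Assumed precondition: x != 0 and x != INT32_MIN, since
// abs(INT32_MIN) is not representable.
int32_t sign_via_div(int32_t x) {
    assert(x != 0 && x != INT32_MIN);
    return x / std::abs(x);
}

// Shift form, matching the sar/or sequence: x >> 31 is 0 for
// non-negative x and -1 for negative x; OR-ing in 1 gives 1 or -1.
int32_t sign_via_shift(int32_t x) {
    return (x >> 31) | 1;
}
```

The two forms agree wherever the division form is defined. The shift form additionally yields 1 for x == 0 and -1 for x == INT32_MIN, which is acceptable only because the division form is undefined for those inputs.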
GCC knows this trick too.
I don't really like the code in SelectionDAGBuilder, but I don't have a better solution either. So LGTM.
Rebase on latest D69275
Mon, Dec 9
I've also been wondering if we should drop rounding from ceil, floor, trunc, and round. Looks like lround/llround don't have rounding.
Forgot to click Accept, I guess. Please address those two comments before committing.
LGTM with those two comments addressed.
Sun, Dec 8
Shouldn’t we instead stop the sext and zext attributes from being added by the frontend?
Fri, Dec 6
I contacted our documentation people yesterday to point out this difference between Intel and AMD documentation. They have agreed to fix this in the next release of the SDM.
Fix bug in X86 code
Restore @kpn's last diff.
I'm still working on this ticket daily! I'm trying to merge the two vector unrolling functions like Ulrich suggested. But I ran into problems that lead me to think we may have a serious issue lurking that we'll need to fix. That's what I've been working on: trying to understand the issue and see if it needs further investigation.
If you are in a hurry then you could have sent me an email and I would have uploaded the diffs I've got without further investigation.
I'm leaving attached the comments on my work that I've been adding but haven't submitted until now.
Change the signature of SelectionDAGLegalize::ExpandLegalINT_TO_FP to allow it to update the Results vector directly.
Upload with context
-Improve some of the X86 code.
-Add Promote support. Use it for i8/i16 on X86.
-Remove changes to UnrollVectorOp which seemed to be unexercised
-Some cleanup in ExpandLegalINT_TO_FP
-Drop the changes to getNode. We can't fold NOP conversions here and asserts were recently added in another patch.
Commandeering to update the X86 code and some other fixes/cleanup.
Thu, Dec 5
Seems reasonable. LGTM
I also found this, where NASM indicated they wouldn't support it: https://sourceforge.net/p/nasm/bugs/324/
Do you have examples of other tools that accept this? I checked the GNU assembler and it didn't accept r8l.
For X86, I think the one opcode or two is a wash. So I think this is fine.
Wed, Dec 4
Change the X86 code to the same behavior as well. Regenerate the test changes, which no longer applied cleanly.
I'm going to commandeer this patch and update the equivalent code in X86. I'll post a new version later this afternoon.
I wanted this to be the full change from the other patch, including moving all of this to X86BaseInfo.h.
Can you please put the macro fusion changes in a separate Phabricator review? I’ll review it in the morning US time, and if it all looks good we can get that part committed while the other comments are being addressed.
Tue, Dec 3
-Bypass the node in STRICT_FP_EXTEND if the expanded VT matches the input VT.
-Add a bunch of asserts to getNode to ensure we don't create bad STRICT_FP_EXTEND/STRICT_FP_ROUND.