- User Since
- Jan 6 2015, 6:21 AM (297 w, 3 d)
This is certainly not the expected behaviour so I'll get it fixed.
Here's the most obvious choice -- glueing the new check to the AArch64ISD::EXT generation. That's good because the intention is obvious, and future code changes won't get in the way. But it could also be argued that the NEON-sized vector check should precede this entire if-else statement. I.e. the EXTRACT_SUBVECTORs have the same problem as AArch64ISD::EXT. None of this code is ready for SVE-sized fixed vectors.
Wed, Sep 16
Mon, Sep 14
Remove the BUILD_VECTOR->EXTRACT_SUBVECTOR transform so that this is easier to review. Will post a separate Diff for that.
Tue, Sep 8
Turns out that ReconstructShuffle(...) needs a lot of work to be updated. The EXTRACT_SUBVECTOR index assumes we're extracting a 1/2 width vector. I missed that.
Thu, Sep 3
Wed, Sep 2
Update tests -- don't use pointer args for NEON sized vectors.
Fix bad copy-and-paste in CHECK lines. No significant changes made.
Updating Diff for recent upstream changes.
Tue, Sep 1
Updated with @david-arm's comments for Thursday's Sync-up call.
Thanks, David. Ok, I'll put a pin in this until Thursday's call. But yeah, isKnownMultipleOf(...) sounds fair. I'll prepare a patch to see if any surprises come up.
Mon, Aug 31
I don't really have an opinion on any of this. Just thinking aloud...
Ok, I guess that makes sense. So if we're considering Min elements, should we just overload % then?
Wed, Aug 26
˜Bah, egg on my face. You're right that D86394 fixes the immediate issue. Sorry for the noise.
Mon, Aug 24
Updating to exhibit the problem mentioned in D85546. One way to see the issue is:
There is a subtle wrong answers bug with this transform, but it's something that's under development separately. That is, FMUL(X,-1.0) != FNEG(X) when FTZ is enabled. We have a flag for checking the denormal mode, namely DenormalMode. We can address it later though. @arsenm
Fri, Aug 21
Eh, thinking some more, it's still a little weird:
Ah, you're right. I misread the register classes.
These patterns might need attention. ISD::SIGN_EXTEND_INREG expects both the input and output registers to have the same type, extending the small values in place. I.e. the input is unpacked.
Thu, Aug 20
Abandoning this Diff since most of it was covered in D84056. Will prepare a new patch to remove the problematic FSUB DAGCombine soon.
Oh, but to be fair, I didn't use DAG.SplitVector(Op.getOperand(0), DL);. So that may avoid some of the ugly expanding.
Wed, Aug 19
We also need a peep for the smaller vectors, where the extend can fit in one register.
Aug 19 2020
Updated DIV lowering to make use of correct fixed length predicates.
Aug 18 2020
Ok, I see it now. We have to explicitly call LowerToPredicatedOp(...) with the fixed types still intact, so that getPredicateForVector(...) will generate the correct predicate. Will update...
Aug 17 2020
Aug 14 2020
Remove unneeded braces.
Remove UDIV setOperationAction.
Aug 13 2020
Updated patch based on Paul's comments.
Actually, I made a mistake. New patch coming soon...
Updated based on Paul's review. Inline comments to come shortly...
Aug 12 2020
Aug 11 2020
Update based on @paulwalker-arm's comments.
Yeah, separate patch is okay. A SQRT+DIV is definitely bad.
I'm fairly sure this transform is a performance loss. For a target like Skylake Server, a SQRT(x) can take up to 20 cycles. But a RSQRT(x) is about 6 cycles and a MUL(y) is 4 cycles. We'd be better off with a X*RSQRT(X).
Aug 10 2020
MIN/MAX and DIVs are up my alley. I'll try those. Will check out VREDUCE if I get through the others.
Aug 7 2020
Aug 6 2020
Ah, okay. So it's not safe to assume that the truncate to i1 is a mask. Makes sense.
Aug 5 2020
For this IR test and -aarch64-sve-vector-bits-min=512:
Aug 4 2020
Ah, ok. Tests removed with: