- User Since: Dec 3 2018, 7:49 AM (132 w, 1 d)
May 22 2020
Rebasing the patch (changed code in TargetTransformInfoImpl.h, around line 864).
May 21 2020
Rebasing the patch: there's now one more call site for getCastInstrCost.
May 20 2020
- Allow VCMPs before the VCTP
- Add test case as requested
- Revert the null-pointer check I added in the previous patch & refactor the condition as suggested
May 19 2020
Re-adding the instruction arguments to the calls to TTI::getCastInstrCost in the LoopVectorizer.
- Addressed comments (see items marked as "done")
- Changed the "reportVectorizationFailure" calls in "prepareToFoldTailByMasking" into simple debug prints.
- Added note explaining how instructions that write to VPR.P0 work
- Removed the logic that checks VPTs: they're acceptable whenever a VCMP is
- Added a test case with a VPT before the VCTP
- It was causing a crash, which I fixed by adding a null-pointer check at line 404
- Added new debug message at line
- Changing the implementation of the patch following discussion
- Removed the ReportFailure argument of prepareToFoldTailByMasking. I don't think it's useful anymore, but feedback is welcome. (The only thing that annoys me is that we now print "loop not vectorized" even when we'll fall back to a scalar epilogue.)
- Added a test that makes use of the attribute that enables tail-folding
- Simplified tests
May 18 2020
May 15 2020
May 14 2020
- Removing the instruction arguments from the calls to getCastInstrCost in the LoopVectorizer.
May 13 2020
- Cleaning up vectorizer tests (removed all extra attributes)
- Refactored and improved the cast.ll test
- Removed isLoadOrMaskedLoad; now using the CCH (CastContextHint) instead.
May 12 2020
- Simplified the vctp-chains.ll test: removed useless attributes
- Added codegen test to test the whole pipeline (based on the vctpi32 test from vctp-chains.ll)
May 11 2020
- Minor refactoring of the patch
- The pass is no longer limited to VCMPs for VCCRValue, it can now use any instruction that writes to VPR (e.g. VMSR)
- The pass no longer replaces VPNOT with copies - it just removes the VPNOT and replaces all of its uses.
- Other minor fixes
- Moved CastContextHint to TargetTransformInfo
- Moved the logic that calculates the CastContextHint from an Instruction* out of getCastInstrCost (in TargetTransformInfo.cpp) and into a static function in TargetTransformInfo named getCastContextHint.
- CastContextHint is no longer an optional parameter. Callers have to choose between passing None for the context, passing a custom context, or calling TTI::getCastContextHint.
- Removed my change in BasicTTIImpl.h. (Should I restore it? It seemed to have no effect on tests.)
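A rough sketch of the idea behind getCastContextHint, written as a standalone C++ model rather than the LLVM code. It assumes that an extend takes its hint from the instruction producing its operand and a truncate from its single user; the Inst, Opcode and CastContextHint types are hypothetical simplifications (LLVM's real hint set is larger).

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins for IR instructions (not LLVM classes).
enum class Opcode { Load, MaskedLoad, Store, MaskedStore, ZExt, SExt, Trunc, Add };

// Assumed subset of the context hints, for illustration only.
enum class CastContextHint { None, Normal, Masked };

struct Inst {
  Opcode Op;
  std::vector<const Inst *> Operands;
  std::vector<const Inst *> Users;
};

// Derive a hint from a cast's surroundings: extends look at the producer of
// operand 0, truncates look at their single user; everything else is None.
CastContextHint getCastContextHint(const Inst *I) {
  if (!I)
    return CastContextHint::None;

  // Map a (possibly masked) memory operation to the matching hint.
  auto FromMemOp = [](const Inst *MemOp, Opcode Plain, Opcode Masked) {
    if (MemOp->Op == Plain)
      return CastContextHint::Normal;
    if (MemOp->Op == Masked)
      return CastContextHint::Masked;
    return CastContextHint::None;
  };

  switch (I->Op) {
  case Opcode::ZExt:
  case Opcode::SExt:
    if (!I->Operands.empty() && I->Operands[0])
      return FromMemOp(I->Operands[0], Opcode::Load, Opcode::MaskedLoad);
    return CastContextHint::None;
  case Opcode::Trunc:
    if (I->Users.size() == 1 && I->Users[0])
      return FromMemOp(I->Users[0], Opcode::Store, Opcode::MaskedStore);
    return CastContextHint::None;
  default:
    return CastContextHint::None;
  }
}
```

With the parameter no longer optional, a caller picks CastContextHint::None, a hint of its own, or the result of a function like this one, and passes it explicitly.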
Updating the patch following the changes to D79162
May 7 2020
Changing the implementation of the patch.
- TTI hook renamed to isProfitableLSRChainElement
- It now returns true for the VCTP
- Removed FilterOutUndesirableUses
- isProfitableLSRChainElement is now called in LSR's isProfitableChain function. If it returns true for any of the chain's UserInst, the chain is considered profitable and will not be optimized by LSR.
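The new early-out in isProfitableChain can be modeled in a few lines. This is a hypothetical standalone sketch, not LSR's code: UserInst is reduced to a flag marking a VCTP, the hook simply returns that flag, and the rest of LSR's profitability heuristics are stubbed out as a callback.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a chain element's user instruction.
struct UserInst {
  bool IsVCTP; // models an MVE VCTP
};

// Models the target hook: only VCTPs count as profitable chain elements.
bool isProfitableLSRChainElement(const UserInst &I) { return I.IsVCTP; }

// If the hook returns true for any element's user, report the chain as
// profitable right away; otherwise defer to the (stubbed) remaining checks.
bool isProfitableChain(const std::vector<UserInst> &Chain,
                       const std::function<bool()> &OtherHeuristics) {
  if (std::any_of(Chain.begin(), Chain.end(), isProfitableLSRChainElement))
    return true;
  return OtherHeuristics();
}
```

Putting the check in front of the other heuristics means a single VCTP user is enough to keep the chain as it is, regardless of what the remaining cost checks would say.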
May 5 2020
Changing IsAcceptableVPT and adding GetReachingDefs
Closing this for now as we plan to fix this in LSR instead.
May 4 2020
- Refactorings (see comments marked "done")
- Fold even when only one side is free to invert. This brings back the mve-pred-or changes.
- Moved the (not(vcmp)) -> !vcmp fold to PerformXORCombine
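The (not(vcmp)) -> !vcmp fold relies on a compare's condition being invertible for free. A minimal standalone sketch, assuming signed integer conditions only (a hypothetical subset of the MVE VCMP conditions; floating-point compares are left out because NaN operands make inversion non-trivial): the compare with the inverted condition produces exactly the negated result, so the NOT can be dropped.

```cpp
#include <cassert>

// Hypothetical subset of integer compare conditions.
enum class Cond { EQ, NE, GE, LT, GT, LE };

const Cond AllConds[] = {Cond::EQ, Cond::NE, Cond::GE,
                         Cond::LT, Cond::GT, Cond::LE};

// Returns the condition that holds exactly when C does not (integers only).
Cond invert(Cond C) {
  switch (C) {
  case Cond::EQ: return Cond::NE;
  case Cond::NE: return Cond::EQ;
  case Cond::GE: return Cond::LT;
  case Cond::LT: return Cond::GE;
  case Cond::GT: return Cond::LE;
  case Cond::LE: return Cond::GT;
  }
  return C; // not reached for valid enumerators
}

bool eval(Cond C, int A, int B) {
  switch (C) {
  case Cond::EQ: return A == B;
  case Cond::NE: return A != B;
  case Cond::GE: return A >= B;
  case Cond::LT: return A < B;
  case Cond::GT: return A > B;
  case Cond::LE: return A <= B;
  }
  return false; // not reached for valid enumerators
}

// A single-lane model of a compare.
struct VCmp {
  Cond C;
  int A;
  int B;
};

// (not (vcmp C, a, b)) -> (vcmp invert(C), a, b): same operands, no NOT.
VCmp foldNot(const VCmp &V) { return {invert(V.C), V.A, V.B}; }
```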
May 1 2020
Apr 30 2020
Apr 29 2020
Apr 28 2020
Apr 27 2020
Updated the patch: now the transformation only happens if one of the operands is a condition that can be immediately inverted.
It isn't as good as the other version (in terms of improvements), but it's safer: there is less risk of generating terrible code in some situations.
Apr 20 2020
I reworked the implementation of the patch; it should be cleaner now.
Updated the patch following review.
Apr 17 2020
- recomputeVPTBlockMask now takes a reference instead of 2 iterators
- Fixed assertion in recomputeVPTBlockMask
- Added support for optimising multiple VPT blocks in the same Basic Block, and added a test to show it
- Now VCCRValue is only set for VCMPs. (It doesn't really make a difference, but makes it clear that only VCMPs are supported by this optimisation)
Apr 16 2020
- Removing a one-line change that didn't belong to this patch (a TTI change)
Apr 15 2020
Apr 14 2020
- Adding an if (Subtarget->hasMVEIntegerOps()) check before setTargetDAGCombine(ISD::VSELECT)
- Rebasing the patch, so I can commit it earlier (I'll commit it with the VPT Optimisations pass which has already been approved)
- Adding another test and restoring the old test (which produces a weird result, but that is fixed by the child revision)
Closing for now as I'm working on other things, I'll reopen when/if I can come back to this.
Apr 9 2020
Apr 8 2020
- Removed useless comment.
- setTargetDAGCombine(ISD::VSELECT) is now called every time, but the PerformVSELECTCombine function returns early if MVE integer ops are not enabled.
Apr 7 2020
I had to revert this change because it seemed to have caused multiple buildbot failures:
- (and a few more)
Apr 6 2020
- Moved the Const->getAPIntValue().sextOrTrunc(VT.getScalarSizeInBits()) expression out of the switch, storing it in ConstValue
- Changed the ZeroOrOneBooleanContent case to use ConstValue as well
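A standalone model of why the constant is normalized once before the switch: the constant feeding a vector lane may presumably be wider than the lane (SelectionDAG allows BUILD_VECTOR operands wider than the element type), so it is sign-extended or truncated to the scalar size first, and each boolean-content case then tests ConstValue. The enum names mirror LLVM's BooleanContent kinds; the function name isTrueForBooleanContent and the 64-bit representation are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Names mirror LLVM's boolean-content kinds; the logic is a standalone model.
enum class BooleanContent {
  UndefinedBooleanContent,        // only bit 0 is meaningful
  ZeroOrOneBooleanContent,        // true is exactly 1
  ZeroOrNegativeOneBooleanContent // true is all ones
};

// Hypothetical helper. Const is assumed to be already sign-extended to 64
// bits, so keeping its low ScalarBits bits matches sextOrTrunc(ScalarBits)
// for any ScalarBits in 1..64.
bool isTrueForBooleanContent(int64_t Const, unsigned ScalarBits,
                             BooleanContent BC) {
  assert(ScalarBits >= 1 && ScalarBits <= 64);
  const uint64_t Mask =
      ScalarBits == 64 ? ~0ULL : ((1ULL << ScalarBits) - 1);
  // Computed once, outside the switch, and shared by every case.
  const uint64_t ConstValue = static_cast<uint64_t>(Const) & Mask;
  switch (BC) {
  case BooleanContent::UndefinedBooleanContent:
    return (ConstValue & 1) != 0;
  case BooleanContent::ZeroOrOneBooleanContent:
    return ConstValue == 1;
  case BooleanContent::ZeroOrNegativeOneBooleanContent:
    return ConstValue == Mask; // all ones within the scalar width
  }
  return false;
}
```

For example, a 16-bit lane holding 0xFFFF counts as true under ZeroOrNegativeOneBooleanContent only when the check uses the 16-bit scalar size; compared at 32 bits, the same value is not all ones.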
Apr 3 2020
Here is the change using VT.getScalarSizeInBits().
PrevVCMPResultKiller is now correctly reset back to nullptr, but I didn't add a test for it as it was not useful (there was nothing to test).
It's pretty much an NFC: the behaviour is exactly the same as before, but it's indeed more correct to reset it once a VCMP has been replaced by a VPNOT.
Apr 2 2020
- Fixed bugs related to isKill flags in multiple places. I now correctly change the isKill flags when needed, and I added tests for that.
- Rebased the patch and added the mve-vpt-blocks.ll test.