- User since: Dec 3 2018, 7:49 AM (78 w, 3 d)
Fri, May 22
Rebasing the patch. (Changed code in TargetTransformInfoImpl.h, around line 864.)
Thu, May 21
Rebasing the patch - there's now one more call site for getCastInstrCost
Wed, May 20
- Allow VCMPs before the VCTP
- Add test case as requested
- Revert the null-pointer check I added in the previous patch & refactor the condition as suggested
Tue, May 19
Re-adding the instructions in the calls to TTI::getCastInstrCost in the LoopVectorizer.
- Addressed comments (see items marked as "done")
- Changed "reportVectorizationFailure" calls in "prepareTailFoldingByMasking" into simple debug prints.
- Added note explaining how instructions that write to VPR.P0 work
- Removed the logic to check VPTs; they're acceptable whenever a VCMP is
- Adding VPT before VCTP test case
- It was causing a crash, which I fixed by adding a null-pointer check at line 404
- Added new debug message at line
- Changing implementation of the patch following discussion
- Removed the ReportFailure argument of prepareToFoldTailByMasking. I don't think it's useful anymore, but feedback is welcome. (The only thing that annoys me is that we now print "loop not vectorized" even when we'll fall back to a scalar epilogue)
- Added a test that makes use of the attribute that enables tail-folding
- Simplified tests
Mon, May 18
Fri, May 15
Thu, May 14
- Removing instruction from calls to getCastInstrCost in the LoopVectorizer.
Wed, May 13
- Cleaning up vectorizer tests (removed all extra attributes)
- Refactored & Improved the cast.ll test
- Removed isLoadOrMaskedLoad, now using the CCH instead.
Tue, May 12
- Simplified the vctp-chains.ll test: removed useless attributes
- Added codegen test to test the whole pipeline (based on the vctpi32 test from vctp-chains.ll)
Mon, May 11
- Minor refactoring of the patch
- The pass is no longer limited to VCMPs for VCCRValue, it can now use any instruction that writes to VPR (e.g. VMSR)
- The pass no longer replaces VPNOT with copies - it just removes the VPNOT and replaces all of its uses.
- Other minor fixes
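The VPNOT change above can be pictured with a toy use-list model (a hypothetical sketch, not the actual MachineInstr API): removing the VPNOT amounts to pointing every user at its input rather than inserting a copy.

```cpp
#include <cassert>
#include <vector>

// Toy use-list: each use is a location holding a pointer to the value.
struct Value {
  std::vector<Value **> Uses;

  // Point every user at New, then drop the old use list -- after this,
  // the original value (the VPNOT in the pass) can simply be erased.
  void replaceAllUsesWith(Value *New) {
    for (Value **U : Uses)
      *U = New;
    Uses.clear();
  }
};
```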
- Moved CastContextHint to TargetTransformInfo
- Moved the logic that calculates the CastContextHint from an Instruction* out of getCastInstrCost (in TargetTransformInfo.cpp) into a static function in TargetTransformInfo named getCastContextHint.
- CastContextHint is no longer an optional parameter. Callers have to choose between using None for the context, using a custom context, or calling TTI::getCastContextHint
- Removed my change in BasicTTIImpl.h - (Should I restore it? It seemed to have no effect on tests)
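A minimal standalone sketch of the API shape described above (simplified stand-ins, not the actual LLVM signatures): making CastContextHint a required parameter means every caller must state its context explicitly.

```cpp
#include <cassert>

// Simplified stand-in for the TTI hint (hypothetical subset of values).
enum class CastContextHint { None, Normal, Masked };

// Sketch of a cost query where the hint is a required parameter: callers
// must pass CastContextHint::None, a custom hint, or the result of a
// helper such as getCastContextHint(I).
int getCastInstrCost(int Opcode, CastContextHint CCH) {
  (void)Opcode;
  int Cost = 1;  // toy base cost
  if (CCH == CastContextHint::Masked)
    Cost += 2;   // pretend a masked context makes the cast pricier
  return Cost;
}
```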
Updating the patch following the changes to D79162
Thu, May 7
Changing the implementation of the patch.
- TTI hook renamed to isProfitableLSRChainElement
- It now returns true for the VCTP
- Removed FilterOutUndesirableUses
- isProfitableLSRChainElement is now called in LSR's isProfitableChain function. If it returns true for one of the chain's UserInsts, the chain is considered profitable and will not be broken up by LSR.
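A simplified model of that interaction (stub types and hypothetical names, not LSR's real data structures): isProfitableChain short-circuits to "profitable" as soon as the hook accepts one of the chain's UserInsts.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stub stand-ins for LSR's chain data structures.
struct IVInc { std::string UserInst; };
struct IVChain { std::vector<IVInc> Incs; };

// Sketch of the TTI hook: true for instructions (like the VCTP intrinsic)
// whose induction chains should be left untouched.
bool isProfitableLSRChainElement(const std::string &Inst) {
  return Inst == "vctp"; // stand-in for the real instruction check
}

// Simplified model of isProfitableChain: one accepted UserInst makes the
// whole chain profitable, so LSR will not optimize it away.
bool isProfitableChain(const IVChain &Chain) {
  for (const IVInc &Inc : Chain.Incs)
    if (isProfitableLSRChainElement(Inc.UserInst))
      return true;
  // (the real function continues with its usual cost heuristics here)
  return false;
}
```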
May 5 2020
Changed IsAcceptableVPT and added GetReachingDefs
Closing this for now as we plan to fix this in LSR instead.
May 4 2020
- Refactorings (see comments marked "done")
- Fold even when only one side is free to invert. This brings back the mve-pred-or changes.
- Moved the (not(vcmp)) -> !vcmp fold to PerformXORCombine
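The (not(vcmp)) -> !vcmp fold rests on predicate inversion being free; a minimal sketch with a hypothetical predicate enum (not the actual ISD condition codes):

```cpp
#include <cassert>

// Hypothetical predicate codes for a vector compare.
enum class Pred { EQ, NE, LT, GE };

// Inverting the compare's condition absorbs the surrounding NOT:
// not(vcmp.eq a, b) becomes vcmp.ne a, b, so no extra instruction is
// needed for the negation.
Pred invert(Pred P) {
  switch (P) {
  case Pred::EQ: return Pred::NE;
  case Pred::NE: return Pred::EQ;
  case Pred::LT: return Pred::GE;
  case Pred::GE: return Pred::LT;
  }
  return P; // unreachable; keeps compilers quiet
}
```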
May 1 2020
Apr 30 2020
Apr 29 2020
Apr 28 2020
Apr 27 2020
Updated the patch: now the transformation only happens if one of the operands is a condition that can be immediately inverted.
It isn't as good as the other version in terms of improvements, but it's safer: there's less risk of generating terrible code in some situations.
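A toy model of that gating (hypothetical names; the real check inspects the actual operands): the transformation only fires when at least one operand's inversion is free.

```cpp
#include <cassert>

// Toy operand model: an operand is "immediately invertible" when flipping
// it costs nothing, e.g. a compare whose predicate can simply be swapped.
struct Operand { bool IsInvertibleCmp; };

bool isFreeToInvert(const Operand &Op) { return Op.IsInvertibleCmp; }

// The fold only fires when at least one operand's inversion is free;
// otherwise the rewrite could introduce extra inversions and make the
// generated code worse.
bool shouldFold(const Operand &A, const Operand &B) {
  return isFreeToInvert(A) || isFreeToInvert(B);
}
```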
Apr 20 2020
I reworked the implementation of the patch, it should be cleaner now.
Updated the patch following review.
Apr 17 2020
- recomputeVPTBlockMask now takes a reference instead of 2 iterators
- Fixed assertion in recomputeVPTBlockMask
- Added support for optimising multiple VPT blocks in the same Basic Block, and added a test to show it
- Now VCCRValue is only set for VCMPs. (It doesn't really make a difference, but makes it clear that only VCMPs are supported by this optimisation)
Apr 16 2020
- Removed a one-line change that didn't belong in this patch (a TTI change)
Apr 15 2020
Apr 14 2020
- Adding if (Subtarget->hasMVEIntegerOps()) before setTargetDAGCombine(ISD::VSELECT)