User Details
- User Since
- Jul 26 2016, 7:17 AM (357 w, 4 d)
Wed, May 24
Test simplified
Tue, May 23
clang-format error corrected
Minor changes: comment corrected, class member name capitalized
Wed, May 17
Mon, May 8
The question is why the vectorizer failed to unroll the loop in your workload.
Fri, May 5
The review title was changed to better reflect the changes being submitted.
The variable name was changed accordingly.
The test was added to check that
May 3 2023
Apr 26 2023
Apr 3 2023
@nikic, thanks a lot, I have cherry-picked that commits, which helps.
I suspected we were "lucky" enough to hit the middle of an actively developed feature.
Mar 29 2023
This change caused a regression in AMDGPU backend.
In case optimization is done before inline, it covers opportunities for other inst-combines that may have higher precedence.
As a result, we get suboptimal code.
My idea is just to postpone the zero check folding before the compare argument, which is a call, is inlined.
If it is, the next inst-combine invocation will have a chance to do a better job, if not - it will do the same as before.
define i32 @zot(i32 noundef %arg) { bb: %inst = icmp eq i32 %arg, 0 %inst1 = call i32 @llvm.cttz.i32(i32 %arg, i1 true) %inst2 = select i1 %inst, i32 -1, i32 %inst1 %inst3 = add nsw i32 %inst2, 1 ret i32 %inst3 }
Feb 20 2023
Jan 5 2023
minor changes according to the reviewer request
Dec 20 2022
schedule printing function moved out from the GCNSchedStage class.
objects passed by reference are marked as const
Dec 19 2022
refactorerd getScheduleMetrics to avoid copying
Dec 15 2022
amdgpu-schedule-metric-bias description updated
amdgpu-schedule-metric-bias=100
makes the scheduler always prefer the occupancy over the latency.
but the schedule metrics are still computed.
Did you mean the option to completely switch the getScheduleMetrics OFF?
register class and move size from the virtual source register
operand type check is loosen to !isReg()
- metric calculation calls localized to the UnclusteredHighRPStage::shouldRevertSchedule. Now the getScheduleMetric is only called if it is really necessary.
- -amdgpu-schedule-metric-bias=<unsigned value> compiler option was added to ease the further testing and tuning. It defaults to 10 which means the schedule w/o bubbles gets 10 points reward.
- Several other changes according to the reviewer request.
Dec 13 2022
reviewers comments addressed
add operand instead of adding immediate
Dec 12 2022
assign getImm() result to unsigned for implicit conversion
changed as requested
test renamed
test added
floating points changed to scaled integers
Fixed in https://reviews.llvm.org/D139874
Since we ensure all the VGPR to SGPR copies are uniform, we just need to V_READFIRSTLANE_B32 here.
Model map renamed to ReadyCycles
changes as requested
Dec 9 2022
yet one more minor code cleanup
odd changes removed
Nov 22 2022
removed changes not relevant directly to the current revision
added assert to check if the flat scratch offset is aligned
removed unrelated changes
Nov 21 2022
added test for using and restoring frame register for offset calculation.
S_SUBB_U32 changed to S_ADDC_U32 with "-Offset"
Since the https://reviews.llvm.org/D137574 has been landed, this review is updated to use backward PEI.
Nov 18 2022
rebased
Nov 17 2022
Nov 16 2022
TargetRegisterInfo::eliminateFrameIndex signature changed to return "true" if the MachineInstr passed in by iterator was removed
Nov 14 2022
TargetRegisterInfo::supportsBackwardScavenger() description changed
Added test case for the frame index in the last instruction in the BB. "::llvm" removed.
Nov 10 2022
define <4 x i32> @extract0_bitcast_raw_buffer_load_v4i32(<4 x i32> inreg %rsrc, i32 %ofs, i32 %sofs) local_unnamed_addr #0 { %var = tail call <4 x i32> @llvm.amdgcn.raw.buffer.load.v4i32(<4 x i32> %rsrc, i32 %ofs, i32 %sofs, i32 0) ret <4 x i32> %var }IR dump after amdgpu-isel:
bb.0 (%ir-block.0): liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5
Common code for debug instructions frame index operand replacement is factored
out to separate function.
no else after return
Nov 9 2022
Formatting
The first really working version of the frame index elimination with backward walk.
Nov 7 2022
TargetRegisterInfo::eliminateFrameIndex invalidate iterator passed to it. It can remove the instruction or change the number of its operands.
I am thinking of changing its interface to make it return an iterator that points to the new/changed instruction to let caller handle this.
Temporary workaround to avoid SCC liveness scanning loop and using manually set
"dead" flags to decrease the amount of unnecessary SCC preserving code.
Nov 2 2022
Oct 31 2022
Sorry for my tediousness but
I would like to see any inspirational reason for this change.
Oct 24 2022
Corrected wrong Frame Register restore code.
Several minor changes as requested.
Oct 21 2022
Applied approach suggested by @rampitec
Oct 20 2022
In fact, to avoid this unwanted loop I can use another hack - just forward scavenger one position to look up the SCC liveness at the insertion slot and then immediately backward scavenger to restore its position and avoid breaking the PEI logic.
This looks weird but avoids the loop.
Oct 19 2022
SCC copy restore has been made wave size independent.