- User Since
- Jan 28 2016, 8:30 AM (335 w, 23 h)
Thu, Jun 23
May 5 2022
Yep, in SSA use of A means it dominates B so it seems correct to expect only narrowing of the exec mask. Let's try and see what happens.
Feb 23 2022
Feb 21 2022
Fixed, updated description of the patch.
Sorry for breaking the test, it turned out it has differences between two runs in what functions are left after the inliner, for example: barney exists on the first path but don’t exist on the second path. This cannot be handled with a single CHECK prefix.
replaced ARGPROMOTION prefix with CHECK, renamed %tmp IR values to %temp to avoid update warnings like "WARNING: Change IR value name 'tmp1' or use --prefix-filecheck-ir-name to prevent possible conflict with scripted FileCheck name"
Feb 20 2022
From a cursory look at the implementation, does this handle unwinding properly?
store i32 0, ptr %arg
call void @may_unwind() readnone
store i32 1, ptr %arg
I believe promotion is not possible in this case, because there's no good way to provide the new value on the unwind path to the caller.
Feb 18 2022
Feb 17 2022
@arsenm I've added inoutargs2.ll test which is modified rewrite_out_arguments.ll. Original checks are saved with REF prefix, differences marked with "; ***" prefix, please take a look. Most of the differences are related to bitcasts which my patch doesn't support yet, I'm not sure if it should be included in this commit
Feb 16 2022
Using IsArgGepInBound map just to hold a boolean seems expensive to me, can you just add a field to OriginalLoads value?
Feb 14 2022
Updated opaque-ptr.ll test to use ALL prefix to simplify checks
@nikic please take a look now, I've addressed the issue with opaque pointers.
Feb 10 2022
Feb 7 2022
Can you check if the testcases in llvm/test/CodeGen/AMDGPU/rewrite-out-arguments.ll are redundant and/or covered by the ones you add here?
@arsenm I looked into the test and its not directly suitable for this pass as behaives differently, however the cases can be used for testing.
Some of the per review issues addressed.
Feb 4 2022
Feb 1 2022
Looks almost good
Jan 26 2022
In D117562#3272690, @vangthao wrote:
Right, by the time we hit pre-ra scheduler, occupany may already be decreased by MachineLICM and since the scheduler does not currently have any way to decrease register pressure of live-throughs, it would not have any ability to increase occupancy if it was decreased by MachineLICM hoisting from loops. Enforcing an occupancy target for RA will help in this case. Another issue is I believe the scheduler will allow for more VGPR usage as long as it does not decrease occupancy. This may hide some achievable occupancy if we do not handle this before scheduler or make scheduler aware that it can increase occupancy by sinking. I am not sure if disabling MachineLICM altogether in high RP situations would be the best solution here.
Jan 25 2022
more comments to follow
Feb 26 2021
Following the discussion at https://reviews.llvm.org/D97342.
Looks good either but as with previous patch I think setting HasClusteredNodes to true to avoid discovering it's value is a bit misleading.
Looks good, but should we use just a single dedicated pass over SUs to check if there're clustered ops after first scheduling to make the logic slightly easier?
Oct 23 2020
Oct 22 2020
Jay, Matt do you have objections with this patch?
Oct 21 2020
Fixed formatting issue
Oct 20 2020
Fixed formattind and lint issues.
Oct 19 2020
Added test, more review issues fixed.
Oct 16 2020
Oct 14 2020
Per review fixes.
Oct 11 2020
I've found the case when execMayBeModifiedBeforeAnyUse randomly leads to a coredump, which is hard to debug. Most likely it's because an instruction beyond the end of a basic block is accessed. This means that the first loop calculates some instructions twice and I was wrong assuming use_nodbg_instructions doesn't repeat them. In fact there is no code in MachineRegisterInfo::verifyUseList that ensures that uses belonging to one instruction should be sequent in the use list nor the traces of such ordering can be found in MachineRegisterInfo::addRegOperandToUseList. I'm going to fix this code.
Sep 24 2020
If that piece of code turns out to be mandatory, we'll find out soon enough and we would get a test case :).
Sep 23 2020
Sorry, I had to explain the context around the modified code.
Sep 22 2020
Sep 7 2020
Aug 31 2020
Rebased, updated per review comments.
Aug 20 2020
Aug 18 2020
Jul 23 2020
Ok, let's keep it simple, no much benefit with maps. LGTM.
Overally looks good
Jul 15 2020
I have no objections but I'll ask Dmirty to check this from asm/dasm side.
Jul 14 2020
Jul 10 2020
Jul 7 2020
Given than the condition here is rare, I'm ok with the patch, reworking undef handling in scheduler is a massive work.
I understood your patch. Generally I think patching LIS with cleared undef flags ins't right at first place because the semantic of register lifetime is ruined. After the first move there should be an undef flag set at the %1.sub2 = IMPLICIT_DEF instruction which would break lives of all subregs live at this point. Can we drop updating LIS during scheduling and recreate it from scratch? Or may be fully recreate only the intervals for the registers involved in moves when the undef flag is patched after scheduling?
Can you please add a LiveInterval dump (possibly truncated) for the test before and after each move to this review? It's a bit hard to follow what happens there.
Jul 3 2020
Jul 2 2020
Update before commit: used isIdenticalTo, rebased.
Jun 26 2020
Rebased, added check if use is Src0 or Src1
Indeed, thanks Jay, updated.
Jun 25 2020
Rebased, per review issues fixed, moved VNInfo::print definition ot of class.
split to parts
Rebased, per review issues fixed:
Jun 24 2020
Jun 20 2020
Jun 9 2020
Updated patch. Everything is done except I decided to left readsM0 check for the no-ret atomics case.