This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add MachineDCE pass after RenameIndependentSubregs
ClosedPublic

Authored by rampitec on Mar 20 2019, 7:21 PM.

Details

Summary

Detect dead lanes can create some dead defs. Then RenameIndependentSubregs
will break a REG_SEQUENCE which may use these dead defs. At this point
a dead instruction can be removed but we do not run a DCE anymore.

MachineDCE was only running before live variable analysis. The patch
adds a mean to preserve LiveIntervals and SlotIndexes in case it works
past this.

Diff Detail

Event Timeline

rampitec created this revision.Mar 20 2019, 7:21 PM

I feel like we just keep spamming DCE runs to solve issues. Is there some reason it's difficult for RenameIndependentSubregs to clean up after itself?

I feel like we just keep spamming DCE runs to solve issues. Is there some reason it's difficult for RenameIndependentSubregs to clean up after itself?

Also, I think making more passes to have to care about LiveIntervals should generally be avoided

I feel like we just keep spamming DCE runs to solve issues. Is there some reason it's difficult for RenameIndependentSubregs to clean up after itself?

Also, I think making more passes to have to care about LiveIntervals should generally be avoided

It is not only RenameIndependentSubregs, detect dead lanes also just mark them dead. I guess the assumption was it will be later deadcoded. And that never happened.
Even if both passes will cleanup after themselves both will need to handle LIS anyway.

arsenm added inline comments.Mar 25 2019, 12:33 PM
lib/CodeGen/DeadMachineInstructionElim.cpp
198–202

Can you just use shrinkToUses?

rampitec marked an inline comment as done.Mar 25 2019, 1:49 PM
rampitec added inline comments.
lib/CodeGen/DeadMachineInstructionElim.cpp
198–202

I am afraid not. It does far less than needed. In particular it does not handle defs, and this cannot be solved by just removing a dead interval.

Ping.

Since the primary goal of the DCE here is to clean dead subregs I can bail this pass invocation if subreg liveness if not supported by target. What do you think?

Ping.

Since the primary goal of the DCE here is to clean dead subregs I can bail this pass invocation if subreg liveness if not supported by target. What do you think?

In fact the BE affected most if x86. Adding @craig.topper as a reviewer. At the same time x86 does not override subRegLivenessEnabled, so it should not be limited to targets with subreg liveness enabled only.

craig.topper added inline comments.Apr 2 2019, 1:31 PM
test/CodeGen/X86/combine-bitselect.ll
657 ↗(On Diff #192198)

This test should be fixed to not be an infinite loop. The middle end wouldn't have given this code to the backend.

test/CodeGen/X86/speculative-load-hardening-gather.ll
10 ↗(On Diff #192198)

This looks like speculative load hardening is creating an instruction it doesn't end up using. The speculative loading hardening pass runs after the previous times the MachineDCE was ran. Maybe we should fix the pass to not insert dead instructions.

test/CodeGen/X86/speculative-load-hardening-indirect.ll
134 ↗(On Diff #192198)

This looks like new regular expressioning the script wasn't doing previously. Can you drop this from this from your diff or rerun the script on an unmodified trunk and commit it.

test/CodeGen/X86/tail-dup-merge-loop-headers.ll
262 ↗(On Diff #192198)

This basic block isn't reachable and is keeping other code alive. That would normally have been removed by IR passes.

rampitec marked 3 inline comments as done.Apr 2 2019, 3:33 PM
rampitec added inline comments.
test/CodeGen/X86/combine-bitselect.ll
657 ↗(On Diff #192198)

Does it change anything? The test has no side effects anyway.
I am not really sure what does it test, so I am a little afraid to touch it. I can add an option not run extra DCE pass here if needed.

test/CodeGen/X86/speculative-load-hardening-indirect.ll
134 ↗(On Diff #192198)

Will do.

test/CodeGen/X86/tail-dup-merge-loop-headers.ll
262 ↗(On Diff #192198)

Do I conclude x86 does not really need this pass, since most of the stuff can be handled separatetly, like speculative load hardening?
If that is the case I can only enable it if subreg liveness is enabled, like proposed earlier.

rampitec updated this revision to Diff 193382.Apr 2 2019, 3:50 PM
rampitec marked 2 inline comments as done.

Rebased.

test/CodeGen/X86/speculative-load-hardening-indirect.ll
134 ↗(On Diff #192198)

Done.

I don't think X86 needs this pass. Everything here looks like bad test construction or the issue in speculative load hardening.

test/CodeGen/X86/combine-bitselect.ll
657 ↗(On Diff #192198)

I updated this test in trunk so that it no longer has an infinite loop that would allow things to be deleted.

I don't think X86 needs this pass. Everything here looks like bad test construction or the issue in speculative load hardening.

OK, thanks! Than I will skip it for targets which do not track subreg liveness.

rampitec updated this revision to Diff 193617.Apr 3 2019, 3:47 PM

Skip this invocation for targets which do not have subregister liveness.

Hi @rampitec,

Are the dead instructions marked during the detect dead lanes pass or during the rename independent SubReg pass?

I would expect the former and in that case, it may be easier to just do the code deletion there.

Cheers,
-Quentin

Hi @rampitec,

Are the dead instructions marked during the detect dead lanes pass or during the rename independent SubReg pass?

I would expect the former and in that case, it may be easier to just do the code deletion there.

Cheers,
-Quentin

The detect dead lanes pass cannot really delete dead lanes, just mark them. So a subreg definition becomes dead, but in most cases retain a use by REG_SEQUENCE.
Rename independent subregs deletes REQ_SEQENCEs and the subreg looses that use, so finally can be deleted.

I.e. both passes can free up new dead instructions independently, but in most cases they need to work together.

qcolombet added inline comments.Apr 4 2019, 10:56 AM
lib/CodeGen/DeadMachineInstructionElim.cpp
120

I feel that the pass shouldn't be making that decision.
If someone put that in their pipeline, the pass should just run.

rampitec marked an inline comment as done.Apr 4 2019, 11:03 AM
rampitec added inline comments.
lib/CodeGen/DeadMachineInstructionElim.cpp
120

TargetPassConfig does not have subtarget access when it adds a pass to the pipeline.
So the choice is: 1) always run even if it will not find anything 2) perform this check 3) insert it from the target and not from common TargetPassConfig.
In all cases code to handle LIS is needed though.

I can make it only running from AMDGPU target since we always track subreg liveness. Would that be better?

rampitec updated this revision to Diff 193766.Apr 4 2019, 12:45 PM
rampitec retitled this revision from Add MachineDCE pass after RenameIndependentSubregs to [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs.

Moving pass insertion from common TargetPassConfig into AMDGPU, since most targets do not need it.
Changes to maintain liveness if available remain in the MachineDCE pass, but are a no-op for any other target until they decide to use it late in the RA.

rampitec marked an inline comment as done.Apr 4 2019, 12:47 PM
rampitec added inline comments.
lib/CodeGen/DeadMachineInstructionElim.cpp
120

I have reverted the check and only add it from AMDGPU. If any other target feels it may need it there is an option to do the same.
If this is committed it will be possible since pass will not destroy liveness anymore.

Looks reasonable to me.

I leave the final approval to @arsenm for the AMDGPU changes.

arsenm added inline comments.Apr 5 2019, 11:13 AM
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
168–170 ↗(On Diff #193766)

I prefer avoiding double negative flags, so -amdgpu-dce-in-ra and default to true

test/CodeGen/AMDGPU/dead-lane.mir
11–16

You can drop the registers section (although %0 will need a class marker)

rampitec updated this revision to Diff 193922.Apr 5 2019, 11:35 AM
rampitec marked 2 inline comments as done.

Addressed review comments.

arsenm accepted this revision.Apr 5 2019, 12:49 PM

LGTM except for the option description string

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
170 ↗(On Diff #193922)

Description needs to be updated

This revision is now accepted and ready to land.Apr 5 2019, 12:49 PM
rampitec updated this revision to Diff 193951.Apr 5 2019, 1:05 PM
rampitec marked an inline comment as done.

Changed option description.

arsenm accepted this revision.Apr 5 2019, 1:09 PM

LGTM

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 5 2019, 1:13 PM
bjope added a subscriber: bjope.Apr 8 2019, 4:24 AM

I see a couple of problems with the changes to DeadMachineInstructionElim.cpp in this patch:

  • I get MachineVerifier problems because the LIS update does not take into consideration that the removal of a dead use can introduce disjoint live ranges (resulting in verifier complaining about "Multiple connected components in live interval".
  • I get MachineVerifier problems because the LIS update does not update live ranges for physical registers. Consider an instruction like this dead %42:regs = COPY $a1, if that is removed and it is the last use of the physical register $a1 then we need to shrink the live range for $a1 , otherwise the MachineVerifier will complain about "Live segment doesn't end at a valid instruction".
  • I do not think the !MRI->reg_empty(Reg) check takes debug info into consideration. Possibly introducing a debug invariance problem.
  • It does not seem like this pass is using the LIS itself. So why do we even bother keeping it update? It is destroyed, and then it is up to the pass manager to detect if it needs to be recalculated by a later pass, right? Or do we know that it will be used by a later pass, and thereby we want to bother about preserving it (even if it isn't for the greater good of the pass itself). If it is like that, then we probably need to add some code comments to highlight that we only do it because some other pass needs it (for example in case that pass is removed in the future).
  • Is it really true that we now preserve all possible analyses (both existing and future, both upstream and downstream), motivating the change from setPreservesCFG() to setPreservesAll(). I mean, the pass is pretty "destructive" as it removes instructions. You have only added (afaict incomplete) support to preserve the LIS. Notice that I'm not sure exactly what the difference between "PreservesAll" and "PreservesCFG" is, but I suppose it includes more analysis results besides LIS (and SlotIndices).

Maybe we can back out the change in DeadMachineInstructionElim.cpp until the above problems have been sorted out?

Bjorn,

did you enable MachineDCE in RA in some other target? So far it is only enabled in the AMDGPU, so I am curious where do you see these problems?

I see a couple of problems with the changes to DeadMachineInstructionElim.cpp in this patch:

  • I get MachineVerifier problems because the LIS update does not take into consideration that the removal of a dead use can introduce disjoint live ranges (resulting in verifier complaining about "Multiple connected components in live interval".

I am recomputing liveness for all affected uses, how could that be?

  • I get MachineVerifier problems because the LIS update does not update live ranges for physical registers. Consider an instruction like this dead %42:regs = COPY $a1, if that is removed and it is the last use of the physical register $a1 then we need to shrink the live range for $a1 , otherwise the MachineVerifier will complain about "Live segment doesn't end at a valid instruction".

That is probably right. I avoid deleting dead phys, but not their uses. I think shrinkToUses() will be appropriate here.

  • I do not think the !MRI->reg_empty(Reg) check takes debug info into consideration. Possibly introducing a debug invariance problem.

I thought about this, so I did not use reg_nodbg_empty() in this instance. I.e. this LI gets recreated.

  • It does not seem like this pass is using the LIS itself. So why do we even bother keeping it update? It is destroyed, and then it is up to the pass manager to detect if it needs to be recalculated by a later pass, right? Or do we know that it will be used by a later pass, and thereby we want to bother about preserving it (even if it isn't for the greater good of the pass itself). If it is like that, then we probably need to add some code comments to highlight that we only do it because some other pass needs it (for example in case that pass is removed in the future).

The pass does not use it, but should update if it exists. That is true, if I do not claim it is preserved then it will be recalculated, but that is very slow.
So far only AMDGPU should be using it with liveness available. Am I missing some other invocation?

  • Is it really true that we now preserve all possible analyses (both existing and future, both upstream and downstream), motivating the change from setPreservesCFG() to setPreservesAll(). I mean, the pass is pretty "destructive" as it removes instructions. You have only added (afaict incomplete) support to preserve the LIS. Notice that I'm not sure exactly what the difference between "PreservesAll" and "PreservesCFG" is, but I suppose it includes more analysis results besides LIS (and SlotIndices).

It does preserve LIS and SlotIndexes. The rest of the analysis passes (like PDT) are just unaffected by the pass itself. I believe that is more or less the same assumption for all passes which use PreservesAll() that we are considering exiting analysis passes. Otherwise this method is not future proof in general.

Maybe we can back out the change in DeadMachineInstructionElim.cpp until the above problems have been sorted out?

If there is no better way. If you see the problems do you mind to share an IR?

  • I get MachineVerifier problems because the LIS update does not update live ranges for physical registers. Consider an instruction like this dead %42:regs = COPY $a1, if that is removed and it is the last use of the physical register $a1 then we need to shrink the live range for $a1 , otherwise the MachineVerifier will complain about "Live segment doesn't end at a valid instruction".

Actually I cannot reproduce it:

# RUN: llc -march=amdgcn -mcpu=tonga %s -run-pass liveintervals,dead-mi-elimination -verify-machineinstrs -o - | FileCheck -check-prefix=GCN %s
  
---
name:            dead_phys_use
tracksRegLiveness: true
liveins:
  - { reg: '$vgpr0', virtual-reg: '%2' }
body:             |
  bb.0:
    liveins: $vgpr0

    %2:vgpr_32 = COPY $vgpr0
    $vgpr1 = COPY %2:vgpr_32
    %1:vgpr_32 = COPY $vgpr0, implicit $exec
    SI_RETURN_TO_EPILOG $vgpr1
...

The last COPY is dead and $vgpr0 will be affected, but verifier did not find any errors:

bb.0:
  liveins: $vgpr0

  %0:vgpr_32 = COPY $vgpr0
  $vgpr1 = COPY %0
  SI_RETURN_TO_EPILOG $vgpr1
bjope added a comment.Apr 8 2019, 11:18 AM

Bjorn,

did you enable MachineDCE in RA in some other target? So far it is only enabled in the AMDGPU, so I am curious where do you see these problems?

Well, our OOT target is using the DeadMachineInstructionElim pass in (maybe non-standard according to in-tree target) positions in the backend pipeline.

I see a couple of problems with the changes to DeadMachineInstructionElim.cpp in this patch:

  • I get MachineVerifier problems because the LIS update does not take into consideration that the removal of a dead use can introduce disjoint live ranges (resulting in verifier complaining about "Multiple connected components in live interval".

I am recomputing liveness for all affected uses, how could that be?

Ok, so the test case is from the OOT target. Here is an excerpt of the MIR:

0B	bb.0:

16B	  %0:rn = ...
32B	  %10:an32pairs = ...
80B	  %4:an32_0_7 = ... %10.hiAcc:an32pairs, ...
96B	  brr_cond %bb.3, ...
112B	  brr_uncond %bb.1

128B	bb.1:

144B	  brr_cond %bb.2, ...
160B	  brr_uncond %bb.3

176B	bb.2:

192B	  %8:anh_rn = COPY %4.lo16:an32_0_7
224B	  %7:anh_rn = COPY %10.hiAcc_then_lo16:an32pairs
256B	  brr_uncond %bb.4

272B	bb.3:

288B	  %9:rn = ...
304B	  %10:an32pairs = ... %9:rn, ...
352B	  %7:anh_rn = COPY %10.hiAcc_then_lo16:an32pairs
368B	  %8:anh_rn = COPY %10.hiAcc_then_hi16:an32pairs

384B	bb.4::

400B	  nop 0, $noreg, 0, implicit %10.loAcc:an32pairs
416B	  $a0h = COPY %8:anh_rn
432B	  $a1h = COPY %7:anh_rn
448B	  rets implicit $a0h, implicit $a1h

The nop is removed by by DMIE, resulting in

*** Bad machine code: Multiple connected components in live interval ***
- function:    f3
- interval:    %10 [32r,224r:0)[304r,368r:1)  0@32r 1@304r weight:0.000000e+00
  • I get MachineVerifier problems because the LIS update does not update live ranges for physical registers. Consider an instruction like this dead %42:regs = COPY $a1, if that is removed and it is the last use of the physical register $a1 then we need to shrink the live range for $a1 , otherwise the MachineVerifier will complain about "Live segment doesn't end at a valid instruction".

That is probably right. I avoid deleting dead phys, but not their uses. I think shrinkToUses() will be appropriate here.

  • I do not think the !MRI->reg_empty(Reg) check takes debug info into consideration. Possibly introducing a debug invariance problem.

I thought about this, so I did not use reg_nodbg_empty() in this instance. I.e. this LI gets recreated.

So it might be that that the call to createAndComputeVirtRegInterval only happens when having debug info. And afaict that might give a different codegen compared to the situation when not having any debug info. I think we usually try to avoid that. Codegen should not be impacted by existence of debug info.

  • It does not seem like this pass is using the LIS itself. So why do we even bother keeping it update? It is destroyed, and then it is up to the pass manager to detect if it needs to be recalculated by a later pass, right? Or do we know that it will be used by a later pass, and thereby we want to bother about preserving it (even if it isn't for the greater good of the pass itself). If it is like that, then we probably need to add some code comments to highlight that we only do it because some other pass needs it (for example in case that pass is removed in the future).

The pass does not use it, but should update if it exists. That is true, if I do not claim it is preserved then it will be recalculated, but that is very slow.
So far only AMDGPU should be using it with liveness available. Am I missing some other invocation?

Well, in my case it is an OOT target. So I'm can't blame you for missing that.

  • Is it really true that we now preserve all possible analyses (both existing and future, both upstream and downstream), motivating the change from setPreservesCFG() to setPreservesAll(). I mean, the pass is pretty "destructive" as it removes instructions. You have only added (afaict incomplete) support to preserve the LIS. Notice that I'm not sure exactly what the difference between "PreservesAll" and "PreservesCFG" is, but I suppose it includes more analysis results besides LIS (and SlotIndices).

It does preserve LIS and SlotIndexes. The rest of the analysis passes (like PDT) are just unaffected by the pass itself. I believe that is more or less the same assumption for all passes which use PreservesAll() that we are considering exiting analysis passes. Otherwise this method is not future proof in general.

Ok. A little bit scary. But if that is how PreservesAll() works in general I guess it is ok.

Maybe we can back out the change in DeadMachineInstructionElim.cpp until the above problems have been sorted out?

If there is no better way. If you see the problems do you mind to share an IR?

A little bit hard due to that I've only triggered it for the OOT target so far. For now I hacked around it in our downstream clone (although using setPreservesCFG instead gives some churn in some AMDGPU test cases, so it seems like we get different results if LIS is recalculated instead of just being preserved by DeadMachineInstructionElim, I would have expected to get the same result, at least if the preserve worked correctly).

Anyway. I can try to make in-tree reproducers tomorrow.

Actually shrinkToUses does not work with phys. That should be possible to avoid removing dead phys uses if that phys is allocatable (otherwise liveness is not calculated for it).
I also do not see how to reproduce disjoint intervals.
And yes, changing it back to PreservesCFG() should effectively switch this off.

@bjope could you please check if D60419 solves your problem with physreg use removal?

@bjope could you please check if D60419 solves your problem with physreg use removal?

WRT disjoint liveranges. I only see such possibility with a physreg, because virtual should be recomputed. So maybe the same patch would help.

bjope added a comment.EditedApr 9 2019, 7:28 AM

Here is a MIR-reproducer triggering "Multiple connected components in live interval".

For some reason I needed to add "simple-register-coalescing" at the end of the list of passes to run. Otherwise the pass manager decides to discard LIS before dead-mi-elimination.

# RUN: llc -mtriple=amdgcn-- -run-pass=liveintervals,machineverifier,dead-mi-elimination,machineverifier,simple-register-coalescing -o /dev/null %s

---
name:            foo
tracksRegLiveness: true
body:             |
  bb.0:
    liveins: $sgpr0_sgpr1

    %10:sreg_128 = S_LOAD_DWORDX4_IMM killed $noreg, 9, 0
    S_NOP 0, implicit-def %4:sreg_128, implicit %10.sub1:sreg_128
    S_CBRANCH_SCC0 %bb.3, implicit undef $scc
    S_BRANCH %bb.1

  bb.1:
    S_CBRANCH_SCC0 %bb.2, implicit undef $scc
    S_BRANCH %bb.3

  bb.2:
    %8:sreg_32_xm0 = COPY %4.sub1:sreg_128
    %7:sreg_32_xm0 = COPY %10.sub1:sreg_128
    S_BRANCH %bb.4

  bb.3:
    %10:sreg_128 = S_LOAD_DWORDX4_IMM killed $noreg, 10, 0
    %7:sreg_32_xm0 = COPY %10.sub1:sreg_128
    %8:sreg_32_xm0 = COPY %10.sub2:sreg_128

  bb.4:
    S_NOP 0, implicit %10
    $sgpr0 = COPY %8:sreg_32_xm0
    $sgpr1 = COPY %7:sreg_32_xm0
    S_ENDPGM 0, implicit $sgpr0, implicit $sgpr1
...
bjope added a comment.Apr 9 2019, 8:29 AM

@bjope could you please check if D60419 solves your problem with physreg use removal?

Isn't it weird that this pass now is doing less optimizations when an analysis pass is provided? It might be OK for your use case, but it seems weird from a more general perspective.

FWIW, I'm not sure exactly why we are running this pass in a lot of places in our OOT target (after ISel, before coalescing, post-coalescing, etc). But we have done it for years (since before I started working with this target), and now some of those runs won't do as much cleanup as they used to do (when a LIS is found). Not sure, but maybe it is exactly these physreg removals we aim for, so I'm not sure that solving the LIS preservation problem by not eliminating certain dead instructions is good enough for us. It will take some time to analyse that, so at the moment it is easier for me to just make a work around patch downstream.

@bjope could you please check if D60419 solves your problem with physreg use removal?

Isn't it weird that this pass now is doing less optimizations when an analysis pass is provided? It might be OK for your use case, but it seems weird from a more general perspective.

FWIW, I'm not sure exactly why we are running this pass in a lot of places in our OOT target (after ISel, before coalescing, post-coalescing, etc). But we have done it for years (since before I started working with this target), and now some of those runs won't do as much cleanup as they used to do (when a LIS is found). Not sure, but maybe it is exactly these physreg removals we aim for, so I'm not sure that solving the LIS preservation problem by not eliminating certain dead instructions is good enough for us. It will take some time to analyse that, so at the moment it is easier for me to just make a work around patch downstream.

Bjorn, thank you for the testcase. It doesn't seem to be easy to quick fix, while I am going for vacation tomorrow. I will create a revert patch for now and try to sort it out later.

I understood someone might be aiming for dead phys regs, but handling phys without recompute can be much more challenging, so probably a full recompute is not unreasonable under such scenario too.

@bjope could you please check if D60419 solves your problem with physreg use removal?

Isn't it weird that this pass now is doing less optimizations when an analysis pass is provided? It might be OK for your use case, but it seems weird from a more general perspective.

FWIW, I'm not sure exactly why we are running this pass in a lot of places in our OOT target (after ISel, before coalescing, post-coalescing, etc). But we have done it for years (since before I started working with this target), and now some of those runs won't do as much cleanup as they used to do (when a LIS is found). Not sure, but maybe it is exactly these physreg removals we aim for, so I'm not sure that solving the LIS preservation problem by not eliminating certain dead instructions is good enough for us. It will take some time to analyse that, so at the moment it is easier for me to just make a work around patch downstream.

Bjorn, thank you for the testcase. It doesn't seem to be easy to quick fix, while I am going for vacation tomorrow. I will create a revert patch for now and try to sort it out later.

I understood someone might be aiming for dead phys regs, but handling phys without recompute can be much more challenging, so probably a full recompute is not unreasonable under such scenario too.

Revert patch: D60466

bjope added a comment.Apr 9 2019, 9:16 AM

For the record, I think the "multiple connected components" problem could have been solved by something like this after the call to "createAndComputeVirtRegInterval"

LiveInterval &LI = LIS->getInterval(Reg);
assert(Reg == LI.reg && "Invalid reg to interval mapping");
if (TargetRegisterInfo::isVirtualRegister(LI.reg)) {
  // Check whether or not LI is composed by multiple connected
  // components and if that is the case, fix that.
  SmallVector<LiveInterval*, 8> SplitLIs;
  LIS->splitSeparateComponents(LI, SplitLIs);
}

However, that still leaves the questions about if the pass should do less optimizations related to physregs when LIS is available, or how to preserve LIS if optimizing on physregs.
And I'm not sure about the cost of using splitSeparateComponents compared to a full recompute.

For the record, I think the "multiple connected components" problem could have been solved by something like this after the call to "createAndComputeVirtRegInterval"

LiveInterval &LI = LIS->getInterval(Reg);
assert(Reg == LI.reg && "Invalid reg to interval mapping");
if (TargetRegisterInfo::isVirtualRegister(LI.reg)) {
  // Check whether or not LI is composed by multiple connected
  // components and if that is the case, fix that.
  SmallVector<LiveInterval*, 8> SplitLIs;
  LIS->splitSeparateComponents(LI, SplitLIs);
}

However, that still leaves the questions about if the pass should do less optimizations related to physregs when LIS is available, or how to preserve LIS if optimizing on physregs.
And I'm not sure about the cost of using splitSeparateComponents compared to a full recompute.

Thank you for the hint and test case. For the phys I could remove all regunits, but for some reason I cannot always get them reconstructed. It will need some more debugging.

For the record, I think the "multiple connected components" problem could have been solved by something like this after the call to "createAndComputeVirtRegInterval"

LiveInterval &LI = LIS->getInterval(Reg);
assert(Reg == LI.reg && "Invalid reg to interval mapping");
if (TargetRegisterInfo::isVirtualRegister(LI.reg)) {
  // Check whether or not LI is composed by multiple connected
  // components and if that is the case, fix that.
  SmallVector<LiveInterval*, 8> SplitLIs;
  LIS->splitSeparateComponents(LI, SplitLIs);
}

However, that still leaves the questions about if the pass should do less optimizations related to physregs when LIS is available, or how to preserve LIS if optimizing on physregs.
And I'm not sure about the cost of using splitSeparateComponents compared to a full recompute.

For the record, this code works if applied after createAndComputeVirtRegInterval() call.

For the record, I think the "multiple connected components" problem could have been solved by something like this after the call to "createAndComputeVirtRegInterval"

LiveInterval &LI = LIS->getInterval(Reg);
assert(Reg == LI.reg && "Invalid reg to interval mapping");
if (TargetRegisterInfo::isVirtualRegister(LI.reg)) {
  // Check whether or not LI is composed by multiple connected
  // components and if that is the case, fix that.
  SmallVector<LiveInterval*, 8> SplitLIs;
  LIS->splitSeparateComponents(LI, SplitLIs);
}

However, that still leaves the questions about if the pass should do less optimizations related to physregs when LIS is available, or how to preserve LIS if optimizing on physregs.
And I'm not sure about the cost of using splitSeparateComponents compared to a full recompute.

For the record, this code works if applied after createAndComputeVirtRegInterval() call.

The problem with phys regs is that I see no way to recompute liveins. If I remove reg units for affected register they are reconstructed on the next query. However if it was a livein I will drop it as well, and it is only recomputed in private LiveIntervals::computeLiveInRegUnits(), which is only called from LiveIntervals::runOnMachineFunction()...