This is an archive of the discontinued LLVM Phabricator instance.

[ExecutionDepsFix] Recognize existing dep breaks
Needs Review
Public

Authored by loladiro on Feb 20 2017, 11:29 AM.

Details

Reviewers
myatsina
Summary

Teach ExecutionDepsFix to recognize instructions that already break
register dependencies. This is done in preparation for being
more conservative in clearance assumptions at function entry and after
function calls. In that situation, this commit gives us two benefits:

  1. It reduces the number of inserted dependency breaks by reusing those that already exist. (At the moment we assume all registers have significant clearance at function entry, so basically any unused register can be used for undef reads. This will no longer be the case after the above-mentioned change.)
  2. It provides a simple way to test the clearance calculation code. Right now, those tests assume that all registers have large clearance at function entry. Further, while there is a way to forcibly clobber register clearance (e.g. by using inline assembly), without this change there is no easy way to reset a register back to high clearance. This commit provides a way to do so, since LLVM materializes constant 0s in registers using dependency-breaking instructions. E.g. the LLVM IR fcmp ult double %x, 0.0 will force such an instruction (assuming that %x is unknown).
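
To make the first benefit concrete, here is a minimal toy model of the clearance bookkeeping. It is only a sketch under stated assumptions, not the code in this patch or in ExecutionDepsFix: the names Kind, Instr, kFarPast and countInsertedBreaks are hypothetical, registers are plain integers, and "clearance" is simply the distance in instructions since the last dependency-carrying write. It also assumes the conservative entry state described above, where every register counts as just written.

```cpp
#include <vector>

// Hypothetical toy model; none of these names come from LLVM.
enum class Kind {
  Def,          // ordinary write that carries a dependency
  DepBreak,     // existing dependency-breaking write, e.g. a zeroing xor
  PartialWrite  // write with a false dependency on the old register value
};

struct Instr {
  Kind kind;
  int reg;
};

// Sentinel index meaning "written so long ago that clearance is huge".
constexpr int kFarPast = -1000000;

// Counts how many extra dependency breaks the toy pass would insert.
// Assumption: at entry every register was written at index -1, so it
// starts with a clearance of only one instruction.
int countInsertedBreaks(const std::vector<Instr>& code, int numRegs,
                        int preferredClearance, bool recognizeExisting) {
  std::vector<int> lastDef(numRegs, -1);
  int inserted = 0;
  for (int idx = 0; idx < static_cast<int>(code.size()); ++idx) {
    const Instr& I = code[idx];
    switch (I.kind) {
    case Kind::Def:
      lastDef[I.reg] = idx;
      break;
    case Kind::DepBreak:
      // When recognized, the register counts as fully clear again;
      // otherwise it is treated like any other recent write.
      lastDef[I.reg] = recognizeExisting ? kFarPast : idx;
      break;
    case Kind::PartialWrite:
      // Too little clearance: the toy pass would insert a break here.
      if (idx - lastDef[I.reg] < preferredClearance)
        ++inserted;
      lastDef[I.reg] = idx;
      break;
    }
  }
  return inserted;
}
```

In this model, the sequence "DepBreak r0; PartialWrite r0" with a preferred clearance of 16 costs one inserted break when existing breaks are not recognized and none when they are. A lone "PartialWrite r0" still gets a break in both modes, which matches the conservative entry assumption.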

Event Timeline

loladiro created this revision. Feb 20 2017, 11:29 AM
myatsina added inline comments. Feb 27 2017, 7:24 AM
lib/Target/X86/X86InstrInfo.cpp
8271

According to the guide, xorps and xorpd cannot break dependencies on Pentium 4, so this information should be added too.

test/CodeGen/X86/break-false-dep.ll
339

This assumption seems very fragile. A vector zero initializer should also create an xor, and would probably be more robust.
I'm starting to wonder if this test should be written in MIR instead. Then we would not depend on codegen optimizations that create xors, and we could easily test all the additional dependency-breaking instructions, not just xor.

349

What happens for this function?

define double @recognize_existing(i64 %arg) {
  ; Mark all regs as "used" and thus having same clearance
  tail call void asm sideeffect "", "~{xmm0},...,~{xmm15},~{dirflag},~{fpsr},~{flags}"()

  %tmp1 = sitofp i64 %arg to double
  %tmp2 = sitofp i64 %arg to double

  %tmp3 = fadd double %tmp1, %tmp2
  ret double %tmp3
}

I expect an xor of some xmm register to be added before the first sitofp.
Will we see an additional xor before the second one?
In other words, does this optimization take into account the xors we add to break dependencies, and thus recalculate their clearance?

myatsina edited edge metadata. Mar 15 2017, 9:56 AM

Do you have a new revision for this change?

Do you have any update on this change?