Page MenuHomePhabricator

[NFC][PowerPC] Adding FeatureFPU in the definition of FeatureISA3_0
AbandonedPublic

Authored by amyk on Sep 24 2019, 1:49 PM.

Details

Reviewers
power-llvm-team
nemanjai
echristo
hfinkel
Group Reviewers
Restricted Project
Summary

This is patch aims to add FeatureFPU into the definition of the feature, FeatureISA3_0.

There is no need to specify both predicates, IsISA3_0 and HasFPU, as a compliant implementation of ISA 3.0 should have the floating point unit.

Diff Detail

Event Timeline

amyk created this revision.Sep 24 2019, 1:49 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 24 2019, 1:49 PM
jsji added a reviewer: Restricted Project.Sep 24 2019, 3:10 PM
jsji added inline comments.Sep 27 2019, 2:05 PM
llvm/lib/Target/PowerPC/PPC.td
193

Looks like there is some misunderstanding of what FeatureISA3_0 is for?

As in the description, FeatureISA3_0 is for new instruction added in ISA 3.0, not all the instructions in ISA 3.0.
Why it need to depend on FeatureFPU?

In other words, with the new patch -mattr=-fpu will also remove isa-v30-instructions effectively.
So that means, if you add -mattr=-fpu, you can't use any of the new instruction added for P8?
Is that what you intended?

jsji added inline comments.Sep 27 2019, 2:57 PM
llvm/lib/Target/PowerPC/PPC.td
193

I meant P9.

nemanjai added inline comments.Sep 28 2019, 1:39 PM
llvm/lib/Target/PowerPC/PPC.td
193

Yes. That is very much part of the intent here. It is not possible to have a CPU that conforms to ISA 3.0 that does not support HW floating point operations (i.e. doesn't have a HW FPU).

Having an FPU does not imply we have an implementation of ISA 3.0. But having an implementation of ISA 3.0 absolutely implies having an FPU.

jsji added inline comments.Sep 30 2019, 8:04 AM
llvm/lib/Target/PowerPC/PPC.td
193

Yes, you are right, and I also agree that having an implementation of ISA 3.0 absolutely implies having an FPU.

However, looks like to me that FeatureISA3_0 is NOT corresponding to compliant implementation of ISA 3.0.
But rather controlling whether we can generate *new instructions in ISA 3.0*.
Also, FeatureFPU is also not corresponding to having an FPU, but rather controlling whether we can generates FPU instructions.

So if we add the *Implies* to FPU here, we are more or less saying: all ISA 3.0 instructions are also FPU instructions,
we can only generate ISA 3.0 instructions if we are allow to generate FPU instructions.

Does this dependency really exists?

What if someone would like to avoid generating FPU instructions (eg: kernel users),
but still would like to take advantage of P9 new instructions like extswsli, maddld,modud.

I believe he/she can use -mcpu=pwr9 -mattr=-fpu right now, but will get performance deg if we apply this patch.

What do you think?

nemanjai added inline comments.Sep 30 2019, 9:13 PM
llvm/lib/Target/PowerPC/PPC.td
193

If we are aware of a valid use case for something like this (i.e. we want instructions that were added in ISA 3.0) but we want a guarantee that no floating point instructions are emitted, we need to do a bit more work than just stop this patch from going in.

The reason we added this feature is to differentiate SPE codegen from codegen for CPUs that have an FPU. That is about all it is good for. As an example, the following will produce FP instructions:

define dso_local signext i32 @test(double %a, double %b) local_unnamed_addr #0 {
entry:
  %cmp = fcmp olt double %a, %b
  %conv = zext i1 %cmp to i32
  ret i32 %conv
}

llc -mattr=-fpu < a.ll | grep fcmpu
        fcmpu 0, 1, 2

So the addition of this implication is really to avoid having to add the predicate for every pattern we want to add going forward.

jsji added inline comments.Oct 1 2019, 3:13 PM
llvm/lib/Target/PowerPC/PPC.td
193

If we are aware of a valid use case for something like this (i.e. we want instructions that were added in ISA 3.0) but we want a guarantee that no floating point instructions are emitted

I believe this is an example usage in Linux Kernel build. https://bugs.llvm.org/show_bug.cgi?id=38886, where kernel builder will use -msoft-float , which is corresponding to -mattr=-hard-float.

And with this patch we will definitely lose performance when they build with -mcpu=power9.

An example would be:

$ cat ex.ll
@z = external local_unnamed_addr global i32*, align 8

define signext i32 @_Z2tcii(i32 signext %x, i32 signext %y){
entry:
  %0 = load i32*, i32** @z, align 8
  %add = add nsw i32 %y, %x
  %idxprom = sext i32 %add to i64
  %arrayidx = getelementptr inbounds i32, i32* %0, i64 %idxprom
  %1 = load i32, i32* %arrayidx, align 4
  ret i32 %1
}

The current code gen:

$ ~/llvm-git/build/bin/llc -mcpu=pwr9 -mattr=-hard-float < ex.ll|grep localentry -A10
        .localentry     _Z2tcii, .Lfunc_lep0-.Lfunc_gep0
# %bb.0:                                # %entry
        addis 5, 2, .LC0@toc@ha
        ld 5, .LC0@toc@l(5)
        ld 5, 0(5)
        add 3, 4, 3
        extswsli 3, 3, 2
        lwax 3, 5, 3
        blr
        .long   0
        .quad   0

With this patch:

$ ~/llvm-git/build/bin/llc -mcpu=pwr9 -mattr=-hard-float < ex.ll|grep localentry -A10
        .localentry     _Z2tcii, .Lfunc_lep0-.Lfunc_gep0
# %bb.0:                                # %entry
        addis 5, 2, .LC0@toc@ha
        ld 5, .LC0@toc@l(5)
        add 3, 4, 3
        extsw 3, 3
        sldi 3, 3, 2
        ld 5, 0(5)
        lwax 3, 5, 3
        blr
        .long   0
193

As an example, the following will produce FP instructions:

Good example, this should be a bug, we should have a look and fix it later.

193

So the addition of this implication is really to avoid having to add the predicate for every pattern we want to add going forward.

Understand, can we achieve the goal by using a new predicate, like:

def ISA3_0_and_FPU: Predicate<"PPCSubTarget->hasFPU() && PPCSubTarget->isISA3_0()">;
amyk updated this revision to Diff 222729.Oct 1 2019, 5:02 PM

I've realized I forgot to attach a full context patch.
Updated with full context.

nemanjai added inline comments.Oct 2 2019, 7:14 PM
llvm/lib/Target/PowerPC/PPC.td
193

I stand corrected. That is the use case I requested. Amy @amyk please abandon this patch for now as this has negative implications for kernel builds.

amyk abandoned this revision.Oct 3 2019, 9:50 PM

Thanks @jsji and @nemanjai for your insights on this patch. I will abandon this for now.