This is an archive of the discontinued LLVM Phabricator instance.

Implement rudimentary support for the PowerPC SPE APU
AbandonedPublic

Authored by jhibbits on Oct 10 2017, 8:32 PM.

Details

Summary

This adds support for the Signal Processing Engine, a non-standard FPU
and vector unit on some PowerPC cores (Freescale e500, maybe others).

Currently supports:

  • asm parsing and printing for almost all instructions
  • Code generation following the SPE ABI at the llvm IR level (function call)
  • Single- and Double-precision math at the level supported by the APU, including conversion between precisions and with integers
  • Support for some vector operations (float math, some integer math)

Currently not fully compliant with the SPEPIM; anything that takes a
__ev64_opaque argument in intrinsics instead takes a v2i32.

Along with this, add the Freescale e500 scheduler, which is slightly
different from the e500mc scheduler.

Event Timeline

chmeee created this revision.Oct 10 2017, 8:32 PM

Tests are in the works right now. Until they're ready, this is a preliminary review, to see if I did anything unnecessary, or if I'm missing anything obvious.

chmeee updated this revision to Diff 119542.Oct 18 2017, 8:20 PM

Add tests, add intrinsics (required for one test). Fix several bugs found via tests.

chmeee updated this revision to Diff 120760.Oct 29 2017, 6:17 PM

Add patterns for intrinsics for codegen, and fix the intrinsic definitions.

chmeee edited the summary of this revision. (Show Details)Oct 29 2017, 6:22 PM

Adding some more powerpc regulars in hopes of getting review. Feel free to remove yourself if you're not interested.

chmeee updated this revision to Diff 124311.Nov 26 2017, 1:25 PM

Correct e500 instruction scheduling.

nemanjai edited edge metadata.Nov 28 2017, 4:12 PM

Seems like this patch should include more tests:

  • How about disassembly? (maybe test/MC/Disassembler/PowerPC/ppc64-encoding-e500.txt)
  • It doesn't seem that the single CodeGen test case has enough coverage to exercise every single one of the patterns
  • How about FastISEL test cases since you added FastISEL support?
lib/Target/PowerPC/PPCFastISel.cpp
558

Debugging artifact left behind?

660–663

The code above uses the ternary operator and is more concise and readable. Could you use it here as well?

812

It seems very strange to me that we don't need to modify this at all. The SPE comparisons seem to always set bit 1 of the respective CR field (leaving the other 3 bits undefined). As such, won't we have to always add PPC::PRED_GT rather than PPCPred?

Or am I missing something?

If my assumption is correct here and the predicate needs to change, I would actually imagine it would be better if we define a new predicate - say PPC::PRED_SPE for this. The reason is that we have transformations that will convert one predicate into another and we certainly don't want to use undefined CR bits for SPE.

1254–1260

I think just using the ternary operator here is more readable.

2008

Nit: variables start with capitals.

2009

This on the other hand, I think looks a little messy as a ternary operator now. I think it would be more readable as an if.

2020–2021

Similarly here - the ternary operator looks messy.

lib/Target/PowerPC/PPCFrameLowering.cpp
178

Nit: complete sentences in comments.

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
2475

Is the reason for this change to ensure that we don't enter the block when the value type of the LHS is an SPE vector? If so, wouldn't it be clearer to actually check for that (i.e. that the LHS is a vector type that is not an SPE vector type on an SPE target)?

I guess what I'm getting at is whether it is possible to end up in this block if the LHS has an SPE vector type if the target happens to have both Altivec and SPE (if that's even possible).

lib/Target/PowerPC/PPCInstrInfo.td
1217

It seems like this might be a bit fragile. With this approach, we have to ensure that future FP patterns we add have this predicate so they don't break SPE.

I'm not sure how we can do this more reliably though. Is there any precedent for this on other targets?
I suppose we could use the AddedComplexity hack to ensure that when the SPE feature is available, we ensure we use the patterns for SPE. That seems more hacky, but less fragile as none of the patterns we add in the future will be selected over the SPE patterns so we shouldn't break SPE.

3619

Why does this block need to be in this file rather than PPCInstrSPE.td?

3862

This is part of the reason I don't like the HasTraditionalFPU solution. It is not clear or obvious why we don't need that pattern here.

lib/Target/PowerPC/PPCInstrSPE.td
14–61

I am really not reviewing the changes to this file as it would be too time-consuming to cross-reference this to the ISA. I assume there's some thorough functional testing for this enablement on a processor with SPE.

test/MC/PowerPC/ppc64-encoding-spe.s
1–2

These two lines have no actual change? Just an extra space? If so, please refrain from that.

chmeee marked 8 inline comments as done.Nov 28 2017, 7:44 PM

Thanks for the review. It is in need of more tests, I know.

  • I can add to the Disassemble tests. There were not disassemble tests for the existing SPE support, so I missed that section.
  • What more CodeGen tests are needed? I know what I have is not exhaustive for error cases or edge cases. Some code paths couldn't be tested with simple tests (efdabs doesn't get generated with function calls, so I have to get creative, and haven't found anything easy for it)
  • I'll see what I can add for the fast isel tests.
lib/Target/PowerPC/PPCFastISel.cpp
558

Yeah. I don't even remember what I was debugging with it now.

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
2475

Since the opcodes overlap it is impossible to have both SPE and Altivec on the same target.

lib/Target/PowerPC/PPCInstrInfo.td
1217

I admit this is really fragile, but I couldn't think of a better way. What's the AddedComplexity hack?

3619

To be honest, I don't recall. I had added it to PPCInstrSPE.td and ran into problems, but it magically worked in here. I'll have to try that again.

test/MC/PowerPC/ppc64-encoding-spe.s
1–2

I think my editor screwed this up, I didn't intentionally change it. Fixing.

Thanks for the review. It is in need of more tests, I know.

  • I can add to the Disassemble tests. There were not disassemble tests for the existing SPE support, so I missed that section.

OK. Cool, that would be good.

  • What more CodeGen tests are needed? I know what I have is not exhaustive for error cases or edge cases. Some code paths couldn't be tested with simple tests (efdabs doesn't get generated with function calls, so I have to get creative, and haven't found anything easy for it)
  • There should be a test case for every instruction that can be matched with a pattern (including those that are matched to intrinsics which appear to be numerous)
  • There should also be a test case for each instruction that is emitted through custom logic - things like the various setcc/selectcc patterns, etc.
  • I'll see what I can add for the fast isel tests.

Yeah, pretty much anything you're emitting through FISel (IIRC you have loads, stores, compares, etc.)

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
2475

OK. Fair enough.

lib/Target/PowerPC/PPCInstrInfo.td
1217

On targets that have VSX, we prefer to use VSX opcodes for scalar floating point operations. To ensure we don't select VSX patterns when we don't have VSX, we use a predicate. And to ensure we select the VSX pattern rather than an FPU pattern, we wrap the patterns in let AddedComplexity = 400 blocks. The selector will always choose the highest complexity pattern when multiple patterns produce a match.
Of course, it's a hack, but it has an obvious advantage over this approach - when someone adds a pattern, they'll add a test case to ensure that pattern is matched, they won't necessarily think to add a test case to ensure their pattern isn't matched on subtargets that have some other feature enabled.

Ideally, the selector would use the fact that the output pattern has operands with registers that aren't available on the target. Perhaps there's a way to do that - maybe @kparzysz would know.

chmeee marked 2 inline comments as done.Nov 30 2017, 11:50 AM
chmeee added inline comments.
lib/Target/PowerPC/PPCInstrInfo.td
1217

There is one other problem with using the AddedComplexity hack: Not all "traditional" FPU constructs are available in the SPE, so if it doesn't match an SPE operation it would still end up falling back to the FPU constructs, which obviously wouldn't work.

nemanjai added inline comments.Nov 30 2017, 12:39 PM
lib/Target/PowerPC/PPCInstrInfo.td
1217

Sure, but if such a construct made it to ISEL with your current solution, you'd get a failure to select. I suppose a compile failure is better than SIGILL at runtime, but it is still not the desired behaviour.

I think that any operations that aren't available in SPE should not be marked legal. Furthermore, I think it would be a good idea to add an assert somewhere (perhaps the asm streamer) that would trip if you ever allocate an FPR on a subtarget that has SPE.

kparzysz added inline comments.Dec 5 2017, 8:25 AM
lib/Target/PowerPC/PPCInstrInfo.td
1217

There is no explicit association between registers and targets, so the instruction selector does not have that knowledge. I cannot think of any elegant solution to this. I suspect that the only viable approach is to always assume the most restricted target and include any extra features under predicates. This means that if you add support for a target that is more restricted than everything else so far, then that "everything" now becomes "extensions" to the new "most restricted baseline".

What could be a possibility is that the DAG operations that are not available on SPE are replaced with special SPE-specific ISD opcodes (during DAG preprocessing), so that during pattern matching they could only match the SPE-specific patterns (which would have to be provided for each such operation).

hfinkel added inline comments.Dec 11 2017, 7:16 PM
lib/Target/PowerPC/PPCCallingConv.td
187

Please update this comment for SPE.

lib/Target/PowerPC/PPCFastISel.cpp
583

Remove spaces to line up break on this line with the break on the previous line.

812

I agree with @nemanjai, I think that we need a new predicate value to handle this kind of usage of the CR bits. efscmpeq and friends seem to just set bit 1 of the CR register (and, moreover, leave the others explicitly undefined).

996

Line is too long.

1234

You should do this.

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
2475

If you do it this way, you'll break the QPX check below (which is also mutually-exclusive with Altivec). Why can't you just add the SPE check to the QPX check below?

test/CodeGen/PowerPC/spe.ll
4

Please also add tests for fast-isel, spill/restore (you can use inline asm to force spilling), inline asm register constraints, full coverage for all fcmp operations. Not all of these tests actually check their output, please fix that.

316

Why are these commented out?

499

Why are these commented out?

chmeee marked 4 inline comments as done.Dec 15 2017, 12:43 PM
chmeee added inline comments.
lib/Target/PowerPC/PPCFastISel.cpp
1234

Nope, the comment is wrong, as is the containing 'else' block, so deleting.

chmeee marked 9 inline comments as done.Dec 19 2017, 8:41 AM
chmeee added inline comments.
lib/Target/PowerPC/PPCISelDAGToDAG.cpp
2475

Oops, sorry about that. I'm surprised existing tests didn't pick up on that.

chmeee updated this revision to Diff 128842.Jan 5 2018, 8:32 PM
chmeee marked an inline comment as done.

Address most comments. I think tests are pretty complete now, and all bugs
found have been fixed. Removed selection for setcc on vectors.

I have gotten to PPCInstrInfo.td and wanted to submit the comments rather than delaying the review further until I am completely done with it. I'll continue to review it and add comments over the remainder of the week.
A couple of general notes:

  • In retrospect, the review process would have probably been much quicker and easier if this was split up into a bunch of smaller patches. Perhaps the target features and calling convention in the first, then the new register classes and spill/restore bits, then legalization of small groups of instructions along with the associated intrinsics/builtins/tests, etc.
  • Considering we have a mutually exclusive situation between FPU and SPE and the sheer prevalence of FPU cores and developers adding code for those, it would probably be a good idea to add some checks that we don't break SPE. What I'm thinking would be a good idea is a linear traversal in SDAG post-processing that would ensure that all the MachineSDNodes in the DAG are available on SPE. I think this can simply be done by checking the operand/result register classes to make sure you don't end up with any F4RC or F8RC registers. Of course, this check would only fire on SPE subtargets.
  • Is there a Clang portion to follow? To define builtins for all the intrinsics in C/C++. Or do you not plan to target SPE from C/C++?
lib/Target/PowerPC/MCTargetDesc/PPCPredicates.h
52 ↗(On Diff #128842)

Just a comment stating something like SPE instructions always set the GT bit for comparisons.

lib/Target/PowerPC/PPCCallingConv.td
255

Please add a comment to make it clearer here that we're splitting the CSR list into GPR and FPR since SPE targets don't use FPR's and therefore can't have them in the CSR list.

lib/Target/PowerPC/PPCFastISel.cpp
160

If you've changed all the call sites, I think it'd be good to remove the default arg.

1094

Shouldn't there be a user for this result? I think at this point, we use updateValueMap() to map the instruction to the vreg we are defining with the newly emitted instruction.

lib/Target/PowerPC/PPCFrameLowering.cpp
1752

Seems that we would have to handle SPE registers here, wouldn't we? I don't think we can rely on the ordering of SPE registers with VR registers in order to determine the lowest numbered register that needs to be spilled (and therefore how large the spill area is).

For the GPRC/SPE4RC class, we should be safe since they are classes that have the same registers in them, but SPERC/VRRC do not have the same registers.

1890

Do the SPE register spills require 16-byte alignment? It may very well be so, but I'd prefer that this be explicit in a comment.

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
2478

Why not in a single if statement with an or condition? In any case, the else is redundant.

2508

Nit: unnecessary parentheses.

3051

Not sure if anyone will object to this, but I find that even though the else/else if binds to the right if in situations like this, I prefer to have braces around the block for the outer if.

When reading this, I find it more clear with the braces when the block contains more than 2-3 lines.

3066

Do you think it would make sense to assert here that you have a type that SPE can handle? I know legalization should clean this up, but I think it's a good way to catch any missed legalization here if any bleeds through.

I probably would have done the same thing for the fallback path to PPC::SELECT_CC_VRRC there too, so I think it's no worse for you to leave this as the fallback for SPE.

lib/Target/PowerPC/PPCInstrInfo.cpp
279

Why are PPC::EVLDD and PPC::SPELWZ not here?

986

Can we actually get here? Won't the registers be in GPRCRegClass and be caught above?

lib/Target/PowerPC/PPCRegisterInfo.td
225

Maybe just a quick comment as to what SPEACC is and why it doesn't need to be encoded.

  • Considering we have a mutually exclusive situation between FPU and SPE and the sheer prevalence of FPU cores and developers adding code for those, it would probably be a good idea to add some checks that we don't break SPE. What I'm thinking would be a good idea is a linear traversal in SDAG post-processing that would ensure that all the MachineSDNodes in the DAG are available on SPE. I think this can simply be done by checking the operand/result register classes to make sure you don't end up with any F4RC or F8RC registers. Of course, this check would only fire on SPE subtargets.

Of course, this won't catch late additions such as regclass copies and MachineInstr's we add in various MI-level passes and late pseudo-expansion, so perhaps the better choice would be to catch this in EmitInstruction by getting the MCInstrDesc for the MI and checking the register class there.

chmeee marked 8 inline comments as done.Feb 11 2018, 2:49 PM
chmeee added inline comments.
lib/Target/PowerPC/PPCFastISel.cpp
160

Not all have been changed (line 1172 lacks it).

lib/Target/PowerPC/PPCFrameLowering.cpp
1752

You're right. It's better to split it up. Even though the code itself is identical between the two, it's better to be more explicit of the purpose of the block than to be clever to save the code, particularly considering, as you said, we shouldn't rely on ordering of the register definitions.

1890

Yes, it does require 16-byte alignment. But, as you pointed out above, it's better to split them up to be more explicit.

lib/Target/PowerPC/PPCInstrInfo.cpp
279

Oversight.

chmeee marked 4 inline comments as done.Feb 11 2018, 2:58 PM

Is there a Clang portion to follow? To define builtins for all the intrinsics in C/C++. Or do you not plan to target SPE from C/C++?

Yes, there is a Clang portion to follow, and without the intrinsics it's a trivial patch. I'm waiting on this to be completed and committed before I post that, in case things have to change for it. The trivial no-intrinsics part is done and works. It merges -mspe and -mabi=spe of GCC into a single -mspe, since I'm not splitting the codegen into code and ABI here.

I have a couple of questions (along with a nitpicky inline comment).

The e500 docs from Freescale specify e500v1 which supports basically everything added except for the efd* instructions and the various vector insts; e500v2 is required for efd* inst. / double support. They also have different scheduling models. It looks you're just implementing e500v2, should this be specified at all, or is the e500v1 not needed as a target?

Probably less important: The same manual also mentions that IEEE 754 compliance isn't full in hardware with respect to denorms; Is this handled at the OS level or does some fp exception handler wind up getting linked in for full support?

lib/Target/PowerPC/PPCFastISel.cpp
483

Personally this ternary is (and nearly already was) complicated enough I'd rework it into an if statement. It's not unreadable but takes longer to parse than necessary.

I have a couple of questions (along with a nitpicky inline comment).

The e500 docs from Freescale specify e500v1 which supports basically everything added except for the efd* instructions and the various vector insts; e500v2 is required for efd* inst. / double support. They also have different scheduling models. It looks you're just implementing e500v2, should this be specified at all, or is the e500v1 not needed as a target?

Yes, this is intended only for e500v2, not e500v1. I can't find any current SoC that uses the e500v1 core anywhere, so didn't put much thought into splitting that part out (it could probably be done trivially, just not adding the SPERCRegisterClass in the PPCISelLowering bit). Do you know of any current SoC with e500v1? Additionally, this is to target the powerpcspe target for FreeBSD, which is modeled after the powerpcspe target for Debian, which itself targets e500v2.

Probably less important: The same manual also mentions that IEEE 754 compliance isn't full in hardware with respect to denorms; Is this handled at the OS level or does some fp exception handler wind up getting linked in for full support?

This is expected to be handled at the OS level.

I just did a search on NXP's website, and it appears the last e500v1-based chip went into production around 2001/2002, and had a Longevity-program lifetime of 10 years (MPC8560), superseded by rev 2.0 of the silicon, which used e500v2. So it appears there are no current e500v1-based SoCs in production at this time. e500v2-based SoCs appear to have another 8 years or so, as the latest one went into the longevity program in 2010, with a 15 year product lifetime.

A few more comments inline and a couple of general comments.

  1. Due to the size of this patch, a thorough review is difficult and time-consuming. Furthermore, it is not feasible that anyone will review in detail whether the tests cover all instructions, all intrinsics. I think much of this support would have been upstream if it were done in small incremental patches.
  2. I am still not thrilled that FP ops in general now need a predicate defined in the .td file. There are a couple of issues with doing that. The obvious one is that anyone adding patterns in the future now has to be careful to set the predicate. Of course, this isn't all that likely to happen a lot since any new ISA implementations will have to be guarded by their own predicates. So the problem only really exists if someone is adding new patterns that are to work with FPU instructions on all the CPU's that have it. The second issue with this is a bit more subtle. Namely something like this:
let Predicates = [ PredicateOne ] in {
  // Many many lines of code ...
  let Predicates = [ PredicateTwo ] in {
    // More code
  }
}

If I recall correctly, // More code actually only requires PredicateTwo to be satisfied.

  1. I did not review the instruction definitions or the scheduling info since I don't know anything about SPE and there's a whole lot of code.
  2. I will apply your next update to ToT and try to do some extensive testing on PPC64LE and PPC64BE in case this patch causes unforeseen failures.
lib/Target/PowerPC/PPCFastISel.cpp
813

Is it an option to just update getComparePred() to do the right thing and then not need this conditional op?

lib/Target/PowerPC/PPCISelLowering.cpp
120

I am hoping for us to clean this section up to make it more readable in the future. Namely, I would like for us to minimize the nesting based on target features. As such, is it possible for you to add things that change as a result of the subtarget having SPE into a single section. Perhaps a section in the end that will reset everything that needs to be reset with SPE.

Of course, this may be somewhat controversial, so if you or others don't agree, I suppose it can stay this way until we actually get to the clean-up effort.

346

Sorry, this is a big patch so it's hard to keep track of things. Does SPE have these conversions for the vector types? In any case, in places where you're adding legalization actions for SPE, if v2f32 and v2f32 don't need to be mentioned, perhaps just mention it in a comment.

lib/Target/PowerPC/PPCSubtarget.cpp
147 ↗(On Diff #128842)

So -mcpu=pwr7 -mattr=-fpu will not actually disable FPU. I don't think that's a problem. I'm thinking more along the lines of whether this affects soft-float support, but I've never really looked all that closely into soft-float.

I will attempt to break this up into (slightly) smaller chunks. My current thought is:

  • New instructions, and encoding tests
  • e500 scheduling
  • floating point and ABI
  • intrinsics
  • vector

Each will build upon the previous, but it should be smaller and easier to review. Does this seem reasonable?

Apologies long after the fact, I did intend to start this as only adding the instructions and floating point bits, but it grew as it sat waiting for review in the initial stages.

Is this patch ready to be abandoned now that it has been broken up into smaller patches which were committed?

jhibbits commandeered this revision.Aug 14 2018, 10:38 AM
jhibbits added a reviewer: chmeee.

Yes, yes it is

jhibbits abandoned this revision.Aug 14 2018, 10:39 AM