Page MenuHomePhabricator

[CodeGen] Have CodeGen for fixed-point unsigned with padding emit signed operations.
Needs ReviewPublic

Authored by ebevhan on Jun 26 2020, 9:13 AM.

Details

Summary

The design of unsigned fixed-point with padding did not really
work as originally intended downstream.

The issue with the design is that the concept of the unsigned
padding bit disappears in the transition to IR. On the LLVM
level, there is no padding bit and anything goes with the
operations. This has the unfortunate effect of generating
invalid operations during ISel for operations that a target
should be perfectly capable of selecting for.

For example, for an unsigned saturating _Fract division of
width 16, we emit IR for an i15 udiv.fix.sat. In the legalization
of this operation in ISel, the operand and result are promoted
to i16, and to preserve the saturating behavior, the LHS is
shifted left by 1.

However... This means that we now have a division operation
with a significant value in the LHS MSB. If the target could
select this, there would be no meaning to the padding bit.
Considering that ISel will always promote this due to type
illegality, there's no way around the production of illegal
operations.

This patch changes CodeGen to emit signed operations when
emitting code for unsigned with padding. At least for us
downstream, being able to reuse the signed instructions is
the one of the points of having the padding bit, so this
design seems to align better.

Diff Detail

Unit TestsFailed

TimeTest
7,550 mslinux > libomp.env::Unknown Unit Message ("")
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/ompt /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/env/kmp_set_dispatch_buf.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/env/Output/kmp_set_dispatch_buf.c.tmp -lm -latomic && env KMP_DISP_NUM_BUFFERS=0 /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/env/Output/kmp_set_dispatch_buf.c.tmp
1,320 mslinux > libomp.worksharing/for::Unknown Unit Message ("")
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/ompt /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/worksharing/for/kmp_set_dispatch_buf.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/worksharing/for/Output/kmp_set_dispatch_buf.c.tmp -lm -latomic && /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/worksharing/for/Output/kmp_set_dispatch_buf.c.tmp 7

Event Timeline

ebevhan created this revision.Jun 26 2020, 9:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 26 2020, 9:13 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
ebevhan updated this revision to Diff 273762.Jun 26 2020, 9:46 AM

Fixed some broken CHECK lines.

Why not legalize to the signed operation?

Why not legalize to the signed operation?

My feeling was that it wasn't right do so in LLVM, because LLVM has no notion of the padding bit and therefore doesn't really care about Clang's rationale for doing the legalization that way.
If an illegal udiv.fix.sat needs to be legalized via promotion, it makes more sense to legalize it to another unsigned operation rather than arbitrarily doing it as a signed one.

I guess it could be done in cases where the resulting signed operation was legal and the unsigned one was not, but that isn't testable upstream since none of the operations are legal on upstream targets.

Another point is, if we for example have an i16 umul.fix in legalization, we have no way of knowing in the general case that it is safe to replace it with an smul.fix, since the information that the MSB is not significant does not exist on IR level. This is the 'loss of padding bit' that I'm referring to.

Can the missing bit just be added? It seems to me that frontends ought to be able to emit the obvious intrinsic for the semantic operation here rather than having to second-guess the backend.

Well, it's not so much as adding the bit, but adding the information that the bit exists. That means either new intrinsics for all of the operations, or adding flags to the existing ones. That's a fair bit of added complexity. Also, <signed operation> + <clamp to zero> would do virtually the exact same thing as the new unsigned-with-padding operations, so the utility of adding all of it is a bit questionable.

ebevhan updated this revision to Diff 275657.Jul 6 2020, 4:29 AM

Rebased.

Well, it's not so much as adding the bit, but adding the information that the bit exists. That means either new intrinsics for all of the operations, or adding flags to the existing ones. That's a fair bit of added complexity. Also, <signed operation> + <clamp to zero> would do virtually the exact same thing as the new unsigned-with-padding operations, so the utility of adding all of it is a bit questionable.

Could the work involved just be adding the flags, then in llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp for unsigned operations, choosing between the signed/unsigned underlying ISD when lowering intrinsics to DAG? I think you could just pass the padding bit to FixedPointIntrinsicToOpcode and handle it from there. This is just off the top of my head so I could be missing other things.

I don't think this is necessarily the same as "legalizing" the intrinsic, but this would at least prevent frontends from second-guessing.

clang/include/clang/Basic/FixedPoint.h
66

Is -> If

clang/lib/Basic/FixedPoint.cpp
143–158

If this is exclusively for codegen purposes with binary operations, would it be clearer to move this to EmitFixedPointBinOp? If UnsignedPaddingIsSigned doesn't need to be used for stuff like constant evaluation, it might be clearer not to provide it for everyone.

clang/test/Frontend/fixed_point_add.c
290–294

If this is a workaround for not being able to convey the padding bit to LLVM intrinsics, I think we should only limit changes to instances we would use intrinsics.

Well, it's not so much as adding the bit, but adding the information that the bit exists. That means either new intrinsics for all of the operations, or adding flags to the existing ones. That's a fair bit of added complexity. Also, <signed operation> + <clamp to zero> would do virtually the exact same thing as the new unsigned-with-padding operations, so the utility of adding all of it is a bit questionable.

Could the work involved just be adding the flags, then in llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp for unsigned operations, choosing between the signed/unsigned underlying ISD when lowering intrinsics to DAG? I think you could just pass the padding bit to FixedPointIntrinsicToOpcode and handle it from there. This is just off the top of my head so I could be missing other things.

It wouldn't just be restricted to fixed-point intrinsics, though. It would have to be added to intrinsics like uadd.sat and usub.sat as well, which aren't really tied to fixed-point at all. Changing the semantics of those intrinsics would be unfortunate for targets that have started using them for their own instructions. I don't really think it's an option to move the padding semantic info into the IR; the intrinsic interface is fairly lean, and I think keeping it that way is a good idea.

I could change the emission to not be so heavy-handed and only use signed operations for the intrinsics, rather than everything. That makes EmitFixedPointBinOp a bit messier, though.

clang/lib/Basic/FixedPoint.cpp
143–158

FixedPointSemantics is immutable except for saturation, unfortunately. I'd end up having to reconstruct the semantics object from scratch immediately after calling getCommonSemantics.

clang/test/Frontend/fixed_point_add.c
290–294

I suppose this makes sense, but the logic will be a bit more convoluted in that case.

It is true that in most cases, the clamp-to-zero resulting from the signed->unsigned conversion at the end isn't even necessary. For addition, multiplication, division and shift, the result of positive operands can never become negative, so there's no point to the clamp.

It just felt more general to do it for all of them instead of littering EmitFixedPointBinOp with special cases. But perhaps it would be better to deal with each case individually instead. Still feels like that would make the implementation less clean.

It wouldn't just be restricted to fixed-point intrinsics, though. It would have to be added to intrinsics like uadd.sat and usub.sat as well, which aren't really tied to fixed-point at all.

Oh wait, sorry. I think I'm starting to understand now. You're saying that if you're using the padding bit in the first place, ISel shouldn't need to perform the underlying shift during integral promotions, but we do it anyway still. Yeah it seems a lot of this could be addressed simply by just using the corresponding signed intrinsics.

I guess I'd be ok with purely making this a clang change for now, but if other frontends see interest in the unsigned padding bit then we could migrate this to LLVM down the line.

clang/lib/Basic/FixedPoint.cpp
143–158

Fair.

Minor nit: Could you rename the parameter to UnsignedPaddingIsSignedForCG? Just want to make it clearer that this should specifically be used for codegen only.

Would it be sensible to use a technical design more like what the matrix folks are doing, where LLVM provides a small interface for emitting operations with various semantics? FixedPointSemantics would move to that header, and Clang would just call into it. That way you get a lot more flexibility in how you generate code, and the Clang IRGen logic is still transparently correct. If you want to add intrinsics or otherwise change the IR patterns used for various operations, you don't have to rewrite a bunch of Clang IRGen logic every time, you just have to update the tests. It'd then be pretty straightforward to have internal helper functions in that interface for computing things like whether you should use signed or unsigned intrinsics given the desired FixedPointSemantics.

My interest here is mainly in (1) keeping IRGen's logic as obviously correct as possible, (2) not hard-coding a bunch of things that really feel like workarounds for backend limitations, and (3) not complicating core abstractions like FixedPointSemantics with unnecessary extra rules for appropriate use, like having to pass an extra "for codegen" flag to get optimal codegen. If IRGen can just pass down the high-level semantics it wants to some library that will make intelligent decisions about how to emit IR, that seems best.

Would it be sensible to use a technical design more like what the matrix folks are doing, where LLVM provides a small interface for emitting operations with various semantics? FixedPointSemantics would move to that header, and Clang would just call into it. That way you get a lot more flexibility in how you generate code, and the Clang IRGen logic is still transparently correct. If you want to add intrinsics or otherwise change the IR patterns used for various operations, you don't have to rewrite a bunch of Clang IRGen logic every time, you just have to update the tests. It'd then be pretty straightforward to have internal helper functions in that interface for computing things like whether you should use signed or unsigned intrinsics given the desired FixedPointSemantics.

This seems like a reasonable thing to do for other reasons as well. Also moving the actual APFixedPoint class to LLVM would make it easier to reuse the fixedpoint calculation code for constant folding in LLVM, for example.

My interest here is mainly in (1) keeping IRGen's logic as obviously correct as possible, (2) not hard-coding a bunch of things that really feel like workarounds for backend limitations, and (3) not complicating core abstractions like FixedPointSemantics with unnecessary extra rules for appropriate use, like having to pass an extra "for codegen" flag to get optimal codegen. If IRGen can just pass down the high-level semantics it wants to some library that will make intelligent decisions about how to emit IR, that seems best.

Just to clarify something here; would the interface in LLVM still emit signed operations for unsigned with padding? If so, why does dealing with the padding bit detail in LLVM rather than Clang make more sense? The regular IRBuilder is relatively straightforward in its behavior. I suspect that if anything, LLVM would be equally unwilling to take to take IRBuilder patches that emitted signed intrinsics for certain unsigned operations only due to a detail in Embedded-C's implementation of fixedpoint support.

I could remove the special behavior from FixedPointSemantics and only deal with it in EmitFixedPointBinOp instead. I agree that the FixedPointSemantics interface is muddied by the extra parameter.
Unless I alter the semantics object it might make EmitFixedPointBinOp rather messy, though.


Regarding backend limitations, I guess I could propose an alternate solution. If we change FixedPointSemantics to strip the padding bit for both saturating and nonsaturating operations, it may be possible to detect in isel that the corresponding signed operation could be used instead when we promote the type of an unsigned one. For example, if we emit i15 umul.fix scale 15, we could tell in lowering that i16 smul.fix scale 15 is legal and use that instead. Same for all the other intrinsics, including the non-fixedpoint uadd.sat/usub.sat.

The issue with this approach (which is why I didn't really want to do it) is that it's not testable. No upstream target has these intrinsics marked as legal. I doubt anyone would accept a patch with no tests.
It may also be less efficient than just emitting the signed operations in the first place, because we are forced to trunc and zext in IR before and after every operation.

Would it be sensible to use a technical design more like what the matrix folks are doing, where LLVM provides a small interface for emitting operations with various semantics? FixedPointSemantics would move to that header, and Clang would just call into it. That way you get a lot more flexibility in how you generate code, and the Clang IRGen logic is still transparently correct. If you want to add intrinsics or otherwise change the IR patterns used for various operations, you don't have to rewrite a bunch of Clang IRGen logic every time, you just have to update the tests. It'd then be pretty straightforward to have internal helper functions in that interface for computing things like whether you should use signed or unsigned intrinsics given the desired FixedPointSemantics.

This seems like a reasonable thing to do for other reasons as well. Also moving the actual APFixedPoint class to LLVM would make it easier to reuse the fixedpoint calculation code for constant folding in LLVM, for example.

Just to say "I told you so", I'm pretty sure I told people this would happen. :)

My interest here is mainly in (1) keeping IRGen's logic as obviously correct as possible, (2) not hard-coding a bunch of things that really feel like workarounds for backend limitations, and (3) not complicating core abstractions like FixedPointSemantics with unnecessary extra rules for appropriate use, like having to pass an extra "for codegen" flag to get optimal codegen. If IRGen can just pass down the high-level semantics it wants to some library that will make intelligent decisions about how to emit IR, that seems best.

Just to clarify something here; would the interface in LLVM still emit signed operations for unsigned with padding?

If that's the best IR pattern to emit, yes.

If so, why does dealing with the padding bit detail in LLVM rather than Clang make more sense?

Because frontends should be able to just say "I have a value of a type with these semantics, I need you to do these operations, go do them". The whole purpose of this interface would be to go down a level of abstraction by picking the best IR to represent those operations.

Maybe we're not in agreement about what this interface looks like — I'm imagining something like

struct FixedPointEmitter {
  IRBuilder &B;
  FixedPointEmitter(IRBuilder &B) : B(B) {}

  Value *convert(Value *src, FixedPointSemantics srcSemantics, FixedPointSemantics destSemantics);
  Value *add(Value *lhs, FixedPointSemantics lhsSemantics, Value *rhs, FixedPointSemantics rhsSemantics)
};

The regular IRBuilder is relatively straightforward in its behavior. I suspect that if anything, LLVM would be equally unwilling to take to take IRBuilder patches that emitted signed intrinsics for certain unsigned operations only due to a detail in Embedded-C's implementation of fixedpoint support.

Most things in IRBuilder don't have variant representations beyond what's expressed by the value type. The fact that we've chosen to do so here necessitates a more complex interface.

Regarding backend limitations, I guess I could propose an alternate solution. If we change FixedPointSemantics to strip the padding bit for both saturating and nonsaturating operations, it may be possible to detect in isel that the corresponding signed operation could be used instead when we promote the type of an unsigned one. For example, if we emit i15 umul.fix scale 15, we could tell in lowering that i16 smul.fix scale 15 is legal and use that instead. Same for all the other intrinsics, including the non-fixedpoint uadd.sat/usub.sat.

The issue with this approach (which is why I didn't really want to do it) is that it's not testable. No upstream target has these intrinsics marked as legal. I doubt anyone would accept a patch with no tests.
It may also be less efficient than just emitting the signed operations in the first place, because we are forced to trunc and zext in IR before and after every operation.

I don't want to tell you the best IR to use to get good code; I just want frontends to have a reasonably canonical interface to use that matches up well with the information we have.

Would it be sensible to use a technical design more like what the matrix folks are doing, where LLVM provides a small interface for emitting operations with various semantics? FixedPointSemantics would move to that header, and Clang would just call into it. That way you get a lot more flexibility in how you generate code, and the Clang IRGen logic is still transparently correct. If you want to add intrinsics or otherwise change the IR patterns used for various operations, you don't have to rewrite a bunch of Clang IRGen logic every time, you just have to update the tests. It'd then be pretty straightforward to have internal helper functions in that interface for computing things like whether you should use signed or unsigned intrinsics given the desired FixedPointSemantics.

This seems like a reasonable thing to do for other reasons as well. Also moving the actual APFixedPoint class to LLVM would make it easier to reuse the fixedpoint calculation code for constant folding in LLVM, for example.

Just to say "I told you so", I'm pretty sure I told people this would happen. :)

Well, transferring the fixed point concept over to LLVM felt like it would happen sooner or later, for the reasons we've discussed here as well as for other reasons. I'm not sure that the discrepancies between the Clang and LLVM semantics were predicted to be the driving factor behind the move, though.

My interest here is mainly in (1) keeping IRGen's logic as obviously correct as possible, (2) not hard-coding a bunch of things that really feel like workarounds for backend limitations, and (3) not complicating core abstractions like FixedPointSemantics with unnecessary extra rules for appropriate use, like having to pass an extra "for codegen" flag to get optimal codegen. If IRGen can just pass down the high-level semantics it wants to some library that will make intelligent decisions about how to emit IR, that seems best.

Just to clarify something here; would the interface in LLVM still emit signed operations for unsigned with padding?

If that's the best IR pattern to emit, yes.

If so, why does dealing with the padding bit detail in LLVM rather than Clang make more sense?

Because frontends should be able to just say "I have a value of a type with these semantics, I need you to do these operations, go do them". The whole purpose of this interface would be to go down a level of abstraction by picking the best IR to represent those operations.

Maybe we're not in agreement about what this interface looks like — I'm imagining something like

struct FixedPointEmitter {
  IRBuilder &B;
  FixedPointEmitter(IRBuilder &B) : B(B) {}

  Value *convert(Value *src, FixedPointSemantics srcSemantics, FixedPointSemantics destSemantics);
  Value *add(Value *lhs, FixedPointSemantics lhsSemantics, Value *rhs, FixedPointSemantics rhsSemantics)
};

I've spent some time going over this and trying to figure out how it would work. I think the interface seems fine on the surface, but I don't see how it directly solves the issues at hand. Regardless of whether this is factored out to LLVM, we still have the issue that we have to massage the semantic somewhere in order to get different behavior for certain kinds of semantics during binop codegen.

Since the binop functions take two different semantics, it must perform conversions internally to get the values to match up before the operation. This would probably just be to the common semantic between the two, and it would then return the Value in the common semantic (since we don't know what to convert back to).

In order for the binop functions to have special behavior for padded unsigned, they would need to modify the common semantic internally in order to get the conversion right. This means that the semantic of the returned Value will not be what you would normally get from getCommonSemantic, so the caller of the function will have no idea what the semantic of the returned value is.

Even if we only treat it as an internal detail of the binop functions and never expose this 'modified' semantic externally, this means we might end up with superfluous operations since (for padded saturating unsigned) we will be forced to trunc the result by one bit to match the real common semantic before we return.

The only solution I can think of is to also return the semantic of the result Value, which feels like it makes the interface pretty bulky.

I can start off by moving APFixedPoint and FixedPointSemantic to ADT, though. Perhaps I should send an RFC.

Would it be sensible to use a technical design more like what the matrix folks are doing, where LLVM provides a small interface for emitting operations with various semantics? FixedPointSemantics would move to that header, and Clang would just call into it. That way you get a lot more flexibility in how you generate code, and the Clang IRGen logic is still transparently correct. If you want to add intrinsics or otherwise change the IR patterns used for various operations, you don't have to rewrite a bunch of Clang IRGen logic every time, you just have to update the tests. It'd then be pretty straightforward to have internal helper functions in that interface for computing things like whether you should use signed or unsigned intrinsics given the desired FixedPointSemantics.

This seems like a reasonable thing to do for other reasons as well. Also moving the actual APFixedPoint class to LLVM would make it easier to reuse the fixedpoint calculation code for constant folding in LLVM, for example.

Just to say "I told you so", I'm pretty sure I told people this would happen. :)

Well, transferring the fixed point concept over to LLVM felt like it would happen sooner or later, for the reasons we've discussed here as well as for other reasons. I'm not sure that the discrepancies between the Clang and LLVM semantics were predicted to be the driving factor behind the move, though.

My interest here is mainly in (1) keeping IRGen's logic as obviously correct as possible, (2) not hard-coding a bunch of things that really feel like workarounds for backend limitations, and (3) not complicating core abstractions like FixedPointSemantics with unnecessary extra rules for appropriate use, like having to pass an extra "for codegen" flag to get optimal codegen. If IRGen can just pass down the high-level semantics it wants to some library that will make intelligent decisions about how to emit IR, that seems best.

Just to clarify something here; would the interface in LLVM still emit signed operations for unsigned with padding?

If that's the best IR pattern to emit, yes.

If so, why does dealing with the padding bit detail in LLVM rather than Clang make more sense?

Because frontends should be able to just say "I have a value of a type with these semantics, I need you to do these operations, go do them". The whole purpose of this interface would be to go down a level of abstraction by picking the best IR to represent those operations.

Maybe we're not in agreement about what this interface looks like — I'm imagining something like

struct FixedPointEmitter {
  IRBuilder &B;
  FixedPointEmitter(IRBuilder &B) : B(B) {}

  Value *convert(Value *src, FixedPointSemantics srcSemantics, FixedPointSemantics destSemantics);
  Value *add(Value *lhs, FixedPointSemantics lhsSemantics, Value *rhs, FixedPointSemantics rhsSemantics)
};

I've spent some time going over this and trying to figure out how it would work. I think the interface seems fine on the surface, but I don't see how it directly solves the issues at hand. Regardless of whether this is factored out to LLVM, we still have the issue that we have to massage the semantic somewhere in order to get different behavior for certain kinds of semantics during binop codegen.

Since the binop functions take two different semantics, it must perform conversions internally to get the values to match up before the operation. This would probably just be to the common semantic between the two, and it would then return the Value in the common semantic (since we don't know what to convert back to).

In order for the binop functions to have special behavior for padded unsigned, they would need to modify the common semantic internally in order to get the conversion right. This means that the semantic of the returned Value will not be what you would normally get from getCommonSemantic, so the caller of the function will have no idea what the semantic of the returned value is.

Even if we only treat it as an internal detail of the binop functions and never expose this 'modified' semantic externally, this means we might end up with superfluous operations since (for padded saturating unsigned) we will be forced to trunc the result by one bit to match the real common semantic before we return.

The only solution I can think of is to also return the semantic of the result Value, which feels like it makes the interface pretty bulky.

I don't understand. The problem statement as I understood it is that using unsigned intrinsics to do unsigned-with-padding operations is leading to poor code-gen, so you want to start using signed intrinsics, which you can safely do because unsigned-with-padding types are intended to be exactly signed types with a dynamic range restriction to non-negative values. The result of that operation is still logically an unsigned-with-padding value; there's no need to return back a modified semantic that says that the result is really a signed value because it's *not* really a signed value, you're just computing it a different way. I also don't understand why you think need a modified semantics value in the first place as opposed to just using a more complex condition when deciding which intrinsics to use.

ebevhan added a comment.EditedThu, Jul 16, 4:54 AM

I don't understand. The problem statement as I understood it is that using unsigned intrinsics to do unsigned-with-padding operations is leading to poor code-gen, so you want to start using signed intrinsics, which you can safely do because unsigned-with-padding types are intended to be exactly signed types with a dynamic range restriction to non-negative values. The result of that operation is still logically an unsigned-with-padding value; there's no need to return back a modified semantic that says that the result is really a signed value because it's *not* really a signed value, you're just computing it a different way. I also don't understand why you think need a modified semantics value in the first place as opposed to just using a more complex condition when deciding which intrinsics to use.

Well, it's not just about intrinsics. For saturating operations, the common semantic width is currently narrowed by one bit to get the correct saturating behavior on the operation afterwards. This affects the conversion to the common semantic, since the type will be one bit narrower.

But I realize now that it probably doesn't matter if we simply don't do that narrowing when constructing the common semantic. That way it will always have the right width and we can just do as you say and select the right intrinsic based on the padding bit information. Alright, that should probably work.

EDIT: Ah, but this breaks constant evaluation, which expects the common semantic to be narrower to get the right saturation behavior. So then these special cases for codegen also need to be added to the constant evaluation.