Compilers at a fruit company
- User Since
- Sep 9 2013, 3:45 AM (271 w, 2 d)
Thu, Nov 15
So these extends are created by the legalizer, correct? If so, it seems ok to me as that's what these artifact combines were intended for.
Mon, Nov 5
Added comment for G_BUILD_VECTOR_TRUNC and tests for the verifier. I haven't added all the different combinations because we can only write one test per file, so I tested the most useful of the verifier errors.
Updated patch to introduce a separate G_BUILD_VECTOR_TRUNC opcode.
Fri, Nov 2
As long as <4 x s32> = the G_BUILD_VECTOR could be legalized I think the worst that will happen is that we just get poor codegen.
Also, just throwing that here, I haven't really thought about it: should we rethink our legality model to check patterns instead of one instruction at a time?
I.e., maybea(s8) = ...
is not legal buta(s8) = ... b(<4 x s8>) = build_vector a(s8)
I think the issue of sequences becoming illegal because of things like copies being introduced is still a deal breaker for that, and it would also introduce a code motion barrier since we can't select sequences spanning blocks. E.g. it wouldn't be safe to move the insn in the predecessor due to other uses.
This actually ties back to one question that I don't think we answered. Do we legalize on the type (i.e., s8 to s32) or the size and type can be different (i.e., s8(8b) to s8(32b))?
It seems to me we are trying to write <4 x s8> = s8(32b), ...., but maybe we want <4 x s8> = s32, ....
I've had a think about what having separate type and container sizes would actually mean. For brevity, let’s call values which have distinct type and container sizes “partial values”. A partial value s8:s32 means that the upper 24 bits of this 32 bit register is always undef. For some operations, it should always be possible to ignore the type size and just use the container size for selection. E.g. non-flag setting G_ADD could select a 32 bit add and satisfy the semantics of an s8 add. But for other operations like shifts, or count-leading-zeros, we can't use a wider operation type, so the entire chain of partial value operations would be restricted to selecting the narrower type, rather than the container. And at that point, the container type may as well not be there. There may be a way around this issue I haven't seen though.
Anyhow, to be concrete, I am fine with the implicit truncate, we know this is not too bad based on SDISel experience, I want to make sure we consider all our options here.
Sure. Given that we already have this implicit truncating behavior on account of G_MERGE being an omnipotent operation, should we allow the implicit truncation with G_BUILD_VECTOR for now? If we have problems with it later we can revisit the idea of adding a separate opcode/something else.
Tue, Oct 30
LGTM with the nits addressed.
Mon, Oct 29
Updated AArch64 legalization. I've removed the unsupported case since we're actually making it valid.
Updated patch to allow G_BUILD_VECTOR to have scalar operands not identical to the result vector element types. This allows implicit truncation.
Fri, Oct 26
Thu, Oct 25
Committed in r345265.
Wed, Oct 24
+ @huntergr On a tangential note, G_BUILD_VECTOR for expressing broadcasts of scalars won't work for SVE and other variable vector width targets. We do however still leave the door open here for a new G_BROADCAST opcode in future.
Tue, Oct 23
Oct 15 2018
Thanks for working on this. Apart from Daniel's suggestion, I have a few more comments but haven't done an in depth analysis of the memory costs/algorithmic properties.
Oct 11 2018
Oct 9 2018
Oct 1 2018
Sep 20 2018
@aditya_nandakumar I've added a target hook that defaults to enabling this.
This seems like something that most targets would want done by default. I wonder if we should have a standard set of legalizer actions that targets can use. LGTM anyway.
Sep 17 2018
This isn't really ideal because of how trivial it is currently. The issue is that if we added the capability to InstSimplify (which is where it would fit most naturally), we'd still end up with the problem of running the whole of InstSimplify in the GlobalISel pipeline. It's reasonable for now I think so I'll commit with the changes requested.
Looks ok, if a bit cumbersome with the additional state (do we need to hold references to it?). I'd like to see how this state is used with the CSE builder first though.
Sep 13 2018
Ok, I'm surprised that would be a realistic scenario but I'll make the change. Thanks.
Sep 12 2018
You're right, that's a simplified expression.
Sep 11 2018
Sep 8 2018
Sep 5 2018
Sep 4 2018
Ok, to me it seems like instructions for which fast math flags can't apply shouldn't be classed as FPMathOperators. insertelement/extractelement are just moving raw bits. If we disallowed those instruction types in the FPMathOperator classof() then this problem should go away, and we don't incorrectly relax FP semantics like in the case I pointed out.
Aug 31 2018
I’m not seeing how FP instructions can not carry flags. The IR semantics are defined to be strict FP unless explicitly relaxed.
Hit submit too early:
So with the above code B will not have the flag cleared? If that's the case it doesn't seem right to me.
So with this change, if you have:
Aug 30 2018
So overall I think we should have some form of IR canonicalization before translation for the majority of cases to be handled for -O0.
It can't do that all the time, with these G_ICMPs for example it also has to swap the predicate (not that we even have imported patterns that could possibly match for AArch64).
Aug 29 2018
I'm not against a separate IR canonicalization stage, even though it's a bit overkill for this particular case. At the moment it looks like InstCombine is doing the canonicalizaton for this case, so re-running that is out of the question. A new pass looks to be on the cards. Anyone else have opinions on this?
Aug 28 2018
I'd like to get this in. LGTM but needs one issue addressed.
x86 is still using UADDE, and it's generated only by IRTranslator, so if you do this then x86 will lose support for compiling llvm.uadd.with.overflow intrinsics. @igorb what do you think?
Aug 21 2018
Looks like Lang added the original code, but LGTM anyway. I even think we did this exact change downstream a few years ago at ARM...
Aug 15 2018
Aug 14 2018
I'll take over and fix this in the way I suggested. Thanks for reporting this!
Aug 10 2018
I'm a little uneasy about specifying libm in the names of the intrinsics. The langref mentions libm in the description, but the semantics don't exactly match due to ignoring errno. That said, I don't really have a suggestion for a better name, so the only thing I can ask is that we make it a bit more explicit in the documentation what we mean by "LIBM" here.
Aug 8 2018
Aug 7 2018
Thinking about this more, I can't think of any reason why we'd ever want to add offsets to an existing offset list. It would probably be better if valueIsSpit cleared the vector (and the doc comment made this clear).
Could you upload with more context please, and a test case?
Jul 31 2018
Jul 30 2018
Jul 29 2018
Thanks for taking this on. I think some x86 tests would also be good here as there evidently isn't much coverage at the moment.
Jul 26 2018
Jul 25 2018
Jul 3 2018
Jul 2 2018
Jun 24 2018
Jun 20 2018
Hi Daniel, sorry for the delay.