User Details
- User Since
- Jan 23 2015, 9:38 AM (312 w, 1 d)
Thu, Jan 14
Test case is missing. Please add it.
LGTM.
I think it would be useful to run a functional test with a toolchain that has these library functions. That shouldn't gate this change though.
Added Eric as the owner of debug info.
Added Eric as the owner of debug info.
Wed, Jan 13
This is still incorrect. The indices for the hi/low words are backwards. You can easily demonstrate this with a test case such as:
a.c: vector double test() { return (vector double)4.23422e12; }
Thanks for addressing my comments and I'm sorry I didn't get a chance to re-review prior to commit. Looks good now.
If all the values are in GPR's, the code produced with this patch:
mtvsrdd 34, 4, 3 mtvsrdd 35, 6, 5 vpkudum 2, 3, 2 mtvsrdd 35, 8, 7 mtvsrdd 36, 10, 9 vpkudum 3, 4, 3 vpkuwum 2, 3, 2
is certainly better than the naive code we currently produce. But I don't think we should be doing the merging/packing in the vector domain because (at least on P9) we get half the dispatch width and the permute operations potentially have a higher latency. Furthermore, there is a potential of increasing vector register pressure with this approach which is probably not ideal. I think that for the basic case (where all values are in GPR's) we should simply add a pattern in the .td file that does something like this (similar to what we did for the wider elements):
rlwimi 3, 4, ... # merge r3 and r4 rlwimi 5, 6, ... # merge r5 and r6 rlwimi 7, 8, ... # merge r7 and r8 rlwimi 9, 10, ... # merge r9 and r10 rldimi 3, 5, ... # merge r3, r4, r5, r6 rldimi 7, 9, ... # merge r7, r8, r9, r10 mtvsrdd 34, 3, 7
For 32-bit mode, we can't really do the merging to doublewords in GPR's but I think they can be moved to VSR's after the word merges and then merged with a single vpkuwum.
Tue, Jan 12
Please mention the commit that added this support initially (and maybe to one or two other targets) to the commit message.
LGTM for PPC. Thanks for this improvement.
Mon, Jan 11
@abebeos Unfortunately there is no method for us to backport this to LLVM 10.x. Older releases are not maintained. There is typically a single point release for bug fixes after every major release and we have unfortunately missed the deadline for 11.0.1 with this so it will only be in 12.0.0.
Is it possible to maintain this patch as a necessary fix for LLVM 10 in Numba?
Thanks for doing this. I think it is useful for test cases to fail if a check prefix is unused (or becomes unused). LGTM.
Fri, Jan 8
Kind of bizarre that we missed these, thanks for adding them. LGTM.
Historically, the letter prefixes were always stripped out because AIX and Linux assemblers don't accept them. The mentioned patch added the ability to produce prefixes with percent sign and letter which is actually accepted by the Linux assembler (GNU as). It was thought (not known) at the time that AIX does not accept this %<letter> syntax either so the guard for AIX was kept. If you can confirm that the AIX system assembler accepts this syntax, then this is fine.
Tue, Jan 5
LGTM.
It seems like this patch depends on another patch. Please don't forget to add the dependencies to the Phabricator reviews.
Mon, Jan 4
I don't really have any objections to this, but if there is a similar options to GCC, I would hope that they are the same. Please add a comment in the commit message to that end.
Other than that, LGTM.
Tue, Dec 29
LGTM with a minor nit.
LGTM. Thanks.
Mon, Dec 28
I understand that the back end only needs Altivec to support the operations, but enabling this on non-vsx subtargets is different from what GCC does. I don't see a good reason for us to depart from that.
Hmm... I think it is very surprising if DAG.getConstantPool() does not return ConstantPoolSDNode. I am not in favour of any design that violates such seemingly reasonable assumptions.
Please explain in the description of the review (and subsequent commit message) why custom lowering is needed. I suppose for SELECT_CC, it is needed because using setOperationAction() uses the result type which is only guaranteed to match the true/false value but not the comparison types. However, it is not immediately obvious to me why we need custom lowering for BR_CC.
Tue, Dec 22
Which ones? I don't see them listed in "Parent Revisions"
Mon, Dec 21
Seems that conversion diagnostic test cases are completely missing.
LGTM.
Thank you so much for fixing this and I'm sorry I didn't get around to going over it sooner.
LGTM aside from a minor nit.
I have added a number of comments to code that isn't part of this review which may seem odd. However, you are adding a fair amount of code to a function that is already very long/complex and rather than doing so, I'd prefer that you refactor the original function so the new code integrates more seamlessly.
Dec 17 2020
Just my general $0.02 regarding this refactoring effort...
LGTM.
I am really sorry that I've missed these notifications. The PPC changes are perfectly fine. This might lead to regressions if the shift amount is large enough to require more than a single instruction to produce each constant (i.e. the constant no longer fits in a 16-bit signed immediate). But there's work underway to improve constant materialization in the back end that will alleviate this problem.
Dec 15 2020
Dec 14 2020
Dec 11 2020
This is not functionally correct.
LGTM.
Most of this will be going away soon since we're refactoring how we select memory access instructions, but that's fine.
Dec 8 2020
Dec 2 2020
Dec 1 2020
My comments are all cosmetic, feel free to address them on the commit. LGTM otherwise.
This seems problematic to me for a few reasons:
- There is no 32-bit toolchains or libraries for little endian Linux systems
- There is no support in the ELFv2 ABI for 32-bit object mode and there may be a number of places we assume that little endian systems use ELFv2 ABI
- There are a number of places that make the assumption that little endian systems must be 64-bit so there's no checking for 32-bit mode (which will likely result in 64-bit specific code gen)
I think Baptiste's suggestion to add the expression prior to each pattern is a good idea.
LGTM as long as the test case is updated.
Nov 24 2020
I don't think adding an option to GNU ld is appropriate because it does the slow thing by default. Stefan is working on some test cases to illustrate GNU ld's behaviour so I think we should start with what it does and see whether we want to match behaviour and whether we want to do so under option control.
I don't really see it as an improvement in usability to have to post-process object files, potentially introducing another possible point of failure vs. providing an option to the linker.
(In addition, binutils does not have --add-missing-tls-reloc.)
I think that a departure from option compatibility is reasonable here if we can provide an option that will:
- Allow us to handle the missing relocation when the option is specified
- Not introduce any performance overhead when the option is not specified (i.e. the default behaviour)
LGTM.
LGTM. This seems like a reasonable solution.
Nov 20 2020
LGTM. Thanks for commoning this.
It would appear to me that the only test case changes are that the libcalls were turned into tail calls - but of course my knowledge of X86 and ARM asm is next to zero. Presumably the original lowering didn't set the necessary flags on the call and now the legalizer does so.
The patch LGTM, but I really think the test case should change to more clearly convey the behaviour you desire to test.
Nov 19 2020
I know that Simon mentioned that perhaps we should limit this to cast operations but TBH, I think there may be value in making this general and passing in the SDNode itself rather than opcode/type(s). Then all of this logic can be greatly simplified into something like this:
void SelectionDAGLegalize::LegalizeOp(SDNode *Node) { LLVM_DEBUG(dbgs() << "\nLegalizing: "; Node->dump(&DAG)); if (TLI.opRequiresLibCall(Node)) ConvertNodeToLibcall(Node);
And presumably, there would be another function for the TLI to provide the RTLIB::Libcall for the specific SDNode.
Nov 5 2020
This change is fine as it is. However, since we don't model the VSCR and we mark saturating intrinsics as having no sideeffects, we can miscompile wrt. this instruction/intrinsic.
Here is an example where we schedule a saturating instruction before the mfvsrc where the user clearly wanted it after:
vector int test(vector int a, vector int b, vector int aa, vector signed short *__restrict FromVSCR) { vector int c = __builtin_altivec_vsumsws(a, b);
Nov 3 2020
LGTM.
Oct 27 2020
Oct 23 2020
Oct 15 2020
Forgot to accept.
LGTM. Let's reduce this until we can add the option to limit the number of threads for LLD into lit configs.
Oct 8 2020
Oct 7 2020
LGTM. Nice to have this cleaned up. Thanks.
LGTM. Checking for overlap rather than simply the same register should be the right thing to do. However, I wonder if we might have other places where a similar assumption is made that could become a problem since we presumably only mark the subregister as an implicit def.
Oct 5 2020
Hi @gribozavr do you think we can do something about this test?
Sep 29 2020
LGTM. The code here is correct, but I do have a couple of questions:
- Do these copies come up in real code? And if so, under which circumstances. We don't have copies in the other direction implemented, so that makes me wonder when these ever come up.
- Does this solve a problem seen in real code? If so, is there a bugzilla report? At the very least, the conditions under which this causes a problem should be added to the description.
- Is there any follow up required/planned for this?
Sep 24 2020
This LGTM. My comment isn't really meant to require you to change the test, just to indicate that there are a lot of moving parts that are very difficult to account for completely. So if update_llc_test_checks.py works, you're better off just using that.
The remaining requests can be fulfilled when committing. I don't think this requires another review. Thanks.
This is not what we want. The builtin behaves correctly. It is equivalent to the generic __builtin_copysign and it would be very surprising to a user if it reverses the operands depending on which version you use. What is implemented incorrectly is vec_cpsgn so that's what should change.
This clearly changes behaviour and should thereby not have the [NFC] tag.
I think the code being added here (and perhaps the existing code as well) requires deep expertise in both PowerPC ISA and Exegesis. But I think with more descriptive comments, we can make it consumable for those that only check one of those boxes. So most of my comments are requests to add more descriptive/explanatory comments.
Sep 23 2020
LGTM. Thanks for your patience and for addressing all the comments.
Sep 22 2020
The commit that mentions this (https://reviews.llvm.org/rGf1746be6667) adds the cleanup. Please have a look at it and if it isn't the right way to clean up, modify the fix as needed. I had to make sure this doesn't continue to build up queues as it was causing build bot failures.
It turns out that the culprit for the PPC bot failures is actually https://reviews.llvm.org/rG144e57fc9535
But this just took a while to manifest.
The test case added in this patch creates System V message queues that it never cleans up. Why is that? We now have this failing on a machine that runs a lot of our bots. Presumably it will start failing on any machine that runs bots after enough runs.
Sep 21 2020
The nits can be addressed when committing the code. LGTM otherwise.