Compilers at a fruit company
- User Since
- Sep 9 2013, 3:45 AM (253 w, 6 d)
Tue, Jul 3
Mon, Jul 2
Sun, Jun 24
Jun 20 2018
Hi Daniel, sorry for the delay.
Jun 4 2018
Jun 2 2018
Jun 1 2018
May 31 2018
New patch now only zero-extends for stores, removing the use of getBooleanContents.
May 29 2018
I'll re-do this patch to unconditionally zero-extend for returns and also fix up the G_STORE legalisation to always zero-extend too. Given we don't have any target hooks that give us the information we need, let's go with just matching SelectionDAG behaviour?
I got my branches mixed up, this patch is missing test updates. Will upload a new one soon.
May 28 2018
Overall I'm in favour of this change, so LGTM.
May 27 2018
May 23 2018
May 22 2018
Looks like this change caused http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-aarch64-O0-g/1944/ as well as others.
May 18 2018
This caused a failure in green dragon: http://green.lab.llvm.org/green/job/lldb-xcode/6644
May 16 2018
May 15 2018
I think I've addressed most of the issues. I've tried to clean up the computeValueLLTs function a little bit, still perhaps not ideal. Dead code has been removed (I think an earlier revision of the patch required it for the tests to pass). Also added the tests you wrote, thanks.
May 14 2018
Some review comments not addressed yet, will add a test requested in a later revision.
Thanks. -time-passes shows that IRTranslator is now using around 2MB. It's also taking most of the compile time vs other passes like your initial analysis showed, but we trade off the legalizer doing much less work.
I've made the changes to not generate copies for the insertvalue/extractvalue, but how did you measure the memory consumption of the individual passes? With the change it now produces 4k lines of assembly on your test case, vs 3.5k assembly. The MIR is much larger but that's because of the extra GEPs and G_CONSTANTs. Compile time is now comparable to without this change.
May 10 2018
My thoughts were that we'd take a compile time hit with this, although in the end it was mostly neutral, but the code size would regress as you saw. The idea was to have a pre-legalizer combiner (Aditya had a prototype demo of one to show how the combiner API would work). Either that, or we have a clean up phase at the end of IRTranslator where we eliminate G_INSERT and G_EXTRACT pairs, perhaps with some caching of potentially redundant sets of these instructions due to pack/unpack regs as translation is happening, and then a quick pass to eliminate them without searching the whole function.
May 9 2018
May 8 2018
The actual problem here seems to be superlinear performance with this cutoff value. The 5x I mentioned was inappropriate as that was over a whole compile and compared to an old version of LLVM, before the store merging after legalisation patch for AArch64 was landed in December. The actual problem in my test case, which is admittedly a very large one, is that with the original value of 8192, I get around 330s runtime for my test case, at Max=2048 its 189s, Max=1024 its 34s, Max=512 21s, Max=256 20s. So somewhere between 512 and 1024 we start to see this superlinear compile time.
I think we lost 0.5% on compile time on CTMark because of this. Is there any scope left here for performance improvements? With the addition of the combiner landing later, I expect a further CT hit so anything we can do here would be good.
I ran the test suite and SPEC2000/2006 benchmarks and didn't see any significant change with this.
May 7 2018
Apr 29 2018
Apr 26 2018
Apr 25 2018
Apr 24 2018
Apr 23 2018
Apr 19 2018
Apr 18 2018
I'm assuming this is in Target/AArch64 because other targets haven't been updated to use the new opcodes yet? We do eventually want to use these representations for every target though right?
LGTM, with addition of -verify-machineinstrs to the tests.
Apr 11 2018
The motivating use case for this was actually fixed in clang, r325478. Although this may be useful in future for other front-ends/clients of LLVM, this can be revived if needed later.
Apr 10 2018
Apr 9 2018
Apr 6 2018
AArch64 was the test guinea pig for this new representation, and it's proved to be a smooth transition. If you change the semantics now, even if the intrinsics are still experimental in name, IR from LLVM 6.0 may be silently miscompiled if someone implements a valid optimization based on your new proposal. That's a fact, and I don't think this patch review is the right place to discuss that if you want to do this, I suggest you send a new RFC or revive the old one.
Simon asked this on the PR, let's continue the discussion in one place:
Do we really want to completely ignore an intrinsic argument depending on the fast flags? There might be valid reasons to want to include an accumulation value.
The raison d'être for the argument is for ordered reductions, and the intrinsics were designed to allow the expression of the reduction idiom only, in light of newer vector architectures where the previous representation was inadequate. The use of an accumulator argument for fast reductions wasn't necessary, so the semantics were supposed to be defined in the most minimal form. The accumulator can easily be expressed as a extractelement->op->insertelement sequence. The question here I think becomes our good old friend: what should the canonical form be?
Mar 29 2018
Mar 26 2018
Mar 23 2018
Mar 22 2018
Mar 19 2018
Mar 16 2018
Mar 15 2018
Mar 13 2018
Mar 8 2018
LGTM, can you also make the test name a little more specific, e.g. 'vecreduce-propagate-sd-flags.ll'?
Please add a test.
Mar 1 2018
This part what I wanted to clarify, -mstack-probe-arg is enabling stack probes if the ABI requires it only, not for other reasons like security.
Feb 28 2018
Can we clarify the meaning of this option a bit. The doc you've added here is saying that -mno-stack-arg-probe disables stack probes. Then what does -mstack-arg-probe mean specifically? Does it mean that only stack probes for ABI required reasons are enabled, or probes are done even in cases where the ABI doesn't require them? Either way, the doc needs to be clearer on the exact purpose.
Feb 27 2018
Feb 26 2018
Changed to use errorUnsupported() so we get some diagnostics first.
Feb 21 2018
Feb 19 2018
As stated in PR36345 I'm committing this now to get it into 6.0.