- User Since
- Apr 14 2013, 11:48 AM (427 w, 1 d)
Fri, Jun 18
This looks pretty good to me. One thing that is odd is the extra always-zero parameter to the LoopVarLen insns. This really should be removed.
Mon, Jun 7
I haven't looked at the code in detail yet, but just a couple of general comments:
- The GNU attribute support should probably just go into the generic ELF target streamer. This concept is supported in principle on all ELF targets. (That might also avoid the need for multiple inheritance.)
- If we support the GNU attribute generally in the ELF streamer, then we should also support it generally in the asm streamer (and then also the common asm parser, I guess)
- ASM parser support for .gnu_attribute should support any tag&value just like GAS does.
Wed, Jun 2
Mon, May 31
Ah wait: the test case uses -fno-asynchronous-unwind-tables ! This explains why unwind cannot work - on s390x the ABI requires .eh_frame everywhere, and there is no alternative unwind mechanism via a frame pointer. The "fast unwinder" instead uses the (deprecated) "backchain" mechanism, but that is off by default and requires use of the -mbackchain option. That option is actually set by the various .lit.cfg files, but I guess the special invocation in this test case ignores those flags?
Maybe it would be clearer to be explicit and write:
Looking at the assembly, only a single function shows any difference between the two versions: "interp" from MultiSource/Benchmarks/MallocBench/gs/interp.c.
Sat, May 29
Unfortunately, I now see this patch also caused another regression, this time in the s390x multistage builder. Again, this was initially hidden by other regressions; now that these are resolved, you can see the problems here (https://lab.llvm.org/buildbot/#/builders/8/builds/1277):
Fri, May 28
This commit also introduced a regression in the s390x-lnt buildbot due to an apparent miscompilation of the test-suite::gs.test test case:
Hmm, the https://reviews.llvm.org/D103288 commit did fix the problem on s390x again. Thanks!
Thu, May 27
I looks like this patch, when committed as 1c7f32334d4becc725b9025fd32291a0e5729acd, caused two additional build bot failures on s390x:
FAIL: libFuzzer:: msan.test
FAIL: libFuzzer:: swap-cmp.test
This was unfortunately hidden by other failures at the time, so I didn't notice earlier.
Wed, May 26
This version LGTM now. Thanks!
Tue, May 25
Function return value / parameters: A C function with a vector parameter will be lowered by Clang differently depending on the subtarget vector support. On z13 it may be a <4 x i32>, while on zEC12 it is instead <4 x i32>*. When running instruction selection there would be then no way of telling if <4 x i32>* was a pointer argument or if it was a software ABI vector variable argument. Therefore a function attribute is emitted for each function "has-vector-arg" where as applicable. I guess the alternative might be to actually do this during instruction selection while considering the subtarget vector support presence - if using clang (and not llc), this might be enough. (E.g. if subtarget has vector support, then <4 x i32>* is really a pointer...)
Now it's looking mostly good to me; a couple of (mainly cosmetic) remaining comments inline.
Indeed, this looks like a win. LGTM.
May 20 2021
This already looks a lot simpler. I'm now wondering if it would be even better to change the getNumRegisters interface from (ValueVT, RC) to (ValueVT, RegisterVT). (Or I guess an Optional<RegisterVT> if we don't want to change all callers.) All callers of getNumRegisters already have the RegisterVT immediately available, and it allows the back end to make any distinction it needs just as well as a RC would.
May 19 2021
Yes, this patch fixes the problem for me. Thanks!
May 18 2021
So it looks like this is good to go now. LGTM, thanks!
OK, this looks good to me now. Thanks!
The new test case fails on s390x with "WARNING: Symbolizer buffer too small" from SymbolizerProcess::ReadFromSymbolizer, and it is indeed true that the full backtrace at this point does not fit in the default buffer size of 16 KB. In the innermost function we have 1024 instances of the string "std::vector<int, std::allocator<int> >" in the function type, which in itself is already 40 KB. Plus the higher stack frames as well ...
Looks like this is also failing on s390x:
May 14 2021
May 12 2021
May 11 2021
This means the implementation now deviates from the documented vector extension syntax, right? I guess it's strictly an extension of the documented syntax, but that may still lead to incompatibilities with other compilers for the platform. If we want to make such a change, should it be synchronized with e.g. GCC, XL, etc. ?
May 7 2021
May 6 2021
Thanks, this LGTM. We can always add further improvements later if necessary.
May 5 2021
May 3 2021
Apr 28 2021
Apr 27 2021
LGTM as well.
Apr 23 2021
B.t.w. even if the raised exception does cause an actual trap, the result of the instruction might _still_ be accessed. This is now of course platform-specific, but e.g. on Linux it is possible to inspect the result of the instruction in the trap handler if one is installed; it is also possible for the trap handler to resume execution so that the result may also still be used by the original code afterwards. In those cases, even if there is a trap, there would still be difference between fcmp and fcmps visible.
Apr 22 2021
The statement is now correct, but it's still not completely clear to me why it is necessary. All floating-point intrinsics may still produce a result in addition to raising an exception (if they do), so the definition of the result is always independent of the question what -if any- exceptions are raised. Why does this need to be called out here specifically?
I don't believe this change is correct. Note that even if an exception is raised, the instruction may still have a result (if exceptions are configured to only set a flag bit and not actually trap). This result will still be different for ordered vs. unordered comparisons.
Apr 13 2021
Apr 6 2021
Thanks for the heads-up, @luismarques ! However, this is not really an issue on SystemZ as all arguments in question have an alignment requirement of 8 bytes on our platform.
Mar 15 2021
Mar 12 2021
Mar 11 2021
I don't think there is a s390x platform-specific problem involved here at all, so I'm not sure disabling the test for the platform is the right fix.
Mar 4 2021
Mar 3 2021
Ahh, you're right. It's done in common code by the default ATOMIC_CMP_SWAP_SUCCESS expander -- but we're not using that since we use our own custom expander! So it indeed has to be done there in the back end.
The change in lowerATOMIC_CMP_SWAP should be removed now. Otherwise this LGTM.
Mar 2 2021
Mar 1 2021
CmpVal: Should not need a LL[HC]R, as it should already be zero-extended also in the case of a non-constant, or?
Looks good in general. Just as a cosmetic issue, it would be nicer to be able to simply write:
Feb 26 2021
Looks like resolving the general issues with the AsmDialect setting is more complicated than I thought and may still take a while.
I'm not sure if "getTypeToTransformTo(*DAG.getContext(), OrigArgVT)" results in the same type that is used by common code in all cases.
Feb 19 2021
Feb 18 2021
LGTM as well, thanks!
Feb 17 2021
Feb 12 2021
Feb 11 2021
It's not a good idea to use the library routine "memcpy" here, because the name in theory be redefined by a user before including the header, and also there may not be a prototype in scope causing warnings when building with -Wall.
Feb 9 2021
Feb 8 2021
Jan 26 2021
Jan 21 2021
Jan 20 2021
Jan 15 2021
Jan 14 2021
Just a quick comment that I'm looking at this, but before approval I want to resolve the issues described in the comment in front of getMAIAssemblerDialect. See also discussion here: https://reviews.llvm.org/D82862
Hi @hans , we're having some issues with using the AssemblerDialect mechanism to support both the GNU assembler and the IBM HLASM assembler in the SystemZ back-end. See also the discussion started here: https://lists.llvm.org/pipermail/llvm-dev/2020-November/146875.html
Jan 13 2021
Jan 12 2021
I don't think we need to bother with skipping advancing the hazard state. I believe the main point of the cutoff is to avoid combinatorial explosion where there are many instructions to schedule and at each step there are many candidates to consider. Advancing the hazard state doesn't consider candidates and is therefore just a linear pass over instructions.
Jan 11 2021
Could you elaborate why initPolicy is the correct place to clear the Available list? I'm wondering because the default implementation doesn't appear to do that either, it looks like common code only clears the list in the main "init" ...
Dec 16 2020
Dec 15 2020
Dec 14 2020
This all looks good to me, the test case also looks fine (and the offset match what GCC is doing).