- User Since
- Aug 13 2013, 3:10 PM (205 w, 2 d)
- I would be slightly hesitant here, as it complicates code and none of the public targets needs this. We don't even try the "uint64_t" instead of "unsigned" step but instead just make it all complicated right away...
- I would not commit this without careful measurements to ensure that compilers indeed optimize all of this away and we don't have a compiletime impact.
- You probably can use some of the APInt::tcXXX functions to avoid code duplication.
Wed, Jul 19
Finally got around pushing this. You may consider asking for commit access (https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access) if you plan further changes.
Testcase? It seems that your change wants to allow the following:
Seems fine to me, but needs more documentation and this touching a core regalloc datastructure I'm including Quentin in the reviewers.
I'm confused by this change (and the testcase doesn't really help understanding what is going on). You are probably onto a real bug here where we cannot assume the same lanemasks work for the input/output register class of a copy. However that should be independent of the fact that a full or partial copy is present.
I'm not sure I like this:
- It introduces a new word "cluster", so we end up having the words "MacroFusion" and "Cluster" around which appear to mean the same thing?
- Using computeOperandLatency() to compute something called cluster feels out of place, similar with getReadAdvanceCycles
LGTM. I wish we wouldn't use std::functions here at all and instead something like a base class with two subclasses, but that is for another patch.
LGTM, style suggestions below.
This is out of date. Nowadays targets can decide themselfes whether they want post-ra fusion by overriding TaretPassConfig::createPostMachineScheduler() and adding a dag mutation there.
This seems to have landed a while ago in r298648
Tue, Jul 18
... PASS: AddressSanitizer-x86_64-darwin :: TestCases/Darwin/dump_registers.cc (911 of 2593) Build timed out (after 45 minutes). Marking the build as aborted. Set build name. New build name is 'r308348 (#17551)' Variable with name 'BUILD_DISPLAY_NAME' already exists, current value: 'r308348 (#17551)', new value: 'r308348 (#17551)' Build was aborted Recording test results ERROR: Step ‘Publish JUnit test result report’ failed: No test report files were found. Configuration error?
Various greendragon builds started to timeout when running unittests after this commit. Any idea what causes this? Should we revert the commit to bring the builders back?
- Add unittests
- Add documentation
Mon, Jul 17
I think this broke the greendragon libfuzzer build: http://lab.llvm.org:8080/green/job/libFuzzer/5100/ this one seems like the most likely cause.
Any idea how to get this back to green?
Sun, Jul 16
Fri, Jul 14
Wed, Jul 12
Tue, Jul 11
Mon, Jul 10
Yep, that was an oversight. Thanks for fixing!
Can you give an example of a typical use? Is this just a different way to express macrofusion opportunities or is this more or less powerfull?
Hi, this change leads to llvm-test-suite tramp3d-v4 failing (at least on macOS systems).
Looks good to me. Possible improvement below.
Please, ignore the first part in my last reply that proposes immediately throwing an exception. I realized that we cannot compile that line while -fno-exceptions is enabled. Hence I proposed a different solution that could go as part of llvm::report_bad_alloc_error() (as I proposed later in the review).
So how about
#include <new> ... x = malloc(...); if (x == nullptr) throw std::bad_alloc();
Thu, Jul 6
There is already precedent with
TargetInstrInfo::getSerializableTargetIndices(), getSerializableDirectMachineOperandTargetFlags() and getSerializableBitmaskMachineOperandTargetFlags()
This adds bundle-specific behaviour into the register allocator. I think this is wrong!
Wed, Jul 5
In general: This may warrant an RFC to llvm-dev to discuss the scope of these patches. I feel that making all of LLVM abort gracefully in out of memory situations will be hard/unmaintainable. I believe the trick would be to limit the scope. Remembering your EuroLLVM talk it seemed that you would only need IR construction in the long running server process so you can hand over that IR to shorter lived/restartable worker processes doing the actual code generation. So the scope may be limited to this use case avoiding the need to fix all the transformation passes to recover gracefully.
Fri, Jun 30
Changing the meaning of "all" to really mean all targets makes sense. But there is of course a fundamental discussion hidding here:
- Is this motivated by AMDGPU having a huge number of registers in the mask?
- For X86, AArch64, ARM the output certainly takes up a bunch of space but I find the information important enough that I'd like to see it by default
- How about a compromise: We stop using an On/Off switch but instead influence the number of registers printed in the regmask instead of the hardcoded 10 at the moment. Then choose a big default value that easily fits X86, AArch64, ARM etc.
See comments below but nothing blocking. LGTM.
Thu, Jun 29
Thanks for the review!
What happens when the user is using the lit from "pip install lit" instead of the one coming with llvm? Alternative question: Who is able to update the pypy package?
Wed, Jun 28
The change LGTM, but please keep it on topic (see below).
- Actually I have one more request: Could you move libs/benchmark-1.1.0 to MicroBenchmarks/benchmark-1.1.0 (or MicroBenchmarks/libs/benchmark-1.10 if you prefer that). The library should only be used by microbenchmarks and that way people that do things like just running CTMark don't have to compile the library. It also saves you from modifying the toplevel cmake file and adding that exception for libs to the TEST_SUITE_SUBDIR list construction.
Fix typo in docu.
- Rebase to ToT
- Add "format_version": "2" to the toplevel dictionary.
Before I dive in the meta discussion, are there any issues with the patch itself?
Tue, Jun 27
The patch works fine. People using non-nts testsuites (which I have a strong suspicion only exist within apple), may need to supply a -s MySchema flag for the commandline tools or use db_default/v4/MySchema/submitRun instead of db_default/submitRun to express they don't use the 'nts' default.
Mon, Jun 26
I think this landed in r294268
I'm fine with the review as is, to get things working on FreeBSD.
While I think you are fine here, you should probably double check that TargetRegisterInfo::getMinimalPhysRegClass() does not start returning your new register class for the "normal" go regs or if it does at least keep an eye on the performance bots to make sure it doesn't accidentally change.
From what I remember the notion of a nullptr in Map for a block we had already Seen was introduced with the changes supporting undefs when recalculating liveness. I assumed that had the same role as the special VNI you introduce here. Can you elaborate the difference (and/or eliminate the cases looking for nullptr)?
Fri, Jun 23
Add missing file
Fix comment in server_wrapper.sh
- We should really add a litsupport/ plugin rather than starting an LNTBased suite. I see no advantage of using LNTBased, but at the same time writing a new runner from scratch means you have to reimplement (or not implement) codesize, compiletime measurements, pgo data collection feeding back into the build etc.
- Indeed, active test-suite development is done in the cmake build (test_suite.py in lnt). You may still change nt.py but landing features exclusively there would be bad.
- It seems the llvm test-suite today only checks for SMALL_PROBLEM_SIZE, occasionally LARGE_PROBLEM_SIZE and for SPEC you can choose the size manually. Do you plan to change the test-suite for the new sizes? We should probably do further discussions there.
- Do you plan to set any rules/expectations on those new sizes? (Something like "mini" finished in half the time than "small")
- Out of interest: Why do you need finer control over the problem sizes?
Thu, Jun 22
Wed, Jun 21
Thanks for working on this! A bunch of nitpicks are below but overal the fix looks fine.
D34464 fixes this in a less hacky way.
For the record if someone wants to reproduce: