- User Since
- Aug 13 2013, 3:10 PM (257 w, 6 h)
For the record: The check is based on a LiveInterval::size() which gives you the number of segments. So I assume what is "huge" here is the number of basic blocks?
Fri, Jul 13
I think fastregalloc/-O0 isn't really expecting all liveness flags to be complete (unfortunately hard to tell/not written down).
My LGTM still stands (in case you are waiting for another approval).
I still think we should rather try to not set TracksRegLiveness early in the codegen pipeline instead. Let me try how well that works...
- Patches should be relative to the toplevel test-suite directory.
- 94k of reference output seems a lot and smells like you may end up in the anti-pattern of spending most of the benchmark time in printing instead of doing calculations.
- We currently tend to not add -ffast-math as part of the makefiles, but instead people choose it together with their other optimization flags.
- How long does this benchmark run? (We aim for 0.5-1s on typical PC hardware); Bonus points if the duration can be changed from the commandline (no expections for matching reference outputs for different input size though).
- srand()/rand() produce different output with different libc implementations!
or maybe prependImplicitDef()?
I would consider setDesc() a rare/exceptional operation. So I'm not sure I like an extra loop for the common addOperand() maybe we can rather provide an insertOperand() to accomodate the setDesc() scenario you describe?
I fear the covered check will hit us perf wise either way (+CC some people from our GPU team) and that we need to find a different solution long term (like the Targets announcing the saved registers themselfes instead of the generic code piecing together the information...).
Anyway if this works for you I'd be fine with this patch as a stopgap.
Thu, Jul 12
Adding more people; still reading up on why the covered logic was necessary in the first place...
turns out my checkout was slightly outdated too, rebased. Fixed warning.
Makes sense, thanks for the patch.
Put an updated version of the patch here: https://reviews.llvm.org/D49256
Sorry this fell to the cracks. I tried comitting it yesterday, but it needs some adaptation for GlobalISel now. I'm currently making some simple changes for that and will then commit...
Wed, Jul 11
See also: https://reviews.llvm.org/D37582
(Looks like I never got around comitting it and nobody pinged, oops)
Mon, Jul 2
Thu, Jun 28
Landed in rL335877
Wed, Jun 27
Thu, Jun 21
I don't have a copy of Spec2017 yet so I cannot test any of it.
Wed, Jun 20
Mon, Jun 18
Looking around the code I would guess it just works. You should however also remove the && MRI.tracksLiveness() from MachineBasicBlock::print() if you change the default.
IMO MachineFunction::init() should not set TracksLiveness by default and instead VirtRegMap / FastRegAlloc should set it once it actually filled in the basic block live-in lists.
From what I remember our use of this flag is strange:
Jun 14 2018
LGTM with this round of nitpicks addressed.
Jun 13 2018
Take a look at RAGreedy::selectOrSplit for an example on how to report fatal errors in the backend.
I'm pretty sure I only added the timer->statistics support for the "legacy" pass manager.
Jun 8 2018
usr/bin/env is the recommended way to invoke python... Or is lldb somehow hardcoded to use /usr/bin/python and you need to match this here?
May 29 2018
I'd also recommend using the api_auth_token. Actually if you look at the Rest API you can see it supporting submissions only with api_auth_token set. I just never wanted to introduce this to the /submitRun pages to remain compatible with old clients.
I assume you don't have commit access?
Usually it's the author of the patch that commits it unless he doesn't have commit access yet in which case it's upon the reviewers.
May 22 2018
Why not let the database do the sorting? .order_by(ts.Machine.name) or whatever sqlalchemy wants there?
Cross compiling currently is a mess. Basically cmake only works properly when you have a complete gcc style cross toolchain, or a MacOS style toolchain on Mac. Most llvm developers currently are not in a position to set one of those two things up when they just want to test a compiler they made some changes to; to be honest when I started hacking on the llvm test-suite cmake I wasn't even aware of what assumptions are made there/what problem exist...
May 18 2018
LGTM. Please watch the buildbots carefully (and revert in case of problems) when you push this.
May 16 2018
No feedback from my side, I#m not familiar with the profile code.
May 14 2018
(FYI: I'm busy with non-llvm at the moment, so I may not be responsive for things I cannot answer right away; but I'll try to find a timeslot this week going through the backlog of llvm things I have to look at)
May 11 2018
Some optimizations (e.g. cache-locality, parallelization) can cut the execution time by order by magnitudes. With gemm, I have seen single-thread speed-ups of 34x. With parallelization, it will be even more. If the execution time without optimization is one second, it will be too short with optimization, especially with parallelization and accelerator-offloading which adds invocation overheads.
It's great to have a discussion on how such benchmarks should look like.
Instead of one-size-fits-it-all, should we have multiple problem sizes? There is already SMALL_DATASET, which is smaller than the default, but what about larger ones? SPEC has "test" (should execute everything at least once, great to check correctness), "train" (for PGO-training), "ref" (the scored benchmark input; in CPU 2017 runs up to 2 hrs). Polybench has MINI_DATASET to EXTRALARGE_DATASET which are defined by workingset-size, instead of purpose or runtime.
- First: We have to choose a default problem size and I just wanted to emphasize that it should not be too big, so running the llvm test-suite finishes in a reasonable timeframe.
May 10 2018
Are you writing these from scratch? If so, I'd like to make some suggestions:
It's odd to have this in the repository, but admittedly we don't really have a wiki or similar in LLVM so I may be ok.
Apr 30 2018
(this patch is still good to go/accepted)
BTW: I did not do this years ago, because the doxygen documentation claims that the brief comment stops at the first linebreak or the first dot. Hence I was preaching the rule that you still need \brief when the comment is more than 1 line. I now heard that apparently this is not how doxygen behaves (so documentation bug I guess).
Apr 26 2018
Apr 25 2018
Tgiugh this is another sign that the global_status view probably isn‘t used byanyone ans has no tests. I still think we should just remove it...
Apr 24 2018
I think this is fine, but maybe wait a week or so before comitting in case @cmatthews has more comments.
This seems straightforward.
Maybe the answer here would be to further break up templates/reporting/runs.html into smaller pieces that can then be included from the WebUI and the E-Mail reports. So we can leave the accordions in the web-ui part. (That may be a bunch of refactoring though)
templates/reporting/runs.html reporting is also used to produce E-Mail reports I believe. And I'm not sure the E-Mail output has the CSS required for the accordions?
Looks good enought to commit, though I would recommend factoring things out differently:
Apr 18 2018
Oh sorry, I didn't notice there is another "scavenging" thing in there.
Huh? You cannot just disable stack slot scavenging. Who guarantees you that you don't need it?
Is the bug related to the latest changes in PEI? (I've seen some commits talking about base and frame pointers flying by but did not have time to read them or dive into the details...)
- Do non gold linkers all accept the -Wl,-plugin-opt lines?
- Maybe the clang driver knows if a gold linker is used and could add the -Wl,-plugin opt lines itself when -save-stats is used?
I am pretty sure this is not correct: Autobrief is stupid and stops at the first dot or at the line ending AFAIK. This means when the first sentence stretches over more than 1 line you still need need \brief. So a bunch of the changes here appear invalid to me, please check the doxygen output and revert if necessary.
Apr 17 2018
Apr 16 2018
Have you seen LiveRegUnits::accumulate() glancing over the code here (and earlier) it seems to me that TII::tracksRegDefsUses() was just invented to do the same thing: figuring out which registers are free/usable over a range of instructions...
Apr 3 2018
$ clang -arch arm64 -O3 -S -o- t.c -mcpu=generic ... ldrb w8, [x1] ldrb w9, [x1, #1] ldrb w10, [x1, #2] ldrb w11, [x1, #3] strb w8, [x0] strb w9, [x0, #1] strb w10, [x0, #2] strb w11, [x0, #3] ret
Should we maybe move the requirement list into setup.py instead of using this functionality which they apparently don't want us to use like this? (I'm fine with this patch in the meantime)
Mar 26 2018
LGTM, though none of us apple people know much about the profile code as it only works on linux right now. But unless someone with more knowledge of the code objects, I'd push this.
Mar 8 2018
Mar 5 2018
Thanks for working on this; I'm looking forward to have this!
Mar 4 2018
This is great! I always wanted to see a IACTA like tool for other architectures.
Feb 28 2018
... looks like we should add some tests for the stuff in profile_views.py
Feb 27 2018
This is surprising: DBG_VALUE should not have any defs so removeDefs() should have no effect and all uses should be marked with the debug flag so none of them should pass the MachineOperand::readsReg() test in addUses. Could you check whether the debug flag is set correctly? Did you try running your test with -verify-machineinstrs?
Now, should we go to such length to make the debug information be friendlier to some debugger, I am not sure, but I let you decide.
Definitely not! That debugger constraint feels like it is designed for ancient compilers producing code and register allocating for one C statement at a time.
Feb 25 2018
I've been looking into this, and we do indeed produce something like this for naked function with a parameter (this from http://llvm.org/PR28641 compiled with -O0):
bb.0 (%ir-block.1): liveins: $edi %0:gr32 = COPY $edi %1:gr32 = COPY killed %0 INLINEASM &ret [sideeffect] [attdialect], $0:[clobber], implicit-def early-clobber $rdi, $1:[clobber], implicit-def early-clobber $eflags, !2
Feb 22 2018
Looks good from what I can see, though I'm not super experienced with fast-isel either. Does this work correctly in the case when fast-isel aborts while selecting a block?
Looks nearly good to me. I'd just opt for a different naming of the sub-tests. Also some nitpicks/comments on other nitpicks below:
- You should double check whether you really need to repeat the lit.local.cfg changes in each directory.
- I haven't actually tried compiling/running the benchmark but the changes LGTM.
This should be a good thing! Among other things it should also allow us to switch register mask operands from physical registers to register units.
Your reasoning makes sense to me. And I think it's fine to push this given no floating point expert objects.
- I'd rather keep MachineRegisterInfo to things directly related to information about virtual registers (register class etc.); I don't think it's role should be to provide transformations for involving virtual registers. So I'd rather move the functionality somewhere else.
- I'm skeptical this works as intended in the presence of control flow (= multiple uses and defs of a vreg anywhere in the control flow graph).
- I'm not sure I completely understand what is going on here, is it looking for COPYs killing a value so it can replace the debug values with the copied value? The commit message should mention that then.
This is a tricky case, good catch.
I'm still a bit confused as to why we start spilling unused input vregs in the first place. Can you add a test case so we can reproduce? (The test can't get commit without a test anyway)
- Just a warning: I'm a bit skeptical timers will work reliable for things that happen in well under a millisecond (did you do some sanity checking? i.e. do all the timer roughly add up to the time spent in the frontend?)
- In the same vain I wonder if you add overhead by adding timers to functions that are likely called hundreds of times for a typical source file. Have you measured a release build with and without your patch.
LGTM with declarations moved into a header.
Please append a "; NFC" to the commit title.
Feb 21 2018
Okay reading the code a bit more seems like that is what you are doing.
But we don't want user flags for bugpoint, we want it to try both variants and keep the smaller one when it still reproduces the crash.
Feb 20 2018
Since a "naked" function can only have nothing but "asm" statements inside, and those "asm" statements can not have input or output parameters, it is safe to just disable any registers spilling for a "naked" function.