- User Since: Nov 23 2012, 10:16 AM (360 w, 3 d)
Tue, Oct 15
I'm a bit puzzled by the need for this change. The bump allocator already has logic to do power-of-two scaling for the allocation, so I wonder why it doesn't work properly here.
Mon, Oct 14
Sun, Oct 13
Fri, Oct 11
Adjust based on comments.
Mon, Oct 7
Why go back to the large tables for crc32? Just because JamCRC had that bug doesn't mean it should persist.
Sat, Oct 5
I wonder if we should actually enumerate evil here, i.e. give the situations in which inlining actually fails. As mentioned on IRC, I wonder if we shouldn't aim for the stronger semantics and at least warn by default of any situation that prevents always_inline from doing its job.
Thu, Oct 3
I'd make the argument that it should be strlcpy if present. That covers a lot of existing systems too. I'm not sure this optimisation should be done for freestanding mode though.
Tue, Oct 1
2nd ping. Chandler, care to check this please?
Sep 21 2019
Sep 16 2019
Can you declare a volatile variable and set that from f instead?
Sep 9 2019
Switch the pass to use two rounds. The first round will collect all relevant intrinsics in RPO, the second one will translate them accordingly.
Aug 29 2019
Aug 28 2019
Aug 27 2019
Chandler, are you OK with getting the InstructionSimplify.h part in now, so that it can be merged into 9.0 and the rest follow separately?
Aug 26 2019
It would be interesting to get an actual backtrace. It should not be a DWARF register number, but it might be from a direct caller of the unw interface.
At least for freestanding environments, it would be useful to separate nonnull completely from dereferenceable. GCC has a separate flag for it, which might also be a reasonable idea.
Hook up a second instance of the pass after the Float To Int pass for optimized builds. This is after the initial loop transforms, so it can profit from some unrolling, but it is before vectorization. The late run of the pass is kept for now and ensures that any potentially added variants are still dropped before SDAG.
Aug 24 2019
It would be nice to also default to this for -ffreestanding.
Aug 23 2019
Simplify the pass logic. The first round updates the predecessor links and notes whether any block is orphaned; a second round removes unreachable blocks if necessary.
The pass can be further refined to use and update DominatorTree and AssumptionCache incrementally, but this should be functionally complete now. It will not handle more complex cases like orphaned loops, but I don't think those are commonly used with is.constant or objectsize conditions either.
The pass will now scan every BB once, but fall back to the start of a BB if recursive removal invalidates the iterator. This seems to be the strictest form I can manage.
Aug 21 2019
Please read the discussion. The general consensus was and likely is that this is somewhat bogus in the standard at best, and that the cost of not artificially breaking code is much higher than the benefit. Yes, GCC made different choices, but that's no excuse.
Take a look at the list archives for the discussion about attribute nonnull and nullable a while ago. The short version is that the library interface specification might have a wildcard clause that NULL pointers are invalid unless explicitly allowed, but it works perfectly fine on any sensible implementation except the ds9k. When the combination of glibc annotations and aggressive gcc optimizations started, it pointlessly broke a lot of code.
Aug 7 2019
Thanks, this looks good enough within the limitations of the framework.
Aug 5 2019
Looking a bit more into the details. Chandler, you've originally suggested going with the LowerAtomic route and that actually does create code that fails the SDAG lowering if the pass is skipped, e.g. on ARM.
The second part currently overlaps with the CodeGenPrepare pass. I could clean up the implementation somewhat by reusing the same functionality that pass uses, or I could factor out a minimal version of the constant-folding optimization from CodeGenPrepare as a separate pass included in the -O0 pass chain, i.e. in place of the more general CodeGenPrepare pass. The main difference is that a non-optimized build would not get the recursive folding from PHI simplification, but I think that's fine for the original use case. It would also not get the block merging, but again, that seems fine within the constraints.
Aug 4 2019
For the first part, I was actually asked to do that. I don't mind either way.
Aug 3 2019
Generalize slightly to also cover llvm.objectsize which has very similar constraints.
Aug 2 2019
Aug 1 2019
Update PHI nodes in disconnected block.
Jul 30 2019
Replace the boolean argument to replaceAndRecursivelySimplify with an optional vector of unmodified instructions. This simplifies the API change significantly and allows other potential use cases. Redo the restart handling: after a successful simplification step, restart the current BB only, but always do another full pass of the function.
Jul 29 2019
Jul 27 2019
Avoid goto. Create new BranchInst instead of modifying in-place. Update tests to reflect changes. Move most of the x86 is-constant test to generic.
With the exception of the small improvements outlined, this looks good to me.
I understand, but the current version just doesn't manage to delay it anyway.
Jul 26 2019
Jul 25 2019
Jul 24 2019
You lost the changes to lib/Sema/SemaStmtAsm.cpp to actually do the delaying for immediate operands?
Jul 19 2019
Jun 7 2019
Jun 5 2019
I think MMX code is obscure enough at this point that it doesn't matter much either way. Even less across DSO boundaries. That's why I don't really care either way.
The patch still needs to be generalized to handle all ICE constraints, but it is not blocked by the review for handling the is.constant intrinsic.
May 30 2019
May 16 2019
May 13 2019
On the frontend side:
- __builtin_constant_p is not expanded correctly in EvalConstant.cpp when used as part of codegen for -O0. This means that trivially unreachable branches are emitted (e.g. with a BR with a literal false as condition),
- The constant evaluator has a general problem with overly pessimistic analysis when it comes to possible side effects. I still have to discuss the correct handling of this with Richard.
Apr 24 2019
It does not fix the issues on our side, but pushes them to a different place. It is still an improvement, but the problem is not solved yet.
Apr 23 2019
Apr 22 2019
I'm in the process of testing this, but feedback will take a bit.
Mar 27 2019
Mar 26 2019
I'd recommend copying the version from libarchive (https://github.com/libarchive/libarchive/blob/master/libarchive/archive_crc32.h):
- don't bother with the 1KB table in both binary and source, just recompute it on the first use.
- at least in the past, partially unrolling the inner loop helped a lot.
Mar 11 2019
LGTM from my perspective at least.
For it to be really useful for the majority of bugs, it would be nice to figure out automatically how to get the preprocessing step done and filter out the # lines afterwards. That part alone significantly cuts down the creduce time.
Why do you exclude clang? I would expect LLVM at this point to not support visibility attributes on XCOFF either?
Mar 5 2019
Well, that was a sample to illustrate the point. A full working (and now failing) example is:
Mar 4 2019
The other problem is that we don't use the CFG machinery to prune dead branches. Consider the x86 in/out instructions: one variant takes an immediate, the other a register. The classic way to deal with that is something like
Mar 2 2019
Can you include a patch for something like (int *)0xdeadbeeeeeef on amd64? That's a valid value for "n", but clearly too large for int. Thanks for looking at this, it is one of the two large remaining show stoppers for the asm constraint check.
Placing .bss.rel.ro before .data doesn't make sense. It forces the content of .bss.rel.ro to be embedded into the binary. I also don't really understand the motivation here. Memory mappings on the kernel side are quite cheap.