Sun, Aug 18
Fri, Aug 16
LGTM - in D63233, I added a TODO that I've marked here for reference, but this appears to be a minimal/safe fix.
Thu, Aug 15
Wed, Aug 14
- Switch flag requirements based on opcode (only FADD handled for now).
- Added partial reduction tests (rL368913).
- Added negative test for mismatch on fast-math-flags.
x86 is moving away from reciprocal estimate code because recent hardware implements a real fdiv at about the same speed as reciprocal estimate+refinement. So x86 perf probably isn't that big of a concern, but it would be good to see the regression test diffs (use the auto-generation script to update those files).
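For reference, the "reciprocal estimate+refinement" sequence mentioned above can be sketched in scalar C++ as follows. This is only an illustration of the math, not the code the backend emits: the function names are made up here, and the `estimate` parameter is an assumed stand-in for the result of a hardware reciprocal-estimate instruction.

```cpp
#include <cassert>

// One Newton-Raphson step for 1/d: x' = x * (2 - d * x).
// Each step roughly doubles the number of correct bits in x.
inline float refineReciprocal(float d, float x) {
  return x * (2.0f - d * x);
}

// Approximate n / d as n * (refined estimate of 1/d).
// `estimate` is a hypothetical stand-in for a hardware estimate of 1/d.
inline float divideViaEstimate(float n, float d, float estimate, int steps) {
  assert(steps >= 0 && "step count must be non-negative");
  float x = estimate;
  for (int i = 0; i < steps; ++i)
    x = refineReciprocal(d, x);
  return n * x;
}
```

Each refinement step costs two multiplies and a subtract (or fused equivalents), which is why the sequence stops paying off once a real fdiv is about as fast.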
Tue, Aug 13
Mon, Aug 12
Sorry for the delay in looking at this. What do the motivating examples look like for codegen? Are we getting the optimal codegen for these clamps, or would we be better off trying to create min/max and/or saturating intrinsics?
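To make the question concrete, here is a rough scalar sketch of the two shapes being compared: a clamp written as compares and selects versus the same clamp written as max followed by min. The function names are illustrative only, and the sketch assumes `lo <= hi`; it says nothing about what the patch under review actually produces.

```cpp
#include <algorithm>

// Clamp written as two compare+select pairs.
// Assumes lo <= hi.
inline int clampSelect(int x, int lo, int hi) {
  int t = x < lo ? lo : x; // raise to the lower bound
  return t > hi ? hi : t;  // cap at the upper bound
}

// The same clamp expressed as max followed by min.
// Assumes lo <= hi.
inline int clampMinMax(int x, int lo, int hi) {
  return std::min(std::max(x, lo), hi);
}
```

The two functions return the same value for every input when `lo <= hi`, so the choice between them is purely about which form leads to better codegen.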
Sat, Aug 10
Fri, Aug 9
Thu, Aug 8
I stepped through this example, and wouldInstructionBeTriviallyDead() returns true because the call has neither uses nor side effects. So we mark it as dead, and then it gets deleted. I'm not sure if that explains the other bug reports mentioned in D65336 though.
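The reasoning above boils down to a two-part test, sketched here as a toy model. `ToyInst` and `toyIsTriviallyDead` are hypothetical names for illustration and are not LLVM's `Instruction` class or its real dead-instruction check, which considers more cases than these two.

```cpp
// Toy stand-in for an instruction; not LLVM's Instruction class.
struct ToyInst {
  unsigned numUses;        // how many other values use the result
  bool mayHaveSideEffects; // e.g. a store, or a call that may write memory
};

// Simplified model: an instruction is removable when nothing uses its
// result and executing it has no observable effect.
inline bool toyIsTriviallyDead(const ToyInst &I) {
  return I.numUses == 0 && !I.mayHaveSideEffects;
}
```

Under this model, a call with no uses that is also marked as free of side effects is deleted, which matches the behavior described for the example.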
- Added documentation comment for vectorizeLoads().
- Use isSimple() to filter out 'volatile' and other loads that we don't want to alter.
- Moved vectorization of loads to end of processing per block (not sure if that answers the request for "end of vectorization" though).
- Removed check for load larger than vector register size (that was an attempt to not create something harmful, but now we are using the cost model).
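As a rough illustration of the isSimple() filter in the list above: a load counts as simple when it is neither volatile nor atomic, and only simple loads are kept as candidates. The sketch below is a toy model of that filtering step; `ToyLoad`, `toyIsSimple`, and `collectCandidates` are hypothetical names, not the patch's code.

```cpp
#include <vector>

// Toy stand-in for a load; not LLVM's LoadInst.
struct ToyLoad {
  bool isVolatile;
  bool isAtomic;
  unsigned bits; // width of the loaded value
};

// Mirrors the idea behind isSimple(): neither volatile nor atomic.
inline bool toyIsSimple(const ToyLoad &L) {
  return !L.isVolatile && !L.isAtomic;
}

// Keep only the loads that are safe to consider for rewriting.
inline std::vector<ToyLoad> collectCandidates(const std::vector<ToyLoad> &Loads) {
  std::vector<ToyLoad> Out;
  for (const ToyLoad &L : Loads)
    if (toyIsSimple(L))
      Out.push_back(L);
  return Out;
}
```

Whether a surviving candidate is actually transformed is then left to the cost model, as the last item in the list notes.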