User Details
- User Since
- Jan 29 2018, 7:03 AM (305 w, 5 d)
Jul 10 2018
Jul 2 2018
Jun 29 2018
Updated per comments. Typedefs for intermediate short vectors moved into the bodies of the functions using them.
Jun 28 2018
Uploaded the correct diff.
Jun 22 2018
Jun 21 2018
Changed the type checks to ensure that the input has a valid type, rather than guarding against specific invalid types.
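The difference between the two styles of check can be sketched as follows. This is an illustration only: `ElemKind`, `isNotKnownInvalid`, `isValidInput` and `checkInput` are hypothetical names, and the set of valid kinds is an assumption, not taken from the revision.

```cpp
#include <cassert>

// Hypothetical element kinds, for illustration only.
enum class ElemKind { F32, F64, I8, I16, I32, I64, Other };

// Deny-list style: rejects the kinds known to be bad, but silently
// accepts anything that was not anticipated (e.g. Other).
bool isNotKnownInvalid(ElemKind K) {
  return K != ElemKind::I8 && K != ElemKind::I16;
}

// Allow-list style: accepts only the kinds known to be good
// (assumed here to be the two floating-point kinds).
bool isValidInput(ElemKind K) {
  return K == ElemKind::F32 || K == ElemKind::F64;
}

// A caller would bail out early on anything that is not explicitly valid.
void checkInput(ElemKind K) {
  assert(isValidInput(K) && "unsupported input element kind");
}
```

With the allow-list form, a kind added later is rejected until someone opts it in, which is usually the safer default.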
Jun 20 2018
Jun 19 2018
Ping.
Jun 15 2018
Updated the AVX512 isel pattern to account for the instruction record name change.
Abandoning this due to D48067 being accepted instead.
Jun 14 2018
Fixed the typo in the test name and added checks to make the transform stop if the rounding mode immediate and/or SAE are not constant.
Changes made per comments. Note that zext IR instructions have been fully excluded from all patterns, which will require altering vec_floor.ll tests in D45203 if this revision is accepted.
Jun 12 2018
Jun 11 2018
Fixed the error in the final check (it was from badly undone edits around there). Moved the early-exit check. Expanded the comment on the AVX512BW check for clarity. Some names changed per comments.
Jun 5 2018
Added tests for floor intrinsics and masked scalar double patterns to cover all introduced isel patterns.
Closing this due to failure of D45723.
Closing this due to failure of D45721.
Corrected the scalar pattern predicates, added packed zero-masked instruction patterns and tests to cover zero-masking. Changed the RUN line of vec_floor.ll to give different results for AVX512F and AVX512VL where needed (e.g. in 128- and 256-bit masked operations).
Jun 4 2018
Changed isSequentialOrUndefInRange to take an increment argument and added isAnyInRange.
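For readers unfamiliar with these helpers, a minimal standalone sketch of what they might look like is given below. It uses `std::vector<int>` instead of LLVM's own types, assumes that -1 marks an undefined mask element, and guesses at the parameter order; it is not the code from the revision.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: -1 marks an undefined shuffle-mask element.
constexpr int kUndef = -1;

// True if every element of Mask[Pos, Pos + Size) is either undefined or
// equal to Low, Low + Step, Low + 2 * Step, ... in order.
bool isSequentialOrUndefInRange(const std::vector<int> &Mask, std::size_t Pos,
                                std::size_t Size, int Low, int Step = 1) {
  assert(Pos + Size <= Mask.size() && "range out of bounds");
  for (std::size_t I = 0; I != Size; ++I, Low += Step) {
    int M = Mask[Pos + I];
    if (M != kUndef && M != Low)
      return false;
  }
  return true;
}

// True if any mask element lies in the half-open range [Low, Hi).
bool isAnyInRange(const std::vector<int> &Mask, int Low, int Hi) {
  for (int M : Mask)
    if (M >= Low && M < Hi)
      return true;
  return false;
}
```

With a step of 2, a mask such as {0, 2, undef, 6} counts as sequential from 0, which is the kind of strided pattern a pack-style shuffle would produce.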
Taking over at @GBuella's request. The patterns that are currently implemented will be finished, but I don't have much hope for the masked versions. Since the mask only applies to the lower elements and the upper elements must be zeroed out, lowering the masked versions of these intrinsics would require not simple selects (see PR34877), but patterns like
Added zero extension of mask to i32 in the masked scalar tests and added more ways to represent the mask, testing the 8-bit mask pattern among others. 16-bit mask patterns removed due to scalar_to_vector errors.
Jun 1 2018
Changed the scalar intrinsic lowering to work via extract-insert. D45203 contains tests for folding the resulting IR patterns.
Changed the folding to use isel patterns from D47012.
Added an extra flag to the test, so that both flag-reading paths are covered. Since flags are module-wide, if there need to be tests covering them separately they would have to be put in new files.
Changes made per comments.
May 30 2018
Replaced assertion with a more explicit error at the start of the code path. Changed the emission of the "GNU\0" byte sequence to use EmitBytes with the StringRef constructor specifically to include the trailing zero byte.
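The trailing-zero point can be illustrated without LLVM. The sketch below uses `std::string`, whose pointer-plus-length constructor behaves like the one described above in this respect; `emitBytes` and `emitNoteName` are hypothetical stand-ins, not the streamer API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a streamer's byte-emission call.
void emitBytes(std::vector<std::uint8_t> &Out, const std::string &Bytes) {
  for (char C : Bytes)
    Out.push_back(static_cast<std::uint8_t>(C));
}

// Emits the note name "GNU" followed by its terminating zero byte.
std::vector<std::uint8_t> emitNoteName() {
  std::vector<std::uint8_t> Out;
  // The literal "GNU" occupies 4 bytes in memory ('G', 'N', 'U', '\0').
  // Passing the length explicitly keeps the trailing zero; constructing
  // from the bare literal would stop at the terminator and give 3 bytes.
  emitBytes(Out, std::string("GNU", 4));
  return Out;
}
```

Constructing from the bare literal would yield only three bytes, so the name field would be emitted without its terminator.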
May 29 2018
Removed the llvm-readobj part to release as a separate patch.
May 28 2018
Made changes to the readobj part per comments. Will wait on comment from @craig.topper before splitting the patch.
May 25 2018
Enabled readobj to parse sections that use 32 bits for the data on 64-bit targets (like gcc currently does) and added explicit alignment to the section generation.
The bug was in emitLongJmpShadowStackFix and emitEHSjLjLongJmp. The longjmp builtin call is not a terminator, so it is followed by an "unreachable" IR instruction. The previous patch put the register restoration and the indirect branch at the end of the final basic block of the resulting code. That produced correct code on Linux, but on Mac, where "unreachable" produces a ud2 instruction, it led to a crash: the ud2 was left in place and ended up before the longjmp code, regardless of whether the shadow stack fix was present. This patch moves the longjmp pseudo-instruction and the ud2 (if present) to the final basic block of the produced code and emits the longjmp logic before them, so the ud2 is correctly placed at the end of the longjmp sequence and is properly unreachable.
May 24 2018
May 23 2018
Updated to use full 8/4 byte integers to store the flags rather than using 1 byte and padding.
May 21 2018
The previous upload was the version without the test for the section emission. Corrected.
May 17 2018
Removed the unused HasIBT variable declaration from X86.h.
Removed the missed FIXME.
May 16 2018
Replaced FIXMEs with more appropriate cautionary comments. They do not need to be addressed directly by any patch; they served to warn against using the instructions carelessly now that there is no longer a simple way to restrict their usage.
May 15 2018
May 7 2018
@craig.topper, please review the latest changes.
Restored shuffle combining tests. Fixed the folding to account for unary (A == B) patterns, added tests.
May 4 2018
@RKSimon, please review the latest changes.
Apr 30 2018
Test fix.
Style fix.
Updated per comments.
Apr 27 2018
Apr 26 2018
Changed the shuffle mask emission code to match D45721.
Tidied up the shuffle mask emission and checking and moved the PACKUSDW SSE4.1 check into tracePackVectorShuffle.
Apr 25 2018
@RKSimon what's the motivation for leaving the tests for upgraded intrinsics in place instead of moving them to appropriate files?
Updated per comments.
Style fix.
Apr 24 2018
Updated per comments in the clang part.
Updated per comments.
Apr 19 2018
Fast-isel tests restored and updated.
Style fix.
Updated fast-isel tests for SSE and AVX2. There don't appear to be any AVX512 fast-isel tests for these intrinsics; I will add them if required.
Apr 18 2018
Removed incorrect checks from SSE tests.
Updated per comments.
Updated per comments.
Style update.
Updated per comments.