This is an archive of the discontinued LLVM Phabricator instance.

Fix use of DemandedMask to NewMask for SIGN_EXTEND_INREG
Needs ReviewPublic

Authored by gilr on Nov 19 2014, 5:47 AM.

Details

Reviewers
nadav
Summary

SimplifyDemandedBits takes into account multiple uses by using an all-ones mask instead of the caller's DemandedMask input. However for SIGN_EXTEND_INREG the original DemandedMask is used instead of the safe NewMask, which makes the SHL simplication unsafe for SIGN_EXTEND_INREG with multiple uses.

The following LIT reproduces this bug in LLVM 3.2 (the mask of the second selects ends up zeroed after being SHL 31 bits and then again SHL 63 bits).

; RUN: llc < %s -march=x86-64 -mcpu=generic -mattr=sse42 | FileCheck %s
; CHECK-LABEL: sext_inreg_multiple_uses:
; CHECK: pslld $31, [[REG:%[a-z0-9]+]]
; CHECK-NEXT: psrad $31, [[REG]]
define void @sext_inreg_multiple_uses(<4 x double> %a, <4 x double> %b, <4 x i32>* %out1, <4 x double>* %out2) {
entry:

%mask = fcmp olt <4 x double> %a, %b
%res1 = select <4 x i1> %mask, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>, <4 x i32> <i32 -2, i32 -2, i32 -2, i32 -2>
%res2 = select <4 x i1> %mask, <4 x double> <double 3.0, double 3.0, double 3.0, double 3.0>, <4 x double> <double 7.0, double 7.0, double 7.0, double 7.0>
store <4 x i32> %res1, <4 x i32>* %out1, align 4
store <4 x double> %res2, <4 x double>* %out2, align 8
ret void

}

On the current trunk this code (and my attempts to tweak it) is not lowered to a double-use SIGN_EXTEND_INREG so the problem is not reproduced. Any suggestion on how to tweak the code to generate a double-use SIGN_EXTEND_INREG would be greatly appreciated.

Diff Detail

Event Timeline

gilr updated this revision to Diff 16373.Nov 19 2014, 5:47 AM
gilr retitled this revision from to Fix use of DemandedMask to NewMask for SIGN_EXTEND_INREG.
gilr updated this object.
gilr edited the test plan for this revision. (Show Details)
gilr added reviewers: eli.friedman, nadav.
gilr added a subscriber: Unknown Object (MLST).
nadav edited edge metadata.Dec 3 2014, 8:57 AM

This looks like an obvious bug that should be fixed, unless the code is never executed. Can you try to delete this optimization altogether and see if any tests break? You can also try to use llvm-stress to generate lots of code and see if it ever reaches this optimization.

gilr added a comment.Dec 30 2014, 3:50 AM

Removing this optimization breaks CodeGen/X86/vector-blend.ll & CodeGen/X86/vselect-avx.ll.

nadav added a comment.Jan 5 2015, 9:19 AM

Okay, so if the code is executed then we should try to come up with a testcase that exposes the bug that you found. Can you modify the test cases that broke and try to expose the bug?

mkuper added a subscriber: mkuper.Jan 8 2015, 1:19 AM

We've tried to modify the test-cases, but we can't actually get it to fail.
We could try to track down why (I guess some earlier transformation changes the code for the problematic cases, where there are multiple uses, into something nicer) but I'm not sure it's worth the effort, since, as you say, this is an obvious bug.

Would you be ok with committing it without a test?

mkuper removed a subscriber: mkuper.Jun 6 2016, 5:33 PM
eli.friedman resigned from this revision.Jun 6 2016, 5:44 PM
eli.friedman removed a reviewer: eli.friedman.