Page MenuHomePhabricator

[DAGCombiner] When combining zero_extend of a truncate, only mask before extending for vectors.
ClosedPublic

Authored by craig.topper on Jan 30 2018, 1:16 AM.

Details

Summary

Masking first, prevents the extend from being combine with loads. Its also interfering with some vXi1 extraction code.

Not sure if the other targets here are good or bad. Or if there's some other changes that could be made to recover any loss.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision.Jan 30 2018, 1:16 AM
RKSimon added inline comments.
test/CodeGen/NVPTX/param-load-store.ll
28 ↗(On Diff #131932)

This looks like the only regression - I think cvt.u32.u16 effectively acts as "ZERO_EXTEND_INREG" in this case (which we don't have).

@jholewinski any thoughts?

RKSimon edited reviewers, added: tra, sfertile; removed: jholewinski.Jan 30 2018, 3:57 AM
craig.topper added inline comments.Jan 30 2018, 11:34 PM
test/CodeGen/NVPTX/param-load-store.ll
28 ↗(On Diff #131932)

Looks like we are able to do fold (zext (and/or/xor (zextload x), cst)) -> (and/or/xor (zextload x), (zext cst)) when the load has multiple uses, but we can't do (zext (zextload x)) -> (zextload x) when there is an additional use.

Rebase after D43063

The NVPTX test got weirder now. The setp.eq.b16instruction was always there, but not part of the CHECKs. The and.b32 is new.

tra added a comment.Feb 15 2018, 2:35 PM

Rebase after D43063

The NVPTX test got weirder now. The setp.eq.b16instruction was always there, but not part of the CHECKs. The and.b32 is new.

That setp should not be there to start with. AFAICT it's result is not used by anything. I'll take a look at what's going on.

@tra, did you figure out what was going on?

tra added a comment.Feb 23 2018, 10:44 AM

@tra, did you figure out what was going on?

Some of it. Something weird is going on during lowering or i1 arguments in NVPTX and we end up with CopyToReg on a chain, but with no other uses. I've found references to what looks like a similar issue in PPC code (ANDIGlueBug), so it may not be a new issue. It certainly has nothing to do with your patch.

LGTM as far as NVPTX is concerned.

tra added a comment.Feb 27 2018, 9:58 AM
In D42679#1017544, @tra wrote:

@tra, did you figure out what was going on?

Some of it. Something weird is going on during lowering or i1 arguments in NVPTX and we end up with CopyToReg on a chain, but with no other uses. I've found references to what looks like a similar issue in PPC code (ANDIGlueBug), so it may not be a new issue. It certainly has nothing to do with your patch.

That unused setp is an artifact of -O0. Without it (or with any other -O level) NVPTX generates more sensible code (with or without this patch):

	ld.param.u8 	%r1, [test_i1_param_0];
	and.b32  	%r2, %r1, 1;
	st.param.b32 	[func_retval0+0], %r2;
	ret;

@RKSimon, are you ok with this patch then?

RKSimon accepted this revision.Mar 1 2018, 1:04 PM

LGTM

This revision is now accepted and ready to land.Mar 1 2018, 1:04 PM
This revision was automatically updated to reflect the committed changes.