This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add a new pass to fold extension into load instructions in previous BB
Needs ReviewPublic

Authored by Carrot on May 24 2018, 2:16 PM.

Details

Summary

Compile the attached new test case, original llvm generates

movzwl  a, %esi          // %esi already contains zero extended value
calll   v1  
jmp     .LBB0_3

.LBB0_2:

movzwl  a+2, %esi    // %esi already contains zero extended value
calll   v2

.LBB0_3:

movzwl  %si, %eax    // another zext, we should avoid it.
cmpl    $4, %eax

In source code all related values are 16 bit. In lowering phase, function X86TargetLowering::EmitCmp intentionally creates zero extension before cmp in order to avoid 16 bit immediate, which is slow on modern x86. Later in X86FixupBWInsts.cpp 16 bit loads are changed to load and extension, it makes the extension before cmp redundant.

The extension created by X86TargetLowering::EmitCmp is beneficial, we need to fold it into previous load instructions to get best performance. The function optimizeLoadInstr() can only move the load into its single user in the same BB, so it can fold the simple ext(load) pair, but it can't do the folding cross BB, and it can't fold ext into load, so it can't be used for this purpose.

So I write this new pass X86FoldXBBExtLoad.cpp to fold 16bit extension into previous load instructions.

Diff Detail

Event Timeline

Carrot created this revision.May 24 2018, 2:16 PM

We have an MI combine pass IIRC -- why not put this there? (Genuine question, haven't looked at these enough to know if that doesn't work...)

test/CodeGen/X86/fold-xbb-ext-load.ll
8–15

Can you craft yoru test case to be reasonable to read with the automatically generated CHECK lines? That always makes maintenance easier *if* the result is readable... not sure if these will be.

You might need to break this apart into separate functions to make it read better.

Carrot updated this revision to Diff 150206.Jun 6 2018, 3:07 PM
Carrot marked an inline comment as done.

We have an MI combine pass IIRC -- why not put this there? (Genuine question, haven't looked at these enough to know if that doesn't work...)

The patterns handled by MachineCombiner usually are simple code sequences in a single basic block, new code sequences are inserted at the root instruction. The pattern handled by this file may cross several basic blocks, new instructions are inserted at multiple positions in previous basic blocks. So it can't be handled by MachineCombiner.

RKSimon added inline comments.Jun 7 2018, 2:12 AM
lib/Target/X86/X86FoldXBBExtLoad.cpp
2

Fix filename and brief

test/CodeGen/X86/fold-xbb-ext-load.ll
8–15

This doesn't look like its auto-generated - can you use update_llc_test_check.py?

Carrot updated this revision to Diff 150417.Jun 7 2018, 3:31 PM
Carrot marked 2 inline comments as done.

This looks nice. Any more reviews?

Some stats from benchmarks maybe? :)

Still looking at this, @Carrot ?

RKSimon resigned from this revision.Feb 7 2021, 6:24 AM