This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][GlobalISel] Use wzr/xzr for FP stores of zero
Needs ReviewPublic

Authored by paquette on Jan 13 2021, 4:26 PM.

Details

Reviewers
aemerson
Summary

In situations like the example below, GlobalISel produces an unnecessary fmov from wzr.

https://godbolt.org/z/Y45W8T

define void @test(float* %ptr, float %x) {
    %cmp = fcmp une float undef, 0.000000e+00
    br i1 %cmp, label %a, label %b
a:
    %gep = getelementptr inbounds float, float* %ptr, i32 1
    store float 0.000000e+00, float* %gep, align 4
    ret void
b:
    ret void
}

In a lot of cases, matchFConstantToConstant in the pre-legalizer combiner handles this sort of thing.

However, that combine only works when all users of a G_FCONSTANT are stores. In this case, we have a G_FCMP which also uses the G_FCONSTANT.

This patch adds a post-legalization (lowering) combine which looks for G_STORES which store a positive 0. If such a store is found, and we know it can use wzr/xzr, then it's updated to use a G_CONSTANT 0.

During selection, the constant 0 + store will be selected to a store of wzr/xzr.

This is a 0.1% code size improvement at -Os for CTMark/mafft/pairlocalalign. There are minor code size improvements in other CTMark tests as well.

Diff Detail

Event Timeline

paquette created this revision.Jan 13 2021, 4:26 PM
paquette requested review of this revision.Jan 13 2021, 4:26 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 13 2021, 4:26 PM

Why is this in lowering instead of combine?

Why is this in lowering instead of combine?

Originally I did it in select, since it seemed like something we would want at all optimization levels.

Moving it to lowering gave slightly better code size results on bullet and pairlocalalign. (152 B and 76 B respectively)

I think that it's likely because of regbankselect. Some of the instructions in there check onlyUsesFP to decide on which register bank to use (G_LOAD, G_SELECT, and G_UNMERGE_VALUES). So, we can potentially avoid some extra copies by making sure that instructions like G_STORES of 0.0 don't end up being marked as FPR.

A better, more general solution might be to move some of the G_FCONSTANT selection code into the legalizer or post-legalizer lowering. That might make some of the selection code which relies on seeing a G_FCONSTANT more difficult to get right though.

Or, alternatively, teach regbankselect to check if a G_FCONSTANT is actually producing something that can/should be emitted using fmov. If it's not, then it could actually be assigned a GPR, because we know that's what it will end up on.

It it possible to extend the existing combine in prelegalizer combiner to handle this too? Or will that cause us to still miss some cases?

It it possible to extend the existing combine in prelegalizer combiner to handle this too? Or will that cause us to still miss some cases?

The problem with that combine is it looks for G_FCONSTANTS which are only used by stores, so as is, it would miss cases.

I think that we could probably replace that combine with something similar to this though.

It it possible to extend the existing combine in prelegalizer combiner to handle this too? Or will that cause us to still miss some cases?

The problem with that combine is it looks for G_FCONSTANTS which are only used by stores, so as is, it would miss cases.

I think that we could probably replace that combine with something similar to this though.

Right. I think it would be nice to keep these int/fp constant of 0 combines somewhat together. If you feel that it's not a net benefit to do that then I'm happy for this to go in as is.