Page MenuHomePhabricator

[SystemZ] Use VGM whenever possible to load FP immediates
ClosedPublic

Authored by jonpa on Feb 9 2019, 9:37 AM.

Details

Reviewers
uweigand
Summary

VGM (Vector Generate Mask) can be used to build certain FP immediates.

isFPImmLegal() extended to recognize these values and return true.

New pseudo instructions with a custom inserter use VGM and then copy the right subreg to the FP virtual register.

Tests updated + new test: fp-const-12.ll.

Diff Detail

Event Timeline

jonpa created this revision.Feb 9 2019, 9:37 AM
jonpa updated this revision to Diff 186167.Feb 10 2019, 3:25 PM

Patch was proven to be incorrect by an assertion failure.

Updated to take better care with bitwidths (double / float) by using VGMG / VGMF.

It seems you're assuming VGM is always available. I guess there needs to be a check for hasVector() somewhere.

Is there a reason why you're only creating the VGM at the MI level via a custom inserter, instead of introducing it during late DAG matching like we do for vector types? It seems odd to handle the scalar and vector cases differently here ...

In general, this does look like a good idea.

jonpa updated this revision to Diff 186329.Feb 11 2019, 2:12 PM

It seems you're assuming VGM is always available. I guess there needs to be a check for hasVector() somewhere.

Yes, fixed.

Is there a reason why you're only creating the VGM at the MI level via a custom inserter, instead of introducing it during late DAG matching like we do for vector types? It seems odd to handle the scalar and vector cases differently here ...

At first that seemed simpler to me, but I agree now that it is better to do it in Select(). Interestingly enough, this change gives a slightly different code generation:

                             CustomInserter vs Select()
mdbr           :                 7040                 6957      -83
wfmdb          :                19484                19561      +77
wfadb          :                10173                10240      +67
vsteg          :                 1774                 1720      -54
adbr           :                10648                10595      -53
vlrepg         :                 4836                 4795      -41
std            :                23026                23057      +31
ld             :                34156                34176      +20
vl             :                23472                23490      +18
vgmg           :                 3999                 3982      -17
lr             :                29745                29729      -16
cdbr           :                 7785                 7800      +15
lgr            :               355323               355309      -14
lg             :               374064               374052      -12
jne            :                14047                14059      +12
vgmf           :                  866                  860       -6
sdbr           :                 3122                 3127       +5
wldeb          :                  326                  321       -5
ltg            :                49842                49847       +5
...
Spill|Reload   :               189197               189161      -36

After isel, we now do the VGM with a vr128bit register, whereas the custom inserter used a vf128 register.

I was able to make a reduced test case from a file where the number of VGM?s was different now. In this one case trunk (and previous version of patch) had a loop invariant instruction hoisted by MachineLICM, whereas now it did not. It was due to some reg pressure heuristic. Not sure if this is important overall... Using vr128 bit seems right, by thereby giving regalloc more freedom...

uweigand accepted this revision.Feb 12 2019, 4:41 AM

Yes, I think this version is better.

LGTM, thanks!

This revision is now accepted and ready to land.Feb 12 2019, 4:41 AM
jonpa closed this revision.Feb 12 2019, 10:11 AM

Thanks for review. r353867.