This is an archive of the discontinued LLVM Phabricator instance.

X86/TargetTransformInfo: Report div/rem constant immediate costs as TCC_Free
ClosedPublic

Authored by MatzeB on Oct 11 2018, 3:11 PM.

Details

Summary

DIV/REM by constants should always be expanded into mul/shift/etc.
patterns. Unfortunately the ConstantHoisting pass runs too early at a
point where the pattern isn't expanded yet. However after
ConstantHoisting hoisted some immediate the result may not expand
anymore. Also the hoisting typically doesn't make sense because it
operates on immediates that will change completely during the expansion.

Report DIV/REM as TCC_Free so ConstantHoisting will not touch them.

Diff Detail

Repository
rL LLVM

Event Timeline

MatzeB created this revision.Oct 11 2018, 3:11 PM

This is currently not motivated by a benchmark but preparation for an upcoming commit of mine where I don't want ConstantHoisting to screw with dev/rem and mess up testcases.

However it does make sense in general and speeds up this microbenchmark by 10x on my measurements:

#include <stdint.h>

#define SIZE 100000

void __attribute__((noinline)) divfoo(int64_t *signed_arr, uint64_t *unsigned_arr) {
    for (unsigned i = 0; i < SIZE; ++i) {
        signed_arr[i] /= 0xcafebabe5;
        unsigned_arr[i] /= 0xcafebabe5;
    }
}

int main(void) {
    static int64_t signed_arr[SIZE];
    static uint64_t unsigned_arr[SIZE];
    for (unsigned i = 0; i < 1000; ++i)
        divfoo(signed_arr, unsigned_arr);
}

This totally makes sense, but why is ConstantHoisting pulling those constants out of the only block that uses them? I thought it tried to keep constants in the nearest dominator of the uses?

Or is moving them because its a loop and we don't want to reload it every iteration and don't trust LICM?

Or is moving them because its a loop and we don't want to reload it every iteration and don't trust LICM?

I'm not completely sure. I see the code operating with block frequencies so it probably sees a lower block frequency in front of the loop.

The pass itself currently runs relatively close to SelectionDAG so there is no LICM happening anymore. Though IMO it really should be implemented in SelectionDAG after everything is expanded and actually all constants are visible and not play all the games with hoisting constants between blocks.

Or is moving them because its a loop and we don't want to reload it every iteration and don't trust LICM?

I'm not completely sure. I see the code operating with block frequencies so it probably sees a lower block frequency in front of the loop.

The pass itself currently runs relatively close to SelectionDAG so there is no LICM happening anymore. Though IMO it really should be implemented in SelectionDAG after everything is expanded and actually all constants are visible and not play all the games with hoisting constants between blocks.

Also with ConstantHoisting being a late SelectionDAG pass we could get rid of the terrible OpaqueConstant concept in SelectionDAG/DAGCombiner. (I'm not signing up to actually change this at this time though)

MatzeB updated this revision to Diff 169325.Oct 11 2018, 3:49 PM

noticed that the test needs to operate on i64 to have any effect (as all i32 values are reported as TCC_Free anyway)

This revision is now accepted and ready to land.Oct 11 2018, 3:49 PM
This revision was automatically updated to reflect the committed changes.