This is an archive of the discontinued LLVM Phabricator instance.

[LoopDataPrefetch/AArch64] Don't add prefetch intrinsic, when the loop already has InlineAsm prefetch.
AbandonedPublic

Authored by flyingforyou on Jul 14 2016, 10:16 PM.

Details

Summary

We need to check InliineAsm prefetch, not only Intrinsic::prefetch.

Diff Detail

Event Timeline

flyingforyou retitled this revision from to [LoopDataPrefetch/AArch64] Don't add prefetch intrinsic, when the loop already has InlineAsm prefetch..
flyingforyou updated this object.
flyingforyou added reviewers: anemet, hfinkel, rengolin.
flyingforyou added a subscriber: llvm-commits.
hfinkel added inline comments.Jul 14 2016, 10:50 PM
lib/Transforms/Scalar/LoopDataPrefetch.cpp
199

I think this would need to be some kind of TTI callback; we don't want to embed this kind of logic here. Also, I assume you'd need to check all of the lines, not just the first one.

flyingforyou added inline comments.Jul 14 2016, 11:26 PM
lib/Transforms/Scalar/LoopDataPrefetch.cpp
199

I think this would need to be some kind of TTI callback; we don't want to embed this kind of logic here.

Sure, Could you recommend callback function name please?

Also, I assume you'd need to check all of the lines, not just the first one.

Yes. I will change the StringRef's function to find from startswith_lower.

Addressed Hal's comments.

Thanks.

flyingforyou added subscribers: mcrosier, t.p.northover.

Is Diff 64111 too much modification?

I think we just check below string. Because it depends on ISA, only.
AArch64 : prfm
AArch32 : pld
PowerPC: dcbt
...

How about making helper function for checking these string?

or Just checking string in TargetTransformInfoImpl.h likes isLoweredToCall?

bool isLoweredToCall(const Function *F) {
  // FIXME: These should almost certainly not be handled here, and instead
  // handled with the help of TLI or the target itself. This was largely
  // ported from existing analysis heuristics here so that such refactorings
  // can take place in the future.

  if (F->isIntrinsic())
    return false;

  if (F->hasLocalLinkage() || !F->hasName())
    return true;

  StringRef Name = F->getName();

  // These will all likely lower to a single selection DAG node.
  if (Name == "copysign" || Name == "copysignf" || Name == "copysignl" ||
      Name == "fabs" || Name == "fabsf" || Name == "fabsl" || Name == "sin" ||
      Name == "fmin" || Name == "fminf" || Name == "fminl" ||
      Name == "fmax" || Name == "fmaxf" || Name == "fmaxl" ||
      Name == "sinf" || Name == "sinl" || Name == "cos" || Name == "cosf" ||
      Name == "cosl" || Name == "sqrt" || Name == "sqrtf" || Name == "sqrtl")
    return false;

  // These are all likely to be optimized into something smaller.
  if (Name == "pow" || Name == "powf" || Name == "powl" || Name == "exp2" ||
      Name == "exp2l" || Name == "exp2f" || Name == "floor" ||
      Name == "floorf" || Name == "ceil" || Name == "round" ||
      Name == "ffs" || Name == "ffsl" || Name == "abs" || Name == "labs" ||
      Name == "llabs")
    return false;

  return true;
}

Thanks,
Junmo Park.

t.p.northover edited edge metadata.Aug 2 2016, 2:41 PM

This is a pretty nasty hack. The only other place I know of where we inspect inline asm (also poorly) is in BranchRelaxation when we need to know how big it is. This is justified only by the fact that it's a major correctness issue if we don't.

We have no idea if the prefetch is for the address we're considering, or even if it actually exists! (e.g. asm volatile("; We don't want a prfm here\n\t" ...)).

IMO, users should be encouraged to use the ACLE intrinsics instead. They produce completely sensible @llvm.prefetch instructions LLVM can reason about properly.

Thanks for the comment Tim.

We have no idea if the prefetch is for the address we're considering.

This is original algorithm's behavior. When you see the comment over the checking routine, there is mentioned likes below.

// If the loop already has prefetches, then assume that the user knows
// what they are doing and don't add any more.

even if it actually exists! (e.g. asm volatile("; We don't want a prfm here\n\t" ...)).

This can be fixed when we make check routine more robust.

IMO, users should be encouraged to use the ACLE intrinsics instead.

I am not sure that all users will use ACLE intrinsics instead of inline asm. I think using inline asm or intrinsic is still user's choice. Because inline asm has much longer history than intrinsic. And there are so many codes which use inline asm.
If I miss something, please let me know.

Or how about give up inserting prefetch intrinsic when the loop has inline asm?
I don't think users are stupid that they don't know prefetch is necessary or not when they use inline asm.

I am not sure that all users will use ACLE intrinsics instead of inline asm.

They should always use intrinsics when standardized ones are available. Usually even when non-standard ones are. They're strictly more portable than asm, and one of the main reasons we advocate them is precisely because they give the compiler a better idea of what's happening (otherwise we could quite easily have implemented arm_neon.h using inline asm, like GCC).

I think using inline asm or intrinsic is still user's choice. Because inline asm has much longer history than intrinsic. And there are so many codes which use inline asm.

It's certainly their choice, but one penalty of using asm has always been that the compiler doesn't know what's going on. I'm pretty strongly opposed to changing that.

Or how about give up inserting prefetch intrinsic when the loop has inline asm?

I could just about live with that.

OK Tim.

Hal, If you also agree with this, I will abandon this patch.

hfinkel edited edge metadata.Aug 14 2016, 10:53 AM

OK Tim.

Hal, If you also agree with this, I will abandon this patch.

I agree.

flyingforyou abandoned this revision.Aug 15 2016, 2:47 PM

Thanks. Hal, Tim.