This is an archive of the discontinued LLVM Phabricator instance.

[X86] Implement GCC's -mfentry flag
ClosedPublic

Authored by niravd on Dec 20 2016, 1:38 PM.

Details

Summary

the -pg flag inserts calls to mcount at the entry of functions in the body for profiling purposes. The -mfentry flag converts such calls into "naked" calls to fentry (in that the call has no prelude/postlude operations).

Diff Detail

Repository
rL LLVM

Event Timeline

niravd updated this revision to Diff 82145.Dec 20 2016, 1:38 PM
niravd retitled this revision from to [X86] Implement -mfentry.
niravd updated this object.
niravd added reviewers: hfinkel, craig.topper.
niravd added a subscriber: llvm-commits.
sanjoy added a subscriber: sanjoy.Jan 3 2017, 1:14 PM

Can you please give some context about this: what is __fentry__, mcount, their use cases, relevant specifications?

Also: have you checked to see if some of this code can be shared with XRay?

niravd updated this object.Jan 4 2017, 7:36 AM
niravd retitled this revision from [X86] Implement -mfentry to [X86] Implement GCC's -mfentry flag.Jan 4 2017, 7:48 AM
niravd updated this object.
niravd edited subscribers, added: dberris; removed: llvm-commits.Jan 4 2017, 7:54 AM

Can you please give some context about this: what is __fentry__, mcount, their use cases, relevant specifications?

Updated commit message to reflect this.

Also: have you checked to see if some of this code can be shared with XRay?

I don't think it is (but CC-ing dberris for second opinion). My understanding was that XRay is concerned with function exit and this is concerned with entry.

Ping. I believe the only thing holding this up is the potential for sharing overlaps with XRay code, but since this fentry and xray are nominally orthogonal and this deals solely with function entry points we should preserve the logical partitioning as it is.

Sorry for the delay -- XRay is interested in both entry and exits. I haven't quite looked into the specifics of this, but it seems like it's doing something very similar to what XRay, or at least the profiling instrumentation code is doing.

All fentry is doing is adding "callq fentry" at the first instruction of every function. It seems unlikely that fentry would ever want to be used with XRay (at the very least it's not obvious who should be on the outside).
We should probably throw a warning/error if they're used together. Do you think it makes sense then to merge the fentry pass into PATCHABLE_FUNCTION_ENTER or leave it as is?

Sorry for the delay -- XRay is interested in both entry and exits. I haven't quite looked into the specifics of this, but it seems like it's doing something very similar to what XRay, or at least the profiling instrumentation code is doing.

I'm a little confused here -- you're directly calling a function, before all the stack adjustments are done? I'm sure I understand what this is meant to do, but I'm wondering whether this as written is sufficient (i.e. whether the function being called needs to do the normally caller-saved registers, before doing any work aside from setting up its own stack frame) or "safe".

As far as whether this conflicts with XRay, I'm not really sure it does -- I'd think the XRay instrumentation should come in before the "blind" call to __fentry__ but that's mostly because XRay is meant to measure the cost of running the body of the functions that are instrumented.

Is there a definitive guide to what the flag is supposed to do? And what this supposed to be used for?

I'm a little confused here -- you're directly calling a function, before all the stack adjustments are done? I'm sure I understand what this is meant to do, but I'm wondering whether this as written is sufficient (i.e. whether the function being called needs to do the normally caller-saved registers, before doing any work aside from setting up its own stack frame) or "safe".

Right. My understanding is that this is something that was added to gcc 4.6 to help debug in kernel-esque environments. Specifically the goal was to do a mcount-like call before the function prologue. I haven't found a definitive guide though.

The change itself seems straightforward in that it's just mimicking GCC. The only question is its relationship with XRay. I see no good justification for merging the passes as they're seem orthogonal. It might make sense to add a check to cause an error if mfentry and xray are used together since they're both should operate at the prologue (This would be in the clang half D28001). This is not currently the case when -pg and -fxray-instrument are both used; we get the XRay sled before the prologue and mcount afterwards.

dberris edited edge metadata.Jan 29 2017, 4:14 PM

I'm a little confused here -- you're directly calling a function, before all the stack adjustments are done? I'm sure I understand what this is meant to do, but I'm wondering whether this as written is sufficient (i.e. whether the function being called needs to do the normally caller-saved registers, before doing any work aside from setting up its own stack frame) or "safe".

Right. My understanding is that this is something that was added to gcc 4.6 to help debug in kernel-esque environments. Specifically the goal was to do a mcount-like call before the function prologue. I haven't found a definitive guide though.

The change itself seems straightforward in that it's just mimicking GCC. The only question is its relationship with XRay. I see no good justification for merging the passes as they're seem orthogonal. It might make sense to add a check to cause an error if mfentry and xray are used together since they're both should operate at the prologue (This would be in the clang half D28001). This is not currently the case when -pg and -fxray-instrument are both used; we get the XRay sled before the prologue and mcount afterwards.

I think it's fine to keep them separate, I just want to make sure that the XRay sleds are inserted before the mfentry call -- I don't see a good reason to error out in case mfentry and xray are both enabled, they just have to "work as expected". In the same vein -pg and -fxray-instrument ought to work with each other as expected too.

I think it's fine to keep them separate, I just want to make sure that the XRay sleds are inserted before the mfentry call -- I don't see a good reason to error out in case mfentry and xray are both enabled, they just have to "work as expected". In the same vein -pg and -fxray-instrument ought to work with each other as expected too.

Seems good. It's currently in the opposite order, but I'll move the FEntryInserter pass to happen before XRayInstrumentation in TargetPassConfig.cpp.

Dean, can you give me an LGTM on this (and the clang partner diff 28001)?

dberris accepted this revision.Jan 30 2017, 9:29 PM

LGTM

I might be missing it, but do you have the order reversed already?

This revision is now accepted and ready to land.Jan 30 2017, 9:29 PM
This revision was automatically updated to reflect the committed changes.