This is an archive of the discontinued LLVM Phabricator instance.

[Pipeline] Don't limit ArgumentPromotion to -O3
ClosedPublic

Authored by aeubanks on Apr 13 2023, 1:27 PM.

Details

Summary

This is a cheap pass so there's no need to limit to -O3.
This removes some differences between various pipelines.

Diff Detail

Event Timeline

aeubanks created this revision.Apr 13 2023, 1:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 13 2023, 1:27 PM
aeubanks requested review of this revision.Apr 13 2023, 1:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 13 2023, 1:27 PM
nikic added a comment.Apr 13 2023, 1:53 PM

Running this at O2 sounds fine, but there's probably no need to run it at O1?

Running this at O2 sounds fine, but there's probably no need to run it at O1?

I don't think it really matters, not checking the opt level simplifies things a tiny bit, but if you feel strongly I can restrict this to not run at -O1

nikic accepted this revision.Apr 14 2023, 7:49 AM

Not going to insist. I don't particularly care about the O1 pipeline, and I don't think we have a good understanding of what it is supposed to be anyway.

This revision is now accepted and ready to land.Apr 14 2023, 7:49 AM
This revision was landed with ongoing or failed builds.Apr 14 2023, 10:05 AM
This revision was automatically updated to reflect the committed changes.
dyung added a subscriber: dyung.EditedApr 14 2023, 5:33 PM

Ignore this, D148396 seems to have fixed/worked-around the issue for the time being.

Hi, @aeubanks , this patch is hurting bpf backend.
The bpf backend calling convention is only allowing 5
parameters in registers and does not allow pass arguments
through stacks. But ArgumentPromotionPass could increase
the number of arguments for the function. If the eventual
number of arguments is more than 5, bpf backend will
have a fatal error.

The issue is discovered when I am compiling with latest clang
on bpf selftests, By default, for bpf community, it is recommended
to use -O2 as the compilation flag and that is why the issue is
not exposed earlier. This patch enabled the transformation at -O2,
and I hit the following errors with latest clang:

...
progs/test_xdp_noinline.c:739:8: error: too many args to t8: i64 = GlobalAddress<ptr @encap_v4> 0, progs/test_xdp_noinline.c:739:8
              if (!encap_v4(xdp, cval, &pckt, dst, pkt_bytes))
                   ^
...
progs/test_xdp_noinline.c:321:6: error: defined with too many args
bool encap_v4(struct xdp_md *xdp, struct ctl_value *cval,
     ^
...

The following is a further analysis why the above happens.

struct flow_key {
      union {
              __be32 src;
              __be32 srcv6[4];
      };
      union {
              __be32 dst;
              __be32 dstv6[4];
      };
      union {
              __u32 ports;
              __u16 port16[2];
      };
      __u8 proto;
};
struct packet_description {
      struct flow_key flow;
      __u8 flags;
};
static __attribute__ ((noinline))
bool encap_v4(struct xdp_md *xdp, struct ctl_value *cval,
              struct packet_description *pckt,
              struct real_definition *dst, __u32 pkt_bytes)
{
  ...
  __u32 ip_suffix = bpf_ntohs(pckt->flow.port16[0]);
  ...
  ip_suffix ^= pckt->flow.src;
  ...
}
__attribute__ ((noinline))
static int process_packet(...) {
  struct packet_description pckt = { };
  ...
  encap_v4(xdp, cval, &pckt, dst, pkt_bytes)
  ...
}

In function encap_v4, for argument pckt, only pckt->flow_port16[0]
and pckt->flow_src are referenced.

Before ArgumentPromotionPass pass, encap_v4 signature (in IR):

i1 @encap_v4(ptr noundef %xdp, ptr noundef %cval, ptr noundef %pckt, ptr noundef %dst, i32 noundef %pkt_bytes)

After ArgumentPromotionPass pass, encap_v4 signature (in IR):

i1 @encap_v4(ptr noundef %xdp, ptr noundef %cval, i32 %pckt.0.val, i16 %pckt.32.val, ptr noundef %dst, i32 noundef %pkt_bytes)

Basically, the argument 'pckt' is replaced with two arguments corresponding two pckt dereferences
inside encap_v4. But this transformation increases the number of arguments to 6 and eventually
bpf backend complains and errors out.

What is the proper way to resolve this? Adding a TTI hook to get target maximum number of
arguments? E.g, if -1, means no constraints, otherwise, it should return a >= 0 number which
indicates the maximum number of allowed arguments? Is it possible ArgumentPromotionPass may
replace the old reference parameter with a new struct argument with struct size > 8? If this
is the case we will have problem too.

Or for simplicity, simply disable this transformation for bpf backend?

nikic added a comment.Apr 17 2023, 5:18 AM

The bpf backend calling convention is only allowing 5
parameters in registers and does not allow pass arguments
through stacks.

To clarify, is this a limitation of the BPF backend, or BPF itself? That is, is the backend just missing legalization support for converting too many arguments into pass by stack?

Generally I'd be fine with a TTI hook to control the maximum number of arguments though.

yonghong-song added a comment.EditedApr 17 2023, 7:16 AM

@nikic This is a BPF limitation. See the following kernel document:

https://www.kernel.org/doc/html/latest/bpf/bpf_design_QA.html#q-can-more-than-5-function-arguments-be-supported-in-the-future
Q: Can more than 5 function arguments be supported in the future?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: NO. BPF calling convention only allows registers R1-R5 to be used
as arguments. BPF is not a standalone instruction set.
(unlike x64 ISA that allows msft, cdecl and other conventions)

I will propose a TTI hook then.

Hi, this causes test failures on chromium's crashpad_tests.

Note: Google Test filter = StartHandlerForSelfTestSuite/StartHandlerForSelfTest.StartHandlerInChild/94
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from StartHandlerForSelfTestSuite/StartHandlerForSelfTest
[ RUN      ] StartHandlerForSelfTestSuite/StartHandlerForSelfTest.StartHandlerInChild/94
ERROR crashpad_tests[4189468:4189468]: [file_io.cc(94)] ReadExactly: expected 1, observed 0
../../third_party/crashpad/crashpad/client/crashpad_client_linux_test.cc:474: Failure
Value of: LoggingReadFileExactly(ReadPipeHandle(), &c, sizeof(c))
  Actual: false
Expected: true
Stack trace:
#0 0x55ac0ffc6334 crashpad::test::(anonymous namespace)::StartHandlerForSelfInChildTest::MultiprocessParent()
#1 0x55ac1033b844 crashpad::test::Multiprocess::RunParent()
#2 0x55ac1033b322 crashpad::test::Multiprocess::Run()
#3 0x55ac0ffc5cb7 crashpad::test::(anonymous namespace)::StartHandlerForSelfTest_StartHandlerInChild_Test::TestBody()

../../third_party/crashpad/crashpad/client/crashpad_client_linux_test.cc:494: Failure
Expected equality of these values:
  reports.size()
    Which is: 0
  report_expected ? 1u : 0u
    Which is: 1
Stack trace:
#0 0x55ac0ffc666f crashpad::test::(anonymous namespace)::StartHandlerForSelfInChildTest::MultiprocessParent()
#1 0x55ac1033b844 crashpad::test::Multiprocess::RunParent()
#2 0x55ac1033b322 crashpad::test::Multiprocess::Run()
#3 0x55ac0ffc5cb7 crashpad::test::(anonymous namespace)::StartHandlerForSelfTest_StartHandlerInChild_Test::TestBody()

../../third_party/crashpad/crashpad/test/multiprocess_posix.cc:139: Failure
Failed
Child exited with code 0, expected termination by signal 11 (Segmentation fault)
Stack trace:
#0 0x55ac1033b733 crashpad::test::Multiprocess::Run()
#1 0x55ac0ffc5cb7 crashpad::test::(anonymous namespace)::StartHandlerForSelfTest_StartHandlerInChild_Test::TestBody()

[  FAILED  ] StartHandlerForSelfTestSuite/StartHandlerForSelfTest.StartHandlerInChild/94, where GetParam() = (true, false, true, true, true, 4-byte object <02-00 00-00>) (6 ms)
[----------] 1 test from StartHandlerForSelfTestSuite/StartHandlerForSelfTest (6 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (6 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] StartHandlerForSelfTestSuite/StartHandlerForSelfTest.StartHandlerInChild/94, where GetParam() = (true, false, true, true, true, 4-byte object <02-00 00-00>)

Hi, this causes test failures on chromium's crashpad_tests.

NVM, it's due to flaky tests.