Page MenuHomePhabricator

[AArch64] Fix a crash in driver
AbandonedPublic

Authored by ahatanak on Nov 6 2015, 5:46 PM.

Details

Reviewers
rengolin
Summary

The crash can be reproduced with the following command on an x86 machine:

$ clang -arch arm64 -mtune=native -c hello.c

The bug is in AddAArch64TargetArgs where it invokes A->getAsString(Args) without checking A is null (and local variable A isn't initialized in this function).

In this patch, I changed the line to first check if A is null. If it is null, it will print either "native" or the target CPU name getAArchTargetCPU returns. I added the check for "native" so that the cpu name provided by the user using -mtune or -mcpu is printed rather than the host cpu name (e.g., haswell):

clang-3.8: error: the clang compiler does not support 'native'

Diff Detail

Event Timeline

ahatanak updated this revision to Diff 39612.Nov 6 2015, 5:46 PM
ahatanak retitled this revision from to [AArch64] Fix a crash in driver.
ahatanak updated this object.

Send the patch to cfe-commits.

Hi Akira,

I'm uncomfortable with this change, since it introduces a dependency between the two calls, and that's fragile. Also, the nullptr fiddling is not a good design overall.

If there is a dependency, I suggest you encode it directly into getAArch64TargetFeatures() and not involve getAArch64TargetCPU() at all.

Maybe getAArch64TargetCPU() should just return "generic" for null pointer?

cheers,
--renato

Hi Renato

I'm uncomfortable with this change, since it introduces a dependency between the two calls, and that's fragile. Also, the nullptr fiddling is not a good design overall.

If there is a dependency, I suggest you encode it directly into getAArch64TargetFeatures() and not involve getAArch64TargetCPU() at all.

Could you elaborate on your comments?
Which two calls? Do you mean getAArch64TargetCPU and getAArch64ArchFeaturesFromMcpu?

Maybe getAArch64TargetCPU() should just return "generic" for null pointer?

Do you mean getAArch64TargetCPU should return "generic" when IsNative is null?

Which two calls? Do you mean getAArch64TargetCPU and getAArch64ArchFeaturesFromMcpu?

No, the two getAArch64TargetCPU calls, but they don't even get called twice, and the value is write-only. Ignore that.

Do you mean getAArch64TargetCPU should return "generic" when IsNative is null?

Something like that, yes. The problem here is that you're leaking out a specific fact about the cpu which should be handled by the function and not left as an exercise to the reader, in this case, the Diag message. Every other Diag or usage of that function will have to use the same convoluted pattern if they were to account for the "native" handling.

However, if you just return "native", then both getAArch64MicroArchFeaturesFromMcpu() and D.Diag() will be able to use it as any other CPU name. If getAArch64MicroArchFeaturesFromMcpu() can't cope with "native", than I think it should be changed to do so.

Makes sense?

cheers,
--renato

I think I understand some of your concerns, but I'm not sure I fully understand what you are suggesting.

I think I can use macro aarch64 to have getAArch64TargetCPU return "native" when the compiler is not run on an AArch64 platform, but it doesn't sound like that was what you had in mind?

if (CPU == "native")
#ifdef aarch64

return llvm::sys::getHostCPUName();

#else

return "native"; // or "return CPU"

#endif

This requires fewer changes than the patch I proposed and it will print "native" in the error message, which is what I am looking for.

I think I can use macro aarch64 to have getAArch64TargetCPU return "native" when the compiler is not run on an AArch64 platform, but it doesn't sound like that was what you had in mind?

Not at all. :)

It seems like a simple thing really, but it piles up pretty quickly.

The command line:

$ clang -arch arm64 -mtune=native -c hello.c

doesn't make sense on an x86 machine at all. "native" means it should understand the host as an ARM64 hardware, which it's not. This should bail on error.

You haven't provided any tests (you should really, what should work and what should return errors), so I can't tell just by looking at the code what you expect the new behaviour to be. I'm assuming you still want things to break, but you want your error message to have a "native" in it instead of crashing the compiler.

Back to the code...

Passing the pointer to a boolean is leaking logic from getAArch64TargetCPU out. If you know inside that the CPU is "native", why not just return that instead? Would that break other things? If so, these other things should be fixed. Fudging the parameters and keeping every other caller in the dark (because you're overriding the bool pointer) is certainly not the way we want to go, because that would mean two different behaviours depending on how you call the function.

If the function returns a string with the CPU name, and the CPU name is "native", than logic dictates that the return value should be std::string("native");

If the function wants to add additional logic to grab the native CPU name and return that, it's extra. If it can't find the CPU name, it should return "native", not "generic" or "cyclone".

Makes sense?

cheers,
--renato

I think I can use macro aarch64 to have getAArch64TargetCPU return "native" when the compiler is not run on an AArch64 platform, but it doesn't sound like that was what you had in mind?

Not at all. :)

It seems like a simple thing really, but it piles up pretty quickly.

The command line:

$ clang -arch arm64 -mtune=native -c hello.c
 

doesn't make sense on an x86 machine at all. "native" means it should understand the host as an ARM64 hardware, which it's not. This should bail on error.

I agree and that is what I'm trying to do. It should error out with something like "native is not supported".

You haven't provided any tests (you should really, what should work and what should return errors), so I can't tell just by looking at the code what you expect the new behaviour to be. I'm assuming you still want things to break, but you want your error message to have a "native" in it instead of crashing the compiler.

That's correct. The error message I'm looking for is something like the one I wrote in the summary:

error: the clang compiler does not support 'native'

I didn't include a test case because I didn't know how to write a test that passes on an aarch64 host and fails on anything else. Do you know of any test cases in trunk that pass or fail depending on which host it is running on?

Back to the code...

Passing the pointer to a boolean is leaking logic from getAArch64TargetCPU out. If you know inside that the CPU is "native", why not just return that instead? Would that break other things? If so, these other things should be fixed. Fudging the parameters and keeping every other caller in the dark (because you're overriding the bool pointer) is certainly not the way we want to go, because that would mean two different behaviours depending on how you call the function.

Wouldn't it break on an aarch64 host? With "-mtune=native", the current code in trunk will get the host cpu name (which I believe is currently always "generic" for aarch64). But if I apply this patch, it will error out because "native" is not a valid cpu name.

Sorry, there were mistakes in my comments.

What I meant to say is that changing getAArch64TargetCPU to return "native" would break the case where clang is being run on an aarch64 host. The current code in trunk will get the host cpu name (which I believe is currently always "generic" for aarch64) if "-mtune=native" is on the command line. If we change getAArch64TargetCPU to return "native", clang will error out because "native" is not a valid cpu name.

I didn't include a test case because I didn't know how to write a test that passes on an aarch64 host and fails on anything else. Do you know of any test cases in trunk that pass or fail depending on which host it is running on?

It's possible to do something with the python config, though I don't know how.

Wouldn't it break on an aarch64 host? With "-mtune=native", the current code in trunk will get the host cpu name (which I believe is currently always "generic" for aarch64). But if I apply this patch, it will error out because "native" is not a valid cpu name.

That's my point. It shouldn't break.

If you can get the CPU name, return it. If not, return "native". If you're on AArch64 and you can't get the CPU name, that's a bug that needs fixing. If it's always "generic", that's another piece of code that needs fixing. If whatever getCPU function you use doesn't return a valid name, and the name chosen was "native", you should return "native".

--renato

If you can get the CPU name, return it. If not, return "native". If you're on AArch64 and you can't get the CPU name, that's a bug that needs fixing. If it's always "generic", that's another piece of code that needs fixing. If whatever getCPU function you use doesn't return a valid name, and the name chosen was "native", you should return "native".

I'm guessing this is what you are suggesting:

In getAArch64TargetCPU, if it finds out the cpu name passed via -mtune or -mcpu is "native",

  1. Call llvm::sys::getHostCPUName to get the host CPU name.
  2. Check the host CPU name to see if it is a valid AArch64 CPU. A CPU is valid if it is one of these CPUs: cyclone, cortex-a53, cortex-a57, cortex-a72, or generic (this is the set of CPUs that are valid in DecodeAArch64Mcpu).
  3. If the host CPU is valid, return the CPU name. Otherwise, return "native".

In getAArch64TargetCPU, if it finds out the cpu name passed via -mtune or -mcpu is "native",

  1. Call llvm::sys::getHostCPUName to get the host CPU name.
  2. Check the host CPU name to see if it is a valid AArch64 CPU. A CPU is valid if it is one of these CPUs: cyclone, cortex-a53, cortex-a57, cortex-a72, or generic (this is the set of CPUs that are valid in DecodeAArch64Mcpu).
  3. If the host CPU is valid, return the CPU name. Otherwise, return "native".

Something like that, yes. Only return native if -mcpu/tune was native to begin with, and decode CPU names using the already existing mechanisms, not creating a new StringSwitch.

thanks,
--renato

ahatanak updated this revision to Diff 46725.Feb 2 2016, 4:39 PM

Sorry for taking so long to get back to this patch.

I've updated the patch based on your review comments. I defined a new helper function getAArch64FeaturesFromCPU which gets the target features from the cpu name and checks whether the cpu is a valid aarch64 cpu. This function is called from getAArch64TargetCPU and DecodeAArch64Mcpu to avoid duplicating the code. I also added a test case.

Hi,

I think it's clear now to me that this strategy isn't going to work. What you need is to add support for AArch64 in the TargetParser and simplify the name matching in the same way we did for ARM. There really is no other meaningful way of doing this.

Please check with Bradley and Alexandros, as they were the ones dealing with this last.

cheers,
--renato

lib/Driver/Tools.cpp
1026

You don't need the IsValid flag, since if the returned vector is empty, it has the same semantics.

Also, you're returning the vector by value, whereas the pattern used in this file is to pass a reference to the vector as an argument.

1042

You don't need to return a pair. Pass the vector by reference.

You may return a boolean for the IsValid, if there is a case where an empty vector is valid (no default properties). But as it stands, you don't really need to.

1047

Avoid using pairs as return type. This is really meant for extremely connected concepts, not to make C++ look like Perl.

1073

You could have used a different variable and avoided the ternary operator below.

1077

That's a misuse of this function. You're building a vector, filling it with stuff, copying it back to a temp and throwing it away just for the CPU name detection.

This is clearly a job for the TargetParser.

rengolin requested changes to this revision.Feb 3 2016, 1:27 AM
rengolin added a reviewer: rengolin.
This revision now requires changes to proceed.Feb 3 2016, 1:27 AM

If this patch was abandoned, please "Abandon" it. If not, please implement it using the AArch64 TargetParser.

Sorry for not replying for a long time. I'll get back to this soon.

ahatanak abandoned this revision.Sep 15 2016, 4:38 PM

The crash I was trying to fix was fixed in https://reviews.llvm.org/D23643.