Place the BlockAddress type in the program address space
Needs Review · Public

Authored by arichardson on Jun 30 2018, 6:26 AM.

Details

Summary

While this should not matter for most architectures (where the program
address space is 0), it is important for CHERI. We use address space 200
for all of our code pointers and without this change we assert in
SelectionDAG handling of BlockAddress nodes.

It is also useful for AVR: previously, programs targeting
AVR that attempted to read their own machine code
via a pointer to a label would instead read from RAM
using a pointer relative to the start of program flash.

Event Timeline

arichardson created this revision. Jun 30 2018, 6:26 AM
bjope added a comment. Jul 2 2018, 5:43 AM

I don't know much about the BlockAddress concept. The LangRef says things like "always has an i8* type" and "this may be passed around as an opaque pointer sized value". But I guess it would be weird if the size doesn't match the size of pointers in the program address space, so the patch makes sense to me.

I assume that this can't be reproduced for any in-tree target?
If you can't find an in-tree reproducer, then maybe you can describe the problem a little bit more instead. Such as which assert you hit, and maybe a small stack trace. That might help when trying to motivate this patch in the future.

Yes, I'm not sure I can make a test for this with any of the existing targets. I'll see if I can get something with AVR, since that sets the program address space to 1.

Nice patch, this looks useful!

Here's a test for you that does it.

There's another bug in LLParser that stops nonzero program address spaces from working. If the function referenced in a blockaddress is not known in the first pass of the LLParser (for example, when the blockaddress appears earlier in the IR file than the function definition), the LLParser must insert a forward reference for the function. It does this by creating a new global variable, but it unconditionally leaves that global variable in the default address space of zero.

The diff I have included has a fix for this.

I've also amended the LangRef docs so that they would be accurate under the new patch.

diff --git a/docs/LangRef.rst b/docs/LangRef.rst
index 06e092fb9fc..deac223d1a1 100644
--- a/docs/LangRef.rst
+++ b/docs/LangRef.rst
@@ -3275,7 +3275,16 @@ Addresses of Basic Blocks
 ``blockaddress(@function, %block)``
 
 The '``blockaddress``' constant computes the address of the specified
-basic block in the specified function, and always has an ``i8*`` type.
+basic block in the specified function.
+
+It always has an ``i8 addrspace(P)*`` type, where ``P`` is the program
+memory address space specified in the data layout. For targets that place
+code and data in the same address space (Von-Neumann architectures) a block
+address will have the same address space as data pointers, usually
+``addrspace(0)``. Block addresses on targets that have different data and
+code address spaces (Harvard architectures) will always be in the program
+memory address space specified in the target's data layout.
+
 Taking the address of the entry block is illegal.
 
 This value only has defined behavior when used as an operand to the
diff --git a/lib/AsmParser/LLParser.cpp b/lib/AsmParser/LLParser.cpp
index 5fe1e125d48..6581436c20f 100644
--- a/lib/AsmParser/LLParser.cpp
+++ b/lib/AsmParser/LLParser.cpp
@@ -3154,9 +3154,13 @@ bool LLParser::ParseValID(ValID &ID, PerFunctionState *PFS) {
                                               std::map<ValID, GlobalValue *>()))
               .first->second.insert(std::make_pair(std::move(Label), nullptr))
               .first->second;
-      if (!FwdRef)
+      if (!FwdRef) {
         FwdRef = new GlobalVariable(*M, Type::getInt8Ty(Context), false,
-                                    GlobalValue::InternalLinkage, nullptr, "");
+                                  GlobalValue::InternalLinkage, nullptr, "",
+                                  nullptr, GlobalValue::NotThreadLocal,
+                                  M->getDataLayout().getProgramAddressSpace());
+      }
+
       ID.ConstantVal = FwdRef;
       ID.Kind = ValID::t_Constant;
       return false;
diff --git a/test/CodeGen/AVR/block-address-is-in-progmem-space.ll b/test/CodeGen/AVR/block-address-is-in-progmem-space.ll
new file mode 100644
index 00000000000..8e6e3a71062
--- /dev/null
+++ b/test/CodeGen/AVR/block-address-is-in-progmem-space.ll
@@ -0,0 +1,51 @@
+; RUN: llc -mcpu=atmega328 < %s -march=avr | FileCheck %s
+
+; This test verifies that the pointer to a basic block
+; should always be a pointer in address space 1.
+;
+; If this were not the case, then programs targeting
+; AVR that attempted to read their own machine code
+; via a pointer to a label would actually read from RAM
+; using a pointer relative to the start of program flash.
+;
+; This would cause a load of uninitialized memory, not even
+; touching the program's machine code as otherwise desired.
+
+target datalayout = "e-P1-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8"
+
+; CHECK-LABEL: load_with_no_forward_reference
+define i8 @load_with_no_forward_reference(i8 %a, i8 %b) {
+second:
+  ; CHECK:      ldi r30, .Ltmp0+2
+  ; CHECK-NEXT: ldi r31, .Ltmp0+4
+  ; CHECK: lpm r24, Z
+  %bar = load i8, i8 addrspace(1)* blockaddress(@function_with_no_forward_reference, %second)
+  ret i8 %bar
+}
+
+; CHECK-LABEL: load_from_local_label
+define i8 @load_from_local_label(i8 %a, i8 %b) {
+entry:
+  %result1 = add i8 %a, %b
+
+  br label %second
+
+; CHECK-LABEL: .Ltmp1:
+second:
+  ; CHECK:      ldi r30, .Ltmp1+2
+  ; CHECK-NEXT: ldi r31, .Ltmp1+4
+  ; CHECK-NEXT: lpm r24, Z
+  %result2 = load i8, i8 addrspace(1)* blockaddress(@load_from_local_label, %second)
+  ret i8 %result2
+}
+
+; A function with no forward reference, right at the end
+; of the file.
+define i8 @function_with_no_forward_reference(i8 %a, i8 %b) {
+entry:
+  %result = add i8 %a, %b
+  br label %second
+second:
+  ret i8 0
+}
+
dylanmckay requested changes to this revision. Nov 15 2018, 11:05 PM
This revision now requires changes to proceed. Nov 15 2018, 11:05 PM

Rebase on latest master and merge suggestions

Herald added a project: Restricted Project.
Herald added subscribers: Jim, hiraditya.

clang-format

theraven accepted this revision. Sep 2 2020, 1:56 AM

Looks good to me. This bakes in the assumption that function pointers and basic block addresses are always in the same address space. That seems reasonable to me but it might be worth documenting in the DataLayout docs about the program address space.
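
For reference, the program address space is the `P<n>` component of the data layout string, as in the `e-P1-...` layout used by the AVR test above; when the component is absent, the program address space is 0. The sketch below shows how that component could be read out of a layout string. It is a standalone illustration only: `programAddressSpace` is a hypothetical helper, not an LLVM API, and inside LLVM the value comes from `DataLayout::getProgramAddressSpace()`, as in the patch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of LLVM): returns the address space named by
// the "P<n>" component of a '-'-separated data layout string, or 0 if the
// string has no such component.
unsigned programAddressSpace(const std::string &Layout) {
  std::size_t Pos = 0;
  while (Pos <= Layout.size()) {
    std::size_t End = Layout.find('-', Pos);
    if (End == std::string::npos)
      End = Layout.size();
    const std::string Spec = Layout.substr(Pos, End - Pos);
    // Match an upper-case 'P' followed only by digits. Lower-case "p:..."
    // components describe pointer size and alignment instead.
    if (Spec.size() > 1 && Spec[0] == 'P' &&
        Spec.find_first_not_of("0123456789", 1) == std::string::npos)
      return static_cast<unsigned>(std::stoul(Spec.substr(1)));
    Pos = End + 1;
  }
  return 0;
}
```

Called on the AVR layout from the test, `"e-P1-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8"`, this returns 1; on a layout with no `P` component, such as `"e-p:64:64"`, it returns 0.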

arichardson edited the summary of this revision. Sep 7 2020, 2:28 AM

@dylanmckay does this change look good to you now?

arsenm added inline comments. Sep 9 2020, 8:02 AM
llvm/lib/AsmParser/LLParser.cpp:3394

Why wouldn't this come from the parent function? You should be able to mix functions with different address spaces in the same module.

bjope added inline comments. Mon, Oct 5, 4:42 AM
llvm/lib/AsmParser/LLParser.cpp:3394

(Maybe @arichardson has a different reason, but I'm sharing my point of view here anyway.)

While it's possible to annotate calls and function definitions with non-zero program address spaces, I think one needs to be consistent. I don't think we really support multiple program address spaces (is there an actual use case for supporting that?).

I'm also not exactly sure what you mean by "parent function". The addrspace in the resulting pointer type needs to match the addrspace of the function referenced in the first argument of the blockaddress. And that function has not been defined yet, since we are inside the "!F" clause.

I guess we have to trust the datalayout if the function hasn't been defined yet (or use some kind of forward ref and backtrack to fill in addrspace to get the correct type later). I wonder if we'd get some kind of type error later if we assume that datalayout is correct here, and we find a different addrspace when finding the function definition later?
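
To make the question concrete, here is a toy model of the scheme described above: a forward reference records the address space assumed from the datalayout, and the assumption is checked once the function definition is seen. This is a sketch under stated assumptions, not a description of what LLParser does. `ForwardRefTable` and its members are hypothetical names, and whether LLParser actually diagnoses such a mismatch is exactly the open question.

```cpp
#include <map>
#include <string>

// Toy model only: none of these names are LLVM APIs.
struct ForwardRefTable {
  // Stands in for the program address space taken from the datalayout.
  unsigned ProgramAddrSpace = 0;
  // Function name -> address space assumed when it was forward-referenced.
  std::map<std::string, unsigned> Pending;

  // Called when blockaddress(@Name, ...) is seen before @Name is defined.
  // Returns the address space assumed for the placeholder.
  unsigned referenceUndefined(const std::string &Name) {
    return Pending.emplace(Name, ProgramAddrSpace).first->second;
  }

  // Called when @Name is finally defined in address space ActualAS.
  // Returns true if there was no pending reference or the assumption held,
  // and false if the earlier placeholder used a different address space.
  bool define(const std::string &Name, unsigned ActualAS) {
    auto It = Pending.find(Name);
    if (It == Pending.end())
      return true; // never forward-referenced, nothing to check
    const unsigned Assumed = It->second;
    Pending.erase(It);
    return Assumed == ActualAS;
  }
};
```

With `ProgramAddrSpace` set to 1, forward-referencing a function and then defining it in address space 1 succeeds, while defining it in address space 0 makes `define` return false. That is the point where a real parser would have to either report a type error or go back and retype the placeholder.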

(We also have the usual problem that if the datalayout is set by a datalayout definition that comes later in the .ll file, we haven't parsed the datalayout yet. But if I remember correctly, that is a general problem for the function definitions etc. as well.)