1 files changed, 658 insertions, 0 deletions
diff --git a/meta/recipes-devtools/gcc/gcc-10.1/0003-aarch64-Mitigate-SLS-for-BLR-instruction.patch b/meta/recipes-devtools/gcc/gcc-10.1/0003-aarch64-Mitigate-SLS-for-BLR-instruction.patch
new file mode 100644
index 0000000000..716a367172
--- /dev/null
+++ b/meta/recipes-devtools/gcc/gcc-10.1/0003-aarch64-Mitigate-SLS-for-BLR-instruction.patch
@@ -0,0 +1,658 @@
+Upstream-Status: Backport
+Signed-off-by: Ross Burton <ross.burton@arm.com>
+
+From a5e7efc40ed841934c1d913f39476afa17d8e5f7 Mon Sep 17 00:00:00 2001
+From: Matthew Malcomson <matthew.malcomson@arm.com>
+Date: Thu, 9 Jul 2020 09:11:59 +0100
+Subject: [PATCH 3/3] aarch64: Mitigate SLS for BLR instruction
+
+This patch introduces the mitigation for Straight Line Speculation past
+the BLR instruction.
+
+This mitigation replaces BLR instructions with a BL to a stub which uses
+a BR to jump to the original value.  These function stubs are then
+appended with a speculation barrier to ensure no straight line
+speculation happens after these jumps.
+
+When optimising for speed we use a set of stubs for each function since
+this should help the branch predictor make more accurate predictions
+about where a stub should branch.
+
+When optimising for size we use one set of stubs for all functions.
+This set of stubs can have human readable names, and we are using
+`__call_indirect_x<N>` for register x<N>.
+
+When BTI branch protection is enabled the BLR instruction can jump to a
+`BTI c` instruction using any register, while the BR instruction can
+only jump to a `BTI c` instruction using the x16 or x17 registers.
+Hence, in order to ensure this transformation is safe we mov the value
+of the original register into x16 and use x16 for the BR.
+
+As an example when optimising for size:
+a
+    BLR x0
+instruction would get transformed to something like
+    BL __call_indirect_x0
+where __call_indirect_x0 labels a thunk that contains
+__call_indirect_x0:
+    MOV X16, X0
+    BR X16
+    <speculation barrier>
+
+The first version of this patch used local symbols specific to a
+compilation unit to try and avoid relocations.
+This was mistaken since functions coming from the same compilation unit
+can still be in different sections, and the assembler will insert
+relocations at jumps between sections.
+
+On any relocation the linker is permitted to emit a veneer to handle
+jumps between symbols that are very far apart.  The registers x16 and
+x17 may be clobbered by these veneers.
+Hence the function stubs cannot rely on the values of x16 and x17 being
+the same as just before the function stub is called.
+
+Similar can be said for the hot/cold partitioning of single functions,
+so function-local stubs have the same restriction.
+
+This updated version of the patch never emits function stubs for x16 and
+x17, and instead forces other registers to be used.
+
+Given the above, there is now no benefit to local symbols (since they
+are not enough to avoid dealing with linker intricacies).  This patch
+now uses global symbols with hidden visibility each stored in their own
+COMDAT section.  This means stubs can be shared between compilation
+units while still avoiding the PLT indirection.
+
+This patch also removes the `__call_indirect_x30` stub (and
+function-local equivalent) which would simply jump back to the original
+location.
+
+The function-local stubs are emitted to the assembly output file in one
+chunk, which means we need not add the speculation barrier directly
+after each one.
+This is because we know for certain that the instructions directly after
+the BR in all but the last function stub will be from another one of
+these stubs and hence will not contain a speculation gadget.
+Instead we add a speculation barrier at the end of the sequence of
+stubs.
+
+The global stubs are emitted in COMDAT/.linkonce sections by
+themselves so that the linker can remove duplicates from multiple object
+files.  This means they are not emitted in one chunk, and each one must
+include the speculation barrier.
+
+Another difference is that since the global stubs are shared across
+compilation units we do not know that all functions will be targeting an
+architecture supporting the SB instruction.
+Rather than provide multiple stubs for each architecture, we provide a
+stub that will work for all architectures -- using the DSB+ISB barrier.
+
+This mitigation does not apply for BLR instructions in the following
+places:
+- Some accesses to thread-local variables use a code sequence with a BLR
+  instruction.  This code sequence is part of the binary interface between
+  compiler and linker. If this BLR instruction needs to be mitigated, it'd
+  probably be best to do so in the linker. It seems that the code sequence
+  for thread-local variable access is unlikely to lead to a Spectre Revalation
+  Gadget.
+- PLT stubs are produced by the linker and each contain a BLR instruction.
+  It seems that at most only after the last PLT stub a Spectre Revalation
+  Gadget might appear.
+
+Testing:
+  Bootstrap and regtest on AArch64
+    (with BOOT_CFLAGS="-mharden-sls=retbr,blr")
+  Used a temporary hack(1) in gcc-dg.exp to use these options on every
+  test in the testsuite, a slight modification to emit the speculation
+  barrier after every function stub, and a script to check that the
+  output never emitted a BLR, or unmitigated BR or RET instruction.
+  Similar on an aarch64-none-elf cross-compiler.
+
+1) Temporary hack emitted a speculation barrier at the end of every stub
+function, and used a script to ensure that:
+  a) Every RET or BR is immediately followed by a speculation barrier.
+  b) No BLR instruction is emitted by compiler.
+
+gcc/ChangeLog:
+
+	* config/aarch64/aarch64-protos.h (aarch64_indirect_call_asm):
+	New declaration.
+	* config/aarch64/aarch64.c (aarch64_regno_regclass): Handle new
+	stub registers class.
+	(aarch64_class_max_nregs): Likewise.
+	(aarch64_register_move_cost): Likewise.
+	(aarch64_sls_shared_thunks): Global array to store stub labels.
+	(aarch64_sls_emit_function_stub): New.
+	(aarch64_create_blr_label): New.
+	(aarch64_sls_emit_blr_function_thunks): New.
+	(aarch64_sls_emit_shared_blr_thunks): New.
+	(aarch64_asm_file_end): New.
+	(aarch64_indirect_call_asm): New.
+	(TARGET_ASM_FILE_END): Use aarch64_asm_file_end.
+	(TARGET_ASM_FUNCTION_EPILOGUE): Use
+	aarch64_sls_emit_blr_function_thunks.
+	* config/aarch64/aarch64.h (STB_REGNUM_P): New.
+	(enum reg_class): Add STUB_REGS class.
+	(machine_function): Introduce `call_via` array for
+	function-local stub labels.
+	* config/aarch64/aarch64.md (*call_insn, *call_value_insn): Use
+	aarch64_indirect_call_asm to emit code when hardening BLR
+	instructions.
+	* config/aarch64/constraints.md (Ucr): New constraint
+	representing registers for indirect calls.  Is GENERAL_REGS
+	usually, and STUB_REGS when hardening BLR instruction against
+	SLS.
+	* config/aarch64/predicates.md (aarch64_general_reg): STUB_REGS class
+	is also a general register.
+
+gcc/testsuite/ChangeLog:
+
+	* gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c: New test.
+	* gcc.target/aarch64/sls-mitigation/sls-miti-blr.c: New test.
+---
+ gcc/config/aarch64/aarch64-protos.h                |   1 +
+ gcc/config/aarch64/aarch64.c                       | 225 ++++++++++++++++++++-
+ gcc/config/aarch64/aarch64.h                       |  15 ++
+ gcc/config/aarch64/aarch64.md                      |  11 +-
+ gcc/config/aarch64/constraints.md                  |   9 +
+ gcc/config/aarch64/predicates.md                   |   3 +-
+ .../aarch64/sls-mitigation/sls-miti-blr-bti.c      |  40 ++++
+ .../aarch64/sls-mitigation/sls-miti-blr.c          |  33 +++
+ 8 files changed, 328 insertions(+), 9 deletions(-)
+ create mode 100644 gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c
+ create mode 100644 gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c
+
+diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
+index ee0ffde..839f801 100644
+--- a/gcc/config/aarch64/aarch64-protos.h
++++ b/gcc/config/aarch64/aarch64-protos.h
+@@ -782,6 +782,7 @@ extern const atomic_ool_names aarch64_ool_ldeor_names;
+ tree aarch64_resolve_overloaded_builtin_general (location_t, tree, void *);
+ 
+ const char *aarch64_sls_barrier (int);
++const char *aarch64_indirect_call_asm (rtx);
+ extern bool aarch64_harden_sls_retbr_p (void);
+ extern bool aarch64_harden_sls_blr_p (void);
+ 
+diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
+index 2389d49..0f7bba3 100644
+--- a/gcc/config/aarch64/aarch64.c
++++ b/gcc/config/aarch64/aarch64.c
+@@ -10605,6 +10605,9 @@ aarch64_label_mentioned_p (rtx x)
+ enum reg_class
+ aarch64_regno_regclass (unsigned regno)
+ {
++  if (STUB_REGNUM_P (regno))
++    return STUB_REGS;
++
+   if (GP_REGNUM_P (regno))
+     return GENERAL_REGS;
+ 
+@@ -10939,6 +10942,7 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode)
+   unsigned int nregs, vec_flags;
+   switch (regclass)
+     {
++    case STUB_REGS:
+     case TAILCALL_ADDR_REGS:
+     case POINTER_REGS:
+     case GENERAL_REGS:
+@@ -13155,10 +13159,12 @@ aarch64_register_move_cost (machine_mode mode,
+     = aarch64_tune_params.regmove_cost;
+ 
+   /* Caller save and pointer regs are equivalent to GENERAL_REGS.  */
+-  if (to == TAILCALL_ADDR_REGS || to == POINTER_REGS)
++  if (to == TAILCALL_ADDR_REGS || to == POINTER_REGS
++      || to == STUB_REGS)
+     to = GENERAL_REGS;
+ 
+-  if (from == TAILCALL_ADDR_REGS || from == POINTER_REGS)
++  if (from == TAILCALL_ADDR_REGS || from == POINTER_REGS
++      || from == STUB_REGS)
+     from = GENERAL_REGS;
+ 
+   /* Make RDFFR very expensive.  In particular, if we know that the FFR
+@@ -22957,6 +22963,215 @@ aarch64_sls_barrier (int mitigation_required)
+     : "";
+ }
+ 
++static GTY (()) tree aarch64_sls_shared_thunks[30];
++static GTY (()) bool aarch64_sls_shared_thunks_needed = false;
++const char *indirect_symbol_names[30] = {
++    "__call_indirect_x0",
++    "__call_indirect_x1",
++    "__call_indirect_x2",
++    "__call_indirect_x3",
++    "__call_indirect_x4",
++    "__call_indirect_x5",
++    "__call_indirect_x6",
++    "__call_indirect_x7",
++    "__call_indirect_x8",
++    "__call_indirect_x9",
++    "__call_indirect_x10",
++    "__call_indirect_x11",
++    "__call_indirect_x12",
++    "__call_indirect_x13",
++    "__call_indirect_x14",
++    "__call_indirect_x15",
++    "", /* "__call_indirect_x16",  */
++    "", /* "__call_indirect_x17",  */
++    "__call_indirect_x18",
++    "__call_indirect_x19",
++    "__call_indirect_x20",
++    "__call_indirect_x21",
++    "__call_indirect_x22",
++    "__call_indirect_x23",
++    "__call_indirect_x24",
++    "__call_indirect_x25",
++    "__call_indirect_x26",
++    "__call_indirect_x27",
++    "__call_indirect_x28",
++    "__call_indirect_x29",
++};
++
++/* Function to create a BLR thunk.  This thunk is used to mitigate straight
++   line speculation.  Instead of a simple BLR that can be speculated past,
++   we emit a BL to this thunk, and this thunk contains a BR to the relevant
++   register.  These thunks have the relevant speculation barries put after
++   their indirect branch so that speculation is blocked.
++
++   We use such a thunk so the speculation barriers are kept off the
++   architecturally executed path in order to reduce the performance overhead.
++
++   When optimizing for size we use stubs shared by the linked object.
++   When optimizing for performance we emit stubs for each function in the hope
++   that the branch predictor can better train on jumps specific for a given
++   function.  */
++rtx
++aarch64_sls_create_blr_label (int regnum)
++{
++  gcc_assert (STUB_REGNUM_P (regnum));
++  if (optimize_function_for_size_p (cfun))
++    {
++      /* For the thunks shared between different functions in this compilation
++	 unit we use a named symbol -- this is just for users to more easily
++	 understand the generated assembly.  */
++      aarch64_sls_shared_thunks_needed = true;
++      const char *thunk_name = indirect_symbol_names[regnum];
++      if (aarch64_sls_shared_thunks[regnum] == NULL)
++	{
++	  /* Build a decl representing this function stub and record it for
++	     later.  We build a decl here so we can use the GCC machinery for
++	     handling sections automatically (through `get_named_section` and
++	     `make_decl_one_only`).  That saves us a lot of trouble handling
++	     the specifics of different output file formats.  */
++	  tree decl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
++				  get_identifier (thunk_name),
++				  build_function_type_list (void_type_node,
++							    NULL_TREE));
++	  DECL_RESULT (decl) = build_decl (BUILTINS_LOCATION, RESULT_DECL,
++					   NULL_TREE, void_type_node);
++	  TREE_PUBLIC (decl) = 1;
++	  TREE_STATIC (decl) = 1;
++	  DECL_IGNORED_P (decl) = 1;
++	  DECL_ARTIFICIAL (decl) = 1;
++	  make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
++	  resolve_unique_section (decl, 0, false);
++	  aarch64_sls_shared_thunks[regnum] = decl;
++	}
++
++      return gen_rtx_SYMBOL_REF (Pmode, thunk_name);
++    }
++
++  if (cfun->machine->call_via[regnum] == NULL)
++    cfun->machine->call_via[regnum]
++      = gen_rtx_LABEL_REF (Pmode, gen_label_rtx ());
++  return cfun->machine->call_via[regnum];
++}
++
++/* Helper function for aarch64_sls_emit_blr_function_thunks and
++   aarch64_sls_emit_shared_blr_thunks below.  */
++static void
++aarch64_sls_emit_function_stub (FILE *out_file, int regnum)
++{
++  /* Save in x16 and branch to that function so this transformation does
++     not prevent jumping to `BTI c` instructions.  */
++  asm_fprintf (out_file, "\tmov\tx16, x%d\n", regnum);
++  asm_fprintf (out_file, "\tbr\tx16\n");
++}
++
++/* Emit all BLR stubs for this particular function.
++   Here we emit all the BLR stubs needed for the current function.  Since we
++   emit these stubs in a consecutive block we know there will be no speculation
++   gadgets between each stub, and hence we only emit a speculation barrier at
++   the end of the stub sequences.
++
++   This is called in the TARGET_ASM_FUNCTION_EPILOGUE hook.  */
++void
++aarch64_sls_emit_blr_function_thunks (FILE *out_file)
++{
++  if (! aarch64_harden_sls_blr_p ())
++    return;
++
++  bool any_functions_emitted = false;
++  /* We must save and restore the current function section since this assembly
++     is emitted at the end of the function.  This means it can be emitted *just
++     after* the cold section of a function.  That cold part would be emitted in
++     a different section.  That switch would trigger a `.cfi_endproc` directive
++     to be emitted in the original section and a `.cfi_startproc` directive to
++     be emitted in the new section.  Switching to the original section without
++     restoring would mean that the `.cfi_endproc` emitted as a function ends
++     would happen in a different section -- leaving an unmatched
++     `.cfi_startproc` in the cold text section and an unmatched `.cfi_endproc`
++     in the standard text section.  */
++  section *save_text_section = in_section;
++  switch_to_section (function_section (current_function_decl));
++  for (int regnum = 0; regnum < 30; ++regnum)
++    {
++      rtx specu_label = cfun->machine->call_via[regnum];
++      if (specu_label == NULL)
++	continue;
++
++      targetm.asm_out.print_operand (out_file, specu_label, 0);
++      asm_fprintf (out_file, ":\n");
++      aarch64_sls_emit_function_stub (out_file, regnum);
++      any_functions_emitted = true;
++    }
++  if (any_functions_emitted)
++    /* Can use the SB if needs be here, since this stub will only be used
++      by the current function, and hence for the current target.  */
++    asm_fprintf (out_file, "\t%s\n", aarch64_sls_barrier (true));
++  switch_to_section (save_text_section);
++}
++
++/* Emit shared BLR stubs for the current compilation unit.
++   Over the course of compiling this unit we may have converted some BLR
++   instructions to a BL to a shared stub function.  This is where we emit those
++   stub functions.
++   This function is for the stubs shared between different functions in this
++   compilation unit.  We share when optimizing for size instead of speed.
++
++   This function is called through the TARGET_ASM_FILE_END hook.  */
++void
++aarch64_sls_emit_shared_blr_thunks (FILE *out_file)
++{
++  if (! aarch64_sls_shared_thunks_needed)
++    return;
++
++  for (int regnum = 0; regnum < 30; ++regnum)
++    {
++      tree decl = aarch64_sls_shared_thunks[regnum];
++      if (!decl)
++	continue;
++
++      const char *name = indirect_symbol_names[regnum];
++      switch_to_section (get_named_section (decl, NULL, 0));
++      ASM_OUTPUT_ALIGN (out_file, 2);
++      targetm.asm_out.globalize_label (out_file, name);
++      /* Only emits if the compiler is configured for an assembler that can
++	 handle visibility directives.  */
++      targetm.asm_out.assemble_visibility (decl, VISIBILITY_HIDDEN);
++      ASM_OUTPUT_TYPE_DIRECTIVE (out_file, name, "function");
++      ASM_OUTPUT_LABEL (out_file, name);
++      aarch64_sls_emit_function_stub (out_file, regnum);
++      /* Use the most conservative target to ensure it can always be used by any
++	 function in the translation unit.  */
++      asm_fprintf (out_file, "\tdsb\tsy\n\tisb\n");
++      ASM_DECLARE_FUNCTION_SIZE (out_file, name, decl);
++    }
++}
++
++/* Implement TARGET_ASM_FILE_END.  */
++void
++aarch64_asm_file_end ()
++{
++  aarch64_sls_emit_shared_blr_thunks (asm_out_file);
++  /* Since this function will be called for the ASM_FILE_END hook, we ensure
++     that what would be called otherwise (e.g. `file_end_indicate_exec_stack`
++     for FreeBSD) still gets called.  */
++#ifdef TARGET_ASM_FILE_END
++  TARGET_ASM_FILE_END ();
++#endif
++}
++
++const char *
++aarch64_indirect_call_asm (rtx addr)
++{
++  gcc_assert (REG_P (addr));
++  if (aarch64_harden_sls_blr_p ())
++    {
++      rtx stub_label = aarch64_sls_create_blr_label (REGNO (addr));
++      output_asm_insn ("bl\t%0", &stub_label);
++    }
++  else
++   output_asm_insn ("blr\t%0", &addr);
++  return "";
++}
++
+ /* Target-specific selftests.  */
+ 
+ #if CHECKING_P
+@@ -23507,6 +23722,12 @@ aarch64_libgcc_floating_mode_supported_p
+ #undef TARGET_MD_ASM_ADJUST
+ #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
+ 
++#undef TARGET_ASM_FILE_END
++#define TARGET_ASM_FILE_END aarch64_asm_file_end
++
++#undef TARGET_ASM_FUNCTION_EPILOGUE
++#define TARGET_ASM_FUNCTION_EPILOGUE aarch64_sls_emit_blr_function_thunks
++
+ struct gcc_target targetm = TARGET_INITIALIZER;
+ 
+ #include "gt-aarch64.h"
+diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
+index 8e0fc37..7331450 100644
+--- a/gcc/config/aarch64/aarch64.h
++++ b/gcc/config/aarch64/aarch64.h
+@@ -643,6 +643,16 @@ extern unsigned aarch64_architecture_version;
+ #define GP_REGNUM_P(REGNO)						\
+   (((unsigned) (REGNO - R0_REGNUM)) <= (R30_REGNUM - R0_REGNUM))
+ 
++/* Registers known to be preserved over a BL instruction.  This consists of the
++   GENERAL_REGS without x16, x17, and x30.  The x30 register is changed by the
++   BL instruction itself, while the x16 and x17 registers may be used by
++   veneers which can be inserted by the linker.  */
++#define STUB_REGNUM_P(REGNO) \
++  (GP_REGNUM_P (REGNO) \
++   && (REGNO) != R16_REGNUM \
++   && (REGNO) != R17_REGNUM \
++   && (REGNO) != R30_REGNUM) \
++
+ #define FP_REGNUM_P(REGNO)			\
+   (((unsigned) (REGNO - V0_REGNUM)) <= (V31_REGNUM - V0_REGNUM))
+ 
+@@ -667,6 +677,7 @@ enum reg_class
+ {
+   NO_REGS,
+   TAILCALL_ADDR_REGS,
++  STUB_REGS,
+   GENERAL_REGS,
+   STACK_REG,
+   POINTER_REGS,
+@@ -689,6 +700,7 @@ enum reg_class
+ {						\
+   "NO_REGS",					\
+   "TAILCALL_ADDR_REGS",				\
++  "STUB_REGS",					\
+   "GENERAL_REGS",				\
+   "STACK_REG",					\
+   "POINTER_REGS",				\
+@@ -708,6 +720,7 @@ enum reg_class
+ {									\
+   { 0x00000000, 0x00000000, 0x00000000 },	/* NO_REGS */		\
+   { 0x00030000, 0x00000000, 0x00000000 },	/* TAILCALL_ADDR_REGS */\
++  { 0x3ffcffff, 0x00000000, 0x00000000 },	/* STUB_REGS */		\
+   { 0x7fffffff, 0x00000000, 0x00000003 },	/* GENERAL_REGS */	\
+   { 0x80000000, 0x00000000, 0x00000000 },	/* STACK_REG */		\
+   { 0xffffffff, 0x00000000, 0x00000003 },	/* POINTER_REGS */	\
+@@ -862,6 +875,8 @@ typedef struct GTY (()) machine_function
+   struct aarch64_frame frame;
+   /* One entry for each hard register.  */
+   bool reg_is_wrapped_separately[LAST_SAVED_REGNUM];
++  /* One entry for each general purpose register.  */
++  rtx call_via[SP_REGNUM];
+   bool label_is_assembled;
+ } machine_function;
+ #endif
+diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
+index dda04ee..43da754 100644
+--- a/gcc/config/aarch64/aarch64.md
++++ b/gcc/config/aarch64/aarch64.md
+@@ -1022,16 +1022,15 @@
+ )
+ 
+ (define_insn "*call_insn"
+-  [(call (mem:DI (match_operand:DI 0 "aarch64_call_insn_operand" "r, Usf"))
++  [(call (mem:DI (match_operand:DI 0 "aarch64_call_insn_operand" "Ucr, Usf"))
+ 	 (match_operand 1 "" ""))
+    (unspec:DI [(match_operand:DI 2 "const_int_operand")] UNSPEC_CALLEE_ABI)
+    (clobber (reg:DI LR_REGNUM))]
+   ""
+   "@
+-  blr\\t%0
++  * return aarch64_indirect_call_asm (operands[0]);
+   bl\\t%c0"
+-  [(set_attr "type" "call, call")]
+-)
++  [(set_attr "type" "call, call")])
+ 
+ (define_expand "call_value"
+   [(parallel
+@@ -1050,13 +1049,13 @@
+ 
+ (define_insn "*call_value_insn"
+   [(set (match_operand 0 "" "")
+-	(call (mem:DI (match_operand:DI 1 "aarch64_call_insn_operand" "r, Usf"))
++	(call (mem:DI (match_operand:DI 1 "aarch64_call_insn_operand" "Ucr, Usf"))
+ 		      (match_operand 2 "" "")))
+    (unspec:DI [(match_operand:DI 3 "const_int_operand")] UNSPEC_CALLEE_ABI)
+    (clobber (reg:DI LR_REGNUM))]
+   ""
+   "@
+-  blr\\t%1
++  * return aarch64_indirect_call_asm (operands[1]);
+   bl\\t%c1"
+   [(set_attr "type" "call, call")]
+ )
+diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
+index d993268..8cc6f50 100644
+--- a/gcc/config/aarch64/constraints.md
++++ b/gcc/config/aarch64/constraints.md
+@@ -24,6 +24,15 @@
+ (define_register_constraint "Ucs" "TAILCALL_ADDR_REGS"
+   "@internal Registers suitable for an indirect tail call")
+ 
++(define_register_constraint "Ucr"
++    "aarch64_harden_sls_blr_p () ? STUB_REGS : GENERAL_REGS"
++  "@internal Registers to be used for an indirect call.
++   This is usually the general registers, but when we are hardening against
++   Straight Line Speculation we disallow x16, x17, and x30 so we can use
++   indirection stubs.  These indirection stubs cannot use the above registers
++   since they will be reached by a BL that may have to go through a linker
++   veneer.")
++
+ (define_register_constraint "w" "FP_REGS"
+   "Floating point and SIMD vector registers.")
+ 
+diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
+index 215fcec..1754b1e 100644
+--- a/gcc/config/aarch64/predicates.md
++++ b/gcc/config/aarch64/predicates.md
+@@ -32,7 +32,8 @@
+ 
+ (define_predicate "aarch64_general_reg"
+   (and (match_operand 0 "register_operand")
+-       (match_test "REGNO_REG_CLASS (REGNO (op)) == GENERAL_REGS")))
++       (match_test "REGNO_REG_CLASS (REGNO (op)) == STUB_REGS
++		    || REGNO_REG_CLASS (REGNO (op)) == GENERAL_REGS")))
+ 
+ ;; Return true if OP a (const_int 0) operand.
+ (define_predicate "const0_operand"
+diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c
+new file mode 100644
+index 0000000..b1fb754
+--- /dev/null
++++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c
+@@ -0,0 +1,40 @@
++/* { dg-do compile } */
++/* { dg-additional-options "-mharden-sls=blr -mbranch-protection=bti" } */
++/*
++   Ensure that the SLS hardening of BLR leaves no BLR instructions.
++   Here we also check that there are no BR instructions with anything except an
++   x16 or x17 register.  This is because a `BTI c` instruction can be branched
++   to using a BLR instruction using any register, but can only be branched to
++   with a BR using an x16 or x17 register.
++  */
++typedef int (foo) (int, int);
++typedef void (bar) (int, int);
++struct sls_testclass {
++    foo *x;
++    bar *y;
++    int left;
++    int right;
++};
++
++/* We test both RTL patterns for a call which returns a value and a call which
++   does not.  */
++int blr_call_value (struct sls_testclass x)
++{
++  int retval = x.x(x.left, x.right);
++  if (retval % 10)
++    return 100;
++  return 9;
++}
++
++int blr_call (struct sls_testclass x)
++{
++  x.y(x.left, x.right);
++  if (x.left % 10)
++    return 100;
++  return 9;
++}
++
++/* { dg-final { scan-assembler-not {\tblr\t} } } */
++/* { dg-final { scan-assembler-not {\tbr\tx(?!16|17)} } } */
++/* { dg-final { scan-assembler {\tbr\tx(16|17)} } } */
++
+diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c
+new file mode 100644
+index 0000000..88bafff
+--- /dev/null
++++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c
+@@ -0,0 +1,33 @@
++/* { dg-additional-options "-mharden-sls=blr -save-temps" } */
++/* Ensure that the SLS hardening of BLR leaves no BLR instructions.
++   We only test that all BLR instructions have been removed, not that the
++   resulting code makes sense.  */
++typedef int (foo) (int, int);
++typedef void (bar) (int, int);
++struct sls_testclass {
++    foo *x;
++    bar *y;
++    int left;
++    int right;
++};
++
++/* We test both RTL patterns for a call which returns a value and a call which
++   does not.  */
++int blr_call_value (struct sls_testclass x)
++{
++  int retval = x.x(x.left, x.right);
++  if (retval % 10)
++    return 100;
++  return 9;
++}
++
++int blr_call (struct sls_testclass x)
++{
++  x.y(x.left, x.right);
++  if (x.left % 10)
++    return 100;
++  return 9;
++}
++
++/* { dg-final { scan-assembler-not {\tblr\t} } } */
++/* { dg-final { scan-assembler {\tbr\tx[0-9][0-9]?} } } */
+-- 
+2.7.4
+