Anatomy of Xen Alternative Infrastructure

Frankly speaking, I think the name "alternative" is a bit unclear to outsiders. The mechanism patches raw machine code in the kernel (or hypervisor) at runtime. Why is that useful? It gives you a chance to selectively patch machine code according to the features and vendor of the CPU you actually boot on.

Xen borrows a stripped-down version of the alternative infrastructure from the Linux kernel. It's more concise and easier to understand, because Xen applies alternative instructions before SMP initialisation and doesn't support altering instructions once everything is up and running. The Linux implementation is more complex because it offers more functionality. The core principle is the same in both, however.

This infrastructure is only used on the x86 architecture at the moment, so the files live under the x86 directories. There are only two of them: xen/arch/x86/alternative.c and xen/include/asm-x86/alternative.h.

Let's look at the header file first.

struct alt_instr {
    s32 instr_offset;       /* original instruction */
    s32 repl_offset;        /* offset to replacement instruction */
    u16 cpuid;              /* cpuid bit set for replacement */
    u8  instrlen;           /* length of original instruction */
    u8  replacementlen;     /* length of new instruction, <= instrlen */
};

#define OLDINSTR(oldinstr)      "661:\n\t" oldinstr "\n662:\n"

#define b_replacement(number)   "663"#number
#define e_replacement(number)   "664"#number

#define alt_slen "662b-661b"
#define alt_rlen(number) e_replacement(number)"f-"b_replacement(number)"f"

#define ALTINSTR_ENTRY(feature, number)                                       \
        " .long 661b - .\n"                             /* label           */ \
        " .long " b_replacement(number)"f - .\n"        /* new instruction */ \
        " .word " __stringify(feature) "\n"             /* feature bit     */ \
        " .byte " alt_slen "\n"                         /* source len      */ \
        " .byte " alt_rlen(number) "\n"                 /* replacement len */

#define DISCARD_ENTRY(number)                           /* rlen <= slen */    \
        " .byte 0xff + (" alt_rlen(number) ") - (" alt_slen ")\n"

#define ALTINSTR_REPLACEMENT(newinstr, feature, number) /* replacement */     \
        b_replacement(number)":\n\t" newinstr "\n" e_replacement(number) ":\n\t"

/* alternative assembly primitive: */
#define ALTERNATIVE(oldinstr, newinstr, feature)                        \
        OLDINSTR(oldinstr)                                              \
        ".pushsection .altinstructions,\"a\"\n"                         \
        ALTINSTR_ENTRY(feature, 1)                                      \
        ".popsection\n"                                                 \
        ".pushsection .discard,\"aw\",@progbits\n"                      \
        DISCARD_ENTRY(1)                                                \
        ".popsection\n"                                                 \
        ".pushsection .altinstr_replacement, \"ax\"\n"                  \
        ALTINSTR_REPLACEMENT(newinstr, feature, 1)                      \
        ".popsection"

ALTINSTR_ENTRY is the assembly equivalent of struct alt_instr: it emits the same five fields, in the same order and with the same sizes.

For example, stac and clac are defined using the alternative mechanism.
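
The two .long fields deserve a closer look. "661b - ." is the distance from the field itself to the label, so each entry stores self-relative offsets rather than absolute addresses. Here is a minimal sketch of how such an offset can be turned back into a pointer. The struct mirrors the one above using stdint types, and the helper names alt_orig and alt_repl are my own for illustration, not taken from Xen.

```c
#include <stdint.h>

/* Same field layout as struct alt_instr above, spelled with stdint types. */
struct alt_instr {
    int32_t  instr_offset;    /* original instruction, relative to this field */
    int32_t  repl_offset;     /* replacement, relative to this field */
    uint16_t cpuid;           /* feature bit that enables the replacement */
    uint8_t  instrlen;        /* length of original instruction */
    uint8_t  replacementlen;  /* length of replacement, <= instrlen */
};

/*
 * ".long 661b - ." stores (target - address of the field), so adding the
 * field's own address back recovers the target address.
 */
static uint8_t *alt_orig(struct alt_instr *a)
{
    return (uint8_t *)&a->instr_offset + a->instr_offset;
}

static uint8_t *alt_repl(struct alt_instr *a)
{
    return (uint8_t *)&a->repl_offset + a->repl_offset;
}
```

Storing offsets instead of pointers keeps each entry at 12 bytes on x86-64 and means the table needs no fix-up if the image is loaded at a different address.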

static always_inline void clac(void)
{
    /* Note: a barrier is implicit in alternative() */
    alternative(ASM_NOP3, __stringify(__ASM_CLAC), X86_FEATURE_SMAP);
}

alternative is a wrapper around ALTERNATIVE. In effect, this inline function first emits a 3-byte NOP, because the machine code of clac is 3 bytes long. Then an alternative instruction entry is created in the .altinstructions section, and a discard entry is created in the .discard section. Finally, the replacement instruction is stored in the .altinstr_replacement section.

Expanding the macros turns this snippet into inline assembly, first one level and then in full:
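
The discard entry is the least obvious of the three. Going by the "rlen <= slen" comment on DISCARD_ENTRY, I read it as a build-time check: the expression 0xff + rlen - slen only fits into a single .byte when the replacement is no longer than the original, so a replacement that is too long should make the assembler complain. The arithmetic can be sketched in C as follows. The function names are hypothetical, and the assembler behaviour is my assumption rather than something shown in this post.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value that the DISCARD_ENTRY expression evaluates to for given lengths. */
static int discard_value(uint8_t slen, uint8_t rlen)
{
    return 0xff + (int)rlen - (int)slen;
}

/*
 * A .byte can only hold values up to 0xff, so the expression fits exactly
 * when rlen <= slen, i.e. when the replacement fits in the original slot.
 */
static bool replacement_fits(uint8_t slen, uint8_t rlen)
{
    return discard_value(slen, rlen) <= 0xff;
}
```

For clac both lengths are 3, so the value is exactly 0xff. The byte itself carries no information at runtime, which is presumably why it goes into a section named .discard.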

asm volatile (ALTERNATIVE(ASM_NOP3, __stringify(__ASM_CLAC), X86_FEATURE_SMAP) : : : "memory");

asm volatile (
    "661:\n\t" ASM_NOP3 "\n662:\n"

    ".pushsection .altinstructions,\"a\"\n"
    " .long 661b - .\n"                              /* label           */
    " .long 6631f - .\n"                             /* new instruction */
    " .word " __stringify(X86_FEATURE_SMAP) "\n"     /* feature bit     */
    " .byte 662b - 661b\n"                           /* source len      */
    " .byte 6641f - 6631f\n"                         /* replacement len */
    ".popsection\n"

    ".pushsection .discard,\"aw\",@progbits\n"
    " .byte 0xff + (6641f - 6631f) - (662b - 661b)\n"
    ".popsection\n"

    ".pushsection .altinstr_replacement, \"ax\"\n"
    "6631:\n\t" __stringify(__ASM_CLAC) "\n6641:\n\t"
    ".popsection"

: : : "memory");

When Xen boots, it iterates over the entries in the .altinstructions section and patches each call site whose required feature bit is set. See apply_alternatives in alternative.c. If the required feature is not available, the original instructions are left untouched (NOPs in the clac example).

If you inspect an object file containing functions that use the alternative mechanism (for example, usercopy.c calls stac and clac), you can see the extra sections:
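
To make the boot-time pass concrete, here is a small user-space sketch of such a loop. It is not the code from alternative.c: feature_present is a hypothetical stand-in for the hypervisor's CPU feature test, the capability bitmap layout is an assumption, and padding a shorter replacement with single-byte NOPs is a simplification of mine.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct alt_instr {
    int32_t  instr_offset;    /* original instruction, relative to this field */
    int32_t  repl_offset;     /* replacement, relative to this field */
    uint16_t cpuid;           /* feature bit that enables the replacement */
    uint8_t  instrlen;
    uint8_t  replacementlen;
};

/* Hypothetical feature test: bit N lives in word N / 32 of the bitmap. */
static bool feature_present(const uint32_t *caps, uint16_t bit)
{
    return (caps[bit / 32] >> (bit % 32)) & 1u;
}

/* Walk the entries in [start, end) and patch the sites whose feature bit
 * is set. Returns the number of sites patched. */
static size_t apply_alternatives_sketch(struct alt_instr *start,
                                        struct alt_instr *end,
                                        const uint32_t *caps)
{
    size_t patched = 0;

    for (struct alt_instr *a = start; a < end; a++) {
        uint8_t *instr = (uint8_t *)&a->instr_offset + a->instr_offset;
        uint8_t *repl  = (uint8_t *)&a->repl_offset + a->repl_offset;

        /* Skip oversized replacements and features this CPU lacks. */
        if (a->replacementlen > a->instrlen ||
            !feature_present(caps, a->cpuid))
            continue;

        memcpy(instr, repl, a->replacementlen);
        /* Fill any leftover bytes of the original slot with 0x90 (NOP). */
        memset(instr + a->replacementlen, 0x90,
               a->instrlen - a->replacementlen);
        patched++;
    }
    return patched;
}
```

In the real hypervisor the target is executable text, so the write has to happen early, before other CPUs are running that code. That fits with the restriction mentioned earlier that Xen only patches before SMP initialisation.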

$ objdump -h xen/arch/x86/usercopy.o
xen/arch/x86/usercopy.o:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
 0 .text         000001e9  0000000000000000  0000000000000000  00000040  2**2
                 CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
 1 .data         00000000  0000000000000000  0000000000000000  0000022c  2**2
                 CONTENTS, ALLOC, LOAD, DATA
 2 .bss          00000000  0000000000000000  0000000000000000  0000022c  2**2
                 ALLOC
 3 .altinstructions 00000048  0000000000000000  0000000000000000  0000022c  2**0
                 CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
 4 .discard      00000006  0000000000000000  0000000000000000  00000274  2**0
                 CONTENTS, ALLOC, LOAD, DATA
 5 .altinstr_replacement 00000012  0000000000000000  0000000000000000  0000027a  2**0
                 CONTENTS, ALLOC, LOAD, READONLY, CODE
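
The section sizes in this listing agree with one another, which makes a nice sanity check. Assuming struct alt_instr occupies 12 bytes (4 + 4 + 2 + 1 + 1, with no padding on x86-64), the 0x48 bytes of .altinstructions hold six entries. That matches the six bytes in .discard (one per entry) and the 0x12 bytes in .altinstr_replacement (six 3-byte instructions). A quick sketch of the arithmetic, where alt_entry_count is a hypothetical helper of mine:

```c
#include <stddef.h>
#include <stdint.h>

struct alt_instr {
    int32_t  instr_offset;
    int32_t  repl_offset;
    uint16_t cpuid;
    uint8_t  instrlen;
    uint8_t  replacementlen;
};

/* Number of alt_instr records that fit in a section of the given size. */
static size_t alt_entry_count(size_t section_size)
{
    return section_size / sizeof(struct alt_instr);
}
```

Calling alt_entry_count(0x48) on a platform where the struct is 12 bytes gives 6, i.e. six stac/clac call sites in usercopy.o.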

Disassembling the .altinstr_replacement section yields:

$ objdump -d -j .altinstr_replacement xen/arch/x86/usercopy.o

xen/arch/x86/usercopy.o:     file format elf64-x86-64

Disassembly of section .altinstr_replacement:

0000000000000000 <.altinstr_replacement>:
   0:   0f 01                   (bad)
   2:   cb                      lret
   3:   0f 01                   (bad)
   5:   ca 0f 01                lret   $0x10f
   8:   cb                      lret
   9:   0f 01                   (bad)
   b:   ca 0f 01                lret   $0x10f
   e:   cb                      lret
   f:   0f 01                   (bad)
  11:   ca                      .byte 0xca

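0f 01 ca and 0f 01 cb are the machine code for clac and stac respectively. This objdump does not recognise them, which is why it prints (bad) and misdecodes the bytes that follow.

Use gdb to look at a call site of stac: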

$ gdb xen/arch/x86/usercopy.o
(gdb) disas /r __copy_from_user_ll
Dump of assembler code for function __copy_from_user_ll:
   0x000000000000003c <+0>: 55  push   %rbp
   0x000000000000003d <+1>: 48 89 e5    mov    %rsp,%rbp
   0x0000000000000040 <+4>: 89 d1   mov    %edx,%ecx
   0x0000000000000042 <+6>: 66 66 90    data32 xchg %ax,%ax
   0x0000000000000045 <+9>: 48 89 c8    mov    %rcx,%rax
   0x0000000000000048 <+12>:    48 83 f9 0f cmp    $0xf,%rcx

The bytes 66 66 90 are in fact ASM_NOP3, as you can see from its definition.

If the requested feature is not present, ASM_NOP3 remains untouched. Otherwise it's replaced with 0f 01 cb (stac).

The patching procedure can be seen in alternative.c. It's quite straightforward: essentially a plain memcpy.

That's it. This post mainly targets beginners who are interested in low-level programming tricks. It does require a certain level of familiarity with the tools, though. Fortunately their manuals are excellent, so I won't go into detail on how to use them.