Automating Recon with Python: A Practical Primer

Manual reconnaissance is slow and inconsistent. This post walks through building a basic but solid Python script that automates subdomain enumeration, port scanning orchestration, and output normalization — the unglamorous foundation of any serious engagement.

The Problem with Manual Recon

You have a target. You start running tools one at a time, piping output into text files, maybe grepping for ports, maybe missing something because you forgot a flag. Two hours later you have a pile of disconnected data and no clear picture of the attack surface.

The fix isn’t a fancy platform. It’s a script that runs your existing tools in sequence, normalizes their output, and gives you a single structured file to work from.

Project Structure

recon/
├── recon.py          # Main script
├── modules/
│   ├── subdomain.py  # Subdomain enumeration
│   └── portscan.py   # Nmap wrapper
└── output/           # Results land here

Subdomain Enumeration

We’ll wrap subfinder (install separately) and parse its output:

import subprocess
import json
from pathlib import Path

def enumerate_subdomains(domain: str, output_dir: Path) -> list[str]:
    """
    Run subfinder against the target domain.
    Returns a list of discovered subdomains.
    """
    outfile = output_dir / f"{domain}_subdomains.txt"

    result = subprocess.run(
        ["subfinder", "-d", domain, "-silent", "-o", str(outfile)],
        capture_output=True,
        text=True,
        timeout=120
    )

    if result.returncode != 0:
        print(f"[!] subfinder error: {result.stderr.strip()}")
        return []

    if not outfile.exists():
        return []

    subdomains = [line.strip() for line in outfile.read_text().splitlines() if line.strip()]
    print(f"[+] Found {len(subdomains)} subdomains for {domain}")
    return subdomains

Port Scanning Wrapper

Nmap’s XML output is machine-readable. Parse it instead of text:

import xml.etree.ElementTree as ET

def scan_host(host: str, output_dir: Path) -> dict:
    """
    Run a fast Nmap service scan and parse the results.
    Returns a dict of {port: {'state': str, 'service': str, 'version': str}}.
    """
    xml_out = output_dir / f"{host}_nmap.xml"

    subprocess.run([
        "nmap",
        "-sV",          # Service/version detection
        "--open",       # Only open ports
        "-T4",          # Aggressive timing
        "-oX", str(xml_out),
        host
    ], capture_output=True, timeout=300)

    if not xml_out.exists():
        return {}

    return _parse_nmap_xml(xml_out)


def _parse_nmap_xml(xml_path: Path) -> dict:
    tree = ET.parse(xml_path)
    root = tree.getroot()
    ports = {}

    for port_el in root.findall(".//port"):
        state = port_el.find("state")
        service = port_el.find("service")

        if state is None or state.get("state") != "open":
            continue

        portid = port_el.get("portid")
        ports[portid] = {
            "state":   "open",
            "service": service.get("name", "unknown") if service is not None else "unknown",
            "version": service.get("version", "")      if service is not None else ""
        }

    return ports

Putting It Together

import json
import sys
from datetime import datetime
from pathlib import Path

def main():
    if len(sys.argv) != 2:
        print(f"Usage: {sys.argv[0]} <domain>")
        sys.exit(1)

    domain     = sys.argv[1]
    timestamp  = datetime.now().strftime("%Y%m%d_%H%M%S")
    output_dir = Path("output") / f"{domain}_{timestamp}"
    output_dir.mkdir(parents=True, exist_ok=True)

    print(f"[*] Starting recon for {domain}")
    print(f"[*] Output directory: {output_dir}")

    # Step 1: Enumerate subdomains
    subdomains = enumerate_subdomains(domain, output_dir)

    # Step 2: Scan each subdomain
    results = {}
    for host in subdomains[:20]:  # Cap at 20 for demo
        print(f"[*] Scanning {host}...")
        results[host] = scan_host(host, output_dir)

    # Step 3: Write normalized JSON output
    report = {
        "domain":     domain,
        "timestamp":  timestamp,
        "subdomains": subdomains,
        "scan_results": results
    }

    report_path = output_dir / "report.json"
    report_path.write_text(json.dumps(report, indent=2))
    print(f"\n[+] Report written to {report_path}")


if __name__ == "__main__":
    main()

Running It

python recon.py example.com

Output lands in output/example.com_20240115_143022/report.json — a structured JSON file you can query with jq, import into a database, or feed into the next stage of your pipeline.

What’s Next

This is the skeleton. The real value comes from plugging in more tools: DNS brute-forcing with dnsx, HTTP probing with httpx, screenshot capture with gowitness. Each module follows the same pattern — run a binary, parse its output, add to the report dict.

The script isn’t glamorous. Neither is recon. But having a consistent, repeatable process beats improvising every time.