Automating Recon with Python: A Practical Primer
Manual reconnaissance is slow and inconsistent. This post walks through building a basic but solid Python script that automates subdomain enumeration, port scanning orchestration, and output normalization — the unglamorous foundation of any serious engagement.
The Problem with Manual Recon
You have a target. You start running tools one at a time, piping output into text files, maybe grepping for ports, maybe missing something because you forgot a flag. Two hours later you have a pile of disconnected data and no clear picture of the attack surface.
The fix isn’t a fancy platform. It’s a script that runs your existing tools in sequence, normalizes their output, and gives you a single structured file to work from.
Project Structure
recon/
├── recon.py              # Main script
├── modules/
│   ├── subdomain.py      # Subdomain enumeration
│   └── portscan.py       # Nmap wrapper
└── output/               # Results land here
Subdomain Enumeration
We’ll wrap subfinder (install separately) and parse its output:
# modules/subdomain.py
import subprocess
from pathlib import Path


def enumerate_subdomains(domain: str, output_dir: Path) -> list[str]:
    """
    Run subfinder against the target domain.
    Returns a list of discovered subdomains.
    """
    outfile = output_dir / f"{domain}_subdomains.txt"
    try:
        result = subprocess.run(
            ["subfinder", "-d", domain, "-silent", "-o", str(outfile)],
            capture_output=True,
            text=True,
            timeout=120,
        )
    except FileNotFoundError:
        # subfinder is not installed or not on PATH
        print("[!] subfinder not found on PATH")
        return []
    except subprocess.TimeoutExpired:
        print(f"[!] subfinder timed out for {domain}")
        return []

    if result.returncode != 0:
        print(f"[!] subfinder error: {result.stderr.strip()}")
        return []
    if not outfile.exists():
        return []

    # One subdomain per line; skip blank lines
    subdomains = [line.strip() for line in outfile.read_text().splitlines() if line.strip()]
    print(f"[+] Found {len(subdomains)} subdomains for {domain}")
    return subdomains
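Enumeration output may contain the same host more than once, in mixed case, or with a trailing dot. Here is a minimal sketch of a cleanup pass you could run on the returned list. `normalize_subdomains` is a hypothetical helper, not part of subfinder, and the scope rule (keep only the apex domain and names under it) is an assumption you may want to change:

```python
def normalize_subdomains(raw: list[str], domain: str) -> list[str]:
    """Lowercase, strip trailing dots, drop out-of-scope names, de-duplicate, sort."""
    domain = domain.lower().rstrip(".")
    seen = set()
    for name in raw:
        name = name.strip().lower().rstrip(".")
        # Assumed scope rule: keep the apex domain and anything beneath it
        if name == domain or name.endswith("." + domain):
            seen.add(name)
    return sorted(seen)


# Illustrative input, not real tool output
cleaned = normalize_subdomains(
    ["API.example.com", "api.example.com.", "evil.com", "example.com", ""],
    "example.com",
)
print(cleaned)
```

Sorting the result keeps the report stable between runs, which makes diffs between two engagements easier to read.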
Port Scanning Wrapper
Nmap’s XML output is machine-readable, so parse that rather than scraping the human-readable text:
# modules/portscan.py
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path


def scan_host(host: str, output_dir: Path) -> dict:
    """
    Run a fast Nmap service scan and parse the results.
    Returns a dict of {port: {'state': str, 'service': str, 'version': str}}.
    """
    xml_out = output_dir / f"{host}_nmap.xml"
    try:
        subprocess.run([
            "nmap",
            "-sV",              # Service/version detection
            "--open",           # Only open ports
            "-T4",              # Aggressive timing
            "-oX", str(xml_out),
            host
        ], capture_output=True, timeout=300)
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        # nmap is missing from PATH, or the scan exceeded the timeout
        print(f"[!] nmap failed for {host}: {exc}")
        return {}

    if not xml_out.exists():
        return {}
    try:
        return _parse_nmap_xml(xml_out)
    except ET.ParseError:
        # An interrupted scan can leave incomplete XML behind
        print(f"[!] Could not parse {xml_out}")
        return {}


def _parse_nmap_xml(xml_path: Path) -> dict:
    """Extract open ports and their service info from an Nmap XML report."""
    tree = ET.parse(xml_path)
    root = tree.getroot()
    ports = {}
    for port_el in root.findall(".//port"):
        state = port_el.find("state")
        service = port_el.find("service")
        if state is None or state.get("state") != "open":
            continue
        portid = port_el.get("portid")
        ports[portid] = {
            "state": "open",
            "service": service.get("name", "unknown") if service is not None else "unknown",
            "version": service.get("version", "") if service is not None else ""
        }
    return ports
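To see what the parser returns without running a scan, you can feed the same extraction logic a small XML string. The sample below is hand-written in the shape of Nmap's XML output and heavily trimmed; it is not real scan data, and `parse_ports` is a hypothetical string-based variant of the parser above:

```python
import xml.etree.ElementTree as ET

# Hand-written sample shaped like Nmap XML (trimmed, not real scan output)
SAMPLE_XML = """<nmaprun>
  <host>
    <ports>
      <port protocol="tcp" portid="22">
        <state state="open"/>
        <service name="ssh" product="OpenSSH" version="8.9p1"/>
      </port>
      <port protocol="tcp" portid="8080">
        <state state="closed"/>
      </port>
      <port protocol="tcp" portid="443">
        <state state="open"/>
      </port>
    </ports>
  </host>
</nmaprun>"""


def parse_ports(xml_text: str) -> dict:
    """Apply the open-port extraction to an XML string instead of a file."""
    root = ET.fromstring(xml_text)
    ports = {}
    for port_el in root.findall(".//port"):
        state = port_el.find("state")
        if state is None or state.get("state") != "open":
            continue  # skip closed or filtered ports
        service = port_el.find("service")
        ports[port_el.get("portid")] = {
            "state": "open",
            "service": service.get("name", "unknown") if service is not None else "unknown",
            "version": service.get("version", "") if service is not None else "",
        }
    return ports


parsed = parse_ports(SAMPLE_XML)
print(parsed)
```

Port 8080 is dropped because its state is not open, and port 443 falls back to "unknown" because the sample has no service element for it.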
Putting It Together
# recon.py
import json
import sys
from datetime import datetime
from pathlib import Path

from modules.portscan import scan_host
from modules.subdomain import enumerate_subdomains


def main():
    if len(sys.argv) != 2:
        print(f"Usage: {sys.argv[0]} <domain>")
        sys.exit(1)

    domain = sys.argv[1]
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    output_dir = Path("output") / f"{domain}_{timestamp}"
    output_dir.mkdir(parents=True, exist_ok=True)

    print(f"[*] Starting recon for {domain}")
    print(f"[*] Output directory: {output_dir}")

    # Step 1: Enumerate subdomains
    subdomains = enumerate_subdomains(domain, output_dir)

    # Step 2: Scan each subdomain
    results = {}
    for host in subdomains[:20]:  # Cap at 20 for demo
        print(f"[*] Scanning {host}...")
        results[host] = scan_host(host, output_dir)

    # Step 3: Write normalized JSON output
    report = {
        "domain": domain,
        "timestamp": timestamp,
        "subdomains": subdomains,
        "scan_results": results
    }
    report_path = output_dir / "report.json"
    report_path.write_text(json.dumps(report, indent=2))
    print(f"\n[+] Report written to {report_path}")


if __name__ == "__main__":
    main()
Running It
python recon.py example.com
Output lands in output/example.com_20240115_143022/report.json — a structured JSON file you can query with jq, import into a database, or feed into the next stage of your pipeline.
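If you would rather stay in Python than reach for jq, the report can be flattened into rows in a few lines. This is a minimal sketch that assumes the report layout written by the script above; `load_report` and `open_services` are hypothetical helpers, and the example dict is illustrative data rather than real scan output:

```python
import json
from pathlib import Path


def load_report(path: Path) -> dict:
    """Read a report.json file back into a dict."""
    return json.loads(path.read_text())


def open_services(report: dict) -> list[tuple[str, int, str]]:
    """Flatten scan_results into sorted (host, port, service) rows."""
    rows = []
    for host, ports in report.get("scan_results", {}).items():
        for port, info in ports.items():
            # Ports are stored as strings in the JSON; convert for numeric sorting
            rows.append((host, int(port), info.get("service", "unknown")))
    return sorted(rows)


# Illustrative data in the report's shape, not real scan output
example = {
    "domain": "example.com",
    "scan_results": {
        "www.example.com": {
            "443": {"state": "open", "service": "https", "version": ""},
        },
        "mail.example.com": {
            "25": {"state": "open", "service": "smtp", "version": ""},
            "22": {"state": "open", "service": "ssh", "version": ""},
        },
    },
}

for host, port, service in open_services(example):
    print(f"{host}:{port} {service}")
```

For a real run you would pass `load_report(Path(...))` with the path printed at the end of the script instead of the `example` dict.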
What’s Next
This is the skeleton. The real value comes from plugging in more tools: DNS brute-forcing with dnsx, HTTP probing with httpx, screenshot capture with gowitness. Each module follows the same pattern — run a binary, parse its output, add to the report dict.
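The run, parse, add-to-report pattern can be sketched as one small helper. `run_tool` and `parse_lines` are hypothetical names, and the command below is a stand-in that uses the Python interpreter so the sketch runs without any recon tool installed. For a real module you would substitute the tool's own command line after checking its documentation for the right flags:

```python
import subprocess
import sys
from typing import Callable


def run_tool(cmd: list[str], parse: Callable[[str], object], timeout: int = 120):
    """Run a binary, pass its stdout to a parser, and return the parsed result.

    Returns None if the binary is missing, times out, or exits non-zero.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        print(f"[!] {cmd[0]} failed: {exc}")
        return None
    if result.returncode != 0:
        print(f"[!] {cmd[0]} exited with {result.returncode}")
        return None
    return parse(result.stdout)


def parse_lines(stdout: str) -> list[str]:
    """Parser for tools that print one result per line."""
    return [line.strip() for line in stdout.splitlines() if line.strip()]


# Stand-in command: prints two hostnames, one per line
report = {}
report["demo"] = run_tool(
    [sys.executable, "-c", "print('a.example.com'); print('b.example.com')"],
    parse_lines,
)
print(report)
```

Keeping the parser as a separate function means each new tool only needs its command line and a parser for its output format.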
The script isn’t glamorous. Neither is recon. But having a consistent, repeatable process beats improvising every time.