What Is Passive Reconnaissance
Passive reconnaissance is the process of gathering information about a target without directly interacting with the target's systems. Unlike active scanning, passive techniques do not send packets to the target, do not trigger intrusion detection systems, and leave no trace in the target's logs. All information is collected from third-party sources.
This distinction matters because passive recon can be conducted before an engagement formally begins (depending on scope agreements) and carries minimal risk of detection. It is the safest form of intelligence gathering and should always be performed before any active techniques.
If your research involves querying the target's servers directly (e.g., sending DNS queries to their nameserver, connecting to their web server, or scanning their IP addresses), that is active reconnaissance. Passive recon only uses intermediary sources -- search engines, public databases, cached records, and third-party APIs -- that the target cannot observe.
DNS Enumeration
DNS enumeration through passive sources reveals subdomains, IP addresses, mail servers, and infrastructure details without querying the target's DNS servers directly. Several public databases aggregate historical DNS data that you can query freely.
Passive DNS Databases
Services like SecurityTrails, VirusTotal, and DNSDumpster maintain databases of historical DNS resolutions. When any user in the world resolves a domain, these services may record the result. This means you can discover subdomains and IP mappings without performing any DNS queries yourself.
# Using subfinder for passive subdomain enumeration
# subfinder queries multiple passive sources automatically
subfinder -d example.com -o subdomains.txt
# Using amass in passive mode (no DNS queries to the target)
amass enum -passive -d example.com -o amass-results.txt
# Using crt.sh (Certificate Transparency) via curl
curl -s "https://crt.sh/?q=%25.example.com&output=json" | \
jq -r '.[].name_value' | sort -u
Reverse DNS and IP Range Discovery
Once you know some IP addresses belonging to the target, reverse DNS lookups and IP range analysis can reveal additional hosts. ARIN, RIPE, and other Regional Internet Registries maintain public records of IP allocations.
# Look up IP allocation details from ARIN (North America)
# Visit: https://search.arin.net/rdap/
# Look up IP allocation from RIPE (Europe)
# Visit: https://apps.db.ripe.net/db-web-ui/query
# Use BGP data to find related IP ranges
# Hurricane Electric BGP Toolkit: https://bgp.he.net/
Many organizations secure their main website but neglect subdomains. Development servers (dev.example.com), staging environments (staging.example.com), internal tools (jira.example.com), and forgotten services (old.example.com) are common targets for attackers. Passive subdomain enumeration often reveals dozens or hundreds of subdomains that significantly expand the attack surface.
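That triage step can be done mechanically. The sketch below (hypothetical filenames and sample data) flags subdomains whose leftmost label matches a list of commonly neglected service names:

```shell
# Sample subdomain list, as produced by subfinder or amass (hypothetical data)
cat > subdomains.txt <<'EOF'
www.example.com
dev.example.com
staging.example.com
jira.example.com
old.example.com
api.example.com
EOF

# Service names that often indicate forgotten or weakly protected hosts
cat > risky-prefixes.txt <<'EOF'
dev
staging
jira
old
test
backup
EOF

# Flag subdomains whose leftmost label matches a risky prefix
awk -F. 'NR==FNR { risky[$1]; next } $1 in risky' risky-prefixes.txt subdomains.txt
# Prints:
#   dev.example.com
#   staging.example.com
#   jira.example.com
#   old.example.com
```

The same idea scales to any wordlist of interesting labels; adjust risky-prefixes.txt to match what you consider high-value for the engagement.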
Certificate Transparency Logs
Certificate Transparency (CT) is a framework that requires Certificate Authorities to publicly log every SSL/TLS certificate they issue. These logs are searchable and contain the domain names (including subdomains) that each certificate covers. This makes CT logs one of the most reliable passive sources for subdomain discovery.
How Certificate Transparency Works
When an organization requests an SSL/TLS certificate for mail.example.com, the Certificate Authority must submit the certificate to one or more public CT logs before (or shortly after) issuance. The certificate includes the domain names it covers, either as the Common Name (CN) or in the Subject Alternative Name (SAN) extension. Anyone can search these logs.
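You can see the SAN mechanism locally without touching any CA or target. The sketch below generates a throwaway self-signed certificate carrying hypothetical subdomain SANs, then prints the same extension field that CT log consumers read (the -addext flag assumes OpenSSL 1.1.1 or newer):

```shell
# Generate a throwaway self-signed certificate with SAN entries
# (hypothetical names; -addext requires OpenSSL 1.1.1+)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/ct-demo-key.pem -out /tmp/ct-demo-cert.pem \
  -subj "/CN=example.com" \
  -addext "subjectAltName=DNS:mail.example.com,DNS:vpn.example.com" 2>/dev/null

# Print the SAN extension -- exactly the field that exposes subdomain
# names once a real certificate lands in a public CT log
openssl x509 -in /tmp/ct-demo-cert.pem -noout -text | grep -A1 'Subject Alternative Name'
```

Every name you see in that extension on a publicly issued certificate becomes searchable the moment the certificate is logged.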
Searching CT Logs
# Search crt.sh for all certificates issued for a domain
# In a browser: https://crt.sh/?q=example.com
# Using the crt.sh API to find subdomains
curl -s "https://crt.sh/?q=%25.example.com&output=json" | \
jq -r '.[].name_value' | \
sed 's/\*\.//g' | \
sort -u
# Example output:
# example.com
# www.example.com
# mail.example.com
# vpn.example.com
# dev.example.com
# staging.example.com
# api.example.com
Wildcard certificates (*.example.com) will appear in the results but do not reveal specific subdomain names. However, individual certificates for specific subdomains are far more common and informative.
Once a certificate is logged, the entry cannot be removed. This means that even certificates for internal or development servers that were briefly exposed to the internet remain discoverable forever. Organizations should be aware that requesting public certificates for internal domains permanently reveals those domain names.
Web Archive Analysis
The Wayback Machine (web.archive.org) and similar web archiving services take periodic snapshots of websites. These archives preserve historical versions of web pages, including content that has since been removed, modified, or taken offline.
What Web Archives Reveal
- Removed content -- pages, documents, or files that the target deleted but are still archived
- Historical technology stack -- older versions of a site may reveal different technologies, frameworks, or CMS platforms
- Employee directories -- archived "About" or "Team" pages list employees who may no longer appear on the current site
- Configuration leaks -- archived error pages, debug output, or exposed files
- Site structure changes -- how the site's URL structure has evolved may reveal forgotten endpoints
Using the Wayback Machine
# Access archived versions of a site in a browser:
# https://web.archive.org/web/*/example.com
# Use the Wayback CDX API to list all archived URLs
curl -s "http://web.archive.org/cdx/search/cdx?url=example.com/*&output=text&fl=original&collapse=urlkey" | \
sort -u | head -50
# Wayback Machine also has a "Changes" view to see
# what changed between snapshots -- useful for identifying
# when sensitive content was exposed and when it was removed
waybackurls Tool
# Install waybackurls (Go-based tool)
go install github.com/tomnomnom/waybackurls@latest
# Fetch all archived URLs for a domain
waybackurls example.com | sort -u > archived-urls.txt
# Filter for potentially interesting file types
waybackurls example.com | grep -E "\.(php|asp|aspx|jsp|json|xml|conf|env|bak|sql|log)$"
Metadata Extraction
Documents published on a target's website (PDFs, Word documents, spreadsheets, presentations) contain metadata that their authors often do not realize is embedded. This metadata can reveal usernames, software versions, internal file paths, printer names, email addresses, and operating system details.
ExifTool for Metadata Extraction
# Install ExifTool
sudo apt install libimage-exiftool-perl
# Extract metadata from a downloaded document
exiftool document.pdf
# Example output:
# File Name : document.pdf
# Creator : John Smith
# Producer : Microsoft Word 2019
# Create Date : 2024:03:15 14:22:33
# Author : jsmith
# Last Modified By : Jane Doe
# Company : Example Corporation
# Extract metadata from all PDFs in a directory
exiftool *.pdf
# Output only specific fields
exiftool -Author -Creator -Producer -Company document.pdf
Bulk Document Discovery and Analysis
Combine search engine dorking with metadata extraction for a powerful workflow. First, find documents published on the target's website, download them, and then extract metadata from each one.
# Step 1: Find documents using Google dorks
# site:example.com filetype:pdf
# site:example.com filetype:docx
# site:example.com filetype:xlsx
# site:example.com filetype:pptx
# Step 2: Download discovered documents
wget -nd -r -l 1 -A pdf https://example.com/documents/
# Step 3: Extract metadata from all downloaded files
exiftool -csv *.pdf > metadata-report.csv
Usernames extracted from document metadata often match Active Directory usernames or email address prefixes. Software versions reveal what tools the organization uses. Internal file paths (e.g., C:\Users\jsmith\Projects\Internal\) expose naming conventions and directory structures. GPS coordinates in images can reveal physical locations.
Technology Fingerprinting
Technology fingerprinting identifies the software, frameworks, content management systems, web servers, JavaScript libraries, and third-party services used by a target's website. This information helps you research known vulnerabilities specific to the identified technologies.
Browser Extensions
Wappalyzer and BuiltWith are browser extensions that automatically detect technologies when you visit a website. They analyze HTTP headers, HTML source code, JavaScript variables, cookies, and other artifacts to identify the technology stack.
- Wappalyzer -- open-source, available for Chrome and Firefox; detects CMS platforms, JavaScript frameworks, analytics tools, web servers, programming languages, and more
- BuiltWith -- commercial service with a free tier; provides detailed technology profiles and historical technology usage data
Command-Line Fingerprinting
# Analyze HTTP response headers (without directly scanning the target)
# Use third-party cached results:
# Netcraft Site Report
# https://sitereport.netcraft.com/?url=example.com
# BuiltWith Technology Lookup
# https://builtwith.com/example.com
# Wappalyzer Technology Lookup
# https://www.wappalyzer.com/lookup/example.com
# WhatWeb (command-line tool, note: this DOES contact the target)
# For passive-only work, rely on cached results from the services above
whatweb --aggression 1 https://example.com
Visiting a target's website in your browser to let Wappalyzer analyze it is technically a direct connection to the target -- making it semi-passive rather than purely passive. For strictly passive analysis, use third-party lookup services (BuiltWith, Netcraft) that serve cached data without generating traffic to the target.
What Technology Fingerprinting Reveals
- Web server -- Apache, Nginx, IIS, and their versions (version-specific vulnerabilities)
- CMS platforms -- WordPress, Drupal, Joomla (each has well-known attack vectors)
- JavaScript frameworks -- React, Angular, Vue.js, jQuery versions
- Server-side languages -- PHP, ASP.NET, Node.js, Python, Ruby
- Third-party services -- CDNs (Cloudflare, Akamai), analytics (Google Analytics), advertising networks
- Security tools -- WAFs (Web Application Firewalls), CAPTCHA providers, bot detection
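Several of these signals live in plain HTTP response headers. The sketch below stays passive by parsing a previously saved response (hypothetical content, e.g. from a third-party cache or an earlier authorized capture) rather than contacting the target:

```shell
# A saved HTTP response header block (hypothetical content)
cat > saved-headers.txt <<'EOF'
HTTP/1.1 200 OK
Server: nginx/1.18.0
X-Powered-By: PHP/7.4.3
Set-Cookie: PHPSESSID=abc123; path=/
EOF

# Pull out the header fields that identify the stack:
# Server (web server + version), X-Powered-By (language/runtime),
# Set-Cookie (PHPSESSID implies a PHP session handler)
grep -iE '^(Server|X-Powered-By|Set-Cookie):' saved-headers.txt
# Prints:
#   Server: nginx/1.18.0
#   X-Powered-By: PHP/7.4.3
#   Set-Cookie: PHPSESSID=abc123; path=/
```

Each extracted value feeds directly into vulnerability research -- e.g., a specific nginx or PHP version narrows the set of relevant CVEs.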
Email Harvesting
Discovering valid email addresses for a target organization enables phishing simulations, credential testing, and social engineering assessments. Passive email harvesting uses publicly available sources rather than direct contact with the target's mail server.
Sources of Email Addresses
- Search engines -- Google dorks like "@example.com" or site:example.com email
- LinkedIn -- employee names combined with a known email format yield valid addresses
- GitHub commits -- commit authors often use their corporate email
- PGP key servers -- public key repositories associate email addresses with cryptographic keys
- Data breach databases -- services like Have I Been Pwned reveal which addresses appeared in breaches (use ethically)
- WHOIS records -- domain registrations may list administrative email addresses
Using theHarvester
# theHarvester automates email discovery across multiple sources
theHarvester -d example.com -b google,bing,linkedin,crtsh -l 500
# Example output:
# [*] Emails found:
# john.smith@example.com
# jane.doe@example.com
# admin@example.com
# support@example.com
# hr@example.com
#
# [*] Hosts found:
# www.example.com: 93.184.216.34
# mail.example.com: 93.184.216.35
# vpn.example.com: 93.184.216.36
Determining Email Format
Once you have a few confirmed email addresses, you can determine the organization's email naming convention. Common formats include:
# Common corporate email formats:
# first.last@company.com (john.smith@example.com)
# firstlast@company.com (johnsmith@example.com)
# first_last@company.com (john_smith@example.com)
# flast@company.com (jsmith@example.com)
# first@company.com (john@example.com)
# first.l@company.com (john.s@example.com)
# Once the pattern is identified, combine with employee names
# from LinkedIn to generate a list of likely valid addresses
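A sketch of that last step, assuming the first.last pattern was confirmed and using hypothetical employee names:

```shell
# Employee names gathered from LinkedIn (hypothetical)
cat > employees.txt <<'EOF'
John Smith
Jane Doe
EOF

# Generate first.last@ addresses for the identified pattern
awk -v dom="example.com" '{ printf "%s.%s@%s\n", tolower($1), tolower($2), dom }' employees.txt
# Prints:
#   john.smith@example.com
#   jane.doe@example.com
```

If a different pattern was confirmed (flast, first_last), only the printf format string changes.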
Services like Hunter.io and EmailHippo can verify whether an email address exists without sending an actual email to the target. This keeps your reconnaissance passive while confirming the accuracy of harvested addresses.
Documenting Your Findings
Passive reconnaissance produces large volumes of raw data. Without proper documentation and organization, valuable findings get lost and the effort is wasted. A structured approach to documentation turns raw data into actionable intelligence.
What to Document
- Source -- where each piece of information was found (URL, tool, database)
- Timestamp -- when the information was collected (data can change or become stale)
- Confidence level -- how reliable is the source? Cross-referenced data is higher confidence
- Category -- infrastructure, personnel, technology, credentials, etc.
- Security relevance -- why does this finding matter? What attack vector does it enable?
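A minimal way to capture those fields consistently is a one-line-per-finding TSV log. The sketch below (hypothetical helper and entries) appends timestamp, source, confidence, category, and relevance in a fixed order:

```shell
# Append one finding per line: timestamp, source, confidence, category, relevance
log_finding() {
  printf '%s\t%s\t%s\t%s\t%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" "$3" "$4" >> findings.tsv
}

# Hypothetical entries
log_finding "crt.sh" "high" "infrastructure" "dev.example.com exposed via CT logs"
log_finding "exiftool" "medium" "personnel" "Author field reveals username jsmith"

cat findings.tsv
```

Because every line has the same five tab-separated fields, the log stays trivially greppable and can later be sorted by category or source when writing the report.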
Organizing Your Notes
# Recommended directory structure for passive recon findings
mkdir -p recon/{domains,emails,metadata,screenshots,technology,archive}
# domains/ - subdomain lists, DNS records, IP ranges
# emails/ - harvested email addresses, format analysis
# metadata/ - extracted document metadata, usernames
# screenshots/ - archived web pages, social media profiles
# technology/ - identified tech stacks, versions, configurations
# archive/ - raw tool output, API responses, logs
Tools like CherryTree, Obsidian, or even a well-structured Markdown file can serve as your research notebook. The key is consistency -- use the same format for every finding so you can search and cross-reference efficiently.
Web content changes frequently. A social media post, job listing, or exposed document you found today may be deleted tomorrow. Always take timestamped screenshots of important findings. For web pages, use the Wayback Machine's "Save Page Now" feature to create a permanent archive.
Summary
In this tutorial, you learned passive reconnaissance techniques that gather intelligence without alerting the target:
- Passive vs. active -- passive recon uses third-party sources exclusively, leaving no trace in the target's logs
- DNS enumeration -- passive DNS databases and tools like subfinder and amass reveal subdomains and infrastructure without direct queries
- Certificate transparency -- CT logs permanently record every issued SSL/TLS certificate, revealing subdomains and infrastructure
- Web archive analysis -- the Wayback Machine preserves historical content including removed pages, old configurations, and employee directories
- Metadata extraction -- documents on the target's website embed usernames, software versions, internal paths, and organizational details
- Technology fingerprinting -- tools like Wappalyzer and BuiltWith identify the full technology stack for targeted vulnerability research
- Email harvesting -- combining search engines, LinkedIn, and automated tools to discover valid email addresses and naming patterns
- Documentation -- structured, timestamped records with source attribution turn raw data into actionable intelligence
You now have a solid understanding of passive reconnaissance. These techniques form the foundation of any professional engagement and should always be completed before moving on to active scanning. The information you gather here directly shapes your approach to active reconnaissance and exploitation.