What Is OSINT

Open Source Intelligence (OSINT) is the practice of collecting and analyzing information from publicly available sources to produce actionable intelligence. In ethical hacking and penetration testing, OSINT is usually the first activity of the reconnaissance phase: gathering as much information as possible about a target before any active interaction with its systems.

The term "open source" here does not refer to open-source software. It means the information is publicly accessible -- anyone can find it through legitimate channels without needing to bypass access controls, exploit vulnerabilities, or break any laws.

💡
Why OSINT matters in security

Attackers routinely use OSINT before launching targeted attacks. By performing OSINT on your own organization, you can discover what information is publicly exposed and take steps to reduce your attack surface before a malicious actor exploits it.

OSINT sources include, but are not limited to:

  • Search engines -- Google, Bing, DuckDuckGo, and specialized search engines
  • Social media platforms -- LinkedIn, Twitter/X, Facebook, Instagram, GitHub
  • Public records -- WHOIS databases, DNS records, certificate transparency logs
  • Web archives -- The Wayback Machine and cached versions of websites
  • Code repositories -- GitHub, GitLab, Bitbucket (public repositories)
  • Job postings -- reveal internal technologies, software stacks, and security tools
  • Government databases -- company registrations, court records, patent filings

Types of Open Source Intelligence

OSINT can be categorized by the type of information being collected and the source it comes from. Understanding these categories helps you structure your research and ensures thorough coverage during an engagement.

Technical OSINT

Technical OSINT focuses on the digital infrastructure of a target. This includes IP address ranges, domain names, subdomains, DNS records, mail server configurations, SSL/TLS certificates, web technologies in use, and exposed services. This type of intelligence directly feeds into the scanning and enumeration phases of a penetration test.

Organizational OSINT

Organizational OSINT involves gathering information about a company's structure, key personnel, business relationships, and internal processes. Employee names and roles (especially IT staff), organizational hierarchies, partner companies, and office locations can all be valuable for social engineering attacks or for understanding the scope of an engagement.

Personal OSINT

Personal OSINT targets individuals, typically key employees identified during organizational research. Email addresses, social media profiles, public posts, conference presentations, and personal websites can reveal password patterns, answers to security questions, or information useful for spear-phishing campaigns.

⚠️
Respect privacy boundaries.

Even though OSINT uses publicly available information, collecting personal data about individuals requires careful ethical consideration. During authorized penetration tests, only collect personal information that is within the agreed scope. Never use OSINT techniques to stalk, harass, or dox individuals.

Search Engine Techniques

Search engines index vast amounts of publicly accessible data. Advanced search operators allow you to filter results with precision, revealing information that basic searches would miss. This technique is commonly known as "Google dorking" or "Google hacking," though it works across multiple search engines.

Essential Google Search Operators

# Find pages on a specific domain
site:example.com

# Search for specific file types
site:example.com filetype:pdf

# Find pages with specific words in the title
intitle:"index of" site:example.com

# Find pages with specific words in the URL
inurl:admin site:example.com

# Search for exact phrases
"employee handbook" site:example.com

# Exclude results from a specific site
password reset -site:example.com

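# Find archived versions of a page (Google retired the cache: operator in 2024;
# the Wayback Machine lists stored captures instead)
https://web.archive.org/web/*/example.com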
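
When you run these operators by hand, it helps to turn each query into a search URL you can open in a browser or paste into your notes. The sketch below is a minimal POSIX shell example: `urlencode` and `google_search_url` are hypothetical helper names, it assumes plain ASCII queries, and it only builds URLs for manual use, since automated querying may breach the search engine's terms of service.

```shell
# Hypothetical helpers for turning a search-operator query into a URL.
# Percent-encode a string, keeping RFC 3986 unreserved characters as-is
# (assumes ASCII input; other bytes are not handled specially).
urlencode() {
  remaining=$1
  encoded=
  while [ -n "$remaining" ]; do
    rest=${remaining#?}              # everything after the first character
    char=${remaining%"$rest"}        # the first character on its own
    remaining=$rest
    case $char in
      [A-Za-z0-9._~-]) encoded=$encoded$char ;;
      # printf "'c" yields the character code, e.g. a colon becomes %3A
      *) encoded=$encoded$(printf '%%%02X' "'$char") ;;
    esac
  done
  printf '%s\n' "$encoded"
}

# Hypothetical helper: print a Google search URL to open manually in a browser.
google_search_url() {
  printf 'https://www.google.com/search?q=%s\n' "$(urlencode "$1")"
}
```

For example, `google_search_url 'site:example.com filetype:pdf'` prints `https://www.google.com/search?q=site%3Aexample.com%20filetype%3Apdf`.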

Google Dorking for Security Research

Google dorks can reveal misconfigurations, exposed sensitive files, and information that was not intended to be public. During an authorized assessment, these queries can quickly identify low-hanging fruit.

# Find exposed configuration files
site:example.com filetype:env OR filetype:cfg OR filetype:conf

# Find directory listings
intitle:"index of" site:example.com

# Find exposed log files
site:example.com filetype:log

# Find login pages
site:example.com inurl:login OR inurl:signin OR inurl:admin

# Find exposed database files
site:example.com filetype:sql OR filetype:db OR filetype:sqlite

# Find documents that may contain sensitive information
site:example.com filetype:xlsx OR filetype:docx confidential
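
To work through a checklist like this consistently during an engagement, you can keep the queries as templates and substitute the in-scope domain. This is a minimal sketch: `generate_dorks` and the `TARGET` placeholder are hypothetical, and the template list simply mirrors the queries above.

```shell
# Hypothetical helper: fill an in-scope domain into dork templates.
generate_dorks() {
  # $1: target domain from the signed scope, e.g. example.com
  # TARGET is a placeholder; domain names contain no sed-special / or &
  sed "s/TARGET/$1/g" <<'EOF'
site:TARGET filetype:env OR filetype:cfg OR filetype:conf
intitle:"index of" site:TARGET
site:TARGET filetype:log
site:TARGET inurl:login OR inurl:signin OR inurl:admin
site:TARGET filetype:sql OR filetype:db OR filetype:sqlite
site:TARGET filetype:xlsx OR filetype:docx confidential
EOF
}
```

Running `generate_dorks example.com > dorks.txt` produces one query per line to paste into the search engine by hand; add or remove templates to match the scope.
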
💡
The Google Hacking Database (GHDB)

The Exploit Database maintains the Google Hacking Database at exploit-db.com/google-hacking-database, which contains thousands of proven search queries organized by category. It is a valuable reference for discovering what types of sensitive information can be found through search engines.

Social Media OSINT

Social media platforms are rich sources of intelligence. Employees often share information about their workplace, technologies they use, projects they work on, and even security practices -- sometimes without realizing the implications.

LinkedIn

LinkedIn is arguably the most valuable social media platform for OSINT in a professional context. It reveals organizational structure, employee roles, technology stacks (from job postings and employee profiles), and business relationships.

  • Employee enumeration -- identify IT staff, security team members, and executives
  • Technology identification -- skills listed on profiles reveal internal tools and platforms
  • Job postings -- open positions describe the exact technologies, certifications, and tools the company uses
  • Email pattern discovery -- once you know employee names, you can guess the email format (e.g., first.last@company.com)
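
Once you have names from LinkedIn, a short script can list the address formats worth checking against a confirmed address (for example, one published on the company's website). The sketch below assumes six common corporate formats and simple ASCII names; `email_candidates` is a hypothetical helper, and the real pattern should be confirmed from a known address rather than guessed.

```shell
# Hypothetical helper: candidate addresses for one person in six common formats.
email_candidates() {
  # $1: first name, $2: last name, $3: email domain (simple ASCII names assumed)
  first=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  last=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  first_initial=${first%"${first#?}"}    # keep only the first character
  last_initial=${last%"${last#?}"}
  for local_part in "$first.$last" "$first$last" "$first_initial$last" \
                    "$first" "${first}_$last" "$first$last_initial"; do
    printf '%s@%s\n' "$local_part" "$3"
  done
}
```

`email_candidates Jane Doe example.com` prints `jane.doe@example.com`, `janedoe@example.com`, `jdoe@example.com`, `jane@example.com`, `jane_doe@example.com`, and `janed@example.com`, in that order.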

GitHub and Code Repositories

Developers frequently push code containing sensitive information to public repositories. Searching an organization's GitHub presence can reveal API keys, internal IP addresses, database credentials, infrastructure details, and proprietary code.

# Search GitHub for potential secrets in an organization's repos
# (use the GitHub search interface or GitHub API)

org:example-company password
org:example-company api_key
org:example-company secret
org:example-company internal
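
If a public repository is in scope, cloning it and searching locally lets you apply more specific patterns than the web search allows. The sketch below is a rough first pass, not a complete secret scanner: `scan_secrets` is a hypothetical helper, and its three patterns (AWS-style access key IDs, PEM private key headers, and common credential keywords followed by `=` or `:`) are illustrative assumptions that will produce false positives. It checks only the current files; secrets removed in later commits remain in the history, which `git log -p --all` prints and which can be searched with the same patterns.

```shell
# Hypothetical helper: rough secret search over a cloned, in-scope repository.
scan_secrets() {
  # $1: path to the local clone
  # -r recurse, -I skip binary files, -n show line numbers,
  # -i ignore case (this also loosens the key ID pattern), -E extended regex.
  # --exclude-dir is a GNU/BSD grep extension, not POSIX.
  grep -rIniE --exclude-dir=.git \
    -e 'AKIA[0-9A-Z]{16}' \
    -e '-----BEGIN [A-Z ]*PRIVATE KEY-----' \
    -e '(password|passwd|secret|api[_-]?key|token)[[:space:]]*[:=]' \
    "$1"
}
```

Each hit is printed as `path:line:content`, which is a convenient format to copy into your evidence notes before verifying it manually.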

Twitter/X, Reddit, and Forums

Technical staff often discuss work-related problems on public forums. Stack Overflow questions, Reddit posts, and tweets can reveal internal architecture details, software versions, and security misconfigurations. Searching for a company's domain name or product names across these platforms can surface useful leads.

WHOIS and DNS Lookups

Domain registration records and DNS configurations are fundamental OSINT sources for technical reconnaissance. They reveal infrastructure details, hosting providers, email configurations, and sometimes administrative contact information.

WHOIS Lookups

WHOIS queries reveal who registered a domain, when it was registered, when it expires, the registrar used, and sometimes the registrant's name, email, phone number, and physical address. Registrant details are now often hidden, either by privacy protection services or by the redaction that registrars widely adopted in 2018 in response to GDPR, but some registrations, particularly under certain country-code TLDs, may still expose this data.

# Command-line WHOIS lookup
whois example.com

# Example output (abbreviated)
Domain Name: EXAMPLE.COM
Registrar: Example Registrar, Inc.
Creation Date: 1995-08-14T04:00:00Z
Registry Expiry Date: 2025-08-13T04:00:00Z
Name Server: NS1.EXAMPLE.COM
Name Server: NS2.EXAMPLE.COM
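
Full WHOIS output often repeats the same fields (once from the registry and again from the registrar) and buries them among legal notices. The awk filter below is a minimal sketch for pulling out the fields shown above; `whois_summary` is a hypothetical helper, the labels it matches are assumptions based on the common gTLD output format, and other TLDs may use different labels.

```shell
# Hypothetical helper: print selected WHOIS fields once each (reads stdin).
whois_summary() {
  awk '
    /^[ \t]*(Registrar|Creation Date|Registry Expiry Date|Registrar Registration Expiration Date|Name Server):/ {
      key = $0; sub(/:.*/, "", key); sub(/^[ \t]+/, "", key)   # field label
      val = $0; sub(/^[^:]*:[ \t]*/, "", val)                    # field value
      line = key ": " val
      if (!seen[tolower(line)]++) print line    # skip repeats, ignoring case
    }'
}
```

`whois example.com | whois_summary` prints each matching field once; name servers are compared case-insensitively, so uppercase and lowercase copies collapse into one line.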

DNS Record Enumeration

DNS records map domain names to IP addresses and services. Different record types reveal different information about the target's infrastructure.

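# Request all record types with ANY (many servers now return only a minimal
# answer, per RFC 8482, so query the individual types below as well)
dig example.com ANY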

# Query specific record types
dig example.com A          # IPv4 addresses
dig example.com AAAA       # IPv6 addresses
dig example.com MX         # Mail servers
dig example.com NS         # Name servers
dig example.com TXT        # Text records (SPF, verification tokens; DKIM keys sit under <selector>._domainkey)
dig www.example.com CNAME  # Canonical name aliases (not permitted at the zone apex)
dig example.com SOA        # Start of authority

# Attempt a zone transfer (usually refused, but worth trying if in scope;
# this connects directly to the target's name server rather than a public resolver)
dig axfr example.com @ns1.example.com
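
Because DNS records change over time, it helps to save each answer with a timestamp as evidence. The sketch below is a minimal example assuming `dig` is installed; `dns_snapshot` and its file-naming scheme are hypothetical, and it sticks to the record types queried above.

```shell
# Hypothetical helper: save raw dig answers per record type with a timestamp.
dns_snapshot() {
  # $1: in-scope domain, $2: output directory (created if missing)
  mkdir -p "$2" || return 1
  for rtype in A AAAA MX NS TXT SOA; do
    # +short prints only the answer data, one record per line
    dig +short "$1" "$rtype" > "$2/${1}_${rtype}.txt"
  done
  # UTC collection time, so each snapshot can be dated in the report
  date -u '+%Y-%m-%dT%H:%M:%SZ' > "$2/${1}_collected_at.txt"
}
```

`dns_snapshot example.com ./evidence/dns` writes one file per record type plus a file recording when the data was collected.
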
💡
What DNS records reveal

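MX records identify mail servers and their hosting providers. TXT records on the domain itself often contain SPF policies listing authorized sending IPs, plus third-party verification tokens (Google Workspace, Microsoft 365, etc.) that reveal what services the organization uses. DKIM public keys are also published as TXT records, but under <selector>._domainkey subdomains, so they only appear if you know or guess the selector. NS records identify the DNS hosting provider.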
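
SPF policies are a quick way to enumerate the third-party mail services a domain relies on. The filter below is a minimal sketch that reads the output of `dig +short TXT example.com`, joins TXT records that were split into several quoted strings, and prints the `include:`, `ip4:`, `ip6:`, and `redirect=` terms of the SPF record. `spf_mechanisms` is a hypothetical helper; it does not follow nested includes or handle escaped characters.

```shell
# Hypothetical helper: list SPF terms that name other hosts or address ranges.
spf_mechanisms() {
  # stdin: TXT records as printed by `dig +short TXT <domain>`.
  # Join strings split by dig ("part1" "part2"), drop the quotes,
  # keep only the SPF record, split it into one term per line,
  # and print the include/ip4/ip6/redirect terms.
  sed -e 's/" "//g' -e 's/"//g' |
    grep -i '^v=spf1 ' |
    tr ' ' '\n' |
    grep -iE '^[+?~-]?(include:|ip4:|ip6:|redirect=)'
}
```

For instance, a term such as `include:_spf.google.com` indicates that Google Workspace sends mail for the domain, and each included domain can be queried the same way to expand its address ranges.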

OSINT Tools Overview

While manual research is essential, specialized OSINT tools automate the collection and correlation of information. Here are some of the most widely used tools in the OSINT community.

theHarvester

theHarvester is a command-line tool that gathers email addresses, subdomains, hosts, employee names, open ports, and banners from different public sources including search engines, PGP key servers, and the Shodan database.

# Install theHarvester
sudo apt install theharvester

# Search for emails and subdomains associated with a domain
# (available source names change between releases; theHarvester -h lists them)
theHarvester -d example.com -b bing,duckduckgo,crtsh -l 200

# Use all available data sources
theHarvester -d example.com -b all

Maltego

Maltego is a graphical link analysis tool that maps relationships between pieces of information. It uses "transforms" to automatically query data sources and visualize connections between domains, IP addresses, email addresses, people, and organizations. The Community Edition is free, while the commercial version offers more transforms and features.

Recon-ng

Recon-ng is a modular web reconnaissance framework written in Python. It provides a command-line interface similar to Metasploit and supports modules for DNS enumeration, contact harvesting, credential discovery, and more.

# Install Recon-ng
sudo apt install recon-ng

# Launch the framework (the commands that follow are typed at the recon-ng prompt)
recon-ng

# Create a workspace for your project
workspaces create example-project

# Add a target domain
db insert domains example.com

# Search for available modules
marketplace search domains

# Install and run a module
marketplace install recon/domains-hosts/hackertarget
modules load recon/domains-hosts/hackertarget
run
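
Recon-ng accepts a resource file of commands through its `-r` option (check `recon-ng -h` for your version), which makes a session repeatable and leaves a record of exactly what was run. The shell sketch below writes the commands from the walkthrough above into such a file; `write_recon_rc` is a hypothetical helper, and the module is simply the one used in the example.

```shell
# Hypothetical helper: write the walkthrough's Recon-ng commands to a resource file.
write_recon_rc() {
  # $1: workspace name, $2: in-scope domain, $3: path of the file to create
  cat > "$3" <<EOF
workspaces create $1
db insert domains $2
marketplace install recon/domains-hosts/hackertarget
modules load recon/domains-hosts/hackertarget
run
EOF
}
```

`write_recon_rc example-project example.com recon.rc && recon-ng -r recon.rc` replays the session, and the resource file can be attached to your notes.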

Other Notable Tools

  • Shodan -- a search engine for internet-connected devices; reveals open ports, services, and banners across the internet
  • Censys -- similar to Shodan, with a focus on TLS certificates and web server data
  • SpiderFoot -- automated OSINT collection with over 200 data source modules
  • FOCA -- extracts metadata from documents (PDF, DOCX, XLSX) found on a target's website
  • Amass -- comprehensive subdomain enumeration using multiple techniques and data sources

OSINT Methodology

Effective OSINT follows a structured methodology rather than random searching. A disciplined approach ensures thorough coverage and produces organized, actionable results.

Step 1: Define Objectives

Before starting any collection, clearly define what you are looking for and why. In a penetration test, your scope document dictates what is in bounds. Common objectives include mapping the target's external infrastructure, identifying employee email addresses for phishing simulations, or discovering exposed credentials.

Step 2: Identify Sources

Based on your objectives, determine which data sources are most likely to yield relevant results. Technical objectives call for DNS, WHOIS, and Shodan. People-focused objectives call for LinkedIn, social media, and public records.

Step 3: Collect Data

Systematically query each source and record your findings. Use both automated tools and manual searching -- tools are fast but can miss context that a human researcher would catch. Save raw data, screenshots, and timestamps for everything you find.

Step 4: Analyze and Correlate

Cross-reference findings from different sources. An email address that appears in a breached-credentials database, combined with a LinkedIn profile showing that the same person is an IT administrator, creates a high-risk finding. Individual data points become intelligence when correlated.
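
Much of this correlation is a join on a shared key such as an email address. The sketch below is a minimal example under assumed inputs: a hypothetical file of addresses seen in breach data your engagement permits you to use, and a hypothetical CSV of `email,role` pairs built from profile research. `correlate_findings` is a hypothetical helper that prints the people who appear in both, so privileged roles can be flagged for the report.

```shell
# Hypothetical helper: staff whose addresses also appear in breach data.
correlate_findings() {
  # $1: breach-data emails, one per line; $2: CSV of email,role
  awk -F',' '
    # First file: remember every breached address (case-insensitive)
    NR == FNR { breached[tolower($1)] = 1; next }
    # Second file: print staff whose address was also breached
    (tolower($1) in breached) { print $1 "," $2 }
  ' "$1" "$2"
}
```

The `NR == FNR` idiom loads the first file into a lookup table before the second is read; it assumes the first file is not empty.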

Step 5: Document and Report

Organize your findings into a structured report. For each finding, document the source, the date collected, the potential security impact, and recommended mitigations. Clear documentation ensures your work is reproducible and actionable for the client.

🎉
Methodology tip

Keep a running log of every query you execute and every source you check, even if it yields no results. This prevents duplicated effort, proves thoroughness to your client, and helps you refine your approach over time.
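
A tab-separated file is enough to follow this tip from the command line. The sketch below is one way to do it; `log_query`, the `OSINT_LOG` variable, and the column layout (UTC time, source, exact query, outcome) are hypothetical choices rather than a standard format, and it assumes the fields themselves contain no tab characters.

```shell
# Hypothetical helper: append one row per query to a TSV research log.
log_query() {
  # $1: source checked, $2: exact query, $3: short outcome (e.g. "no results")
  printf '%s\t%s\t%s\t%s\n' \
    "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$1" "$2" "$3" \
    >> "${OSINT_LOG:-osint-log.tsv}"
}
```

For example, `log_query google 'site:example.com filetype:pdf' 'no results'` appends one row, and the file opens directly in a spreadsheet when you assemble the report.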

Legal and Ethical Considerations

OSINT operates in a legal gray area that varies by jurisdiction. While the information itself is publicly available, how you collect it, store it, and use it may be subject to laws and regulations.

Legal Framework

  • Terms of Service -- scraping social media platforms or search engines may violate their terms of service, even if the data is public
  • GDPR and privacy laws -- in the EU, collecting personal data (names, emails, photos) requires a legal basis even if the data is publicly available
  • Computer access laws -- accessing data through unintended means (even if no password is required) may violate laws like the CFAA in the US
  • Data retention -- storing collected personal data creates obligations under data protection regulations

Ethical Guidelines

  • Stay within scope -- only collect information relevant to your authorized engagement
  • Minimize personal data collection -- do not harvest personal information beyond what is necessary
  • Secure your findings -- OSINT reports contain sensitive information; encrypt them and limit access
  • Responsible disclosure -- if you discover exposed credentials or sensitive data during research, follow responsible disclosure practices
  • No deception -- do not create fake profiles or impersonate others to extract information (this crosses from passive OSINT into social engineering, which requires separate authorization)

⚠️
Always have written authorization.

Before conducting OSINT against any organization, ensure you have a signed agreement that explicitly authorizes information gathering. Even passive reconnaissance can raise legal concerns if conducted without permission. For practice, use intentionally vulnerable targets, CTF challenges, or your own infrastructure.

Summary

In this tutorial, you learned the fundamentals of Open Source Intelligence:

  • OSINT definition -- collecting and analyzing publicly available information to produce actionable intelligence
  • Types of OSINT -- technical, organizational, and personal intelligence each serve different objectives
  • Search engine techniques -- Google dorking with advanced operators like site:, filetype:, intitle:, and inurl:
  • Social media research -- LinkedIn, GitHub, and forums reveal technology stacks, employee details, and internal infrastructure
  • WHOIS and DNS -- domain registration and DNS records map infrastructure and identify hosting providers
  • OSINT tools -- theHarvester, Maltego, Recon-ng, Shodan, and others automate collection at scale
  • Structured methodology -- define objectives, identify sources, collect, analyze, and document
  • Legal and ethical boundaries -- always operate within scope, respect privacy laws, and secure your findings

🎉
Well done!

You now understand the foundations of OSINT and how it fits into the reconnaissance phase of ethical hacking. Mastering OSINT will make you a more effective security professional by helping you see your targets the way an attacker would -- through publicly available information.