What Is Google Dorking

Google dorking (also called Google hacking) is a reconnaissance technique that uses advanced search operators to find information that standard searches would never surface. By combining specific operators, you can instruct Google to return only results matching precise criteria -- exposing files, directories, login portals, error messages, and configuration data that was never meant to be publicly discoverable.

The technique was popularized by Johnny Long in the early 2000s when he began cataloging search queries that revealed security vulnerabilities. His work became the Google Hacking Database (GHDB), which is now maintained by Exploit-DB and contains thousands of proven dorks organized by category.

💡
Why dorking matters for security

Before spending hours on port scanning or vulnerability testing, a few well-crafted Google dorks can reveal exposed admin panels, leaked credentials, misconfigured servers, and sensitive documents in minutes. It is often the fastest way to identify low-hanging fruit during the reconnaissance phase of an authorized assessment.

⚠️
Authorization required.

Only use Google dorking against targets you are explicitly authorized to test. Using these techniques against systems without permission may violate computer access laws in your jurisdiction. For practice, use your own domains, intentionally vulnerable labs, or CTF challenges.

Core Search Operators

Google supports a set of advanced operators that filter results with precision. These are the building blocks of every dork. Understanding each operator individually is essential before combining them into complex queries.

site:

Restricts results to a specific domain or subdomain. This is the most frequently used operator in security assessments because it limits your search to the target's web presence.

# All indexed pages on a domain
site:example.com

# Only results from a specific subdomain
site:dev.example.com

# All subdomains (exclude the main site)
site:*.example.com -site:www.example.com
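When enumerating subdomains, the exclusion list usually grows as you record each host. A minimal sketch of generating that query, assuming a hypothetical helper named subdomain_query (not part of any tool mentioned here):

```python
def subdomain_query(domain: str, known_hosts: list[str]) -> str:
    """Build a wildcard site: query that hides hosts you have already recorded."""
    parts = [f"site:*.{domain}"]
    # Each known host becomes a -site: exclusion so later result pages show new hosts.
    parts.extend(f"-site:{host}" for host in known_hosts)
    return " ".join(parts)


print(subdomain_query("example.com", ["www.example.com", "dev.example.com"]))
```

Re-running the query after adding each newly found host to the list gradually surfaces less prominent subdomains.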

filetype: / ext:

Searches for specific file extensions. filetype: and ext: are interchangeable on Google. This operator is critical for finding documents, configuration files, backups, and database dumps.

# Find PDF documents
site:example.com filetype:pdf

# Find configuration files
site:example.com filetype:env OR filetype:ini OR filetype:conf OR filetype:cfg

# Find backup files
site:example.com filetype:bak OR filetype:old OR filetype:backup

# Find database dumps
site:example.com filetype:sql OR filetype:db OR filetype:sqlite

# Find spreadsheets that may contain sensitive data
site:example.com filetype:xlsx OR filetype:csv
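Typing long OR chains by hand is error-prone. The sketch below builds the same kind of query from a list of extensions; filetype_query is a hypothetical helper name chosen for illustration.

```python
def filetype_query(domain: str, extensions: list[str]) -> str:
    """Return a site-scoped query that matches any of the given file extensions."""
    if not extensions:
        raise ValueError("at least one extension is required")
    # Accept both "sql" and ".sql" by dropping a leading dot.
    clauses = [f"filetype:{ext.lstrip('.')}" for ext in extensions]
    return f"site:{domain} " + " OR ".join(clauses)


print(filetype_query("example.com", ["bak", ".old", "backup"]))
```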

intitle:

Matches pages where the specified text appears in the HTML <title> tag. Web servers and applications generate predictable titles for directory listings, error pages, and admin interfaces -- making this operator highly effective for finding them.

# Find directory listings (Apache/Nginx auto-generated index)
intitle:"index of" site:example.com

# Find error pages that leak server information
intitle:"404 Not Found" site:example.com
intitle:"500 Internal Server Error" site:example.com

# Find admin or management panels
intitle:"admin" OR intitle:"dashboard" OR intitle:"control panel" site:example.com

inurl:

Matches pages where the specified text appears anywhere in the URL. Login pages, admin panels, and API endpoints often have predictable URL patterns.

# Find login pages
inurl:login OR inurl:signin OR inurl:auth site:example.com

# Find admin interfaces
inurl:admin OR inurl:administrator OR inurl:wp-admin site:example.com

# Find API endpoints
inurl:api OR inurl:/v1/ OR inurl:/v2/ site:example.com

# Find phpMyAdmin or database tools
inurl:phpmyadmin OR inurl:adminer site:example.com

intext:

Searches the body text of pages. Useful for finding specific strings like error messages, software version numbers, or configuration values that appear in page content.

# Find pages leaking PHP errors
intext:"Warning: mysql_" site:example.com
intext:"Fatal error:" site:example.com

# Find pages revealing server software versions
intext:"Apache/2.4" OR intext:"nginx/" site:example.com

# Find exposed email addresses
intext:"@example.com" site:example.com

cache:

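Historically, cache: showed Google's stored copy of a page. This was useful when a page had been taken down or modified, because the cached copy could still contain the original sensitive information. Google retired its cached pages in 2024, so the operator no longer returns results there. For the same purpose, check a web archive such as the Internet Archive's Wayback Machine, and expect older dorks and GHDB entries to still reference cache:.

# Historical syntax for viewing the cached version of a specific page
cache:example.com/admin/config.php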

Exclusion operator (-)

The minus sign excludes results matching a term. This is essential for filtering out noise and irrelevant results from your queries.

# Find all subdomains except www
site:*.example.com -site:www.example.com

# Find documents but exclude marketing PDFs
site:example.com filetype:pdf -"brochure" -"annual report"

# Exclude known safe pages to surface unusual ones
site:example.com -inurl:blog -inurl:news -inurl:careers

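Wildcard (*) and exact match ("")

Inside a quoted phrase, the asterisk stands in for one or more unknown words. Double quotes force Google to match the phrase exactly as written instead of treating the words as separate terms.

# Wildcard matches any word(s)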
"password is *" site:example.com

# Exact phrase match
"internal use only" site:example.com
"not for public distribution" site:example.com

Operator Chaining

The real power of Google dorking comes from combining multiple operators into a single query. Each operator narrows the result set further, allowing you to pinpoint exactly what you are looking for.

Combining operators with AND (implicit)

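Google treats spaces between terms as an implicit AND: every term and operator in your query must be present in the result, so an explicit AND is unnecessary. OR binds more tightly than this implicit AND, which means a query such as filetype:pdf "confidential" OR "internal only" is read as filetype:pdf combined with either phrase.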

# Directory listings containing "password" on the target
intitle:"index of" "password" site:example.com

# Configuration files with database credentials
site:example.com filetype:env "DB_PASSWORD"

# PDF documents marked as confidential
site:example.com filetype:pdf "confidential" OR "internal only"

# Exposed log files containing error information
site:example.com filetype:log "error" OR "warning" OR "failed"

Using OR for alternatives

The OR operator (must be uppercase) or the pipe symbol | matches either term. This lets you search for multiple variations in a single query.

# Find any type of configuration file
site:example.com filetype:env OR filetype:yml OR filetype:yaml OR filetype:toml OR filetype:json "password" OR "secret" OR "key"

# Find any common admin URL pattern
site:example.com inurl:admin OR inurl:manage OR inurl:dashboard OR inurl:cpanel

Parentheses for grouping

While Google's support for parentheses is limited compared to programming languages, you can use them to group OR conditions for cleaner queries.

# Group file types
site:example.com (filetype:sql OR filetype:db) "password"

# Group URL patterns
site:example.com (inurl:backup OR inurl:dump OR inurl:export) filetype:sql
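The chaining rules above (spaces as implicit AND, OR for alternatives, parentheses for grouping) can be sketched as three small helpers. The names all_of, any_of, and phrase are hypothetical and exist only for this example.

```python
def phrase(text: str) -> str:
    """Quote text for exact-phrase matching, dropping any embedded quotes."""
    return '"' + text.replace('"', "") + '"'


def any_of(*terms: str) -> str:
    """Join alternatives with OR, wrapped in parentheses when there are several."""
    if not terms:
        raise ValueError("any_of() needs at least one term")
    joined = " OR ".join(terms)
    return f"({joined})" if len(terms) > 1 else joined


def all_of(*clauses: str) -> str:
    """Join clauses with spaces, which Google reads as an implicit AND."""
    return " ".join(clause for clause in clauses if clause)


query = all_of(
    "site:example.com",
    any_of("inurl:backup", "inurl:dump", "inurl:export"),
    "filetype:sql",
)
print(query)
```

Building queries from parts like this keeps the grouping explicit, which matters once a dork mixes several OR lists.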

Dork Categories for Security Assessments

During an authorized assessment, you will typically run dorks from several categories. Working through each category systematically ensures thorough coverage of the target's exposed attack surface.

Exposed Files and Documents

Organizations frequently leave sensitive documents accessible on their web servers. These can include internal reports, employee lists, network diagrams, and password spreadsheets.

# Sensitive document types
site:example.com filetype:xlsx OR filetype:docx OR filetype:pptx "confidential" OR "internal" OR "restricted"

# Network diagrams and infrastructure docs
site:example.com filetype:pdf OR filetype:vsd "network diagram" OR "topology" OR "infrastructure"

# Backup archives
site:example.com filetype:zip OR filetype:tar OR filetype:gz OR filetype:7z

# Source code files
site:example.com filetype:py OR filetype:php OR filetype:js

Directory Listings

When a web server is configured to show directory contents (auto-indexing), it generates pages with predictable titles. These listings can expose entire directory trees of files that were never meant to be browsable.

# Apache and Nginx directory listings
intitle:"index of" site:example.com

# Directory listings containing specific file types
intitle:"index of" "backup" site:example.com
intitle:"index of" ".env" site:example.com
intitle:"index of" ".git" site:example.com
intitle:"index of" "wp-config" site:example.com

⚠️
Do not download or access files without authorization.

Finding an exposed directory or file through Google does not give you permission to download or access it. During an authorized engagement, document the finding and report it. Only access the content if your scope of work explicitly permits it.

Login Pages and Admin Panels

Finding login pages reveals application entry points. Combined with default credential testing (if in scope), these can lead to significant findings.

# Generic login pages
site:example.com intitle:"login" OR intitle:"sign in" OR intitle:"log in"

# Content management systems
site:example.com inurl:wp-login.php OR inurl:wp-admin
site:example.com inurl:administrator OR inurl:admin/login

# Database management tools
site:example.com intitle:"phpMyAdmin" OR intitle:"Adminer"

# Network devices and infrastructure
site:example.com intitle:"Cisco" intitle:"login"
site:example.com intitle:"RouterOS" OR intitle:"FortiGate"

Error Messages and Debug Information

Error pages and debug output can leak server software versions, file paths, database names, internal IP addresses, and stack traces -- all valuable for an attacker planning their next move.

# PHP errors revealing file paths and versions
site:example.com "Warning:" "on line" filetype:php
site:example.com "Fatal error:" "on line"
site:example.com "Parse error:" "syntax error"

# Database errors revealing backend technology
site:example.com "mysql_fetch" OR "pg_query" OR "ORA-"
site:example.com "SQLSTATE" OR "syntax error at or near"

# Stack traces and debug output
site:example.com "stack trace" OR "traceback" OR "Exception in"
site:example.com ("debug mode" OR inurl:debug)

Exposed Credentials and Secrets

Misconfigured servers and careless deployments can expose environment files, configuration files, and even plaintext credentials through search engine indexing.

# Environment files with credentials
site:example.com filetype:env "DB_PASSWORD" OR "API_KEY" OR "SECRET"

# Configuration files with passwords
site:example.com filetype:yml "password:" OR "passwd:" OR "secret:"
site:example.com filetype:json "api_key" OR "apikey" OR "token"
site:example.com filetype:xml "password" OR "credential"

# Exposed .git directories (source code + history)
site:example.com inurl:".git" intitle:"index of"

# Exposed SSH keys
site:example.com filetype:pem OR filetype:ppk "PRIVATE KEY"

Server and Technology Identification

Identifying the specific software, frameworks, and versions running on the target helps you search for known vulnerabilities and tailor your attack approach.

# Identify web server software
site:example.com intitle:"Apache" "server at"
site:example.com "powered by" ("WordPress" OR "Drupal" OR "Joomla" OR "Laravel")

# Find specific software versions
site:example.com "PHP/" OR "Apache/" OR "nginx/" inurl:phpinfo
site:example.com intitle:"phpinfo()" "PHP Version"

# Identify API documentation (reveals endpoints and structure)
site:example.com inurl:swagger OR inurl:api-docs OR intitle:"API Documentation"
site:example.com filetype:json inurl:swagger OR inurl:openapi

The Google Hacking Database (GHDB)

The Google Hacking Database is a curated collection of search queries maintained at exploit-db.com/google-hacking-database. It contains thousands of dorks organized into categories, each one proven to reveal specific types of vulnerable or exposed information.

GHDB Categories

  • Footholds -- dorks that find entry points into systems (login pages, open services)
  • Files Containing Usernames -- queries that surface documents or pages listing user accounts
  • Sensitive Directories -- exposed directory listings containing configuration, backup, or admin files
  • Web Server Detection -- dorks that identify specific web server software and versions
  • Vulnerable Files -- queries targeting files known to contain exploitable vulnerabilities
  • Vulnerable Servers -- dorks that find servers running software with known security flaws
  • Error Messages -- queries that surface debug output, stack traces, and verbose error pages
  • Files Containing Passwords -- dorks that locate exposed credentials in configuration files, logs, and documents
  • Sensitive Online Shopping Info -- queries targeting e-commerce platforms with exposed data
  • Advisories and Vulnerabilities -- dorks related to specific CVEs and security advisories

💡
Using the GHDB effectively

Do not blindly run every dork in the database. Study the categories relevant to your engagement, understand what each query looks for and why, then adapt the dorks to your specific target by adding the site: operator. Many GHDB entries are generic -- you need to scope them to be useful in an assessment.
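Scoping a generic dork to one target can be sketched as follows. The helper scope_dork is a hypothetical name; it replaces any positive site: restriction already in the dork, keeps -site: exclusions, and collapses repeated whitespace (which would also affect runs of spaces inside quoted phrases).

```python
import re

# Matches a positive site: operator, but not "-site:" or words ending in "site:".
SITE_RE = re.compile(r"(?<![\w-])site:\S+", re.IGNORECASE)


def scope_dork(dork: str, domain: str) -> str:
    """Force a generic dork onto a single authorized domain."""
    without_site = SITE_RE.sub("", dork)
    remainder = " ".join(without_site.split())
    return f"site:{domain} {remainder}".strip()


print(scope_dork('intitle:"index of" "wp-config"', "example.com"))
print(scope_dork("site:gov.uk filetype:env DB_PASSWORD", "example.com"))
```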

Beyond Google

While Google has the largest index, other search engines index different content and support their own advanced operators. Using multiple engines increases your coverage during reconnaissance.

Bing

Bing supports many similar operators and sometimes indexes pages that Google has removed from its results. Notably, Bing's ip: operator lists the sites in Bing's index that are hosted on a specific IP address -- something Google does not support.

# Bing: Find all sites on an IP address
ip:93.184.216.34

# Bing: Find specific file types
site:example.com filetype:env

# Bing: Restrict results by language (use loc: for country or region)
site:example.com language:en

DuckDuckGo

DuckDuckGo supports basic operators like site: and filetype:. It does not track searches, so your reconnaissance queries are not logged to your search profile -- useful for OPSEC-conscious assessments.
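Running the same dork on several engines is easier with ready-made links that you open manually in a browser. A small sketch, assuming the engines' public search URLs with a q parameter; ENGINES and search_url are hypothetical names.

```python
from urllib.parse import urlencode

ENGINES = {
    "google": "https://www.google.com/search",
    "bing": "https://www.bing.com/search",
    "duckduckgo": "https://duckduckgo.com/",
}


def search_url(engine: str, query: str) -> str:
    """Return a browser URL for the query; urlencode escapes colons, quotes, and spaces."""
    return f"{ENGINES[engine]}?{urlencode({'q': query})}"


for name in ENGINES:
    print(search_url(name, "site:example.com filetype:env"))
```

Operators that exist on only one engine, such as Bing's ip:, should only be sent to that engine.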

Shodan and Censys

Shodan and Censys are specialized search engines that index internet-facing devices and services rather than web page content. They can reveal open ports, TLS certificates, running services, and device banners that Google would never find. These are covered in detail in separate tutorials.

Automating Google Dorking

Running dorks manually is effective for targeted searches, but automated tools can execute hundreds of queries systematically. Use these responsibly -- excessive automated queries will trigger Google's rate limiting and CAPTCHA challenges.

Pagodo (Passive Google Dork)

Pagodo automates Google dorking by pulling dorks from the GHDB and running them against a target domain. It spaces requests with a configurable delay and jitter to reduce the chance of being rate limited, and it can save its findings to a file.

# Install Pagodo
git clone https://github.com/opsdisk/pagodo.git
cd pagodo
pip install -r requirements.txt

# First, update the local GHDB dork list
python ghdb_scraper.py -s

# Run dorks against a target (with delays to avoid rate limiting)
# Flag names differ between Pagodo releases; confirm them with: python pagodo.py -h
python pagodo.py -d example.com -g dorks.txt -l 100 -s -e 35 -j 1.1

Google Dorking with Recon-ng

The Recon-ng framework includes modules that leverage Google search for reconnaissance. It integrates dorking into a broader workflow alongside other information gathering techniques.

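# In Recon-ng 5.x, after setting up a workspace
# (install the module from the marketplace once before loading it)
marketplace install recon/domains-hosts/google_site_web
modules load recon/domains-hosts/google_site_web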
options set SOURCE example.com
run

Manual scripting approach

For simple automation, you can script a list of dorks and process them sequentially. Always add delays between queries to avoid triggering anti-bot protections.

# dorks.txt -- one dork per line, scoped to your target
site:example.com filetype:env
site:example.com filetype:sql
site:example.com intitle:"index of"
site:example.com inurl:admin
site:example.com inurl:login
site:example.com filetype:log
site:example.com filetype:bak OR filetype:old
site:example.com "phpinfo()"
site:example.com filetype:xml "password"
site:example.com intitle:"dashboard" OR intitle:"control panel"

💡
Rate limiting and CAPTCHA

Google actively detects automated queries and will temporarily block your IP or require CAPTCHA solving. Professional tools like Pagodo implement delays and jitter between requests. Never use high-speed scraping -- it will get you blocked and may violate Google's Terms of Service. For large-scale assessments, consider using the Google Custom Search API, which allows programmatic access with defined rate limits.
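The delay-and-jitter idea can be sketched as a generator that paces a dork list. This is an illustration under stated assumptions, not how any particular tool implements it: paced is a hypothetical name, and each wait is drawn between base_delay and base_delay * jitter. The demo passes a recording function instead of time.sleep so it finishes instantly.

```python
import random
import time
from collections.abc import Callable, Iterable, Iterator


def paced(
    dorks: Iterable[str],
    base_delay: float = 35.0,
    jitter: float = 1.1,
    sleep: Callable[[float], object] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> Iterator[str]:
    """Yield dorks one at a time, waiting a randomized interval between them."""
    if jitter < 1:
        raise ValueError("jitter must be >= 1")
    for index, dork in enumerate(dorks):
        if index:
            # rng() is in [0, 1), so the wait falls in [base_delay, base_delay * jitter).
            sleep(base_delay * (1 + (jitter - 1) * rng()))
        yield dork


waits: list[float] = []
for dork in paced(["site:example.com filetype:env", "site:example.com filetype:sql"], sleep=waits.append):
    print(dork)
print("planned waits (seconds):", waits)
```

Leaving sleep at its default makes the loop actually pause, which is what you want when each yielded dork is submitted to a search engine or API.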

Defensive Countermeasures

Understanding Google dorking from the attacker's perspective allows you to defend against it. As a security professional, you should regularly dork your own organization to find and fix exposures before an attacker does.

robots.txt

The robots.txt file tells search engine crawlers which paths not to crawl. It is not a security control (attackers can ignore it, and a disallowed URL can still be indexed without its content if other pages link to it), but it keeps well-behaved crawlers away from sensitive paths.

# /robots.txt -- ask crawlers to stay out of sensitive directories
User-agent: *
Disallow: /admin/
Disallow: /config/
Disallow: /backup/
Disallow: /logs/
Disallow: /api/internal/
Disallow: /.git/
Disallow: /.env

⚠️
robots.txt is not a security measure.

Do not rely on robots.txt to protect sensitive content. It is a public file that anyone can read, and ironically, it tells attackers exactly which directories you consider sensitive. Always use proper access controls (authentication, firewall rules, network segmentation) to protect sensitive resources. Use robots.txt only as an additional layer to prevent accidental indexing.
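To confirm that a robots.txt file says what you intend, Python's standard urllib.robotparser module can evaluate URLs against its rules. A minimal sketch, with a shortened rule set and a hypothetical helper named crawlable:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/
Disallow: /.env
"""


def crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if a compliant crawler identifying as `agent` may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for path in ("/index.html", "/admin/users", "/.env"):
    print(path, crawlable(ROBOTS_TXT, f"https://example.com{path}"))
```

A False result only means compliant crawlers are asked to stay away; the resource still needs real access controls.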

Meta noindex tag

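Adding a noindex meta tag to a page instructs search engines not to include it in search results, even if it is crawled. The crawler has to fetch the page to see the tag, so do not also block that page in robots.txt, or the directive will never be read.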

<!-- Add to the <head> of pages that should not appear in search results -->
<meta name="robots" content="noindex, nofollow">
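When auditing your own pages, you can check whether the tag is actually present in the served HTML. The sketch below uses the standard html.parser module; RobotsMetaParser and has_noindex are hypothetical names, and the check covers only the meta tag, not an X-Robots-Tag response header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page carries a robots meta tag containing noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


print(has_noindex('<head><meta name="robots" content="noindex, nofollow"></head>'))
```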

Server hardening

  • Disable directory listing -- configure your web server to return 403 instead of showing directory contents
  • Block sensitive file types -- configure Nginx or Apache to deny access to .env, .git, .sql, .bak, .log, and other files that should never be served
  • Remove error verbosity -- configure applications to show generic error pages in production instead of stack traces, file paths, or version numbers
  • Enforce authentication -- protect admin panels, API documentation, database tools, and internal dashboards with authentication and IP restrictions
  • Audit your indexed content -- regularly search site:yourdomain.com and review what Google has indexed; use Google Search Console to request removal of sensitive URLs
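The audit step in the list above can be partly automated once you have a list of indexed URLs (for example, copied from search results or exported from Search Console). This sketch flags URLs by file suffix or path segment; the suffix and segment sets, and the name flag_sensitive, are assumptions to adapt to your environment.

```python
from urllib.parse import urlparse

SENSITIVE_SUFFIXES = (".env", ".sql", ".bak", ".log", ".old")
SENSITIVE_SEGMENTS = {".git", "admin", "backup", "phpmyadmin"}


def flag_sensitive(urls: list[str]) -> list[tuple[str, str]]:
    """Return (url, reason) pairs for URLs that should probably not be indexed."""
    findings = []
    for url in urls:
        segments = [s for s in urlparse(url).path.lower().split("/") if s]
        if segments and segments[-1].endswith(SENSITIVE_SUFFIXES):
            findings.append((url, "sensitive file type"))
        elif SENSITIVE_SEGMENTS.intersection(segments):
            findings.append((url, "sensitive path"))
    return findings


indexed = [
    "https://example.com/",
    "https://example.com/backup/site.sql",
    "https://example.com/.git/config",
    "https://example.com/blog/post",
]
for url, reason in flag_sensitive(indexed):
    print(f"{reason}: {url}")
```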

Google Search Console removal

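If sensitive content has already been indexed, you can request its removal through the Removals tool in Google Search Console. This hides the URL from search results only temporarily (about six months) and does not delete the actual content from your server -- you must also fix the underlying exposure by removing the content, requiring authentication, or serving a noindex directive.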

Practical Dorking Workflow

During an authorized engagement, follow this structured workflow to ensure thorough and organized results.

Step 1: Scope enumeration

Start with broad site: queries to understand the target's web presence. Identify subdomains, major sections, and the technologies in use.

# Map the target's web presence
site:example.com
site:*.example.com
site:example.com inurl:api

Step 2: Sensitive file discovery

Search for file types that commonly contain sensitive data. Work through each file type systematically.

site:example.com filetype:env
site:example.com filetype:sql
site:example.com filetype:log
site:example.com filetype:bak OR filetype:old
site:example.com filetype:conf OR filetype:cfg
site:example.com filetype:pem "PRIVATE KEY"

Step 3: Infrastructure exposure

Look for exposed directories, admin panels, and debug endpoints.

intitle:"index of" site:example.com
site:example.com inurl:admin OR inurl:login
site:example.com intitle:"phpinfo()"
site:example.com inurl:swagger OR inurl:api-docs

Step 4: Error and leak detection

Search for error messages, debug output, and inadvertent information disclosure.

site:example.com "Warning:" "on line"
site:example.com "stack trace" OR "traceback"
site:example.com "DB_PASSWORD" OR "API_KEY"
site:example.com "internal use only" OR "do not distribute"

Step 5: Document findings

For each finding, record the dork that produced it, the URL discovered, a screenshot of the result, the potential security impact, and a recommended remediation. Organize findings by severity for your report.
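One way to keep findings consistent is a small record type that is sorted by severity and exported for the report. The field names follow the items listed above; the Finding class, the severity scale, and to_csv are hypothetical choices for this sketch.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}


@dataclass
class Finding:
    dork: str
    url: str
    impact: str
    severity: str
    remediation: str
    screenshot: str = ""  # path to the saved screenshot, if any


def to_csv(findings: list[Finding]) -> str:
    """Serialize findings as CSV, most severe first."""
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Finding)], lineterminator="\n")
    writer.writeheader()
    for finding in ordered:
        writer.writerow(asdict(finding))
    return buffer.getvalue()


report = to_csv([
    Finding("site:example.com filetype:log", "https://example.com/app.log",
            "Log file discloses internal paths", "low", "Block .log files at the web server"),
    Finding("site:example.com filetype:env", "https://example.com/.env",
            "Database credentials exposed", "critical", "Remove the file and rotate credentials"),
])
print(report)
```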

🎉
Pro tip: build a personal dork library

Over time, maintain a text file of dorks that have produced results in past engagements. Organize them by category and keep notes on what each one typically finds. A curated personal library is more effective than running the entire GHDB because it contains queries proven to work in real-world assessments.
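A personal library kept as a plain text file is easy to load by category. The format below is an assumption made for this sketch: lines starting with # name a category, and every other non-empty line is a dork in that category. parse_library is a hypothetical helper that also drops duplicates while preserving order.

```python
def parse_library(text: str) -> dict[str, list[str]]:
    """Group dorks under their '# category' headers, skipping blanks and duplicates."""
    library: dict[str, list[str]] = {}
    category = "uncategorized"
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("#"):
            category = line.lstrip("#").strip() or category
            continue
        bucket = library.setdefault(category, [])
        if line not in bucket:
            bucket.append(line)
    return library


SAMPLE = """\
# directory listings
intitle:"index of" "backup"
intitle:"index of" ".env"
intitle:"index of" "backup"

# credentials
filetype:env "DB_PASSWORD"
"""

for name, dorks in parse_library(SAMPLE).items():
    print(name, len(dorks))
```

Because the stored dorks are generic, each one still needs a site: restriction for the current engagement before it is run.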

Legal Boundaries

Google dorking exists in a legal gray area. Running the search queries is generally legal because you are only using a public search engine. What you do with the results, however, can cross legal boundaries.

What is generally legal

  • Running search queries on public search engines
  • Viewing search result snippets and cached pages
  • Dorking your own domains and infrastructure
  • Dorking targets covered by a signed authorization agreement
  • Documenting and reporting exposed information to the affected organization

What can be illegal

  • Accessing systems or downloading files found through dorking without authorization
  • Using discovered credentials to log into accounts
  • Exploiting vulnerabilities found through Google dorking without permission
  • Automated mass dorking that violates Google's Terms of Service
  • Collecting personal data in violation of GDPR or other privacy regulations

⚠️
Finding exposed data does not authorize access.

If you discover an exposed database dump or admin panel through Google dorking, you may be committing a crime by accessing it -- even if no password was required. The legal standard in many jurisdictions is whether you were authorized to access the system, not whether it was technically protected. Always operate within your written scope of work.

Summary

In this tutorial, you learned how to use Google dorking for authorized security assessments:

  • Core operators -- site:, filetype:, intitle:, inurl:, intext:, cache:, exclusion (-), and wildcards
  • Operator chaining -- combining multiple operators for precise, targeted queries
  • Dork categories -- exposed files, directory listings, login pages, error messages, credentials, and server identification
  • The GHDB -- a curated database of thousands of proven dorks organized by category
  • Beyond Google -- using Bing, DuckDuckGo, Shodan, and Censys for broader coverage
  • Automation -- tools like Pagodo and Recon-ng for systematic dorking at scale
  • Defensive measures -- robots.txt, noindex tags, server hardening, and content auditing
  • Legal boundaries -- always have written authorization; finding data does not authorize accessing it

🎉
Well done!

You now have a comprehensive understanding of Google dorking as a reconnaissance technique. Combined with OSINT, passive recon, and active scanning, dorking gives you a powerful toolkit for the information gathering phase of any authorized security assessment. Practice on your own domains and CTF targets to build speed and intuition before using these skills in professional engagements.