How to prepare for a penetration test in 9 simple steps



The Internet is a useful source of information. Unfortunately, it also stores plenty of our personal data, making us a tasty morsel for potential e-thieves. I decided to write an article about a few methods of gathering data which can be used to build a profile of a potential target and to prepare for a penetration test.

A penetration test is a form of information security assurance. Sometimes, it begins with an extensive reconnaissance phase. But how do you prepare for a penetration test? There are some open source intelligence (OSINT) techniques which can unveil a lot of data stored online, for example information about companies with a large online presence. Below, I’ll present a few steps you should go through to prepare for a penetration test.

Step 1: Begin with HaveIBeenPwned and WHOIS

Email addresses are among the most exposed pieces of information about you online. Whenever you register on a website or create an account in an app, you are asked to provide your email address. Websites and apps get hacked often. As a result, email addresses and other user data may be leaked. There are some methods to verify whether a given email address has been exposed this way.

One of the most popular is HaveIBeenPwned. It tells you if your email has been found to be a part of a breach.
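The same check can be scripted. The sketch below only builds a lookup request and does not send it. It assumes the HaveIBeenPwned v3 `breachedaccount` endpoint, which to my knowledge expects an API key in the `hibp-api-key` header and a user agent. The key value and the client name `pentest-prep-sketch` are placeholders, not real values.

```python
import urllib.parse
import urllib.request

# Assumed endpoint of the HaveIBeenPwned v3 API (verify against the official docs).
HIBP_ENDPOINT = "https://haveibeenpwned.com/api/v3/breachedaccount/"


def build_breach_request(email: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a breach lookup request for one email address."""
    # The address goes into the URL path, so "@" must be percent-encoded.
    url = HIBP_ENDPOINT + urllib.parse.quote(email.strip(), safe="")
    headers = {
        "hibp-api-key": api_key,              # placeholder, supplied by the caller
        "user-agent": "pentest-prep-sketch",  # hypothetical client name
    }
    return urllib.request.Request(url, headers=headers)


request = build_breach_request("jane.doe@example.com", "YOUR-API-KEY")
print(request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the actual lookup. Do that only for addresses you are authorised to check.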

A WHOIS lookup is another search tool. Given a website address, it can help you find IP history information, the domain expiry date and even phone numbers which can be used in social engineering attacks.

Both tools are rather simple so it’s definitely worth using them at the very beginning of preparation for a penetration test.
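A WHOIS lookup can also be done without a website. The sketch below sends a plain WHOIS query over TCP port 43 and collects the "Key: value" lines of the reply. The default server `whois.verisign-grs.com` is an assumption that only fits .com and .net domains, and the sample reply is invented for illustration so that the parser can be tried offline.

```python
import socket


def whois_query(domain: str, server: str = "whois.verisign-grs.com", port: int = 43) -> str:
    """Send a plain WHOIS query (domain followed by CRLF) and return the raw reply."""
    with socket.create_connection((server, port), timeout=10) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(raw: str) -> dict[str, list[str]]:
    """Collect 'Key: value' lines from a WHOIS reply, keyed by lower-cased field name."""
    fields: dict[str, list[str]] = {}
    for line in raw.splitlines():
        if line.lstrip().startswith(("%", "#", ">>>")):
            continue  # skip comment and notice lines
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            fields.setdefault(key.strip().lower(), []).append(value.strip())
    return fields


# Invented sample reply; a real one would come from whois_query("example.com").
SAMPLE_REPLY = """\
   Domain Name: EXAMPLE.COM
   Registry Expiry Date: 2030-08-13T04:00:00Z
   Name Server: A.EXAMPLE-DNS.NET
   Name Server: B.EXAMPLE-DNS.NET
"""
info = parse_whois(SAMPLE_REPLY)
print(info["registry expiry date"])
```

Field names differ between registries, so treat the keys as something to inspect rather than rely on.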

Step 2: Perform advanced searching with Google Hacking

More advanced resources are the Exploit Database and Google Hacking. The first one is a “CVE compliant archive of public exploits and corresponding vulnerable software, developed for use by penetration testers and vulnerability researchers”. The Google Hacking Database is a part of the Exploit Database: a “categorized index of Internet search engine queries designed to uncover interesting, and usually sensitive, information made publicly available on the Internet. In most cases, this information was never meant to be made public but due to any number of factors this information was linked in a web document that was crawled by a search engine which subsequently followed that link and indexed the sensitive information”. Google Hacking involves using advanced operators in the Google search engine to locate specific strings of text within search results.

Simple example: intext:"please find attached" "login" | password ext:pdf

Search queries like this one identify interesting files (log files, for example) which contain sensitive information or reveal the full system path of the application.

Google Hacking Database at Exploit Database

It’s a great resource which can help you check whether your website is safe and whether all of its sensitive data is properly hidden.
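To check your own website this way, it helps to generate the queries instead of typing them by hand. The helper below is a minimal sketch: `build_dork` and `search_url` are names I made up, and only a handful of operators (`site:`, `intext:`, `inurl:`, `ext:`) are covered.

```python
from urllib.parse import urlencode


def build_dork(site: str, *, intext: str = "", inurl: str = "", ext: str = "") -> str:
    """Combine a few common search operators into one query string."""
    parts = [f"site:{site}"]
    if intext:
        parts.append(f'intext:"{intext}"')  # exact phrase inside the page text
    if inurl:
        parts.append(f"inurl:{inurl}")
    if ext:
        parts.append(f"ext:{ext}")          # restrict results to one file extension
    return " ".join(parts)


def search_url(query: str) -> str:
    """Return a Google search URL for the query, ready to paste into a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_dork("example.com", intext="please find attached", ext="pdf")
print(query)
print(search_url(query))
```

Restricting every query with `site:` keeps the search limited to the domain you are allowed to test.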


Step 3: Check the robots.txt file for hidden, interesting directories

Most frameworks, content management systems and online shops have well-defined directory structures. That’s why the admin directory is normally available under an /admin or /administration path. If that’s not the case, the robots.txt file will most probably contain the directory name you are looking for. That’s why it’s worth using this simple trick to obtain a directory name.

Example of the robots.txt from The Software House website

The robots.txt file is stored in the root directory of the web server. It asks search engine robots to stay away from some directories on a website. There are two ways of hiding directories with the robots.txt file.

If you want to disallow robots from crawling the whole website, you should use these directives:
User-agent: *
Disallow: /

If you want to disallow robots from crawling a particular directory (e.g. “images”), you should use these directives:
User-agent: *
Disallow: /images

When you are about to prepare for a penetration test, you should check the robots.txt file to see if any potentially interesting directories have been hidden. If the website administrator decided to hide these folders – it can mean that some important (or classified) pieces of information are stored there.
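Collecting the hidden directories can be automated with a few lines of code. The sketch below extracts every non-empty `Disallow` value from the text of a robots.txt file. The sample content, including the `/admin-panel/` directory, is hypothetical.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return the unique, non-empty Disallow paths in order of appearance."""
    paths: list[str] = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            path = value.strip()
            if path not in paths:
                paths.append(path)
    return paths


# Hypothetical robots.txt content used for an offline demonstration.
SAMPLE = """\
User-agent: *
Disallow: /images
Disallow: /admin-panel/   # hypothetical hidden directory
Disallow:
"""
print(disallowed_paths(SAMPLE))
```

An empty `Disallow:` line means "nothing is disallowed", so the function skips it.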


Step 4: Look through the LinkedIn profile of the company

Most often, the weakest passwords in a company belong to non-technical management employees. That’s why LinkedIn may be a good source of information. Searching through this website will help you identify directors, senior managers and other non-technical staff members. Then you can verify whether their passwords are strong enough. Looking through the “About us” page on the company website can also lead you to an easy target.

Once you discover a couple of email addresses, you can derive the standard format for usernames. Sometimes the password reset functionality is also very helpful, for example when its responses reveal whether a given username exists.
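Deriving the username format can be sketched in code as well. The example below matches one known address against a short list of common patterns and then applies the matching pattern to other names. The pattern list, the function names and all of the people are made up for illustration.

```python
from typing import Optional

# A few common username conventions; real companies may use others.
PATTERNS = ["{first}.{last}", "{f}{last}", "{first}{l}", "{first}"]


def render(pattern: str, first: str, last: str) -> str:
    """Fill a pattern with a lower-cased first and last name."""
    first, last = first.lower(), last.lower()
    return pattern.format(first=first, last=last, f=first[0], l=last[0])


def infer_pattern(email: str, first: str, last: str) -> Optional[str]:
    """Return the first pattern that reproduces the local part of a known address."""
    local_part = email.split("@", 1)[0].lower()
    for pattern in PATTERNS:
        if render(pattern, first, last) == local_part:
            return pattern
    return None


def candidate_emails(pattern: str, names: list[tuple[str, str]], domain: str) -> list[str]:
    """Apply a pattern to (first, last) name pairs found during reconnaissance."""
    return [f"{render(pattern, first, last)}@{domain}" for first, last in names]


pattern = infer_pattern("jane.doe@example.com", "Jane", "Doe")
print(pattern)
print(candidate_emails(pattern, [("John", "Smith")], "example.com"))
```

The generated addresses are only candidates and still need to be confirmed.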

Step 5: Perform IP address-related checks

Using reverse IP lookups, you can identify additional targets to poke around. Bing has an excellent search feature for this: it’s capable of finding the websites which are hosted at a specific IP address. Typing “ip:***.***.***.***” into the Bing search box may help you find which websites are hosted at the provided IP address.


Step 6: Enumerate subdomains

Subdomain enumeration is one of the most important steps in assessing and discovering assets which have been exposed online by the client. It may have been done either deliberately as part of their business or accidentally due to a misconfiguration.

Subdomain enumeration can be done using a variety of tools such as dnsrecon or subbrute. Alternatively, you can perform it using Google’s site: operator or through websites like dnsdumpster.

Example of using dnsrecon shows how to obtain subdomain names through brute force
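The brute-force idea behind such tools is easy to sketch: prepend each word from a list to the domain and keep the names that resolve. This is not how dnsrecon is implemented, only a minimal illustration. The `resolve` parameter and the fake DNS table are my own additions so that the example runs without network access.

```python
import socket
from typing import Callable, Iterable


def enumerate_subdomains(
    domain: str,
    words: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> dict[str, str]:
    """Return {hostname: IP address} for every candidate that resolves."""
    found: dict[str, str] = {}
    for word in words:
        host = f"{word}.{domain}"
        try:
            found[host] = resolve(host)
        except OSError:  # socket.gaierror (name not found) is a subclass of OSError
            continue
    return found


# Offline demonstration with a fake resolver and documentation-range addresses.
FAKE_DNS = {"www.example.com": "192.0.2.10", "mail.example.com": "192.0.2.25"}


def fake_resolve(host: str) -> str:
    if host not in FAKE_DNS:
        raise socket.gaierror("name not found")
    return FAKE_DNS[host]


print(enumerate_subdomains("example.com", ["www", "dev", "mail"], resolve=fake_resolve))
```

Leaving out the `resolve` argument makes the function perform real DNS lookups, so run it only against domains that are in scope.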

Step 7: Check out HTTP status codes and response headers

It doesn’t matter whether it’s a valid page, a non-existing page, a redirecting page or a simple directory name: whenever you’re investigating it, look for subtle typos, extra spaces or redundant values in the response headers. Why is it so important? HTTP headers can reveal a lot of sensitive information, such as cookie strings or the technologies behind a web application. This kind of data can be used when troubleshooting or when planning an attack against a web server.

Source: Burp Suite
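A first pass over response headers can also be scripted. The sketch below flags headers which commonly disclose the technology stack, cookies set without the Secure or HttpOnly flag, and stray whitespace. The list of revealing headers is my own short selection, and the sample response is invented.

```python
# Headers that often disclose server software; a short, non-exhaustive selection.
REVEALING_HEADERS = ("server", "x-powered-by", "x-aspnet-version", "via")


def review_headers(headers: dict[str, str]) -> list[str]:
    """Return human-readable notes about a dictionary of response headers."""
    notes: list[str] = []
    for name, value in headers.items():
        lowered = name.strip().lower()
        if lowered in REVEALING_HEADERS:
            notes.append(f"{name.strip()} discloses technology: {value.strip()}")
        if lowered == "set-cookie":
            # Everything after the first ";" is a cookie attribute such as Secure.
            flags = {part.strip().lower() for part in value.split(";")[1:]}
            for flag in ("secure", "httponly"):
                if flag not in flags:
                    notes.append(f"cookie set without {flag} flag")
        if name != name.strip() or value != value.strip():
            notes.append(f"{name.strip()} has stray whitespace")
    return notes


# Invented sample response headers for an offline demonstration.
SAMPLE = {
    "Server": "Apache/2.4.41 (Ubuntu)",
    "X-Powered-By": "PHP/7.4.3",
    "Set-Cookie": "session=abc123; Path=/; HttpOnly",
}
for note in review_headers(SAMPLE):
    print(note)
```

For a live target, the headers could come from `urllib.request.urlopen(url).headers.items()`. A plain dictionary keeps only one value per header name, so a response with several cookies needs a list of pairs instead.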

Step 8: Make use of Shodan and Censys

Both Shodan and Censys are tools which may help you find files, IP addresses, exposed services and error messages. Programmers at Shodan and Censys have painstakingly scanned the Internet. They enumerated services and categorised their findings, making them searchable through simple keywords.

Shodan can be used to check which devices are connected to the Internet, who uses them and where they are located. Censys allows users to discover the devices, networks, and infrastructure on the Internet and monitor how they change over time.


Step 9: Browse the site’s HTML

Content like images, JS and CSS files may be hosted in S3 buckets owned by the client. S3 is Amazon’s Simple Storage Service, and a bucket is a container for objects which are stored and served through a web service.

While performing standard reconnaissance, it may be possible to identify whether the client uses cloud infrastructure to host static or dynamic content.

In such cases, finding the buckets which are used by a client can be really rewarding, especially if the client has misconfigured permissions on them.

Tools like DigiNinja’s Bucket Finder can be used to automate the search process by brute forcing names of buckets. This tool requires a well-curated list of bucket names and potentially full URLs to be effective.

A private bucket will not disclose its files and resources.

A public bucket shows the names of its files and resources. These files can then be downloaded using full URLs.

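The difference between the two kinds of buckets shows up in the response to a listing request, and it can be checked in code. The sketch below classifies a response by status code and, for a public listing, extracts the object names from the XML. It assumes the usual S3 listing format with its `ListBucketResult` root element. The bucket name and its contents are invented.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# XML namespace used by S3 listing responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def classify_bucket(status: int, body: str) -> tuple[str, list[str]]:
    """Return (verdict, object keys) for the response to a bucket listing request."""
    if status == 404:
        return "missing", []
    if status == 403:
        return "private", []  # the bucket exists but refuses to list its contents
    if status == 200:
        try:
            root = ET.fromstring(body)
        except ET.ParseError:
            return "unknown", []  # e.g. an HTML page instead of a listing
        if root.tag == S3_NS + "ListBucketResult":
            return "public", [el.text or "" for el in root.iter(S3_NS + "Key")]
    return "unknown", []


def object_url(bucket: str, key: str) -> str:
    """Build the full download URL of one object in a public bucket."""
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


# Invented listing for a hypothetical bucket called "example-bucket".
SAMPLE_LISTING = """\
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>example-bucket</Name>
  <Contents><Key>images/logo.png</Key></Contents>
  <Contents><Key>backup/db.sql</Key></Contents>
</ListBucketResult>
"""
verdict, keys = classify_bucket(200, SAMPLE_LISTING)
print(verdict, [object_url("example-bucket", key) for key in keys])
```

Listings are paginated, so a complete inventory of a large bucket takes more than one request.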


Open source intelligence (OSINT) gathering, also known as “reconnaissance”, is the first step of a penetration test. It’s an ever-growing and continuously evolving field of study. The techniques presented here are only the tip of the iceberg, but these nine steps are important parts of the aforementioned reconnaissance. Using these simple techniques may help you build a profile of a target, reveal several weaknesses and prepare for a penetration test. After performing all of the checks above, you can move on to a regular pen-test. But this is a broader subject for a separate article.
