
How to prepare for a penetration test in 9 simple steps

Monika Sadlok

QA Specialist

Have you ever heard of HaveIBeenPwned, WHOIS or Google Hacking? Today I'm going to show you these tools, along with some methods for protecting your email address and checking whether it has been compromised.

Email addresses are one of the most vulnerable pieces of information about you online. Whenever you register on a website or create an account in an app, you are asked to provide your email address. Websites and apps are hacked quite often, and as a result, email addresses and other user data may be leaked.

Step 1: Begin with HaveIBeenPwned and WHOIS

There are several ways to check whether a given email address has been exposed in a data breach.

One of the most popular is HaveIBeenPwned, which tells you whether your email address has appeared in a known data breach.

Screenshot: HaveIBeenPwned. Source: https://haveibeenpwned.com
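If you need to check more than a handful of addresses (for example, all the addresses of a client who has authorised the test), HaveIBeenPwned also offers an API. Below is a minimal Python sketch, assuming the v3 breachedaccount endpoint, which expects an API key in the hibp-api-key header plus a user agent. The user-agent string and the function names are hypothetical, so check the current API documentation before relying on it.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumed endpoint: HaveIBeenPwned API v3 "breachedaccount" lookup.
HIBP_URL = "https://haveibeenpwned.com/api/v3/breachedaccount/{}"


def build_request(email: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a breach lookup request for one address."""
    # quote() percent-encodes the "@" so the address is safe in the URL path.
    url = HIBP_URL.format(urllib.parse.quote(email))
    headers = {
        "hibp-api-key": api_key,              # key issued by HaveIBeenPwned
        "user-agent": "pentest-prep-sketch",  # hypothetical identifier
    }
    return urllib.request.Request(url, headers=headers)


def parse_breaches(body: str) -> list[str]:
    """Extract breach names from the JSON array returned by the API."""
    return [entry["Name"] for entry in json.loads(body)]


def breaches_for(email: str, api_key: str) -> list[str]:
    """Return the breaches an address appears in; [] if none are known."""
    try:
        with urllib.request.urlopen(build_request(email, api_key), timeout=10) as resp:
            return parse_breaches(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:  # the API answers 404 when the address is not found
            return []
        raise
```

A call such as breaches_for("someone@example.com", key) returns a list of breach names, or an empty list when the address is not in any known breach.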

Who.is is another lookup service. Given a website address, it can show IP history information, the domain expiry date and even phone numbers that could be used in social engineering attacks.

Screenshot: Who.is. Source: who.is
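Who.is is a web front end for WHOIS, a plain-text protocol (RFC 3912) spoken over TCP port 43. The Python sketch below queries it directly: it asks whois.iana.org which server is authoritative for the domain's TLD, follows the "refer:" line, and collects "Key: value" pairs. Field names differ between registries, and some registries (for example for .com) point to a further registrar server for fuller details, so treat this as a starting point rather than a complete client; the function names are hypothetical.

```python
import socket


def whois_query(query: str, server: str = "whois.iana.org", port: int = 43) -> str:
    """Send one WHOIS query: the query line, then read until the server closes."""
    with socket.create_connection((server, port), timeout=10) as sock:
        sock.sendall(query.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text: str) -> dict[str, list[str]]:
    """Collect 'Key: value' lines; keys are lower-cased, comments skipped."""
    fields: dict[str, list[str]] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or not value.strip() or line.startswith(("%", "#")):
            continue
        fields.setdefault(key.strip().lower(), []).append(value.strip())
    return fields


def lookup(domain: str) -> dict[str, list[str]]:
    """Ask IANA which server handles the TLD, then query that server."""
    referral = parse_whois(whois_query(domain)).get("refer", [])
    if not referral:
        return {}
    return parse_whois(whois_query(domain, server=referral[0]))
```

Fields such as "registry expiry date" or "name server" in the result of lookup("example.com") correspond to what Who.is displays.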

Both tools are simple, so it's worth using them at the very beginning of your preparation for a penetration test.

Step 2: Perform advanced searching with Google Hacking

More advanced options are the Exploit Database and Google Hacking.

Google Hacking is part of the Exploit Database, whose Google Hacking Database is a “categorized index of Internet search engine queries designed to uncover interesting, and usually sensitive, information made publicly available on the Internet. In most cases, this information was never meant to be made public but due to any number of factors this information was linked in a web document that was crawled by a search engine which subsequently followed that link and indexed the sensitive information”.

It involves using advanced operators in the Google search engine to locate specific strings of text within search results.

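A simple example: intext:"please find attached" "login" | password ext:pdf

This query looks for PDF files containing the phrase “please find attached” together with “login” or “password”.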

Queries like this can identify interesting files (log files, for example) that contain sensitive information or reveal the full system path of an application.

Screenshot: Google Hacking Database at Exploit Database

It's a great way to check whether your own website is safe and whether all sensitive data on it is properly hidden.
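When you run several of these queries against your own (or your client's) domain, it helps to build them consistently. The Python sketch below joins operators into a query, optionally prefixing Google's site: operator, and turns the result into a search URL to open in a browser. The function names and the domain example.com are hypothetical; the operators are the ones used in the example query above.

```python
import urllib.parse
from typing import Optional


def build_dork(*terms: str, site: Optional[str] = None) -> str:
    """Join search terms and operators into a single query string.

    Restricting results with site: keeps the search on a domain you are
    authorised to test.
    """
    parts = list(terms)
    if site:
        parts.insert(0, f"site:{site}")
    return " ".join(parts)


def google_search_url(query: str) -> str:
    """Turn a query into a Google search URL that can be opened in a browser."""
    return "https://www.google.com/search?" + urllib.parse.urlencode({"q": query})


# The example query from this step, scoped to a hypothetical client domain.
query = build_dork('intext:"please find attached"', '"login"', "|", "password",
                   "ext:pdf", site="example.com")
print(query)
print(google_search_url(query))
```

Opening the URLs by hand, rather than scraping results automatically, keeps you within Google's terms of use.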

Step 3: Check the robots.txt file for hidden, interesting directories

Most frameworks, content management systems and online shops have well-defined directory structures, which is why the admin panel is usually found at /admin or /administration. If it isn't, the robots.txt file may well contain the directory name you are looking for, so it's worth using this simple trick to find it.

Screenshot: example robots.txt from The Software House website

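robots.txt is a file stored in the root directory of the web server. It tells search engine robots which directories on a website they should not crawl. It does not actually protect those directories: the file itself is publicly readable, which is exactly what makes it useful during reconnaissance. There are two common ways of hiding directories from robots with it.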

If you want to disallow robots from crawling the whole website, use the following directives:

    User-agent: *
    Disallow: /

If you want to disallow robots from crawling a particular directory (e.g. /images), use:

    User-agent: *
    Disallow: /images
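To see how a well-behaved crawler interprets these two rule sets, you can feed them to Python's standard urllib.robotparser module. This is a small sketch with hypothetical URLs; note that Disallow rules match by path prefix, so /images also covers everything beneath it.

```python
from urllib.robotparser import RobotFileParser


def parser_for(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt content already held in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


block_all = parser_for("User-agent: *\nDisallow: /")
block_images = parser_for("User-agent: *\nDisallow: /images")

# Any crawler ("*") is refused everywhere by the first rule set...
print(block_all.can_fetch("*", "https://example.com/about"))                # False
# ...while the second only refuses paths that start with /images.
print(block_images.can_fetch("*", "https://example.com/images/logo.png"))   # False
print(block_images.can_fetch("*", "https://example.com/about"))             # True
```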

When preparing for a penetration test, check the robots.txt file to see whether any potentially interesting directories have been hidden. If the website administrator decided to hide these folders, they may contain important (or confidential) information.
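Collecting the hidden directories by hand is fine for one site; for several in-scope hosts, a short script helps. The Python sketch below downloads /robots.txt from a site root and returns each unique Disallow path in order, ignoring comments and the empty "Disallow:" line (which allows everything). The function names are hypothetical, and it only reads the file; visiting the listed paths is a separate, manual step.

```python
import urllib.request


def fetch_robots(base_url: str) -> str:
    """Download /robots.txt from the site root (base_url like https://example.com)."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/robots.txt", timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def hidden_paths(robots_txt: str) -> list[str]:
    """Return unique Disallow paths, in file order, as candidates for review."""
    paths: list[str] = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            path = value.strip()
            # "Disallow:" with an empty value allows everything, so skip it.
            if path and path not in paths:
                paths.append(path)
    return paths
```

For example, hidden_paths(fetch_robots("https://example.com")) lists the directories the site asks crawlers to avoid.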

Step 4: Look through the LinkedIn profile of the company

In many companies, the weakest passwords belong to non-technical management staff. That's why LinkedIn can be a good source of information: searching it will help you identify directors, senior managers and other non-technical staff members, whose passwords you can then check for strength. The “About Us” page on the company website can also lead you to an easy target.

Once you have discovered a couple of email addresses, you can usually derive the company's standard username format. The password reset functionality can sometimes be very helpful too, for example to check whether an account exists for a given username.
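Deriving the username format can be automated. The Python sketch below tests a few common formats (a hypothetical, non-exhaustive list) against known name and address pairs, then applies the winning format to names gathered from LinkedIn or an “About Us” page. All names and the domain are made up, and diacritics and multi-part surnames are not handled.

```python
from typing import Callable, Optional

# Hypothetical candidate formats; extend the list with schemes you come across.
FORMATS: dict[str, Callable[[str, str], str]] = {
    "first.last": lambda first, last: f"{first}.{last}",
    "firstlast": lambda first, last: f"{first}{last}",
    "flast": lambda first, last: f"{first[0]}{last}",
    "f.last": lambda first, last: f"{first[0]}.{last}",
    "first": lambda first, last: first,
}


def split_name(full_name: str) -> tuple[str, str]:
    """Lower-case a 'First Last' name; middle names are ignored."""
    parts = full_name.lower().split()
    return parts[0], parts[-1]


def infer_format(known: dict[str, str]) -> Optional[str]:
    """Return the first format that explains every known name -> email pair."""
    if not known:
        return None
    for fmt, build in FORMATS.items():
        if all(email.lower().split("@")[0] == build(*split_name(name))
               for name, email in known.items()):
            return fmt
    return None


def guess_email(full_name: str, fmt: str, domain: str) -> str:
    """Apply an inferred format to a newly found employee name."""
    return f"{FORMATS[fmt](*split_name(full_name))}@{domain}"
```

With two known addresses such as anna.nowak@example.com and jan.kowalski@example.com, infer_format returns "first.last", and guess_email("Ewa Zielinska", "first.last", "example.com") produces a candidate address.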

Step 5: Perform IP address-related checks

Reverse IP lookups can reveal additional targets worth investigating. Bing has an excellent search feature for this: its ip: operator finds the websites hosted at a specific IP address. Searching for “ip:***.***.***.***” in the Bing search engine may show you which websites are hosted at the provided IP address.

Screenshot: Bing results showing the websites hosted at a given IP address
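A minimal Python sketch of both sides of this check, with hypothetical function names: one function validates the address and builds the Bing search URL for the ip: operator, the other asks DNS for the PTR (reverse DNS) record. A PTR record names at most one canonical host, which is why search-engine results like Bing's are a useful complement when many sites share one address.

```python
import ipaddress
import socket
import urllib.parse
from typing import Optional


def bing_ip_search_url(ip: str) -> str:
    """Build a Bing search URL with the ip: operator for a validated address."""
    address = ipaddress.ip_address(ip)  # raises ValueError for malformed input
    return "https://www.bing.com/search?" + urllib.parse.urlencode({"q": f"ip:{address}"})


def ptr_record(ip: str) -> Optional[str]:
    """Look up the reverse-DNS (PTR) hostname, or None if there is none."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        return None
    return hostname
```

The reverse_pointer attribute of ipaddress.ip_address(ip) shows the in-addr.arpa name that the PTR query actually resolves.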

Step 6: Enumerate subdomains

Subdomain enumeration is one of the most important steps in discovering and assessing the assets a client has exposed online, whether deliberately as part of their business or accidentally due to a misconfiguration.

Subdomain enumeration can be done using a variety of tools, such as dnsrecon, subbrute or knock.py. Alternatively, you can use Google's site: operator or websites such as dnsdumpster or virustotal.com.

Screenshot: using dnsrecon to obtain subdomain names through brute force
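The brute-force approach used by such tools boils down to prefixing words from a list to the target domain and keeping the names that resolve. The Python sketch below shows that idea with the system resolver; the function names and wordlist are hypothetical. It does not detect wildcard DNS (where every name resolves), which dedicated tools typically check for, so treat its output as candidates to verify.

```python
import socket
from typing import Callable, Iterable


def candidates(domain: str, words: Iterable[str]) -> list[str]:
    """Prefix each non-empty wordlist entry to the target domain."""
    return [f"{word.strip().lower()}.{domain}" for word in words if word.strip()]


def system_resolve(hostname: str) -> list[str]:
    """Resolve a hostname to its unique IP addresses; [] if it does not exist."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def enumerate_subdomains(
    domain: str,
    words: Iterable[str],
    resolve: Callable[[str], list[str]] = system_resolve,
) -> dict[str, list[str]]:
    """Map every candidate subdomain that resolves to its addresses."""
    found: dict[str, list[str]] = {}
    for host in candidates(domain, words):
        addresses = resolve(host)
        if addresses:
            found[host] = addresses
    return found
```

The resolver is a parameter so the logic can be tested offline or swapped for a dedicated DNS library.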

Step 7: Check out HTTP status codes and response headers

Whether you're investigating a valid page, a non-existent page, a redirecting page or a bare directory name, note the HTTP status code and look for subtle typos, extra spaces or redundant values in the response headers.

Why is this so important? HTTP response headers can reveal a lot of sensitive information, such as cookie values or the technologies behind the web application (for example in the Server or X-Powered-By headers). This kind of data can be used when troubleshooting or… when planning an attack against a web server.

Screenshot: inspecting HTTP responses in Burp Suite. Source: Burp Suite
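For a quick first pass before opening a proxy such as Burp Suite, a standard-library Python sketch can fetch one page without following redirects (so 3xx codes stay visible) and flag headers that disclose the technology stack, carry stray whitespace or appear twice. The list of “revealing” headers is an assumption to extend, and the function names are hypothetical. http.client may normalise some whitespace while parsing, so a proxy shows the raw bytes more faithfully.

```python
import http.client

# Assumed list of headers that commonly reveal the stack or session handling.
REVEALING = {"server", "x-powered-by", "x-aspnet-version", "x-generator", "set-cookie", "via"}


def fetch_status_and_headers(host: str, path: str = "/") -> tuple[int, list[tuple[str, str]]]:
    """Return the status code and header list; redirects are NOT followed."""
    conn = http.client.HTTPSConnection(host, timeout=10)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheaders()
    finally:
        conn.close()


def review_headers(headers: list[tuple[str, str]]) -> list[str]:
    """Flag headers worth a closer look: stack disclosure, padding, duplicates."""
    notes = []
    seen = set()
    for name, value in headers:
        key = name.lower()
        if key in REVEALING:
            notes.append(f"discloses: {name}: {value.strip()}")
        if value != value.strip():
            notes.append(f"extra whitespace in {name}")
        if key in seen and key != "set-cookie":  # several Set-Cookie headers are normal
            notes.append(f"duplicate header: {name}")
        seen.add(key)
    return notes
```

Running review_headers on the headers returned for a valid page, a non-existent page and a redirect makes differences between them easy to spot.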

Step 8: Make use of Shodan and Censys

Shodan and Censys can both help you find files, IP addresses, exposed services and error messages. They scan the Internet, enumerate the services they find and categorise the results, making them searchable through simple keywords.

Shodan can be used to check which devices are connected to the Internet, who uses them and where they are located.

Censys allows users to discover the devices, networks, and infrastructure on the Internet and monitor how it changes over time.

Screenshot: Shodan showing devices connected to the Internet
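Shodan can also be queried from a script. The Python sketch below assumes Shodan's REST search endpoint (api.shodan.io/shodan/host/search with key and query parameters) and reduces the JSON response to “ip:port product” lines. An API key is required, searches with filters such as hostname: may need a paid plan, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of Shodan's REST search API.
SHODAN_SEARCH = "https://api.shodan.io/shodan/host/search"


def shodan_search_url(query: str, api_key: str) -> str:
    """Build the search URL for a Shodan query such as 'hostname:example.com'."""
    return SHODAN_SEARCH + "?" + urllib.parse.urlencode({"key": api_key, "query": query})


def summarize(results_json: str) -> list[str]:
    """Reduce the JSON response to 'ip:port product' lines for quick review."""
    data = json.loads(results_json)
    return [
        f"{match['ip_str']}:{match['port']} {match.get('product', '?')}"
        for match in data.get("matches", [])
    ]


def search(query: str, api_key: str) -> list[str]:
    """Run the query and summarise the services Shodan has indexed."""
    with urllib.request.urlopen(shodan_search_url(query, api_key), timeout=30) as resp:
        return summarize(resp.read().decode("utf-8"))
```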

Step 9: Browse the site’s HTML

Content such as images, JS and CSS files may be hosted on S3 buckets owned by the client. A bucket is a container in Amazon S3 (Simple Storage Service), which stores objects and serves them through a web service.

While performing standard reconnaissance, you may be able to tell whether the client uses cloud infrastructure to host static or dynamic content.

In such cases, finding the buckets a client uses can be really rewarding, especially if the client has misconfigured their permissions.
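Browsing the HTML is the simplest way to spot these buckets. The Python sketch below extracts bucket names from S3 URLs in a page's source, covering virtual-hosted URLs (bucket.s3.amazonaws.com, including regional variants) and path-style URLs (s3.amazonaws.com/bucket). The regular expressions are an approximation written for this example: they will miss buckets hidden behind CloudFront or custom domains.

```python
import re

# bucket.s3.amazonaws.com, bucket.s3.eu-west-1.amazonaws.com, bucket.s3-eu-west-1...
VIRTUAL_HOSTED = re.compile(
    r"(?:https?:)?//([a-z0-9][a-z0-9.-]+[a-z0-9])\.s3[.-](?:[a-z0-9-]+\.)?amazonaws\.com",
    re.IGNORECASE,
)
# s3.amazonaws.com/bucket, s3.eu-west-1.amazonaws.com/bucket, s3-eu-west-1.../bucket
PATH_STYLE = re.compile(
    r"(?:https?:)?//s3[.-](?:[a-z0-9-]+\.)?amazonaws\.com/([a-z0-9][a-z0-9.-]+[a-z0-9])",
    re.IGNORECASE,
)


def find_buckets(html: str) -> list[str]:
    """Return the sorted, de-duplicated bucket names referenced in the HTML."""
    names = {match.lower() for match in VIRTUAL_HOSTED.findall(html)}
    names |= {match.lower() for match in PATH_STYLE.findall(html)}
    return sorted(names)
```

The resulting names make a good seed for the bucket list that tools like Bucket Finder expect.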

Tools like DigiNinja’s Bucket Finder can be used to automate the search process by brute-forcing names of buckets. This tool requires a well-curated list of bucket names and potentially full URLs to be effective.

A private bucket (like the example below) will not disclose files and resources.

Screenshot: a private bucket, which does not disclose its files and resources

A public bucket shows the names of files and resources (like one of the examples below). These files can then be downloaded using full URLs.

Screenshot: a public bucket listing the names of its files
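For buckets belonging to a client who has authorised the test, the public/private distinction can be checked with one anonymous request. The Python sketch below assumes the virtual-hosted URL style: a publicly listable bucket answers with ListBucketResult XML whose Key elements name the objects, while a private or non-existent bucket answers with an HTTP error (typically 403 or 404). The function names are hypothetical, and bucket names containing dots may fail TLS validation with this URL style.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional, Union

S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def list_public_keys(xml_data: Union[str, bytes]) -> list[str]:
    """Extract object keys from a ListBucketResult document."""
    root = ET.fromstring(xml_data)
    return [element.text or "" for element in root.iter(f"{S3_NS}Key")]


def check_bucket(bucket: str) -> Optional[list[str]]:
    """Return object keys if anonymous listing works; None if it is refused."""
    url = f"https://{bucket}.s3.amazonaws.com/"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return list_public_keys(resp.read())
    except urllib.error.HTTPError as err:
        if err.code in (403, 404):  # private (AccessDenied) or no such bucket
            return None
        raise
```

Each returned key can then be downloaded by appending it to the bucket URL, which is exactly why a public listing is worth reporting.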

Summary

Open-source intelligence (OSINT) gathering, also known as “reconnaissance”, is the first step of a penetration test. It is an ever-growing and continuously evolving field of study.

The techniques presented here are only the tip of the iceberg, but these nine steps are important parts of reconnaissance. They can help you build a profile of a target, reveal several weaknesses and prepare for a penetration test. Once you have performed all of the checks above, you can move on to a regular pen test, but that is a broader subject for a separate article.

Want to know even more about the subject of testing?

Check out our open-source end-to-end test automation tool.
