The Internet is a useful source of information. Unfortunately, it also stores plenty of our personal data, making us tasty morsels for potential e-thieves. I decided to write an article about a few methods of gathering data that can be used to build a profile of a potential target – and to prepare for a penetration test.
A penetration test is a form of information security assurance, and it often begins with an extensive reconnaissance phase. But how do you prepare for a penetration test? There are open source intelligence (OSINT) techniques which can unveil a lot of the data stored online – for example, information about companies with a large online presence. Below, I’ll present a few steps you should go through to prepare for a penetration test.
Step 1: Begin with HaveIBeenPwned and WHOIS
Email addresses are among the most exposed pieces of information about you online. Whenever you register on a website or create an account in an app, you are asked to provide your email address. And it often happens that websites and apps get hacked. As a result, email addresses and all the associated user data may be leaked. There are several methods to verify whether a given email address has been compromised.
One of the most popular is HaveIBeenPwned. It tells you whether your email address has appeared in a known data breach.
Who.is is another useful search engine. It lets you look up a website address and find IP history, the domain expiry date and even phone numbers which can be used in social engineering attacks.
Both tools are rather simple, so it’s definitely worth using them at the very beginning of your preparation for a penetration test.
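HaveIBeenPwned’s breach-by-email search requires an API key, but its companion Pwned Passwords service exposes a free range endpoint that can be queried anonymously. The sketch below (assumptions: only the documented `api.pwnedpasswords.com/range/` endpoint, Python standard library) shows the k-anonymity idea: only the first five characters of the password’s SHA-1 hash ever leave your machine.

```python
import hashlib
from urllib.request import urlopen

def sha1_prefix_suffix(password: str):
    """Split the uppercase SHA-1 hex digest into the 5-char prefix
    sent to the API and the suffix that is matched locally."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]

def pwned_count(password: str) -> int:
    """Return how many times the password appears in known breaches.
    Only the 5-character hash prefix is sent over the network."""
    prefix, suffix = sha1_prefix_suffix(password)
    with urlopen(f"https://api.pwnedpasswords.com/range/{prefix}") as resp:
        for line in resp.read().decode().splitlines():
            candidate, _, count = line.partition(":")
            if candidate == suffix:
                return int(count)
    return 0
```

A non-zero count means the password has appeared in at least one public breach and should never be reused.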
Step 2: Perform advanced searching with Google Hacking
A more advanced pair of tools is the Exploit Database and Google Hacking. The first is a “CVE compliant archive of public exploits and corresponding vulnerable software, developed for use by penetration testers and vulnerability researchers”. Google Hacking is a part of the Exploit Database – a “categorized index of Internet search engine queries designed to uncover interesting, and usually sensitive, information made publicly available on the Internet. In most cases, this information was never meant to be made public but due to any number of factors this information was linked in a web document that was crawled by a search engine which subsequently followed that link and indexed the sensitive information”. The technique involves using advanced operators in the Google search engine to locate specific strings of text within search results.
A simple example: intext:"please find attached" "login" | password ext:pdf
Queries like this identify interesting files (log files, for example) which contain sensitive information, or reveal the full system path of an application.
Google Hacking is a great technique which can help you check whether your website is safe and whether all sensitive data is properly hidden.
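Since a dork is just a search string, scoping a handful of them to a target domain is easy to automate. This is a minimal sketch; the three patterns are illustrative examples in the style of the Google Hacking Database, not an authoritative list.

```python
def build_dorks(domain: str) -> list[str]:
    """Combine a target domain with a few classic Google Hacking
    patterns (the patterns themselves are illustrative examples)."""
    patterns = [
        'intext:"please find attached" "login" | password ext:pdf',
        'intitle:"index of" "parent directory"',
        'filetype:log intext:password',
    ]
    return [f"site:{domain} {p}" for p in patterns]

for query in build_dorks("example.com"):
    print(query)
```

Each printed line can be pasted straight into Google; the site: operator restricts results to the target’s domain.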
Step 3: Check out robots.txt file for hidden, interesting directories
Most frameworks, content management systems and online shops have well-defined directory structures. That’s why the admin panel is normally found under /admin or /administration. If that’s not the case, the robots.txt file will most probably contain the directory name you are looking for, so it’s worth using this simple trick to obtain it.
robots.txt is a file stored in the web server’s root directory. It is used to keep search engine robots away from certain directories on a website. There are two ways of hiding directories with the robots.txt file.
If you want to disallow robots from indexing the whole website, add the directive Disallow: / under a User-agent: * line.
If you want to disallow robots from indexing a particular directory (e.g. “images”), disallow that path instead: Disallow: /images/
When preparing for a penetration test, check the robots.txt file to see whether any potentially interesting directories have been hidden. If the website administrator decided to hide these folders, it may mean that some important (or confidential) information is stored there.
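Pulling the disallowed paths out of a robots.txt file is a one-minute script. A minimal sketch using only the Python standard library:

```python
from urllib.request import urlopen

def parse_disallowed(robots_txt: str) -> list[str]:
    """Extract every path listed in a Disallow: directive."""
    paths = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

def fetch_disallowed(base_url: str) -> list[str]:
    """Download a site's robots.txt and list its hidden directories."""
    with urlopen(f"{base_url}/robots.txt") as resp:
        return parse_disallowed(resp.read().decode("utf-8", "replace"))
```

Every path the script returns is a directory the administrator explicitly asked search engines to ignore – exactly the list worth reviewing by hand.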
Step 4: Look through the LinkedIn profile of the company
Most often, the weakest passwords in a company belong to non-technical management employees. That’s why LinkedIn may be a good source of information. Searching through this website will help you identify directors, senior managers and other non-technical staff members. Then you can verify whether their passwords are strong enough. Searching through the “About us” page on the company website can also lead you to an easy target.
Once you have discovered a couple of email addresses, you can derive the standard username format. A password reset form is often very helpful for confirming whether a guessed username exists.
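Given the names harvested from LinkedIn, you can generate candidate usernames and match them against the email format you observed. A minimal sketch; the five formats are common corporate conventions, not an exhaustive list.

```python
def username_candidates(first: str, last: str) -> list[str]:
    """Generate common corporate username formats for one person;
    the observed email format tells you which convention the target uses."""
    f, l = first.lower(), last.lower()
    return [
        f"{f}.{l}",    # jane.doe
        f"{f}{l}",     # janedoe
        f"{f[0]}{l}",  # jdoe
        f"{f}{l[0]}",  # janed
        f"{l}.{f}",    # doe.jane
    ]
```

If a known address is jane.doe@example.com, the first.last format almost certainly applies to every other employee as well.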
Step 5: Perform IP address-related checks
Using reverse IP lookups, you can identify additional targets to poke around. Bing has an excellent search feature which is capable of finding the websites hosted at a specific IP address. Searching for ip:***.***.***.*** in Bing may help you find which websites are hosted at the provided IP address.
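Since the check is just a search query, it can be scripted. This sketch validates the address first and then builds the Bing search URL; the helper name and URL layout are my own illustration.

```python
import ipaddress
from urllib.parse import quote_plus

def bing_reverse_ip_url(ip: str) -> str:
    """Validate the IP address, then build a Bing search URL that
    uses the ip: operator to list sites hosted on that address."""
    ipaddress.ip_address(ip)  # raises ValueError for invalid input
    return "https://www.bing.com/search?q=" + quote_plus(f"ip:{ip}")
```

Validating first avoids wasting queries on typos in the target list.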
Step 6: Enumerate subdomains
Subdomain enumeration is one of the most important steps in assessing and discovering the assets which the client has exposed online. The exposure may have happened either deliberately as part of their business or accidentally due to a misconfiguration.
Subdomain enumeration can be done using a variety of tools such as dnsrecon, subbrute or knock.py. Alternatively, you can perform it using Google’s site: operator or through websites like dnsdumpster or virustotal.com.
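The core of a brute-force enumerator like subbrute is simple: prepend each word from a list to the target domain and keep the names that resolve. A minimal standard-library sketch of that idea (the wordlist is yours to supply; real tools ship curated lists):

```python
import socket

def candidates(domain: str, words: list[str]) -> list[str]:
    """Build candidate subdomain names from a wordlist."""
    return [f"{w}.{domain}" for w in words]

def enumerate_subdomains(domain: str, words: list[str]) -> list[str]:
    """Keep only the candidates that actually resolve in DNS."""
    found = []
    for host in candidates(domain, words):
        try:
            socket.gethostbyname(host)
            found.append(host)
        except socket.gaierror:
            pass  # name does not resolve
    return found
```

Dedicated tools add wildcard detection and parallel resolution, but the principle is the same.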
Step 7: Check out HTTP status codes and response headers
It doesn’t matter whether it’s a valid page, a non-existing page, a redirecting page or a simple directory name – whenever you’re investigating it, look for subtle typos, extra spaces or redundant values in the response headers. Why is this so important? HTTP headers store a lot of sensitive information, such as cookie strings and the names of web application technologies. This kind of data can be useful when troubleshooting or… whilst planning an attack against a web server.
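Collecting the status code and the revealing headers for a list of URLs is easy to script. A minimal sketch using the Python standard library; the set of “revealing” header names is my own illustrative selection.

```python
from urllib.request import urlopen

# Headers that commonly leak server technology or session details
# (an illustrative selection, not an exhaustive list).
REVEALING = {"server", "x-powered-by", "x-aspnet-version", "set-cookie", "via"}

def interesting_headers(headers: dict) -> dict:
    """Filter response headers down to the ones that leak
    technology or session details."""
    return {k: v for k, v in headers.items() if k.lower() in REVEALING}

def probe(url: str):
    """Fetch a URL and return its status code plus revealing headers."""
    with urlopen(url) as resp:
        return resp.status, interesting_headers(dict(resp.headers))
```

A Server: Apache/2.4.41 or X-Powered-By: PHP/7.4 header immediately narrows down which known vulnerabilities are worth checking.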
Step 8: Make use of Shodan and Censys
Both Shodan and Censys are tools which may help you find files, IP addresses, exposed services and error messages. The teams behind Shodan and Censys have painstakingly scanned the Internet, enumerated services and categorised their findings, making them searchable through simple keywords.
Shodan can be used to check which devices are connected to the Internet, who uses them and where they are located. Censys allows users to discover the devices, networks and infrastructure on the Internet and monitor how they change over time.
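Shodan also offers an official Python library (the third-party shodan package; an API key is required), so results can be processed programmatically. The summarizer below is a sketch under those assumptions; the result keys it reads (ip_str, port, product, org) appear in Shodan’s JSON results, with “?” as a fallback for missing ones.

```python
def summarize(match: dict) -> str:
    """Condense one Shodan search match into a one-line summary.
    Missing fields fall back to '?'."""
    return "{}:{} {} ({})".format(
        match.get("ip_str", "?"),
        match.get("port", "?"),
        match.get("product", "?"),
        match.get("org", "?"),
    )

# Querying the live API needs the third-party `shodan` package and
# a valid API key -- left here as a sketch:
# import shodan
# api = shodan.Shodan("YOUR_API_KEY")
# for match in api.search("apache")["matches"]:
#     print(summarize(match))
```

One line per exposed service is usually all you need to triage a client’s Internet-facing footprint.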
Step 9: Browse the site’s HTML
Content like images, JS and CSS files may be hosted in S3 buckets owned by the client. Buckets are the storage containers of Amazon’s Simple Storage Service (S3); they allow objects to be stored and retrieved through a web service.
While performing standard reconnaissance, it may be possible to identify whether the client uses cloud infrastructure to host static or dynamic content. In such cases, finding the buckets used by a client can be really rewarding – especially if the client has misconfigured permissions on them.
Tools like DigiNinja’s Bucket Finder can be used to automate the search process by brute forcing names of buckets. This tool requires a well-curated list of bucket names and potentially full URLs to be effective.
A private bucket will not disclose its files and resources; requests are answered with an access denied error.
A public bucket, on the other hand, lists the names of its files and resources, which can then be downloaded using their full URLs.
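The private/public/nonexistent distinction maps onto HTTP status codes when you request a bucket’s virtual-hosted URL: a public listing returns 200, a private bucket 403, and a missing bucket 404. A minimal sketch of a checker built on that behavior, using only the standard library:

```python
from urllib.request import urlopen
from urllib.error import HTTPError

def interpret_status(code: int) -> str:
    """Map an HTTP status from a bucket URL to a likely state."""
    return {200: "public (listing enabled)",
            403: "exists but private",
            404: "does not exist"}.get(code, "unknown")

def check_bucket(name: str) -> str:
    """Request the bucket's virtual-hosted URL and classify the answer."""
    url = f"https://{name}.s3.amazonaws.com/"
    try:
        with urlopen(url) as resp:
            return interpret_status(resp.status)
    except HTTPError as err:
        return interpret_status(err.code)
```

Tools like Bucket Finder essentially run this check over a large list of candidate names.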
Open source intelligence (OSINT), also known as “reconnaissance”, is the first step of a penetration test, and it’s an ever-growing, continuously evolving field of study. The techniques presented here are only the tip of the iceberg, but these nine steps are important parts of the reconnaissance phase. Using these simple techniques may help you build a profile of a target, reveal several weaknesses and prepare for a penetration test. After performing all of the checks above, you can move on to a regular pen-test – but that is a broader subject for a separate article.