Open Source Intelligence with theHarvester

Another useful information-gathering tool, which can be used in combination with Recon-ng, is theHarvester.
Although it is not as complex as Recon-ng, it harvests a large quantity of data automatically by querying web search engines and social networks. This reveals the target's footprint on the Internet, that is, what an attacker can see on the web about a given company.

Installation

If you are using Kali Linux, theHarvester is already part of your arsenal. Otherwise, you can get the latest version from the author's repository with git clone and install the tool on your favourite Linux distro: https://github.com/laramies/theHarvester.

Usage

In Kali Linux, theHarvester can be started from the applications menu by clicking Applications > Information Gathering > OSINT Analysis > theharvester, as shown in the following image:

The same thing can be done from the “Show application” menu:

Another possibility is to open the Terminal and type theharvester. In any case, we are presented with the tool banner, version, author information and usage instructions:

The instructions are pretty clear: a series of parameters, passed as arguments, lets us customize the search, and each one comes with a description of what it does. The most important are “-d” and “-b”, which are mandatory: “-d” sets the target domain we want to gather information about, and “-b” sets the data source to query (the list of available sources is reported in the description).
Some data sources require an API key to work: some keys are free, like the Bing one, while others require the payment of a fee, like the Shodan one.

We will now see an example of information gathering performed on the National Institute of Standards and Technology (NIST) domain.
Since Google is one of the available data sources, we can start with a simple request for all the hosts and email addresses that Google can find in the first 100 results for the domain “nist.gov” (the email addresses shown here are fictional for privacy reasons):

root@kali:~# theharvester -d nist.gov -b google -l 100
......................................................
[-] Searching in Google:
	Searching 0 results...
	Searching 100 results...


[+] Emails found:
------------------
bugsbunny@nist.gov
daffyduck@nist.gov
foghornleghorn@nist.gov

[+] Hosts found in search engines:
------------------------------------
[-] Resolving hostnames IPs... 
52.71.87.193:acvp.nist.gov
132.163.4.217:csrc.nist.gov
132.163.4.217:face.nist.gov
129.6.89.132:inside.nist.gov
132.163.4.217:itl.nist.gov
129.6.13.177:nvd.nist.gov
129.6.13.111:nvlpubs.nist.gov
52.71.217.42:pages.nist.gov
24.56.178.140:time.nist.gov
129.6.13.178:web.nvd.nist.gov
129.6.24.30:webbook.nist.gov
132.163.4.18:www.glb.nist.gov
132.163.4.18:www.nist.gov

As reported above, the tool has quickly found email addresses and hostnames, and has also resolved the hostnames to IP addresses.
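The "IP:hostname" lines printed above are easy to post-process. The following is a minimal Python sketch, assuming the output keeps the format shown in this example and contains only IPv4 addresses; the function name parse_host_lines is hypothetical and not part of theHarvester.

```python
import ipaddress
from collections import defaultdict


def parse_host_lines(text):
    """Group 'IP:hostname' lines by IP address and ignore everything else.

    Hypothetical helper: it assumes the IPv4 'IP:hostname' layout shown
    in the example output above.
    """
    hosts_by_ip = defaultdict(set)
    for line in text.splitlines():
        ip, sep, hostname = line.strip().partition(":")
        if not sep or not hostname:
            continue  # no colon, or nothing after it: not a host entry
        try:
            ipaddress.ip_address(ip)
        except ValueError:
            continue  # banner or status line, not a host entry
        hosts_by_ip[ip].add(hostname.lower())
    return {ip: sorted(names) for ip, names in hosts_by_ip.items()}


# A few lines copied from the example output above.
sample = """\
[-] Resolving hostnames IPs...
52.71.87.193:acvp.nist.gov
132.163.4.217:csrc.nist.gov
132.163.4.217:face.nist.gov
132.163.4.18:www.nist.gov
"""

hosts = parse_host_lines(sample)
for ip, names in sorted(hosts.items()):
    print(ip, ", ".join(names))
```

Grouping by IP at this stage already hints at which addresses serve more than one hostname.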

Another interesting feature is the ability to check for virtual hosts: through DNS resolution, the tool verifies whether a given IP address is associated with multiple hostnames. This is important information, because the security of a host on that IP depends not only on its own security level, but also on how securely the other hosts on the same IP are configured. In fact, if an attacker compromises one of them and gains access to the underlying server, they can easily reach every other virtual host.
To launch a virtual host search we just need to add “-v”:

root@kali:~# theharvester -d nist.gov -b google -l 100 -v
.........................................................
[+] Virtual hosts:
==================
132.163.4.217:csrc.nist.gov
132.163.4.217:trecvid.nist.gov
132.163.4.217:www.atp.nist.gov
132.163.4.217:www.itl.nist.gov
132.163.4.217:fire.nist.gov
132.163.4.217:www.iapws.org
132.163.4.217:www.boulder.nist.gov
132.163.4.217:itl.nist.gov
132.163.4.217:trec.nist.gov
132.163.4.217:duc.nist.gov
132.163.4.217:www.baldrige.nist.gov
132.163.4.217:zing.ncsl.nist.gov
132.163.4.217:biometrics.nist.gov
132.163.4.217:baldrige.nist.gov
132.163.4.217:ovrt.nist.gov
132.163.4.217:cryogenics.nist.gov
132.163.4.217:www.cryogenics.nist.gov
132.163.4.217:www-nlpir.nist.gov
132.163.4.217:ieee1451.nist.gov
132.163.4.217:w3.antd.nist.gov
132.163.4.217:tides.nist.gov
132.163.4.217:blea.doc.gov
132.163.4.217:motion.aptd.nist.gov
132.163.4.217:www.antd.nist.gov
132.163.4.217:iapws.org
129.6.13.111:nvlpubs.nist
129.6.13.111:gsi.nist
129.6.13.111:museum.nist
129.6.13.111:nvlpubs.nist.gov
129.6.13.111:gsi.nist.gov
129.6.13.111:museum.nist.gov
129.6.13.111:srdata.nist.gov
129.6.24.30:webbook.nist.gov
132.163.4.162:www.nist.gov
132.163.4.162:www.itl.nist.gov
132.163.4.162:tf.nist.gov
132.163.4.162:cnst.nist.gov
132.163.4.162:www.baldrige
132.163.4.162:fire.nist.gov
132.163.4.162:www.bldrdoc.gov
132.163.4.162:nvl.nist.gov
132.163.4.162:www.glb.nist.gov
132.163.4.162:gsi.nist.gov
132.163.4.162:math.nist.gov
132.163.4.162:ieee1451.nist.gov
132.163.4.162:www.tf.nist.gov
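A list like this is worth summarizing before reading it line by line. The sketch below, in Python, counts how many hostnames share each IP and flags the names that do not belong to the target domain (in the output above, for example, www.iapws.org and blea.doc.gov). The function name summarize_vhosts is hypothetical, and the (IP, hostname) pairs are assumed to have been extracted from the output already.

```python
from collections import Counter


def summarize_vhosts(pairs, target_domain):
    """Count hostnames per IP and list names outside the target domain.

    Hypothetical helper: 'pairs' is an iterable of (ip, hostname) tuples
    taken from the virtual host listing.
    """
    target = target_domain.lower()
    suffix = "." + target
    unique_pairs = {(ip, host.lower()) for ip, host in pairs}
    per_ip = Counter(ip for ip, _ in unique_pairs)
    foreign = sorted(
        {host for _, host in unique_pairs
         if host != target and not host.endswith(suffix)}
    )
    return per_ip, foreign


# A few pairs copied from the virtual host output above.
pairs = [
    ("132.163.4.217", "csrc.nist.gov"),
    ("132.163.4.217", "www.iapws.org"),
    ("132.163.4.217", "blea.doc.gov"),
    ("129.6.13.111", "nvlpubs.nist"),
    ("129.6.13.111", "nvlpubs.nist.gov"),
]

per_ip, foreign = summarize_vhosts(pairs, "nist.gov")
for ip, count in per_ip.most_common():
    print(ip, count)
print("Outside nist.gov:", ", ".join(foreign))
```

Note that apparently truncated entries such as nvlpubs.nist are flagged too, which is a convenient reminder to review them manually.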

theHarvester can also collect the names of people related to the target domain by crawling social networks such as LinkedIn; this is done by simply passing “linkedin” as the data source (the names shown here are fictional for privacy reasons):

root@kali:~# theharvester -d nist.gov -b linkedin
.................................................
[-] Searching in Linkedin..
	Searching 100 results..
Users from Linkedin:
====================
Bugs Bunny
Daffy Duck
Foghorn Leghorn
.................................................

Once you have email addresses and names, you can try to match them against each other to find out which address belongs to which person.
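One way to attempt this association is to generate likely mailbox names from each person's name and look for them among the harvested addresses. The Python sketch below is only an illustration: the naming patterns (first name plus last name, initial plus last name, and so on) are assumptions about common conventions, not something theHarvester provides, and both function names are hypothetical.

```python
def candidate_local_parts(full_name):
    """Return plausible mailbox names for a person (assumed patterns)."""
    parts = full_name.lower().split()
    if len(parts) < 2:
        return set()
    first, last = parts[0], parts[-1]
    return {
        first + last,            # bugsbunny
        first + "." + last,      # bugs.bunny
        first[0] + last,         # bbunny
        first[0] + "." + last,   # b.bunny
        last + first[0],         # bunnyb
    }


def match_names_to_emails(names, emails):
    """Map each name to the harvested addresses that fit a known pattern."""
    by_local = {}
    for email in emails:
        local, _, _ = email.lower().partition("@")
        by_local.setdefault(local, []).append(email)
    matches = {}
    for name in names:
        hits = sorted(
            email
            for local in candidate_local_parts(name)
            for email in by_local.get(local, [])
        )
        if hits:
            matches[name] = hits
    return matches


# Fictional names and addresses from the examples above.
names = ["Bugs Bunny", "Daffy Duck", "Foghorn Leghorn"]
emails = ["bugsbunny@nist.gov", "daffyduck@nist.gov", "foghornleghorn@nist.gov"]

matches = match_names_to_emails(names, emails)
for name, hits in matches.items():
    print(name, "->", ", ".join(hits))
```

A match found this way is only a hypothesis and still needs to be verified by other means.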

Having the results printed on the terminal is fine, but when we are dealing with a large amount of data it is better to save them to a file for later use. theHarvester can save results in both XML and HTML formats when a file name is specified with the “-f” option.
Before launching the command, it is good practice to create a folder where we can store the data gathered about the target:

root@kali:~# mkdir NIST
root@kali:~# cd NIST
root@kali:~/NIST#

Then we can start the search, this time with “-b all”, which in the version used here harvests information from Google, the PGP key server, Bing and Exalead:

root@kali:~/NIST# theharvester -d nist.gov -b all -l 100 -v -f results.html
........................................................................
[+] Saving files...
Files saved!

If the files are saved correctly, we get the “Files saved!” message and find them inside the current folder:

root@kali:~/NIST# ls
results.xml  results.html 
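When the same search has to be repeated for several targets, the folder creation and the command line can be scripted. The following Python sketch only assembles the arguments used in this article (-d, -b, -l, -v, -f) and creates the working folder; the function names are hypothetical, and the binary name theharvester is the one used on Kali in this walkthrough, so it may differ on other installations.

```python
from pathlib import Path


def prepare_workspace(base_dir, target_name):
    """Create (if needed) and return a folder for the target's data."""
    workdir = Path(base_dir) / target_name
    workdir.mkdir(parents=True, exist_ok=True)
    return workdir


def build_harvest_command(domain, source="all", limit=100, vhosts=True,
                          report="results.html", binary="theharvester"):
    """Assemble the argument list for the search shown above.

    Only the options described in this article are used; the binary
    name is an assumption and may differ on your system.
    """
    cmd = [binary, "-d", domain, "-b", source, "-l", str(limit)]
    if vhosts:
        cmd.append("-v")
    if report:
        cmd += ["-f", report]
    return cmd


command = build_harvest_command("nist.gov")
print(" ".join(command))

# To actually run it inside the target folder, something like this
# could be used (left commented out because it launches the real tool):
#   import subprocess
#   workdir = prepare_workspace(Path.home(), "NIST")
#   subprocess.run(command, cwd=workdir, check=True)
```

Passing the arguments as a list avoids shell quoting issues when the domain or file name comes from user input.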

Finally we can open the HTML file with our favourite web browser:

As shown in the image above, we get a graph reporting the percentage of gathered data for each category in our search: emails, hosts and virtual hosts. Below it there is a list of all the elements in each category (only a few lines are displayed here).

These are the parameters I use the most, but feel free to play with the others.

Conclusions

theHarvester is a valuable OSINT tool that quickly uncovers a good amount of data, especially email addresses. Remember that you need to verify the information: for example, an employee may no longer work at a certain company while their email address is still present on the web, so it will be returned in the results.
Automatic tools are useful, but their output still needs to be correctly managed and interpreted.