Open Source Web Reconnaissance with Recon-ng

During a penetration test, much of the success of the exploitation phase depends on how well the information gathering was performed. Since this activity is time consuming, especially when dealing with a large amount of information, it is a good idea to rely on tools that automate reconnaissance.

Recon-ng is an incredibly powerful tool for Open Source Intelligence (OSINT) gathering; more precisely, it is a reconnaissance framework written in Python and built with a Metasploit-like usage model (we will see what Metasploit is further on; for now it is enough to know that it is the most famous penetration testing framework).
Reconnaissance is the activity of acquiring open source information, i.e. information available on the Internet, about a target in a passive way (passive reconnaissance); conversely, discovery is the activity of acquiring information by sending packets directly to the target (active reconnaissance). Even though Recon-ng is mainly a passive reconnaissance framework, it also includes some elements for discovery and exploitation.

Installation

Since we will use a lot of tools in the next posts, I highly suggest setting up a Virtual Machine with a penetration testing distribution installed on it. Personally, I use VMware Workstation 12 Player as the hypervisor for server and desktop virtualization; it is free and can be downloaded from the official website. As for the operating system, I mainly use Kali Linux, which is a Debian-based distribution. This distro is very useful because it comes with a good number of tools preinstalled and preconfigured, leaving the user with a ready-to-use PT machine. I will not explain how to set up a VM, since you can find a lot of tutorials about that on the web.

Anyway, you can still install Recon-ng on your favorite Linux distribution by cloning the author's repository with git clone and installing the required dependencies (this is also an option in Kali Linux in case you want the latest version available): https://bitbucket.org/LaNMaSteR53/recon-ng.

Usage

In Kali Linux, we can start Recon-ng in different ways. One is through the applications menu, by clicking on Applications > Information Gathering > recon-ng, as shown in the following image:

The same thing can be done from the “Show application” menu:

Another possibility is to simply open the Terminal and type recon-ng. In any case, we are greeted with the framework banner, version, and number of modules for each category:

Modules are the core of the framework and in the current version there are five categories:

  • Recon modules - for reconnaissance activities;
  • Reporting modules - for reporting results on a file;
  • Import modules - for importing values from a file into a database table;
  • Exploitation modules - for exploitation activities;
  • Discovery modules - for discovery activities.

The good thing is that anyone can write their own module in Python and integrate it into the framework.
Since we are dealing with information gathering, we will focus on recon modules.
The framework accepts commands via the command line; to get a list of the commands, just type help and press Enter:

[recon-ng][default] > help

Commands (type [help|?] <topic>):
---------------------------------
add             Adds records to the database
back            Exits the current context
delete          Deletes records from the database
exit            Exits the framework
help            Displays this menu
keys            Manages framework API keys
load            Loads specified module
pdb             Starts a Python Debugger session
query           Queries the database
record          Records commands to a resource file
reload          Reloads all modules
resource        Executes commands from a resource file
search          Searches available modules
set             Sets module options
shell           Executes shell commands
show            Shows various framework items
snapshots       Manages workspace snapshots
spool           Spools output to a file
unset           Unsets module options
use             Loads specified module
workspaces      Manages workspaces

To display a list of all available modules for each category we can use the show command:

show modules

Since right now we are only interested in recon modules, we can limit the search to them:

[recon-ng][default] > show modules recon

  Recon
  -----
    recon/companies-contacts/bing_linkedin_cache
    recon/companies-contacts/indeed
    recon/companies-contacts/jigsaw/point_usage
    recon/companies-contacts/jigsaw/purchase_contact
    recon/companies-contacts/jigsaw/search_contacts
    recon/companies-contacts/linkedin_auth
    recon/companies-multi/github_miner
    recon/companies-multi/whois_miner
    recon/contacts-contacts/mailtester
    recon/contacts-contacts/mangle
    recon/contacts-contacts/unmangle
    recon/contacts-credentials/hibp_breach
    recon/contacts-credentials/hibp_paste
    recon/contacts-domains/migrate_contacts
    recon/contacts-profiles/fullcontact
    recon/credentials-credentials/adobe
    recon/credentials-credentials/bozocrack
    recon/credentials-credentials/hashes_org
    recon/domains-contacts/metacrawler
    recon/domains-contacts/pgp_search
    recon/domains-contacts/whois_pocs
    recon/domains-credentials/pwnedlist/account_creds
    recon/domains-credentials/pwnedlist/api_usage
    recon/domains-credentials/pwnedlist/domain_creds
    recon/domains-credentials/pwnedlist/domain_ispwned
    recon/domains-credentials/pwnedlist/leak_lookup
    recon/domains-credentials/pwnedlist/leaks_dump
    recon/domains-domains/brute_suffix
    recon/domains-hosts/bing_domain_api
    recon/domains-hosts/bing_domain_web
    recon/domains-hosts/brute_hosts
    recon/domains-hosts/builtwith
    recon/domains-hosts/google_site_api
    recon/domains-hosts/google_site_web
    ...................................

The structure for each module is the following:

module-category/data-conversion/module-name

Consider, for example, recon/domains-hosts/google_site_web: this module performs a recon activity using the Google search engine to convert information about a domain into data about the hosts of that domain. Keep in mind that certain modules require a valid API key to run; some keys can be acquired by simply registering on the related website.
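
The naming scheme can be made concrete with a few lines of Python. The following is only a minimal sketch: the names ModulePath and parse_module_path are hypothetical helpers introduced here for illustration and are not part of Recon-ng. It assumes that a path has a conversion segment only when its second component contains a dash, which matches the module lists shown in this article.

```python
from typing import NamedTuple, Optional


class ModulePath(NamedTuple):
    """Illustrative container (not a Recon-ng type) for the parts of a module path."""
    category: str            # e.g. "recon"
    source: Optional[str]    # input data type, e.g. "domains"
    target: Optional[str]    # output data type, e.g. "hosts"
    name: str                # e.g. "google_site_web" or "pwnedlist/leak_lookup"


def parse_module_path(path: str) -> ModulePath:
    """Split 'module-category/data-conversion/module-name' into its parts."""
    parts = path.split("/")
    category = parts[0]
    # Treat the second segment as a conversion only when it looks like 'input-output'.
    if len(parts) >= 3 and "-" in parts[1]:
        source, target = parts[1].split("-", 1)
        return ModulePath(category, source, target, "/".join(parts[2:]))
    # Paths such as 'reporting/html' carry no conversion segment.
    return ModulePath(category, None, None, "/".join(parts[1:]))


if __name__ == "__main__":
    for module in ("recon/domains-hosts/google_site_web",
                   "recon/domains-credentials/pwnedlist/leak_lookup",
                   "reporting/html"):
        print(parse_module_path(module))
```

Reading a path this way makes it easy to see which table a module consumes and which one it fills, which is useful when chaining modules one after another.
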
To select a module we need the use command:

use recon/domains-hosts/google_site_web

Once the module is selected we can show information about it:

[recon-ng][default][google_site_web] > show info

      Name: Google Hostname Enumerator
      Path: modules/recon/domains-hosts/google_site_web.py
    Author: Tim Tomes (@LaNMaSteR53)

Description:
  Harvests hosts from Google.com by using the 'site' search operator. Updates the 'hosts' table with
  the results.

Options:
  Name    Current Value  Required  Description
  ------  -------------  --------  -----------
  SOURCE  default        yes       source of input (see 'show info' for details)

Source Options:
  default        SELECT DISTINCT domain FROM domains WHERE domain IS NOT NULL
  <string>       string representing a single input
  <path>         path to a file containing a list of inputs
  query <sql>    database query returning one column of inputs

In this way we can read the description and take a look at the options we can set before running the recon activity. As you can see, the action performed by this module is pretty much the same as the one explained in the article Information gathering with Google Search Engine, but this time it is done in an automated way.
If we want to analyze the module source code, we can either use show source or navigate to /usr/share/recon-ng/modules/recon/domains-hosts, where the Python file google_site_web.py is located (note that the folder structure reflects module categories and data conversions).
Once all required options are set through the set command, the module can be executed with run.
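
To get a feeling for what the default SOURCE value means, we can run the query shown in the module info against a small stand-in database. This is just a sketch: the in-memory table below is an assumption that mimics a two-column domains table, and the helper names default_inputs and demo are made up for the example.

```python
import sqlite3

# Query copied from the "Source Options" section of 'show info'.
DEFAULT_SOURCE = "SELECT DISTINCT domain FROM domains WHERE domain IS NOT NULL"


def default_inputs(conn: sqlite3.Connection) -> list:
    """Return the single column of inputs produced by the default SOURCE query."""
    return [row[0] for row in conn.execute(DEFAULT_SOURCE)]


def demo() -> list:
    # In-memory stand-in for a workspace database; this table layout is an assumption.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE domains (domain TEXT, module TEXT)")
    conn.executemany(
        "INSERT INTO domains VALUES (?, ?)",
        [("nist.gov", "user_defined"),
         ("nist.gov", "some_module"),   # duplicate domain, collapsed by DISTINCT
         (None, "some_module")],        # NULL domain, dropped by the WHERE clause
    )
    inputs = default_inputs(conn)
    conn.close()
    return inputs


if __name__ == "__main__":
    print(demo())
```

In other words, with SOURCE left at default a module receives every distinct, non-empty domain stored in the database, so duplicates are handed to it only once.
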

We will now see an example of reconnaissance activity performed on the National Institute of Standards and Technology (NIST) domain.
Before starting, we need to introduce the concept of workspace: Recon-ng allows us to define a workspace for each reconnaissance target; in doing so, it creates a database containing all the information gathered about the target. This is why the framework help shown earlier includes the query command, which allows us to examine the DB using Structured Query Language (SQL), and also why import modules are present.

We start by creating a new workspace:

workspaces add NIST

After that, the prompt shows the change from the default workspace to the new one. Then we need to associate a domain with the created workspace, and finally we can check that everything is set up correctly by listing the domains with show:

[recon-ng][default] > workspaces add NIST
[recon-ng][NIST] > add domains nist.gov
[recon-ng][NIST] > show domains

  +---------------------------------+
  | rowid |  domain  |    module    |
  +---------------------------------+
  | 1     | nist.gov | user_defined |
  +---------------------------------+

[*] 1 rows returned

The same result can be obtained with:

[recon-ng][NIST] > query select * from domains

This can also be checked by querying the database with an external tool; the DB is located in the following folder:

~/.recon-ng/workspaces/NIST

Here there is a file called data.db, which is the database for the NIST workspace; to explore the DB we can use the sqlite3 tool, already installed in Kali Linux:

root@kali:~/.recon-ng/workspaces/NIST# sqlite3 data.db 
SQLite version 3.13.0 2016-05-18 10:57:30
Enter ".help" for usage hints.
sqlite> select * from domains;
nist.gov|user_defined

To exit the program, just type .exit.
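
The same check can be scripted, since Python ships with a sqlite3 module. The snippet below is a minimal sketch that assumes the database path given above and the two columns returned by the sqlite3 session (domain and module); the function name list_domains is our own.

```python
import sqlite3
from pathlib import Path

# Location stated in the article; adjust the workspace name as needed.
DEFAULT_DB = Path.home() / ".recon-ng" / "workspaces" / "NIST" / "data.db"


def list_domains(db_path=DEFAULT_DB) -> list:
    """Return (domain, module) rows from a workspace database."""
    # Open read-only so the script cannot modify the workspace by accident.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute("SELECT domain, module FROM domains").fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    if DEFAULT_DB.exists():
        for domain, module in list_domains():
            print(f"{domain}|{module}")
    else:
        print(f"No workspace database found at {DEFAULT_DB}")
```

Opening the file in read-only mode is a deliberate choice: the framework owns the database, so an external script should only read from it.
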

We can also add a company name:

[recon-ng][NIST] > add companies
company (TEXT): NIST
description (TEXT): National Institute of Standards and Technology
[recon-ng][NIST] > show companies

  +---------------------------------------------------------------------------------+
  | rowid | company |                  description                   |    module    |
  +---------------------------------------------------------------------------------+
  | 1     | NIST    | National Institute of Standards and Technology | user_defined |
  +---------------------------------------------------------------------------------+

[*] 1 rows returned

Adding domains and companies is the initial step because they are the inputs used by modules to perform information gathering. To list all the modules that use these two pieces of information as a starting point, we can leverage the search command:

[recon-ng][NIST] > search domains-
[*] Searching for 'domains-'...

  Recon
  -----
    recon/domains-contacts/metacrawler
    recon/domains-contacts/pgp_search
    recon/domains-contacts/whois_pocs
    recon/domains-credentials/pwnedlist/account_creds
    recon/domains-credentials/pwnedlist/api_usage
    recon/domains-credentials/pwnedlist/domain_creds
    recon/domains-credentials/pwnedlist/domain_ispwned
    recon/domains-credentials/pwnedlist/leak_lookup
    recon/domains-credentials/pwnedlist/leaks_dump
    recon/domains-domains/brute_suffix
    recon/domains-hosts/bing_domain_api
    recon/domains-hosts/bing_domain_web
    recon/domains-hosts/brute_hosts
    recon/domains-hosts/builtwith
    recon/domains-hosts/google_site_api
    recon/domains-hosts/google_site_web
    recon/domains-hosts/hackertarget
    recon/domains-hosts/netcraft
    recon/domains-hosts/shodan_hostname
    recon/domains-hosts/ssl_san
    recon/domains-hosts/vpnhunter
    recon/domains-vulnerabilities/ghdb
    recon/domains-vulnerabilities/punkspider
    recon/domains-vulnerabilities/xssed
    recon/domains-vulnerabilities/xssposed

[recon-ng][NIST] > search companies-
[*] Searching for 'companies-'...

  Recon
  -----
    recon/companies-contacts/bing_linkedin_cache
    recon/companies-contacts/indeed
    recon/companies-contacts/jigsaw/point_usage
    recon/companies-contacts/jigsaw/purchase_contact
    recon/companies-contacts/jigsaw/search_contacts
    recon/companies-contacts/linkedin_auth
    recon/companies-multi/github_miner
    recon/companies-multi/whois_miner

Suppose we want to start populating our DB with hostnames related to the nist.gov domain using the google_site_web module seen before; to check the parameters required to run it we can display the module options:

[recon-ng][NIST][google_site_web] > show options

  Name    Current Value  Required  Description
  ------  -------------  --------  -----------
  SOURCE  default        yes       source of input (see 'show info' for details)

Since we have already set the domain, the “Current Value” of “default” means that the input is taken directly from the DB. We can then just run the module, and after a little while we get the results:

[recon-ng][NIST][google_site_web] > run

--------
NIST.GOV
--------
[*] Searching Google for: site:nist.gov
[*] [host] www.nsrl.nist.gov (<blank>)
[*] [host] gams.nist.gov (<blank>)
[*] [host] physics.nist.gov (<blank>)
[*] [host] face.nist.gov (<blank>)
[*] [host] scap.nist.gov (<blank>)
[*] [host] patapsco.nist.gov (<blank>)
[*] [host] nvd.nist.gov (<blank>)
[*] [host] kinetics.nist.gov (<blank>)
[*] [host] srdata.nist.gov (<blank>)
[*] [host] www.cftt.nist.gov (<blank>)
[*] [host] cccbdb.nist.gov (<blank>)
[*] [host] museum.nist.gov (<blank>)
[*] [host] thermosymposium.nist.gov (<blank>)
[*] [host] www.atp.nist.gov (<blank>)
[*] [host] www.ctcms.nist.gov (<blank>)
[*] [host] usgcb.nist.gov (<blank>)
[*] [host] www.nist.gov (<blank>)
[*] [host] trecvid.nist.gov (<blank>)
[*] [host] stonewall.nist.gov (<blank>)
.......................................
-------
SUMMARY
-------
[*] 73 total (73 new) hosts found.

In this case we discovered 73 hosts related to the domain; we can show the list of discovered hosts:

[recon-ng][NIST][google_site_web] > show hosts

  +-----------------------------------------------------------------------------------------------------------------+
  | rowid |              host              | ip_address | region | country | latitude | longitude |      module     |
  +-----------------------------------------------------------------------------------------------------------------+
  | 1     | www.nsrl.nist.gov              |            |        |         |          |           | google_site_web |
  | 2     | gams.nist.gov                  |            |        |         |          |           | google_site_web |
  | 3     | physics.nist.gov               |            |        |         |          |           | google_site_web |
  | 4     | face.nist.gov                  |            |        |         |          |           | google_site_web |
  | 5     | scap.nist.gov                  |            |        |         |          |           | google_site_web |
  | 6     | patapsco.nist.gov              |            |        |         |          |           | google_site_web |
  | 7     | nvd.nist.gov                   |            |        |         |          |           | google_site_web |
  | 8     | kinetics.nist.gov              |            |        |         |          |           | google_site_web |
  | 9     | srdata.nist.gov                |            |        |         |          |           | google_site_web |
  | 10    | www.cftt.nist.gov              |            |        |         |          |           | google_site_web |
  | 11    | cccbdb.nist.gov                |            |        |         |          |           | google_site_web |
  | 12    | museum.nist.gov                |            |        |         |          |           | google_site_web |
  | 13    | thermosymposium.nist.gov       |            |        |         |          |           | google_site_web |
  | 14    | www.atp.nist.gov               |            |        |         |          |           | google_site_web |
  | 15    | www.ctcms.nist.gov             |            |        |         |          |           | google_site_web |
  | 16    | usgcb.nist.gov                 |            |        |         |          |           | google_site_web |
  | 17    | www.nist.gov                   |            |        |         |          |           | google_site_web |
  | 18    | trecvid.nist.gov               |            |        |         |          |           | google_site_web |
  | 19    | stonewall.nist.gov             |            |        |         |          |           | google_site_web |
.....................................................................................................................

As the table shows, we have empty columns ready to store additional information for each host: these can be populated by hand or by running other modules that use the host information we just gathered:

[recon-ng][NIST][google_site_web] > search hosts-
[*] Searching for 'hosts-'...

  Recon
  -----
    recon/hosts-domains/migrate_hosts
    recon/hosts-hosts/bing_ip
    recon/hosts-hosts/freegeoip
    recon/hosts-hosts/ipinfodb
    recon/hosts-hosts/resolve
    recon/hosts-hosts/reverse_resolve
    recon/hosts-hosts/ssltools
    recon/hosts-locations/migrate_hosts
    recon/hosts-ports/shodan_ip

We can find the IP address of each host by running the recon/hosts-hosts/resolve module, while the geolocation can be acquired with recon/hosts-hosts/freegeoip:

[recon-ng][NIST][freegeoip] > show hosts 

  +------------------------------------------------------------------------------------------------------------------------------------------+
  | rowid |              host              |   ip_address  |         region         |    country    | latitude | longitude |      module     |
  +------------------------------------------------------------------------------------------------------------------------------------------+
  | 1     | www.nsrl.nist.gov              | 129.6.24.57   | Gaithersburg, Maryland | United States | 39.1403  | -77.222   | google_site_web |
  | 2     | gams.nist.gov                  | 129.6.24.27   | Gaithersburg, Maryland | United States | 39.1403  | -77.222   | google_site_web |
  | 3     | physics.nist.gov               | 129.6.13.152  | Gaithersburg, Maryland | United States | 39.1403  | -77.222   | google_site_web |
  | 4     | face.nist.gov                  | 132.163.4.217 | Boulder, Colorado      | United States | 39.9668  | -105.2092 | google_site_web |
  | 5     | scap.nist.gov                  | 129.6.13.177  | Gaithersburg, Maryland | United States | 39.1403  | -77.222   | google_site_web |
  | 6     | patapsco.nist.gov              | 129.6.13.93   | Gaithersburg, Maryland | United States | 39.1403  | -77.222   | google_site_web |
  | 7     | nvd.nist.gov                   | 129.6.13.177  | Gaithersburg, Maryland | United States | 39.1403  | -77.222   | google_site_web |
  | 8     | kinetics.nist.gov              | 129.6.24.48   | Gaithersburg, Maryland | United States | 39.1403  | -77.222   | google_site_web |
  | 9     | srdata.nist.gov                | 129.6.13.111  | Gaithersburg, Maryland | United States | 39.1403  | -77.222   | google_site_web |
  | 10    | www.cftt.nist.gov              | 129.6.24.57   | Gaithersburg, Maryland | United States | 39.1403  | -77.222   | google_site_web |
  | 11    | cccbdb.nist.gov                | 129.6.13.59   | Gaithersburg, Maryland | United States | 39.1403  | -77.222   | google_site_web |
  | 12    | museum.nist.gov                | 129.6.13.111  | Gaithersburg, Maryland | United States | 39.1403  | -77.222   | google_site_web |
  | 13    | thermosymposium.nist.gov       | 132.163.4.124 | Boulder, Colorado      | United States | 39.9668  | -105.2092 | google_site_web |
  | 14    | www.atp.nist.gov               | 132.163.4.217 | Boulder, Colorado      | United States | 39.9668  | -105.2092 | google_site_web |
  | 15    | www.ctcms.nist.gov             | 129.6.24.51   | Gaithersburg, Maryland | United States | 39.1403  | -77.222   | google_site_web |
  | 16    | usgcb.nist.gov                 | 129.6.13.177  | Gaithersburg, Maryland | United States | 39.1403  | -77.222   | google_site_web |
  | 17    | www.nist.gov                   | 132.163.4.18  | Boulder, Colorado      | United States | 39.9668  | -105.2092 | google_site_web |
  | 18    | trecvid.nist.gov               | 132.163.4.217 | Boulder, Colorado      | United States | 39.9668  | -105.2092 | google_site_web |
  | 19    | stonewall.nist.gov             | 129.6.13.93   | Gaithersburg, Maryland | United States | 39.1403  | -77.222   | google_site_web |
..............................................................................................................................................

As shown, in a few minutes we have acquired a great deal of information about the target hosts.
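
Since several hostnames in the table resolve to the same address, a natural next step is to group hosts by IP. The following sketch does that with plain SQL and Python; it assumes only the host and ip_address columns visible in the table above, uses an in-memory stand-in instead of a real workspace, and includes one made-up unresolved host (unresolved.example) for illustration.

```python
import sqlite3
from collections import defaultdict


def hosts_by_ip(conn: sqlite3.Connection) -> dict:
    """Map each resolved IP address to the sorted list of hostnames sharing it."""
    groups = defaultdict(list)
    rows = conn.execute(
        "SELECT host, ip_address FROM hosts "
        "WHERE ip_address IS NOT NULL AND ip_address != ''"
    )
    for host, ip_address in rows:
        groups[ip_address].append(host)
    return {ip: sorted(hosts) for ip, hosts in groups.items()}


def demo() -> dict:
    # In-memory stand-in; only the two columns used here are created.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hosts (host TEXT, ip_address TEXT)")
    conn.executemany(
        "INSERT INTO hosts VALUES (?, ?)",
        [("scap.nist.gov", "129.6.13.177"),
         ("nvd.nist.gov", "129.6.13.177"),
         ("usgcb.nist.gov", "129.6.13.177"),
         ("www.nist.gov", "132.163.4.18"),
         ("unresolved.example", None)],   # hypothetical host that was never resolved
    )
    result = hosts_by_ip(conn)
    conn.close()
    return result


if __name__ == "__main__":
    for ip, hosts in demo().items():
        print(ip, ", ".join(hosts))
```

To use it on real data, the connection would simply point at the workspace data.db instead of the in-memory database.
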
Now we can dig even deeper: what about looking for contact information such as names and email addresses? We can achieve this by running recon/domains-contacts/pgp_search: as its description reports, this module searches the MIT public PGP key server for email addresses of the given domain. After the module has been executed, we can display the results stored in the DB (of course, the names and addresses in the following table are fictional for privacy reasons):

[recon-ng][NIST][pgp_search] > show contacts

  +----------------------------------------------------------------------------------------------------------------------------------------------+
  | rowid |   first_name  | middle_name |    last_name     |             email             |        title        | region | country |   module   |
  +----------------------------------------------------------------------------------------------------------------------------------------------+
  | 1     | Bugs          |             | Bunny            | bugs.bunny@nist.gov           | PGP key association |        |         | pgp_search |
  | 2     | Foghorn       |             | Leghorn          | foghorn.leghorn@nist.gov      | PGP key association |        |         | pgp_search |
  | 3     | Daffy         |             | Duck             | daffy.duck@nist.gov           | PGP key association |        |         | pgp_search |

This is not over yet: we can also check whether those contacts have been involved in a data breach, like the Adobe one in 2013. For this purpose there are two interesting modules, recon/contacts-credentials/hibp_breach and recon/contacts-credentials/hibp_paste: the first one leverages the haveibeenpwned.com API to determine whether email addresses are associated with breached credentials, while the other uses the API to determine whether email addresses have been published to various paste sites.

You can check whether your email address has been compromised in a data breach by simply going to the Have I Been Pwned? (HIBP) website and launching a search. This service collects and analyzes database dumps and pastes leaked in data breaches that have happened over the years, covering millions of accounts.

All this information can be useful during the next phases of the attack, especially for Social Engineering (we will look into this technique in future articles).

Once enough information has been collected, it is useful to report it in a document. Fortunately, Recon-ng offers modules to report results in different formats:

[recon-ng][NIST] > show modules reporting

  Reporting
  ---------
    reporting/csv
    reporting/html
    reporting/json
    reporting/list
    reporting/pushpin
    reporting/xlsx
    reporting/xml

For example, we can choose to save the results in an HTML file:

[recon-ng][NIST] > use reporting/html
[recon-ng][NIST][html] > show options

  Name      Current Value                                 Required  Description
  --------  -------------                                 --------  -----------
  CREATOR                                                 yes       creator name for the report footer
  CUSTOMER                                                yes       customer name for the report header
  FILENAME  /root/.recon-ng/workspaces/NIST/results.html  yes       path and filename for report output
  SANITIZE  True                                          yes       mask sensitive data in the report

[recon-ng][NIST][html] > set CREATOR Spread Security
CREATOR => Spread Security
[recon-ng][NIST][html] > set CUSTOMER NIST
CUSTOMER => NIST
[recon-ng][NIST][html] > run
[*] Report generated at '/root/.recon-ng/workspaces/NIST/results.html'.
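
The help output seen earlier lists a resource command, which executes commands from a resource file. As a rough sketch, the steps of this walkthrough could be written to such a file with a few lines of Python. This assumes that the file simply contains the same commands we typed interactively, one per line, and that use can be issued while another module is loaded; build_resource_script is a hypothetical helper, not part of Recon-ng.

```python
def build_resource_script(workspace: str, domain: str, modules: list) -> str:
    """Return framework commands, one per line, replaying the walkthrough."""
    lines = [f"workspaces add {workspace}", f"add domains {domain}"]
    for module in modules:
        # Load each module and run it with its default SOURCE.
        lines += [f"use {module}", "run"]
    return "\n".join(lines) + "\n"


# Modules used in this article, in the order they were run.
MODULES = [
    "recon/domains-hosts/google_site_web",
    "recon/hosts-hosts/resolve",
    "recon/hosts-hosts/freegeoip",
    "recon/domains-contacts/pgp_search",
]

if __name__ == "__main__":
    print(build_resource_script("NIST", "nist.gov", MODULES), end="")
```

The printed text can be redirected to a file and then passed to the resource command from inside the framework. Keep in mind that the first line creates the workspace, so the script is meant for a target that has not been set up yet.
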

The results can then be viewed using a common web browser:

Conclusions

Recon-ng is a valuable framework for reconnaissance which has a really good system for storing and managing data for later use.
We have seen only a small part of its real capabilities, so take your time to explore and experiment with it to take advantage of its true power.