Knowing what personal information about employees and customers is available to potential attackers helps organizations determine the risk associated with data collected by malicious actors. Social engineering and direct attacks against authentication controls are enabled by services and databases maintained below the surface web. Organizations must understand what information is valuable to malicious actors as well as how and where to look for it.
Understanding the data available also has another benefit. This intelligence can quickly alert an organization to a recent breach or one in progress.
Social Engineering and VPI
Social engineering is a popular approach used by malicious actors (MAs) that leverages the user attack surface to achieve attack objectives. As with all attacks, the reconnaissance phase of a social engineering attack requires the collection of information about the targets; information about employees and their families is a valuable piece of MA reconnaissance. It enables the MA to learn enough about targets to create highly effective lures or to impersonate managers and supervisors (masquerading). Personal information valuable to MAs is already readily available. If not, it very likely will be.
Valuable personal information (VPI) used by malicious actors includes a long list of information:
- Name (full, maiden, mother’s maiden, alias)
- Personal ID number (SSN, passport, driver’s license, financial accounts, credit cards)
- Address (street, email)
- Personal characteristics (photographs, biometrics)
- Hobbies and interests
- Employment information (start date, salary, department, business role, manager, health insurance, education, certifications)
- Other (date of birth, place of birth, race, religion, weight, geographical indicators)
Usually, no single piece of information is enough to create a strong social engineering lure or to masquerade. MAs need enough information to convince victims they are who they claim to be or to appeal to the victim's interests.
Breach Detection
Once a breach has occurred, the stolen information is often found for sale on the web. Knowing this information exists and is related to a specific organization can quickly alert a security team that a breach has occurred or is in progress. This is better than waiting months for a customer or a security researcher to bring it to their attention. Even if the information is not yet for sale or freely available, conversations between MAs in “secret” chat rooms might indicate breach planning (against your organization or your industry) or someone discussing breach results.
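As a minimal sketch of this kind of check, assume a security team has obtained a text dump and wants to know whether any addresses from its own email domains appear in it. The function name, the sample lines, and the example.com domain are all hypothetical, and the email pattern is a rough triage filter rather than a full address validator.

```python
import re
from collections import Counter

# Rough email pattern for triage; it is not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")


def find_org_exposure(lines, org_domains):
    """Count addresses whose domain is, or is a subdomain of, a watched domain."""
    watched = {d.lower() for d in org_domains}
    hits = Counter()
    for line in lines:
        for match in EMAIL_RE.finditer(line):
            domain = match.group(1).lower()
            if any(domain == d or domain.endswith("." + d) for d in watched):
                hits[match.group(0).lower()] += 1
    return hits


# Hypothetical sample lines (example.com is reserved for documentation).
sample_dump = [
    "alice@example.com:hunter2",
    "bob@mail.example.com;Bob;Smith",
    "carol@other.test:letmein",
    "ALICE@example.com:passw0rd",
]

exposure = find_org_exposure(sample_dump, ["example.com"])
for address, count in sorted(exposure.items()):
    print(address, count)
```

Any hit is only a prompt for investigation. It does not show when or how the information left the organization.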
Web Layers
The web consists of three layers, as shown in the iceberg analogy in Figure 1. The part of the web we work with every day is the surface web. It is what Google indexes daily. However, it is only a tiny part of what is available… if you look deep enough.
The Deep Web
The deep web “…is any website that cannot be readily accessed through any conventional search engine…” (Shiels, 2019). Conventional search engines, like Google, do not index everything with their crawlers. Crawlers are processes that move across the web, collect website information, and return it to central servers for indexing. According to a Cornell University course blog (2017), search engines like Google, Yahoo, and Bing must index information to provide it to their users. However, they do not index most of the web for various reasons:
- Many pages are not linked to pages scanned by crawlers. Crawlers locate known pages and then follow links on those pages to find additional content. If pages are not linked, they are not scanned.
- Many pages require authentication or filling out a search form and clicking submit. Both of these conditions block crawlers.
- Mainstream search engines typically ignore illegal or other unsuitable content.
- Website creators can use a robots.txt file or “noindex” and “nofollow” HTML meta tags to tell crawlers to ignore the entire site or just specific content. Google, Yahoo, and Bing will ignore these pages. However, other engines designed for the deep and dark web layers often do not.
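The first and last reasons above can be illustrated with a toy crawler. This is a sketch under stated assumptions, not how any real search engine works: the site is a hypothetical in-memory dictionary of pages, the agent name is made up, and only link following, robots.txt rules, and the robots meta tag are modeled (authentication and search forms are not). It uses Python's standard urllib.robotparser and html.parser modules.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class PageScanner(HTMLParser):
    """Collect link targets and any robots meta directives on one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.noindex = False
        self.nofollow = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = (attrs.get("content") or "").lower()
            self.noindex = "noindex" in content
            self.nofollow = "nofollow" in content


def crawl(site, start, robots_lines, agent="toy-crawler"):
    """Breadth-first crawl of an in-memory site; return the indexed paths."""
    robots = RobotFileParser()
    robots.parse(robots_lines)
    indexed, seen, queue = [], {start}, deque([start])
    while queue:
        path = queue.popleft()
        if path not in site or not robots.can_fetch(agent, path):
            continue  # missing page, or disallowed by robots.txt
        page = PageScanner()
        page.feed(site[path])
        if not page.noindex:
            indexed.append(path)
        if page.nofollow:
            continue  # do not follow links from this page
        for link in page.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed


# Hypothetical site: /passwords.txt exists, but no page links to it.
site = {
    "/": '<a href="/about">About</a> <a href="/private/report">Report</a> '
         '<a href="/drafts">Drafts</a>',
    "/about": "<p>About us</p>",
    "/private/report": "<p>Internal report</p>",
    "/drafts": '<meta name="robots" content="noindex, nofollow">'
               '<a href="/drafts/v2">v2</a>',
    "/drafts/v2": "<p>Draft, second version</p>",
    "/passwords.txt": "admin:admin",
}
robots_lines = ["User-agent: *", "Disallow: /private/"]

indexed = crawl(site, "/", robots_lines)
print(indexed)  # ['/', '/about']
```

The unlinked password file never enters the index, yet anyone who knows or guesses its path can still fetch it. A crawler that ignores robots.txt and meta tags would also index the private and draft pages.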
The deep web contains both legal and illegal content, including:
- Exchanges between informants/whistleblowers and journalists
- Exchanges between both illicit and legitimate groups that want to remain private
- Communication by political protestors and other groups attempting to bypass censorship or to maintain anonymity
- Materials and research created outside normal commercial or academic publishing channels. It is often shared internally but not with the public and includes proposals, working papers, evaluations, white papers, and financial, research, project or technical reports
- Stolen information available to the public
- Carelessly stored material on internet-accessible servers, like password lists, that are not linked to any other pages
Search engines designed to crawl and index the deep web include Torch, DuckDuckGo, and several others listed by James Lizowski in his article, Best Deep Web Search Engines.
The Dark Web
The dark web is also a small part of the web. It is where MAs sell stolen personal information, intellectual property, and other information of value. It is also where an MA can purchase services needed to launch and maintain an attack. It is the badlands of the web where no one should tread without taking precautions.
Most deep web search engines cannot access detailed information on the dark web because the information has value to malicious actors and must be protected. Protection often takes the form of password-protected markets. Dark web markets usually provide information and services only after payment is made. For these markets, authentication is required. This prevents indexing.
Table 1 compares the three web layers.
Deep and Dark Web Intelligence
The lower layers of the web are excellent sources of threat intelligence. First, organizations can gather information about emerging threats against themselves, their industry, or organizations in general. This intelligence allows security teams to perform necessary risk assessments and manage risk proactively. However, no organization can stop all theft attempts.
The average time between a breach and its discovery is 206 days (Irwin, 2019). This provides time for malicious actors to gather large amounts of information and sell it or use it in ways detrimental to the victim organization, its employees, or its customers. Scanning the deep and dark webs can quickly provide information about breaches, including the stolen information of victim organizations.
Larger organizations can afford to engage threat intelligence vendors to perform scans for them. These scans also include correlation and analysis of information found, including (Henry, 2018):
- Exchange of PII
- Credential exchange
- Information reconnaissance
- Phishing attack coordination
- Discussion about trade secrets and sensitive assets
Other organizations should consider performing intelligence activities themselves. The rest of this article explains how to perform threat intelligence activities on the surface and deeper web layers, also known as open source intelligence.
Accessing the Lower Web Layers
Before attempting to access the lower layers of the web, it is vital to take precautions. Failure to take preventive measures may result in the compromise of intelligence gathering (IG) systems and the connected network.
- Implement a VPN for the IG computer that shows its location somewhere other than where it actually sits.
- Use a virtual machine for IG. This allows quick eradication of potential malware picked up during information gathering. Do not rely on antivirus/antimalware solutions alone.
- Consider placing IG devices on a network segment isolated from the rest of your internal network.
- Turn off all script execution on the IG device.
- If you plan to subscribe to any services because you want to gather more in-depth intelligence, use an email address not associated with your organization. ProtonMail is an excellent way to create an anonymous account.
- Avoid using any programs and applications using the internet not directly related to IG.
- If using Windows 10, harden it by using the highest Microsoft security configuration framework level possible.
- Disconnect/disable all microphones, cameras, and other peripheral devices. Printers that are exploitable should also be disconnected.
Once you securely configure your IG device, install and load the Tor browser bundle. The Tor browser, part of the bundle, enables access to deep and dark web sites. Disable JavaScript in the Tor browser settings. I now walk you through some examples of deep web access.
Example 1
In this example, I did a quick search for password files using two search engines: Google with Safari and DuckDuckGo in Tor. DuckDuckGo supports Google dorks. I entered the search string intext:password filetype:txt into both search engines. Figure 2 shows the first few Google results, and Figure 3 shows the first few DuckDuckGo results. They are quite different.
Figure 2 shows harmless information about commonly used passwords. If you look far enough, there are a few mistakenly stored passwords on the surface web. However, there is much more on the deep web.
The results in Figure 3 contain text files containing user IDs and passwords available on the deep web. These files, and many like them, were not created by MAs. Careless, authorized users created them.
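The same kind of search string can be pointed at an organization's own domains to look for carelessly stored files. The sketch below only assembles query text; it does not contact any search engine. The build_dork helper is hypothetical, and the site: operator and example.com domain are my additions rather than part of the search I ran above.

```python
def build_dork(terms, filetype=None, site=None):
    """Assemble a search string from intext:, filetype:, and site: operators."""
    parts = []
    for term in terms:
        # Quote multi-word terms so they stay attached to the operator.
        quoted = f'"{term}"' if " " in term else term
        parts.append(f"intext:{quoted}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


# The search string used in this example.
article_query = build_dork(["password"], filetype="txt")
print(article_query)  # intext:password filetype:txt

# Assumption: the same search limited to a hypothetical domain you own.
scoped_query = build_dork(["password"], filetype="txt", site="example.com")
print(scoped_query)  # intext:password filetype:txt site:example.com
```

Running the scoped form on a schedule, for each domain and a short list of sensitive terms, is one way to find careless storage before an MA does.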
Example 2
Figure 4 shows another result. It is a saved email with user IDs and passwords. I removed information that leads to identifying the application involved.
Many of the websites in deep and dark web layers do not use standard top-level domains (.com, .net, .edu,…) to remain “secret.” Instead, they use .onion, preceded by a usually meaningless character string not resolvable by DNS. A proxy service like the Tor bundle is needed to access the .onion websites. A sample onion URL is msydqstlz2kzerdg.onion. This is AHMIA.FI, a search engine that indexes hidden services on the dark web. In this case, the website is accessible via the onion URL and the FI URL. However, this is not usually the case. Directories like TheDarkWebLinks provide onion URLs for a wide variety of sites only accessible by using solutions like the Tor Project. A step-by-step explanation of how these URLs work and how websites are anonymously accessed with them is provided in a Stack Exchange post.
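When collecting onion URLs from directories, it helps to filter out strings that cannot be onion addresses at all. The sketch below checks only the shape of a hostname, assuming the two address formats Tor has used: 16 base32 characters (version 2, like the sample above) or 56 base32 characters (version 3). The classify_onion helper is hypothetical. It does not verify a version 3 address's checksum or test whether the service exists.

```python
import re

# Base32 alphabet used by onion addresses: letters a-z and digits 2-7.
ONION_RE = re.compile(r"^([a-z2-7]{16}|[a-z2-7]{56})\.onion$")


def classify_onion(hostname):
    """Return 'v2' or 'v3' for a well-formed onion hostname, otherwise None."""
    match = ONION_RE.match(hostname.strip().lower())
    if not match:
        return None
    return "v2" if len(match.group(1)) == 16 else "v3"


print(classify_onion("msydqstlz2kzerdg.onion"))  # v2
print(classify_onion("ahmia.fi"))                # None
```

A string that fails this check is a typo, a truncated link, or not an onion address. A string that passes still has to be treated with the precautions listed earlier.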
Example 3
And then there are the markets. OnionShop, for example, provides both hacking information and drugs. See Figure 5. This is just one of many markets providing hacking advice and services.
Example 4
I tested a tool that allows you to search for a specific email or user ID. The results are shown in Figure 6. This is just one of many services providing searches through exploited database information. To get the actual passwords, of course, you must pay a fee, as shown in Figure 7.
These and other services make it easy for an MA to collect information on high-value targets when planning spear phishing attacks. Surface web services, like TruthFinder and BeenVerified, also provide information to supplement what is found on the deeper web levels.
Example 5
Finally, there are many datasets for sale. One example is shown in Figure 8. It is the existence of these data sets that organizations can search. The details are not available without payment, but we know there was a breach and who was involved.
…and don’t forget the chat rooms.
Much of the threat intelligence available takes some time, and often a small investment, to collect. However, understanding what is happening on the darker side of the web is essential for managing risk.
Social Engineering Risk
Even if you are unable to find your employees' information via deep and dark web searches, that does not mean it is not out there. Remember that most of the data of value to malicious actors is kept behind password walls. It only takes a look at some of the most significant breaches to know that at least some of our employees are potentially vulnerable.
The most significant loss of personal information was the 2013 Yahoo breach, resulting in the loss of information on about 3 billion customers. This was bigger, but not more unsettling, than the 2018 Exactis breach that resulted in the loss of personal information of 340 million individuals. Figure 9 lists other notable breaches.
Another way MAs collect employee information is from social network scraping. Social network scraping services scan sites like Twitter looking for information useful to their customers. Although some social networks, like Facebook and LinkedIn, have controls in place to block this, there is still plenty of information available.
Services like the one shown in Figure 10 make it easy for an MA to hire assistance to access social network accounts for direct access to information instead of relying on general scraping.
Conclusion
Information about at least some of our employees is already available to MAs. If it isn’t, it only takes some bitcoin to get it. Further, it is possible to discover the results of a breach against our organizations or the planning for a future one by using the right tools and applying the right resources.
Understanding the deeper layers of the web and what is available there can help us do a more effective job of assessing threats, likelihoods of occurrence, and effectiveness of controls, including those that help us understand what is stored on our internet-facing devices. It can also help us understand the need for proper employee training.
Works Cited
Cornell University. (2017, October 18). Google Can’t Search the Deep Web, So How Do Deep Web Search Engines Work. Retrieved May 2019, from Cornell University: https://blogs.cornell.edu/info2040/2017/10/18/google-cant-search-the-deep-web-so-how-do-deep-web-search-engines-work/
Henry, J. (2018, August). 7 Ways to Identify Darknet Cybersecurity Risks. Retrieved May 2019, from IBM Security Intelligence: https://securityintelligence.com/7-ways-to-identify-darknet-cybersecurity-risks/
Irwin, L. (2019, March). How long does it take to detect a cyber attack? Retrieved May 2019, from IT Governance: https://www.itgovernanceusa.com/blog/how-long-does-it-take-to-detect-a-cyber-attack
Shiels, C. (2019, February). The Dark Web & the Deep Web: How to Access the Hidden Internet Today. Retrieved May 2019, from Digital.com: https://digital.com/blog/deep-dark-web/