
Architecture of HSN1

Component architecture

Honey Spider Network consists of multiple components that need to interact with each other. An overview of these components is presented in the component architecture diagram below.

Figure: HSN 1 components

Logical layer architecture

At an abstract level, all the components of the system can be assigned to five different layers:

  • Import Layer

  • Filtering Layer

  • Analysis Layer

  • Management Layer

  • Presentation Layer

Figure: HSN 1 logical layers

Import Layer

The import layer is responsible for the initial acquisition of objects that are to be processed by the system. These imported "objects" are URLs, but it is expected that in the future they may also be files of various types. URLs are reported to the system by various means, such as spam boxes, agents that parse proxy logs, external lists of URLs published on web pages, search engine queries, web forms, files, etc. Each source is assigned priority and confidence levels. A priority represents the importance of an individual object (URL) compared against other objects and influences how quickly an object is processed by the system. A confidence level represents the trust placed in the source that entered the object. URLs with a high confidence level are less likely to be marked by the system as benign without extensive testing. An important concept is the idea of a Contracted URL: a special URL that is always checked extensively on a periodic basis, regardless of the number of URLs and their priority levels entered into the system from other sources.
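To make this concrete, the following minimal sketch shows what such an imported object might look like as a data structure. The class, field and source names are hypothetical illustrations, not identifiers from the HSN code base.

    // Hypothetical sketch of an imported object as described above.
    // All names are illustrative, not actual HSN identifiers.
    public class ImportedUrl {
        public enum Source { SPAM_BOX, PROXY_LOG_AGENT, EXTERNAL_LIST, SEARCH_ENGINE, WEB_FORM, FILE }

        private final String url;
        private final Source source;
        private final int priority;       // importance relative to other queued objects
        private final int confidence;     // trust placed in the submitting source
        private final boolean contracted; // contracted URLs are always re-checked periodically

        public ImportedUrl(String url, Source source, int priority, int confidence, boolean contracted) {
            this.url = url;
            this.source = source;
            this.priority = priority;
            this.confidence = confidence;
            this.contracted = contracted;
        }
    }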

Filtering Layer

Filtering in the system primarily involves the maintenance of black, grey and white lists. The whitelist contains URLs (and, in the future, file hashes) that have been deemed benign and do not need to be inspected for a specific period. The blacklist contains URLs that have been confirmed as malicious and therefore do not need to be checked again for a specific period. The greylist contains URLs that have been identified as suspicious and likewise do not need to be re-checked for a specific period. In all three cases, these periods can be defined permanently or in the form of a TTL.

A hit count is maintained for each entry in the white/black/grey lists. It reflects how often an object has been re-submitted to the system. A high hit count for an object may shorten its TTL and increase its priority level for processing. The rationale is that information about the status of frequently encountered objects should be as current as possible.
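A minimal sketch of such a list entry, assuming the TTL-shortening policy described above (the halving factor here is an arbitrary choice for illustration, and all names are hypothetical):

    import java.time.Duration;
    import java.time.Instant;

    // Hypothetical sketch of a white/grey/black list entry with a TTL and a hit count.
    public class ListEntry {
        private final String url;
        private Instant expiresAt; // a null value would represent a permanent entry
        private int hitCount;

        public ListEntry(String url, Duration ttl) {
            this.url = url;
            this.expiresAt = Instant.now().plus(ttl);
        }

        public boolean isExpired() {
            return expiresAt != null && Instant.now().isAfter(expiresAt);
        }

        // Each re-submission bumps the hit count; a popular entry gets a shorter
        // TTL so that its status is re-checked sooner.
        public void recordHit() {
            hitCount++;
            if (expiresAt != null) {
                Duration remaining = Duration.between(Instant.now(), expiresAt);
                expiresAt = Instant.now().plus(remaining.dividedBy(2));
            }
        }
    }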

Each list is actively monitored through DNS queries, intended to track IP address changes for domain names. This is an additional feature that could help to track fast-flux domains in the future, through custom-made scripts that interface with the system.
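Such a custom-made monitoring script could, for example, periodically resolve each listed domain and flag changes in its IP set, a common fast-flux indicator. A minimal sketch, with hypothetical names; the polling strategy and change criterion are assumptions, not part of HSN itself:

    import java.net.InetAddress;
    import java.net.UnknownHostException;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;
    import java.util.TreeSet;

    // Resolves a domain and compares the result with the previously seen IP set.
    public class DnsWatcher {
        private final Map<String, Set<String>> lastSeen = new HashMap<>();

        public boolean hasChanged(String domain) throws UnknownHostException {
            Set<String> current = new TreeSet<>();
            for (InetAddress a : InetAddress.getAllByName(domain)) {
                current.add(a.getHostAddress());
            }
            Set<String> previous = lastSeen.put(domain, current);
            return previous != null && !previous.equals(current);
        }
    }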

Analysis Layer

The analysis layer is the layer where the low interaction and high interaction honeypots operate. It is responsible for the detection of suspect and malicious sites, exploits used and malware downloaded. The primary elements that make up this layer are the low interaction component, high interaction component, external analysis module and a URL localizer.

The low interaction component consists of the Heritrix crawler, with enhanced JavaScript support through Rhino, implemented as a separate process. All web queries are sent through a Squid proxy, which provides antivirus detection through ClamAV. All queries are also monitored by Snort.
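For illustration, the standalone sketch below shows one mechanism by which fetched content can be handed to ClamAV: the clamd INSTREAM protocol. In HSN the scanning happens inside the Squid proxy chain rather than like this; the host and port are clamd's conventional defaults, assumed here:

    import java.io.DataOutputStream;
    import java.io.InputStream;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    // Streams a fetched HTTP body to a local clamd instance for scanning.
    public class ClamdScan {
        public static String scan(byte[] body) throws Exception {
            try (Socket s = new Socket("localhost", 3310);
                 DataOutputStream out = new DataOutputStream(s.getOutputStream())) {
                out.write("zINSTREAM\0".getBytes(StandardCharsets.US_ASCII));
                out.writeInt(body.length); // 4-byte big-endian chunk length
                out.write(body);
                out.writeInt(0);           // zero-length chunk ends the stream
                out.flush();
                InputStream in = s.getInputStream();
                // e.g. "stream: OK" or "stream: Eicar-Test-Signature FOUND"
                return new String(in.readAllBytes(), StandardCharsets.US_ASCII).trim();
            }
        }
    }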

In most cases, it is unlikely that the low interaction component will obtain actual malicious code. However, it is feasible to detect both obfuscated code and malicious JavaScript used to launch an exploit, so both suspect and malicious sites may be identified. Note that the fact that code is obfuscated does not mean it is malicious: many sites obfuscate code simply to hinder copying.
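As an illustration of the kind of static heuristic a low interaction component might apply, the sketch below flags script features that often accompany obfuscation. The patterns and the threshold are illustrative assumptions only, and, as noted above, a positive result means suspect, not malicious:

    import java.util.regex.Pattern;

    // Crude heuristic: count script features commonly seen in obfuscated code.
    public class ObfuscationHeuristic {
        private static final Pattern[] SIGNS = {
            Pattern.compile("\\beval\\s*\\("),
            Pattern.compile("\\bunescape\\s*\\("),
            Pattern.compile("String\\.fromCharCode"),
            // long runs of hex-escaped characters
            Pattern.compile("\\\\x[0-9a-fA-F]{2}(\\\\x[0-9a-fA-F]{2}){20,}")
        };

        public static boolean looksObfuscated(String script) {
            int hits = 0;
            for (Pattern p : SIGNS) {
                if (p.matcher(script).find()) hits++;
            }
            return hits >= 2; // obfuscated is not the same as malicious
        }
    }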

The high interaction component is responsible for the in-depth analysis of submitted objects. This in-depth analysis is necessary for the detection of previously unknown exploits and malware; the downside is that the process is expected to be much slower than in the low interaction component. Capture-HPC is the tool of choice for this component. It allows the automated "driving" of various browsers in a virtual environment that runs a real operating system. It is capable of monitoring the file system, registry and processes of the honeypot system at the kernel level, as well as logging all network traffic, which is automatically analyzed by a virus scanner and by Snort. For our purposes the high interaction component requires some modifications that allow integration with our framework and add a "suspicious" state.

Both the low interaction and high interaction components have their local processing queues, maintained by their local managers. Once a URL has been processed, it is submitted back to the central processing queue, where further decisions are made as to how a URL should be processed.
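A minimal sketch of such a priority-ordered central queue; the class names are hypothetical stand-ins, not HSN identifiers:

    import java.util.Comparator;
    import java.util.concurrent.PriorityBlockingQueue;

    // Central processing queue: highest-priority objects are handed out first.
    public class CentralQueue {
        public record QueuedObject(String url, int priority) {}

        private final PriorityBlockingQueue<QueuedObject> queue =
            new PriorityBlockingQueue<>(64,
                Comparator.comparingInt(QueuedObject::priority).reversed());

        public void submit(QueuedObject o) { queue.put(o); }

        public QueuedObject next() throws InterruptedException {
            return queue.take(); // blocks until an object is available
        }
    }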

Whenever a suspect or malicious URL or file has been identified, it may be submitted to an external third-party site for further analysis and correlation of results. Such sites could include VirusTotal, Anubis, CW Sandbox and stopbadware.org.
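A hypothetical common interface for such submissions might look as follows; each external service would get its own implementation, and nothing here reflects any actual service API:

    // Hypothetical abstraction over third-party analysis services.
    public interface ExternalAnalyzer {
        /** Submits a suspect object and returns an opaque report identifier. */
        String submit(byte[] sample, String sourceUrl) throws Exception;
    }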

Management Layer

The Management Layer controls the flow of objects in the system. It is responsible for tasks such as maintaining the configuration of the system, controlling the import process, management of the object (URL) queues, scheduling, allocation of objects to internal and external processing units (such as the low interaction and high interaction components), idle time management, fault monitoring and the final object classification.
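The allocation step in particular could be sketched as follows; all names are hypothetical, and the routing rule is only an assumption for illustration:

    import java.util.List;

    // Routes an object to the first processing unit willing to accept it.
    public class Dispatcher {
        interface ProcessingUnit {
            boolean accepts(String url);
            void process(String url);
        }

        private final List<ProcessingUnit> units;

        public Dispatcher(List<ProcessingUnit> units) { this.units = units; }

        public void dispatch(String url) {
            for (ProcessingUnit u : units) {
                if (u.accepts(url)) { u.process(url); return; }
            }
            // no unit available: the object stays queued (idle time management)
        }
    }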

Every object processed by the system has meta-data associated with it, assigned by the Management Layer of the HoneySpider system. This object tag contains the following information:

  • A Confidence level

  • A Priority level

  • A Process classification

  • An Alert classification

The Alert classification is the final classification for a given URL after processing by the different components. It can be one of four classes: NOT_ACTIVE, BENIGN, SUSPICIOUS, MALICIOUS, each described below; a sketch of the corresponding tag structure follows the definitions.

Malicious

The analysis of an object determines that an exploit and/or malware is used to compromise a system and/or applications.

Suspicious

The analysis of an object indicates that an exploit and/or malware may be used to compromise a system and/or applications, or an obfuscated script that could be used for redirection purposes has been detected.

Benign

The analysis of an object determines that no exploit and/or malware is used to compromise a system and/or application.

Not active

The object, in this case a URL, is either unreachable or no longer resolves.
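The sketch referenced above shows how the object tag and its four alert classes might be represented. Apart from the enum values, which mirror the classes named in the text, everything here is an illustrative assumption:

    // Hypothetical representation of the per-object meta-data tag.
    public class ObjectTag {
        public enum AlertClassification { NOT_ACTIVE, BENIGN, SUSPICIOUS, MALICIOUS }

        private int confidence;
        private int priority;
        private String processClassification;
        private AlertClassification alertClassification;
        // getters and setters omitted for brevity
    }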

A future role for the management layer is the refinement and distribution of signatures from the high interaction components to the low interaction components.

Presentation layer

The presentation layer allows access to information concerning the processed objects and enables interaction between the user and the system through a GUI. By Milestone 4 it will also allow automatic generation of PDF reports through the Reporter component, and the generation of alarms through the Alerter component. It will also provide an API that external systems can use to plug in and retrieve the most important information in an automated manner.