CN Group

Publications:

If you are using any of the material below, please cite the corresponding publication:

F. Iglesias and T. Zseby, Pattern Discovery in Internet Background Radiation, in IEEE Transactions on Big Data, vol. PP, no. 99, pp. 1-1.

Description:

The Internet Background Radiation (IBR) is formed by unidirectional, unsolicited traffic that arrives at large, unresponsive network address ranges with no hosts assigned (also called "darkspaces").

The AGM vector is a format for characterizing network traffic sources and/or destinations based on aggregated counts and mode values of the principal IP header fields, collected over a fixed time interval (e.g., 1 hour). The description of the AGM format and a classification of the IBR based on the analysis of AGM vectors are provided below.

Datasets, experiments:

We have analyzed the PT dataset by representing each source observed in the IBR as an AGM vector. The PT dataset is a portion of the IBR corresponding to six months (January to June) of 2012, collected at the UCSD Network Telescope (CAIDA). Time series of aggregated AGM classes (sources and packet volumes) can be explored here:

Jul 2015 - Jan2Jun_dksp-2012, online visualization of time series
Jul 2015 - Jan2Jun_dksp-2012, PT dataset

Additional information:

--- AGM format ---

AGM vectors are formed by aggregations and mode values of seven principal IP and TCP header fields, i.e., IP destination, Source Port, Destination Port, Protocol, TTL, TCP Flags and packet Length. They can be used to characterize sources or destinations whose traffic crosses an observation point. The activity of the sources (or destinations) under analysis is observed during a fixed time interval (e.g., one hour).

AGM vector format (sources)

srcIP_i =
#dstIP, M(dstIP), #pkts[M(dstIP)],
#srcPort, M(srcPort), #pkts[M(srcPort)],
#dstPort, M(dstPort), #pkts[M(dstPort)],
#Protocol, M(Protocol), #pkts[M(Protocol)],
#TTL, M(TTL), #pkts[M(TTL)],
#flag, M(flag), #pkts[M(flag)],
#length, M(length), #pkts[M(length)],
#pkts

'#': "number of" (distinct values, or total packets in the case of '#pkts'), 'M(...)': statistical mode, 'pkts': "packets"
Example: "#pkts[M(dstIP)]" is the number of packets that source 'i' sends to its most frequent IP destination (the mode) during the observed 1-hour interval.
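
The format above can be illustrated with a short sketch that builds one AGM vector per source from the packets seen in a single interval. This is only an illustration under assumptions, not the code used for the publication: the packet record layout (a dict with a "srcIP" key plus the seven field names) and the function name agm_vectors are hypothetical, and when several values tie for the mode the sketch simply takes whichever one Counter returns first.

```python
from collections import Counter, defaultdict

# The seven header fields of the AGM format, named as in the vector layout above.
FIELDS = ("dstIP", "srcPort", "dstPort", "Protocol", "TTL", "flag", "length")


def agm_vectors(packets):
    """Return {srcIP: AGM vector} for the packets of one observation interval.

    Each packet is a dict with "srcIP" plus the keys in FIELDS
    (a hypothetical record layout chosen for this sketch).
    """
    by_source = defaultdict(list)
    for pkt in packets:
        by_source[pkt["srcIP"]].append(pkt)

    vectors = {}
    for src, pkts in by_source.items():
        vec = {}
        for field in FIELDS:
            counts = Counter(p[field] for p in pkts)
            mode, mode_pkts = counts.most_common(1)[0]
            vec["#" + field] = len(counts)             # number of distinct values
            vec["M(" + field + ")"] = mode             # most frequent value
            vec["#pkts[M(" + field + ")]"] = mode_pkts # packets carrying the mode
        vec["#pkts"] = len(pkts)                       # total packets of the source
        vectors[src] = vec
    return vectors


# Made-up packets from one source (documentation address ranges).
packets = [
    {"srcIP": "192.0.2.1", "dstIP": "198.51.100.5", "srcPort": 4000,
     "dstPort": 445, "Protocol": 6, "TTL": 110, "flag": "S", "length": 48},
    {"srcIP": "192.0.2.1", "dstIP": "198.51.100.6", "srcPort": 4001,
     "dstPort": 445, "Protocol": 6, "TTL": 110, "flag": "S", "length": 48},
    {"srcIP": "192.0.2.1", "dstIP": "198.51.100.5", "srcPort": 4002,
     "dstPort": 445, "Protocol": 6, "TTL": 110, "flag": "S", "length": 52},
]
vec = agm_vectors(packets)["192.0.2.1"]
```

For this made-up source, the vector has 22 entries (three per field plus '#pkts'); for instance, '#dstIP' is 2 and '#pkts[M(dstIP)]' is 2, because two of the three packets go to the same destination.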

--- IBR Source Classification (PT dataset) ---

TCP445 (Sources: 12.2%, Packets: 37.0%)
Sources sending SYN packets to TCP Port 445. Related to machines infected with stealth malware and worms that target SMB (Server Message Block), such as Conficker, as well as attackers directly probing or scanning the network to exploit SMB vulnerabilities.

TCP3389 (Sources: 2.2%, Packets: 3.6%)
Sources sending SYN packets to TCP Port 3389. This class is related to attack attempts against RDP (Remote Desktop Protocol), which is used by some malware, such as the Morto worm, and is also a classic target for information disclosure, dictionary, brute-force, DoS and man-in-the-middle (MITM) attacks.

UDP30 (Sources: 54.5%, Packets: 1.4%)
Sources sending similar-looking UDP packets with a 30-byte payload. This class mainly identifies traffic originating from a bug in Qihoo 360 Safe products (the bug was discovered, and started to be fixed, around Dec. 2015).

UDP10320 (Sources: 0.0%, Packets: 2.6%)
Sources sending packets to UDP port 10320. This pattern mostly identified RTP (Real-Time Transport Protocol) traffic, specifically a few sources constantly sending many UDP packets. It consisted of misconfigured audio transfers aimed at services of a big audio data platform.

Uniform (Sources: 15.9%, Packets: 0.9%)
Sources sending traffic with uniform, constant characteristics: one destination, one source port, one destination port and one protocol. Traffic mainly consisted of misconfigurations, but also worm activity, DoS and DDoS to darkspace addresses. It contained significant amounts of traffic related to file sharing protocols (e.g., BitTorrent, eMule, QQLive, Gnutella), gaming platforms (e.g., Garena, Steam, XBOX), DNS resolution, audio and video (e.g., RTCP, RTP, SIP, Skype) and other access and transition services (e.g., Teredo, LinkProof, NetBios).

Variable (Sources: 15.2%, Packets: 54.5%)
Remaining sources with characteristics that do not match the previous classes. Mainly different kinds of scanning, amplification search and backscatter.
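
As a rough illustration, the class descriptions above can be read as ordered rules over the mode values of an AGM vector. The sketch below is not the procedure used in the publication; it is a simplified reading under several assumptions: the function name classify_source, the rule order, the use of mode values only, the "S" encoding for a SYN flag and the udp30_length parameter are all hypothetical. In particular, the packet length that corresponds to a 30-byte UDP payload depends on how length is measured, so it is passed in rather than hard-coded.

```python
TCP, UDP = 6, 17  # IANA protocol numbers


def classify_source(vec, udp30_length, syn_flag="S"):
    """Label one AGM vector (a dict keyed as in the AGM format) with a class name.

    udp30_length: value of M(length) taken to mean "30-byte UDP payload"
                  (an assumption supplied by the caller).
    syn_flag:     how a pure SYN is encoded in M(flag) (also an assumption).
    """
    proto = vec["M(Protocol)"]
    port = vec["M(dstPort)"]

    # SYN packets to TCP port 445 or 3389.
    if proto == TCP and vec["M(flag)"] == syn_flag and port in (445, 3389):
        return "TCP%d" % port
    # Similar-looking UDP packets with a 30-byte payload.
    if proto == UDP and vec["M(length)"] == udp30_length:
        return "UDP30"
    # Packets to UDP port 10320.
    if proto == UDP and port == 10320:
        return "UDP10320"
    # One destination, one source port, one destination port, one protocol.
    if all(vec[k] == 1 for k in ("#dstIP", "#srcPort", "#dstPort", "#Protocol")):
        return "Uniform"
    # Everything that matches none of the rules above.
    return "Variable"


# Made-up vectors holding only the entries the rules look at.
smb_scanner = {"M(Protocol)": 6, "M(dstPort)": 445, "M(flag)": "S",
               "M(length)": 48, "#dstIP": 300, "#srcPort": 250,
               "#dstPort": 1, "#Protocol": 1}
single_flow = {"M(Protocol)": 17, "M(dstPort)": 53, "M(flag)": None,
               "M(length)": 70, "#dstIP": 1, "#srcPort": 1,
               "#dstPort": 1, "#Protocol": 1}

# 58 = 20 (IPv4 header) + 8 (UDP header) + 30 (payload): an assumed measure of length.
labels = [classify_source(v, udp30_length=58) for v in (smb_scanner, single_flow)]
```

With these made-up inputs, the first vector falls under the TCP445 rule and the second under Uniform. The rule order matters in this sketch: a source that matches one of the port- or length-specific rules is labelled before the Uniform check is reached.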

Analysis of the IBR based on AGM vectors