By Bram Cappers
Many computer viruses try to infiltrate, then expand, and finally sabotage the attacked system, e.g. by spying on it or disrupting it. Can we find such viruses by looking at the packets sent over the network? Typical network analysis looks at raw bytes, or extracts the flow from e.g. the IP addresses. To gain more insight, one should look at the semantic level: what do the messages mean? How can this be obtained? Wireshark is a nice tool for this, as it also analyzes which part of a message holds which value. Machine learning can already find possible deviations. So, why visualization? It turns out that there are too many possible deviations. Humans are visual creatures, so visualization can help in finding out which deviations are actual threats. In addition, one can use the domain knowledge of a human, which still outperforms machine learning: finding a context is something we (still) do better! So why not let the human decide the context? Visualization helps to do exactly that.
There are different strategies to do so: (1) data-driven: what does the data want to be; (2) alert-driven: what does the machine say; and (3) knowledge-driven: adding the context. For the first two, visualizations help to see patterns like heartbeats or man-in-the-middle attacks.
Looking at a single attribute is usually not sufficient for analyzing attacks. Hence, one needs to look at multivariate event data. Their case study considered a telecom operator that offers VoIP, as there is a lot of fraud happening in this area. For example, a user gets hacked and suddenly has to pay for phone calls to obscure countries and numbers, which gets very expensive very quickly. The question was: can we detect malicious phone calls? What does a phone call look like?
They came up with a system containing rules, aggregations and selections. Each call should have a start and an end. For this, they created a rule language to classify data, e.g. when an event is a Call (start) or a Bye (end). Once events are classified, one can use regular expressions and linear temporal logic to check rules like “every Call is eventually followed by a Bye”. This is all nice, but there are still many variations. Hence, they looked into selection mechanisms to find patterns.
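To make the idea of classifying events and then checking a rule concrete, here is a minimal sketch in Python. It is not the rule language from the talk: the event fields (`method`), the values `INVITE` and `BYE`, the one-letter symbols and all function names are assumptions chosen for illustration. Each event is mapped to a symbol, a sequence becomes a string, and the rule “every Call is eventually followed by a Bye” is expressed as a regular expression over that string.

```python
import re

# Hypothetical classification rules: (symbol, predicate on an event dict).
# The "method" field and its values are assumptions, not taken from the talk.
RULES = [
    ("C", lambda e: e.get("method") == "INVITE"),  # Call (start)
    ("B", lambda e: e.get("method") == "BYE"),     # Bye (end)
]


def classify(event):
    """Return the symbol of the first rule that matches, or 'x' for anything else."""
    for symbol, predicate in RULES:
        if predicate(event):
            return symbol
    return "x"


def encode(events):
    """Turn a sequence of events into a string with one symbol per event."""
    return "".join(classify(e) for e in events)


# "Every Call is eventually followed by a Bye":
# the string consists of non-Call symbols, or of a Call followed
# (after any number of non-Bye symbols) by a Bye.
EVERY_CALL_ENDS = re.compile(r"(?:[^C]|C[^B]*B)*")


def every_call_ends(events):
    """Check the rule on one sequence of events."""
    return EVERY_CALL_ENDS.fullmatch(encode(events)) is not None


good = [{"method": "INVITE"}, {"method": "ACK"}, {"method": "BYE"}]
bad = [{"method": "INVITE"}, {"method": "BYE"}, {"method": "INVITE"}]
print(encode(good), every_call_ends(good))
print(encode(bad), every_call_ends(bad))
```

Encoding sequences as strings keeps the check simple, at the cost of losing the attribute values; a real system would likely keep the link between each symbol and its underlying event.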
Their tool EventPad implements this idea. One can first look at the different sequences, inspect each event, create rules, and color all events that satisfy a rule. In effect, one creates one's own similarity rules, which can then be used by classical data mining and machine learning techniques. In this way, one can create rules for wanted behavior, hide the matching sequences, and inspect the exceptions. Interesting questions: how can we use this to report on the wanted behavior? Can such processes be automated once settled? And since writing regular expressions is not simple, how can you support users in creating them?
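The "hide wanted behavior, inspect exceptions" workflow can be sketched in a few lines of Python. This is not how EventPad is implemented; it only illustrates the principle. It assumes that each sequence has already been encoded as a string with one letter per event class (here `C` for a Call, `B` for a Bye, `x` for anything else), and the pattern in `WANTED` and the function name are hypothetical.

```python
import re
from collections import Counter

# Hypothetical rules describing wanted behavior:
# one Call, any number of other events, then one Bye.
WANTED = [re.compile(r"Cx*B")]


def exceptions(sequences):
    """Hide sequences that fully match a wanted-behavior rule and
    count the remaining patterns, so that frequent exceptions stand out."""
    rest = [s for s in sequences if not any(p.fullmatch(s) for p in WANTED)]
    return Counter(rest)


observed = ["CxB", "CB", "CCCC", "CxxB", "xB", "CCCC"]
for pattern, count in exceptions(observed).most_common():
    print(pattern, count)
```

Grouping identical patterns is a simple form of aggregation: instead of reading every sequence, the analyst sees each distinct exception once, with its frequency.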