The ever-increasing cyber threat and a drive towards greater cost savings have led to the idea of using an all-encrypted network (AEN). With the AEN approach, encryption is used to separate different groups of users from each other, including management of the network, in a manner that allows them to share the same hardware without having to trust each other.
If the end-user equipment can be trusted then ‘end-to-end’ encryption may be used to ensure only pre-defined users or equipment can receive the required information. If it is feared that end-user equipment may be compromised, then specialist equipment carefully managed at the network level may provide trusted encryption. This has the advantage of keeping tight control of the critical security enforcing function (the encryption device functionality) but at the cost of coarser security, based on users belonging to a security domain rather than being treated as individuals.
Whichever approach is used, sophisticated key-management is required to safely control and manage encryption keys and certificates. This article takes a critical look at what AENs can really achieve and the dangers lurking beneath the surface.
There are three main security pillars for information assurance, namely:
- Confidentiality: The property that information is not made available or disclosed to unauthorised individuals, entities or processes.
- Integrity: The property of safeguarding the accuracy and completeness of assets - this may include the ability to prove an action or event has taken place, such that it cannot be repudiated later.
- Availability: The property of being accessible and usable upon demand by an authorised entity.
Of these, the most surprising and concerning results come from looking at confidentiality, which is the focus of this article.
Confidentiality
Confidentiality would seem to be perfectly protected by encryption, provided the encryption is not cracked. At present, the internet uses public key encryption based on the mathematics of factorising numbers to safely manage and distribute keys and certificates. In the future, quantum computers are expected to be able to break such systems; so there is a great deal of attention being paid to post-quantum algorithms which may be resistant to such attacks. However, there is a more basic attack available to hackers today - traffic analysis.
Traffic analysis
This is the process of intercepting and examining messages in order to deduce information from patterns in communication, which can be performed even when the messages are encrypted. Developments in artificial intelligence have led to ever more sophisticated techniques for extracting information from unlikely sources.
Prior to the success at Bletchley Park in decrypting WWII cypher-text, traffic analysis was being used to identify units, intentions and the order of battle by means of direction finding, radio signatures and call signs. Today, credit card companies increasingly use patterns of spending in transaction data to spot fraudulent spending. Acoustic emanations from keyboards are infamous as a method of extracting the text being typed and somewhat amusingly may throw in some spelling correction. Diffie and Landau, in their book on wiretapping, even went so far as to say that traffic analysis, not cryptanalysis, is the backbone of communications intelligence. So, what can traffic analysis do in a modern network?
There are many potential approaches to traffic analysis which help elicit information from a network. Table 1 shows some examples of the information used, attacks that can be performed and defences that may be mounted.
Traffic information | Attack | Defence |
---|---|---|
Packet Size | Finger-printing web-sites | Modify packet size |
Packet Timing | Application/OS identification | Modify flow size |
Packet rates / counting | Source / destination identification | Modify web object size / number |
IP addresses | Identify language | Change timing |
DiffServ codepoint | Reveal network topology | Hide non-critical header information |
Communication flow patterns and direction | Infer content | Dispersed routing |
Table 1: Example analysis considerations
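To make the first row of Table 1 concrete, the following is a minimal sketch of how packet sizes alone might fingerprint a web-site. It is a toy illustration, not a method taken from the work cited in this article: the site labels, the traces, the 100-byte bucket width and the nearest-profile rule are all assumptions chosen for clarity.

```python
from collections import Counter
import math

def size_histogram(packet_sizes, bucket=100):
    """Normalised histogram of packet sizes, grouped into coarse buckets."""
    counts = Counter(size // bucket for size in packet_sizes)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}

def distance(h1, h2):
    """Euclidean distance between two sparse histograms."""
    keys = set(h1) | set(h2)
    return math.sqrt(sum((h1.get(k, 0.0) - h2.get(k, 0.0)) ** 2 for k in keys))

def classify(trace, profiles):
    """Return the label of the stored profile closest to the observed trace."""
    observed = size_histogram(trace)
    return min(profiles, key=lambda label: distance(observed, profiles[label]))

# Hypothetical training traces: encrypted packet sizes (bytes) seen per site.
profiles = {
    "site_a": size_histogram([1500, 1500, 1500, 640, 1500, 1500, 320]),
    "site_b": size_histogram([200, 180, 220, 1500, 210, 190, 205]),
}

# Hypothetical trace captured later; no payload is ever decrypted.
unknown = [1500, 1500, 610, 1500, 1500, 1500, 350]
guess = classify(unknown, profiles)
print(guess)
```

The eavesdropper never touches the key: the distribution of ciphertext lengths is enough to pick the closer profile. Real attacks use far richer features and classifiers, but the principle is the same.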
Dangers
The dangers are best highlighted by referring to some of the work done to date. Muehlstein et al looked at 30 combinations of operating system, browser and application. The large majority of the combinations were correctly identified over 90% of the time from encrypted traffic. In this case, the operating systems were Windows, Ubuntu and OSX; the browsers were Chrome, Firefox, Safari and Internet Explorer; and the applications were Facebook, Twitter, Google background, YouTube and Microsoft background traffic (plus some unknown applications).
This is the sort of information that helps fingerprint a system for cyber-attack and enriches the identification of patterns of behaviour which may reveal intentions. Such pattern identification can be, and is, used for beneficial purposes on an AEN by spotting malware, or the effects of malware, within the encrypted stream. These approaches do reveal information but are not so clearly linked to an individual. What about inferring the content itself?
Language classification in messaging has been demonstrated with a high degree of accuracy after only 50 text packets. However, things become much worse with modern voice over IP (VoIP). A modern VoIP network may well use silence suppression (not sending silences) or variable compression packets (the compression ratio changes with the nature of the sound).
By analysing gaps as small as 10ms in a speech segment, or observing the change in packet length with a variable compression rate vocoder, the likely sounds which produced the encrypted packet may be reconstructed. This is particularly true of formal, carefully enunciated speech as may be used when spelling out passwords, call signs or grid co-ordinates.
An example, from White et al, shows how effective this approach is, despite the apparently low scores.
Reconstructed voice | Actual voice | METEOR score |
---|---|---|
Cliff was soothed by a luxurious massage | Cliff was soothed by the luxurious massage | 0.78 |
Is not except to created illuminated examples | It’s not easy to create illuminating examples | 0.53 |
That you headache | That's your headache | 0.18 |
Table 2: Inferred speech (White et al)
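The leakage from silence suppression can be illustrated with a short sketch. It assumes a hypothetical codec that emits one packet per 20 ms frame and a made-up voice-activity pattern; neither figure comes from the work cited here. The eavesdropper sees only packet timestamps, never the payload.

```python
FRAME_MS = 20  # assumed frame interval; real codecs vary

def send_times(voice_activity, silence_suppression):
    """Timestamps (ms) at which encrypted packets leave the sender."""
    return [
        i * FRAME_MS
        for i, active in enumerate(voice_activity)
        if active or not silence_suppression
    ]

def inferred_gaps(timestamps):
    """What an observer recovers: (start_ms, duration_ms) of each silence."""
    gaps = []
    for prev, nxt in zip(timestamps, timestamps[1:]):
        if nxt - prev > FRAME_MS:
            gaps.append((prev + FRAME_MS, nxt - prev - FRAME_MS))
    return gaps

# Hypothetical utterance: 1 = speech frame, 0 = silent frame.
activity = [1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1]

leaky = inferred_gaps(send_times(activity, silence_suppression=True))
flat = inferred_gaps(send_times(activity, silence_suppression=False))
print(leaky)  # silences visible on the wire
print(flat)   # nothing visible when every frame is sent
```

With suppression enabled, the rhythm of the speech is exposed as gaps in the packet stream; with every frame sent, the observer recovers no gaps at all. This sketch covers only packet timing, not the packet-length leakage from variable-rate compression.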
Countermeasures
A variety of countermeasures can be deployed, but they are not standardised. Commonly mentioned approaches, and those which can be implemented on standard commercial equipment, are:
- Padding;
- Tunnelling mode with network IP encryptors;
- Address translation;
- WAN acceleration;
- Enforcing the use of constant bit rate vocoders without silence suppression for VoIP (and similarly for video codecs).
Padding disguises the length of the packets and can range from simply standardising on certain fixed block sizes to trying to mislead the eavesdropper by mimicking the characteristics of a different type of traffic. In practice, padding is of limited value unless it is extensive, which may cause congestion delays and packet loss.
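The simplest form of padding, rounding every packet up to one of a few fixed sizes, can be sketched as follows, along with its bandwidth cost. The bucket sizes and the sample traffic mix are assumptions for illustration only.

```python
def pad_to_bucket(size, buckets=(128, 512, 1500)):
    """Return the smallest allowed size that fits the packet."""
    for bucket in buckets:
        if size <= bucket:
            return bucket
    raise ValueError(f"packet of {size} bytes exceeds largest bucket")

def padding_overhead(sizes, buckets=(128, 512, 1500)):
    """Padded sizes plus the fraction of extra bytes sent on the wire."""
    padded = [pad_to_bucket(s, buckets) for s in sizes]
    extra = sum(padded) - sum(sizes)
    return padded, extra / sum(sizes)

# Hypothetical mix of small control packets and large data packets (bytes).
sizes = [40, 60, 200, 1400, 90, 1500]
padded, overhead = padding_overhead(sizes)
print(padded)
print(f"{overhead:.1%} extra bytes")
```

Six distinct lengths collapse to three, so the observer learns less, but the small packets grow the most. Fewer, larger buckets hide more and cost more, which is the trade-off described above.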
IP packet encryptors can work in two modes: ‘transport’ and ‘tunnelling’. The tunnelling mode hides as much of the original header as is practical to minimise information leakage. This is efficient when IP packets are long, but again produces significant overheads on small packets such as TCP acknowledgements, voice packets and typical SNMP packets. Address translation also indirectly provides rudimentary hiding of the source and destination.
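The effect of tunnelling overhead on small packets is easy to see with some arithmetic. The 60-byte per-packet overhead below is an assumed round figure, not a property of any particular encryptor; the real value depends on the encapsulation and cipher suite. The packet sizes are likewise illustrative.

```python
TUNNEL_OVERHEAD_BYTES = 60  # assumption: new outer header + encryption fields

def relative_overhead(packet_bytes, overhead_bytes=TUNNEL_OVERHEAD_BYTES):
    """Extra bytes added by the tunnel as a fraction of the original packet."""
    return overhead_bytes / packet_bytes

# Illustrative original packet sizes in bytes.
examples = {"tcp_ack": 40, "voice": 80, "bulk_data": 1400}

report = {name: round(relative_overhead(size), 3) for name, size in examples.items()}
for name, ratio in report.items():
    print(f"{name}: {ratio:.1%} overhead")
```

Under these assumptions the overhead on a bulk data packet is a few percent, while an acknowledgement more than doubles in size.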
WAN acceleration on the plain-text side can potentially provide large benefits if there is significant redundancy in the network as a large amount of traffic will be represented by tokens. However, as compression also provides information about the content (as seen in voice traffic) the precise approach used by the WAN accelerator will make a difference and such accelerators invariably use a bespoke approach.
Finally, using constant bit rate vocoders without silence suppression prevents packet length or packet existence from being used to extract content. However, as codecs are generally controlled by the applications and are subject to change when upgrading, enforcing particular codecs may not be as simple as it sounds. If silence suppression cannot be switched off, then using a vocoder which works on longer segments of speech, and will therefore have fewer silences (very short silences are eliminated), helps. Such an approach will introduce voice latency.
None of these techniques on their own will be completely effective in hiding the larger scale traffic patterns and flows that may be revealed by modern AI techniques, and hiding such patterns comes with a significant cost.
Summary
All-encrypted networks are a powerful defensive weapon against cyber-attack; however, without careful design, they can give a false sense of security. A trade-off is necessary between disguising traffic characteristics and avoiding congestion, delay and loss in the network. Nevertheless, steps can be taken to improve security, even using normal commercial equipment. As deep learning and big data AI techniques advance, traffic analysis will become easier and more effective: cyber-defence design and cyber-professionals must be up to the challenge in response.