
Andrew Sheldon, Evidence Talks : How robust are the techniques behind the growth of digital forensic evidence in securing a conviction?

February 2016 by Andrew Sheldon, Chief Technical Officer of Evidence Talks, who puts the matter beyond reasonable doubt

When it comes to securing a conviction, the increasing use of digital forensics in the rapid identification, analysis and presentation of evidence has enabled a step change in the efficiency of investigation teams in Government, police forces, the legal profession and civil investigators across the globe.

In addition, it has contributed to front-line police officers regaining control of cases, reducing workload and backlogs by being able to process evidence without recourse to specialist hi-tech units.

So any questions about the validity and reliability of the processes, particularly in identification, verification and authentication of file data, are matters of great concern. We recently received a specific query about the hashing process, which is a routine element during acquisition and analysis of the evidence, during verification of the forensic image and again at the end of the examination, to ensure the integrity of the data and forensic processing.

First, let’s set the context in which we work, that is the legal arbiter of reasonable doubt, and here I quote from the Institute of Criminology at Cambridge University.

“Legal academics and judges have expressed that the undefined version of BRD ("the defendant is presumed innocent unless the prosecution has proved guilt beyond a reasonable doubt") is difficult for jurors to understand. As a result several jurisdictions in the Anglo-American legal system have proposed other wordings with a view to aiding jurors’ understanding. In England and Wales for instance the Legal Studies Board advocates the wording, "The defendant is presumed innocent unless the prosecution has proved guilt beyond a reasonable doubt. Proof beyond reasonable doubt is proof that makes you sure of the defendant’s guilt".”

The query we received addressed the hashing algorithms we use, and specifically whether we would be better adopting a single protocol, in this case SHA (secure hash algorithm) 256.

To be clear, we draw a distinction between using the MD5 and SHA1 algorithms in this type of application and using them in cryptographic functions such as those that validate online security. With that distinction made, I can say with confidence that, while SHA256 is good, the MD5 and SHA1 algorithms remain perfectly valid and are essential for use in forensic casework.

This is true even when we consider the possibility of MD5 and SHA1 ‘collisions’, that is to say what happens if two specially modified data inputs are hashed and the MD5 or SHA1 of both files is the same?

Research conducted by Prof. Xiaoyun Wang and her co-authors in 2004 showed that it was possible to create two files with different content that produced the same MD5 value. The implications of this possibility quickly led to some debate in the forensic community.

One common interpretation was that MD5 could no longer be trusted because an analyst might wrongly identify an innocent file as a known file (the identification issue) or deliberately modify a file and change its hash value back to the original (the verification issue). Another hypothesis was that a suspect could make all their bad files have the hash values of known system files, thereby avoiding detection.

While theoretically possible, an MD5 hash collision is very hard to achieve in practice and requires serious computational time for files larger than a few hundred bytes.

According to Stevens et al., “It is important to note that the hash value shared by the two different files is a result of the collision construction process. We cannot target a given hash value, and produce a (meaningful) input bit string hashing to that given value. In cryptographic terms: our attack is an attack on collision resistance, not on preimage or second preimage resistance. This implies that both colliding files have to be specially prepared by the attacker…. Existing files with a known hash that have not been prepared in this way are not vulnerable.” [1]

So, if you only use a single hash algorithm, you’re better off with SHA256. However, there are significant advantages if you use both MD5 and SHA1 instead of SHA256.

Let’s do the Maths. An MD5 hash is 128 bits wide and therefore the probability of two files having the same MD5 is 1/(2^128). Put another way, the probability of finding two files with the same MD5 is just under 3 x 10^-39. That’s once in roughly 3.4 x 10^38, or 340 Billion, Billion, Billion, Billion, comparisons.

By contrast, a SHA1 hash is 160 bits wide, so the probability of a match falls to a mind-boggling 6.8 x 10^-49, or once in every 1.46 x 10^48 comparisons.
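The arithmetic for the two hash widths can be checked with exact integers. The following is a minimal Python sketch, assuming an idealised hash whose output values are all equally likely; the helper names `one_in` and `chance_of_match` are illustrative only.

```python
from fractions import Fraction


def one_in(bits: int) -> int:
    """Number of distinct values a hash of the given width can take."""
    return 2 ** bits


def chance_of_match(bits: int) -> Fraction:
    """Chance that one unprepared file matches one given hash value.

    Assumption: the hash output is uniformly distributed (an idealisation).
    """
    return Fraction(1, one_in(bits))


for name, bits in (("MD5", 128), ("SHA1", 160)):
    space = one_in(bits)
    probability = float(chance_of_match(bits))
    print(f"{name}: once in {space:.3e} comparisons (probability {probability:.3e})")
```

Run as written, this reports about 3.4 x 10^38 values for MD5 and about 1.46 x 10^48 for SHA1.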

Finally we can look at the approach Evidence Talks takes with SPEKTOR. Our system uses both MD5 and SHA1 rather than just one or the other, so it’s worth examining the Maths to explain why and to discuss the level of certainty it delivers, as opposed to opting for just one, stronger hash method such as SHA256. Because a file must match on both algorithms, the comparison is effectively made against a combined 288-bit value (128 + 160 bits), so the chance of an accidental match is many orders of magnitude smaller than with either algorithm on its own.
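To make the idea of using both algorithms concrete, here is a minimal Python sketch that reads a file once and feeds each chunk to both MD5 and SHA1. It is an illustration built on the standard `hashlib` module, not SPEKTOR's actual implementation; the names `dual_hash` and `same_content` are hypothetical.

```python
import hashlib
from typing import Tuple


def dual_hash(path: str, chunk_size: int = 1 << 20) -> Tuple[str, str]:
    """Return (md5_hex, sha1_hex) for the file at `path`, reading it once."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so large forensic images fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def same_content(pair_a: Tuple[str, str], pair_b: Tuple[str, str]) -> bool:
    """Treat two files as matching only if BOTH digests agree."""
    return pair_a == pair_b
```

A caller would record `dual_hash(path)` at acquisition and compare it with a fresh call at the end of the examination; a difference in either digest is enough to reject the match.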

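The full calculation follows the birthday problem. With 2^288 possible combined MD5 and SHA1 values and 2^29 files, the probability that no two files with different content share both hash values is (2^288)!/((2^288)^(2^29) * (2^288 - 2^29)!), and the probability of finding at least one matching pair is 1 minus that figure. This is a probability so small as to virtually disappear: as a power of 10 it is approximately 3 x 10^-70.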

Expressing that verbally is impossible, but just the first part of the formula, 2^288, is an 87-digit number, which approximately equates to (5 x 10^6) x the number of atoms in the visible universe!
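The factorials in the formula are far too large to evaluate directly, but the standard birthday-bound approximation n^2/(2N) gives the order of magnitude. The sketch below is a rough check rather than an exact evaluation: it assumes uniformly distributed, independent hash outputs and takes n = 2^29 files from the formula above; the function name is illustrative.

```python
import math


def log10_collision_probability(hash_bits: int, log2_files: int) -> float:
    """log10 of the birthday-bound approximation n^2 / (2N).

    N = 2**hash_bits possible hash values, n = 2**log2_files files.
    The approximation holds when n is far smaller than sqrt(N).
    """
    log2_p = 2 * log2_files - (hash_bits + 1)
    return log2_p * math.log10(2)


for label, bits in (("MD5 only", 128), ("SHA1 only", 160), ("MD5 + SHA1", 288)):
    exponent = log10_collision_probability(bits, 29)
    print(f"{label}: about 10^{exponent:.1f}")
```

For the combined 288-bit case the exponent comes out near -69.5, which is about 3 x 10^-70.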

Having dealt conclusively with the mathematics, let me move on to some of the logic for our approach. Perhaps most compelling is the fact that although there are no national or international standards which require SHA256 in digital forensics, its use instead of MD5/SHA1 would immediately render all global child sexual exploitation image databases, which use MD5 and SHA1 values, unusable. Furthermore, MD5 and SHA1 are still used and accepted by every law enforcement authority worldwide to perform the three core forensic functions, that’s to say:

• Identifying known indecent images
• Excluding known good files such as those that appear on the NSRL hash keeper list
• Verifying that files have not been changed.
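The three functions listed above can be sketched as simple set lookups and comparisons on (MD5, SHA1) pairs. This is a hedged illustration only: the in-memory sets and the names `digests_of`, `classify` and `unchanged` are hypothetical, and real hash databases have their own formats and tooling.

```python
import hashlib
from typing import Set, Tuple

Digests = Tuple[str, str]  # (md5_hex, sha1_hex)


def digests_of(data: bytes) -> Digests:
    """Compute the MD5 and SHA1 hex digests of a block of data."""
    return hashlib.md5(data).hexdigest(), hashlib.sha1(data).hexdigest()


def classify(data: bytes, known_bad: Set[Digests], known_good: Set[Digests]) -> str:
    """Identify known files, exclude known good files, flag the rest."""
    pair = digests_of(data)
    if pair in known_bad:
        return "identified"
    if pair in known_good:
        return "excluded"
    return "unknown"


def unchanged(data: bytes, recorded: Digests) -> bool:
    """Verify that data still matches the digests recorded earlier."""
    return digests_of(data) == recorded


# Made-up sample data standing in for real hash sets.
known_bad = {digests_of(b"sample flagged file")}
known_good = {digests_of(b"sample operating system file")}
print(classify(b"sample flagged file", known_bad, known_good))
print(classify(b"something else", known_bad, known_good))
```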

In summary I believe the case for a combined MD5/SHA1 solution in Forensic Validation and Identification as distinct from cryptographic applications is overwhelming, supported both by the mathematical evidence and the protocols of global agencies charged with the protection and welfare of children. I’m certainly not against using SHA256 and we’ll probably add it to SPEKTOR at some point but, for today and in future, our current hashing method is comprehensively more accurate and secure than just using SHA256.


[1] http://www.win.tue.nl/hashclash/SoftIntCodeSign/

