Using programmed execution as a scientific principle for examining questioned digital documents



Citation information

M. S. Olivier. “Using programmed execution as a scientific principle for examining questioned digital documents”. In: Current Problems of Forensic Expertology, Criminalistics and Criminal Procedure. Invited presentation. Kyiv Scientific Research Institute of Forensic Expertise, Ministry of Justice of Ukraine, Dec. 2021, pp. 27–31


The US National Academy of Sciences report on forensic science in 2009 noted that “the examination of digital evidence [is still treated] as an investigative rather than a forensic activity” [2, p.181]. This remains true in 2021. The work reported here is part of an attempt to move digital forensics closer to other forensic science disciplines and to distinguish clearly between investigations (as typically conducted by law enforcement) and forensic examination of possible evidence. The mutual reliance of the two activities on one another is not denied; however, there is a clear difference between the tasks of law enforcement (to find the guilty) and forensic science (to make scientifically justified claims about possible evidence, whether such claims are to be used in criminal or civil cases). We have already addressed the scientific maturity of digital forensics in several earlier publications [6,7,9].

More specifically, the current work contributes to the newly proposed field of questioned digital documents, which is comparable to (but at the same time very different from) classical questioned document examination (also known as forensic document examination).

In their seminal work on the answers that forensic science can provide, Inman and Rudin [4,5] propose a number of processes. The work of Inman and Rudin is based on properties of divisible matter: when physical matter is divided, such division enables one, for example, to infer that two parts of evidence form part of a greater whole that was divided, based on the nature of the break patterns on the parts. In digital forensic science, the notion of ‘matter’ is not useful, and hence divisible matter has no meaning. The application of the work of Inman and Rudin to the digital context was first proposed by Pollitt [8]. However, Pollitt does not clearly distinguish between forensic and investigative processes and does not posit a principle as a substitute for Inman and Rudin’s principle of divisible matter.

In this presentation we argue that a notion of programmed execution is more appropriate for use in digital forensics; this notion is based on the knowledge that a computer program (when supplied with the same inputs where appropriate) will, under normal conditions, produce the same output. This is also the basis of the work done by Gladyshev [3] to provide a scientific basis for forensic science. However, his work did not provide a practical mechanism to examine evidentiary artefacts.

The processes described by Inman and Rudin developed over time, as evidenced by their publications on the topic. We therefore use their notions somewhat loosely. We are also informed by other relevant notions that are well established in forensic science, such as genotypes and phenotypes (or classes and instances of such classes).

Inman and Rudin effectively propose five processes that define five sets of questions; they suggest that these are the questions that can be answered by forensic science.

Their first process is identification. This is related to the basic question: What is it? In some of their work, they use this question in a generic sense, while other work uses it in a much more specific sense, essentially to mean the identification of the phenotype. Based on some of the examples of identification provided by Inman and Rudin, this paper follows the more generic approach. A file, when opened with an application that is associated with it, may cause some content to be output (where output may mean displayed on a screen, printed, or output in the form of sound or any other sensory mechanism). The immediate purpose is to identify a document that may have probative value in a case. More specifically, the question would typically be whether such a document warrants further examination. However, such output may be sufficient to conclude that the file contains the content and may, as suggested by Inman and Rudin, be sufficient to prove certain facts. For example, where the content is digital contraband, its presence may be sufficient to make the case that the person who possesses such a file is in possession of digital contraband. This is subject to a number of caveats, such as opening the file with a known application that is known to render content correctly. This includes a vast array of applications that can easily be demonstrated to have a large user base, such as digital music players. The user base would be sufficient to indicate the purpose of the application and the extent to which it meets its objective. No deeper analysis of the precise operation of the application would normally be required if the content of a document is the primary concern. Note that the application creates a (justified) expectation that it executes consistently, and therefore the finding has probative value.
The described mechanism is not the only manner in which potential evidence can be identified, but space constraints do not allow for further elaboration here.

The second process is not suggested as an end in itself by Inman and Rudin, but may be inferred to be an end in itself from the broader context in forensic science. This is the process of classification. Inman and Rudin refer to a class as a set of artefacts that share a common origin. This is in line with Cohen’s [1] description of toolmarks caused by various tools that are visible in the digital objects they create. He uses this as a mechanism to ‘attribute’ an object to the tool that created it. This is consistent with the typical logic that underlies the distinction between genotype and phenotype alluded to earlier. The tool that creates an object follows a specific program to create it and is bound to leave the same ‘marks’ in all objects created by it. In this context, these toolmarks are aspects of a document that are not fully specified by the standard that governs the type of document. One example is the low-level composition of the document: different subparts of a document may be ordered in different valid sequences internally. Where textual components are used in the internal representation of the document (rather than as content of the document), standards may allow for different spacing mechanisms, and tools will select their own interpretation of the standard. However, any given tool will consistently make the same choices, in the manner in which it was programmed. Note that not all document types can be attributed to creators in this manner, and tools often share common code to create a document, which may have an impact on the examiner’s ability to attribute the document to a specific tool. Note also that documents often contain metadata about their origins, but such metadata is usually not considered trustworthy.
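The toolmark idea can be illustrated with a minimal sketch. It assumes two hypothetical tools that both write valid JSON but were programmed with different spacing and ordering choices; the names `tool_a_write`, `tool_b_write` and `classify` are illustrative assumptions, and JSON merely stands in for any format whose standard leaves such choices open.

```python
import json

def tool_a_write(content: dict) -> str:
    # Hypothetical tool A: compact separators, keys written in sorted order.
    return json.dumps(content, separators=(",", ":"), sort_keys=True)

def tool_b_write(content: dict) -> str:
    # Hypothetical tool B: default spacing, keys written in insertion order.
    return json.dumps(content)

CANDIDATE_TOOLS = {"tool A": tool_a_write, "tool B": tool_b_write}

def classify(document: str) -> str:
    """Re-run each candidate tool on the document's own content and
    report which tool reproduces the low-level layout exactly."""
    content = json.loads(document)
    matches = [name for name, write in CANDIDATE_TOOLS.items()
               if write(content) == document]
    if len(matches) == 1:
        return matches[0]
    # No match: unknown origin. Several matches: the toolmarks do not
    # distinguish the candidates (for example, tools sharing common code).
    return "ambiguous" if matches else "unknown"

example = {"title": "Report", "author": "X"}
doc_a = tool_a_write(example)
doc_b = tool_b_write(example)
print(classify(doc_a), "/", classify(doc_b))
```

Both documents carry the same content, yet their byte-level layout differs in a way that each tool reproduces consistently. An empty object would be written identically by both tools, which is a small instance of the caveat that not every document can be attributed.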

The third process defined by Inman and Rudin is individualisation. The question is ‘Which one is it?’ Stated differently, this requires distinguishing between phenotypes. In general, it is impossible to reliably identify a “unique” digital document due to the ease with which identical copies can be made. The wear and tear of, say, typewriters used to create documents in the past tended to be unique. This is simply not the case in the digital realm. Documents created on different devices using the same software are indistinguishable from one another.

Digital signatures are a mechanism that may identify a unique document (even if multiple copies of such a unique document exist). However, in general, individualisation is not possible.
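The difficulty with individualisation can be sketched in a few lines. The example below is an illustration only: the byte strings are invented, and SHA-256 is used merely as a convenient way to compare content.

```python
import hashlib

def digest(data: bytes) -> str:
    # A cryptographic hash depends only on the bytes, not on which
    # device or storage medium those bytes came from.
    return hashlib.sha256(data).hexdigest()

original = b"Questioned document, version 1\n"
copy = bytes(bytearray(original))   # a perfect digital copy
edited = original + b" "            # a one-byte change

copies_match = digest(original) == digest(copy)
edit_detected = digest(original) != digest(edited)
print(copies_match, edit_detected)
```

A matching digest shows that two files hold the same content; it cannot show which of the two is ‘the’ original, which is why it supports classification of content rather than individualisation of an instance.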

Inman and Rudin define the next process, association, as a transfer of evidence to some target. A classical physical example is a car accident where paint from one vehicle is transferred to another. The ‘donor’s’ paint may be classified or individualised to (often) determine its make (or even the specific vehicle). This associates the two vehicles. Transfer of aspects of a document that comes into contact with some digital object does occur. Browsers and document viewers often retain a history of documents viewed with such a tool. Document editors may, similarly, keep some history (including the creation of backup files). Editors may even leave traces of the fact that they have been used to edit the document. However, often the object will be recreated by the editor and therefore toolmarks from the original creator will be lost — preventing association between these two tools. Inferences based on such traces are subject to many caveats, and should be thoroughly considered on a case-by-case basis. A few types of documents (such as email) routinely include copies of earlier versions of the conversation.

Where editors leave toolmarks in documents that retain some of their earlier toolmarks, this is a consequence of the manner in which the editor has been built, or programmed. Hence, the principle of programmed execution is again useful. However, this is one example where a digital principle of divisible ‘matter’ may be useful. Email is such an example, where later versions of the conversation enclose or include earlier versions, and serve as a basis for association between the correspondents. Similarly, two documents found in the same archive (such as a zip or tar file) are associated; however, the examiner should be careful about how such association is to be interpreted. Such association may be useful when the age of a digital document needs to be determined, and the age of one or more documents included in the archive is known. This notion can be expanded to any situation where digital documents are found in the same container, such as on the same disc. A lack of space prevents us from exploring the caveats of reading too much into such association here.
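As a rough sketch of association through a shared container, the following lists the members of a zip archive together with the modification times recorded for them. The file names and dates are invented for illustration, and the helper names `build_example_archive` and `member_times` are assumptions. The recorded times are metadata that the creator of an archive can set freely (as the example itself does), so they support an inference only when the age of a member is known independently.

```python
import io
import zipfile
from datetime import datetime

def build_example_archive() -> bytes:
    # Build a small in-memory archive with two members and chosen timestamps.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        known = zipfile.ZipInfo("known.txt", date_time=(2020, 1, 15, 10, 0, 0))
        questioned = zipfile.ZipInfo("questioned.txt", date_time=(2019, 6, 1, 9, 30, 0))
        zf.writestr(known, "document of known age")
        zf.writestr(questioned, "questioned document")
    return buf.getvalue()

def member_times(archive_bytes: bytes) -> dict[str, datetime]:
    """Map each archive member to the modification time recorded for it."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return {info.filename: datetime(*info.date_time) for info in zf.infolist()}

times = member_times(build_example_archive())
associated = sorted(times)        # members found in the same container
latest = max(times.values())      # latest recorded member time
print(associated, latest.isoformat())
```

If the age of `known.txt` were established by other means, the archive in its present form could not predate that member; what this implies for `questioned.txt` still has to be argued case by case.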

The final process is reconstruction. This is intended to answer questions about the how and (in relative terms) when of an event. This is where a programmed execution principle comes into its own: Repeating a programmed process (with the same inputs) should recreate identical outputs. This also applies to a sequence of processes. Often the inputs used will be apparent from the document itself. However, even if that is not the case, recreation using arbitrary inputs will often be sufficient to confirm that inferences made about identification, classification, association and even individualisation are consistent with what has been reconstructed.
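Reconstruction by repeated execution can be sketched as follows. The two-step process (`wrap` followed by `set_newlines`) is a hypothetical stand-in for a sequence of real tools, and the candidate widths and line endings are assumed hypotheses about the inputs, not values taken from any case.

```python
import itertools
import textwrap

def wrap(text: str, width: int) -> str:
    # Step 1 of the hypothetical process: wrap the text to a line width.
    return textwrap.fill(text, width=width)

def set_newlines(text: str, newline: str) -> str:
    # Step 2: apply a line-ending convention.
    return text.replace("\n", newline)

def produce(text: str, width: int, newline: str) -> bytes:
    # The full programmed sequence: the same inputs always give the same bytes.
    return set_newlines(wrap(text, width), newline).encode("utf-8")

def consistent_inputs(questioned: bytes, text: str, widths, newlines):
    """Return every hypothesised (width, newline) pair whose re-execution
    reproduces the questioned document byte for byte."""
    return [(w, nl) for w, nl in itertools.product(widths, newlines)
            if produce(text, w, nl) == questioned]

text = "The quick brown fox jumps over the lazy dog. " * 3
questioned = produce(text, 30, "\r\n")   # stands in for the seized document
hypotheses = consistent_inputs(questioned, text, [30, 60], ["\n", "\r\n"])
print(hypotheses)
```

The result is a list because more than one hypothesis may reproduce the same output; reconstruction then shows only that the findings are consistent with those hypotheses, not that one of them is the unique history of the document.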

While the model by Inman and Rudin attempts to cover the entire field of forensics in terms of divisible matter, it does not follow that the model (in an adapted form) entirely covers forensic digital document examination as a subfield of digital forensic science. It does, however, provide a good starting point and is rooted in a single principle (or small set of principles) that can be scientifically justified. We argue that such subfields should grow from a nucleus of questions that can be answered based on scientific principles, and then possibly be expanded on by adding other questions that can be answered by the same set of principles. The addition of selected principles should be done carefully, and only happen where they support forensic examination, and not for the sake of criminal or other investigations.

  1. F. Cohen. Digital Forensic Evidence Examination. Fred Cohen & Associates, fourth edition, 2013.

  2. Committee on Identifying the Needs of the Forensic Science Community, Committee on Science, Technology, and Law Policy and Global Affairs, and Committee on Applied and Theoretical Statistics, Division on Engineering and Physical Sciences. Strengthening forensic science in the United States: A path forward. Technical report, National Academy of Sciences, 2009.

  3. P. Gladyshev. Formalising Event Reconstruction in Digital Investigations. PhD thesis, University College Dublin, 2004.

  4. K. Inman and N. Rudin. Principles and Practice of Criminalistics. CRC, 2000.

  5. K. Inman and N. Rudin. The origin of evidence. Forensic Science International, 126(1):11–16, 2002.

  6. M. S. Olivier. Towards a digital forensic science. In H. S. Venter, M. Loock, M. Coetzee, M. M. Eloff, and S. Flowerday, editors, Information Security for South Africa (ISSA). IEEE, Aug. 2015.

  7. M. S. Olivier and S. Gruner. On the scientific maturity of digital forensics research. In G. Peterson and S. Shenoi, editors, Advances in Digital Forensics IX, IFIP Advances in Information and Communication Technology – Advances in Digital Forensics, pages 33–49. Springer, 2013.

  8. M. Pollitt. Applying traditional forensic taxonomy to digital forensics. In I. Ray and S. Shenoi, editors, Advances in Digital Forensics IV, pages 17–26. Springer, 2008.

  9. S. Tewelde, M. S. Olivier, and S. Gruner. Notions of “hypothesis” in digital forensics. In G. Peterson and S. Shenoi, editors, Advances in Digital Forensics XI, volume 462 of IFIP Advances in Information and Communication Technology, pages 29–43. Springer, 2015.

BibTeX reference

author={Martin S Olivier},
title={Using programmed execution as a scientific principle for examining questioned digital documents},
booktitle={Current Problems of Forensic Expertology, Criminalistics and Criminal Procedure},
publisher={Kyiv Scientific Research Institute of Forensic Expertise, Ministry of Justice of Ukraine},
note={Invited presentation} )
