Commit cdf3f51

Merge pull request #43 from tw4l/main
Disk Image Processor v1.1.0
2 parents 9bde211 + f35f068 commit cdf3f51

35 files changed, with 9364 additions and 1483 deletions

.github/workflows/test.yml

Lines changed: 27 additions & 38 deletions
Original file line number | Diff line number | Diff line change
@@ -3,21 +3,25 @@ name: "Test"
33
on:
44
pull_request:
55
push:
6+
7+
env:
8+
ACTIONS_ALLOW_UNSECURE_COMMANDS: True
9+
610
jobs:
711
tox:
812
name: "Test ${{ matrix.toxenv }}"
913
runs-on: "ubuntu-18.04"
1014
strategy:
1115
matrix:
1216
include:
13-
- python-version: "3.6"
14-
toxenv: "py36"
1517
- python-version: "3.7"
1618
toxenv: "py37"
1719
- python-version: "3.8"
1820
toxenv: "py38"
1921
- python-version: "3.9"
2022
toxenv: "py39"
23+
- python-version: "3.10"
24+
toxenv: "py310"
2125
steps:
2226
- name: "Check out repository"
2327
uses: "actions/checkout@v2"
@@ -27,54 +31,39 @@ jobs:
2731
python-version: "${{ matrix.python-version }}"
2832
- name: Install homebrew
2933
run: |
30-
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
31-
shell: bash
32-
- name: Install siegfried
34+
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
35+
test -d ~/.linuxbrew && eval $(~/.linuxbrew/bin/brew shellenv)
36+
test -d /home/linuxbrew/.linuxbrew && eval $(/home/linuxbrew/.linuxbrew/bin/brew shellenv)
37+
echo "eval \$($(brew --prefix)/bin/brew shellenv)" >>~/.profile
38+
echo "::add-path::/home/linuxbrew/.linuxbrew/bin"
39+
brew --version
40+
- name: Install Siegfried
3341
run: |
3442
brew install richardlehane/digipres/siegfried
35-
shell: bash
36-
- name: Install ClamAV
43+
- name: Install and configure clamav
3744
run: |
38-
sudo apt-get install clamav
39-
shell: bash
40-
- name: Install tree
41-
run: |
42-
sudo apt-get install tree
43-
shell: bash
45+
brew install clamav
4446
- name: Install disktype
4547
run: |
46-
sudo apt-get install disktype
47-
shell: bash
48+
brew install disktype
4849
- name: Install md5deep
4950
run: |
50-
sudo apt-get install -y md5deep
51-
shell: bash
51+
brew install md5deep
5252
- name: Install sleuthkit
5353
run: |
54-
git clone git://github.com/sleuthkit/sleuthkit.git
55-
cd sleuthkit && ./bootstrap && ./configure && make && sudo make install && sudo ldconfig && cd ..
56-
shell: bash
54+
brew install sleuthkit
5755
- name: Install bulk_extractor
5856
run: |
59-
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
60-
sudo apt-get update && sudo apt-get install -y \
61-
git \
62-
g++-7 \
63-
libgnome-keyring-dev \
64-
icnsutils \
65-
graphicsmagick \
66-
xz-utils \
67-
libewf-dev \
68-
libssl-dev \
69-
libsqlite3-dev \
70-
libboost-dev \
71-
libicu-dev \
72-
libtool
73-
git clone --recursive https://github.com/tw4l/bulk_extractor && cd bulk_extractor && chmod 755 bootstrap.sh && ./bootstrap.sh && ./configure && make && sudo make install && cd ..
74-
shell: bash
75-
- name: Install script
57+
brew install bulk_extractor
58+
- name: Install unhfs
59+
run: |
60+
curl -sfL -o hfsexplorer-2021.10.9-bin.zip https://github.com/unsound/hfsexplorer/releases/download/hfsexplorer-2021.10.9/hfsexplorer-2021.10.9-bin.zip
61+
mkdir /usr/share/hfsexplorer
62+
unzip hfsexplorer-2021.10.9-bin.zip -d /usr/share/hfsexplorer/
63+
chmod +x /usr/share/hfsexplorer/bin/unhfs
64+
- name: Run install script
7665
run: |
77-
sudo ./install.sh
66+
./test-install.sh
7867
shell: bash
7968
- name: "Get pip cache dir"
8069
id: "pip-cache"

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -3,4 +3,5 @@
33
*.pyc
44
.tox/
55
.pytest_cache/
6-
.DS_Store
6+
.DS_Store
7+
venv/

.gitmodules

Lines changed: 0 additions & 3 deletions
This file was deleted.

.pre-commit-config.yaml

Lines changed: 2 additions & 1 deletion
@@ -1,8 +1,9 @@
11
repos:
22
- repo: https://github.com/ambv/black
3-
rev: 18.9b0
3+
rev: 23.1.0
44
hooks:
55
- id: black
6+
exclude: disk_image_toolkit/dfxml/
67
args: [--safe, --quiet]
78
language_version: python3
89
# - repo: https://gitlab.com/pycqa/flake8

Objects.py

Lines changed: 0 additions & 1 deletion
This file was deleted.

README.md

Lines changed: 16 additions & 18 deletions
@@ -1,7 +1,7 @@
11
# Disk Image Processor
22

33
Analyze disk images and/or create ready-to-ingest SIPs from a directory of disk images and related files.
4-
Version: 1.0.0
4+
Version: 1.1.0
55

66
## Usage
77

@@ -14,7 +14,8 @@ Underlying script: `diskimageanalyzer.py`
1414
In Analysis mode, each disk image is scanned and reported on. When complete, an "analysis.csv" file is created containing the following information for each disk image:
1515

1616
* Disk image name
17-
* File system
17+
* Volumes
18+
* File systems
1819
* Date statement
1920
* Date begin
2021
* Date end
@@ -26,7 +27,7 @@ The destination directory also contains a "reports" directory containing a sub-d
2627

2728
* A DFXML file
2829
* Text output from "disktype"
29-
* Brunnhilde reports (including logs and reports from clamAV and bulk_extractor)
30+
* Brunnhilde reports (including logs and reports from ClamAV and bulk_extractor)
3031

3132
Optionally, the destination directory may also contain a "files" directory, containing exported logical files from each recognized disk image in the source.
3233

@@ -38,11 +39,13 @@ Underlying script: `diskimageprocessor.py`
3839

3940
In Processing mode, each disk image is turned into a SIP, packaged as an ideal transfer to Archivematica's Automation tools, and reported on.
4041

41-
For disks with most file systems, `fiwalk` is used to generate DFXML and The Sleuth Kit's `tsk_recover` utility is used to carve allocated files from each disk image. Modified dates for the carved files are then restored from their recorded values in the fiwalk-generated DFXML file.
42+
From v1.1.0, Disk Image Processor will export files from multiple volumes if they are present on the disk image. In v1.0.0 and earlier, only one volume was exported depending on the first file system volume found by disktype.
4243

43-
For disks with an HFS file system, files are exported from the disk image using CLI version of HFSExplorer and DFXML is generated using the `walk_to_dfxml.py` script from the DFXML Python bindings.
44+
For most file systems, `fiwalk` is used to generate DFXML and The Sleuth Kit's `tsk_recover` utility is used to carve allocated files from each disk image. Modified dates for the carved files are then restored from their recorded values in the fiwalk-generated DFXML file.
4445

45-
For disks with a UDF file system, files are copied from the mounted disk image and `walk_to_dfxml.py` is used to generate DFXML.
46+
For HFS file systems, files are exported from the disk image using CLI version of HFSExplorer and DFXML is generated using the `walk_to_dfxml.py` script from the DFXML Python bindings.
47+
48+
For UDF file systems, files are copied from the mounted disk image and `walk_to_dfxml.py` is used to generate DFXML.
4649

4750
When complete, a "description.csv" spreadsheet is created containing some pre-populated archival description:
4851
* Date statement
@@ -59,9 +62,9 @@ By default, the "objects" directory in each SIP contains both a copy of a raw di
5962

6063
The "metadata/submissionDocumentation" directory in each SIP contains:
6164

62-
* A DFXML file
65+
* One or more DFXML files
6366
* Text output from "disktype"
64-
* Brunnhilde reports (including logs and reports from clamAV and, optionally, bulk_extractor)
67+
* Brunnhilde reports (including logs and reports from ClamAV and, optionally, bulk_extractor)
6568

6669
### Process a single disk image, providing options to tsk_recover (CLI only)
6770

@@ -104,24 +107,19 @@ Disk Image Processor recognizes which files are disk images by their file extens
104107

105108
## Installation and dependencies
106109

107-
This utility is designed for easy use in BitCurator v1.8.0+. All dependencies should already be installed in new releases of BitCurator. Installation outside of BitCurator is possible but difficult, with many dependencies, including Python3, PyQt5, TSK, Bulk Extractor, and the DFXML Python bindings. You will likely also need to modify some hardcoded paths in `main.py` and the processing scripts.
110+
This utility is designed for easy use in BitCurator versions 2-4. All dependencies should already be installed in new releases of BitCurator. Installation outside of BitCurator is possible but difficult, with many dependencies, including Python3, PyQt5, TSK, HFS Explorer, md5deep, and Bulk Extractor.
108111

109112
### Install as part of CCA Tools
110113

111114
Install all of the CCA Tools together using the installation script included in the [CCA Tools repo](https://github.com/CCA-Public/cca-tools).
112115

113116
### Install as a separate utility
117+
114118
* Install [PyQt5](https://www.riverbankcomputing.com/software/pyqt/download5):
115-
`sudo pip3 install pyqt5`
119+
`sudo pip3 install pyqt5`
116120
* Clone this repo to your local machine.
117-
* Make install script executable (may need to be run with sudo privileges):
118-
`chmod u+x install.sh`
119-
* Run the install script with sudo privileges:
120-
`sudo ./install.sh`
121-
122-
### PyQt4 version
123-
124-
Please note that Disk Image Processor v1.0.0 uses PyQt5. Installation of PyQt5 may cause issues with existing PyQt4 programs in BitCurator. For the a PyQt4 version of the Disk Image Processor that will not affect the functionality of other tools, see the [0.7.3 release](https://github.com/CCA-Public/diskimageprocessor/releases/tag/v0.7.3).
121+
* Run the install script with sudo privileges (assuming BitCurator 4; for BitCurator 2-3 run `./install-bc2-ubuntu18.sh` instead):
122+
`sudo ./install.sh`
125123

126124
## Credit
127125

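The updated README text in this diff describes three export paths, chosen per file system, and says that from v1.1.0 every volume on a disk image is exported. A minimal Python sketch of that dispatch is given here for illustration only. The function names and file system labels are hypothetical and are not taken from this commit's code; treating only the exact label "hfs" as HFS is also an assumption.

```python
from typing import List, Tuple


def export_method(file_system: str) -> str:
    """Return the export path the README describes for a file system label.

    The labels ("hfs", "udf") are assumptions made for this sketch.
    """
    label = file_system.strip().lower()
    if label == "hfs":
        # README: files are exported with the CLI version of HFSExplorer.
        return "unhfs"
    if label == "udf":
        # README: files are copied from the mounted disk image.
        return "copy from mount"
    # README: most file systems are carved with The Sleuth Kit's tsk_recover.
    return "tsk_recover"


def dfxml_method(file_system: str) -> str:
    """Return the DFXML generator the README pairs with each export path."""
    label = file_system.strip().lower()
    if label in ("hfs", "udf"):
        return "walk_to_dfxml.py"
    return "fiwalk"


def plan_volumes(file_systems: List[str]) -> List[Tuple[int, str, str]]:
    """Plan one export per volume, numbering volumes from 1."""
    return [
        (index, export_method(fs), dfxml_method(fs))
        for index, fs in enumerate(file_systems, start=1)
    ]


if __name__ == "__main__":
    # A hypothetical disk image with three volumes.
    for volume, export, dfxml in plan_volumes(["FAT12", "HFS", "UDF"]):
        print(f"volume {volume}: export via {export}, DFXML via {dfxml}")
```

Planning per volume rather than per disk image mirrors the v1.1.0 change noted in the README, where earlier versions exported only the first file system volume found by disktype.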
deps/dfxml

Lines changed: 0 additions & 1 deletion
This file was deleted.

dfxml.py

Lines changed: 0 additions & 1 deletion
This file was deleted.

disk_image_toolkit/__init__.py

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
from .disk_image import DiskImage
2+
3+
__all__ = ["DiskImage"]
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
1+
"""
2+
Python modules copied from from Simson Garfinkel's DFXML project, copied here
3+
to aid in the longevity of the codebase.
4+
5+
Objects.py has been renamed to objects.py and some imports are modified,
6+
otherwise the included files are unchanged from source.
7+
8+
Source:
9+
https://github.com/dfxml-working-group/dfxml_python
10+
11+
Many thanks to Simson, Alex Nelson, and all those who work on DFXML.
12+
13+
License:
14+
# This software was developed at the National Institute of Standards
15+
# and Technology in whole or in part by employees of the Federal
16+
# Government in the course of their official duties. Pursuant to
17+
# title 17 Section 105 of the United States Code portions of this
18+
# software authored by NIST employees are not subject to copyright
19+
# protection and are in the public domain. For portions not authored
20+
# by NIST employees, NIST has been granted unlimited rights. NIST
21+
# assumes no responsibility whatsoever for its use by other parties,
22+
# and makes no guarantees, expressed or implied, about its quality,
23+
# reliability, or any other characteristic.
24+
#
25+
# We would appreciate acknowledgement if the software is used.
26+
"""
