Merge pull request #47 from Evian-Zhang/docs

wtdcode · web-flow · commit 9185bc9cfbaa · 2025-06-10T08:00:07.000+08:00
Add docs
diff --git a/Readme.md b/Readme.md
@@ -11,15 +11,25 @@ Starting from v3.0.0, unicornafl is fully rewritten with `libafl_targets` in Rus
 To use `unicornafl` as a library, just add this to your `Cargo.toml`
 
 ```toml
-unicornafl = {git = "https://github.com/AFLplusplus/unicornafl", branch = "main"}
+unicornafl = { git = "https://github.com/AFLplusplus/unicornafl", branch = "main" }
 ```
 
 `main` is used here because `unicorn` is not released yet. We will make it ready shortly.
 
+For more details, please refer to [Rust usage](./docs/rust-usage.md).
+
 ### Python
 
 At this moment, manual building is required (see below) but we will soon release wheels.
 
+For more details, please refer to [Python usage](./docs/python-usage.md).
+
+### C/C++
+
+After building this repo, you can link against the generated static archive or shared library and include the C/C++ header file [include/unicornafl.h](./include/unicornafl.h).
+
+For more details, please refer to [C/C++ usage](./docs/c-usage.md).
+
 ## Build
 
 Simply do:
@@ -33,7 +43,7 @@ cargo build --release
 For python bindings, we have:
 
 ```bash
-maturin build
+maturin build --release
 ```
 
 ## Example && Minimal Tutorial
@@ -62,6 +72,8 @@ afl-fuzz -i ./input -o ./output-8 -b 1 -g 8 -G 8 -V 60 -c 0 -- ./target/release/
 
 This shall find the crash instantly, thanks to the `cmplog` integration.
 
+For more details, please refer to [Fuzzing using UnicornAFL](./docs/fuzzing.md).
+
 ## Migration
 
-There should be nothing special migrating from unicornafl v2.x to unicornafl v3.x, execpt the way integrating with `AFL++`. If your harness builds and statically links against unicornafl directly, there is no longer needed for the unicorn mode with `AFL++`. However, for Python users with `libunicornafl.so` dynamically linked, unicorn mode is still needed for `AFL++` command line.
+Migrating from unicornafl v2.x to unicornafl v3.x should require nothing special, except for how the harness integrates with AFL++. If your harness builds and statically links against unicornafl directly, unicorn mode is no longer needed with AFL++. However, if you are using Python, or C/C++ with `libunicornafl.so` dynamically linked, unicorn mode (the `-U` option) is still needed on the `afl-fuzz` command line.
diff --git a/docs/c-usage.md b/docs/c-usage.md
@@ -0,0 +1,106 @@
+# C/C++ Usage for UnicornAFL
+
+To use UnicornAFL with C/C++, you should clone this repository and build it yourself:
+
+```shell
+git clone --depth 1 https://github.com/AFLplusplus/unicornafl && cd unicornafl
+cargo build --release
+```
+
+Before building this repo, make sure that you have installed the dependencies needed to build [Unicorn](https://github.com/unicorn-engine/unicorn) and a stable Rust compiler, version 1.87.0 or later.
+
+After building, `libunicornafl.a` and `libunicornafl.so` will be in the `./target/release/` directory. To use UnicornAFL, link against either one and include the header file at `./include/unicornafl.h`.
+
+## API usage
+
+The UnicornAFL API is simple but powerful. It consists of two functions: `uc_afl_fuzz` and `uc_afl_fuzz_custom`.
+
+### Simplified API
+
+`uc_afl_fuzz`
+
+```c
+uc_afl_ret uc_afl_fuzz(uc_engine* uc, char* input_file,
+                       uc_afl_cb_place_input_t place_input_callback,
+                       uint64_t* exits, size_t exit_count,
+                       uc_afl_cb_validate_crash_t validate_crash_callback,
+                       bool always_validate, uint32_t persistent_iters,
+                       void* data);
+```
+
+`uc` is a Unicorn instance created in advance. See [Creating Unicorn Instance](#creating-unicorn-instance) below for more details.
+
+`input_file` is the path to the input file. In fuzzing mode, pass `NULL` for this argument; the input seed directory is passed to `afl-fuzz` instead. In standalone mode, UnicornAFL reads its input from this file.
+
+`place_input_callback` is the callback UnicornAFL uses to place the received input into Unicorn's memory space. It takes five arguments: a pointer to the Unicorn instance (which you can use to read and write the emulated CPU and memory), a pointer to the input buffer, the input buffer length, the persistent round (how many times this harness has executed without exiting and forking another child process), and the custom data. It should return a bool indicating whether this input is acceptable.
+
+`exits` and `exit_count` specify the exit points for Unicorn. When the Unicorn instance reaches one of the given exit addresses, UnicornAFL moves on to the next round.
+
+`validate_crash_callback` is the callback UnicornAFL invokes when an error is encountered while executing the harness. It takes six arguments: a pointer to the Unicorn instance, a value indicating the Unicorn error raised while executing the harness, a pointer to the input buffer, the input buffer length, the persistent round, and the custom data. It should return a bool. If it returns `false`, the AFL++ main executable will not treat this round as a crash. This can be used to eliminate false positives during fuzzing.
+
+`always_validate` specifies whether `validate_crash_callback` is invoked even if Unicorn does not encounter an error during execution.
+
+`persistent_iters` specifies how many times the harness is executed persistently before the parent forks another child. For simplicity, you can pass `1` here, which means exiting and forking every time the harness ends. However, if you want a more efficient harness, you should consider running persistently. Passing `0` means running persistently without ever exiting or forking, unless the process crashes.
+
+`data` is a pointer to custom data. It is passed as an argument to each callback listed above, so you can use it to maintain shared data across executions.
+
+This function returns a `uc_afl_ret`. Any value other than `UC_AFL_RET_OK` means that something unexpected happened during fuzzing that you should handle.
+
+### Advanced API
+
+`uc_afl_fuzz_custom`
+
+```c
+uc_afl_ret uc_afl_fuzz_custom(uc_engine* uc, char* input_file,
+                              uc_afl_cb_place_input_t place_input_callback,
+                              uc_afl_fuzz_cb_t fuzz_callbck,
+                              uc_afl_cb_validate_crash_t validate_crash_callback,
+                              bool always_validate, uint32_t persistent_iters,
+                              void* data);
+```
+
+Most of the arguments are the same as in the simplified API. The only difference is the `fuzz_callbck` argument, which takes the place of `exits` and `exit_count`. UnicornAFL calls this function to start one execution round, and when the function returns, UnicornAFL knows the round has ended. The simplified API just uses `uc_emu_start()` for this.
+
+### Creating Unicorn Instance
+
+Before using the fuzzing APIs, you should create a Unicorn instance on your own. Note that UnicornAFL does not need to know the actual target to fuzz. Instead, you set up your target in the Unicorn instance manually (for example, by mapping the code into Unicorn's memory space).
+
+## Tips
+
+### Linking
+
+Note that `libunicornafl.a` and `libunicornafl.so` already bundle Unicorn. As a result, you don't need to link Unicorn manually.
+
+### Use a different version of Unicorn
+
+Note that UnicornAFL's internals depend heavily on some of the newest Unicorn APIs, so older versions of Unicorn may not work. If you still want to use your own version of Unicorn, modify the `Cargo.toml` in this repo.
+
+First, find the following line:
+
+```toml
+unicorn-engine = { git = "https://github.com/unicorn-engine/unicorn", branch = "dev" }
+```
+
+If you want to use a copy of Unicorn on the local filesystem, change this line to
+
+```toml
+unicorn-engine = { path = "/path/to/unicorn/bindings/rust" }
+```
+
+Note that the `bindings/rust` suffix is necessary.
+
+If you want to use a forked Unicorn, or Unicorn hosted on a remote Git server, change this line to
+
+```toml
+unicorn-engine = { git = "http://my/own/unicorn/fork" }
+```
+
+### Debugging
+
+UnicornAFL emits many log messages that can be used for debugging. To enable logging, compile this repo using
+
+```shell
+cargo build --release --features env_logger
+```
+
+Then, when running, set `RUST_LOG=trace` in the environment. (`AFL_DEBUG=1` is also needed if you are using `afl-fuzz` to run the harness.)
diff --git a/docs/fuzzing.md b/docs/fuzzing.md
@@ -0,0 +1,70 @@
+# Fuzzing using UnicornAFL
+
+UnicornAFL is a bridge between AFL++ and Unicorn. 
+
+## Running Mode
+
+A harness built with UnicornAFL supports two running modes: standalone mode and fuzzing mode.
+
+### Standalone Mode
+
+This mode is not intended for fuzzing. Instead, use it to check that your harness is written correctly. It is also helpful for analyzing crashes found by AFL++.
+
+To run the harness in standalone mode, execute the harness executable that uses UnicornAFL directly, without `afl-fuzz`. The command-line options of the harness are defined by you. The harness should accept a path to a file on its command line and pass it as the `input_file` argument of the UnicornAFL API. UnicornAFL will then use that file as the input when executing the target under test in the Unicorn engine.
+
+Before any fuzzing, create a normal input seed that is not expected to crash the harness, then run the harness in standalone mode to check that it executes normally. If anything unexpected happens in standalone mode, the harness is likely written incorrectly.
+
+### Fuzzing Mode
+
+After testing the correctness of the harness, you can fuzz it using `afl-fuzz`. How you invoke `afl-fuzz` with UnicornAFL depends on how you built the harness.
+
+If you are using Rust, or C/C++ statically linked against `libunicornafl.a`, the minimal working example is
+
+```shell
+afl-fuzz \
+    -i input \
+    -o output \
+    -- \
+    ./your-harness --and-your-own-harness-options
+```
+
+If you are using Python, or C/C++ dynamically linked against `libunicornafl.so`, the minimal working example is
+
+```shell
+afl-fuzz \
+    -U \
+    -i input \
+    -o output \
+    -- \
+    ./your-harness --and-your-own-harness-options
+```
+
+The `-U` option specifies that this is the legacy Unicorn mode.
+
+Note that you don't need to use `@@` to specify the input file, because UnicornAFL receives the input through shared memory.
+
+## Persistent Fuzzing
+
+UnicornAFL supports persistent fuzzing. Instead of forking at the beginning of each execution round, persistent fuzzing executes the target in a `for` loop. The overall steps are:
+
+1. You invoke `afl-fuzz` and pass the path to your UnicornAFL harness.
+2. `afl-fuzz` spawns a harness process (which we call the harness parent).
+3. The harness process executes until it enters one of UnicornAFL's APIs (`uc_afl_fuzz` or `uc_afl_fuzz_custom`). It then forks itself, producing another process (which we call the harness child).
+4. The harness child runs a loop that executes the target with the Unicorn engine repeatedly. Each round counts as one execution for `afl-fuzz`.
+5. When the user-specified `persistent_iters` is reached, or the harness child process crashes (which is rare, since exceptions should already be caught by Unicorn), the harness child exits. The harness parent then forks a new harness child, which does the same thing.
+
+Since the target is executed repeatedly in the harness child, it is very important that **you restore Unicorn's state after each round**, unless you can be sure the target does not modify Unicorn's CPU and memory state during the round. To make things easier, you can set `persistent_iters` to 1, which falls back to the legacy forkserver-based fuzzing and is significantly slower.
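+
+One possible way to restore state in a Python harness, as a minimal sketch rather than the project's prescribed method: record the CPU context and the memory regions the target may modify before fuzzing starts, then put them back at the start of each round (for example, from the place-input callback). `context_save`, `context_restore`, `mem_read`, and `mem_write` are methods of Unicorn's Python binding; the `Snapshot` class and the `regions` list are hypothetical names used only for illustration.
+
+```python
+class Snapshot:
+    """Hypothetical helper: records CPU context and selected memory regions."""
+
+    def __init__(self, uc, regions):
+        # regions: list of (address, size) pairs that the target may modify
+        self.context = uc.context_save()
+        self.memory = [(addr, bytes(uc.mem_read(addr, size)))
+                       for addr, size in regions]
+
+    def restore(self, uc):
+        # Put registers and memory back to the recorded state
+        uc.context_restore(self.context)
+        for addr, content in self.memory:
+            uc.mem_write(addr, content)
+```
+
+Copying whole regions on every round costs time, so in practice you would probably list only the regions the target actually writes to.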
+
+## CMPLOG and CMPCOV
+
+UnicornAFL also supports CMPLOG and CMPCOV in AFL++. If you don't know these terms, please refer to the AFL++ documentation. In short, they help the fuzzer get past comparisons against long constants, such as `CMP RAX, 0x114514`.
+
+To use CMPCOV mode, set the `UNICORN_AFL_CMPCOV=1` environment variable when running `afl-fuzz`.
+
+To use CMPLOG mode, add the `-c 0` option to `afl-fuzz`.
+
+## Which language should I choose to use?
+
+The choice of language may have a small effect on fuzzing throughput, but keep in mind that the main overhead is the target itself.
+
+Although not benchmarked, Rust may be slightly faster than C/C++ thanks to inlining and LTO. The Python version is much slower. However, since the language has only a small effect, it is more appropriate to choose the language you are good at. Don't struggle with the language itself, it is fuzzing that is all you need :)
diff --git a/docs/python-usage.md b/docs/python-usage.md
@@ -0,0 +1,113 @@
+# Python Usage for UnicornAFL
+
+To use UnicornAFL with Python, you should clone this repository and build it yourself:
+
+```shell
+git clone --depth 1 https://github.com/AFLplusplus/unicornafl && cd unicornafl
+cargo build --release
+maturin build --release
+```
+
+Before building this repo, make sure that you have installed the dependencies needed to build [Unicorn](https://github.com/unicorn-engine/unicorn), a stable Rust compiler, version 1.87.0 or later, and [maturin](https://www.maturin.rs).
+
+After building, there will be a wheel in `./target/wheels`, which you can install with `pip`.
+
+## API usage
+
+The UnicornAFL API is simple but powerful. It consists of two functions: `uc_afl_fuzz` and `uc_afl_fuzz_custom`.
+
+### Simplified API
+
+`uc_afl_fuzz`
+
+```python
+def uc_afl_fuzz(uc: Uc,
+                input_file: str,
+                place_input_callback: Callable,
+                exits: List[int],
+                validate_crash_callback: Callable = None,
+                always_validate: bool = False,
+                persistent_iters: int = 1,
+                data: Any = None): ...
+```
+
+`uc` is a Unicorn instance created in advance. See [Creating Unicorn Instance](#creating-unicorn-instance) below for more details.
+
+`input_file` is the path to the input file. In fuzzing mode, pass `None` for this argument; the input seed directory is passed to `afl-fuzz` instead. In standalone mode, UnicornAFL reads its input from this file.
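+
+As a minimal sketch of how a harness might choose between the two modes, the helper below returns a path when one is given on the command line (standalone mode) and `None` otherwise (fuzzing mode). The function name `parse_input_file` and the positional-argument layout are assumptions made for illustration; your harness is free to define its own options.
+
+```python
+import argparse
+
+
+def parse_input_file(argv):
+    """Return the input path for standalone mode, or None for fuzzing mode."""
+    parser = argparse.ArgumentParser(description="hypothetical harness")
+    # Optional positional argument: omitted when running under afl-fuzz
+    parser.add_argument("input_file", nargs="?", default=None,
+                        help="test case to replay in standalone mode")
+    return parser.parse_args(argv).input_file
+```
+
+The returned value can then be passed as the `input_file` argument, for example via `parse_input_file(sys.argv[1:])`.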
+
+`place_input_callback` is the callback UnicornAFL uses to place the received input into Unicorn's memory space. It takes four arguments: the Unicorn instance (which you can use to read and write the emulated CPU and memory), the input buffer, the persistent round (how many times this harness has executed without exiting and forking another child process), and the custom data. It should return a `bool` indicating whether this input is acceptable.
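+
+A place-input callback could look like the sketch below. `INPUT_ADDR` and `INPUT_MAX` are hypothetical values that assume your harness has already mapped a buffer at that address, and the input is assumed to be bytes-like; `mem_write` is the Unicorn Python binding's method for writing emulated memory.
+
+```python
+INPUT_ADDR = 0x200000  # hypothetical address of a pre-mapped input buffer
+INPUT_MAX = 0x1000     # hypothetical size of that mapping
+
+
+def place_input(uc, input_bytes, persistent_round, data):
+    """Copy one test case into emulated memory; return False to reject it."""
+    if len(input_bytes) == 0 or len(input_bytes) > INPUT_MAX:
+        # Empty or oversized inputs are not acceptable for this target
+        return False
+    uc.mem_write(INPUT_ADDR, bytes(input_bytes))
+    return True
+```
+
+A real harness would usually also point a register or a length field at the input, which depends on the target's calling convention.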
+
+`exits` specifies the exit points for Unicorn. When the Unicorn instance reaches one of the given exit addresses, UnicornAFL moves on to the next round.
+
+`validate_crash_callback` is the callback UnicornAFL invokes when an error is encountered while executing the harness. It takes five arguments: the Unicorn instance, a value indicating the Unicorn error raised while executing the harness, the input buffer, the persistent round, and the custom data. It should return a `bool`. If it returns `False`, the AFL++ main executable will not treat this round as a crash. This can be used to eliminate false positives during fuzzing.
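+
+As an illustration only, a crash validator might filter out error values the harness author considers harmless. The sketch assumes that `data` is a dict prepared by the harness with a hypothetical `ignored_errors` entry, and that the error value can be compared against the entries stored there; neither is required by UnicornAFL.
+
+```python
+def validate_crash(uc, error, input_bytes, persistent_round, data):
+    """Return False to tell AFL++ that this round is not a real crash."""
+    # Assumption: `data` is a dict set up by the harness author, and `error`
+    # can be compared against the values listed under "ignored_errors".
+    ignored = (data or {}).get("ignored_errors", ())
+    return error not in ignored
+```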
+
+`always_validate` specifies whether `validate_crash_callback` is invoked even if Unicorn does not encounter an error during execution.
+
+`persistent_iters` specifies how many times the harness is executed persistently before the parent forks another child. For simplicity, you can pass `1` here, which means exiting and forking every time the harness ends. However, if you want a more efficient harness, you should consider running persistently. Passing `0` means running persistently without ever exiting or forking, unless the process crashes.
+
+`data` is a custom data object. It is passed as an argument to each callback listed above, so you can use it to maintain shared data across executions.
+
+This function returns either a `UcAflError` or the value `UC_AFL_RET_OK`. Any return value other than `UC_AFL_RET_OK` means that something unexpected happened during fuzzing that you should handle.
+
+### Advanced API
+
+`uc_afl_fuzz_custom`
+
+```python
+def uc_afl_fuzz_custom(uc: Uc,
+                       input_file: str,
+                       place_input_callback: Callable,
+                       fuzzing_callback: Callable,
+                       validate_crash_callback: Callable = None,
+                       always_validate: bool = False,
+                       persistent_iters: int = 1,
+                       data: Any = None): ...
+```
+
+Most of the arguments are the same as in the simplified API. The only difference is the `fuzzing_callback` argument, which takes the place of `exits`. UnicornAFL calls this function to start one execution round, and when the function returns, UnicornAFL knows the round has ended. The simplified API just uses `uc_emu_start()` for this.
+
+### Creating Unicorn Instance
+
+Before using the fuzzing APIs, you should create a Unicorn instance on your own. Note that UnicornAFL does not need to know the actual target to fuzz. Instead, you set up your target in the Unicorn instance manually (for example, by mapping the code into Unicorn's memory space).
+
+## Tips
+
+### Use a different version of Unicorn
+
+Note that UnicornAFL's internals depend heavily on some of the newest Unicorn APIs, so older versions of Unicorn may not work. If you still want to use your own version of Unicorn, modify the `Cargo.toml` in this repo.
+
+First, find the following line:
+
+```toml
+unicorn-engine = { git = "https://github.com/unicorn-engine/unicorn", branch = "dev" }
+```
+
+If you want to use a copy of Unicorn on the local filesystem, change this line to
+
+```toml
+unicorn-engine = { path = "/path/to/unicorn/bindings/rust" }
+```
+
+Note that the `bindings/rust` suffix is necessary.
+
+If you want to use a forked Unicorn, or Unicorn hosted on a remote Git server, change this line to
+
+```toml
+unicorn-engine = { git = "http://my/own/unicorn/fork" }
+```
+
+### Linking
+
+To use UnicornAFL and Unicorn at the same time, make sure that the Unicorn version UnicornAFL uses is consistent with the version of the Unicorn Python package. You can then import the Unicorn package and the UnicornAFL package at the same time.
+
+When building the Python binding, we dynamically link the Unicorn shared library. As a result, using UnicornAFL and the Unicorn package together works as long as the Unicorn versions do not conflict.
+
+### Debugging
+
+UnicornAFL emits many log messages that can be used for debugging. To enable logging, compile this repo using
+
+```shell
+cargo build --release --features env_logger
+```
+
+Then, when running, set `RUST_LOG=trace` in the environment. (`AFL_DEBUG=1` is also needed if you are using `afl-fuzz` to run the harness.)
diff --git a/docs/rust-usage.md b/docs/rust-usage.md