| 1 | 1 | # wav-files-vad-api |
| 2 | 2 | |
| 3 | | -This project provides a simple API for Voice Activity Detection (VAD) on WAV audio files. |
| 3 | +A command-line tool for recursively performing Voice Activity Detection (VAD) on WAV audio files using one or more external API servers. It validates files against a specific format (mono channel, 16-bit PCM, 16kHz sample rate), processes them in parallel, preserves the input directory structure in the output, and provides summary statistics on completion. |
| 4 | + |
| 5 | +## Features |
| 6 | + |
| 7 | +- **Recursive Scanning**: Walks the input directory tree to find all `.wav` files using `walkdir`. |
| 8 | +- **Format Validation**: Ensures WAV files meet the required specs (mono, 16-bit PCM, 16kHz) using the `hound` crate. |
| 9 | +- **Parallel Processing**: Leverages `rayon` to process files concurrently, with the degree of parallelism matching the number of provided API servers. |
| 10 | +- **API Integration**: Distributes load by sending JSON requests to a list of external VAD APIs via `ureq` and handles responses. |
| 11 | +- **Robust Error Handling**: Uses `anyhow` for contextual error propagation and clear logging. |
| 12 | +- **Directory Preservation**: Mirrors the input folder structure in the output directory. |
| 13 | +- **CLI-Friendly**: Built with `clap` for intuitive argument parsing and help output. |
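
The format validation listed above comes down to comparing a few header fields. The following is a minimal sketch, not the project's actual code: `WavFormat` is a hypothetical stand-in for the header values the tool reads through `hound`, and `is_supported` is an illustrative name.

```rust
/// Hypothetical stand-in for the header fields read from a WAV file.
/// The real tool obtains these values via the `hound` crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct WavFormat {
    pub channels: u16,
    pub bits_per_sample: u16,
    pub sample_rate: u32,
    /// True for integer PCM, false for other encodings such as IEEE float.
    pub is_integer_pcm: bool,
}

/// Returns true only for mono, 16-bit integer PCM sampled at 16 kHz.
pub fn is_supported(format: &WavFormat) -> bool {
    format.channels == 1
        && format.bits_per_sample == 16
        && format.sample_rate == 16_000
        && format.is_integer_pcm
}
```

Files that fail a check like this would be reported as skipped rather than sent to an API server.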
| 14 | + |
| 15 | +## Prerequisites |
| 16 | + |
| 17 | +- Rust 1.85+ (stable channel; the `2024` edition requires at least this version) |
| 18 | +- An external API server running at the specified address(es), accepting POST requests with JSON payloads for VAD. |
| 19 | + - Expected request body: `{ "input_file": String, "output_dir": String, "model": Option<String> }` |
| 20 | + - Expected success response: HTTP status `200 OK`. |
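
To make the request shape concrete, here is a dependency-free sketch of building that JSON body. The project itself uses `serde`; the hand-written serializer below exists only so the example runs on its own. `VadRequest`, `escape_json`, and `to_json` are illustrative names, and emitting `null` for an absent `model` is an assumption, since the expected body above does not say whether the field may be omitted instead.

```rust
/// Hypothetical request type mirroring the documented body.
pub struct VadRequest {
    pub input_file: String,
    pub output_dir: String,
    pub model: Option<String>,
}

/// Escapes a string for use inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl VadRequest {
    /// Serializes the request; an absent model becomes `null` (assumption).
    pub fn to_json(&self) -> String {
        let model = match &self.model {
            Some(m) => format!("\"{}\"", escape_json(m)),
            None => "null".to_string(),
        };
        format!(
            "{{\"input_file\":\"{}\",\"output_dir\":\"{}\",\"model\":{}}}",
            escape_json(&self.input_file),
            escape_json(&self.output_dir),
            model
        )
    }
}
```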
| 21 | + |
| 22 | +## Installation |
| 23 | + |
| 24 | +### From GitHub Releases |
| 25 | + |
| 26 | +Statically-linked Linux binaries are available for download from the Releases page. |
| 27 | + |
| 28 | +### From Source |
| 29 | + |
| 30 | +1. Clone the repository: |
| 31 | + ```bash |
| 32 | + git clone https://github.com/RustedBytes/wav-files-vad-api.git |
| 33 | + cd wav-files-vad-api |
| 34 | + ``` |
| 35 | + |
| 36 | +2. Build the project: |
| 37 | + ```bash |
| 38 | + cargo build --release |
| 39 | + ``` |
| 40 | + |
| 41 | + The binary will be available at `target/release/wav-files-vad-api`. |
| 42 | + |
| 43 | +## Usage |
| 44 | + |
| 45 | +Run the tool with the required input and output directories, and at least one API server address. |
| 46 | + |
| 47 | +```bash |
| 48 | +wav-files-vad-api /path/to/input/dir /path/to/output/dir --addr-api http://localhost:8000/vad |
| 49 | +``` |
| 50 | + |
| 51 | +### Arguments |
| 52 | + |
| 53 | +- `INPUT_DIR`: Path to the directory containing WAV files (scanned recursively). |
| 54 | +- `OUTPUT_DIR`: Path to the directory where VAD output files will be saved (created if it doesn't exist). |
| 55 | +- `--addr-api <ADDR_API>`: A comma-separated list of VAD API server URLs. Work will be distributed among them. (Required) |
| 56 | +- `--model <MODEL>`: An optional model name to pass to the VAD API. |
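
The `--addr-api` description says work is distributed among the servers but not how. The sketch below assumes plain round-robin assignment, which may differ from what the tool actually does. `parse_addrs` and `assign_round_robin` are hypothetical names, and trimming whitespace around each URL is a choice made for this sketch.

```rust
/// Splits a comma-separated `--addr-api` value into individual URLs,
/// trimming whitespace and dropping empty entries.
pub fn parse_addrs(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}

/// Pairs each file with a server in round-robin order (assumed strategy).
/// Returns an empty list when no servers are given.
pub fn assign_round_robin<'a>(
    files: &'a [String],
    servers: &'a [String],
) -> Vec<(&'a str, &'a str)> {
    if servers.is_empty() {
        return Vec::new();
    }
    files
        .iter()
        .zip(servers.iter().cycle())
        .map(|(f, s)| (f.as_str(), s.as_str()))
        .collect()
}
```

With two servers, the first and third files go to the first server and the second file to the second, so each server receives roughly the same number of requests.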
| 57 | + |
| 58 | +### Example |
| 59 | + |
| 60 | +Process all valid WAV files in `./raw_audio/` and save results to `./processed_audio/` using two local API servers for parallel execution: |
| 61 | + |
| 62 | +```bash |
| 63 | +./target/release/wav-files-vad-api ./raw_audio ./processed_audio --addr-api http://127.0.0.1:8001/vad,http://127.0.0.1:8002/vad |
| 64 | +``` |
| 65 | + |
| 66 | +Example output: |
| 67 | +``` |
| 68 | +Skipping invalid WAV file: ./raw_audio/unsupported_format.wav |
| 69 | +Error processing ./raw_audio/corrupted.wav: Failed to open WAV file: ./raw_audio/corrupted.wav |
| 70 | +VAD failed for ./raw_audio/no_speech.wav: API returned status 500 |
| 71 | +VAD complete: 42 files processed, 3 skipped. |
| 72 | +``` |
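
Directory preservation, listed under Features, can be sketched with the standard library alone: strip the input root from each file path and re-join the remainder under the output root. `mirrored_output_path` is a hypothetical helper. How the tool derives the `output_dir` value it sends to the API is not described here, so treat this only as an illustration of the path mapping.

```rust
use std::path::{Path, PathBuf};

/// Maps an input file to its mirrored location under `output_dir`.
/// Returns `None` if `input_file` does not live inside `input_dir`.
pub fn mirrored_output_path(
    input_dir: &Path,
    output_dir: &Path,
    input_file: &Path,
) -> Option<PathBuf> {
    // Path of the file relative to the input root, e.g. `speaker1/a.wav`.
    let relative = input_file.strip_prefix(input_dir).ok()?;
    Some(output_dir.join(relative))
}
```

For the example command above, a file at `./raw_audio/speaker1/a.wav` would map to `./processed_audio/speaker1/a.wav` under this scheme.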
| 73 | + |
| 74 | +## Dependencies |
| 75 | + |
| 76 | +This tool relies on the following crates (as defined in `Cargo.toml`): |
| 77 | + |
| 78 | +| Crate | Purpose | Version | |
| 79 | +|---|---|---| |
| 80 | +| `anyhow` | Contextual error handling | `1.0` | |
| 81 | +| `clap` | CLI argument parsing | `4.5` | |
| 82 | +| `hound` | WAV file reading and validation | `3.5` | |
| 83 | +| `rayon` | Data parallelism | `1.11` | |
| 84 | +| `serde` | JSON serialization/deserialization | `1.0` | |
| 85 | +| `ureq` | HTTP client for API requests | `3.1` | |
| 86 | +| `walkdir` | Recursive directory traversal | `2.5` | |
| 87 | + |
| 88 | +## Contributing |
| 89 | + |
| 90 | +1. Fork the repo. |
| 91 | +2. Create a feature branch (`git checkout -b feature/my-feature`). |
| 92 | +3. Commit changes (`git commit -am 'Add my feature'`). |
| 93 | +4. Push to the branch (`git push origin feature/my-feature`). |
| 94 | +5. Open a Pull Request. |
| 95 | + |
| 96 | +Please ensure code is formatted with `cargo fmt` before submitting. |
| 97 | + |
| 98 | +## License |
| 99 | + |
| 100 | +This project is licensed under the MIT License - see the LICENSE file for details. |
| 101 | + |