A command-line tool for recursively analyzing WAV audio files in a directory (including subfolders) to detect human speech using advanced Voice Activity Detection (VAD) based on WebRTC algorithms. Files are separated into speech and non_speech subfolders within the output directory, preserving the original folder structure.
- Recursive Processing: Scans the input directory and all subdirectories for .wav files.
- Advanced Speech Detection: Uses WebRTC VAD (via the earshot crate) with energy-based statistical analysis on 20 ms frames at 16 kHz for robust detection in noisy environments.
- Structure Preservation: Copies files to the output directory while maintaining relative paths.
- Error Handling: Graceful handling of unsupported formats/channels with informative messages.
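To make the "energy-based analysis on 20 ms frames at 16 kHz" idea concrete, here is a minimal std-only sketch. It is not the tool's implementation and does not call the earshot crate: it substitutes a plain RMS energy gate for WebRTC VAD, and the names `FRAME_LEN`, `frame_rms`, `active_frame_ratio`, and the threshold value are all hypothetical.

```rust
/// Samples in one 20 ms frame at 16 kHz (16_000 * 20 / 1000).
const FRAME_LEN: usize = 320;

/// Root-mean-square energy of one frame of 16-bit samples.
fn frame_rms(frame: &[i16]) -> f64 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = frame.iter().map(|&s| (s as f64) * (s as f64)).sum();
    (sum_sq / frame.len() as f64).sqrt()
}

/// Fraction of complete 20 ms frames whose RMS exceeds `threshold`.
/// Hypothetical helper: a simple energy gate, NOT the WebRTC VAD decision.
/// A trailing partial frame is ignored.
fn active_frame_ratio(samples: &[i16], threshold: f64) -> f64 {
    let mut total = 0usize;
    let mut active = 0usize;
    for frame in samples.chunks_exact(FRAME_LEN) {
        total += 1;
        if frame_rms(frame) > threshold {
            active += 1;
        }
    }
    if total == 0 {
        0.0
    } else {
        active as f64 / total as f64
    }
}
```

A caller could label a file as speech when `active_frame_ratio` exceeds some cutoff. The cutoff and the energy threshold are assumptions that would need tuning for each noise profile.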
- Ensure you have Rust installed (version 1.75+ recommended).
- Clone the repository:
  git clone https://github.com/RustedBytes/wav-files-vad
  cd wav-files-vad
- Build the project:
  cargo build --release
  The binary will be available at target/release/wav-files-vad.
Run the tool with the input directory (required) and optional output directory:
wav-files-vad [OPTIONS] <INPUT_DIR>
Args:
<INPUT_DIR> Input directory containing WAV files (processed recursively)
Options:
-o, --output-dir <OUTPUT_DIR> Output directory for separated files (creates 'speech' and 'non_speech' subfolders). Defaults to 'output' in the current directory
-h, --help Print help
-V, --version Print version
# Process ./audio_samples/ and output to ./results/
wav-files-vad ./audio_samples/ -o ./results/
This will:
- Create ./results/speech/ and ./results/non_speech/.
- Copy detected files while preserving subfolder structure (e.g., ./audio_samples/sub/dir/file.wav → ./results/speech/sub/dir/file.wav).
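The path mapping in the example above can be sketched with the standard library alone. This is an illustration under assumptions, not the tool's actual code: `destination_path` is a hypothetical helper, and only the `speech` and `non_speech` folder names come from this README.

```rust
use std::path::{Path, PathBuf};

/// Map an input file to its destination under `output_root`,
/// keeping the path relative to `input_root`.
/// Hypothetical helper; returns None if `file` is not inside `input_root`.
fn destination_path(
    input_root: &Path,
    output_root: &Path,
    file: &Path,
    is_speech: bool,
) -> Option<PathBuf> {
    // Relative part, e.g. "sub/dir/file.wav".
    let relative = file.strip_prefix(input_root).ok()?;
    // Folder names as documented in this README.
    let class_dir = if is_speech { "speech" } else { "non_speech" };
    Some(output_root.join(class_dir).join(relative))
}
```

Before copying, a caller would typically create the destination's parent directory, for example with `std::fs::create_dir_all`, and then call `std::fs::copy`.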
- Dependencies: Managed via Cargo.toml. Key crates:
  - clap: CLI argument parsing.
  - hound: WAV file I/O.
  - walkdir: Recursive directory traversal.
  - anyhow: Error handling.
  - earshot: WebRTC VAD implementation.
- Run Tests:
  cargo test
  Includes unit tests for VAD analysis (silence, speech simulation, edge cases).
- Formatting and Linting:
  cargo fmt
  cargo clippy
- Supports only 16-bit integer WAV files (PCM format assumed).
- Stereo files are downmixed to mono; multi-channel (>2) unsupported.
- VAD tuned for English-like speech; may need adjustment for other languages/noise profiles.
- Offline processing only; no real-time mode.
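The channel handling described above (mono passed through, stereo downmixed, more than two channels rejected) might look roughly like the sketch below. It assumes interleaved 16-bit samples and a simple average of left and right. The README does not state how the tool actually downmixes, so the averaging rule and the names `downmix_stereo_to_mono` and `to_mono` are hypothetical.

```rust
/// Average interleaved L/R 16-bit samples into one mono channel.
/// Assumption: plain (L + R) / 2 averaging; an unpaired trailing sample is dropped.
fn downmix_stereo_to_mono(interleaved: &[i16]) -> Vec<i16> {
    interleaved
        .chunks_exact(2)
        // Widen to i32 so the sum cannot overflow before halving.
        .map(|pair| ((pair[0] as i32 + pair[1] as i32) / 2) as i16)
        .collect()
}

/// Return mono samples for 1 or 2 channels; reject anything else.
fn to_mono(samples: &[i16], channels: u16) -> Result<Vec<i16>, String> {
    match channels {
        1 => Ok(samples.to_vec()),
        2 => Ok(downmix_stereo_to_mono(samples)),
        n => Err(format!("unsupported channel count: {n}")),
    }
}
```

Returning an error for unsupported channel counts, rather than panicking, lets a batch run report the file and continue with the next one.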
@software{Smoliakov_Wav_Files_Toolkit,
author = {Smoliakov, Yehor},
month = oct,
title = {{WAV Files Toolkit: A suite of command-line tools for common WAV audio processing tasks, including conversion from other formats, data augmentation, loudness normalization, spectrogram generation, and validation.}},
url = {https://github.com/RustedBytes/wav-files-toolkit},
version = {0.4.0},
year = {2025}
}