wav-files-vad

A command-line tool that recursively scans a directory (including subfolders) for WAV audio files and detects human speech using Voice Activity Detection (VAD) based on the WebRTC algorithm. Each file is copied into a speech or non_speech subfolder of the output directory, preserving the original folder structure.

Features

- Recursive Processing: Scans the input directory and all subdirectories for .wav files.
- Advanced Speech Detection: Uses WebRTC VAD (via the earshot crate) with energy-based statistical analysis on 20 ms frames at 16 kHz for robust detection in noisy environments.
- Structure Preservation: Copies files to the output directory while maintaining relative paths.
- Error Handling: Handles unsupported formats and channel counts gracefully, with informative messages.
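
To make the frame arithmetic concrete: at 16 kHz, a 20 ms frame holds 320 samples. The sketch below splits a mono 16-bit signal into such frames and reports the fraction whose RMS energy exceeds a threshold. It is only an illustration of the energy-based part of the idea and is not the tool's implementation: it does not call the earshot crate, and the names `frame_rms`, `active_frame_ratio` and the `threshold` parameter are hypothetical.

```rust
/// Sample rate and frame length taken from the feature list above.
const SAMPLE_RATE_HZ: usize = 16_000;
const FRAME_MS: usize = 20;
/// 16_000 samples/s * 0.020 s = 320 samples per frame.
const FRAME_LEN: usize = SAMPLE_RATE_HZ * FRAME_MS / 1000;

/// Root-mean-square energy of one frame, with samples scaled to -1.0..1.0.
/// (Hypothetical helper, not part of wav-files-vad.)
fn frame_rms(frame: &[i16]) -> f64 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = frame
        .iter()
        .map(|&s| {
            let x = f64::from(s) / 32768.0;
            x * x
        })
        .sum();
    (sum_sq / frame.len() as f64).sqrt()
}

/// Fraction of complete 20 ms frames whose RMS exceeds `threshold`.
/// Trailing samples that do not fill a whole frame are ignored.
/// (Hypothetical helper; the threshold is an assumed tuning parameter.)
fn active_frame_ratio(samples: &[i16], threshold: f64) -> f64 {
    let mut total = 0usize;
    let mut active = 0usize;
    for frame in samples.chunks_exact(FRAME_LEN) {
        total += 1;
        if frame_rms(frame) > threshold {
            active += 1;
        }
    }
    if total == 0 {
        0.0
    } else {
        active as f64 / total as f64
    }
}
```

A plain energy threshold like this cannot tell speech from other loud sounds, which is presumably why the tool combines energy statistics with a WebRTC-style VAD.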

Installation

Ensure you have Rust installed (version 1.75+ recommended).

Clone the repository:
```
git clone https://github.com/RustedBytes/wav-files-vad
cd wav-files-vad
```

Build the project:
```
cargo build --release
```
The binary will be available at target/release/wav-files-vad.

Usage

Run the tool with the input directory (required) and optional output directory:

```
wav-files-vad [OPTIONS] <INPUT_DIR>

Args:
  <INPUT_DIR>    Input directory containing WAV files (processed recursively)

Options:
  -o, --output-dir <OUTPUT_DIR>    Output directory for separated files (creates 'speech' and 'non_speech' subfolders). Defaults to 'output' in the current directory
  -h, --help                       Print help
  -V, --version                    Print version
```

Example

```
# Process ./audio_samples/ and output to ./results/
wav-files-vad ./audio_samples/ -o ./results/
```

This will:

- Create ./results/speech/ and ./results/non_speech/.
- Copy each file into the folder matching its classification while preserving the subfolder structure (e.g., ./audio_samples/sub/dir/file.wav → ./results/speech/sub/dir/file.wav).
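
The path mapping in the example above can be sketched with the standard library alone: strip the input root from the file path, then join the remainder under the `speech` or `non_speech` folder. This is a minimal sketch of the idea, not the tool's actual code; the function name `destination_path` and its parameters are hypothetical.

```rust
use std::path::{Path, PathBuf, StripPrefixError};

/// Map an input file to its destination under `output_root`, keeping the
/// path relative to `input_root`. (Hypothetical helper, not the tool's API.)
fn destination_path(
    input_root: &Path,
    output_root: &Path,
    file: &Path,
    is_speech: bool,
) -> Result<PathBuf, StripPrefixError> {
    // e.g. "./audio_samples/sub/dir/file.wav" -> "sub/dir/file.wav"
    let relative = file.strip_prefix(input_root)?;
    // Folder names as described in the usage text.
    let class_dir = if is_speech { "speech" } else { "non_speech" };
    Ok(output_root.join(class_dir).join(relative))
}
```

Before copying, a caller would typically create the parent directories of the returned path, for example with `std::fs::create_dir_all`. `strip_prefix` returns an error when the file does not lie under the input root, so that case surfaces explicitly rather than producing a wrong destination.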

Building and Development

Dependencies: Managed via Cargo.toml. Key crates:
- clap: CLI argument parsing.
- hound: WAV file I/O.
- walkdir: Recursive directory traversal.
- anyhow: Error handling.
- earshot: WebRTC VAD implementation.

Run tests:
```
cargo test
```
The test suite includes unit tests for VAD analysis (silence, simulated speech, edge cases).

Formatting and linting:
```
cargo fmt
cargo clippy
```

Limitations

- Supports only 16-bit integer WAV files (PCM format assumed).
- Stereo files are downmixed to mono; files with more than two channels are unsupported.
- The VAD is tuned for English-like speech and may need adjustment for other languages or noise profiles.
- Offline processing only; there is no real-time mode.
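
As an illustration of the stereo limitation above, one common way to downmix interleaved 16-bit stereo to mono is to average each left/right pair. The README does not state which method the tool uses, so treat this as an assumed approach; the function name `downmix_stereo_to_mono` is hypothetical.

```rust
/// Downmix interleaved stereo samples (L, R, L, R, ...) to mono by averaging
/// each pair. A trailing unpaired sample, if any, is dropped.
/// (Hypothetical helper; averaging is an assumption, not the tool's documented method.)
fn downmix_stereo_to_mono(interleaved: &[i16]) -> Vec<i16> {
    interleaved
        .chunks_exact(2)
        .map(|pair| {
            // Widen to i32 so the sum of two i16 values cannot overflow.
            let sum = i32::from(pair[0]) + i32::from(pair[1]);
            // Integer division truncates toward zero; the result always fits in i16.
            (sum / 2) as i16
        })
        .collect()
}
```

Averaging keeps the result within the 16-bit range without clipping, at the cost of lowering the level of content that appears in only one channel.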

Cite

@software{Smoliakov_Wav_Files_Toolkit,
  author = {Smoliakov, Yehor},
  month = oct,
  title = {{WAV Files Toolkit: A suite of command-line tools for common WAV audio processing tasks, including conversion from other formats, data augmentation, loudness normalization, spectrogram generation, and validation.}},
  url = {https://github.com/RustedBytes/wav-files-toolkit},
  version = {0.4.0},
  year = {2025}
}
