Official repository for streaming decoding algorithms for ASR-based Keyword Spotting (KWS) systems, featuring implementations from our research papers:
- MFA-KWS: Effective Keyword Spotting with Multi-head Frame-asynchronous Decoding (Arxiv, Arxiv | T-ASLP)
- CDC-KWS: Streaming Keyword Spotting Boosted by Cross-layer Discrimination Consistency (ICASSP 2025, Arxiv | IEEE)
- TDT-KWS: Fast and Accurate Keyword Spotting Using Token-and-Duration Transducer (ICASSP 2024, Arxiv | IEEE)
 
This repository contains state-of-the-art streaming decoding algorithms for KWS systems, featuring:
- Frame-asynchronous decoding for efficient keyword search
- Multi-head fusion for robust performance
- Cross-layer consistency for improved discrimination
- Transducer-based streaming search for Transducer-based KWS models
- CTC-based streaming search for CTC-based KWS models
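Frame-asynchronous decoding means the keyword search does not have to run on every acoustic frame. The snippet below is a minimal sketch of one common way to get there: skip frames whose CTC blank posterior exceeds a threshold. It is an assumption for illustration, not necessarily how this repository decides which frames to visit, and `select_frames`, `blank_id`, and `blank_threshold` are hypothetical names.

```python
import math
from typing import Iterable, Iterator, Sequence, Tuple


def select_frames(
    frame_log_probs: Iterable[Sequence[float]],
    blank_id: int = 0,
    blank_threshold: float = 0.95,
) -> Iterator[Tuple[int, Sequence[float]]]:
    """Yield (frame_index, log_probs) only for frames that are not confidently blank.

    Hypothetical helper: frame_log_probs holds per-frame log-posteriors over the token
    vocabulary, with the CTC blank at index blank_id. Skipped frames never reach the search.
    """
    for t, log_probs in enumerate(frame_log_probs):
        if math.exp(log_probs[blank_id]) < blank_threshold:
            yield t, log_probs


# Toy two-class posteriors [blank, non-blank] for four frames.
frames = [[math.log(p), math.log(1.0 - p)] for p in (0.99, 0.5, 0.97, 0.3)]
kept = [t for t, _ in select_frames(frames)]
print(kept)  # [1, 3]
```

Raising `blank_threshold` keeps more frames (safer but slower); lowering it skips more frames (faster, at the risk of dropping frames that belong to the keyword).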
 
git clone https://github.com/yourusername/KWStreamingSearch.git
cd KWStreamingSearch
pip install -r requirements.txt

KWStreamingSearch
├── KWStreamingSearch
│   ├── CTC
│   │   ├── cdc_streaming_search.py  # CDC-enhanced streaming search
│   │   └── ctc_streaming_search.py  # Basic CTC streaming search
│   ├── MFA
│   │   ├── mfa_streaming_search.py  # Multi-head frame-asynchronous search
│   │   └── mfs_streaming_search.py  # Multi-head frame-synchronous search
│   ├── Transducer
│   │   └── trans_streaming_search.py # Transducer-based streaming search
│   ├── __init__.py
│   ├── base_search.py  # Base search class
│   ├── example.py  # Usage examples
│   └── fusion_strategy.py  # Fusion strategies
├── README.md
└── requirements.txt
See example.py for more detailed usage examples.
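The tree lists a shared base class in base_search.py. Its actual interface is not documented here, so the following is a hypothetical sketch of the usual shape of a streaming search: state that persists across audio chunks, a per-frame `step`, and a `reset` between utterances. `StreamingKWSSearch`, `PeakPosteriorSearch`, `step`, `process_chunk`, and `threshold` are illustrative names, not the repository's API.

```python
from abc import ABC, abstractmethod
from typing import Optional, Sequence


class StreamingKWSSearch(ABC):
    """Hypothetical base class for a frame-by-frame streaming keyword search."""

    def __init__(self, keyword_tokens: Sequence[int], threshold: float) -> None:
        self.keyword_tokens = list(keyword_tokens)
        self.threshold = threshold
        self.reset()

    def reset(self) -> None:
        """Clear per-stream state; call before decoding a new utterance."""
        self.frame_index = 0

    @abstractmethod
    def step(self, log_probs: Sequence[float]) -> float:
        """Consume one frame of token log-posteriors and return a keyword score."""

    def process_chunk(self, chunk: Sequence[Sequence[float]]) -> Optional[int]:
        """Feed a chunk of frames; return the index of the first detecting frame, else None.

        State (including frame_index) carries over between calls, so audio can be streamed
        in chunks of any size. On a detection it returns immediately, leaving the rest of
        that chunk unconsumed.
        """
        for log_probs in chunk:
            score = self.step(log_probs)
            t = self.frame_index
            self.frame_index += 1
            if score >= self.threshold:
                return t
        return None


class PeakPosteriorSearch(StreamingKWSSearch):
    """Toy subclass: scores each frame by the log-posterior of the keyword's first token."""

    def step(self, log_probs: Sequence[float]) -> float:
        return log_probs[self.keyword_tokens[0]]
```

Concrete searches (CTC, Transducer, multi-head) would then differ only in how `step` updates their internal state.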
TDT-KWS:
- Proposes a new Transducer-based streaming decoding method that outperforms conventional ASR-based decoding.
- Achieves a significant inference speed-up with a variant of the Token-and-Duration Transducer.
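TDT-KWS builds on the Token-and-Duration Transducer, whose joint network predicts both the next token and how many frames to advance. Below is a minimal greedy-decoding sketch of that general TDT rule, showing where the frame skipping (and hence the speed-up) comes from. It is not the keyword search in trans_streaming_search.py; `tdt_greedy_decode`, `toy_joint`, the duration set `[0, 1, 2, 4]`, and `max_symbols_per_frame` are assumptions for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical joint interface: joint(frame, last_token) -> (token_scores, duration_scores)
JointFn = Callable[[int, int], Tuple[Sequence[float], Sequence[float]]]


def argmax(scores: Sequence[float]) -> int:
    return max(range(len(scores)), key=lambda i: scores[i])


def tdt_greedy_decode(
    joint: JointFn,
    num_frames: int,
    durations: Sequence[int],
    blank_id: int = 0,
    max_symbols_per_frame: int = 5,
) -> Tuple[List[int], List[int]]:
    """Greedy TDT-style decoding; returns (emitted tokens, frames where the joint ran)."""
    hyp: List[int] = []
    visited: List[int] = []
    # The predictor starts from a blank (start-of-sequence) context.
    t, last_token, symbols_on_frame = 0, blank_id, 0
    while t < num_frames:
        visited.append(t)
        token_scores, duration_scores = joint(t, last_token)
        token = argmax(token_scores)
        skip = durations[argmax(duration_scores)]
        if token == blank_id:
            skip = max(skip, 1)  # a blank must advance at least one frame
        else:
            hyp.append(token)
            last_token = token
            symbols_on_frame += 1
            if skip == 0 and symbols_on_frame >= max_symbols_per_frame:
                skip = 1  # guard against emitting forever on one frame
        if skip > 0:
            symbols_on_frame = 0
        t += skip  # frames in between are never passed to the joint network
    return hyp, visited


# Toy joint backed by a lookup table: (frame, last_token) -> (token, duration index)
_TABLE = {(0, 0): (1, 0), (0, 1): (0, 3), (4, 1): (2, 2), (6, 2): (0, 3)}


def toy_joint(t: int, last_token: int) -> Tuple[List[float], List[float]]:
    token, dur_idx = _TABLE[(t, last_token)]
    token_scores = [0.0, 0.0, 0.0]
    duration_scores = [0.0, 0.0, 0.0, 0.0]
    token_scores[token] = 1.0
    duration_scores[dur_idx] = 1.0
    return token_scores, duration_scores


hyp, visited = tdt_greedy_decode(toy_joint, num_frames=8, durations=[0, 1, 2, 4])
print(hyp, visited)  # [1, 2] [0, 0, 4, 6]
```

In this toy run the joint is evaluated 4 times over 8 frames, and frames 1 to 3, 5, and 7 are never visited; a frame-by-frame transducer decoder would evaluate every frame at least once.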
 
CDC-KWS:
- CTC-based keyword-specific streaming search.
- Improved robustness in noisy environments through cross-layer discrimination consistency.
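CDC-KWS performs a keyword-specific search over CTC posteriors. The sketch below shows only a generic version of that kind of search: a Viterbi pass over the keyword's CTC label topology in which the keyword may start at any frame, so a score is available after every frame. The cross-layer discrimination consistency from the paper is not modeled, and `CTCKeywordViterbi` and its methods are hypothetical names rather than the interface of ctc_streaming_search.py or cdc_streaming_search.py.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


class CTCKeywordViterbi:
    """Hypothetical keyword-specific CTC search, updated one frame at a time.

    The keyword may start at any frame; step() returns the best log-score of a path that
    has covered the whole keyword and sits on its last token at the current frame.
    """

    def __init__(self, keyword_tokens: Sequence[int], blank_id: int = 0) -> None:
        # Keyword labels with an optional blank between consecutive tokens: y1, b, y2, ..., yL
        self.labels: List[int] = []
        for i, token in enumerate(keyword_tokens):
            if i > 0:
                self.labels.append(blank_id)
            self.labels.append(token)
        self.blank_id = blank_id
        self.scores: List[float] = [NEG_INF] * len(self.labels)

    def step(self, log_probs: Sequence[float]) -> float:
        prev, new = self.scores, [NEG_INF] * len(self.labels)
        for s, label in enumerate(self.labels):
            best = prev[s]  # stay in the same state
            if s == 0:
                best = max(best, 0.0)  # free start: the keyword may begin at this frame
            if s >= 1:
                best = max(best, prev[s - 1])  # move to the next state
            if s >= 2 and label != self.blank_id and label != self.labels[s - 2]:
                best = max(best, prev[s - 2])  # skip the optional blank between distinct tokens
            if best > NEG_INF:
                new[s] = best + log_probs[label]
        self.scores = new
        return new[-1]  # last state = the keyword's final token


# Toy posteriors over {0: blank, 1, 2}; the keyword is the token sequence [1, 2].
frames = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
search = CTCKeywordViterbi(keyword_tokens=[1, 2], blank_id=0)
scores = [search.step([math.log(p) for p in frame]) for frame in frames]
# scores[-1] is about log(0.8 * 0.8 * 0.8): token 1, then blank, then token 2
```

The raw score shrinks with keyword length and duration, so a practical system would normalize it (for example per frame or per token) before thresholding; that choice is left open in this sketch.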
 
MFA-KWS:
- State-of-the-art results on Snips, MobvoiHotwords, and LibriKWS-20.
- Significant speed-up over frame-synchronous baselines.
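The repository keeps its fusion strategies in fusion_strategy.py; the rules actually used by MFA-KWS are not reproduced here. As a hedged illustration of one practical issue with frame-asynchronous heads, namely that they do not produce scores on the same frames, here is a hypothetical late-fusion helper that averages the most recent score of each head, provided every head has reported within a short window. `ScoreFusion`, the head names, weights, window, and threshold are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ScoreFusion:
    """Hypothetical weighted late fusion of keyword scores from several decoding heads.

    Each head reports (frame, score) whenever its own search produces a score, so heads
    need not update on the same frames. The fused score is the weighted average of the
    latest score from every head, used only if all of them are within `window` frames.
    """

    weights: Dict[str, float]
    threshold: float
    window: int = 10
    latest: Dict[str, Tuple[int, float]] = field(default_factory=dict)

    def update(self, head: str, frame: int, score: float) -> Optional[float]:
        """Record one head's score; return the fused score if it fires, else None."""
        self.latest[head] = (frame, score)
        fresh = {
            h: s
            for h, (f, s) in self.latest.items()
            if h in self.weights and abs(frame - f) <= self.window  # heads may report out of order
        }
        if set(fresh) != set(self.weights):
            return None  # wait until every head has a recent score
        total = sum(self.weights.values())
        fused = sum(self.weights[h] * s for h, s in fresh.items()) / total
        return fused if fused >= self.threshold else None


fusion = ScoreFusion(weights={"ctc": 0.5, "transducer": 0.5}, threshold=0.7, window=5)
events = [("ctc", 10, 0.9), ("transducer", 12, 0.4), ("transducer", 13, 0.6), ("ctc", 30, 0.95)]
results = [fusion.update(head, frame, score) for head, frame, score in events]
```

Weighted averaging is only the simplest late-fusion rule; max- or product-style fusion would slot into the same `update` method.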
 
 
We welcome contributions! Please open an issue or submit a pull request.
This project is licensed under the Apache License 2.0.
If you find our work helpful in your research, please cite:
@inproceedings{icassp2024-yuxi-tdt_kw,
  author       = {Yu Xi and Hao Li and Baochen Yang and Haoyu Li and Hainan Xu and Kai Yu},
  title        = {{TDT-KWS:} Fast and Accurate Keyword Spotting Using Token-and-Duration
                  Transducer},
  booktitle    = {ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages        = {11350--11355},
  year         = {2024},
}
@inproceedings{icassp2025-yuxi-cdc_kws,
  author       = {Yu Xi and Haoyu Li and Xiaoyu Gu and Hao Li and Yidi Jiang and Kai Yu},
  title        = {Streaming Keyword Spotting Boosted by Cross-layer Discrimination Consistency},
  booktitle    = {ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages        = {1--5},
  year         = {2025},
}
(MFA-KWS will be added after publication.)
For questions, please contact Yu Xi: [email protected] or Haoyu Li: [email protected] or open an issue.