diff --git a/.trunk/configs/.markdownlint.yaml b/.trunk/configs/.markdownlint.yaml
index b40ee9d7..94777f46 100644
--- a/.trunk/configs/.markdownlint.yaml
+++ b/.trunk/configs/.markdownlint.yaml
@@ -1,2 +1,7 @@
 # Prettier friendly markdownlint config (all formatting rules disabled)
 extends: markdownlint/style/prettier
+MD033:
+  allowed_elements:
+    - div
+    - p
+    - strong
diff --git a/README.md b/README.md
index 5328d2e5..05a52e34 100644
--- a/README.md
+++ b/README.md
@@ -1,19 +1,87 @@
----
-title: Vocalizr
-emoji: 🔊
-colorFrom: purple
-colorTo: yellow
-sdk: docker
-app_port: 7860
----
+# 🔊 Vocalizr
-## Vocalizr: Voice Generator part of the Chatacter Backend
+
+

+ A professional AI-powered voice generation application for high-quality text-to-speech synthesis +

-[![Code Quality](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/code_quality.yaml/badge.svg)](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/code_quality.yaml)
+
+
+[![Build](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/build.yaml/badge.svg)](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/build.yaml)
+[![CI Tools](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/ci_tools.yaml/badge.svg)](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/ci_tools.yaml)
 [![CodeQL](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/github-code-scanning/codeql/badge.svg)](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/github-code-scanning/codeql)
 [![Dependabot Updates](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/dependabot/dependabot-updates/badge.svg)](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/dependabot/dependabot-updates)
-[![Docker Images](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/docker.yaml/badge.svg)](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/docker.yaml)
-[![GitHub Release](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/github.yaml/badge.svg)](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/github.yaml)
-[![Push to HuggingFace](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/huggingface.yaml/badge.svg)](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/huggingface.yaml)
-[![Upgrade Trunk Check](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/trunk_upgrade.yaml/badge.svg)](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/trunk_upgrade.yaml)
-[![Upload Python Package](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/pypi.yaml/badge.svg)](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/pypi.yaml)
+[![Release](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/release.yaml/badge.svg)](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/release.yaml)
+[![Test](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/test.yaml/badge.svg)](https://github.com/AlphaSphereDotAI/vocalizr/actions/workflows/test.yaml)
+
+
+
+Vocalizr is a state-of-the-art voice generation application that transforms text into natural-sounding speech using the powerful Kokoro AI model. Part of the Character Backend ecosystem, it provides both a user-friendly web interface and a robust API for seamless integration into larger applications.
+
+## ✨ Features
+
+- **🎭 Multiple Voice Personas**: Choose from 20+ distinct voice options including American and British accents, male and female voices
+- **🚀 GPU Acceleration**: Automatic CUDA detection and utilization for high-performance generation
+- **🌐 Web Interface**: Intuitive Gradio-based interface for easy interaction
+- **🔧 Configurable Parameters**: Adjust speed, character limits, and file output options
+- **📁 File Export**: Save generated audio as WAV files for offline use
+- **🐳 Docker Support**: Ready-to-deploy containerized application
+- **📊 Real-time Streaming**: Live audio generation and playback
+- **🛡️ Production Ready**: Comprehensive monitoring, logging, and error handling
+
+## 🚀 Quick Start
+
+### Using Docker (Recommended)
+
+```bash
+# Pull and run the latest image
+docker run -p 7860:7860 ghcr.io/alphaspheredotai/vocalizr:latest
+
+# Access the web interface at http://localhost:7860
+```
+
+### Using uv
+
+```bash
+uvx vocalizr
+```
+
+## 📚 Documentation
+
+- **[Voice Reference](docs/VOICES.md)** - Complete list of voice IDs
+- **[Installation Guide](docs/INSTALLATION.md)** - Detailed setup instructions for all platforms
+- **[Usage Guide](docs/USAGE.md)** - Web interface and CLI usage examples
+- **[API Reference](docs/API.md)** - Complete API documentation for developers
+- **[Configuration](docs/CONFIGURATION.md)** - Environment variables and settings
+- **[Development Guide](docs/DEVELOPMENT.md)** - Contributing and architecture overview
+- **[Examples](docs/EXAMPLES.md)** - Code examples and tutorials
+- **[Deployment](docs/DEPLOYMENT.md)** - Production deployment guide
+- **[Troubleshooting](docs/TROUBLESHOOTING.md)** - Common issues and solutions
+
+## 🛠️ System Requirements
+
+- **Python**: 3.12 or higher
+- **Memory**: 4GB RAM minimum (8GB recommended)
+- **Storage**: 2GB free space for models and cache
+- **GPU**: CUDA-compatible GPU (optional, for faster generation)
+- **Network**: Internet connection for initial model download
+
+## 📄 License
+
+This project is licensed under the MIT License. See [LICENSE](docs/LICENSE.md) for details.
+
+## 🙏 Acknowledgments
+
+- **Kokoro AI Model**: Built on the powerful Kokoro text-to-speech engine
+- **Gradio**: Enabling the intuitive web interface
+- **AlphaSphere.AI**: Part of the comprehensive Character Backend ecosystem
+
+## 📞 Support
+
+- 🐛 **Issues**: [GitHub Issues](https://github.com/AlphaSphereDotAI/vocalizr/issues)
+- 📧 **Contact**: [alphasphere.ai@gmail.com](mailto:alphasphere.ai@gmail.com)
+
+---
+
+

+ 🌟 If you find Vocalizr useful, please consider giving it a star! 🌟

diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
new file mode 100644
index 00000000..51a7e7d0
--- /dev/null
+++ b/docs/CHANGELOG.md
@@ -0,0 +1,191 @@
+# 📝 Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+### Added
+- Comprehensive professional documentation suite
+- Installation guide with multiple deployment methods
+- Usage guide for web interface, CLI, and Python API
+- Complete API documentation with integration examples
+- Configuration guide for environment variables and deployment
+- Development guide with architecture and contribution workflow
+- Examples and tutorials for common use cases
+- Deployment guide for production environments
+- Troubleshooting guide for common issues
+- Contributing guidelines and code of conduct
+
+### Changed
+- Enhanced README.md with professional overview and features
+- Improved project structure with organized documentation
+
+### Documentation
+- Added comprehensive documentation covering all aspects of the project
+- Included practical examples for various use cases
+- Provided detailed troubleshooting and support information
+
+## [0.0.1] - 2024-01-15
+
+### Added
+- Initial release of Vocalizr voice generation application
+- Gradio web interface for text-to-speech conversion
+- Support for 20+ voice personas (American and British accents)
+- Command-line interface for easy deployment
+- Python API for programmatic usage
+- Docker support for containerized deployment
+- CUDA GPU acceleration support
+- Configurable speed and character limit settings
+- Audio file export functionality (WAV format)
+- Real-time streaming audio generation
+- Comprehensive logging and error handling
+
+### Features
+- **Voice Selection**: Multiple voice options with different personalities
+  - American female voices: Heart, Bella, Nicole, Aoede, Kore, Sarah, Nova, Sky, Alloy, Jessica, River
+  - American male voices: Michael, Fenrir, Puck, Echo, Eric, Liam, Onyx, Santa, Adam
+  - British female voices: Emma, Isabella, Alice, Lily
+  - British male voices: George, Fable, Lewis, Daniel
+
+- **Web Interface**: Intuitive Gradio-based interface with:
+  - Text input with character limit control
+  - Voice selection dropdown
+  - Speed adjustment slider (0.5x to 2.0x)
+  - Hardware detection and display
+  - Real-time audio generation and playback
+  - Audio file download capability
+
+- **API Features**:
+  - Generator-based audio streaming
+  - Configurable voice and speed parameters
+  - Optional file saving
+  - Debug mode support
+  - Error handling and validation
+
+- **Technical Specifications**:
+  - Built on Kokoro AI text-to-speech model
+  - 24kHz audio output sample rate
+  - Float32 audio data format
+  - Automatic CUDA detection and usage
+  - Environment-based configuration
+  - Structured logging with Loguru
+
+### Technical Details
+- **Framework**: Gradio for web interface
+- **AI Model**: Kokoro 82M parameter model
+- **Audio Processing**: soundfile for WAV file operations
+- **Backend**: PyTorch with CUDA support
+- **Configuration**: Environment variable based
+- **Containerization**: Docker with multi-stage build
+- **Package Management**: uv for dependency management
+
+### Infrastructure
+- GitHub Actions CI/CD workflows
+- Docker image publishing to GitHub Container Registry
+- Automated code quality checks with Ruff
+- Dependabot for dependency updates
+- CodeQL security scanning
+- Automated testing and linting
+
+### Dependencies
+- `gradio[mcp]>=5.38.0` - Web interface framework
+- `kokoro>=0.9.4` - Text-to-speech AI model
+- `soundfile>=0.13.1` - Audio file processing
+- `pip>=25.1.1` - Package installer
+
+### Development Dependencies
+- `ruff>=0.11.12` - Code formatting and linting
+- `ty>=0.0.1a10` - Type checking utilities
+
+### Known Issues
+- Requires internet connection for initial model download
+- GPU acceleration requires CUDA-compatible hardware
+- Large memory usage for longer text inputs
+
+### Breaking Changes
+- None (initial release)
+
+---
+
+## Release Notes Format
+
+For future releases, we follow this format:
+
+### Version Types
+- **Major** (X.0.0): Breaking changes
+- **Minor** (0.X.0): New features, backwards compatible
+- **Patch** (0.0.X): Bug fixes, backwards compatible
+
+### Change Categories
+- **Added**: New features
+- **Changed**: Changes in existing functionality
+- **Deprecated**: Soon-to-be removed features
+- **Removed**: Removed features
+- **Fixed**: Bug fixes
+- **Security**: Vulnerability fixes
+
+### Unreleased Section
+- Keep track of changes not yet released
+- Move to versioned section on release
+- Follow semantic versioning principles
+
+---
+
+## Contributing to Changelog
+
+When contributing to the project:
+
+1. **Add entries** to the `[Unreleased]` section
+2. **Use appropriate categories** (Added, Changed, Fixed, etc.)
+3. **Write clear descriptions** of changes
+4. **Reference issues/PRs** where relevant
+5. **Follow the format** established in previous entries
+
+### Example Entry Format
+
+```markdown
+### Added
+- New batch processing API endpoint for multiple text inputs (#123)
+- Support for custom voice models via configuration (#124)
+
+### Fixed
+- Memory leak in audio generation pipeline (#125)
+- Incorrect sample rate handling for certain voices (#126)
+
+### Changed
+- Improved error messages for invalid input validation (#127)
+- Updated Gradio to version 5.45.0 for better performance (#128)
+```
+
+---
+
+## Upgrade Guide
+
+### From Future Versions
+
+Instructions for upgrading between versions will be provided here as they become available.
+
+### Breaking Changes Policy
+
+We are committed to minimizing breaking changes. When they are necessary:
+
+1. **Advance notice** will be given (at least one minor version)
+2. **Migration guides** will be provided
+3. **Deprecation warnings** will be added first
+4. **Alternative approaches** will be documented
+
+---
+
+## Support and Resources
+
+- **Documentation**: [docs/](docs/)
+- **Issues**: [GitHub Issues](https://github.com/AlphaSphereDotAI/vocalizr/issues)
+- **Discussions**: [GitHub Discussions](https://github.com/AlphaSphereDotAI/vocalizr/discussions)
+- **Contact**: [mohamed.hisham.abdelzaher@gmail.com](mailto:mohamed.hisham.abdelzaher@gmail.com)
+
+---
+
+**Note**: This changelog is maintained by the project maintainers and community contributors. If you notice any missing or incorrect information, please [open an issue](https://github.com/AlphaSphereDotAI/vocalizr/issues) or [submit a pull request](https://github.com/AlphaSphereDotAI/vocalizr/pulls).
\ No newline at end of file
diff --git a/docs/CODE_OF_CONDUCT.md b/docs/CODE_OF_CONDUCT.md
new file mode 100644
index 00000000..0ac542f0
--- /dev/null
+++ b/docs/CODE_OF_CONDUCT.md
@@ -0,0 +1,275 @@
+# 📜 Code of Conduct
+
+## Our Pledge
+
+We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, or sexual identity and orientation.
+
+We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.
+
+## Our Standards
+
+Examples of behavior that contributes to a positive environment for our community include:
+
+### ✅ Positive Behaviors
+
+- **Demonstrating empathy and kindness** toward other people
+- **Being respectful** of differing opinions, viewpoints, and experiences
+- **Giving and gracefully accepting** constructive feedback
+- **Accepting responsibility** and apologizing to those affected by our mistakes, and learning from the experience
+- **Focusing on what is best** not just for us as individuals, but for the overall community
+- **Using welcoming and inclusive language**
+- **Being collaborative** and working together towards common goals
+- **Helping newcomers** feel welcome and supported
+- **Recognizing and celebrating** the contributions of others
+
+### ❌ Unacceptable Behaviors
+
+Examples of unacceptable behavior include:
+
+- **Harassment of any kind**, including unwelcome comments related to personal characteristics
+- **Trolling, insulting, or derogatory comments**, and personal or political attacks
+- **Public or private harassment**, including stalking or following
+- **Publishing others' private information**, such as a physical or email address, without their explicit permission
+- **Sexual attention or advances** of any kind
+- **Conduct which could reasonably be considered inappropriate** in a professional setting
+- **Deliberate intimidation**, threats, or violent language
+- **Spamming or excessive self-promotion**
+- **Disrupting discussions** or community events
+- **Other conduct** which could reasonably be considered inappropriate in a professional setting
+
+## Scope
+
+This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include:
+
+### 🌐 Community Spaces
+
+- Using an official e-mail address
+- Posting via an official social media account
+- Acting as an appointed representative at an online or offline event
+- Participating in project repositories, issues, pull requests, and discussions
+- Attending community events, meetups, or conferences
+- Communicating in official project channels (Discord, Slack, forums, etc.)
+
+### 📱 External Representation
+
+When community members are representing the project or community externally, they are expected to maintain the same standards of behavior outlined in this Code of Conduct.
+
+## Enforcement Responsibilities
+
+Community leaders and maintainers are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.
+
+### 🛡️ Moderator Powers
+
+Community leaders have the right and responsibility to:
+
+- **Remove, edit, or reject** comments, commits, code, wiki edits, issues, and other contributions that are not aligned with this Code of Conduct
+- **Communicate reasons** for moderation decisions when appropriate
+- **Temporarily or permanently ban** any contributor for behaviors that they deem inappropriate, threatening, offensive, or harmful
+
+## Enforcement Guidelines
+
+Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:
+
+### 1. 📝 Correction
+
+**Community Impact**: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.
+
+**Consequence**: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.
+
+**Example Actions**:
+- Private message explaining the issue
+- Request to edit or remove problematic content
+- Warning about future consequences
+
+### 2. ⚠️ Warning
+
+**Community Impact**: A violation through a single incident or series of actions.
+
+**Consequence**: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.
+
+**Example Actions**:
+- Formal written warning
+- Temporary restriction from community participation
+- Monitoring of future interactions
+
+### 3. 🚫 Temporary Ban
+
+**Community Impact**: A serious violation of community standards, including sustained inappropriate behavior.
+
+**Consequence**: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.
+
+**Duration**: Typically 1 week to 3 months, depending on severity.
+
+**Example Actions**:
+- Suspension from repository access
+- Temporary ban from community events
+- Removal from communication channels
+
+### 4. 🔒 Permanent Ban
+
+**Community Impact**: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.
+
+**Consequence**: A permanent ban from any sort of public interaction within the community.
+
+**Example Actions**:
+- Permanent removal from all community spaces
+- Blocking from future participation
+- Legal action if warranted
+
+## Reporting Guidelines
+
+### 📧 How to Report
+
+If you experience or witness unacceptable behavior, or have any other concerns, please report it by contacting the project maintainers:
+
+**Primary Contact**: [mohamed.hisham.abdelzaher@gmail.com](mailto:mohamed.hisham.abdelzaher@gmail.com)
+
+**Alternative Reporting Methods**:
+- Create a private issue in the repository
+- Direct message on GitHub
+- Contact via project communication channels
+
+### 🔒 Confidentiality
+
+All reports will be handled with discretion and confidentiality. We respect the privacy of those who report issues and those who are reported.
+
+### 📋 Report Information
+
+When reporting, please include:
+
+- **What happened**: A clear description of the incident
+- **When it happened**: Date and time (with timezone if possible)
+- **Where it happened**: Platform, channel, or location
+- **Who was involved**: People involved (you can use pseudonyms)
+- **Supporting evidence**: Screenshots, links, or other evidence
+- **Impact**: How the incident affected you or others
+- **Desired outcome**: What you'd like to see happen
+
+### 🕒 Response Time
+
+- **Acknowledgment**: Within 24-48 hours
+- **Initial response**: Within 72 hours
+- **Resolution**: Within 1-2 weeks (depending on complexity)
+
+## Investigation Process
+
+### 📊 Fair and Thorough
+
+When a report is made, the maintainers will:
+
+1. **Acknowledge receipt** of the report promptly
+2. **Review the information** provided and gather additional context if needed
+3. **Interview relevant parties** separately and confidentially
+4. **Consult with other maintainers** for serious incidents
+5. **Determine appropriate action** based on evidence and impact
+6. **Communicate the outcome** to relevant parties
+7. **Follow up** to ensure the resolution is effective
+
+### ⚖️ Due Process
+
+All individuals involved will be given:
+
+- **Notice** of the allegations
+- **Opportunity to respond** and provide their perspective
+- **Fair consideration** of all evidence
+- **Appropriate representation** if needed for serious cases
+
+## Appeal Process
+
+### 📝 Right to Appeal
+
+Anyone who receives an enforcement action has the right to appeal the decision by:
+
+1. **Sending an email** to the maintainers within 30 days
+2. **Providing new evidence** or clarification
+3. **Requesting reconsideration** of the penalty
+
+### 🔄 Appeal Review
+
+Appeals will be reviewed by maintainers who were not involved in the original decision when possible. The appeal process aims to be fair and transparent while maintaining community safety.
+
+## Support Resources
+
+### 🆘 Getting Help
+
+If you're experiencing harassment or need support:
+
+- **Talk to a trusted friend** or community member
+- **Contact project maintainers** for assistance
+- **Use platform reporting tools** (GitHub, Discord, etc.)
+- **Seek professional help** if needed
+
+### 🌐 External Resources
+
+- **Crisis Text Line**: Text HOME to 741741
+- **National Suicide Prevention Lifeline**: 988
+- **LGBTQ National Hotline**: 1-888-843-4564
+- **RAINN National Sexual Assault Hotline**: 1-800-656-4673
+
+## Community Guidelines
+
+### 🤝 Building Inclusive Spaces
+
+To create a welcoming environment:
+
+- **Use inclusive language** that welcomes all community members
+- **Respect pronouns** and chosen names
+- **Be mindful of cultural differences** and time zones
+- **Provide context** for jokes, idioms, or cultural references
+- **Offer help** to newcomers and those asking questions
+- **Celebrate diversity** and different perspectives
+
+### 💬 Communication Best Practices
+
+- **Be patient** with people learning or asking questions
+- **Assume good intent** when interpreting messages
+- **Ask for clarification** if something is unclear
+- **Use constructive language** when providing feedback
+- **Take breaks** if discussions become heated
+- **Move to private channels** for lengthy off-topic discussions
+
+### 🎯 Technical Discussions
+
+- **Focus on technical merit** rather than personal attributes
+- **Provide evidence** for technical claims
+- **Acknowledge uncertainty** when you're not sure
+- **Credit others** for their ideas and work
+- **Separate technical disagreements** from personal conflicts
+
+## Maintenance and Updates
+
+### 📅 Regular Review
+
+This Code of Conduct will be reviewed annually by the maintainers to ensure it remains effective and relevant to our community.
+
+### 🔄 Version History
+
+- **Version 1.0** (2024): Initial Code of Conduct based on Contributor Covenant 2.1
+
+### 📝 Feedback
+
+We welcome feedback on this Code of Conduct. Please:
+
+- **Open an issue** for suggestions or improvements
+- **Start a discussion** about community standards
+- **Contact maintainers** with private feedback
+
+## Acknowledgments
+
+This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org/), version 2.1, available at [https://www.contributor-covenant.org/version/2/1/code_of_conduct.html](https://www.contributor-covenant.org/version/2/1/code_of_conduct.html).
+
+Community Impact Guidelines were inspired by [Mozilla's code of conduct enforcement ladder](https://github.com/mozilla/diversity).
+
+For answers to common questions about this code of conduct, see the FAQ at [https://www.contributor-covenant.org/faq](https://www.contributor-covenant.org/faq). Translations are available at [https://www.contributor-covenant.org/translations](https://www.contributor-covenant.org/translations).
+
+## Contact Information
+
+**Project Maintainer**: [Mohamed Hisham Abdelzaher](mailto:mohamed.hisham.abdelzaher@gmail.com)
+
+**Repository**: [https://github.com/AlphaSphereDotAI/vocalizr](https://github.com/AlphaSphereDotAI/vocalizr)
+
+**Community Discussions**: [https://github.com/AlphaSphereDotAI/vocalizr/discussions](https://github.com/AlphaSphereDotAI/vocalizr/discussions)
+
+---
+
+**Remember**: A welcoming and inclusive community benefits everyone. Thank you for helping make Vocalizr a positive space for all contributors! 🌟
\ No newline at end of file
diff --git a/docs/CONFIGURATION.md b/docs/CONFIGURATION.md
new file mode 100644
index 00000000..9c305e93
--- /dev/null
+++ b/docs/CONFIGURATION.md
@@ -0,0 +1,857 @@
+# ⚙️ Configuration Guide
+
+Comprehensive guide to configuring Vocalizr for different environments and use cases.
+ +## Table of Contents + +- [Environment Variables](#environment-variables) +- [Configuration Files](#configuration-files) +- [Runtime Configuration](#runtime-configuration) +- [Hardware Configuration](#hardware-configuration) +- [Deployment Configurations](#deployment-configurations) +- [Security Configuration](#security-configuration) +- [Logging Configuration](#logging-configuration) +- [Performance Tuning](#performance-tuning) + +## Environment Variables + +### Core Configuration + +| Variable | Type | Default | Description | +|----------|------|---------|-------------| +| `GRADIO_SERVER_NAME` | string | `localhost` | Server hostname/IP address | +| `GRADIO_SERVER_PORT` | integer | `7860` | Server port number | +| `DEBUG` | boolean | `True` | Enable debug mode | + +### File Paths + +| Variable | Type | Default | Description | +|----------|------|---------|-------------| +| `HF_HOME` | string | `~/.cache/huggingface` | Hugging Face cache directory | +| `VOCALIZR_RESULTS_DIR` | string | `./results` | Audio output directory | +| `VOCALIZR_LOG_DIR` | string | `./logs` | Log file directory | + +### Model Configuration + +| Variable | Type | Default | Description | +|----------|------|---------|-------------| +| `VOCALIZR_MODEL_REPO` | string | `hexgrad/Kokoro-82M` | Hugging Face model repository | +| `VOCALIZR_LANG_CODE` | string | `a` | Language code for the model | +| `HF_TOKEN` | string | - | Hugging Face API token (if required) | + +### Hardware Configuration + +| Variable | Type | Default | Description | +|----------|------|---------|-------------| +| `CUDA_VISIBLE_DEVICES` | string | - | Specific GPU devices to use | +| `TORCH_DEVICE` | string | `auto` | PyTorch device (cpu/cuda/auto) | + +### Examples + +#### Development Environment + +```bash +# .env file for development +GRADIO_SERVER_NAME=localhost +GRADIO_SERVER_PORT=7860 +DEBUG=true +HF_HOME=/tmp/huggingface_cache +VOCALIZR_RESULTS_DIR=./dev_results +VOCALIZR_LOG_DIR=./dev_logs +``` + +#### Production 
Environment + +```bash +# .env file for production +GRADIO_SERVER_NAME=0.0.0.0 +GRADIO_SERVER_PORT=80 +DEBUG=false +HF_HOME=/app/cache +VOCALIZR_RESULTS_DIR=/app/results +VOCALIZR_LOG_DIR=/app/logs +HF_TOKEN=your_production_token +``` + +#### Docker Environment + +```bash +# Docker environment variables +GRADIO_SERVER_NAME=0.0.0.0 +GRADIO_SERVER_PORT=7860 +HF_HOME=/home/nonroot/hf +DEBUG=false +``` + +## Configuration Files + +### .env File + +Create a `.env` file in your project root: + +```bash +# Server Configuration +GRADIO_SERVER_NAME=localhost +GRADIO_SERVER_PORT=7860 +DEBUG=false + +# Model Configuration +VOCALIZR_MODEL_REPO=hexgrad/Kokoro-82M +VOCALIZR_LANG_CODE=a + +# File Paths +HF_HOME=/path/to/cache +VOCALIZR_RESULTS_DIR=/path/to/results +VOCALIZR_LOG_DIR=/path/to/logs + +# Authentication (if needed) +HF_TOKEN=your_huggingface_token + +# Hardware +CUDA_VISIBLE_DEVICES=0,1 +``` + +### Python Configuration + +Create a `config.py` file: + +```python +import os +from pathlib import Path + +class VocalizrConfig: + """Vocalizr configuration class.""" + + # Server settings + SERVER_NAME = os.getenv('GRADIO_SERVER_NAME', 'localhost') + SERVER_PORT = int(os.getenv('GRADIO_SERVER_PORT', '7860')) + DEBUG = os.getenv('DEBUG', 'True').lower() == 'true' + + # Model settings + MODEL_REPO = os.getenv('VOCALIZR_MODEL_REPO', 'hexgrad/Kokoro-82M') + LANG_CODE = os.getenv('VOCALIZR_LANG_CODE', 'a') + + # Paths + BASE_DIR = Path.cwd() + RESULTS_DIR = Path(os.getenv('VOCALIZR_RESULTS_DIR', BASE_DIR / 'results')) + LOG_DIR = Path(os.getenv('VOCALIZR_LOG_DIR', BASE_DIR / 'logs')) + HF_HOME = Path(os.getenv('HF_HOME', Path.home() / '.cache' / 'huggingface')) + + # Authentication + HF_TOKEN = os.getenv('HF_TOKEN') + + # Hardware + CUDA_DEVICES = os.getenv('CUDA_VISIBLE_DEVICES') + TORCH_DEVICE = os.getenv('TORCH_DEVICE', 'auto') + + @classmethod + def create_directories(cls): + """Create necessary directories.""" + cls.RESULTS_DIR.mkdir(exist_ok=True) + 
cls.LOG_DIR.mkdir(exist_ok=True) + cls.HF_HOME.mkdir(parents=True, exist_ok=True) + +# Usage +config = VocalizrConfig() +config.create_directories() +``` + +### YAML Configuration + +Create a `config.yaml` file: + +```yaml +# Vocalizr Configuration + +server: + name: localhost + port: 7860 + debug: true + +model: + repository: hexgrad/Kokoro-82M + language_code: a + cache_dir: ~/.cache/huggingface + +paths: + results: ./results + logs: ./logs + cache: ~/.cache/vocalizr + +hardware: + use_cuda: auto + visible_devices: null + device: auto + +logging: + level: INFO + format: "{time:YYYY-MM-DD at HH:mm:ss} | {level} | {message}" + colorize: true + file_rotation: "1 day" + file_retention: "7 days" + +performance: + max_concurrent_generations: 5 + memory_limit: 8GB + timeout: 300 + +security: + enable_auth: false + username: null + password: null + ssl_enabled: false + ssl_cert: null + ssl_key: null +``` + +Load YAML configuration: + +```python +import yaml +from pathlib import Path + +def load_yaml_config(config_path="config.yaml"): + """Load configuration from YAML file.""" + config_file = Path(config_path) + + if config_file.exists(): + with open(config_file, 'r') as f: + config = yaml.safe_load(f) + return config + else: + raise FileNotFoundError(f"Configuration file {config_path} not found") + +# Usage +config = load_yaml_config() +server_port = config['server']['port'] +``` + +## Runtime Configuration + +### Programmatic Configuration + +```python +import os +from vocalizr import PIPELINE + +def configure_vocalizr(): + """Configure Vocalizr at runtime.""" + + # Modify environment variables + os.environ['DEBUG'] = 'false' + os.environ['GRADIO_SERVER_PORT'] = '8080' + + # Configure pipeline + PIPELINE.speed = 1.2 # Default speed + PIPELINE.device = 'cuda' if torch.cuda.is_available() else 'cpu' + + # Set up logging + from loguru import logger + logger.remove() # Remove default handler + logger.add( + "./logs/vocalizr.log", + rotation="1 day", + retention="1 week", + 
level="INFO" + ) + +configure_vocalizr() +``` + +### Dynamic Configuration + +```python +class DynamicConfig: + """Dynamic configuration that can be updated at runtime.""" + + def __init__(self): + self._config = { + 'default_voice': 'af_heart', + 'default_speed': 1.0, + 'max_text_length': 1000, + 'auto_save': False, + 'debug_mode': False + } + + def get(self, key, default=None): + return self._config.get(key, default) + + def set(self, key, value): + self._config[key] = value + print(f"Configuration updated: {key} = {value}") + + def update(self, **kwargs): + self._config.update(kwargs) + print(f"Configuration updated: {kwargs}") + + def reset(self): + """Reset to default configuration.""" + self.__init__() + +# Global configuration instance +config = DynamicConfig() + +# Usage +config.set('default_voice', 'bf_emma') +config.update(default_speed=1.5, auto_save=True) +``` + +## Hardware Configuration + +### GPU Configuration + +```bash +# Use specific GPU devices +export CUDA_VISIBLE_DEVICES=0,1 + +# Force CPU usage +export CUDA_VISIBLE_DEVICES="" + +# Set GPU memory fraction +export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 +``` + +Python GPU configuration: + +```python +import torch +import os + +def configure_gpu(): + """Configure GPU settings.""" + + if torch.cuda.is_available(): + # Set memory allocation strategy + torch.cuda.set_per_process_memory_fraction(0.8) + + # Enable memory mapping for large models + torch.backends.cuda.matmul.allow_tf32 = True + torch.backends.cudnn.allow_tf32 = True + + # Print GPU info + print(f"CUDA devices available: {torch.cuda.device_count()}") + for i in range(torch.cuda.device_count()): + props = torch.cuda.get_device_properties(i) + print(f"Device {i}: {props.name} ({props.total_memory // 1024**2} MB)") + + else: + print("CUDA not available, using CPU") + +configure_gpu() +``` + +### Memory Configuration + +```python +import psutil +import gc + +def configure_memory(): + """Configure memory settings.""" + + # Get system 
memory info + memory = psutil.virtual_memory() + available_gb = memory.available / (1024**3) + + print(f"Available memory: {available_gb:.1f} GB") + + # Configure based on available memory + if available_gb < 4: + print("Low memory detected, using conservative settings") + os.environ['VOCALIZR_MAX_CONCURRENT'] = '1' + os.environ['VOCALIZR_BATCH_SIZE'] = '1' + elif available_gb < 8: + print("Medium memory detected, using balanced settings") + os.environ['VOCALIZR_MAX_CONCURRENT'] = '3' + os.environ['VOCALIZR_BATCH_SIZE'] = '2' + else: + print("High memory detected, using optimal settings") + os.environ['VOCALIZR_MAX_CONCURRENT'] = '5' + os.environ['VOCALIZR_BATCH_SIZE'] = '4' + + # Enable aggressive garbage collection + gc.set_threshold(700, 10, 10) + +configure_memory() +``` + +## Deployment Configurations + +### Docker Configuration + +#### Dockerfile Configuration + +```dockerfile +# Set environment variables +ENV GRADIO_SERVER_PORT=7860 \ + GRADIO_SERVER_NAME=0.0.0.0 \ + HF_HOME=/home/nonroot/hf \ + DEBUG=false \ + PYTHONUNBUFFERED=1 + +# Configure cache directory +RUN mkdir -p /home/nonroot/hf && \ + chown -R nonroot:nonroot /home/nonroot/hf +``` + +#### Docker Compose Configuration + +```yaml +version: '3.8' + +services: + vocalizr: + image: ghcr.io/alphaspheredotai/vocalizr:latest + container_name: vocalizr + ports: + - "7860:7860" + environment: + - GRADIO_SERVER_NAME=0.0.0.0 + - GRADIO_SERVER_PORT=7860 + - DEBUG=false + - HF_HOME=/app/cache + volumes: + - ./cache:/app/cache + - ./results:/app/results + - ./logs:/app/logs + restart: unless-stopped + deploy: + resources: + limits: + memory: 8G + cpus: '4.0' + reservations: + memory: 4G + cpus: '2.0' + healthcheck: + test: ["CMD", "curl", "-f", "http://localhost:7860/health"] + interval: 30s + timeout: 10s + retries: 3 + start_period: 60s + + # Optional: Redis for caching + redis: + image: redis:alpine + container_name: vocalizr-redis + ports: + - "6379:6379" + volumes: + - redis_data:/data + restart: 
unless-stopped + +volumes: + redis_data: +``` + +### Kubernetes Configuration + +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: vocalizr + labels: + app: vocalizr +spec: + replicas: 3 + selector: + matchLabels: + app: vocalizr + template: + metadata: + labels: + app: vocalizr + spec: + containers: + - name: vocalizr + image: ghcr.io/alphaspheredotai/vocalizr:latest + ports: + - containerPort: 7860 + env: + - name: GRADIO_SERVER_NAME + value: "0.0.0.0" + - name: GRADIO_SERVER_PORT + value: "7860" + - name: DEBUG + value: "false" + - name: HF_HOME + value: "/app/cache" + resources: + requests: + memory: "4Gi" + cpu: "2" + limits: + memory: "8Gi" + cpu: "4" + volumeMounts: + - name: cache-volume + mountPath: /app/cache + - name: results-volume + mountPath: /app/results + livenessProbe: + httpGet: + path: /health + port: 7860 + initialDelaySeconds: 60 + periodSeconds: 30 + readinessProbe: + httpGet: + path: /health + port: 7860 + initialDelaySeconds: 30 + periodSeconds: 10 + volumes: + - name: cache-volume + persistentVolumeClaim: + claimName: vocalizr-cache-pvc + - name: results-volume + persistentVolumeClaim: + claimName: vocalizr-results-pvc + +--- +apiVersion: v1 +kind: Service +metadata: + name: vocalizr-service +spec: + selector: + app: vocalizr + ports: + - protocol: TCP + port: 80 + targetPort: 7860 + type: LoadBalancer +``` + +## Security Configuration + +### Authentication + +```python +def setup_authentication(): + """Configure authentication for Gradio app.""" + + import os + import gradio as gr + from vocalizr.gui import app_block + + # Basic authentication + auth = None + if os.getenv('VOCALIZR_AUTH_ENABLED', 'false').lower() == 'true': + username = os.getenv('VOCALIZR_USERNAME') + password = os.getenv('VOCALIZR_PASSWORD') + if username and password: + auth = (username, password) + + # SSL configuration + ssl_keyfile = os.getenv('VOCALIZR_SSL_KEY') + ssl_certfile = os.getenv('VOCALIZR_SSL_CERT') + + app = app_block() + app.launch( + auth=auth, 
ssl_keyfile=ssl_keyfile, + ssl_certfile=ssl_certfile, + share=False, # Don't create public links + enable_queue=True + ) +``` + +### Rate Limiting + +```python +import time +from collections import defaultdict +from functools import wraps + +class RateLimiter: + def __init__(self, max_requests=10, time_window=60): + self.max_requests = max_requests + self.time_window = time_window + self.requests = defaultdict(list) + + def is_allowed(self, client_id): + now = time.time() + + # Clean old requests + self.requests[client_id] = [ + req_time for req_time in self.requests[client_id] + if now - req_time < self.time_window + ] + + # Check if under limit + if len(self.requests[client_id]) < self.max_requests: + self.requests[client_id].append(now) + return True + + return False + +rate_limiter = RateLimiter(max_requests=10, time_window=60) + +def rate_limit_decorator(func): + @wraps(func) + def wrapper(*args, **kwargs): + client_id = kwargs.get('client_id', 'default') + + if not rate_limiter.is_allowed(client_id): + raise Exception("Rate limit exceeded") + + return func(*args, **kwargs) + + return wrapper +``` + +## Logging Configuration + +### Advanced Logging Setup + +```python +from loguru import logger +import sys +import os + +def setup_logging(): + """Configure comprehensive logging.""" + + # Remove default handler + logger.remove() + + # Console handler (for development) + if os.getenv('DEBUG', 'false').lower() == 'true': + logger.add( + sys.stdout, + format="{time:YYYY-MM-DD HH:mm:ss} | " + "{level: <8} | " + "{name}:{function}:" + "{line} - {message}", + level="DEBUG", + colorize=True + ) + + # File handler (for production) + log_dir = os.getenv('VOCALIZR_LOG_DIR', './logs') + logger.add( + f"{log_dir}/vocalizr.log", + format="{time:YYYY-MM-DD HH:mm:ss} | {level: <8} | " + "{name}:{function}:{line} - {message}", + level="INFO", + rotation="10 MB", + retention="1 month", + compression="gz" + ) + + # Error handler (separate file for errors) + logger.add( + 
f"{log_dir}/errors.log", + format="{time:YYYY-MM-DD HH:mm:ss} | {level: <8} | " + "{name}:{function}:{line} - {message}", + level="ERROR", + rotation="5 MB", + retention="3 months" + ) + + # Performance monitoring + if os.getenv('VOCALIZR_PERF_LOGGING', 'false').lower() == 'true': + logger.add( + f"{log_dir}/performance.log", + format="{time:YYYY-MM-DD HH:mm:ss} | {message}", + filter=lambda record: "PERF" in record["extra"], + rotation="1 day", + retention="1 week" + ) + +setup_logging() +``` + +### Structured Logging + +```python +import json +from datetime import datetime + +class StructuredLogger: + def __init__(self): + self.logger = logger + + def log_generation(self, text, voice, duration, success=True, error=None): + """Log audio generation events.""" + log_data = { + "event": "audio_generation", + "timestamp": datetime.utcnow().isoformat(), + "text_length": len(text), + "voice": voice, + "duration_seconds": duration, + "success": success, + "error": str(error) if error else None + } + + if success: + self.logger.info(f"GENERATION_SUCCESS: {json.dumps(log_data)}") + else: + self.logger.error(f"GENERATION_FAILED: {json.dumps(log_data)}") + + def log_performance(self, metric_name, value, unit="seconds"): + """Log performance metrics.""" + log_data = { + "event": "performance_metric", + "timestamp": datetime.utcnow().isoformat(), + "metric": metric_name, + "value": value, + "unit": unit + } + + self.logger.bind(PERF=True).info(json.dumps(log_data)) + +# Usage +structured_logger = StructuredLogger() +``` + +## Performance Tuning + +### Memory Optimization + +```python +import gc +import torch +from functools import wraps + +def memory_monitor(func): + """Decorator to monitor memory usage.""" + @wraps(func) + def wrapper(*args, **kwargs): + if torch.cuda.is_available(): + torch.cuda.reset_peak_memory_stats() + start_memory = torch.cuda.memory_allocated() + + result = func(*args, **kwargs) + + if torch.cuda.is_available(): + end_memory = 
torch.cuda.memory_allocated() + peak_memory = torch.cuda.max_memory_allocated() + print(f"Memory used: {(end_memory - start_memory) / 1024**2:.1f} MB") + print(f"Peak memory: {peak_memory / 1024**2:.1f} MB") + + return result + + return wrapper + +def optimize_memory(): + """Optimize memory settings.""" + + # Enable the FlashAttention kernel for scaled dot-product attention + if torch.cuda.is_available(): + torch.backends.cuda.enable_flash_sdp(True) + + # Set garbage collection thresholds + gc.set_threshold(700, 10, 10) + + # Configure PyTorch memory allocation + if torch.cuda.is_available(): + torch.cuda.empty_cache() + torch.cuda.set_per_process_memory_fraction(0.8) +``` + +### Concurrent Processing + +```python +import asyncio +import concurrent.futures +from typing import List, Tuple + +from vocalizr.model import generate_audio_for_text + +class ConcurrentProcessor: + def __init__(self, max_workers=4): + self.max_workers = max_workers + self.executor = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) + + async def process_batch(self, texts: List[str], voice: str = "af_heart"): + """Process multiple texts concurrently.""" + # We are inside a coroutine, so a running loop is guaranteed to exist + loop = asyncio.get_running_loop() + + # Create tasks for each text + tasks = [] + for text in texts: + task = loop.run_in_executor( + self.executor, + self._generate_sync, + text, + voice + ) + tasks.append(task) + + # Wait for all tasks to complete + results = await asyncio.gather(*tasks, return_exceptions=True) + return results + + def _generate_sync(self, text: str, voice: str): + """Synchronous generation for thread executor.""" + audio_chunks = [] + for sr, audio in generate_audio_for_text(text=text, voice=voice): + audio_chunks.append((sr, audio)) + return audio_chunks + +# Usage +processor = ConcurrentProcessor(max_workers=4) +results = asyncio.run(processor.process_batch(["Hello", "World", "Test"])) +``` + +### Configuration Validation + +```python +import os +from typing import Any, Dict + +class ConfigValidator: + """Validate configuration settings.""" + + REQUIRED_VARS = ['GRADIO_SERVER_NAME', 
'GRADIO_SERVER_PORT'] + TYPE_MAPPING = { + 'GRADIO_SERVER_PORT': int, + 'DEBUG': lambda x: x.lower() == 'true' + } + + @classmethod + def validate(cls) -> Dict[str, Any]: + """Validate current configuration.""" + errors = [] + config = {} + + # Check required variables + for var in cls.REQUIRED_VARS: + value = os.getenv(var) + if value is None: + errors.append(f"Required environment variable {var} not set") + else: + # Type conversion + if var in cls.TYPE_MAPPING: + try: + value = cls.TYPE_MAPPING[var](value) + except (ValueError, TypeError) as e: + errors.append(f"Invalid value for {var}: {e}") + # Skip storing a value that failed conversion + continue + + config[var] = value + + # Additional validation (only runs when the port was converted to int) + port = config.get('GRADIO_SERVER_PORT') + if port is not None and not (1 <= port <= 65535): + errors.append(f"Invalid port number: {port}") + + if errors: + raise ValueError(f"Configuration validation failed: {', '.join(errors)}") + + return config + +# Validate configuration on startup +try: + validated_config = ConfigValidator.validate() + print("Configuration validation passed") +except ValueError as e: + print(f"Configuration error: {e}") + raise SystemExit(1) +``` + +## Next Steps + +- Review [Development Guide](DEVELOPMENT.md) for development-specific configuration +- Check [Deployment Guide](DEPLOYMENT.md) for production deployment settings +- See [Troubleshooting](TROUBLESHOOTING.md) for configuration-related issues +- Explore [API Documentation](API.md) for programmatic configuration options \ No newline at end of file diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md new file mode 100644 index 00000000..3332e64c --- /dev/null +++ b/docs/CONTRIBUTING.md @@ -0,0 +1,571 @@ +# 🤝 Contributing to Vocalizr + +Thank you for your interest in contributing to Vocalizr! This guide will help you get started with contributing to the project. 
+ +## Table of Contents + +- [Code of Conduct](#code-of-conduct) +- [Getting Started](#getting-started) +- [How to Contribute](#how-to-contribute) +- [Development Setup](#development-setup) +- [Coding Standards](#coding-standards) +- [Submitting Changes](#submitting-changes) +- [Review Process](#review-process) +- [Community Guidelines](#community-guidelines) + +## Code of Conduct + +By participating in this project, you agree to abide by our [Code of Conduct](CODE_OF_CONDUCT.md). Please read it before contributing. + +### Summary + +- **Be respectful** and inclusive +- **Be collaborative** and constructive +- **Be mindful** of your words and actions +- **Be helpful** to newcomers and experienced contributors alike + +## Getting Started + +### Ways to Contribute + +We welcome various types of contributions: + +- 🐛 **Bug reports** - Help us identify and fix issues +- 💡 **Feature requests** - Suggest new functionality +- 📖 **Documentation** - Improve or add documentation +- 🧪 **Testing** - Add or improve tests +- 🔧 **Code contributions** - Fix bugs or implement features +- 🎨 **Design** - Improve UI/UX of the web interface +- 🌍 **Localization** - Add support for new languages/voices +- 📝 **Examples** - Create tutorials and examples + +### Prerequisites + +Before contributing, ensure you have: + +- **Git** installed and configured +- **Python 3.12+** installed +- **Basic knowledge** of Python and web development +- **Familiarity** with the project structure (see [Development Guide](DEVELOPMENT.md)) + +## How to Contribute + +### 1. Find an Issue + +Start by looking at our [issues page](https://github.com/AlphaSphereDotAI/vocalizr/issues): + +- 🏷️ **Good first issue** - Perfect for newcomers +- 🆘 **Help wanted** - Issues where we need assistance +- 🐛 **Bug** - Confirmed bugs that need fixing +- ✨ **Enhancement** - New features or improvements + +### 2. Fork the Repository + +1. 
Click the "Fork" button on the [repository page](https://github.com/AlphaSphereDotAI/vocalizr) +2. Clone your fork locally: + ```bash + git clone https://github.com/YOUR_USERNAME/vocalizr.git + cd vocalizr + ``` + +### 3. Create a Branch + +Create a descriptive branch name: + +```bash +# For bug fixes +git checkout -b fix/memory-leak-in-audio-generation + +# For new features +git checkout -b feature/add-batch-processing + +# For documentation +git checkout -b docs/improve-api-documentation +``` + +### Branch Naming Conventions + +- `feature/description` - New features +- `fix/description` - Bug fixes +- `docs/description` - Documentation changes +- `test/description` - Test additions/improvements +- `refactor/description` - Code refactoring +- `chore/description` - Maintenance tasks + +## Development Setup + +### 1. Set Up Environment + +```bash +# Create virtual environment +python -m venv venv +source venv/bin/activate # Linux/macOS +# or +venv\Scripts\activate # Windows + +# Install development dependencies +pip install -e ".[dev]" + +# Or with uv +uv sync --group dev +``` + +### 2. Install Pre-commit Hooks + +```bash +# Install pre-commit +pip install pre-commit + +# Set up git hooks +pre-commit install +``` + +### 3. 
Verify Setup + +```bash +# Run tests +pytest + +# Check code quality +ruff check src/ +ruff format src/ + +# Start the application +python -m vocalizr +``` + +## Coding Standards + +### Code Style + +We use **Ruff** for code formatting and linting: + +```bash +# Format code +ruff format src/ + +# Check for issues +ruff check src/ + +# Fix auto-fixable issues +ruff check --fix src/ +``` + +### Type Hints + +All functions should include comprehensive type hints: + +```python +from typing import Generator, Literal, Any +from numpy import ndarray, dtype, float32 + +def generate_audio_for_text( + text: str, + voice: str = "af_heart", + speed: float = 1.0, + save_file: bool = False, + debug: bool = False, + char_limit: int = -1, +) -> Generator[ + tuple[Literal[24000], ndarray[tuple[float32], dtype[float32]]], + Any, + None, +]: + """Generate audio from text with proper type annotations.""" + # Implementation here +``` + +### Documentation + +#### Docstring Format + +Use Google-style docstrings: + +```python +def complex_function( + param1: str, + param2: int, + param3: bool = False +) -> tuple[str, int]: + """ + Brief description of what the function does. + + Longer description with more details about the function's purpose, + algorithms used, or any important considerations. + + Args: + param1: Description of the first parameter. + param2: Description of the second parameter. + param3: Description of the optional parameter. Defaults to False. + + Returns: + A tuple containing: + - str: Description of first return value + - int: Description of second return value + + Raises: + ValueError: If param1 is empty. + RuntimeError: If param2 is negative. 
+ + Example: + >>> result = complex_function("hello", 42) + >>> print(result) + ('hello_processed', 42) + """ + if not param1: + raise ValueError("param1 cannot be empty") + + if param2 < 0: + raise RuntimeError("param2 must be non-negative") + + return f"{param1}_processed", param2 +``` + +#### Code Comments + +- Use comments to explain **why**, not what +- Keep comments concise and up-to-date +- Use TODO comments for future improvements + +```python +# Good: Explains why +# Use exponential backoff to handle temporary API failures +retry_delay *= 2 + +# Bad: Explains what (obvious from code) +# Multiply retry_delay by 2 +retry_delay *= 2 +``` + +### Testing + +#### Test Structure + +```python +import numpy as np +import pytest +from unittest.mock import patch, MagicMock +from vocalizr.model import generate_audio_for_text + +class TestGenerateAudioForText: + """Test suite for the generate_audio_for_text function.""" + + def test_basic_generation(self): + """Test basic audio generation functionality.""" + # Arrange + text = "Hello, world!" 
+ expected_voice = "af_heart" + + # Act + with patch('vocalizr.model.PIPELINE') as mock_pipeline: + mock_pipeline.return_value = [(None, None, np.array([0.1, 0.2]))] + results = list(generate_audio_for_text(text, voice=expected_voice)) + + # Assert + assert len(results) > 0 + sample_rate, audio = results[0] + assert sample_rate == 24000 + assert isinstance(audio, np.ndarray) + + @pytest.mark.parametrize("invalid_text", ["", " ", "abc"]) + def test_invalid_text_input(self, invalid_text): + """Test handling of invalid text inputs.""" + with pytest.raises(Exception): + list(generate_audio_for_text(invalid_text)) + + def test_voice_parameter_validation(self): + """Test that voice parameter is properly validated.""" + # Test with valid voice + with patch('vocalizr.model.PIPELINE'): + list(generate_audio_for_text("test", voice="af_heart")) + + # Test with invalid voice should still work (handled gracefully) + with patch('vocalizr.model.PIPELINE'): + list(generate_audio_for_text("test", voice="invalid_voice")) +``` + +#### Test Guidelines + +- **Arrange-Act-Assert** pattern +- **Descriptive test names** that explain what is being tested +- **Parameterized tests** for multiple input scenarios +- **Mock external dependencies** (network calls, file system) +- **Test edge cases** and error conditions + +### Performance Considerations + +- **Memory efficiency**: Clean up resources after use +- **CPU optimization**: Use efficient algorithms +- **GPU utilization**: Leverage CUDA when available +- **Caching**: Implement appropriate caching strategies + +```python +import gc +import torch + +def memory_efficient_function(): + """Example of memory-efficient implementation.""" + try: + # Main logic here + result = process_data() + return result + finally: + # Clean up resources + gc.collect() + if torch.cuda.is_available(): + torch.cuda.empty_cache() +``` + +## Submitting Changes + +### 1. 
Commit Guidelines + +#### Commit Message Format + +Use [Conventional Commits](https://www.conventionalcommits.org/) format: + +``` +type(scope): brief description + +Longer description explaining the change in more detail. +Include motivation for the change and contrast with previous behavior. + +Fixes #123 +``` + +#### Commit Types + +- `feat`: New feature +- `fix`: Bug fix +- `docs`: Documentation changes +- `style`: Code style changes (formatting, etc.) +- `refactor`: Code refactoring +- `test`: Adding or updating tests +- `chore`: Maintenance tasks +- `perf`: Performance improvements +- `ci`: CI/CD changes + +#### Examples + +```bash +# Good commit messages +git commit -m "feat(api): add batch processing endpoint" +git commit -m "fix(model): resolve memory leak in audio generation" +git commit -m "docs(readme): update installation instructions" + +# Bad commit messages +git commit -m "fix stuff" +git commit -m "update code" +git commit -m "changes" +``` + +### 2. Pull Request Process + +#### Before Submitting + +- [ ] All tests pass locally +- [ ] Code follows style guidelines +- [ ] Documentation is updated +- [ ] No breaking changes (or properly documented) +- [ ] Branch is up-to-date with main + +#### Pull Request Template + +When creating a PR, use this template: + +```markdown +## Description + +Brief description of the changes made. + +## Type of Change + +- [ ] Bug fix (non-breaking change which fixes an issue) +- [ ] New feature (non-breaking change which adds functionality) +- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected) +- [ ] Documentation update +- [ ] Performance improvement +- [ ] Code refactoring + +## Related Issues + +Fixes #(issue number) +Related to #(issue number) + +## Testing + +- [ ] Unit tests added/updated +- [ ] Integration tests added/updated +- [ ] Manual testing completed +- [ ] All tests pass + +## Screenshots (if applicable) + +Add screenshots to help explain your changes. 
+ +## Checklist + +- [ ] My code follows the style guidelines of this project +- [ ] I have performed a self-review of my own code +- [ ] I have commented my code, particularly in hard-to-understand areas +- [ ] I have made corresponding changes to the documentation +- [ ] My changes generate no new warnings +- [ ] I have added tests that prove my fix is effective or that my feature works +- [ ] New and existing unit tests pass locally with my changes +- [ ] Any dependent changes have been merged and published + +## Additional Notes + +Any additional information or context about the changes. +``` + +### 3. Draft Pull Requests + +Use draft PRs for: + +- **Work in progress** - Getting early feedback +- **Large changes** - Breaking down into smaller reviews +- **Experimental features** - Testing ideas + +## Review Process + +### What Reviewers Look For + +#### Code Quality +- **Correctness**: Does the code work as intended? +- **Readability**: Is the code easy to understand? +- **Maintainability**: Can the code be easily modified? +- **Performance**: Are there any performance issues? + +#### Design +- **Architecture**: Does the change fit the overall design? +- **API design**: Are new APIs well-designed and consistent? +- **Error handling**: Are errors handled appropriately? +- **Edge cases**: Are edge cases considered? + +#### Testing +- **Coverage**: Are all code paths tested? +- **Quality**: Are tests well-written and reliable? +- **Integration**: Do tests work with the existing test suite? + +### Responding to Feedback + +#### How to Address Review Comments + +1. **Read carefully** - Understand the feedback +2. **Ask questions** - If something is unclear +3. **Make changes** - Address the feedback +4. **Respond** - Let reviewers know what you've changed +5. **Be patient** - Reviews take time + +#### Example Response + +```markdown +Thanks for the review! I've addressed your comments: + +1. 
**Memory leak in audio generation**: Fixed by adding proper cleanup in the finally block (commit abc123) +2. **Missing error handling**: Added try-catch for network errors (commit def456) +3. **Documentation**: Updated the docstring with examples (commit ghi789) + +The failing test was due to a missing mock - fixed in commit jkl012. + +Ready for another review! +``` + +### Review Timeline + +- **Initial response**: Within 48 hours +- **Complete review**: Within 1 week +- **Follow-up reviews**: Within 24-48 hours + +## Community Guidelines + +### Communication + +#### Be Respectful +- Use welcoming and inclusive language +- Be respectful of differing viewpoints +- Accept constructive criticism gracefully +- Focus on what is best for the community + +#### Be Helpful +- Help newcomers get started +- Share knowledge and resources +- Provide constructive feedback +- Be patient with questions + +### Issue Etiquette + +#### Reporting Bugs +- **Search first** - Check if the issue already exists +- **Use templates** - Fill out the bug report template +- **Provide details** - Include reproduction steps +- **Follow up** - Respond to questions from maintainers + +#### Feature Requests +- **Explain the use case** - Why is this needed? +- **Consider alternatives** - Are there existing solutions? +- **Be open to feedback** - The feature might need adjustments + +### Discussion Guidelines + +#### GitHub Discussions +- **Use appropriate categories** - Help organize discussions +- **Search before posting** - Avoid duplicate discussions +- **Stay on topic** - Keep discussions focused +- **Be constructive** - Provide helpful input + +#### Code Reviews +- **Be constructive** - Focus on improvement, not criticism +- **Explain reasoning** - Why should something change? 
+- **Suggest solutions** - Don't just point out problems +- **Acknowledge good work** - Recognize quality contributions + +### Recognition + +We recognize contributors in several ways: + +- **Contributors list** in README +- **Release notes** mentioning contributors +- **Special recognition** for significant contributions +- **Maintainer status** for ongoing contributors + +### Getting Help + +If you need help with contributing: + +1. **Check documentation** - Start with this guide and [Development Guide](DEVELOPMENT.md) +2. **Search issues** - Someone might have had the same question +3. **Ask in discussions** - Use GitHub Discussions for questions +4. **Contact maintainers** - Reach out directly if needed + +### Maintainer Responsibilities + +Maintainers will: + +- **Respond promptly** to issues and PRs +- **Provide clear feedback** on contributions +- **Maintain quality standards** while being welcoming +- **Guide contributors** through the process +- **Make decisions** fairly and transparently + +### License + +By contributing to Vocalizr, you agree that your contributions will be licensed under the same license as the project (MIT License). + +## Thank You! 🙏 + +We appreciate all contributions to Vocalizr, whether it's: + +- Reporting a bug +- Discussing the current state of the code +- Submitting a fix +- Proposing new features +- Becoming a maintainer + +Every contribution makes the project better for everyone. Thank you for being part of the Vocalizr community! + +--- + +**Questions?** Feel free to [open a discussion](https://github.com/AlphaSphereDotAI/vocalizr/discussions) or reach out to the maintainers. + +**New to open source?** Check out [How to Contribute to Open Source](https://opensource.guide/how-to-contribute/) for more guidance. 
\ No newline at end of file diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md new file mode 100644 index 00000000..f1549cf4 --- /dev/null +++ b/docs/DEPLOYMENT.md @@ -0,0 +1,1309 @@ +# 🚀 Deployment Guide + +Complete guide for deploying Vocalizr in production environments. + +## Table of Contents + +- [Deployment Overview](#deployment-overview) +- [Docker Deployment](#docker-deployment) +- [Kubernetes Deployment](#kubernetes-deployment) +- [Cloud Platforms](#cloud-platforms) +- [Load Balancing](#load-balancing) +- [Monitoring & Logging](#monitoring--logging) +- [Security Considerations](#security-considerations) +- [Performance Optimization](#performance-optimization) +- [Scaling Strategies](#scaling-strategies) + +## Deployment Overview + +### Architecture Patterns + +#### Single Instance Deployment +``` +Internet → Load Balancer → Vocalizr Instance +``` + +#### High Availability Deployment +``` +Internet → Load Balancer → [Vocalizr Instance 1] + → [Vocalizr Instance 2] + → [Vocalizr Instance N] +``` + +#### Microservices Architecture +``` +Internet → API Gateway → [Voice Service] + → [Audio Processing Service] + → [File Storage Service] +``` + +### Deployment Checklist + +- [ ] Environment configuration validated +- [ ] Resource requirements calculated +- [ ] Security measures implemented +- [ ] Monitoring and logging configured +- [ ] Backup and recovery procedures established +- [ ] Performance testing completed +- [ ] Health checks implemented +- [ ] Auto-scaling configured (if needed) + +## Docker Deployment + +### Basic Docker Deployment + +```bash +# Pull the latest image +docker pull ghcr.io/alphaspheredotai/vocalizr:latest + +# Run with basic configuration +docker run -d \ + --name vocalizr \ + -p 7860:7860 \ + -e GRADIO_SERVER_NAME=0.0.0.0 \ + -e DEBUG=false \ + ghcr.io/alphaspheredotai/vocalizr:latest +``` + +### Production Docker Configuration + +```bash +# Create necessary directories +mkdir -p 
/opt/vocalizr/{cache,results,logs,config} + +# Run with production settings +docker run -d \ + --name vocalizr-prod \ + --restart unless-stopped \ + -p 7860:7860 \ + -e GRADIO_SERVER_NAME=0.0.0.0 \ + -e GRADIO_SERVER_PORT=7860 \ + -e DEBUG=false \ + -e HF_HOME=/app/cache \ + -v /opt/vocalizr/cache:/app/cache \ + -v /opt/vocalizr/results:/app/results \ + -v /opt/vocalizr/logs:/app/logs \ + --memory=8g \ + --cpus=4 \ + --health-cmd="curl -f http://localhost:7860/health || exit 1" \ + --health-interval=30s \ + --health-timeout=10s \ + --health-retries=3 \ + ghcr.io/alphaspheredotai/vocalizr:latest +``` + +### Docker Compose for Production + +```yaml +# docker-compose.yml +version: '3.8' + +services: + vocalizr: + image: ghcr.io/alphaspheredotai/vocalizr:latest + container_name: vocalizr-app + restart: unless-stopped + ports: + - "7860:7860" + environment: + - GRADIO_SERVER_NAME=0.0.0.0 + - GRADIO_SERVER_PORT=7860 + - DEBUG=false + - HF_HOME=/app/cache + volumes: + - vocalizr_cache:/app/cache + - vocalizr_results:/app/results + - vocalizr_logs:/app/logs + - ./config:/app/config:ro + deploy: + resources: + limits: + memory: 8G + cpus: '4.0' + reservations: + memory: 4G + cpus: '2.0' + healthcheck: + test: ["CMD", "curl", "-f", "http://localhost:7860/health"] + interval: 30s + timeout: 10s + retries: 3 + start_period: 60s + depends_on: + - redis + networks: + - vocalizr_network + + nginx: + image: nginx:alpine + container_name: vocalizr-nginx + restart: unless-stopped + ports: + - "80:80" + - "443:443" + volumes: + - ./nginx.conf:/etc/nginx/nginx.conf:ro + - ./ssl:/etc/nginx/ssl:ro + depends_on: + - vocalizr + networks: + - vocalizr_network + + redis: + image: redis:alpine + container_name: vocalizr-redis + restart: unless-stopped + ports: + - "6379:6379" + volumes: + - redis_data:/data + command: redis-server --appendonly yes + networks: + - vocalizr_network + + prometheus: + image: prom/prometheus + container_name: vocalizr-prometheus + restart: unless-stopped + ports: 
+ - "9090:9090" + volumes: + - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro + - prometheus_data:/prometheus + networks: + - vocalizr_network + + grafana: + image: grafana/grafana + container_name: vocalizr-grafana + restart: unless-stopped + ports: + - "3000:3000" + environment: + - GF_SECURITY_ADMIN_PASSWORD=admin + volumes: + - grafana_data:/var/lib/grafana + networks: + - vocalizr_network + +volumes: + vocalizr_cache: + vocalizr_results: + vocalizr_logs: + redis_data: + prometheus_data: + grafana_data: + +networks: + vocalizr_network: + driver: bridge +``` + +### Nginx Configuration + +```nginx +# nginx.conf +events { + worker_connections 1024; +} + +http { + upstream vocalizr_backend { + server vocalizr:7860; + # Add more servers for load balancing + # server vocalizr2:7860; + # server vocalizr3:7860; + } + + # Rate limiting + limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s; + limit_req_zone $binary_remote_addr zone=generate:10m rate=1r/s; + + server { + listen 80; + server_name your-domain.com; + + # Redirect HTTP to HTTPS + return 301 https://$server_name$request_uri; + } + + server { + listen 443 ssl http2; + server_name your-domain.com; + + # SSL Configuration + ssl_certificate /etc/nginx/ssl/cert.pem; + ssl_certificate_key /etc/nginx/ssl/key.pem; + ssl_protocols TLSv1.2 TLSv1.3; + ssl_ciphers ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-GCM-SHA384; + ssl_prefer_server_ciphers off; + + # Security headers + add_header X-Frame-Options DENY; + add_header X-Content-Type-Options nosniff; + add_header X-XSS-Protection "1; mode=block"; + add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload"; + + # File upload limits + client_max_body_size 10M; + + # Proxy settings + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + + # Main application + location / { + limit_req zone=api burst=20 nodelay; 
+ proxy_pass http://vocalizr_backend; + proxy_read_timeout 300s; + proxy_connect_timeout 75s; + } + + # Audio generation endpoint (stricter rate limiting) + location /generate { + limit_req zone=generate burst=5 nodelay; + proxy_pass http://vocalizr_backend; + proxy_read_timeout 600s; # Longer timeout for generation + } + + # WebSocket support for Gradio + location /ws { + proxy_pass http://vocalizr_backend; + proxy_http_version 1.1; + proxy_set_header Upgrade $http_upgrade; + proxy_set_header Connection "upgrade"; + } + + # Health check + location /health { + proxy_pass http://vocalizr_backend; + access_log off; + } + + # Static files (if any) + location /static/ { + alias /var/www/static/; + expires 1y; + add_header Cache-Control "public, immutable"; + } + } +} +``` + +## Kubernetes Deployment + +### Basic Kubernetes Manifests + +#### Namespace +```yaml +# namespace.yaml +apiVersion: v1 +kind: Namespace +metadata: + name: vocalizr +``` + +#### ConfigMap +```yaml +# configmap.yaml +apiVersion: v1 +kind: ConfigMap +metadata: + name: vocalizr-config + namespace: vocalizr +data: + GRADIO_SERVER_NAME: "0.0.0.0" + GRADIO_SERVER_PORT: "7860" + DEBUG: "false" + HF_HOME: "/app/cache" +``` + +#### Secret +```yaml +# secret.yaml +apiVersion: v1 +kind: Secret +metadata: + name: vocalizr-secrets + namespace: vocalizr +type: Opaque +data: + HF_TOKEN: +``` + +#### Persistent Volume Claims +```yaml +# pvc.yaml +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: vocalizr-cache-pvc + namespace: vocalizr +spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 50Gi +--- +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: vocalizr-results-pvc + namespace: vocalizr +spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 100Gi +``` + +#### Deployment +```yaml +# deployment.yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: vocalizr + namespace: vocalizr + labels: + app: vocalizr +spec: + replicas: 3 + strategy: 
+ type: RollingUpdate + rollingUpdate: + maxSurge: 1 + maxUnavailable: 1 + selector: + matchLabels: + app: vocalizr + template: + metadata: + labels: + app: vocalizr + spec: + containers: + - name: vocalizr + image: ghcr.io/alphaspheredotai/vocalizr:latest + ports: + - containerPort: 7860 + name: http + envFrom: + - configMapRef: + name: vocalizr-config + - secretRef: + name: vocalizr-secrets + resources: + requests: + memory: "4Gi" + cpu: "2" + limits: + memory: "8Gi" + cpu: "4" + volumeMounts: + - name: cache-volume + mountPath: /app/cache + - name: results-volume + mountPath: /app/results + livenessProbe: + httpGet: + path: /health + port: 7860 + initialDelaySeconds: 60 + periodSeconds: 30 + timeoutSeconds: 10 + failureThreshold: 3 + readinessProbe: + httpGet: + path: /health + port: 7860 + initialDelaySeconds: 30 + periodSeconds: 10 + timeoutSeconds: 5 + failureThreshold: 3 + volumes: + - name: cache-volume + persistentVolumeClaim: + claimName: vocalizr-cache-pvc + - name: results-volume + persistentVolumeClaim: + claimName: vocalizr-results-pvc +``` + +#### Service +```yaml +# service.yaml +apiVersion: v1 +kind: Service +metadata: + name: vocalizr-service + namespace: vocalizr + labels: + app: vocalizr +spec: + selector: + app: vocalizr + ports: + - protocol: TCP + port: 80 + targetPort: 7860 + name: http + type: ClusterIP +``` + +#### Ingress +```yaml +# ingress.yaml +apiVersion: networking.k8s.io/v1 +kind: Ingress +metadata: + name: vocalizr-ingress + namespace: vocalizr + annotations: + kubernetes.io/ingress.class: nginx + cert-manager.io/cluster-issuer: letsencrypt-prod + nginx.ingress.kubernetes.io/rate-limit: "10" + nginx.ingress.kubernetes.io/rate-limit-burst: "20" + nginx.ingress.kubernetes.io/proxy-body-size: "10m" + nginx.ingress.kubernetes.io/proxy-read-timeout: "300" +spec: + tls: + - hosts: + - your-domain.com + secretName: vocalizr-tls + rules: + - host: your-domain.com + http: + paths: + - path: / + pathType: Prefix + backend: + service: + name: 
vocalizr-service + port: + number: 80 +``` + +#### Horizontal Pod Autoscaler +```yaml +# hpa.yaml +apiVersion: autoscaling/v2 +kind: HorizontalPodAutoscaler +metadata: + name: vocalizr-hpa + namespace: vocalizr +spec: + scaleTargetRef: + apiVersion: apps/v1 + kind: Deployment + name: vocalizr + minReplicas: 3 + maxReplicas: 10 + metrics: + - type: Resource + resource: + name: cpu + target: + type: Utilization + averageUtilization: 70 + - type: Resource + resource: + name: memory + target: + type: Utilization + averageUtilization: 80 +``` + +### Deploy to Kubernetes + +```bash +# Apply all manifests +kubectl apply -f namespace.yaml +kubectl apply -f configmap.yaml +kubectl apply -f secret.yaml +kubectl apply -f pvc.yaml +kubectl apply -f deployment.yaml +kubectl apply -f service.yaml +kubectl apply -f ingress.yaml +kubectl apply -f hpa.yaml + +# Check deployment status +kubectl get pods -n vocalizr +kubectl get services -n vocalizr +kubectl get ingress -n vocalizr + +# View logs +kubectl logs -f deployment/vocalizr -n vocalizr +``` + +## Cloud Platforms + +### AWS Deployment + +#### ECS Fargate +```json +{ + "family": "vocalizr-task", + "networkMode": "awsvpc", + "requiresCompatibilities": ["FARGATE"], + "cpu": "2048", + "memory": "8192", + "executionRoleArn": "arn:aws:iam::ACCOUNT:role/ecsTaskExecutionRole", + "taskRoleArn": "arn:aws:iam::ACCOUNT:role/ecsTaskRole", + "containerDefinitions": [ + { + "name": "vocalizr", + "image": "ghcr.io/alphaspheredotai/vocalizr:latest", + "portMappings": [ + { + "containerPort": 7860, + "protocol": "tcp" + } + ], + "environment": [ + { + "name": "GRADIO_SERVER_NAME", + "value": "0.0.0.0" + }, + { + "name": "DEBUG", + "value": "false" + } + ], + "logConfiguration": { + "logDriver": "awslogs", + "options": { + "awslogs-group": "/ecs/vocalizr", + "awslogs-region": "us-west-2", + "awslogs-stream-prefix": "ecs" + } + }, + "healthCheck": { + "command": ["CMD-SHELL", "curl -f http://localhost:7860/health || exit 1"], + "interval": 30, + 
"timeout": 5, + "retries": 3, + "startPeriod": 60 + } + } + ] +} +``` + +#### CloudFormation Template +```yaml +# cloudformation.yaml +AWSTemplateFormatVersion: '2010-09-09' +Description: 'Vocalizr deployment on AWS' + +Parameters: + VpcId: + Type: AWS::EC2::VPC::Id + Description: VPC ID for deployment + + SubnetIds: + Type: List + Description: Subnet IDs for deployment + + DomainName: + Type: String + Description: Domain name for the application + +Resources: + # Application Load Balancer + LoadBalancer: + Type: AWS::ElasticLoadBalancingV2::LoadBalancer + Properties: + Type: application + Scheme: internet-facing + Subnets: !Ref SubnetIds + SecurityGroups: + - !Ref LoadBalancerSecurityGroup + + # ECS Cluster + ECSCluster: + Type: AWS::ECS::Cluster + Properties: + ClusterName: vocalizr-cluster + CapacityProviders: + - FARGATE + - FARGATE_SPOT + + # ECS Service + ECSService: + Type: AWS::ECS::Service + Properties: + Cluster: !Ref ECSCluster + TaskDefinition: !Ref TaskDefinition + DesiredCount: 3 + LaunchType: FARGATE + NetworkConfiguration: + AwsvpcConfiguration: + SecurityGroups: + - !Ref AppSecurityGroup + Subnets: !Ref SubnetIds + AssignPublicIp: ENABLED + LoadBalancers: + - ContainerName: vocalizr + ContainerPort: 7860 + TargetGroupArn: !Ref TargetGroup + + # Auto Scaling + AutoScalingTarget: + Type: AWS::ApplicationAutoScaling::ScalableTarget + Properties: + ServiceNamespace: ecs + ResourceId: !Sub service/${ECSCluster}/${ECSService.Name} + ScalableDimension: ecs:service:DesiredCount + MinCapacity: 2 + MaxCapacity: 10 + + AutoScalingPolicy: + Type: AWS::ApplicationAutoScaling::ScalingPolicy + Properties: + PolicyName: vocalizr-scaling-policy + PolicyType: TargetTrackingScaling + ScalingTargetId: !Ref AutoScalingTarget + TargetTrackingScalingPolicyConfiguration: + TargetValue: 70.0 + PredefinedMetricSpecification: + PredefinedMetricType: ECSServiceAverageCPUUtilization +``` + +### Google Cloud Platform + +#### Cloud Run Deployment +```yaml +# cloud-run.yaml 
+apiVersion: serving.knative.dev/v1 +kind: Service +metadata: + name: vocalizr + annotations: + run.googleapis.com/ingress: all +spec: + template: + metadata: + annotations: + autoscaling.knative.dev/minScale: "1" + autoscaling.knative.dev/maxScale: "10" + run.googleapis.com/cpu-throttling: "false" + run.googleapis.com/memory: "8Gi" + run.googleapis.com/cpu: "4" + spec: + containers: + - image: ghcr.io/alphaspheredotai/vocalizr:latest + ports: + - containerPort: 7860 + env: + - name: GRADIO_SERVER_NAME + value: "0.0.0.0" + - name: GRADIO_SERVER_PORT + value: "7860" + - name: DEBUG + value: "false" + resources: + limits: + memory: "8Gi" + cpu: "4" + livenessProbe: + httpGet: + path: /health + port: 7860 + initialDelaySeconds: 60 + periodSeconds: 30 +``` + +Deploy to Cloud Run: +```bash +# Build and push to Google Container Registry +gcloud builds submit --tag gcr.io/PROJECT_ID/vocalizr + +# Deploy to Cloud Run +gcloud run deploy vocalizr \ + --image gcr.io/PROJECT_ID/vocalizr \ + --platform managed \ + --region us-central1 \ + --memory 8Gi \ + --cpu 4 \ + --min-instances 1 \ + --max-instances 10 \ + --port 7860 \ + --allow-unauthenticated \ + --set-env-vars GRADIO_SERVER_NAME=0.0.0.0,DEBUG=false +``` + +### Azure Container Instances + +```yaml +# azure-container.yaml +apiVersion: 2021-07-01 +location: eastus +name: vocalizr-container-group +properties: + containers: + - name: vocalizr + properties: + image: ghcr.io/alphaspheredotai/vocalizr:latest + ports: + - port: 7860 + protocol: TCP + environmentVariables: + - name: GRADIO_SERVER_NAME + value: "0.0.0.0" + - name: DEBUG + value: "false" + resources: + requests: + cpu: 4 + memoryInGB: 8 + osType: Linux + restartPolicy: Always + ipAddress: + type: Public + ports: + - port: 7860 + protocol: TCP +``` + +## Load Balancing + +### HAProxy Configuration + +```haproxy +# haproxy.cfg +global + daemon + maxconn 4096 + +defaults + mode http + timeout connect 5000ms + timeout client 50000ms + timeout server 50000ms + option 
httplog + +frontend vocalizr_frontend + bind *:80 + bind *:443 ssl crt /etc/ssl/certs/vocalizr.pem + redirect scheme https if !{ ssl_fc } + + # Rate limiting + stick-table type ip size 100k expire 30s store http_req_rate(10s) + http-request track-sc0 src + http-request reject if { sc_http_req_rate(0) gt 10 } + + default_backend vocalizr_backend + +backend vocalizr_backend + balance roundrobin + option httpchk GET /health + + server vocalizr1 vocalizr1:7860 check + server vocalizr2 vocalizr2:7860 check + server vocalizr3 vocalizr3:7860 check +``` + +### Nginx Load Balancer + +```nginx +# nginx-lb.conf +upstream vocalizr_pool { + # Session affinity (sticky sessions): ip_hash is only valid inside an upstream block. + # Use least_conn instead if clients do not need to stay on one backend. + ip_hash; + server vocalizr1:7860 max_fails=3 fail_timeout=30s; + server vocalizr2:7860 max_fails=3 fail_timeout=30s; + server vocalizr3:7860 max_fails=3 fail_timeout=30s; +} + +server { + listen 80; + + # Health check endpoint + location /health { + proxy_pass http://vocalizr_pool; + proxy_set_header Host $host; + access_log off; + } + + # Main application + location / { + proxy_pass http://vocalizr_pool; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + } +} +``` + +## Monitoring & Logging + +### Prometheus Configuration + +```yaml +# prometheus.yml +global: + scrape_interval: 15s + +scrape_configs: + - job_name: 'vocalizr' + static_configs: + - targets: ['vocalizr:7860'] + metrics_path: /metrics + scrape_interval: 30s + + - job_name: 'node-exporter' + static_configs: + - targets: ['node-exporter:9100'] + +rule_files: + - "vocalizr_rules.yml" + +alerting: + alertmanagers: + - static_configs: + - targets: + - alertmanager:9093 +``` + +### Grafana Dashboard + +```json +{ + "dashboard": { + "title": "Vocalizr Monitoring", + "panels": [ + { + "title": "Request Rate", + "type": "graph", + "targets": [ + { + "expr": "rate(http_requests_total[5m])", + "legendFormat": "Requests/sec" + } + ] + }, + { + 
"title": "Response Time", + "type": "graph", + "targets": [ + { + "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))", + "legendFormat": "95th percentile" + } + ] + }, + { + "title": "Memory Usage", + "type": "graph", + "targets": [ + { + "expr": "process_resident_memory_bytes", + "legendFormat": "Memory Usage" + } + ] + } + ] + } +} +``` + +### Logging Stack (ELK) + +#### Elasticsearch Configuration +```yaml +# elasticsearch.yml +cluster.name: vocalizr-logs +node.name: node-1 +path.data: /usr/share/elasticsearch/data +network.host: 0.0.0.0 +discovery.type: single-node +``` + +#### Logstash Configuration +```ruby +# logstash.conf +input { + beats { + port => 5044 + } +} + +filter { + if [fields][service] == "vocalizr" { + grok { + match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \| %{WORD:level} \| %{GREEDYDATA:msg}" } + } + + date { + match => [ "timestamp", "yyyy-MM-dd HH:mm:ss" ] + } + } +} + +output { + elasticsearch { + hosts => ["elasticsearch:9200"] + index => "vocalizr-logs-%{+YYYY.MM.dd}" + } +} +``` + +#### Filebeat Configuration +```yaml +# filebeat.yml +filebeat.inputs: +- type: log + enabled: true + paths: + - /app/logs/*.log + fields: + service: vocalizr + fields_under_root: true + +output.logstash: + hosts: ["logstash:5044"] +``` + +## Security Considerations + +### SSL/TLS Configuration + +#### Let's Encrypt with Certbot +```bash +# Install certbot +apt-get update && apt-get install -y certbot python3-certbot-nginx + +# Generate certificate +certbot --nginx -d your-domain.com + +# Auto-renewal +echo "0 12 * * * /usr/bin/certbot renew --quiet" | crontab - +``` + +#### Custom SSL Certificate +```nginx +# SSL configuration +ssl_certificate /etc/ssl/certs/vocalizr.crt; +ssl_certificate_key /etc/ssl/private/vocalizr.key; +ssl_protocols TLSv1.2 TLSv1.3; +ssl_ciphers ECDHE-RSA-AES256-GCM-SHA512:DHE-RSA-AES256-GCM-SHA512; +ssl_prefer_server_ciphers off; +ssl_session_cache shared:SSL:10m; +ssl_session_timeout 10m; +``` + 
+### Network Security + +#### Firewall Rules (iptables) +```bash +# Allow loopback traffic and replies to established connections +iptables -A INPUT -i lo -j ACCEPT +iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT + +# Allow SSH, HTTP, HTTPS +iptables -A INPUT -p tcp --dport 22 -j ACCEPT +iptables -A INPUT -p tcp --dport 80 -j ACCEPT +iptables -A INPUT -p tcp --dport 443 -j ACCEPT + +# Allow application port (internal only) +iptables -A INPUT -p tcp --dport 7860 -s 10.0.0.0/8 -j ACCEPT + +# Drop everything else +iptables -A INPUT -j DROP +``` + +#### AWS Security Groups +```yaml +SecurityGroup: + Type: AWS::EC2::SecurityGroup + Properties: + GroupDescription: Vocalizr Security Group + VpcId: !Ref VpcId + SecurityGroupIngress: + - IpProtocol: tcp + FromPort: 80 + ToPort: 80 + CidrIp: 0.0.0.0/0 + - IpProtocol: tcp + FromPort: 443 + ToPort: 443 + CidrIp: 0.0.0.0/0 + - IpProtocol: tcp + FromPort: 7860 + ToPort: 7860 + SourceSecurityGroupId: !Ref LoadBalancerSecurityGroup +``` + +### Authentication & Authorization + +#### OAuth2 Integration +```python +# oauth_config.py +from authlib.integrations.flask_client import OAuth +from flask import Flask, redirect, url_for + +app = Flask(__name__) +app.secret_key = 'change-me' # Required: the OAuth state is kept in the Flask session + +oauth = OAuth(app) + +oauth.register( + name='google', + client_id='your-client-id', + client_secret='your-client-secret', + server_metadata_url='https://accounts.google.com/.well-known/openid-configuration', + client_kwargs={ + 'scope': 'openid email profile' + } +) + +@app.route('/login') +def login(): + redirect_uri = url_for('callback', _external=True) + return oauth.google.authorize_redirect(redirect_uri) + +@app.route('/callback') +def callback(): + token = oauth.google.authorize_access_token() + user = oauth.google.parse_id_token(token) + # Store user session + return redirect('/') +``` + +## Performance Optimization + +### Caching Strategies + +#### Redis Caching +```python +import redis +import pickle + +class RedisCache: + def __init__(self, host='localhost', port=6379, db=0): + self.redis_client = redis.Redis(host=host, port=port, db=db) + + def get_audio(self, text_hash): + cached = self.redis_client.get(f"audio:{text_hash}") + if cached: + return pickle.loads(cached) + return None + + 
def set_audio(self, text_hash, audio_data, ttl=3600): + self.redis_client.setex( + f"audio:{text_hash}", + ttl, + pickle.dumps(audio_data) + ) +``` + +#### CDN Configuration (CloudFlare) +```javascript +// CloudFlare Worker +addEventListener('fetch', event => { + event.respondWith(handleRequest(event)) +}) + +// Takes the whole event so that event.waitUntil() is in scope below +async function handleRequest(event) { + const request = event.request + const cache = caches.default + const cacheKey = new Request(request.url, request) + + // Check cache first + let response = await cache.match(cacheKey) + + if (!response) { + // Forward to origin + response = await fetch(request) + + // Cache audio files + if (request.url.includes('/generate') && response.status === 200) { + response = new Response(response.body, response) + response.headers.set('Cache-Control', 'max-age=3600') + event.waitUntil(cache.put(cacheKey, response.clone())) + } + } + + return response +} +``` + +### Database Optimization + +#### PostgreSQL for Metadata +```sql +-- Create tables for audio metadata +CREATE TABLE generated_audio ( + id SERIAL PRIMARY KEY, + text_hash VARCHAR(32) UNIQUE NOT NULL, + text_content TEXT NOT NULL, + voice VARCHAR(50) NOT NULL, + speed FLOAT NOT NULL, + file_path VARCHAR(255), + duration FLOAT, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + accessed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP +); + +-- Indexes for performance +CREATE INDEX idx_text_hash ON generated_audio(text_hash); +CREATE INDEX idx_created_at ON generated_audio(created_at); +CREATE INDEX idx_voice ON generated_audio(voice); + +-- Cleanup old records +DELETE FROM generated_audio +WHERE accessed_at < NOW() - INTERVAL '7 days'; +``` + +## Scaling Strategies + +### Horizontal Scaling + +#### Auto-scaling Configuration +```yaml +# kubernetes-hpa.yaml +apiVersion: autoscaling/v2 +kind: HorizontalPodAutoscaler +metadata: + name: vocalizr-hpa +spec: + scaleTargetRef: + apiVersion: apps/v1 + kind: Deployment + name: vocalizr + minReplicas: 2 + maxReplicas: 20 + metrics: + - type: Resource + resource: 
+ name: cpu + target: + type: Utilization + averageUtilization: 70 + - type: Resource + resource: + name: memory + target: + type: Utilization + averageUtilization: 80 + behavior: + scaleUp: + stabilizationWindowSeconds: 300 + policies: + - type: Percent + value: 100 + periodSeconds: 15 + scaleDown: + stabilizationWindowSeconds: 300 + policies: + - type: Percent + value: 50 + periodSeconds: 60 +``` + +### Vertical Scaling + +#### Resource Optimization +```yaml +# Optimized resource allocation +resources: + requests: + memory: "4Gi" + cpu: "2" + nvidia.com/gpu: 1 + limits: + memory: "16Gi" + cpu: "8" + nvidia.com/gpu: 1 +``` + +### Geographic Distribution + +#### Multi-Region Deployment +```yaml +# Region-specific deployments +regions: + us-west-2: + replicas: 5 + resources: + cpu: "4" + memory: "8Gi" + + eu-west-1: + replicas: 3 + resources: + cpu: "4" + memory: "8Gi" + + ap-southeast-1: + replicas: 2 + resources: + cpu: "4" + memory: "8Gi" +``` + +### Health Checks and Readiness + +```python +# health_check.py +import time + +from flask import Flask, jsonify +import psutil +import torch + +app = Flask(__name__) + +@app.route('/health') +def health_check(): + """Basic health check endpoint.""" + return jsonify({ + "status": "healthy", + "service": "vocalizr", + "timestamp": time.time() + }) + +@app.route('/readiness') +def readiness_check(): + """Detailed readiness check.""" + try: + # Check memory usage + memory = psutil.virtual_memory() + memory_usage = memory.percent + + # Check GPU if available + gpu_available = torch.cuda.is_available() + gpu_memory = None + if gpu_available: + # Share of total device memory currently allocated by PyTorch + total_gpu_memory = torch.cuda.get_device_properties(0).total_memory + gpu_memory = torch.cuda.memory_allocated() / total_gpu_memory * 100 + + # Check model loading + model_loaded = True # Check if PIPELINE is loaded + + status = "ready" if all([ + memory_usage < 90, + model_loaded, + gpu_memory is None or gpu_memory < 90 + ]) else "not_ready" + + return jsonify({ + "status": status, + "checks": { + "memory_usage": f"{memory_usage:.1f}%", + 
"gpu_available": gpu_available, + "gpu_memory": f"{gpu_memory:.1f}%" if gpu_memory else "N/A", + "model_loaded": model_loaded + } + }) + + except Exception as e: + return jsonify({ + "status": "error", + "error": str(e) + }), 500 +``` + +This deployment guide provides comprehensive coverage for deploying Vocalizr in production environments, from simple Docker deployments to complex Kubernetes clusters with auto-scaling, monitoring, and security features. + +## Next Steps + +- Review [Configuration Guide](CONFIGURATION.md) for environment-specific settings +- Check [Troubleshooting Guide](TROUBLESHOOTING.md) for deployment issues +- See [Monitoring Guide] for operational best practices +- Explore [Security Guide] for hardening your deployment \ No newline at end of file diff --git a/docs/DEVELOPMENT.md b/docs/DEVELOPMENT.md new file mode 100644 index 00000000..fc85eb1d --- /dev/null +++ b/docs/DEVELOPMENT.md @@ -0,0 +1,795 @@ +# πŸ› οΈ Development Guide + +Complete guide for developers who want to contribute to or extend Vocalizr. 
+ +## Table of Contents + +- [Development Setup](#development-setup) +- [Architecture Overview](#architecture-overview) +- [Code Organization](#code-organization) +- [Development Workflow](#development-workflow) +- [Testing](#testing) +- [Code Quality](#code-quality) +- [Documentation](#documentation) +- [Debugging](#debugging) +- [Contributing](#contributing) + +## Development Setup + +### Prerequisites + +- Python 3.12 or higher +- Git +- uv package manager (recommended) or pip +- Docker (optional, for containerized development) + +### Clone and Setup + +```bash +# Clone the repository +git clone https://github.com/AlphaSphereDotAI/vocalizr.git +cd vocalizr + +# Install with uv (recommended) +uv sync +``` + +### Development Dependencies + +The project includes these development tools: + +- **ruff**: Code formatting and linting +- **ty**: Type checking utilities +- **pytest**: Testing framework (when added) +- **trunk**: Git hooks for code quality + +### IDE Setup + +#### VS Code + +Create `.vscode/settings.json`: + +```json +{ + "python.defaultInterpreterPath": "./venv/bin/python", + "python.linting.enabled": true, + "python.linting.ruffEnabled": true, + "python.formatting.provider": "ruff", + "python.sortImports.args": ["--profile", "black"], + "editor.formatOnSave": true, + "editor.codeActionsOnSave": { + "source.organizeImports": true + }, + "files.exclude": { + "**/__pycache__": true, + "**/*.pyc": true, + ".pytest_cache": true, + ".ruff_cache": true + } +} +``` + +Create `.vscode/launch.json` for debugging: + +```json +{ + "version": "0.2.0", + "configurations": [ + { + "name": "Run Vocalizr", + "type": "python", + "request": "launch", + "module": "vocalizr", + "console": "integratedTerminal", + "env": { + "DEBUG": "true", + "GRADIO_SERVER_PORT": "7860" + } + }, + { + "name": "Debug Tests", + "type": "python", + "request": "launch", + "module": "pytest", + "args": ["-v", "tests/"], + "console": "integratedTerminal" + } + ] +} +``` + +#### PyCharm + +1. 
Open the project in PyCharm +2. Configure Python interpreter to use the virtual environment +3. Enable Ruff as the code formatter and linter +4. Set up run configurations for the application and tests + +## Architecture Overview + +### High-Level Architecture + +``` +┌─────────────────────────────────────────────────────────────┐ +│ Vocalizr Application │ +├─────────────────────────────────────────────────────────────┤ +│ Web Interface (Gradio) │ CLI Interface │ Python API │ +├─────────────────────────────────────────────────────────────┤ +│ Core Generation Engine │ +│ (generate_audio_for_text) │ +├─────────────────────────────────────────────────────────────┤ +│ Kokoro AI Pipeline │ +│ (Text-to-Speech Model) │ +├─────────────────────────────────────────────────────────────┤ +│ PyTorch Backend │ Audio Processing │ +│ (CUDA/CPU Support) │ (soundfile, numpy) │ +└─────────────────────────────────────────────────────────────┘ +``` + +### Component Interaction + +```mermaid +graph TD + A[User Input] --> B[Web Interface] + A --> C[CLI Interface] + A --> D[Python API] + + B --> E[GUI Module] + C --> F[Main Module] + D --> G[Model Module] + + E --> G + F --> G + + G --> H[Kokoro Pipeline] + H 
--> I[Audio Generation] + I --> J[File Output] + I --> K[Streaming Output] +``` + +### Data Flow + +1. **Input Processing**: Text input validation and preprocessing +2. **Voice Selection**: Voice ID mapping to model parameters +3. **Generation**: Kokoro pipeline processes text to audio +4. **Post-processing**: Audio normalization and formatting +5. **Output**: Streaming audio or file save + +## Code Organization + +### Directory Structure + +``` +vocalizr/ +├── src/vocalizr/ # Main package +│ ├── __init__.py # Package initialization & config +│ ├── __main__.py # CLI entry point +│ ├── gui.py # Gradio web interface +│ └── model.py # Core generation logic +├── docs/ # Documentation +├── tests/ # Test files (to be added) +├── scripts/ # Utility scripts +├── examples/ # Usage examples +├── .github/ # GitHub workflows +├── pyproject.toml # Project configuration +├── Dockerfile # Container configuration +├── compose.yaml # Docker Compose setup +└── README.md # Main documentation +``` + +### Module Responsibilities + +#### `__init__.py` +- Global configuration and constants +- Environment variable loading +- Logging setup +- Pipeline initialization +- Voice choices mapping + +```python +# Key exports +DEBUG: bool +SERVER_NAME: str +SERVER_PORT: int +PIPELINE: KPipeline +CHOICES: dict[str, str] +CUDA_AVAILABLE: bool +``` + +#### `__main__.py` +- CLI entry point +- Application launcher +- Gradio app configuration + +```python +def main() -> None: + """Launch the Gradio voice generation web application.""" +``` + +#### `gui.py` +- Gradio interface components +- User interaction handling +- UI layout and controls + +```python +def app_block() -> Blocks: + """Create and return the main application interface.""" +``` + +#### `model.py` +- Core audio generation logic +- File I/O operations +- Error handling + +```python +def generate_audio_for_text(...) 
-> Generator[...]: + """Generates audio from text using specified voice and speed.""" + +def save_file_wav(audio: ndarray) -> None: + """Saves audio array to WAV file.""" +``` + +## Development Workflow + +### Setting Up Development Environment + +```bash +# 1. Create development branch +git checkout -b feature/your-feature-name + +# 2. Set up development environment +export DEBUG=true +export GRADIO_SERVER_PORT=7860 + +# 3. Install in development mode +pip install -e ".[dev]" + +# 4. Run the application +python -m vocalizr +``` + +### Code Style and Standards + +#### Formatting with Ruff + +```bash +# Check code style +ruff check src/ + +# Format code +ruff format src/ + +# Check and fix issues +ruff check --fix src/ +``` + +#### Type Hints + +Use comprehensive type hints throughout the codebase: + +```python +from typing import Generator, Literal, Any +from numpy import ndarray, dtype, float32 + +def generate_audio_for_text( + text: str, + voice: str = "af_heart", + speed: float = 1.0, + save_file: bool = False, + debug: bool = False, + char_limit: int = -1, +) -> Generator[ + tuple[Literal[24000], ndarray[tuple[float32], dtype[float32]]], + Any, + None, +]: + """Type-annotated function with detailed return type.""" +``` + +#### Documentation Standards + +Use comprehensive docstrings following Google style: + +```python +def generate_audio_for_text( + text: str, + voice: str = "af_heart", + speed: float = 1.0, + save_file: bool = False, + debug: bool = False, + char_limit: int = -1, +) -> Generator[...]: + """ + Generates audio from the provided text using the specified voice and speed. + + It allows saving the generated audio to a file if required. The function + yields tuples containing the audio sampling rate and the audio data as a + NumPy array. + + Args: + text: The input text to generate audio for. If char_limit is set to a + positive value, the text will be truncated to fit that limit. + voice: The voice profile to use for audio generation. 
+ Defaults to "af_heart". + speed: The speed modifier for audio generation. Defaults to 1.0. + save_file: Whether to save the generated audio to a file. Defaults + to False. + debug: Whether to enable debug mode. Defaults to False. + char_limit: The maximum number of characters to include in the input. + + Yields: + A tuple where the first element is the fixed sampling rate of 24,000 Hz, + and the second element is a NumPy array representing the generated + audio data. + + Raises: + Error: If audio generation fails or unexpected type is returned. + RuntimeError: If file saving fails when save_file is True. + """ +``` + +### Git Workflow + +#### Commit Message Standards + +Follow conventional commit format: + +```bash +# Feature commits +git commit -m "feat: add new voice selection interface" + +# Bug fixes +git commit -m "fix: resolve memory leak in audio generation" + +# Documentation +git commit -m "docs: update API documentation with examples" + +# Refactoring +git commit -m "refactor: simplify audio processing pipeline" + +# Tests +git commit -m "test: add unit tests for voice selection" +``` + +#### Branch Naming + +Use descriptive branch names: + +```bash +# Features +feature/voice-customization +feature/batch-processing + +# Bug fixes +fix/memory-leak +fix/gradio-interface-crash + +# Documentation +docs/api-reference +docs/deployment-guide + +# Refactoring +refactor/model-architecture +refactor/error-handling +``` + +## Testing + +### Test Structure + +Create comprehensive tests following this structure: + +``` +tests/ +β”œβ”€β”€ unit/ # Unit tests +β”‚ β”œβ”€β”€ test_model.py # Model function tests +β”‚ β”œβ”€β”€ test_gui.py # GUI component tests +β”‚ └── test_config.py # Configuration tests +β”œβ”€β”€ integration/ # Integration tests +β”‚ β”œβ”€β”€ test_pipeline.py # End-to-end pipeline tests +β”‚ └── test_api.py # API integration tests +β”œβ”€β”€ fixtures/ # Test data and fixtures +β”‚ β”œβ”€β”€ sample_texts.py # Sample text inputs +β”‚ └── 
expected_outputs/ # Expected audio outputs +└── conftest.py # Pytest configuration +``` + +### Writing Tests + +#### Unit Tests Example + +```python +# tests/unit/test_model.py +import pytest +import numpy as np +from unittest.mock import patch, MagicMock +from vocalizr.model import generate_audio_for_text, save_file_wav + +class TestGenerateAudioForText: + """Test suite for generate_audio_for_text function.""" + + def test_basic_generation(self): + """Test basic audio generation with default parameters.""" + text = "Hello, world!" + + # Mock the pipeline to avoid actual model calls + with patch('vocalizr.model.PIPELINE') as mock_pipeline: + mock_pipeline.return_value = [ + (None, None, np.array([0.1, 0.2, 0.3], dtype=np.float32)) + ] + + results = list(generate_audio_for_text(text)) + + assert len(results) > 0 + sample_rate, audio = results[0] + assert sample_rate == 24000 + assert isinstance(audio, np.ndarray) + + def test_invalid_text_input(self): + """Test handling of invalid text input.""" + with pytest.raises(Exception): + list(generate_audio_for_text("")) + + with pytest.raises(Exception): + list(generate_audio_for_text("abc")) # Too short + + def test_voice_selection(self): + """Test different voice selections.""" + text = "Testing voice selection" + + for voice in ["af_heart", "bf_emma", "am_michael"]: + with patch('vocalizr.model.PIPELINE') as mock_pipeline: + mock_pipeline.return_value = [ + (None, None, np.array([0.1], dtype=np.float32)) + ] + + results = list(generate_audio_for_text(text, voice=voice)) + assert len(results) > 0 + +class TestSaveFileWav: + """Test suite for save_file_wav function.""" + + def test_save_valid_audio(self, tmp_path): + """Test saving valid audio data.""" + audio = np.array([0.1, 0.2, 0.3], dtype=np.float32) + + with patch('vocalizr.model.AUDIO_FILE_PATH', tmp_path / 'test.wav'): + save_file_wav(audio) + assert (tmp_path / 'test.wav').exists() + + def test_save_invalid_audio(self): + """Test error handling for invalid audio 
data.""" + with pytest.raises(RuntimeError): + save_file_wav(None) +``` + +#### Integration Tests Example + +```python +# tests/integration/test_pipeline.py +import numpy as np +import pytest +from vocalizr.model import generate_audio_for_text + +class TestPipelineIntegration: + """Integration tests for the complete pipeline.""" + + @pytest.mark.slow + def test_end_to_end_generation(self): + """Test complete end-to-end audio generation.""" + text = "This is a comprehensive integration test." + + results = list(generate_audio_for_text( + text=text, + voice="af_heart", + speed=1.0 + )) + + assert len(results) > 0 + + for sample_rate, audio in results: + assert sample_rate == 24000 + assert len(audio) > 0 + assert audio.dtype == np.float32 + + @pytest.mark.parametrize("voice", [ + "af_heart", "bf_emma", "am_michael", "bm_george" + ]) + def test_voice_compatibility(self, voice): + """Test compatibility across different voices.""" + text = "Voice compatibility test" + + results = list(generate_audio_for_text(text, voice=voice)) + assert len(results) > 0 +``` + +### Running Tests + +```bash +# Run all tests +pytest + +# Run with coverage (requires the pytest-cov plugin) +pytest --cov=src/vocalizr --cov-report=html + +# Run specific test file +pytest tests/unit/test_model.py + +# Run tests with specific markers +pytest -m "not slow" # Skip slow tests + +# Run tests in parallel (requires the pytest-xdist plugin) +pytest -n auto +``` + +### Test Configuration + +Create `tests/conftest.py`: + +```python +# tests/conftest.py +import pytest +import tempfile +import os +from pathlib import Path + +@pytest.fixture +def temp_dir(): + """Create a temporary directory for tests.""" + with tempfile.TemporaryDirectory() as tmp_dir: + yield Path(tmp_dir) + +@pytest.fixture +def mock_environment(): + """Mock environment variables for testing.""" + original_env = os.environ.copy() + + # Set test environment + os.environ.update({ + 'DEBUG': 'true', + 'GRADIO_SERVER_PORT': '7861', # Different port for tests + 'HF_HOME': '/tmp/test_cache' + }) + + yield + + # Restore original 
environment + os.environ.clear() + os.environ.update(original_env) + +@pytest.fixture +def sample_texts(): + """Provide sample texts for testing.""" + return [ + "Hello, world!", + "This is a longer test sentence for audio generation.", + "Testing with numbers: 123 and symbols: !@#$%", + "Multi-sentence test. This has multiple sentences. End of test." + ] +``` + +## Code Quality + +### Pre-commit Hooks + +Set up pre-commit hooks to ensure code quality: + +```yaml +# .pre-commit-config.yaml +repos: + - repo: https://github.com/astral-sh/ruff-pre-commit + rev: v0.1.6 + hooks: + - id: ruff + args: [--fix, --exit-non-zero-on-fix] + - id: ruff-format + + - repo: https://github.com/pre-commit/pre-commit-hooks + rev: v4.5.0 + hooks: + - id: trailing-whitespace + - id: end-of-file-fixer + - id: check-merge-conflict + - id: check-yaml + - id: check-toml + + - repo: https://github.com/python-poetry/poetry + rev: 1.7.0 + hooks: + - id: poetry-check +``` + +Install and activate: + +```bash +pip install pre-commit +pre-commit install +``` + +### Code Review Checklist + +Before submitting PRs, ensure: + +- [ ] Code follows Ruff formatting standards +- [ ] All type hints are present and accurate +- [ ] Comprehensive docstrings for all functions +- [ ] Unit tests cover new functionality +- [ ] Integration tests pass +- [ ] No breaking changes to existing API +- [ ] Documentation updated for new features +- [ ] Error handling implemented +- [ ] Performance impact considered +- [ ] Security implications reviewed + +## Documentation + +### Building Documentation + +```bash +# Generate API documentation +python -c " +import inspect +from vocalizr import model, gui +print('Model functions:', [name for name, obj in inspect.getmembers(model, inspect.isfunction)]) +print('GUI functions:', [name for name, obj in inspect.getmembers(gui, inspect.isfunction)]) +" + +# Validate documentation links +python scripts/validate_docs.py +``` + +### Documentation Standards + +1. 
**README**: Keep updated with latest features +2. **API Docs**: Auto-generate from docstrings when possible +3. **Examples**: Provide working code examples +4. **Tutorials**: Step-by-step guides for common tasks +5. **Architecture**: Diagrams and explanations + +## Debugging + +### Debug Configuration + +Set up debugging environment: + +```bash +export DEBUG=true +export GRADIO_SERVER_PORT=7860 +export VOCALIZR_LOG_LEVEL=DEBUG +``` + +### Common Debug Scenarios + +#### Memory Issues + +```python +import psutil +import torch +import gc + +def debug_memory(): + """Debug memory usage.""" + process = psutil.Process() + memory_info = process.memory_info() + + print(f"RSS: {memory_info.rss / 1024**2:.1f} MB") + print(f"VMS: {memory_info.vms / 1024**2:.1f} MB") + + if torch.cuda.is_available(): + print(f"GPU Memory: {torch.cuda.memory_allocated() / 1024**2:.1f} MB") + print(f"GPU Cached: {torch.cuda.memory_reserved() / 1024**2:.1f} MB") + +# Use before and after generation +debug_memory() +``` + +#### Performance Profiling + +```python +import cProfile +import pstats +from vocalizr.model import generate_audio_for_text + +def profile_generation(): + """Profile audio generation performance.""" + pr = cProfile.Profile() + pr.enable() + + # Your code here + list(generate_audio_for_text("Test profiling text")) + + pr.disable() + stats = pstats.Stats(pr) + stats.sort_stats('cumulative') + stats.print_stats(10) + +profile_generation() +``` + +### Logging for Development + +```python +from loguru import logger + +# Add debug-specific logging +logger.add( + "debug.log", + level="DEBUG", + format="{time} | {level} | {name}:{function}:{line} | {message}", + rotation="1 MB" +) + +def debug_generation(text, voice): + """Debug wrapper for generation.""" + logger.debug(f"Starting generation: text='{text[:50]}...', voice={voice}") + + try: + for i, (sr, audio) in enumerate(generate_audio_for_text(text, voice)): + logger.debug(f"Generated chunk {i}: {len(audio)} samples") + yield sr, 
audio + except Exception as e: + logger.exception(f"Generation failed: {e}") + raise +``` + +## Contributing + +### Getting Started + +1. **Fork the repository** on GitHub +2. **Clone your fork** locally +3. **Create a feature branch** from main +4. **Make your changes** following the development standards +5. **Add tests** for new functionality +6. **Update documentation** as needed +7. **Submit a pull request** with detailed description + +### Pull Request Guidelines + +#### PR Title Format +``` +type(scope): brief description + +Examples: +feat(model): add batch processing support +fix(gui): resolve memory leak in audio player +docs(api): add integration examples +refactor(core): simplify pipeline initialization +``` + +#### PR Description Template +```markdown +## Description +Brief description of changes made. + +## Type of Change +- [ ] Bug fix (non-breaking change which fixes an issue) +- [ ] New feature (non-breaking change which adds functionality) +- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected) +- [ ] Documentation update + +## Testing +- [ ] Unit tests added/updated +- [ ] Integration tests added/updated +- [ ] All tests pass locally + +## Checklist +- [ ] Code follows project style guidelines +- [ ] Self-review of code completed +- [ ] Documentation updated +- [ ] No breaking changes introduced +``` + +### Review Process + +1. **Automated Checks**: CI/CD pipeline runs tests and linting +2. **Code Review**: Maintainers review code quality and design +3. **Testing**: Verify functionality works as expected +4. **Documentation**: Ensure docs are updated appropriately +5. 
**Merge**: Approved PRs are merged to main branch + +## Next Steps + +- Review [Contributing Guidelines](CONTRIBUTING.md) for detailed contribution process +- Check [Examples](EXAMPLES.md) for practical development examples +- See [API Documentation](API.md) for detailed technical reference +- Explore [Troubleshooting](TROUBLESHOOTING.md) for development issues \ No newline at end of file diff --git a/docs/INSTALLATION.md b/docs/INSTALLATION.md new file mode 100644 index 00000000..467cc0d8 --- /dev/null +++ b/docs/INSTALLATION.md @@ -0,0 +1,314 @@ +# πŸ“¦ Installation Guide + +This guide provides detailed instructions for installing Vocalizr on various platforms and environments. + +## Table of Contents + +- [System Requirements](#system-requirements) +- [Installation Methods](#installation-methods) + - [Docker (Recommended)](#docker-recommended) + - [pip Install](#pip-install) + - [From Source](#from-source) + - [Development Setup](#development-setup) +- [Verification](#verification) +- [GPU Support](#gpu-support) +- [Troubleshooting](#troubleshooting) + +## System Requirements + +### Minimum Requirements +- **Operating System**: Linux, macOS, or Windows +- **Python**: 3.12 or higher (for non-Docker installations) +- **Memory**: 4GB RAM +- **Storage**: 2GB free space for models and cache +- **Network**: Internet connection for initial model download + +### Recommended Requirements +- **Memory**: 8GB+ RAM for optimal performance +- **GPU**: CUDA-compatible GPU for faster generation +- **CPU**: Multi-core processor for better performance + +## Installation Methods + +### Docker (Recommended) + +Docker is the easiest and most reliable way to run Vocalizr. 
+ +#### Prerequisites +- Docker installed on your system ([Get Docker](https://docs.docker.com/get-docker/)) + +#### Quick Start +```bash +# Pull and run the latest image +docker run -p 7860:7860 ghcr.io/alphaspheredotai/vocalizr:latest +``` + +#### Custom Configuration +```bash +# Run with custom environment variables +docker run -p 7860:7860 \ + -e GRADIO_SERVER_PORT=8080 \ + -e DEBUG=false \ + ghcr.io/alphaspheredotai/vocalizr:latest +``` + +#### With Volume Mounts +```bash +# Mount local directories for logs and results +docker run -p 7860:7860 \ + -v ./logs:/home/nonroot/logs \ + -v ./results:/home/nonroot/results \ + ghcr.io/alphaspheredotai/vocalizr:latest +``` + +#### Docker Compose +Create a `docker-compose.yml` file: + +```yaml +version: '3.8' +services: + vocalizr: + image: ghcr.io/alphaspheredotai/vocalizr:latest + ports: + - "7860:7860" + environment: + - GRADIO_SERVER_NAME=0.0.0.0 + - GRADIO_SERVER_PORT=7860 + - DEBUG=false + volumes: + - ./logs:/home/nonroot/logs + - ./results:/home/nonroot/results + restart: unless-stopped +``` + +Run with: +```bash +docker-compose up -d +``` + +### pip Install + +#### Prerequisites +- Python 3.12 or higher +- pip package manager + +#### Installation +```bash +# Install from PyPI +pip install vocalizr + +# Or install with specific version +pip install vocalizr==0.0.1 +``` + +#### Virtual Environment (Recommended) +```bash +# Create virtual environment +python -m venv vocalizr-env + +# Activate virtual environment +# On Linux/macOS: +source vocalizr-env/bin/activate +# On Windows: +vocalizr-env\Scripts\activate + +# Install vocalizr +pip install vocalizr +``` + +#### Running +```bash +# Start the application +vocalizr + +# Or run as module +python -m vocalizr +``` + +### From Source + +#### Prerequisites +- Python 3.12 or higher +- Git +- uv package manager (recommended) or pip + +#### Clone Repository +```bash +git clone https://github.com/AlphaSphereDotAI/vocalizr.git +cd vocalizr +``` + +#### With uv (Recommended) 
+```bash +# Install uv if not already installed +curl -LsSf https://astral.sh/uv/install.sh | sh + +# Sync dependencies and install +uv sync +uv run vocalizr +``` + +#### With pip +```bash +# Install in development mode +pip install -e . + +# Run the application +vocalizr +``` + +### Development Setup + +For contributors and developers who want to modify the code: + +#### Prerequisites +- All requirements from "From Source" installation +- Git + +#### Setup +```bash +# Clone the repository +git clone https://github.com/AlphaSphereDotAI/vocalizr.git +cd vocalizr + +# Install with development dependencies +pip install -e ".[dev]" + +# Or with uv +uv sync --group dev +``` + +#### Development Tools +```bash +# Run linting +ruff check src/ + +# Format code +ruff format src/ + +# Type checking (if configured) +mypy src/ +``` + +## Verification + +After installation, verify that Vocalizr is working correctly: + +### Command Line Test +```bash +# Test import +python -c "import vocalizr; print('Vocalizr imported successfully')" + +# Check version +python -c "import vocalizr; print(vocalizr.__version__)" +``` + +### Web Interface Test +1. Start the application: + ```bash + vocalizr + ``` + +2. Open your browser and navigate to: + - Local: `http://localhost:7860` + - Custom port: `http://localhost:[YOUR_PORT]` + +3. You should see the Vocalizr web interface + +### API Test +```python +from vocalizr.model import generate_audio_for_text + +# Test audio generation +for sample_rate, audio_data in generate_audio_for_text( + text="Hello, Vocalizr is working!", + voice="af_heart" +): + print(f"Generated audio: {len(audio_data)} samples at {sample_rate}Hz") + break # Just test the first chunk +``` + +## GPU Support + +Vocalizr automatically detects and uses CUDA-compatible GPUs when available. + +### NVIDIA GPU Setup +1. Install NVIDIA drivers +2. Install CUDA toolkit +3. 
Install PyTorch with CUDA support: + ```bash + pip install torch --index-url https://download.pytorch.org/whl/cu124 + ``` + +### Verification +```python +import torch +print(f"CUDA available: {torch.cuda.is_available()}") +print(f"CUDA devices: {torch.cuda.device_count()}") +``` + +## Troubleshooting + +### Common Issues + +#### 1. Python Version Error +``` +ERROR: Python 3.12 or higher is required +``` +**Solution**: Upgrade Python to version 3.12 or higher. + +#### 2. Network Connection Error +``` +Failed to resolve 'huggingface.co' +``` +**Solution**: Check internet connection. Vocalizr needs to download models on first run. + +#### 3. Permission Denied (Docker) +``` +Permission denied while trying to connect to Docker daemon +``` +**Solution**: Add your user to the docker group or run with `sudo`. + +#### 4. Port Already in Use +``` +OSError: [Errno 98] Address already in use +``` +**Solution**: Change the port or stop the conflicting service: +```bash +# Change port +GRADIO_SERVER_PORT=8080 vocalizr + +# Or kill process using port 7860 +sudo lsof -t -i:7860 | xargs kill -9 +``` + +#### 5. Out of Memory Error +``` +CUDA out of memory +``` +**Solution**: +- Reduce batch size +- Use CPU instead of GPU +- Add more GPU memory + +### Getting Help + +If you encounter issues not covered here: + +1. Check the [Troubleshooting Guide](TROUBLESHOOTING.md) +2. Search [existing issues](https://github.com/AlphaSphereDotAI/vocalizr/issues) +3. 
Create a [new issue](https://github.com/AlphaSphereDotAI/vocalizr/issues/new) with: + - Your operating system + - Python version + - Installation method used + - Full error message + - Steps to reproduce + +## Next Steps + +After successful installation: + +- Read the [Usage Guide](USAGE.md) to learn how to use Vocalizr +- Explore the [API Documentation](API.md) for programmatic usage +- Check the [Configuration Guide](CONFIGURATION.md) for customization options +- Review [Examples](EXAMPLES.md) for common use cases \ No newline at end of file diff --git a/docs/LICENSE.md b/docs/LICENSE.md new file mode 100644 index 00000000..f7f32ea5 --- /dev/null +++ b/docs/LICENSE.md @@ -0,0 +1,232 @@ +# πŸ“„ License + +## MIT License + +**Copyright (c) 2024 AlphaSphere.AI** + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +**THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE.** + +--- + +## Third-Party Licenses + +This project incorporates software from other open source projects. 
The following is a summary of the licenses of those dependencies: + +### Direct Dependencies + +#### Gradio +- **License**: Apache License 2.0 +- **Copyright**: Gradio Team +- **Website**: https://gradio.app/ +- **Usage**: Web interface framework + +#### Kokoro +- **License**: Apache License 2.0 +- **Copyright**: Kokoro AI Team +- **Repository**: https://github.com/hexgrad/kokoro +- **Usage**: Text-to-speech AI model + +#### SoundFile +- **License**: BSD 3-Clause License +- **Copyright**: Bastian Bechtold +- **Repository**: https://github.com/bastibe/python-soundfile +- **Usage**: Audio file processing + +#### PyTorch +- **License**: BSD 3-Clause License +- **Copyright**: PyTorch Contributors +- **Website**: https://pytorch.org/ +- **Usage**: Machine learning framework + +#### NumPy +- **License**: BSD 3-Clause License +- **Copyright**: NumPy Developers +- **Website**: https://numpy.org/ +- **Usage**: Numerical computing + +#### Loguru +- **License**: MIT License +- **Copyright**: Delgan +- **Repository**: https://github.com/Delgan/loguru +- **Usage**: Logging framework + +### Development Dependencies + +#### Ruff +- **License**: MIT License +- **Copyright**: Charlie Marsh +- **Repository**: https://github.com/astral-sh/ruff +- **Usage**: Code formatting and linting + +### Model Licenses + +#### Kokoro-82M Model +- **License**: Apache License 2.0 +- **Provider**: HexGrad +- **Model ID**: hexgrad/Kokoro-82M +- **Platform**: Hugging Face Hub +- **Usage**: Pre-trained text-to-speech model + +--- + +## Attribution Requirements + +When redistributing this software or incorporating it into other projects, please ensure compliance with the following attribution requirements: + +### For Software Distribution +- Include this LICENSE file +- Preserve copyright notices +- Include attribution for third-party components +- Mention any modifications made + +### For Commercial Use +- This software may be used commercially under the MIT License +- No additional permissions 
required +- Attribution appreciated but not required +- Consider supporting the project through contributions + +### For Academic/Research Use +- Citation information: + ``` + Vocalizr: AI-Powered Voice Generation Application + Copyright (c) 2024 AlphaSphere.AI + Available at: https://github.com/AlphaSphereDotAI/vocalizr + ``` + +--- + +## License Compatibility + +### Compatible Licenses +This MIT-licensed software is compatible with: +- **Apache License 2.0** ✅ +- **BSD Licenses** ✅ +- **GPL 2.0/3.0** ✅ (can be included in GPL projects) +- **LGPL** ✅ +- **ISC License** ✅ +- **Unlicense** ✅ + +### Incompatible Licenses +Be cautious when combining with: +- **Copyleft licenses with stronger requirements** +- **Proprietary licenses with conflicting terms** +- **Licenses requiring specific attribution formats** + +--- + +## Contributor License Agreement + +### For Contributors +By contributing to this project, you agree that: + +1. **Your contributions** will be licensed under the same MIT License +2. **You have the right** to license your contributions +3. **You grant** AlphaSphere.AI and users a perpetual, worldwide license +4. **You understand** that your contributions may be redistributed + +### For Maintainers +Project maintainers commit to: + +1. **Maintain** the MIT License for the project +2. **Respect** contributor rights and attributions +3. **Ensure** compatibility with third-party licenses +4. **Provide** clear licensing information + +--- + +## Trademark Notice + +- **"Vocalizr"** is a trademark of AlphaSphere.AI +- **"AlphaSphere.AI"** is a trademark of AlphaSphere.AI +- Use of trademarks requires permission from AlphaSphere.AI +- This license does not grant trademark rights + +--- + +## Disclaimer + +### No Warranty +This software is provided "as is" without warranty of any kind. 
The authors and copyright holders disclaim all warranties, express or implied, including but not limited to: + +- **Merchantability** +- **Fitness for a particular purpose** +- **Non-infringement** +- **Accuracy or completeness** + +### Limitation of Liability +In no event shall the authors or copyright holders be liable for any claim, damages, or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the software or the use or other dealings in the software. + +### Use at Your Own Risk +Users of this software do so at their own risk and are responsible for: + +- **Compliance** with applicable laws and regulations +- **Proper use** of the software +- **Data protection** and privacy considerations +- **Security** of their implementations + +--- + +## Export Control + +This software may be subject to export control laws and regulations. Users are responsible for compliance with all applicable export control laws in their jurisdiction, including but not limited to: + +- **U.S. Export Administration Regulations (EAR)** +- **International Traffic in Arms Regulations (ITAR)** +- **EU Dual-Use Regulation** +- **Local export control laws** + +--- + +## Privacy and Data Processing + +### Data Handling +This software may process: +- **Text input** provided by users +- **Audio output** generated from text +- **Usage logs** and application telemetry +- **Configuration data** and preferences + +### User Responsibilities +Users are responsible for: +- **Compliance** with data protection laws (GDPR, CCPA, etc.) 
+- **Proper consent** for data processing +- **Data security** and protection measures +- **Transparency** about data use to end users + +--- + +## Contact Information + +For licensing questions or permissions: + +- **Email**: [mohamed.hisham.abdelzaher@gmail.com](mailto:mohamed.hisham.abdelzaher@gmail.com) +- **Repository**: [https://github.com/AlphaSphereDotAI/vocalizr](https://github.com/AlphaSphereDotAI/vocalizr) +- **Issues**: [https://github.com/AlphaSphereDotAI/vocalizr/issues](https://github.com/AlphaSphereDotAI/vocalizr/issues) + +--- + +## Version History + +- **Version 1.0** (2024): Initial MIT License +- **Last Updated**: January 2024 +- **Next Review**: January 2025 + +--- + +**Note**: This license file is part of the Vocalizr project documentation. For the most current license information, please refer to the LICENSE file in the project repository. \ No newline at end of file diff --git a/docs/TROUBLESHOOTING.md b/docs/TROUBLESHOOTING.md new file mode 100644 index 00000000..2d7e04ce --- /dev/null +++ b/docs/TROUBLESHOOTING.md @@ -0,0 +1,1076 @@ +# πŸ”§ Troubleshooting Guide + +Comprehensive guide for diagnosing and resolving common issues with Vocalizr. + +## Table of Contents + +- [Quick Diagnostics](#quick-diagnostics) +- [Installation Issues](#installation-issues) +- [Runtime Issues](#runtime-issues) +- [Performance Problems](#performance-problems) +- [Network & Connectivity](#network--connectivity) +- [Audio Generation Issues](#audio-generation-issues) +- [Docker & Container Issues](#docker--container-issues) +- [Deployment Issues](#deployment-issues) +- [Development Issues](#development-issues) +- [Getting Help](#getting-help) + +## Quick Diagnostics + +### System Information Checker + +```python +#!/usr/bin/env python3 +""" +Vocalizr System Diagnostics Script +Run this script to check your system compatibility and diagnose issues. 
+""" + +import sys +import platform +import subprocess +import pkg_resources +import torch +import psutil +from pathlib import Path + +def check_python_version(): + """Check Python version compatibility.""" + version = sys.version_info + print(f"Python Version: {version.major}.{version.minor}.{version.micro}") + + if version < (3, 12): + print("❌ Python 3.12+ required") + return False + else: + print("βœ… Python version compatible") + return True + +def check_system_resources(): + """Check system memory and CPU.""" + memory = psutil.virtual_memory() + cpu_count = psutil.cpu_count() + + print(f"System Memory: {memory.total // (1024**3)} GB total, {memory.available // (1024**3)} GB available") + print(f"CPU Cores: {cpu_count}") + + issues = [] + if memory.total < 4 * (1024**3): # Less than 4GB + issues.append("❌ Insufficient memory (4GB+ recommended)") + else: + print("βœ… Sufficient memory available") + + if cpu_count < 2: + issues.append("❌ Insufficient CPU cores (2+ recommended)") + else: + print("βœ… Sufficient CPU cores") + + return len(issues) == 0, issues + +def check_gpu_availability(): + """Check CUDA GPU availability.""" + try: + cuda_available = torch.cuda.is_available() + if cuda_available: + gpu_count = torch.cuda.device_count() + gpu_name = torch.cuda.get_device_name(0) + gpu_memory = torch.cuda.get_device_properties(0).total_memory // (1024**3) + + print(f"βœ… CUDA GPU Available: {gpu_name}") + print(f" GPU Memory: {gpu_memory} GB") + print(f" GPU Count: {gpu_count}") + return True + else: + print("ℹ️ No CUDA GPU detected (CPU mode will be used)") + return True + except Exception as e: + print(f"⚠️ GPU check failed: {e}") + return False + +def check_dependencies(): + """Check required dependencies.""" + required_packages = [ + 'gradio', + 'kokoro', + 'soundfile', + 'torch', + 'numpy', + 'loguru' + ] + + missing = [] + for package in required_packages: + try: + pkg_resources.get_distribution(package) + print(f"βœ… {package} installed") + except 
pkg_resources.DistributionNotFound: + missing.append(package) + print(f"❌ {package} missing") + + return len(missing) == 0, missing + +def check_network_connectivity(): + """Check network connectivity to required services.""" + test_urls = [ + 'huggingface.co', + 'github.com' + ] + + for url in test_urls: + try: + result = subprocess.run( + ['ping', '-c', '1', url], + capture_output=True, + timeout=5 + ) + if result.returncode == 0: + print(f"βœ… {url} reachable") + else: + print(f"❌ {url} unreachable") + except (subprocess.TimeoutExpired, FileNotFoundError): + print(f"❌ Cannot test connectivity to {url}") + +def check_file_permissions(): + """Check file system permissions.""" + test_dirs = [ + Path.cwd() / 'results', + Path.cwd() / 'logs', + Path.home() / '.cache' + ] + + for test_dir in test_dirs: + try: + test_dir.mkdir(exist_ok=True) + test_file = test_dir / 'test_write.tmp' + test_file.write_text('test') + test_file.unlink() + print(f"βœ… Write permissions OK: {test_dir}") + except Exception as e: + print(f"❌ Write permission denied: {test_dir} - {e}") + +def main(): + """Run all diagnostic checks.""" + print("πŸ” Vocalizr System Diagnostics") + print("=" * 50) + + all_good = True + + print("\nπŸ“Š System Information:") + print(f"OS: {platform.system()} {platform.release()}") + print(f"Architecture: {platform.machine()}") + + print("\n🐍 Python Environment:") + if not check_python_version(): + all_good = False + + print("\nπŸ’Ύ System Resources:") + resources_ok, resource_issues = check_system_resources() + if not resources_ok: + all_good = False + for issue in resource_issues: + print(issue) + + print("\nπŸš€ GPU Support:") + if not check_gpu_availability(): + all_good = False + + print("\nπŸ“¦ Dependencies:") + deps_ok, missing_deps = check_dependencies() + if not deps_ok: + all_good = False + print("Missing packages:", ', '.join(missing_deps)) + + print("\n🌐 Network Connectivity:") + check_network_connectivity() + + print("\nπŸ“ File Permissions:") + 
check_file_permissions() + + print("\n" + "=" * 50) + if all_good: + print("✅ All checks passed! Your system should work with Vocalizr.") + else: + print("❌ Some issues detected. Please address the items marked with ❌.") + + print("\nFor help, visit: https://github.com/AlphaSphereDotAI/vocalizr/issues") + +if __name__ == "__main__": + main() +``` + +### Quick Health Check + +```bash +#!/bin/bash +# Quick Vocalizr health check script + +echo "🔍 Vocalizr Quick Health Check" +echo "================================" + +# Check if Vocalizr is installed +echo "📦 Checking Vocalizr installation..." +if python -c "import vocalizr" 2>/dev/null; then + echo "✅ Vocalizr is installed" +else + echo "❌ Vocalizr is not installed or not importable" + exit 1 +fi + +# Check if service is running +echo "🌐 Checking if Vocalizr service is running..." +if curl -s http://localhost:7860/health >/dev/null 2>&1; then + echo "✅ Vocalizr service is responding" +else + echo "ℹ️ Vocalizr service is not running on localhost:7860" +fi + +# Check disk space (-P keeps each filesystem on one line, -k reports 1 KB blocks) +echo "💾 Checking disk space..." +AVAILABLE=$(df -Pk . | tail -1 | awk '{print $4}') +if [ "$AVAILABLE" -gt 2097152 ]; then # 2GB in KB + echo "✅ Sufficient disk space available" +else + echo "⚠️ Low disk space (less than 2GB available)" +fi + +# Check memory (uses `free`, so this step is Linux-only) +echo "🧠 Checking memory usage..." +MEMORY_USAGE=$(free | grep Mem | awk '{printf "%.0f", $3/$2 * 100.0}') +if [ "$MEMORY_USAGE" -lt 80 ]; then + echo "✅ Memory usage OK (${MEMORY_USAGE}%)" +else + echo "⚠️ High memory usage (${MEMORY_USAGE}%)" +fi + +echo "✅ Health check complete" +``` + +## Installation Issues + +### Python Version Incompatibility + +**Problem**: Error about Python version requirement +``` +ERROR: Python 3.12 or higher is required +``` + +**Solutions**: + +1. **Check current Python version**: + ```bash + python --version + python3 --version + ``` + +2. 
**Install Python 3.12+ on Ubuntu/Debian**: + ```bash + sudo apt update + sudo apt install software-properties-common + sudo add-apt-repository ppa:deadsnakes/ppa + sudo apt update + sudo apt install python3.12 python3.12-pip python3.12-venv + ``` + +3. **Install Python 3.12+ on macOS**: + ```bash + # Using Homebrew + brew install python@3.12 + + # Using pyenv + pyenv install 3.12.0 + pyenv global 3.12.0 + ``` + +4. **Install Python 3.12+ on Windows**: + - Download from [python.org](https://www.python.org/downloads/) + - Or use Windows Store + - Or use Chocolatey: `choco install python312` + +### Package Installation Failures + +**Problem**: pip install fails with dependency conflicts + +**Solutions**: + +1. **Use virtual environment**: + ```bash + python -m venv vocalizr-env + source vocalizr-env/bin/activate # Linux/macOS + # or + vocalizr-env\Scripts\activate # Windows + + pip install --upgrade pip + pip install vocalizr + ``` + +2. **Clear pip cache**: + ```bash + pip cache purge + pip install --no-cache-dir vocalizr + ``` + +3. **Install with specific index**: + ```bash + pip install vocalizr --extra-index-url https://download.pytorch.org/whl/cu124 + ``` + +### uv Package Manager Issues + +**Problem**: uv commands not working + +**Solutions**: + +1. **Install uv**: + ```bash + # Linux/macOS + curl -LsSf https://astral.sh/uv/install.sh | sh + + # Windows + powershell -c "irm https://astral.sh/uv/install.ps1 | iex" + + # Using pip + pip install uv + ``` + +2. **Sync dependencies**: + ```bash + uv sync --refresh + ``` + +## Runtime Issues + +### Model Download Failures + +**Problem**: Cannot download Kokoro model from Hugging Face + +**Error Messages**: +``` +LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub +Failed to resolve 'huggingface.co' +``` + +**Solutions**: + +1. **Check internet connectivity**: + ```bash + ping huggingface.co + curl -I https://huggingface.co + ``` + +2. 
**Set proxy if needed**: + ```bash + export HTTP_PROXY=http://proxy.company.com:8080 + export HTTPS_PROXY=http://proxy.company.com:8080 + export NO_PROXY=localhost,127.0.0.1 + ``` + +3. **Manual model download**: + ```python + import os + os.environ['HF_HOME'] = '/path/to/cache' + + from huggingface_hub import snapshot_download + snapshot_download( + repo_id="hexgrad/Kokoro-82M", + local_dir="/path/to/local/model" + ) + ``` + +4. **Use offline mode** (after initial download): + ```python + os.environ['HF_HUB_OFFLINE'] = '1' + ``` + +### Memory Errors + +**Problem**: Out of memory errors during generation + +**Error Messages**: +``` +RuntimeError: CUDA out of memory +MemoryError: Unable to allocate array +``` + +**Solutions**: + +1. **Reduce batch size**: + ```python + # Generate shorter texts + text_chunks = [text[i:i+500] for i in range(0, len(text), 500)] + ``` + +2. **Clear GPU memory**: + ```python + import torch + torch.cuda.empty_cache() + import gc + gc.collect() + ``` + +3. **Force CPU usage**: + ```bash + export CUDA_VISIBLE_DEVICES="" + ``` + +4. **Increase system memory**: + - Add swap space on Linux: + ```bash + sudo fallocate -l 4G /swapfile + sudo chmod 600 /swapfile + sudo mkswap /swapfile + sudo swapon /swapfile + ``` + +### Permission Errors + +**Problem**: Cannot write to output directories + +**Solutions**: + +1. **Check directory permissions**: + ```bash + ls -la results/ logs/ + ``` + +2. **Fix permissions**: + ```bash + chmod 755 results/ logs/ + chown $USER:$USER results/ logs/ + ``` + +3. 
**Use custom directories**: + ```bash + export VOCALIZR_RESULTS_DIR=/tmp/vocalizr_results + export VOCALIZR_LOG_DIR=/tmp/vocalizr_logs + mkdir -p $VOCALIZR_RESULTS_DIR $VOCALIZR_LOG_DIR + ``` + +## Performance Problems + +### Slow Generation Times + +**Problem**: Audio generation takes too long + +**Diagnostics**: +```python +import time +import torch +from vocalizr.model import generate_audio_for_text + +def benchmark_generation(): + text = "This is a performance test." + + # Check if CUDA is being used + print(f"CUDA available: {torch.cuda.is_available()}") + if torch.cuda.is_available(): + print(f"Current device: {torch.cuda.current_device()}") + print(f"Device name: {torch.cuda.get_device_name()}") + + start_time = time.time() + + for sr, audio in generate_audio_for_text(text): + break # Just test first chunk + + end_time = time.time() + print(f"Generation time: {end_time - start_time:.2f} seconds") + +benchmark_generation() +``` + +**Solutions**: + +1. **Enable GPU acceleration**: + ```bash + # Check GPU driver + nvidia-smi + + # Install CUDA-enabled PyTorch + pip install torch --index-url https://download.pytorch.org/whl/cu124 + ``` + +2. **Optimize system resources**: + ```bash + # Set CPU governor to performance + echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor + + # Keep Intel Turbo Boost enabled (writing 1 here would disable turbo) + echo 0 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo + ``` + +3. **Use caching**: + ```python + # Implement audio caching to avoid regeneration + import hashlib + import pickle + + def get_cache_key(text, voice, speed): + return hashlib.md5(f"{text}{voice}{speed}".encode()).hexdigest() + ``` + +### High Memory Usage + +**Problem**: Application uses too much memory + +**Solutions**: + +1. 
**Monitor memory usage**: + ```python + import psutil + import torch + + def monitor_memory(): + process = psutil.Process() + memory_info = process.memory_info() + print(f"RSS: {memory_info.rss / 1024**2:.1f} MB") + + if torch.cuda.is_available(): + print(f"GPU memory: {torch.cuda.memory_allocated() / 1024**2:.1f} MB") + ``` + +2. **Implement memory cleanup**: + ```python + import gc + import torch + + def cleanup_memory(): + gc.collect() + if torch.cuda.is_available(): + torch.cuda.empty_cache() + ``` + +3. **Use memory-efficient settings**: + ```python + # Allow TF32 math on supported NVIDIA GPUs (trades precision for speed; does not shrink the model in memory) + torch.backends.cudnn.allow_tf32 = True + torch.backends.cuda.matmul.allow_tf32 = True + ``` + +## Network & Connectivity + +### Port Already in Use + +**Problem**: Cannot bind to port 7860 + +**Error Messages**: +``` +OSError: [Errno 98] Address already in use +``` + +**Solutions**: + +1. **Find process using port**: + ```bash + sudo lsof -i :7860 + sudo netstat -tulpn | grep :7860 + ``` + +2. **Kill process**: + ```bash + sudo kill -9 <PID> + ``` + +3. **Use different port**: + ```bash + export GRADIO_SERVER_PORT=8080 + vocalizr + ``` + +### Firewall Issues + +**Problem**: Cannot access web interface externally + +**Solutions**: + +1. **Check firewall status**: + ```bash + # Ubuntu/Debian + sudo ufw status + + # CentOS/RHEL + sudo firewall-cmd --list-all + ``` + +2. **Open port**: + ```bash + # Ubuntu/Debian + sudo ufw allow 7860 + + # CentOS/RHEL + sudo firewall-cmd --add-port=7860/tcp --permanent + sudo firewall-cmd --reload + ``` + +3. **Test connectivity**: + ```bash + # From another machine + telnet your-server-ip 7860 + ``` + +### Proxy Configuration + +**Problem**: Cannot access external services through corporate proxy + +**Solutions**: + +1. **Set proxy environment variables**: + ```bash + export HTTP_PROXY=http://proxy.company.com:8080 + export HTTPS_PROXY=http://proxy.company.com:8080 + export NO_PROXY=localhost,127.0.0.1,.company.com + ``` + +2. 
**Configure pip proxy**: + ```bash + pip install --proxy http://proxy.company.com:8080 vocalizr + ``` + +3. **Git proxy configuration**: + ```bash + git config --global http.proxy http://proxy.company.com:8080 + git config --global https.proxy http://proxy.company.com:8080 + ``` + +## Audio Generation Issues + +### No Audio Output + +**Problem**: Generation completes but no audio is produced + +**Diagnostics**: +```python +def debug_audio_generation(): + from vocalizr.model import generate_audio_for_text + import numpy as np + + text = "Hello world" + + for sr, audio in generate_audio_for_text(text, debug=True): + print(f"Sample rate: {sr}") + print(f"Audio shape: {audio.shape}") + print(f"Audio dtype: {audio.dtype}") + print(f"Audio range: [{audio.min():.3f}, {audio.max():.3f}]") + print(f"Audio stats: mean={audio.mean():.3f}, std={audio.std():.3f}") + + # Check for silence + if np.abs(audio).max() < 0.001: + print("⚠️ Audio appears to be silent") + break + +debug_audio_generation() +``` + +**Solutions**: + +1. **Check text input**: + ```python + # Ensure text is not empty or too short + text = text.strip() + if len(text) < 4: + print("Text too short for generation") + ``` + +2. **Verify voice selection**: + ```python + from vocalizr import CHOICES + print("Available voices:", list(CHOICES.keys())) + ``` + +3. **Test with simple text**: + ```python + # Use simple ASCII text first + test_text = "This is a simple test." + ``` + +### Distorted Audio + +**Problem**: Generated audio sounds distorted or garbled + +**Solutions**: + +1. **Check sample rate**: + ```python + # Ensure correct sample rate when saving + import soundfile as sf + sf.write('output.wav', audio, 24000) # Use correct sample rate + ``` + +2. **Verify audio format**: + ```python + # Check audio data type and range + audio = audio.astype(np.float32) + audio = np.clip(audio, -1.0, 1.0) # Ensure proper range + ``` + +3. 
**Test different voices**: + ```python + # Some voices might work better for certain content + for voice in ['af_heart', 'bf_emma', 'am_michael']: + print(f"Testing voice: {voice}") + # Generate and test + ``` + +### Slow Audio Playback + +**Problem**: Audio plays back too slowly or quickly + +**Solutions**: + +1. **Check speed parameter**: + ```python + # Ensure speed is reasonable + speed = max(0.5, min(2.0, speed)) # Clamp to valid range + ``` + +2. **Verify sample rate**: + ```python + # Use consistent sample rate + EXPECTED_SAMPLE_RATE = 24000 + ``` + +## Docker & Container Issues + +### Container Won't Start + +**Problem**: Docker container fails to start + +**Diagnostics**: +```bash +# Check container logs +docker logs vocalizr + +# Check container status +docker ps -a + +# Inspect container +docker inspect vocalizr +``` + +**Solutions**: + +1. **Check resource limits**: + ```bash + # Increase memory limit + docker run --memory=8g --cpus=4 vocalizr + ``` + +2. **Verify environment variables**: + ```bash + docker run -e DEBUG=true vocalizr + ``` + +3. **Check port mapping**: + ```bash + # Ensure port isn't already in use + docker run -p 8080:7860 vocalizr + ``` + +### Volume Mount Issues + +**Problem**: Cannot access mounted volumes + +**Solutions**: + +1. **Check permissions**: + ```bash + # Ensure directories exist and have correct permissions + mkdir -p ./cache ./results ./logs + chmod 755 ./cache ./results ./logs + ``` + +2. **Use absolute paths**: + ```bash + docker run -v $(pwd)/cache:/app/cache vocalizr + ``` + +3. **SELinux issues** (CentOS/RHEL): + ```bash + # Add :Z flag for SELinux + docker run -v $(pwd)/cache:/app/cache:Z vocalizr + ``` + +## Deployment Issues + +### Load Balancer Problems + +**Problem**: Load balancer not distributing traffic correctly + +**Solutions**: + +1. **Check health endpoints**: + ```bash + curl http://localhost:7860/health + ``` + +2. 
**Verify backend status**: + ```bash + # For HAProxy + echo "show stat" | socat stdio /var/run/haproxy/admin.sock + ``` + +3. **Test individual backends**: + ```bash + curl -H "Host: your-domain.com" http://backend1:7860/health + ``` + +### SSL Certificate Issues + +**Problem**: SSL/TLS certificate errors + +**Solutions**: + +1. **Check certificate validity**: + ```bash + openssl x509 -in cert.pem -text -noout + openssl s_client -connect your-domain.com:443 + ``` + +2. **Verify certificate chain**: + ```bash + curl -I https://your-domain.com + ``` + +3. **Test with curl**: + ```bash + curl -v https://your-domain.com/health + ``` + +### Kubernetes Issues + +**Problem**: Pods not starting or crashing + +**Diagnostics**: +```bash +# Check pod status +kubectl get pods -n vocalizr + +# Check pod logs +kubectl logs -f deployment/vocalizr -n vocalizr + +# Describe pod for events +kubectl describe pod -n vocalizr + +# Check resource usage +kubectl top pods -n vocalizr +``` + +**Solutions**: + +1. **Check resource requests/limits**: + ```yaml + resources: + requests: + memory: "4Gi" + cpu: "2" + limits: + memory: "8Gi" + cpu: "4" + ``` + +2. **Verify image pull**: + ```bash + kubectl describe pod -n vocalizr | grep -A 5 Events + ``` + +3. **Check node resources**: + ```bash + kubectl describe nodes + kubectl top nodes + ``` + +## Development Issues + +### IDE/Editor Problems + +**Problem**: Code completion or linting not working + +**Solutions**: + +1. **VS Code configuration**: + ```json + { + "python.defaultInterpreterPath": "./venv/bin/python", + "python.linting.ruffEnabled": true + } + ``` + +2. **Install development dependencies**: + ```bash + pip install -e ".[dev]" + ``` + +3. **Rebuild language server cache**: + - VS Code: Reload window (Ctrl+Shift+P > "Developer: Reload Window") + - PyCharm: File > Invalidate Caches and Restart + +### Testing Issues + +**Problem**: Tests failing or not running + +**Solutions**: + +1. 
**Install test dependencies**: + ```bash + pip install pytest pytest-cov + ``` + +2. **Run tests with verbose output**: + ```bash + pytest -v tests/ + ``` + +3. **Check test environment**: + ```bash + # Ensure test environment is isolated + python -m pytest --tb=short + ``` + +### Import Errors + +**Problem**: Cannot import vocalizr modules + +**Solutions**: + +1. **Check installation**: + ```bash + pip show vocalizr + python -c "import vocalizr; print(vocalizr.__file__)" + ``` + +2. **Verify PYTHONPATH**: + ```bash + echo $PYTHONPATH + python -c "import sys; print(sys.path)" + ``` + +3. **Reinstall in development mode**: + ```bash + pip uninstall vocalizr + pip install -e . + ``` + +## Getting Help + +### Log Collection + +When reporting issues, collect these logs: + +```bash +#!/bin/bash +# Collect diagnostic information + +echo "Collecting Vocalizr diagnostic information..." + +# Create output directory +mkdir -p vocalizr_diagnostics +cd vocalizr_diagnostics + +# System information +echo "System Information:" > system_info.txt +uname -a >> system_info.txt +cat /etc/os-release >> system_info.txt 2>/dev/null || sw_vers >> system_info.txt 2>/dev/null +free -h >> system_info.txt 2>/dev/null || vm_stat >> system_info.txt 2>/dev/null + +# Python environment +echo "Python Environment:" > python_info.txt +python --version >> python_info.txt +pip list >> python_info.txt + +# GPU information +echo "GPU Information:" > gpu_info.txt +nvidia-smi >> gpu_info.txt 2>/dev/null || echo "No NVIDIA GPU detected" >> gpu_info.txt + +# Docker information (if applicable) +if command -v docker &> /dev/null; then + echo "Docker Information:" > docker_info.txt + docker version >> docker_info.txt + docker ps -a >> docker_info.txt +fi + +# Application logs +if [ -d "../logs" ]; then + cp -r ../logs/ ./ +fi + +# Create archive +cd .. 
+tar -czf vocalizr_diagnostics.tar.gz vocalizr_diagnostics/ +echo "Diagnostic information saved to: vocalizr_diagnostics.tar.gz" +``` + +### Issue Reporting Template + +When creating an issue, include: + +````markdown +## Environment +- OS: [Ubuntu 22.04 / macOS 13.0 / Windows 11] +- Python version: [3.12.0] +- Vocalizr version: [0.0.1] +- Installation method: [pip / Docker / source] + +## Description +Brief description of the issue. + +## Steps to Reproduce +1. Step 1 +2. Step 2 +3. Step 3 + +## Expected Behavior +What should happen. + +## Actual Behavior +What actually happens. + +## Error Messages +``` +Paste any error messages here +``` + +## Additional Context +- Any relevant configuration +- Screenshots if applicable +- Related issues or discussions + +## Diagnostic Information +Please attach the diagnostic bundle from the log collection script. +```` + +### Community Resources + +- **GitHub Issues**: [https://github.com/AlphaSphereDotAI/vocalizr/issues](https://github.com/AlphaSphereDotAI/vocalizr/issues) +- **GitHub Discussions**: [https://github.com/AlphaSphereDotAI/vocalizr/discussions](https://github.com/AlphaSphereDotAI/vocalizr/discussions) +- **Email Support**: [mohamed.hisham.abdelzaher@gmail.com](mailto:mohamed.hisham.abdelzaher@gmail.com) + +### Before Reporting Issues + +1. **Search existing issues** for similar problems +2. **Try the latest version** to see if the issue is already fixed +3. **Run the diagnostic script** to gather system information +4. **Try minimal reproduction** to isolate the problem +5. **Check documentation** for configuration options + +### Emergency Procedures + +For critical production issues: + +1. **Immediate mitigation**: + - Switch to backup instances + - Implement circuit breakers + - Scale down if resource exhaustion + +2. **Data collection**: + - Capture logs before restart + - Save memory dumps if needed + - Document timeline of events + +3. 
**Recovery**: + - Restart services in safe mode + - Gradually restore full functionality + - Monitor for recurring issues + +## Next Steps + +- Review [Configuration Guide](CONFIGURATION.md) for optimization options +- Check [Deployment Guide](DEPLOYMENT.md) for production best practices +- See [Development Guide](DEVELOPMENT.md) for debugging techniques +- Visit [Examples](EXAMPLES.md) for working code samples \ No newline at end of file diff --git a/docs/USAGE.md b/docs/USAGE.md new file mode 100644 index 00000000..0e0477c5 --- /dev/null +++ b/docs/USAGE.md @@ -0,0 +1,241 @@ +# 🎯 Usage Guide + +Learn how to use Vocalizr effectively through its web interface, command line, and Python API. + +## Table of Contents + +- [Web Interface](#web-interface) +- [Command Line Interface](#command-line-interface) +- [Voice Selection](#voice-selection) +- [Configuration Options](#configuration-options) +- [File Management](#file-management) +- [Best Practices](#best-practices) + +## Web Interface + +The Gradio web interface provides an intuitive way to generate speech from text. 
+ +### Launching the Interface + +```bash +# Start the application +uvx vocalizr + +# Access at http://localhost:7860 +``` + +### Interface Components + +#### Text Input +- **Input Text Field**: Enter the text you want to convert to speech +- **Character Limit**: Set maximum characters to process (-1 for unlimited) + +#### Voice Configuration +- **Voice Dropdown**: Select from 20+ available voices +- **Speed Slider**: Adjust playback speed (0.5x to 2.0x) + +#### Hardware Settings +- **Hardware Display**: Shows current GPU/CPU status +- **Automatic Detection**: CUDA GPU automatically detected if available + +#### Output Options +- **Save Audio File**: Enable to save generated audio as WAV file +- **Debug Mode**: Enable for detailed logging +- **Streaming Output**: Real-time audio generation and playback + +#### Controls +- **Generate Button**: Start audio generation +- **Stop Button**: Cancel ongoing generation +- **Audio Player**: Play, pause, and download generated audio + +### Step-by-Step Usage + +1. **Enter Text**: Type or paste your text in the input field +2. **Select Voice**: Choose your preferred voice from the dropdown +3. **Adjust Settings**: Set speed, character limit, and other options +4. **Generate**: Click the "Generate" button +5. **Listen**: Audio will play automatically when ready +6. 
**Download**: Save audio file if needed + +## Command Line Interface + +### Basic Usage + +```bash +# Start with default settings +vocalizr + +# Custom port +GRADIO_SERVER_PORT=8080 vocalizr + +# Debug mode +DEBUG=true vocalizr + +# Custom server name (for external access) +GRADIO_SERVER_NAME=0.0.0.0 vocalizr +``` + +### Environment Variables + +Set these before running `vocalizr`: + +```bash +export GRADIO_SERVER_NAME=localhost # Server host +export GRADIO_SERVER_PORT=7860 # Server port +export DEBUG=false # Debug mode +export HF_HOME=/path/to/cache # Hugging Face cache directory +``` + +### Production Deployment + +```bash +# Production settings +export GRADIO_SERVER_NAME=0.0.0.0 +export GRADIO_SERVER_PORT=80 +export DEBUG=false + +vocalizr +``` + +## Voice Selection + +Vocalizr offers a variety of voices with different characteristics: + +### American Voices (Female) +- `af_heart` - Heart ❤️ (warm, friendly) +- `af_bella` - Bella 🔥 (energetic, vibrant) +- `af_nicole` - Nicole 🎧 (professional, clear) +- `af_aoede` - Aoede (melodic, smooth) +- `af_kore` - Kore (gentle, calm) +- `af_sarah` - Sarah (natural, conversational) +- `af_nova` - Nova (modern, crisp) +- `af_sky` - Sky (airy, light) +- `af_alloy` - Alloy (strong, confident) +- `af_jessica` - Jessica (friendly, approachable) +- `af_river` - River (flowing, dynamic) + +### American Voices (Male) +- `am_michael` - Michael (authoritative, deep) +- `am_fenrir` - Fenrir (powerful, commanding) +- `am_puck` - Puck (playful, energetic) +- `am_echo` - Echo (resonant, clear) +- `am_eric` - Eric (professional, steady) +- `am_liam` - Liam (youthful, bright) +- `am_onyx` - Onyx (rich, smooth) +- `am_santa` - Santa (jolly, warm) +- `am_adam` - Adam (classic, reliable) + +### British Voices +- `bf_emma` - Emma (female, elegant, refined) +- `bf_isabella` - Isabella (female, sophisticated) +- `bf_alice` - Alice (female, charming, proper) +- `bf_lily` - Lily (female, gentle, sweet) +- `bm_george` - George (male, 
distinguished) +- `bm_fable` - Fable (male, storytelling) +- `bm_lewis` - Lewis (male, intellectual) +- `bm_daniel` - Daniel (male, professional) + +### Choosing the Right Voice + +Consider these factors when selecting a voice: + +- **Content Type**: Professional content vs. casual conversation +- **Audience**: Age group and preferences +- **Tone**: Formal, friendly, energetic, calm +- **Accent**: American vs. British English +- **Gender**: Male vs. female voice preference + +## Configuration Options + +### Speed Control +- **Range**: 0.5x to 2.0x +- **Default**: 1.0x (normal speed) +- **Use Cases**: + - 0.5x-0.8x: Learning, comprehension + - 1.0x: Normal conversation + - 1.2x-2.0x: Quick information delivery + +### Character Limits +- **Default**: -1 (unlimited) +- **Recommended**: 500-1000 characters for optimal performance +- **Maximum**: Depends on available memory + +### File Output +- **Format**: WAV (24kHz sample rate) +- **Location**: `results/` directory with timestamp +- **Naming**: `YYYY-MM-DD_HH-MM-SS.wav` + +## File Management + +### Output Structure +``` +vocalizr/ +├── results/ # Generated audio files +│ └── 2024-01-15_14-30-25.wav +├── logs/ # Application logs +│ └── 2024-01-15_14-30-25.log +└── cache/ # Model cache (automatic) +``` + +### Cleanup +```bash +# Remove old audio files (older than 7 days) +find results/ -name "*.wav" -mtime +7 -delete + +# Clear logs +rm -rf logs/*.log + +# Clear model cache (will re-download on next use) +rm -rf ~/.cache/huggingface/ +``` + +## Best Practices + +### Performance Optimization +1. **Use GPU**: Enable CUDA for faster generation +2. **Batch Processing**: Process multiple texts together +3. **Reasonable Length**: Keep texts under 1000 characters +4. **Cache Models**: Reuse the same voice for multiple generations + +### Quality Guidelines +1. **Text Preparation**: Clean text of special characters +2. **Punctuation**: Use proper punctuation for natural pauses +3. 
**Voice Consistency**: Use the same voice for related content +4. **Speed Selection**: Choose appropriate speed for content type + +### Resource Management +1. **Memory**: Monitor RAM usage with large texts +2. **Storage**: Regularly clean up generated files +3. **Network**: Ensure stable connection for model downloads +4. **Logs**: Monitor application logs for issues + +### Production Tips +1. **Error Handling**: Implement retry logic for network issues +2. **Rate Limiting**: Limit concurrent generations +3. **Monitoring**: Track generation metrics +4. **Backup**: Save important generated audio files + +## Troubleshooting + +### Common Issues + +#### Audio Quality Problems +- **Solution**: Try different voices or adjust speed +- **Check**: Text formatting and punctuation + +#### Performance Issues +- **Solution**: Enable GPU acceleration +- **Check**: System memory and CPU usage + +#### Generation Failures +- **Solution**: Reduce text length or character limit +- **Check**: Network connection and logs + +For more troubleshooting help, see the [Troubleshooting Guide](TROUBLESHOOTING.md). + +## Next Steps + +- Explore the [API Documentation](API.md) for detailed technical reference +- Check [Examples](EXAMPLES.md) for real-world usage scenarios +- Review [Configuration](CONFIGURATION.md) for advanced customization +- See [Development Guide](DEVELOPMENT.md) for contributing to the project diff --git a/docs/VOICES.md b/docs/VOICES.md new file mode 100644 index 00000000..05bb2d3b --- /dev/null +++ b/docs/VOICES.md @@ -0,0 +1,632 @@ +# 🎭 Voice Reference Guide + +Complete reference for all available voices in Vocalizr, including characteristics, use cases, and audio samples. 
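
The voice IDs listed in these docs appear to follow a two-letter prefix convention: the first letter encodes the accent (`a` for American, `b` for British) and the second the gender (`f` or `m`). The sketch below decodes an ID under that assumption. `describe_voice` is a hypothetical helper written for illustration; it is not part of the Vocalizr API, and the convention is inferred from the voice lists rather than confirmed by the project.

```python
# Hypothetical helper: decode a voice ID such as "af_heart".
# Assumes the "<accent><gender>_<name>" pattern seen in the voice lists.
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "Female", "m": "Male"}


def describe_voice(voice_id: str) -> dict:
    """Split a voice ID into accent, gender, and display name."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"Unexpected voice ID format: {voice_id!r}")
    accent_code, gender_code = prefix
    if accent_code not in ACCENTS or gender_code not in GENDERS:
        raise ValueError(f"Unknown voice prefix: {prefix!r}")
    return {
        "id": voice_id,
        "accent": ACCENTS[accent_code],
        "gender": GENDERS[gender_code],
        "name": name.capitalize(),
    }


if __name__ == "__main__":
    for vid in ("af_heart", "am_michael", "bf_emma", "bm_george"):
        print(describe_voice(vid))
```

A helper like this could be used to group or filter voices by accent or gender without hard-coding each ID.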
+ +## Table of Contents + +- [Voice Categories](#voice-categories) +- [American Female Voices](#american-female-voices) +- [American Male Voices](#american-male-voices) +- [British Female Voices](#british-female-voices) +- [British Male Voices](#british-male-voices) +- [Voice Selection Guide](#voice-selection-guide) +- [Technical Specifications](#technical-specifications) +- [Usage Examples](#usage-examples) + +## Voice Categories + +Vocalizr offers **28 distinct voices** across different accents and genders: + +| Category | Count | Accent | Gender | +|----------|-------|---------|--------| +| American Female | 11 | 🇺🇸 American | 🚺 Female | +| American Male | 9 | 🇺🇸 American | 🚹 Male | +| British Female | 4 | 🇬🇧 British | 🚺 Female | +| British Male | 4 | 🇬🇧 British | 🚹 Male | + +## American Female Voices + +### af_heart - Heart ❤️ +- **ID**: `af_heart` +- **Display**: 🇺🇸 🚺 Heart ❤️ +- **Characteristics**: Warm, friendly, approachable +- **Tone**: Caring and compassionate +- **Best For**: + - Customer service messages + - Healthcare applications + - Emotional content + - Personal assistants +- **Voice Quality**: Clear and natural +- **Recommended Speed**: 0.9 - 1.2x + +### af_bella - Bella 🔥 +- **ID**: `af_bella` +- **Display**: 🇺🇸 🚺 Bella 🔥 +- **Characteristics**: Energetic, vibrant, dynamic +- **Tone**: Enthusiastic and lively +- **Best For**: + - Marketing content + - Fitness applications + - Youth-oriented content + - Motivational messages +- **Voice Quality**: Bright and engaging +- **Recommended Speed**: 1.0 - 1.3x + +### af_nicole - Nicole 🎧 +- **ID**: `af_nicole` +- **Display**: 🇺🇸 🚺 Nicole 🎧 +- **Characteristics**: Professional, clear, articulate +- **Tone**: Business-like and polished +- **Best For**: + - Corporate presentations + - Technical documentation + - News reading + - Professional voiceovers +- **Voice Quality**: Crisp and authoritative +- **Recommended Speed**: 0.8 - 1.1x + +### af_aoede - Aoede +- 
**ID**: `af_aoede` +- **Display**: 🇺🇸 🚺 Aoede +- **Characteristics**: Melodic, smooth, artistic +- **Tone**: Creative and expressive +- **Best For**: + - Poetry reading + - Creative content + - Storytelling + - Artistic projects +- **Voice Quality**: Musical and flowing +- **Recommended Speed**: 0.9 - 1.2x + +### af_kore - Kore +- **ID**: `af_kore` +- **Display**: 🇺🇸 🚺 Kore +- **Characteristics**: Gentle, calm, soothing +- **Tone**: Peaceful and serene +- **Best For**: + - Meditation apps + - Sleep stories + - Relaxation content + - Therapeutic applications +- **Voice Quality**: Soft and comforting +- **Recommended Speed**: 0.7 - 1.0x + +### af_sarah - Sarah +- **ID**: `af_sarah` +- **Display**: 🇺🇸 🚺 Sarah +- **Characteristics**: Natural, conversational, relatable +- **Tone**: Friendly and down-to-earth +- **Best For**: + - Casual conversations + - Social media content + - Personal blogs + - Everyday applications +- **Voice Quality**: Authentic and approachable +- **Recommended Speed**: 0.9 - 1.2x + +### af_nova - Nova +- **ID**: `af_nova` +- **Display**: 🇺🇸 🚺 Nova +- **Characteristics**: Modern, crisp, contemporary +- **Tone**: Fresh and innovative +- **Best For**: + - Technology content + - Modern applications + - Young adult content + - Innovation-focused messages +- **Voice Quality**: Sharp and precise +- **Recommended Speed**: 1.0 - 1.3x + +### af_sky - Sky +- **ID**: `af_sky` +- **Display**: 🇺🇸 🚺 Sky +- **Characteristics**: Airy, light, uplifting +- **Tone**: Optimistic and bright +- **Best For**: + - Travel content + - Inspirational messages + - Nature documentaries + - Uplifting content +- **Voice Quality**: Clear and upbeat +- **Recommended Speed**: 0.9 - 1.2x + +### af_alloy - Alloy +- **ID**: `af_alloy` +- **Display**: 🇺🇸 🚺 Alloy +- **Characteristics**: Strong, confident, assertive +- **Tone**: Powerful and determined +- **Best For**: + - Leadership content + - Motivational speaking + - Sports commentary + - 
Empowerment messages +- **Voice Quality**: Bold and commanding +- **Recommended Speed**: 0.9 - 1.2x + +### af_jessica - Jessica +- **ID**: `af_jessica` +- **Display**: 🇺🇸 🚺 Jessica +- **Characteristics**: Friendly, approachable, welcoming +- **Tone**: Warm and inviting +- **Best For**: + - Welcome messages + - Community content + - Family-friendly applications + - Social interactions +- **Voice Quality**: Pleasant and familiar +- **Recommended Speed**: 0.9 - 1.2x + +### af_river - River +- **ID**: `af_river` +- **Display**: 🇺🇸 🚺 River +- **Characteristics**: Flowing, dynamic, natural +- **Tone**: Organic and fluid +- **Best For**: + - Nature content + - Flowing narratives + - Environmental topics + - Continuous reading +- **Voice Quality**: Smooth and continuous +- **Recommended Speed**: 0.8 - 1.1x + +## American Male Voices + +### am_michael - Michael +- **ID**: `am_michael` +- **Display**: 🇺🇸 🚹 Michael +- **Characteristics**: Authoritative, deep, professional +- **Tone**: Confident and reliable +- **Best For**: + - News broadcasting + - Corporate communications + - Documentary narration + - Authority figures +- **Voice Quality**: Rich and commanding +- **Recommended Speed**: 0.8 - 1.1x + +### am_fenrir - Fenrir +- **ID**: `am_fenrir` +- **Display**: 🇺🇸 🚹 Fenrir +- **Characteristics**: Powerful, commanding, intense +- **Tone**: Strong and formidable +- **Best For**: + - Action content + - Gaming applications + - Dramatic readings + - Powerful narratives +- **Voice Quality**: Bold and imposing +- **Recommended Speed**: 0.9 - 1.2x + +### am_puck - Puck +- **ID**: `am_puck` +- **Display**: 🇺🇸 🚹 Puck +- **Characteristics**: Playful, energetic, mischievous +- **Tone**: Fun and lighthearted +- **Best For**: + - Children's content + - Comedy applications + - Gaming characters + - Entertainment content +- **Voice Quality**: Animated and lively +- **Recommended Speed**: 1.0 - 1.4x + +### am_echo - Echo +- **ID**: `am_echo` +- **Display**: 
🇺🇸 🚹 Echo +- **Characteristics**: Resonant, clear, distinctive +- **Tone**: Memorable and impactful +- **Best For**: + - Brand messaging + - Announcements + - Memorable content + - Echo effects +- **Voice Quality**: Clear and resonant +- **Recommended Speed**: 0.9 - 1.2x + +### am_eric - Eric +- **ID**: `am_eric` +- **Display**: 🇺🇸 🚹 Eric +- **Characteristics**: Professional, steady, reliable +- **Tone**: Trustworthy and consistent +- **Best For**: + - Business applications + - Technical content + - Educational material + - Professional services +- **Voice Quality**: Steady and dependable +- **Recommended Speed**: 0.8 - 1.1x + +### am_liam - Liam +- **ID**: `am_liam` +- **Display**: 🇺🇸 🚹 Liam +- **Characteristics**: Youthful, bright, modern +- **Tone**: Fresh and contemporary +- **Best For**: + - Young adult content + - Technology topics + - Modern applications + - Casual conversations +- **Voice Quality**: Clear and youthful +- **Recommended Speed**: 1.0 - 1.3x + +### am_onyx - Onyx +- **ID**: `am_onyx` +- **Display**: 🇺🇸 🚹 Onyx +- **Characteristics**: Rich, smooth, sophisticated +- **Tone**: Elegant and refined +- **Best For**: + - Luxury content + - Sophisticated messaging + - Premium services + - Elegant presentations +- **Voice Quality**: Smooth and polished +- **Recommended Speed**: 0.8 - 1.1x + +### am_santa - Santa +- **ID**: `am_santa` +- **Display**: 🇺🇸 🚹 Santa +- **Characteristics**: Jolly, warm, festive +- **Tone**: Cheerful and magical +- **Best For**: + - Holiday content + - Children's stories + - Festive messages + - Seasonal applications +- **Voice Quality**: Warm and jolly +- **Recommended Speed**: 0.8 - 1.1x + +### am_adam - Adam +- **ID**: `am_adam` +- **Display**: 🇺🇸 🚹 Adam +- **Characteristics**: Classic, reliable, timeless +- **Tone**: Traditional and trustworthy +- **Best For**: + - Classic content + - Traditional messaging + - Reliable narration + - Timeless applications +- **Voice Quality**: Classic and 
dependable +- **Recommended Speed**: 0.9 - 1.2x + +## British Female Voices + +### bf_emma - Emma +- **ID**: `bf_emma` +- **Display**: 🇬🇧 🚺 Emma +- **Characteristics**: Elegant, refined, sophisticated +- **Tone**: Polished and cultured +- **Best For**: + - Classical content + - Educational material + - Cultural applications + - Sophisticated messaging +- **Voice Quality**: Crisp and articulate +- **Recommended Speed**: 0.8 - 1.1x + +### bf_isabella - Isabella +- **ID**: `bf_isabella` +- **Display**: 🇬🇧 🚺 Isabella +- **Characteristics**: Sophisticated, graceful, aristocratic +- **Tone**: Noble and distinguished +- **Best For**: + - Luxury brands + - Historical content + - Fine arts + - Premium services +- **Voice Quality**: Graceful and refined +- **Recommended Speed**: 0.8 - 1.1x + +### bf_alice - Alice +- **ID**: `bf_alice` +- **Display**: 🇬🇧 🚺 Alice +- **Characteristics**: Charming, proper, delightful +- **Tone**: Pleasant and well-mannered +- **Best For**: + - Children's literature + - Charming narratives + - Proper etiquette content + - Delightful stories +- **Voice Quality**: Charming and clear +- **Recommended Speed**: 0.9 - 1.2x + +### bf_lily - Lily +- **ID**: `bf_lily` +- **Display**: 🇬🇧 🚺 Lily +- **Characteristics**: Gentle, sweet, endearing +- **Tone**: Soft and caring +- **Best For**: + - Gentle content + - Caring messages + - Sweet narratives + - Tender applications +- **Voice Quality**: Soft and melodious +- **Recommended Speed**: 0.8 - 1.1x + +## British Male Voices + +### bm_george - George +- **ID**: `bm_george` +- **Display**: 🇬🇧 🚹 George +- **Characteristics**: Distinguished, authoritative, regal +- **Tone**: Noble and commanding +- **Best For**: + - Formal announcements + - Historical documentaries + - Authority figures + - Distinguished content +- **Voice Quality**: Rich and authoritative +- **Recommended Speed**: 0.8 - 1.1x + +### bm_fable - Fable +- **ID**: `bm_fable` +- **Display**: 🇬🇧 🚹 Fable +- 
**Characteristics**: Storytelling, narrative, wise +- **Tone**: Wise and captivating +- **Best For**: + - Storytelling + - Fables and tales + - Narrative content + - Wisdom sharing +- **Voice Quality**: Engaging and wise +- **Recommended Speed**: 0.8 - 1.1x + +### bm_lewis - Lewis +- **ID**: `bm_lewis` +- **Display**: 🇬🇧 🚹 Lewis +- **Characteristics**: Intellectual, scholarly, thoughtful +- **Tone**: Academic and insightful +- **Best For**: + - Academic content + - Intellectual discussions + - Educational material + - Scholarly work +- **Voice Quality**: Clear and thoughtful +- **Recommended Speed**: 0.8 - 1.1x + +### bm_daniel - Daniel +- **ID**: `bm_daniel` +- **Display**: 🇬🇧 🚹 Daniel +- **Characteristics**: Professional, polished, competent +- **Tone**: Reliable and skilled +- **Best For**: + - Professional services + - Business content + - Competent messaging + - Skilled narration +- **Voice Quality**: Professional and clear +- **Recommended Speed**: 0.9 - 1.2x + +## Voice Selection Guide + +### By Use Case + +#### **Corporate & Business** +- **Primary**: `af_nicole`, `am_eric`, `bf_emma`, `bm_daniel` +- **Secondary**: `am_michael`, `bf_isabella` + +#### **Customer Service** +- **Primary**: `af_heart`, `af_jessica`, `af_sarah` +- **Secondary**: `am_adam`, `bf_alice` + +#### **Entertainment & Gaming** +- **Primary**: `am_puck`, `af_bella`, `am_fenrir` +- **Secondary**: `af_nova`, `am_liam` + +#### **Educational Content** +- **Primary**: `af_nicole`, `bm_lewis`, `bf_emma` +- **Secondary**: `am_eric`, `af_sarah` + +#### **Healthcare & Wellness** +- **Primary**: `af_heart`, `af_kore`, `bf_lily` +- **Secondary**: `af_sarah`, `am_adam` + +#### **Technology & Innovation** +- **Primary**: `af_nova`, `am_liam`, `af_nicole` +- **Secondary**: `am_echo`, `af_bella` + +#### **Storytelling & Narration** +- **Primary**: `bm_fable`, `af_aoede`, `af_river` +- **Secondary**: `bf_alice`, `am_santa` + +#### **Luxury & Premium** +- **Primary**: `am_onyx`, `bf_isabella`, 
`bm_george` +- **Secondary**: `bf_emma`, `am_michael` + +### By Demographic + +#### **Children (5-12)** +- **Best**: `am_santa`, `bf_alice`, `am_puck` +- **Good**: `af_jessica`, `af_heart` + +#### **Teenagers (13-18)** +- **Best**: `af_bella`, `am_liam`, `af_nova` +- **Good**: `af_sarah`, `am_puck` + +#### **Young Adults (19-35)** +- **Best**: `af_nova`, `am_liam`, `af_sarah` +- **Good**: `af_bella`, `am_echo` + +#### **Middle-aged (36-55)** +- **Best**: `af_nicole`, `am_eric`, `af_heart` +- **Good**: `am_michael`, `bf_emma` + +#### **Seniors (56+)** +- **Best**: `am_michael`, `bf_emma`, `bm_george` +- **Good**: `af_heart`, `am_adam` + +### By Content Type + +#### **News & Information** +- **Best**: `af_nicole`, `am_michael`, `bf_emma` +- **Speed**: 0.9 - 1.1x + +#### **Marketing & Sales** +- **Best**: `af_bella`, `am_echo`, `af_alloy` +- **Speed**: 1.0 - 1.3x + +#### **Documentation & Tutorials** +- **Best**: `af_nicole`, `am_eric`, `bm_lewis` +- **Speed**: 0.8 - 1.0x + +#### **Casual Conversation** +- **Best**: `af_sarah`, `am_adam`, `af_jessica` +- **Speed**: 0.9 - 1.2x + +## Technical Specifications + +### Audio Quality +- **Sample Rate**: 24,000 Hz (24 kHz) +- **Bit Depth**: 32-bit float +- **Channels**: Mono (1 channel) +- **Format**: Uncompressed PCM +- **Dynamic Range**: Full 32-bit float range (-1.0 to +1.0) + +### Generation Parameters +- **Speed Range**: 0.5x to 2.0x +- **Default Speed**: 1.0x +- **Character Limit**: Configurable (default: unlimited) +- **Processing Time**: Varies by text length and hardware + +### Model Information +- **Base Model**: Kokoro-82M +- **Provider**: HexGrad +- **Language**: English +- **Voice Count**: 28 total voices +- **Model Size**: ~82 million parameters + +## Usage Examples + +### Basic Voice Selection + +```python +from vocalizr.model import generate_audio_for_text + +# Professional female voice +for sr, audio in generate_audio_for_text( + "Welcome to our corporate presentation.", + voice="af_nicole", + speed=1.0 +): + 
break + +# Friendly customer service +for sr, audio in generate_audio_for_text( + "Thank you for calling. How can I help you today?", + voice="af_heart", + speed=0.9 +): + break + +# Energetic marketing +for sr, audio in generate_audio_for_text( + "Don't miss our amazing sale happening now!", + voice="af_bella", + speed=1.2 +): + break +``` + +### Voice Comparison Script + +```python +# Assumed imports: NumPy for joining chunks, soundfile (as sf) for writing WAV files +import numpy as np +import soundfile as sf + +from vocalizr.model import generate_audio_for_text + +def compare_voices_for_text(text, voices): + """Compare how different voices sound with the same text.""" + + results = {} + + for voice in voices: + print(f"Generating with voice: {voice}") + + audio_chunks = [] + for sr, audio in generate_audio_for_text(text, voice=voice): + audio_chunks.append(audio) + + if audio_chunks: + full_audio = np.concatenate(audio_chunks) + filename = f"comparison_{voice}.wav" + sf.write(filename, full_audio, sr) + results[voice] = filename + + return results + +# Compare professional voices +professional_voices = ["af_nicole", "am_eric", "bf_emma", "bm_daniel"] +test_text = "This is a professional business announcement." 
+ +comparison_files = compare_voices_for_text(test_text, professional_voices) +print("Generated comparison files:", comparison_files) +``` + +### Voice Recommendation Engine + +```python +def recommend_voice(content_type, target_audience, tone_preference): + """Recommend voices based on content characteristics.""" + + recommendations = { + "corporate": { + "professional": ["af_nicole", "am_eric", "bf_emma"], + "friendly": ["af_heart", "af_jessica", "am_adam"], + "authoritative": ["am_michael", "bm_george", "af_alloy"] + }, + "entertainment": { + "energetic": ["af_bella", "am_puck", "af_nova"], + "dramatic": ["am_fenrir", "af_alloy", "bm_fable"], + "playful": ["am_puck", "am_santa", "bf_alice"] + }, + "education": { + "clear": ["af_nicole", "bm_lewis", "bf_emma"], + "engaging": ["af_sarah", "am_liam", "af_aoede"], + "authoritative": ["am_michael", "bm_george", "bf_emma"] + } + } + + voice_list = recommendations.get(content_type, {}).get(tone_preference, ["af_heart"]) + + # Adjust for target audience + if target_audience == "children": + child_friendly = ["am_santa", "bf_alice", "af_jessica", "am_puck"] + voice_list = [v for v in voice_list if v in child_friendly] or child_friendly[:2] + elif target_audience == "seniors": + senior_friendly = ["am_michael", "bf_emma", "af_heart", "bm_george"] + voice_list = [v for v in voice_list if v in senior_friendly] or senior_friendly[:2] + + return voice_list[0] if voice_list else "af_heart" + +# Usage examples +business_voice = recommend_voice("corporate", "adults", "professional") +kids_voice = recommend_voice("entertainment", "children", "playful") +education_voice = recommend_voice("education", "adults", "clear") + +print(f"Business: {business_voice}") +print(f"Kids: {kids_voice}") +print(f"Education: {education_voice}") +``` + +## Voice Quality Tips + +### Optimization Guidelines + +1. 
**Text Preparation**: + - Use proper punctuation for natural pauses + - Spell out numbers and abbreviations + - Add commas for breath marks + - Avoid special characters + +2. **Speed Selection**: + - **Slow (0.5-0.8x)**: Learning content, elderly audience + - **Normal (0.9-1.1x)**: General content, professional use + - **Fast (1.2-2.0x)**: Quick information, energetic content + +3. **Voice Matching**: + - Match voice personality to content tone + - Consider target audience demographics + - Test multiple voices for important content + - Maintain consistency within projects + +## Next Steps + +- Explore the [Usage Guide](USAGE.md) for practical implementation +- Check [Examples](EXAMPLES.md) for voice-specific use cases +- Review [API Documentation](API.md) for programmatic voice selection +- See [Configuration Guide](CONFIGURATION.md) for optimization settings \ No newline at end of file
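
The speed bands listed under Speed Selection can be turned into a small helper. The sketch below is only an illustration and is not part of Vocalizr: `SPEED_BANDS`, `pick_speed`, and the choice of the band midpoint as a starting value are hypothetical. Only the three bands and the 0.5x to 2.0x limits come from this guide.

```python
# Hypothetical helper, not part of the vocalizr package.
# The bands are copied from the Speed Selection tips in this guide.
SPEED_BANDS = {
    "slow": (0.5, 0.8),    # learning content, elderly audience
    "normal": (0.9, 1.1),  # general content, professional use
    "fast": (1.2, 2.0),    # quick information, energetic content
}

# Generation limits listed under Technical Specifications.
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def pick_speed(band="normal", override=None):
    """Return a speed multiplier for a band, or clamp an explicit override."""
    if override is not None:
        # Keep caller-supplied values inside the supported range.
        return max(MIN_SPEED, min(MAX_SPEED, float(override)))
    low, high = SPEED_BANDS[band]
    # Assumption: the midpoint of the band is a reasonable starting value.
    return round((low + high) / 2, 2)


if __name__ == "__main__":
    for name in SPEED_BANDS:
        print(name, pick_speed(name))
    print("clamped override:", pick_speed(override=3.0))
```

The returned value could then be passed as the `speed` argument of `generate_audio_for_text`, as in the Basic Voice Selection examples. Clamping rather than raising an error is a design choice made here for brevity. A stricter variant might reject out-of-range values instead.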