A flexible and extensible framework for managing multi-agent conversations with support for speech recognition, text-to-speech, and language model integration.
Because both the AI and the user can interrupt each other at any point, the interaction more closely resembles a natural, casual conversation.
This system enables natural conversations between human users and AI agents, supporting:
- Real-time speech recognition
- Natural language processing
- Text-to-speech synthesis
- Multi-agent conversation management
- Interruption handling
- Turn-taking mechanics
Requirements:
- Python >= 3.9, < 3.12
git clone https://github.com/SquirrelModeller/seamless-conversation.git
cd seamless-conversation
Install PortAudio
- Windows:
  python -m pip install pyaudio
- macOS:
  brew install portaudio
- Debian/Ubuntu:
  sudo apt-get install libportaudio2
Install PostgreSQL and Start the Service
- Windows:
  - Download and install PostgreSQL from the official website: postgresql.org/download/windows
  - Follow the installer prompts and ensure the PostgreSQL service is running
- macOS:
  brew install postgresql
  brew services start postgresql@14
- Debian/Ubuntu:
  sudo apt-get install postgresql postgresql-contrib
  sudo systemctl start postgresql
Create a virtual environment (using Python 3.10 as an example):
- Windows:
  python -m venv seampyconvo
  seampyconvo\Scripts\activate
- Unix-based systems (macOS and Linux):
  python3.10 -m venv seampyconvo
  source seampyconvo/bin/activate
With the virtual environment activated, install the required Python packages:
python -m pip install -r requirements.txt
- macOS:
  Set up the database user and database:
  createuser postgres
  createdb conversation_db
Run the Database Setup Script:
python src/database/setup_database.py
To use the APIs, either add your key to the config.yaml file:
api_key: YOUR_API_KEY
Or set the key as an environment variable:
- Unix-based systems (macOS and Linux):
export OPENAI_API_KEY=your_openai_api_key
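The two options above can be combined in a small lookup. The following is only a sketch, not the project's actual loader: the function name `resolve_api_key` is hypothetical, the config is assumed to be already parsed into a dictionary, and the precedence (config.yaml first, then the environment variable) is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    config: Mapping[str, object],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the API key from the parsed config, else from the environment.

    Hypothetical helper: the lookup order is an assumption, not
    necessarily what the project does.
    """
    key = config.get("api_key")
    # Ignore a missing value or the untouched placeholder from the example.
    if isinstance(key, str) and key and key != "YOUR_API_KEY":
        return key
    # Fall back to the environment variable; treat an empty string as unset.
    return env.get("OPENAI_API_KEY") or None
```

Passing the environment as a parameter keeps the function easy to test without modifying the real process environment.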
Download the Vosk model:
python download_vosk_model.py
The Discussions tab is open if you have any questions or suggestions, or want to talk about the project.
If you want to contribute to the code or work on the project, fork the repository and create a pull request. Please use the conventional commits format when making a commit.
We have a Discord Server.
Alternatively reach out at:
- email: [email protected]
Two system prompts are used. The first prompt is given a personality and is only activated when the second prompt allows it. The second prompt continually receives the transcript of the detected speech, whether it comes from an LLM agent or from the user, and decides what action should be taken. It is also informed when an interruption occurs and when an agent or user has finished speaking.
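The two-prompt arrangement can be illustrated with a minimal sketch. This is not the project's code: `Event`, `Conversation`, `simple_decide`, and `simple_respond` are hypothetical names, and plain functions stand in for the two LLM prompts. The rule used here (speak only once the user has finished) is an assumed example of a decision policy.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Event:
    speaker: str          # "user" or an agent name
    text: str = ""
    kind: str = "speech"  # "speech", "interruption", or "finished"


@dataclass
class Conversation:
    decide: Callable[[List[Event]], str]   # stands in for the second (deciding) prompt
    respond: Callable[[List[Event]], str]  # stands in for the first (personality) prompt
    history: List[Event] = field(default_factory=list)

    def handle(self, event: Event) -> Optional[str]:
        """Record the event, ask the decider, and reply only if it allows."""
        self.history.append(event)
        if self.decide(self.history) != "speak":
            return None
        reply = self.respond(self.history)
        self.history.append(Event("agent", reply))
        return reply


def simple_decide(history: List[Event]) -> str:
    # Assumed policy: speak only when the user has just finished talking.
    last = history[-1]
    return "speak" if last.speaker == "user" and last.kind == "finished" else "wait"


def simple_respond(history: List[Event]) -> str:
    # Placeholder for the personality prompt: echo what the user said.
    heard = " ".join(e.text for e in history if e.speaker == "user" and e.text)
    return f"You said: {heard}"
```

In use, every transcribed utterance, interruption, or end-of-speech signal is passed to `handle`, so the deciding function sees the whole history while the responding function runs only when permitted.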