Commit 2e738d8

Merge pull request #49 from pescheckit/feature_added-ai-tagging
Fixed ai tagging
2 parents 60c0bbf + 3c92aae commit 2e738d8

15 files changed: 693 additions and 81 deletions

.github/workflows/ci-cd.yml

Lines changed: 5 additions & 0 deletions
@@ -89,6 +89,11 @@ jobs:
           python-version: ${{ matrix.python-version }}
           cache: 'pip'
 
+      - name: Install system dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y gettext
+
       - name: Install test dependencies
         run: |
           python -m pip install --upgrade pip

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -8,3 +8,5 @@ db.sqlite3
 .venv
 python_gpt_po/_version.py
 CLAUDE.md
+venv/
+test_venv/

README.md

Lines changed: 61 additions & 15 deletions
@@ -4,7 +4,7 @@
 ![PyPI](https://img.shields.io/pypi/v/gpt-po-translator?label=gpt-po-translator)
 ![Downloads](https://pepy.tech/badge/gpt-po-translator)
 
-A robust tool for translating gettext (.po) files using AI models from multiple providers (OpenAI, Azure OpenAI, Anthropic / Claude, and DeepSeek). It supports both bulk and individual translations, handles fuzzy entries, and can infer target languages based on folder structures. Available as a Python package and Docker container with support for Python 3.8-3.12.
+A robust tool for translating gettext (.po) files using AI models from multiple providers (OpenAI, Azure OpenAI, Anthropic/Claude, and DeepSeek). It supports both bulk and individual translations, handles fuzzy entries, and can infer target languages based on folder structures. Available as a Python package and Docker container with support for Python 3.8-3.12.
 
 ## What is GPT-PO Translator?
 
@@ -15,6 +15,7 @@ This tool helps you translate gettext (.po) files using AI models. It's perfect
 - **Multiple AI providers** - OpenAI, Azure OpenAI, Anthropic/Claude, and DeepSeek
 - **Flexible translation modes** - Bulk or entry-by-entry processing
 - **Smart language handling** - Auto-detects target languages from folder structure
+- **AI translation tracking** - Automatically tags AI-generated translations with comments
 - **Production-ready** - Includes retry logic, validation, and detailed logging
 - **Easy deployment** - Available as a Python package or Docker container
 - **Cross-version support** - Works with Python 3.8-3.12
@@ -68,6 +69,14 @@ docker run -v $(pwd):/data \
   -e OPENAI_API_KEY="your_key" \
   ghcr.io/pescheckit/python-gpt-po:latest \
   --folder /data --lang fr,de --bulk
+
+# Or with Azure OpenAI
+docker run -v $(pwd):/data \
+  -e AZURE_OPENAI_API_KEY="your_key" \
+  -e AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/" \
+  -e AZURE_OPENAI_API_VERSION="2024-02-01" \
+  ghcr.io/pescheckit/python-gpt-po:latest \
+  --provider azure_openai --folder /data --lang fr,de --bulk
 ```
 
 ## Setting Up API Keys
@@ -79,7 +88,11 @@ export OPENAI_API_KEY='your_api_key_here'
 # Or for other providers:
 export ANTHROPIC_API_KEY='your_api_key_here'
 export DEEPSEEK_API_KEY='your_api_key_here'
+
+# For Azure OpenAI:
 export AZURE_OPENAI_API_KEY='your_api_key_here'
+export AZURE_OPENAI_ENDPOINT='https://your-resource.openai.azure.com/'
+export AZURE_OPENAI_API_VERSION='2024-02-01'
 ```
 
 ### Option 2: Command Line
@@ -113,31 +126,61 @@ gpt-po-translator --provider anthropic --folder ./locales --lang de
 # Use DeepSeek models
 gpt-po-translator --provider deepseek --folder ./locales --lang de
 
-# List available models for openai
-gpt-po-translator --provider openai --list-models
+# Use Azure OpenAI
+gpt-po-translator --provider azure_openai \
+  --azure-openai-endpoint https://your-resource.openai.azure.com/ \
+  --azure-openai-api-version 2024-02-01 \
+  --folder ./locales --lang de
 
-# List available models for azure openai
+# List available models for different providers
+gpt-po-translator --provider openai --list-models
 gpt-po-translator --provider azure_openai \
-  --azure-openai-endpoint https://<deployment>.cognitiveservices.azure.com/ \
-  --azure-openai-api-version <api_version> \
+  --azure-openai-endpoint https://your-resource.openai.azure.com/ \
+  --azure-openai-api-version 2024-02-01 \
   --list-models
 ```
 
+### AI Translation Tracking
+
+**By default, all AI-generated translations are automatically marked with comments** for easy tracking and compliance:
+
+```bash
+# Default behavior - AI translations are tagged with comments
+gpt-po-translator --folder ./locales --lang de
+
+# Result in PO file:
+#. AI-generated
+msgid "Hello"
+msgstr "Hallo"
+
+# To disable AI tagging (not recommended)
+gpt-po-translator --folder ./locales --lang de --no-ai-comment
+```
+
+This helps you:
+- Identify which translations were made by AI vs human translators
+- Track incremental changes when new AI translations are added
+- Comply with requirements to identify AI-generated content
+
+**Note:** Django's `makemessages` removes these comments when updating PO files, but translations are preserved. Re-run the translator after `makemessages` to restore AI tagging.
+
 ## Command Reference
 
 | Option | Description |
 |--------|-------------|
 | `--folder` | Path to your .po files |
 | `--lang` | Target language codes (comma-separated, e.g., `de,fr`) |
 | `--detail-lang` | Full language names (e.g., `"German,French"`) |
-| `--fuzzy` | Remove fuzzy entries before translating |
+| `--fuzzy` | Remove fuzzy entries before translating (DEPRECATED - use `--fix-fuzzy`) |
+| `--fix-fuzzy` | Translate and fix fuzzy entries properly (recommended) |
 | `--bulk` | Enable batch translation (recommended for large files) |
 | `--bulksize` | Entries per batch (default: 50) |
 | `--model` | Specific AI model to use |
-| `--provider` | AI provider: `openai`, `anthropic`, or `deepseek` |
+| `--provider` | AI provider: `openai`, `azure_openai`, `anthropic`, or `deepseek` |
 | `--list-models` | Show available models for selected provider |
 | `--api_key` | Your API key |
 | `--folder-language` | Auto-detect languages from folder structure |
+| `--no-ai-comment` | Disable AI-generated comment tagging (enabled by default) |
 
 ## Advanced Docker Usage
 
@@ -159,17 +202,19 @@ docker pull ghcr.io/pescheckit/python-gpt-po:0.3.0
 Mount any local directory to use in the container:
 
 ```bash
-# Windows example
+# Windows example with OpenAI
 docker run -v D:/projects/website/locales:/locales \
   -e OPENAI_API_KEY="your_key" \
   ghcr.io/pescheckit/python-gpt-po:latest \
   --folder /locales --lang fr,de --bulk
 
-# Mac/Linux example
+# Mac/Linux example with Azure OpenAI
 docker run -v /Users/username/translations:/input \
-  -e OPENAI_API_KEY="your_key" \
+  -e AZURE_OPENAI_API_KEY="your_key" \
+  -e AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/" \
+  -e AZURE_OPENAI_API_VERSION="2024-02-01" \
   ghcr.io/pescheckit/python-gpt-po:latest \
-  --folder /input --lang fr,de --bulk
+  --provider azure_openai --folder /input --lang fr,de --bulk
 ```
 
 ## Requirements
@@ -194,9 +239,10 @@ docker run --rm -v $(pwd):/app -w /app --entrypoint python python-gpt-po -m pyte
 
 ## Documentation
 
-For advanced usage and detailed documentation, please see:
-- [Advanced Usage Guide](docs/usage.md)
-- [GitHub Repository](https://github.com/pescheckit/python-gpt-po)
+For more detailed information:
+- **[Advanced Usage Guide](docs/usage.md)** - Comprehensive guide with all options and internal mechanics
+- **[Development Guide](docs/development.md)** - For contributors
+- **[GitHub Repository](https://github.com/pescheckit/python-gpt-po)** - Source code and issues
 
 ## License
 

docs/usage.md

Lines changed: 101 additions & 8 deletions
@@ -6,7 +6,7 @@ This guide provides an in-depth look at what really happens when you run `gpt-po
 
 ## Overview
 
-`gpt-po-translator` is a multi-provider tool for translating gettext (.po) files using AI models. It supports OpenAI, Anthropic, and DeepSeek. The tool offers two primary translation modes:
+`gpt-po-translator` is a multi-provider tool for translating gettext (.po) files using AI models. It supports OpenAI, Azure OpenAI, Anthropic, and DeepSeek. The tool offers two primary translation modes:
 - **Bulk Mode:** Processes a list of texts in batches to reduce the number of API calls.
 - **Individual Mode:** Translates each text entry one-by-one for more fine-grained control.
 
@@ -22,9 +22,9 @@ It also manages fuzzy translations (by disabling or removing them) and can infer
 
 - **API Key Setup:**
   The tool collects API keys from multiple sources:
-  - Specific arguments (`--openai-key`, `--anthropic-key`, `--deepseek-key`)
+  - Specific arguments (`--openai-key`, `--azure-openai-key`, `--anthropic-key`, `--deepseek-key`)
   - A fallback argument (`--api_key`) for OpenAI if no dedicated key is provided
-  - Environment variables (e.g., `OPENAI_API_KEY`)
+  - Environment variables (e.g., `OPENAI_API_KEY`, `AZURE_OPENAI_API_KEY`)
 
   It then initializes a `ProviderClients` instance that creates API client objects for the chosen providers.
 
@@ -76,7 +76,7 @@ It also manages fuzzy translations (by disabling or removing them) and can infer
   The tool checks whether the translation is excessively verbose compared to the original text. If so, it retries the translation to ensure it remains concise.
 
 - **Updating PO Files:**
-  After translation, each PO file entry is updated with the new translation using `polib`. The tool logs a summary of how many entries were successfully translated and warns if any remain untranslated.
+  After translation, each PO file entry is updated with the new translation using `polib`. By default, AI-generated translations are marked with a comment (`#. AI-generated`) for easy identification. The tool logs a summary of how many entries were successfully translated and warns if any remain untranslated.
 
 ---
 
@@ -114,9 +114,14 @@ Below is a detailed explanation of all command-line arguments:
   *Behind the scenes:* These names are used in the translation prompts to give the AI clearer context, potentially improving translation quality.
   *Note:* The number of detailed names must match the number of language codes.
 
-- **`--fuzzy`**
-  *Description:* A flag that, when set, instructs the tool to remove fuzzy entries from the PO files before translation.
-  *Behind the scenes:* The tool calls a dedicated method to strip fuzzy markers and flags from both the file content and metadata.
+- **`--fuzzy`** *(DEPRECATED)*
+  *Description:* A flag that, when set, instructs the tool to remove fuzzy entries from the PO files before translation. **This option is DEPRECATED due to its risky behavior of removing fuzzy markers without actually translating the content.**
+  *Behind the scenes:* The tool calls a dedicated method to strip fuzzy markers and flags from both the file content and metadata.
+  *Warning:* This can lead to data loss and confusion. Use `--fix-fuzzy` instead.
+
+- **`--fix-fuzzy`**
+  *Description:* Translate and clean fuzzy entries safely (recommended over `--fuzzy`).
+  *Behind the scenes:* The tool filters for entries with the 'fuzzy' flag and attempts to translate them, removing the flag upon successful translation. AI-generated translations are marked as usual unless `--no-ai-comment` is used.
 
 - **`--bulk`**
   *Description:* Enables bulk translation mode, meaning multiple texts will be translated in a single API call.
@@ -131,7 +136,7 @@ Below is a detailed explanation of all command-line arguments:
   *Behind the scenes:* This key is merged with keys provided through other command-line arguments or environment variables.
 
 - **`--provider <provider>`**
-  *Description:* Specifies the AI provider to use for translations. Acceptable values are `openai`, `anthropic`, or `deepseek`.
+  *Description:* Specifies the AI provider to use for translations. Acceptable values are `openai`, `azure_openai`, `anthropic`, or `deepseek`.
   *Behind the scenes:* If not specified, the tool auto-selects the first provider for which an API key is available.
 
 - **`--model <model>`**
@@ -150,6 +155,18 @@ Below is a detailed explanation of all command-line arguments:
   *Description:* Provides the Anthropic API key directly.
   *Behind the scenes:* This key is used to initialize the Anthropic client.
 
+- **`--azure-openai-key`**
+  *Description:* Provides the Azure OpenAI API key directly.
+  *Behind the scenes:* This key is used to initialize the Azure OpenAI client.
+
+- **`--azure-openai-endpoint`**
+  *Description:* Provides the Azure OpenAI endpoint URL (e.g., `https://your-resource.openai.azure.com/`).
+  *Behind the scenes:* Required for Azure OpenAI connections along with the API version.
+
+- **`--azure-openai-api-version`**
+  *Description:* Specifies the Azure OpenAI API version (e.g., `2024-02-01`).
+  *Behind the scenes:* Different API versions support different features and models.
+
 - **`--deepseek-key`**
   *Description:* Provides the DeepSeek API key directly.
   *Behind the scenes:* This key is required to make API calls to DeepSeek’s translation service.
@@ -158,13 +175,89 @@ Below is a detailed explanation of all command-line arguments:
   *Description:* Enables inferring the target language from the folder structure.
   *Behind the scenes:* The tool inspects the path components (directory names) of each PO file and matches them against the provided language codes.
 
+- **`--no-ai-comment`**
+  *Description:* Disables the automatic addition of 'AI-generated' comments to translated entries.
+  *Behind the scenes:* **By default (without this flag), every translation made by the AI is marked with a `#. AI-generated` comment in the PO file.** This flag prevents that marking, making AI translations indistinguishable from human translations in the file.
+  *Note:* AI tagging is enabled by default for tracking, compliance, and quality assurance purposes.
+
+---
+
+## AI Translation Tracking
+
+### Overview
+
+**AI translation tracking is enabled by default.** The tool automatically tracks which translations were generated by AI versus human translators. This is particularly useful for:
+- Quality assurance and review processes
+- Compliance with requirements to identify AI-generated content
+- Incremental translation workflows where you need to track changes
+
+### How It Works
+
+When a translation is generated by the AI, the tool adds a translator comment to the PO entry:
+
+```po
+#. AI-generated
+msgid "Hello, world!"
+msgstr "Hola, mundo!"
+```
+
+These comments are:
+- **Persistent**: They're saved in the PO file and preserved across edits
+- **Standard-compliant**: Using the official gettext translator comment syntax (`#.`)
+- **Tool-friendly**: Visible in PO editors like Poedit, Lokalize, etc.
+- **Searchable**: Easy to find with grep or other search tools
+
+### Managing AI Comments
+
+**Finding AI translations:**
+```bash
+# Count AI-generated translations
+grep -c "^#\. AI-generated" locales/es/LC_MESSAGES/messages.po
+
+# List files with AI translations
+grep -l "^#\. AI-generated" locales/**/*.po
+```
+
+**Important: Django Workflow Consideration**
+Django's `makemessages` command removes translator comments (including AI-generated tags) when updating PO files. This means:
+
+- **After running our translator**: AI comments are preserved in PO files
+- **After running Django makemessages**: AI comments are removed, but translations remain
+- **Best practice**: Re-run the AI translator after Django makemessages to restore AI tagging on any remaining untranslated entries
+
+**Disabling AI comments:**
+If you don't want AI translations to be marked, use the `--no-ai-comment` flag:
+```bash
+gpt-po-translator --folder ./locales --lang de --no-ai-comment
+```
+
+### Programmatic Access
+
+The tool provides helper methods for working with AI-generated translations programmatically:
+
+```python
+from python_gpt_po.services.po_file_handler import POFileHandler
+import polib
+
+# Load a PO file
+po_file = polib.pofile('messages.po')
+
+# Get all AI-generated entries
+ai_entries = POFileHandler.get_ai_generated_entries(po_file)
+
+# Remove AI-generated comments if needed
+POFileHandler.remove_ai_generated_comments(po_file)
+po_file.save()
+```
+
 ---
 
 ## Behind the Scenes: API Calls and Post-Processing
 
 - **Provider-Specific API Calls:**
   The tool constructs different API requests based on the selected provider. For example:
   - **OpenAI:** Uses the OpenAI Python client to create a chat completion.
+  - **Azure OpenAI:** Uses the OpenAI Python client configured for Azure endpoints.
   - **Anthropic:** Sends a request to Anthropic’s API using custom headers.
   - **DeepSeek:** Uses the `requests` library to post JSON data, and then cleans up responses that may be wrapped in markdown code blocks.
 
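
As a reading aid for the `docs/usage.md` changes, the `grep` commands in the new "Managing AI Comments" section can be approximated in plain Python. This is a minimal sketch that assumes only the marker text `#. AI-generated` shown in the docs. The function name `find_ai_tagged_msgids` is hypothetical and is not part of `python_gpt_po`. The sketch handles single-line `msgid` strings only and does not unescape PO escape sequences. For context, the GNU gettext manual classifies `#.` lines as extracted comments (translator comments use `# `), which would be consistent with the `makemessages` behavior the docs describe.

```python
from typing import List

AI_MARKER = "#. AI-generated"  # marker text as shown in the docs in this diff


def find_ai_tagged_msgids(po_text: str) -> List[str]:
    """Return the msgid of every entry whose comment block has the AI marker.

    Hypothetical helper for illustration: single-line msgids only.
    """
    tagged: List[str] = []
    marker_seen = False
    for raw_line in po_text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line separates PO entries, so reset the state.
            marker_seen = False
        elif line == AI_MARKER:
            marker_seen = True
        elif line.startswith('msgid "') and line.endswith('"'):
            if marker_seen:
                tagged.append(line[len('msgid "'):-1])
            marker_seen = False
    return tagged


sample = '''\
#. AI-generated
msgid "Hello"
msgstr "Hallo"

msgid "Goodbye"
msgstr "Auf Wiedersehen"
'''

print(find_ai_tagged_msgids(sample))  # ['Hello']
```

For real files, a PO parser such as `polib` is the safer choice, since it copes with multi-line strings and escapes.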

python_gpt_po/main.py

Lines changed: 15 additions & 5 deletions
@@ -11,7 +11,7 @@
 from argparse import Namespace
 from typing import Dict, List, Optional
 
-from .models.config import TranslationConfig
+from .models.config import TranslationConfig, TranslationFlags
 from .models.enums import ModelProvider
 from .models.provider_clients import ProviderClients
 from .services.model_manager import ModelManager
@@ -180,15 +180,25 @@ def main():
         logging.error(str(e))
         sys.exit(1)
 
+    # Check for deprecated --fuzzy option
+    if args.fuzzy:
+        logging.warning(
+            "WARNING: --fuzzy is DEPRECATED and has risky behavior. "
+            "Use --fix-fuzzy instead to properly translate and clean fuzzy entries."
+        )
     # Create translation configuration
+    flags = TranslationFlags(
+        bulk_mode=args.bulk,
+        fuzzy=args.fuzzy,
+        fix_fuzzy=args.fix_fuzzy,
+        folder_language=args.folder_language,
+        mark_ai_generated=not args.no_ai_comment
+    )
     config = TranslationConfig(
         provider_clients=provider_clients,
         provider=provider,
         model=model,
-        bulk_mode=args.bulk,
-        fuzzy=args.fuzzy,
-        fix_fuzzy=args.fix_fuzzy,
-        folder_language=args.folder_language
+        flags=flags
     )
 
     # Process translations
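
The hunk above reads `args.no_ai_comment` and inverts it into `mark_ai_generated`, but the argument parser itself is not part of the files shown here. The following is a rough sketch of how such opt-out flags could be wired with `argparse`. `build_parser` and `resolve_flags` are hypothetical names, and the real parser in the project may be defined differently.

```python
import argparse
import logging
from typing import Dict, List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the boolean flags used in this diff."""
    parser = argparse.ArgumentParser(prog="gpt-po-translator")
    parser.add_argument("--bulk", action="store_true")
    parser.add_argument("--fuzzy", action="store_true")
    parser.add_argument("--fix-fuzzy", action="store_true")
    parser.add_argument("--folder-language", action="store_true")
    # Opt-out flag: absent means tagging stays on.
    parser.add_argument("--no-ai-comment", action="store_true")
    return parser


def resolve_flags(argv: Optional[List[str]] = None) -> Dict[str, bool]:
    """Map parsed arguments to the keyword names used by TranslationFlags."""
    args = build_parser().parse_args(argv)
    if args.fuzzy:
        logging.warning("--fuzzy is deprecated; use --fix-fuzzy instead.")
    return {
        "bulk_mode": args.bulk,
        "fuzzy": args.fuzzy,
        "fix_fuzzy": args.fix_fuzzy,
        "folder_language": args.folder_language,
        # Same inversion as in main(): the CLI flag disables a default-on feature.
        "mark_ai_generated": not args.no_ai_comment,
    }


print(resolve_flags([])["mark_ai_generated"])                   # True
print(resolve_flags(["--no-ai-comment"])["mark_ai_generated"])  # False
```

Keeping the CLI flag negative (`--no-ai-comment`) while the internal field is positive (`mark_ai_generated`) means the default-on behavior needs no argument at all, and the inversion happens in exactly one place.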

python_gpt_po/models/config.py

Lines changed: 11 additions & 4 deletions
@@ -8,13 +8,20 @@
 from .provider_clients import ProviderClients
 
 
+@dataclass
+class TranslationFlags:
+    """Boolean flags for translation behavior."""
+    bulk_mode: bool = False
+    fuzzy: bool = False
+    fix_fuzzy: bool = False
+    folder_language: bool = False
+    mark_ai_generated: bool = True
+
+
 @dataclass
 class TranslationConfig:
     """Class to hold configuration parameters for the translation service."""
     provider_clients: ProviderClients
     provider: ModelProvider
     model: str
-    bulk_mode: bool = False
-    fuzzy: bool = False
-    fix_fuzzy: bool = False
-    folder_language: bool = False
+    flags: TranslationFlags
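
To illustrate what this refactor means for call sites, here is a self-contained sketch of the nested dataclasses. The field names come from the diff. The stand-in types (`object` and `str` in place of `ProviderClients` and `ModelProvider`) and the model name are placeholders chosen so that the example runs on its own.

```python
from dataclasses import dataclass, replace


@dataclass
class TranslationFlags:
    """Boolean flags for translation behavior (fields as in the diff)."""
    bulk_mode: bool = False
    fuzzy: bool = False
    fix_fuzzy: bool = False
    folder_language: bool = False
    mark_ai_generated: bool = True


@dataclass
class TranslationConfig:
    # Stand-in types: the real fields use ProviderClients and ModelProvider.
    provider_clients: object
    provider: str
    model: str
    flags: TranslationFlags  # no default, so callers must pass it explicitly


config = TranslationConfig(
    provider_clients=None,
    provider="openai",
    model="example-model",  # placeholder model name
    flags=TranslationFlags(bulk_mode=True),
)

# Code that used to read config.bulk_mode now goes through config.flags.
print(config.flags.bulk_mode)          # True
print(config.flags.mark_ai_generated)  # True (default)
print(hasattr(config, "bulk_mode"))    # False

# dataclasses.replace builds a copy with a single flag changed.
untagged = replace(config, flags=replace(config.flags, mark_ai_generated=False))
print(untagged.flags.mark_ai_generated)  # False
```

Because `flags` has no default, any existing code that constructs `TranslationConfig` with the old keyword arguments (`bulk_mode=...`, `fuzzy=...`) raises a `TypeError` until it is updated, which makes missed call sites easy to spot.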
