How to solve the problem of mixed Chinese and English in building a knowledge graph after upgrading to v0.19.1? #8653
Replies: 3 comments 2 replies
The mixed Chinese and English output happens because RAGFlow v0.19.1 sets the knowledge base language at creation time, based on the server's LANG environment variable. If LANG doesn't contain "zh_CN", the language defaults to English, which then drives all downstream LLM prompts and output, even if your documents are pure Chinese. There's no UI option to change the language after creation, and the system doesn't auto-detect language from document content.
To fix this, set the LANG environment variable to include "zh_CN" (for example, export LANG=zh_CN.UTF-8) before creating a new knowledge base. For long-term stability, consider requesting or implementing a UI language selector or automatic language detection on upload.
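The rule described above can be sketched in a few lines of Python. This is an illustration only, not RAGFlow's actual code: the helper name default_kb_language and the "Chinese" label are assumptions, while the "zh_CN" substring check and the English fallback come from the explanation above. Running it in the same environment as the RAGFlow server process should show which language a newly created knowledge base would get under that rule.

```python
import os


def default_kb_language(lang_env):
    """Hypothetical helper mirroring the rule described above.

    Not RAGFlow's real implementation: the "Chinese" label is an assumption,
    the "zh_CN" check and the English fallback come from the description.
    """
    return "Chinese" if "zh_CN" in (lang_env or "") else "English"


if __name__ == "__main__":
    # Report what the current process environment would select.
    current = os.environ.get("LANG")
    print(f"LANG={current!r} -> {default_kb_language(current)}")
```

If the script reports English while LANG was meant to be zh_CN.UTF-8, the variable is probably not reaching the process that creates the knowledge base.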
Thank you for this answer. On the full edition of v0.19.1, how do I set the LANG environment variable to include "zh_CN" (e.g. export LANG=zh_CN.UTF-8) before creating a new knowledge base, and in which configuration file should this parameter be set?
@dosu According to your method, in docker-compose.yml:
2025-07-03 10:30:29,503 INFO 22 task_executor_c86c4a59dc26_0 reported heartbeat: {"name": "task_executor_c86c4a59dc26_0", "now": "2025-07-03T17:30:29.500+08:00", "boot_at": "2025-06-30T09:47:28.695+08:00", "pending": 2, "lag": 0, "done": 14, "failed": 29, "current": {"a3d9c04257ea11f08d22ca13e91a6761": {"id": "a3d9c04257ea11f08d22ca13e91a6761", "doc_id": "dd498d0e57e911f0b9f0ca13e91a6761", "from_page": 100000000, "to_page": 100000000, "retry_count": 0, "kb_id": "bae1a54e57e911f0ac6fca13e91a6761", "parser_id": "naive", "parser_config": {"pages": [], "auto_keywords": 2, "auto_questions": 1, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": true}, "chunk_token_num": 4, "delimiter": "
#&"}, "name": "\u75c5\u5386\u4e66\u5199\u57fa\u672c\u89c4\u8303-\u89e3\u6790\u7528.txt", "type": "doc", "location": "\u75c5\u5386\u4e66\u5199\u57fa\u672c\u89c4\u8303-\u89e3\u6790\u7528.txt", "size": 17875, "tenant_id": "73f1982e03a211f098630ecb533b92c9", "language": "English", "embd_id": "BAAI/bge-large-zh-v1.5@BAAI", "pagerank": 0, "kb_parser_config": {"layout_recognize": "DeepDOC", "chunk_token_num": 4, "delimiter": "#&", "auto_keywords": 2, "auto_questions": 1, "html4excel": false, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light", "resolution": true, "community": true}}, "img2txt_id": "Qwen/QVQ-72B-Preview@SILICONFLOW", "asr_id": "", "llm_id": "deepseek-r1-32b___OpenAI-API@OpenAI-API-Compatible", "update_time": 1751532573144, "task_type": "graphrag"}}}
This is the RAGFlow log output from creating a knowledge base and building a knowledge graph. It shows the parsing parameters applied to the file. The settings page in RAGFlow v0.19.1 has no language option. The log shows that the language defaults to English ("language": "English"), but my knowledge base content is entirely in Chinese. When RAGFlow calls the LLM to build the knowledge graph, the LLM returns English text, so the knowledge graph ends up with a mix of Chinese and English. How can I solve this problem?
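To check which language a task was actually given without reading the whole line by eye, the "language" field can be pulled out of a heartbeat entry like the one above. The following is a minimal sketch using only the Python standard library. The function name task_languages and the shortened sample line are illustrative assumptions; the field names ("current", "language") and the "reported heartbeat: " marker are taken from the log above.

```python
import json

MARKER = "reported heartbeat: "


def task_languages(log_line):
    """Return {task_id: language} for every task under "current"."""
    # Everything after the marker is expected to start with the JSON payload.
    _, _, payload = log_line.partition(MARKER)
    # strict=False tolerates a raw newline inside a string value (as in the
    # "delimiter" field above); raw_decode ignores text after the JSON object.
    heartbeat, _ = json.JSONDecoder(strict=False).raw_decode(payload)
    return {
        task_id: task.get("language")
        for task_id, task in heartbeat.get("current", {}).items()
    }


# Shortened sample built from fields that appear in the log above.
sample = (
    "2025-07-03 10:30:29,503 INFO 22 task_executor_c86c4a59dc26_0 "
    "reported heartbeat: "
    '{"name": "task_executor_c86c4a59dc26_0", "pending": 2, "current": '
    '{"a3d9c04257ea11f08d22ca13e91a6761": '
    '{"language": "English", "task_type": "graphrag"}}}'
)

if __name__ == "__main__":
    print(task_languages(sample))
```

A line without the heartbeat marker raises json.JSONDecodeError, so a caller scanning a whole log file would filter on the marker first.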