How to solve the problem of mixed Chinese and English in building a knowledge graph after upgrading to v0.19.1? #8653

BillGaici20230401 · 2025-07-03T10:03:12Z

BillGaici20230401
Jul 3, 2025

2025-07-03 10:30:29,503 INFO 22 task_executor_c86c4a59dc26_0 reported heartbeat: {"name": "task_executor_c86c4a59dc26_0", "now": "2025-07-03T17:30:29.500+08:00", "boot_at": "2025-06-30T09:47:28.695+08:00", "pending": 2, "lag": 0, "done": 14, "failed": 29, "current": {"a3d9c04257ea11f08d22ca13e91a6761": {"id": "a3d9c04257ea11f08d22ca13e91a6761", "doc_id": "dd498d0e57e911f0b9f0ca13e91a6761", "from_page": 100000000, "to_page": 100000000, "retry_count": 0, "kb_id": "bae1a54e57e911f0ac6fca13e91a6761", "parser_id": "naive", "parser_config": {"pages": [], "auto_keywords": 2, "auto_questions": 1, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": true}, "chunk_token_num": 4, "delimiter": "#&"}, "name": "\u75c5\u5386\u4e66\u5199\u57fa\u672c\u89c4\u8303-\u89e3\u6790\u7528.txt", "type": "doc", "location": "\u75c5\u5386\u4e66\u5199\u57fa\u672c\u89c4\u8303-\u89e3\u6790\u7528.txt", "size": 17875, "tenant_id": "73f1982e03a211f098630ecb533b92c9", "language": "English", "embd_id": "BAAI/bge-large-zh-v1.5@BAAI", "pagerank": 0, "kb_parser_config": {"layout_recognize": "DeepDOC", "chunk_token_num": 4, "delimiter": "#&", "auto_keywords": 2, "auto_questions": 1, "html4excel": false, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light", "resolution": true, "community": true}}, "img2txt_id": "Qwen/QVQ-72B-Preview@SILICONFLOW", "asr_id": "", "llm_id": "deepseek-r1-32b___OpenAI-API@OpenAI-API-Compatible", "update_time": 1751532573144, "task_type": "graphrag"}}}

This is RAGFlow's log output while creating a knowledge base and building a knowledge graph; it shows the parsing settings for the file. The settings page in RAGFlow v0.19.1 has no language option. The log shows that the output language defaults to English ("language": "English"), but my knowledge base content is entirely Chinese. When RAGFlow calls the LLM to build the knowledge graph, the LLM returns English text, so the knowledge graph ends up with mixed Chinese and English. How can I solve this?
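A quick way to confirm which language each running task carries is to parse the heartbeat line rather than read it by eye. The following is a minimal sketch, assuming the heartbeat keeps the shape shown in the log above (a JSON object after the text "reported heartbeat: "); the helper name `task_languages` is hypothetical, and the sample line is an abbreviated copy of the log.

```python
import json

def task_languages(log_line: str) -> dict:
    """Map task id -> (language, embd_id) for one heartbeat log line."""
    marker = "reported heartbeat: "
    # Everything after the marker is assumed to be a single JSON object.
    payload = log_line.split(marker, 1)[1]
    heartbeat = json.loads(payload)
    return {
        task_id: (task.get("language"), task.get("embd_id"))
        for task_id, task in heartbeat.get("current", {}).items()
    }

# Abbreviated copy of the heartbeat line from the log above.
sample = (
    '2025-07-03 10:30:29,503 INFO 22 task_executor_c86c4a59dc26_0 '
    'reported heartbeat: {"name": "task_executor_c86c4a59dc26_0", '
    '"current": {"a3d9c04257ea11f08d22ca13e91a6761": {'
    '"kb_id": "bae1a54e57e911f0ac6fca13e91a6761", '
    '"language": "English", '
    '"embd_id": "BAAI/bge-large-zh-v1.5@BAAI", '
    '"task_type": "graphrag"}}}'
)

for task_id, (language, embd_id) in task_languages(sample).items():
    print(task_id, language, embd_id)
```

If the printed language is still "English" after a configuration change, the task is still using the value stored with the knowledge base.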

2025-07-03T10:08:09Z

dosubot[bot]
bot Jul 3, 2025

The mixed Chinese and English output happens because RAGFlow v0.19.1 sets the knowledge base language at creation time, based on the server's LANG environment variable. If LANG doesn't contain "zh_CN", the language defaults to English, and that setting drives all downstream LLM prompts and output, even if your documents are pure Chinese. There's no UI option to change this after creation, and the system doesn't auto-detect the language from document content.
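The rule described above can be sketched in a few lines. This is an illustration of the described behavior, not RAGFlow's actual source; the function name `default_kb_language` is hypothetical.

```python
import os

def default_kb_language(env=None) -> str:
    """Return the default KB language implied by LANG, per the rule above."""
    env = os.environ if env is None else env
    # "Chinese" only when LANG contains "zh_CN"; otherwise fall back to English.
    return "Chinese" if "zh_CN" in env.get("LANG", "") else "English"

print(default_kb_language({"LANG": "zh_CN.UTF-8"}))
print(default_kb_language({"LANG": "en_US.UTF-8"}))
print(default_kb_language({}))
```

Calling `default_kb_language()` with no argument inside the RAGFlow container or service environment shows what a newly created knowledge base would likely get under this rule.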

To fix this, before creating a new knowledge base, set the LANG environment variable to include "zh_CN" (for example, export LANG=zh_CN.UTF-8), then restart RAGFlow and create the KB. If the KB is already created, you can update the language field for your KB directly in the database to "Chinese" before building the knowledge graph. This will ensure all LLM output and graph extraction happens in Chinese.

Currently, there’s no built-in way to change the KB language from the UI or to auto-detect language per document. For long-term stability, consider requesting or implementing a UI language selector or automatic language detection on upload.

To reply, just mention @dosu.


0 replies

BillGaici20230401 · 2025-07-04T01:05:52Z

BillGaici20230401
Jul 4, 2025
Author

Thank you for the answer. On the full edition of v0.19.1, how do I set the LANG environment variable to include "zh_CN" (e.g. export LANG=zh_CN.UTF-8) before creating a new knowledge base, and in which configuration file should this parameter be set?

1 reply

dosubot[bot] bot Jul 4, 2025

To make sure RAGFlow v0.19.1 uses Chinese for your knowledge base, set the LANG environment variable to zh_CN.UTF-8 before starting the service. There's no RAGFlow-specific config file for this; LANG is a standard environment variable.

If you’re using Docker Compose (the most common for the full edition), add this line to your docker/.env file:

LANG=zh_CN.UTF-8

Or, add it under the environment section for the ragflow service in docker-compose.yml:

environment:
  - LANG=zh_CN.UTF-8

Then restart your containers and create the new knowledge base.

If you’re running RAGFlow directly on a server (not in Docker), run this in your shell before starting RAGFlow:

export LANG=zh_CN.UTF-8

If you use systemd to manage the service, add this to the [Service] section of your systemd unit file:

Environment=LANG=zh_CN.UTF-8

Then reload systemd and restart the service.

This ensures the language is set to Chinese for all new knowledge bases you create.


BillGaici20230401 · 2025-07-10T07:15:44Z

BillGaici20230401
Jul 10, 2025
Author

@dosu Following your method, I set this in docker-compose.yml:

environment:
  - LANG=zh_CN.UTF-8

After this change, the generated knowledge graph is still a mix of Chinese and English, and the running log still shows:
"language": "English", "embd_id": "BAAI/bge-large-zh-v1.5@BAAI" ......
So is the interaction with the LLM still in English mode? Has nothing changed?

1 reply

dosubot[bot] bot Jul 10, 2025

Setting LANG=zh_CN.UTF-8 in your Docker config only affects new knowledge bases created after the change. If your existing KB was created when LANG was not set (or set to English), its language field is still "English", so all LLM prompts and knowledge graph outputs will remain in English, even if your documents are pure Chinese. Changing LANG or the embedding model after KB creation has no effect on this field or on the LLM output language.
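Why an existing KB keeps its old value can be shown with a toy model: the language is computed once at creation and stored, so later changes to LANG are never consulted for that record. The class and function names below are hypothetical and only mirror the behavior described in this thread.

```python
from dataclasses import dataclass

def default_kb_language(env: dict) -> str:
    # Same rule as described earlier in the thread.
    return "Chinese" if "zh_CN" in env.get("LANG", "") else "English"

@dataclass
class KnowledgeBase:
    name: str
    language: str  # stored once, when the KB is created

def create_kb(name: str, env: dict) -> KnowledgeBase:
    return KnowledgeBase(name=name, language=default_kb_language(env))

env = {"LANG": "en_US.UTF-8"}
old_kb = create_kb("created-before-change", env)

env["LANG"] = "zh_CN.UTF-8"  # the Docker/env change happens here
new_kb = create_kb("created-after-change", env)

print(old_kb.language)  # English: the stored value is not recomputed
print(new_kb.language)  # Chinese
```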

To fix this for an existing KB, you need to update the language field in the database for that KB to "Chinese" before building the knowledge graph. This will ensure all LLM output and graph extraction happens in Chinese. If you create a new KB after setting LANG, it will pick up the correct language automatically.

There’s currently no UI or API option to change the KB language after creation, so a direct DB update is required. Make sure to back up your data before making changes.
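A direct update could look roughly like the sketch below. It is not a verified procedure: the table name `knowledgebase` and the columns `id` and `language` are assumptions about the schema, so check them against your own database first. An in-memory SQLite database stands in for the real one so the example runs anywhere; against MySQL you would run the same statement through your MySQL driver, which typically uses `%s` placeholders instead of `?`. The KB id is the `kb_id` from the log at the top of this thread.

```python
import sqlite3

def set_kb_language(conn, kb_id: str, language: str = "Chinese") -> int:
    """Update one KB's language; returns the number of rows changed."""
    cur = conn.execute(
        # Assumed table/column names; verify against the real schema.
        "UPDATE knowledgebase SET language = ? WHERE id = ?",
        (language, kb_id),
    )
    conn.commit()
    return cur.rowcount

# Stand-in database with a single KB row marked as English.
kb_id = "bae1a54e57e911f0ac6fca13e91a6761"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE knowledgebase (id TEXT PRIMARY KEY, language TEXT)")
conn.execute("INSERT INTO knowledgebase VALUES (?, ?)", (kb_id, "English"))

changed = set_kb_language(conn, kb_id)
stored = conn.execute(
    "SELECT language FROM knowledgebase WHERE id = ?", (kb_id,)
).fetchone()[0]
print(changed, stored)
```

Filtering by the KB's id keeps the change scoped to one knowledge base, and checking the returned row count confirms that exactly one record was touched before you rebuild the graph.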

Let me know if you need step-by-step instructions for updating the database.

