write chunks in TextParserStreamer #3025

pavel-esir · 2025-11-17T11:58:56Z

Description

Return a DeltaMessage from the incremental TextParserStreamer instead of the accumulated message. The accumulated message is still available via get_parsed_message().

CVS-CVS-176146

Fixes #(issue)

Checklist:

Tests have been updated or added to cover the new code.
This patch fully addresses the ticket.
I have made corresponding changes to the documentation. TODO

Copilot

Pull Request Overview

This PR refactors the TextParserStreamer to return incremental delta messages in the write() callback instead of accumulated messages, while still maintaining accumulated message access via get_parsed_message(). The key changes update the reasoning parser implementation to properly handle delta messages and modify the concatenation logic in the streamer.

Modified TextParserStreamer to write delta messages instead of accumulated messages
Updated ReasoningIncrementalParser to populate message fields for incremental output
Added concatenate_json_containers() helper function for message accumulation
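To make the behavior change concrete, here is a minimal Python sketch of the callback contract described above. `ToyParserStreamer`, `put_text()` and the `deltas` list are hypothetical stand-ins written for illustration, not the OpenVINO GenAI classes. Only the idea (pass the delta to `write()`, keep the accumulated message behind `get_parsed_message()`) comes from this PR.

```python
class ToyParserStreamer:
    """Hypothetical stand-in: forwards deltas to write(), accumulates internally."""

    def __init__(self):
        self._parsed_message = {"content": ""}
        self.deltas = []  # records what the write() callback received

    def write(self, delta_message):
        # User callback: it sees only the newly produced piece.
        self.deltas.append(dict(delta_message))

    def put_text(self, subword):
        # Build the delta, fold it into the internal state, then hand it out.
        delta = {"content": subword}
        self._parsed_message["content"] += subword
        self.write(delta)

    def get_parsed_message(self):
        # The full message so far stays available on request.
        return dict(self._parsed_message)


streamer = ToyParserStreamer()
for piece in ["Hel", "lo", "!"]:
    streamer.put_text(piece)
```

With this contract a UI can print each delta as it arrives, while code that needs the whole message calls `get_parsed_message()` once at the end.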

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/python_tests/test_parsers.py | Updated tests to use `get_parsed_message()` for assertions and removed manual message accumulation in custom streamers |
| src/cpp/src/text_streamer.cpp | Added `concatenate_json_containers()` function and modified `write()` to return delta messages while maintaining internal accumulation |
| src/cpp/src/parsers.cpp | Refactored `ReasoningIncrementalParser` to populate `content` field in delta messages and removed `keep_original_content` logic from helper methods |

Copilot · 2025-11-17T12:00:00Z

src/cpp/src/text_streamer.cpp

+void concatenate_json_containers(JsonContainer& from, const JsonContainer& to, std::vector<std::string> keys_to_concatenate) {
+    for (const auto& key : keys_to_concatenate) {
+        if (to.contains(key) && from.contains(key)) {
+            // If both are strings, concatenate
+            if (to[key].is_string() && from[key].is_string()) {
+                to[key] = to[key].get_string() + from[key].get_string();


The parameter order is reversed: `from` should be const and `to` should be mutable. Currently, modifications are applied to `to` (line 152), but the function signature suggests that `from` is the destination. Either swap the parameter names or reverse their constness to match the intended behavior.
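A small Python sketch of the intended direction of data flow may help here. The helper name `concatenate_fields` and the rule for keys missing from the destination are assumptions made for illustration. The hunk above only shows the branch where both containers hold strings.

```python
def concatenate_fields(src, dst, keys_to_concatenate):
    """Append string fields of src (left untouched) onto dst (mutated in place)."""
    for key in keys_to_concatenate:
        if key not in src:
            continue
        if key in dst and isinstance(dst[key], str) and isinstance(src[key], str):
            # Both sides are strings: destination text first, then the new piece.
            dst[key] = dst[key] + src[key]
        elif key not in dst:
            # Assumption (not shown in the hunk): copy the value when dst lacks the key.
            dst[key] = src[key]


accumulated = {"content": "Hel"}
delta = {"content": "lo", "reasoning_content": "hmm"}
concatenate_fields(delta, accumulated, ["content", "reasoning_content"])
```

Naming the arguments by role (source, destination) makes it harder to mix up which one may be modified.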

src/cpp/src/text_streamer.cpp

src/cpp/src/parsers.cpp

Copilot

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

tests/python_tests/test_parsers.py:1

Assignment to reason_str loses previously accumulated reasoning content. The variable should be appended to, not replaced, to preserve partial reasoning text from previous chunks.

# Copyright (C) 2023-2025 Intel Corporation

Copilot · 2025-11-21T12:10:02Z

tests/python_tests/test_parsers.py

-        # When parser is called from streamer, it is expected that content is accumulated inside streamer.
-        # Here we are calling parser manually therefore we need to accumulate content manually.
-        msg['content'] += subword  
+        print(msg)


Debug print statement should be removed before merging to production code.

Suggested change

print(msg)

Copilot · 2025-11-21T12:10:02Z

src/cpp/src/text_streamer.cpp

-    return write(m_pimpl->m_parsed_message);
+
+    // std::cout << msg["content"].get_string() << std::endl;
+    // std::cout << msg["reasoning_content"].get_string() << std::endl;


Commented-out debug statements should be removed before merging.

Suggested change

// std::cout << msg["reasoning_content"].get_string() << std::endl;

Copilot · 2025-11-21T12:10:02Z

src/cpp/src/parsers.cpp

+        // if (!message.contains("reasoning_content")) {
            message["reasoning_content"] = "";
-        }
-        if (!message.contains("content")) {
+        // }
+        // if (!message.contains("content")) {
            message["content"] = "";
-        }
+        // }


Commented-out conditional checks should be removed. If unconditional initialization is the intended behavior, the comments add unnecessary clutter.

Copilot · 2025-11-21T12:10:03Z

src/cpp/src/parsers.cpp


        if (!m_keep_original_content) {
-            delta_text = std::string(txt_chunk.substr(close_idx + m_close_tag.size()));
+            // Despite the fact that we put txt_chung to delta_text it's correct.


Corrected spelling of 'txt_chung' to 'txt_chunk'.

Suggested change

// Despite the fact that we put txt_chung to delta_text it's correct.

// Despite the fact that we put txt_chunk to delta_text it's correct.

Copilot · 2025-11-21T12:10:03Z

src/cpp/src/parsers.cpp

+            delta_text.clear();
        }


[nitpick] When the think tag is not yet opened and text is still accumulating in the cache, delta_text is cleared. This means no delta content is returned to the user during this phase. Consider adding a comment explaining that this behavior is intentional, to avoid confusion.

Suggested change

delta_text.clear();

}

// Intentionally clear delta_text: no delta content is returned to the user during this phase

// (we are waiting for the <think> tag to be fully detected in the cache).

delta_text.clear();

Copilot

Pull Request Overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Copilot · 2025-11-21T12:24:57Z

src/cpp/src/parsers.cpp

            // Keep potential partial close tag in cache
            m_text_cache = std::string(txt_chunk.substr(txt_chunk.size() - num_chars_to_keep));
-            reason_str.append(txt_chunk.substr(0, txt_chunk.size() - num_chars_to_keep));
+            reason_str = txt_chunk.substr(0, txt_chunk.size() - num_chars_to_keep);


Assignment instead of append changes semantics. Previously accumulated reasoning content in reason_str will be lost. This should use reason_str.append() or += to maintain accumulated content, or the accumulated content should be preserved through a different mechanism.
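For readers unfamiliar with the caching step in this hunk, the following Python sketch shows one way to hold back a possible partial close tag. The function name `split_partial_close_tag` and the `</think>` tag value are assumptions for illustration. How `num_chars_to_keep` is actually computed in parsers.cpp is not visible in this hunk.

```python
def split_partial_close_tag(txt_chunk, close_tag="</think>"):
    """Return (emit, cache).

    cache is the longest suffix of txt_chunk that is a proper prefix of
    close_tag; emit is everything before it.
    """
    max_len = min(len(txt_chunk), len(close_tag) - 1)
    for n in range(max_len, 0, -1):
        if close_tag.startswith(txt_chunk[-n:]):
            return txt_chunk[:-n], txt_chunk[-n:]
    # No suffix looks like the start of the close tag: emit everything.
    return txt_chunk, ""


emit, cache = split_partial_close_tag("so the answer is 4</th")
plain_emit, plain_cache = split_partial_close_tag("plain text")
```

The cached suffix is prepended to the next chunk, so a close tag split across two chunks is still detected.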

Copilot · 2025-11-21T12:24:58Z

src/cpp/src/parsers.cpp

        } else {
            // No partial close tag, accumulate all text
-            reason_str.append(txt_chunk);
+            reason_str = txt_chunk;


Assignment instead of append discards previously accumulated reasoning content. This should use reason_str.append() or += to maintain accumulated content from previous calls.
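Whether plain assignment loses data depends on who owns the accumulated text. The Python sketch below, with hypothetical names (`parse_delta`, `AccumulatingParser`), contrasts the two conventions: a parser that returns only the newest piece and relies on the caller to concatenate, and a parser that appends to its own running string. It illustrates the trade-off only and makes no claim about which convention parsers.cpp follows.

```python
def parse_delta(txt_chunk):
    # Delta convention: assignment is fine because only the new piece is returned.
    reason_str = txt_chunk
    return reason_str


class AccumulatingParser:
    # Accumulating convention: the parser owns the running text, so it must append.
    def __init__(self):
        self.reason_str = ""

    def parse(self, txt_chunk):
        self.reason_str += txt_chunk
        return self.reason_str


chunks = ["first ", "second ", "third"]

# The caller concatenates the deltas itself.
accumulated_by_caller = ""
for chunk in chunks:
    accumulated_by_caller += parse_delta(chunk)

# The parser keeps the accumulated text.
parser = AccumulatingParser()
accumulated_by_parser = ""
for chunk in chunks:
    accumulated_by_parser = parser.parse(chunk)
```

Both conventions end with the same full text. A bug appears only when one side assumes the other is doing the accumulation.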

apaniukov · 2025-11-24T14:35:55Z

src/cpp/src/text_streamer.cpp

+
 };

+void concatenate_json_containers(const JsonContainer& from, JsonContainer& to, std::vector<std::string> keys_to_concatenate) {


It would be better to make this a method of JsonContainer.

apaniukov · 2025-11-24T14:48:50Z

src/cpp/src/parsers.cpp

    /**
     * @brief Ensure required fields exist in the message container.
     */
    void ensure_message_fields(JsonContainer& message) {


This method is either not needed or has to be renamed.

apaniukov · 2025-11-25T14:09:22Z

src/cpp/src/text_streamer.cpp

    // Iterate over all parsers and apply them to the message
    for (auto& parser: m_pimpl->m_parsers) {
-        message = parser->parse(m_pimpl->m_parsed_message, message, flushed_tokens);
+        message = parser->parse(msg, message, flushed_tokens);


Duplicated names: msg and message

apaniukov · 2025-11-25T14:18:50Z

src/cpp/src/parsers.cpp

-            reason_str = std::move(message["reasoning_content"].get_string());
-        }
+        std::string txt_chunk = m_text_cache + delta_text;
+        std::string reason_str = message.contains("reasoning_content") ? std::move(message["reasoning_content"].get_string()) : "";


Suggested change

std::string reason_str = message.contains("reasoning_content") ? std::move(message["reasoning_content"].get_string()) : "";

Copilot AI review requested due to automatic review settings November 17, 2025 11:58

github-actions bot added no-match-files category: GGUF GGUF file reader category: text streamer labels Nov 17, 2025

pavel-esir requested a review from apaniukov November 17, 2025 11:59

Copilot AI reviewed Nov 17, 2025

write chunks in TextParserStreamer

2be242f

pavel-esir force-pushed the write_chunks_during_parse branch from 9b7cd5c to a41f0d0 Compare November 21, 2025 12:08

Copilot AI review requested due to automatic review settings November 21, 2025 12:08

Copilot AI reviewed Nov 21, 2025

pavel-esir force-pushed the write_chunks_during_parse branch 2 times, most recently from d71fc3f to 4017fb5 Compare November 21, 2025 12:23

Copilot AI review requested due to automatic review settings November 21, 2025 12:23

Copilot AI reviewed Nov 21, 2025

Fix pytests and gtests for parsers

4017fb5

apaniukov reviewed Nov 24, 2025

apaniukov reviewed Nov 25, 2025

2 participants