Skip to content

Conversation

@Retribution98
Copy link
Contributor

@Retribution98 Retribution98 commented Nov 13, 2025

Description

  • Using Tensor from openvino-node in Tokenizer encode and decode
  • Extended the Tensor API
  • Added constructors for Tensor
  • Moved Tokenizer to the separate TS file
  • Use BigInt for tokenId
  • Store openvino-node addon in the AddondData to manipulate with entity from the core part
  • Aligned tests with the new API and verify tokenizer and binding behavior.
  • Updated benchmark sample with using Tensor encode
  • Update documentation - https://retribution98.github.io/openvino.genai/

Ticket

CVS-174909

Checklist:

  • Tests have been updated or added to cover the new code.
  • This patch fully addresses the ticket.
  • I have made corresponding changes to the documentation.

Signed-off-by: Kirill Suvorov <[email protected]>
Copilot AI review requested due to automatic review settings November 13, 2025 20:47
@github-actions github-actions bot added category: LLM samples GenAI LLM samples category: JS API GenAI JS API labels Nov 13, 2025
@Retribution98
Copy link
Contributor Author

Retribution98 commented Nov 13, 2025

This PR is blocked by both PR#32802 and PR#32850 in openvino repository

@Retribution98 Retribution98 requested review from almilosz and removed request for Copilot November 13, 2025 20:48
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates the JavaScript tokenizer implementation to use Tensor objects from openvino-node and extends the tokenizer API with encode/decode methods. The changes enable direct manipulation of tokenized inputs using Tensor objects rather than raw arrays, improving type safety and integration with the OpenVINO ecosystem.

Key changes:

  • Added encode and decode methods to the Tokenizer that work with Tensor objects
  • Created a separate TypeScript interface file (tokenizer.ts) for better type definitions
  • Integrated openvino-node addon storage in AddonData for cross-module entity access
  • Updated tests to verify new tokenizer behavior and aligned imports across test files

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/js/tests/tokenizer.test.js Added comprehensive tests for new encode/decode methods and removed async markers from synchronous tests
src/js/tests/bindings.test.js Updated import to use named export for LLMPipeline
src/js/src/tokenizer.cpp Implemented encode/decode methods with support for multiple input types and chat template getters/setters
src/js/src/helper.cpp Added conversion helpers for int64_t, vectors of int64_t, Tensor, and TokenizedInputs
src/js/src/addon.cpp Added setOpenvinoAddon function to store openvino-node addon reference
src/js/lib/tokenizer.ts Created TypeScript interface definitions for Tokenizer and related types
src/js/lib/pipelines/textEmbeddingPipeline.ts Updated to use named import for TextEmbeddingPipeline
src/js/lib/pipelines/llmPipeline.ts Moved Tokenizer interface to separate file and updated imports
src/js/lib/index.ts Exported tokenizer types
src/js/lib/addon.ts Integrated openvino-node addon and updated exports
src/js/include/tokenizer.hpp Added method declarations for new tokenizer functionality
src/js/include/helper.hpp Added template specializations for new conversion types
src/js/include/addon.hpp Added openvino_addon reference to AddonData
samples/js/text_generation/benchmark_genai.js Added example usage of encode method to get prompt token size

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

exports.Set(class_name, prototype);
}

Napi::Value init_ov_addon(const Napi::CallbackInfo& info) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Napi::Value init_ov_addon(const Napi::CallbackInfo& info) {
void init_ov_addon(const Napi::CallbackInfo& info) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is used as a JavaScript function, so we usually return a Napi::Value even if nothing is returned. I understand your confusion because it is called like init_class, but they have different usages. To be clearer, I will rename this function to set_ov_node.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there are functions that return Napi::Value even if they are void, but I do not think we should repeat that pattern. Here it pop_back is used as js function and it's void

void ChatHistoryWrap::pop_back(const Napi::CallbackInfo& info) {

Comment on lines +214 to +220
it("getChatTemplate return string", () => {
const template = tokenizer.getChatTemplate();
assert.strictEqual(typeof template, "string");
});

it("setChatTemplate updates template", () => {
const originalTemplate = tokenizer.getChatTemplate();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
it("getChatTemplate return string", () => {
const template = tokenizer.getChatTemplate();
assert.strictEqual(typeof template, "string");
});
it("setChatTemplate updates template", () => {
const originalTemplate = tokenizer.getChatTemplate();
it("setChatTemplate updates template", () => {
const originalTemplate = tokenizer.getChatTemplate();
assert.strictEqual(typeof originalTemplate, "string");

Copy link
Contributor

@almilosz almilosz Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to have a separate test to check return type of getChatTemplate(). It'll be enough to get this error from e.g. setChatTemplate updates template.

@yatarkan
Copy link
Contributor

yatarkan commented Nov 14, 2025

Docs also can be updated with new JS tokenizer functionality - in Node.js Bindings page and in Tokenization Guide (add js code example)

@github-actions github-actions bot added the category: GH Pages Docs Github Pages documentation label Nov 14, 2025
Copilot AI review requested due to automatic review settings November 14, 2025 21:00
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 11 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings November 14, 2025 21:07
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings November 20, 2025 12:08
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


template <>
int64_t js_to_cpp<int64_t>(const Napi::Env& env, const Napi::Value& value) {
OPENVINO_ASSERT(value.IsNumber() || value.IsBigInt(), "Passed argument must be of type Number.");
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error message states 'must be of type Number' but the function accepts both Number and BigInt. Update to 'Passed argument must be of type Number or BigInt.'

Suggested change
OPENVINO_ASSERT(value.IsNumber() || value.IsBigInt(), "Passed argument must be of type Number.");
OPENVINO_ASSERT(value.IsNumber() || value.IsBigInt(), "Passed argument must be of type Number or BigInt.");

Copilot uses AI. Check for mistakes.
Comment on lines +108 to +114
applyChatTemplate(
chatHistory: Record<string, any>[] | ChatHistory,
addGenerationPrompt: boolean,
chatTemplate?: string,
tools?: Record<string, any>[],
extraContext?: Record<string, any>,
): string;
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace generic 'any' types with more specific types. Consider using 'ChatMessage[]' for chatHistory parameter, 'ToolDefinition[]' for tools, and 'ExtraContext' for extraContext, which are exported from './chatHistory.js'.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

category: GH Pages Docs Github Pages documentation category: JS API GenAI JS API category: LLM samples GenAI LLM samples

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants