-
Notifications
You must be signed in to change notification settings - Fork 302
[JS] Update tokenizer methods #3012
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[JS] Update tokenizer methods #3012
Conversation
Signed-off-by: Kirill Suvorov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates the JavaScript tokenizer implementation to use Tensor objects from openvino-node and extends the tokenizer API with encode/decode methods. The changes enable direct manipulation of tokenized inputs using Tensor objects rather than raw arrays, improving type safety and integration with the OpenVINO ecosystem.
Key changes:
- Added encode and decode methods to the Tokenizer that work with Tensor objects
- Created a separate TypeScript interface file (tokenizer.ts) for better type definitions
- Integrated openvino-node addon storage in AddonData for cross-module entity access
- Updated tests to verify new tokenizer behavior and aligned imports across test files
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/js/tests/tokenizer.test.js | Added comprehensive tests for new encode/decode methods and removed async markers from synchronous tests |
| src/js/tests/bindings.test.js | Updated import to use named export for LLMPipeline |
| src/js/src/tokenizer.cpp | Implemented encode/decode methods with support for multiple input types and chat template getters/setters |
| src/js/src/helper.cpp | Added conversion helpers for int64_t, vectors of int64_t, Tensor, and TokenizedInputs |
| src/js/src/addon.cpp | Added setOpenvinoAddon function to store openvino-node addon reference |
| src/js/lib/tokenizer.ts | Created TypeScript interface definitions for Tokenizer and related types |
| src/js/lib/pipelines/textEmbeddingPipeline.ts | Updated to use named import for TextEmbeddingPipeline |
| src/js/lib/pipelines/llmPipeline.ts | Moved Tokenizer interface to separate file and updated imports |
| src/js/lib/index.ts | Exported tokenizer types |
| src/js/lib/addon.ts | Integrated openvino-node addon and updated exports |
| src/js/include/tokenizer.hpp | Added method declarations for new tokenizer functionality |
| src/js/include/helper.hpp | Added template specializations for new conversion types |
| src/js/include/addon.hpp | Added openvino_addon reference to AddonData |
| samples/js/text_generation/benchmark_genai.js | Added example usage of encode method to get prompt token size |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/js/src/addon.cpp
Outdated
| exports.Set(class_name, prototype); | ||
| } | ||
|
|
||
| Napi::Value init_ov_addon(const Napi::CallbackInfo& info) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Napi::Value init_ov_addon(const Napi::CallbackInfo& info) { | |
| void init_ov_addon(const Napi::CallbackInfo& info) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is used as a JavaScript function, so we usually return a Napi::Value even if nothing is returned. I understand your confusion because it is called like init_class, but they have different usages. To be clearer, I will rename this function to set_ov_node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there are functions that return Napi::Value even if they are void, but I do not think we should repeat that pattern. Here it pop_back is used as js function and it's void
| void ChatHistoryWrap::pop_back(const Napi::CallbackInfo& info) { |
| it("getChatTemplate return string", () => { | ||
| const template = tokenizer.getChatTemplate(); | ||
| assert.strictEqual(typeof template, "string"); | ||
| }); | ||
|
|
||
| it("setChatTemplate updates template", () => { | ||
| const originalTemplate = tokenizer.getChatTemplate(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| it("getChatTemplate return string", () => { | |
| const template = tokenizer.getChatTemplate(); | |
| assert.strictEqual(typeof template, "string"); | |
| }); | |
| it("setChatTemplate updates template", () => { | |
| const originalTemplate = tokenizer.getChatTemplate(); | |
| it("setChatTemplate updates template", () => { | |
| const originalTemplate = tokenizer.getChatTemplate(); | |
| assert.strictEqual(typeof originalTemplate, "string"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need to have a separate test to check return type of getChatTemplate(). It'll be enough to get this error from e.g. setChatTemplate updates template.
|
Docs also can be updated with new JS tokenizer functionality - in Node.js Bindings page and in Tokenization Guide (add js code example) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 11 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| template <> | ||
| int64_t js_to_cpp<int64_t>(const Napi::Env& env, const Napi::Value& value) { | ||
| OPENVINO_ASSERT(value.IsNumber() || value.IsBigInt(), "Passed argument must be of type Number."); |
Copilot
AI
Nov 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error message states 'must be of type Number' but the function accepts both Number and BigInt. Update to 'Passed argument must be of type Number or BigInt.'
| OPENVINO_ASSERT(value.IsNumber() || value.IsBigInt(), "Passed argument must be of type Number."); | |
| OPENVINO_ASSERT(value.IsNumber() || value.IsBigInt(), "Passed argument must be of type Number or BigInt."); |
| applyChatTemplate( | ||
| chatHistory: Record<string, any>[] | ChatHistory, | ||
| addGenerationPrompt: boolean, | ||
| chatTemplate?: string, | ||
| tools?: Record<string, any>[], | ||
| extraContext?: Record<string, any>, | ||
| ): string; |
Copilot
AI
Nov 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace generic 'any' types with more specific types. Consider using 'ChatMessage[]' for chatHistory parameter, 'ToolDefinition[]' for tools, and 'ExtraContext' for extraContext, which are exported from './chatHistory.js'.
Description
Ticket
CVS-174909
Checklist: