4 changes: 4 additions & 0 deletions samples/js/text_generation/benchmark_genai.js
@@ -90,6 +90,10 @@ async function main() {
pipe = await LLMPipeline(modelsPath, device, { schedulerConfig: schedulerConfig });
}

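// Tokenize the prompt once so its token count can be reported alongside the benchmark results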
const inputData = await pipe.getTokenizer().encode(prompt);
const promptTokenSize = inputData.input_ids.getShape()[1];
console.log(`Prompt token size: ${promptTokenSize}`);

for (let i = 0; i < numWarmup; i++) {
await pipe.generate(prompt, config);
}
6 changes: 6 additions & 0 deletions site/docs/bindings/node-js.md
@@ -24,6 +24,12 @@ Node.js bindings currently support:
- Structured output
- ReAct agent support
- `TextEmbeddingPipeline`: Generate text embeddings for semantic search and RAG applications
- `Tokenizer`: Fast tokenization/detokenization and chat prompt formatting; see the sketch after this list
  - Encode strings into token ID and attention mask tensors
  - Decode token sequences
  - Apply chat templates
  - Access special tokens (BOS/EOS/PAD)
  - Support for paired inputs
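
A minimal usage sketch of the operations above, assuming `modelsPath` points to a local model directory; the exact `decode()` argument type and the `supportsPairedInput` method name are assumptions:

```js
import { Tokenizer } from 'openvino-genai-node';

const tokenizer = new Tokenizer(modelsPath);

// Encode a string into token ID and attention mask tensors
const tokens = tokenizer.encode("The Sun is yellow because");
console.log(tokens.input_ids.getShape()); // e.g. [1, 6]

// Decode the token sequence back into text (argument type assumed)
console.log(tokenizer.decode(tokens.input_ids));

// Special tokens
console.log(tokenizer.getBosToken(), tokenizer.getEosTokenId());

// Paired-input support flag (method name assumed from the C++ binding)
console.log(tokenizer.supportsPairedInput());
```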

## Installation

53 changes: 53 additions & 0 deletions site/docs/guides/tokenization.mdx
@@ -34,6 +34,18 @@ It can be initialized from the path, in-memory IR representation or obtained from
auto tokenizer = pipe.get_tokenizer();
```
</TabItemCpp>
<TabItemJS>
```js
import { LLMPipeline, Tokenizer } from 'openvino-genai-node';

// Initialize from the model directory path
const tokenizer = new Tokenizer(modelsPath);

// Or obtain the tokenizer instance from an LLMPipeline
const pipe = await LLMPipeline(modelsPath, "CPU");
const pipelineTokenizer = pipe.getTokenizer();
```
</TabItemJS>
</LanguageTabs>

`Tokenizer` has `encode()` and `decode()` methods which support the following arguments: `add_special_tokens`, `skip_special_tokens`, `pad_to_max_length`, `max_length`.
@@ -51,6 +63,11 @@ It can be initialized from the path, in-memory IR representation or obtained from
auto tokens = tokenizer.encode("The Sun is yellow because", ov::genai::add_special_tokens(false));
```
</TabItemCpp>
<TabItemJS>
```js
const tokens = tokenizer.encode("The Sun is yellow because", { add_special_tokens: false });
```
</TabItemJS>
</LanguageTabs>

The `encode()` method returns a [`TokenizedInputs`](https://docs.openvino.ai/2025/api/genai_api/_autosummary/openvino_genai.TokenizedInputs.html) object containing `input_ids` and `attention_mask`, both stored as `ov::Tensor`.
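
For a quick check of the two tensors, a minimal sketch (exact shapes depend on the tokenizer):

```js
const { input_ids, attention_mask } = tokenizer.encode("The Sun is yellow because");
console.log(input_ids.getShape());      // e.g. [1, 6]
console.log(attention_mask.getShape()); // same shape as input_ids
```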
@@ -121,4 +138,40 @@ If `pad_to_max_length` is set to true, then instead of padding to the longest sequence
// out_shape: [1, 128]
```
</TabItemCpp>
<TabItemJS>
```js
import { Tokenizer } from 'openvino-genai-node';

const tokenizer = new Tokenizer(modelsPath);
const prompts = ["The Sun is yellow because", "The"];
let tokens;

// Since the prompts are shorter than the maximum length (which is taken from the IR), padding will not affect the shape.
// The resulting shape is defined by the length of the longest tokens sequence.
// Equivalent of HuggingFace hf_tokenizer.encode(prompt, padding="longest", truncation=True)
tokens = tokenizer.encode(prompts);
// or, equivalently
tokens = tokenizer.encode(prompts, { pad_to_max_length: false });
console.log(tokens.input_ids.getShape());
// out_shape: [2, 6]

// The resulting tokens tensor will be padded to 1024; sequences which exceed this length will be truncated.
// Equivalent of HuggingFace hf_tokenizer.encode(prompt, padding="max_length", truncation=True, max_length=1024)
tokens = tokenizer.encode([
  "The Sun is yellow because",
  "The",
  "The longest string ever".repeat(2000),
], {
  pad_to_max_length: true,
  max_length: 1024,
});
console.log(tokens.input_ids.getShape());
// out_shape: [3, 1024]

// For single string prompts, truncation and padding are also applied.
tokens = tokenizer.encode("The Sun is yellow because", { pad_to_max_length: true, max_length: 128 });
console.log(tokens.input_ids.getShape());
// out_shape: [1, 128]
```
</TabItemJS>
</LanguageTabs>
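
The lint allow-list added in this PR (below) also names a `padding_side` option. If `encode()` accepts it (an assumption here, mirroring the HuggingFace parameter of the same name), left-padding would look like:

```js
// Hypothetical sketch: the padding_side option name appears in the repo's lint
// allow-list; whether encode() accepts it is an assumption.
const padded = tokenizer.encode(prompts, {
  pad_to_max_length: true,
  max_length: 128,
  padding_side: "left",
});
console.log(padded.input_ids.getShape()); // e.g. [2, 128]
```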
7 changes: 7 additions & 0 deletions src/js/eslint.config.cjs
@@ -53,6 +53,13 @@ module.exports = defineConfig([
"json_schema",
"structured_output_config",
"structural_tags_config",
"skip_special_tokens",
"add_special_tokens",
"pad_to_max_length",
"max_length",
"padding_side",
"add_second_input",
"number_of_inputs",
],
},
],
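
These snake_case names mirror the option names used across the GenAI C++ and Python APIs, so they are exempted from the JS naming lint. A sketch of how such an exemption typically looks with ESLint's `camelcase` rule (the exact structure of `eslint.config.cjs` is an assumption):

```js
// eslint.config.cjs (sketch, not the literal file contents)
const { defineConfig } = require("eslint/config");

module.exports = defineConfig([
  {
    rules: {
      camelcase: ["error", {
        allow: ["add_special_tokens", "pad_to_max_length", "max_length", "padding_side"],
      }],
    },
  },
]);
```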
1 change: 1 addition & 0 deletions src/js/include/addon.hpp
@@ -12,6 +12,7 @@ struct AddonData {
Napi::FunctionReference tokenizer;
Napi::FunctionReference perf_metrics;
Napi::FunctionReference chat_history;
Napi::ObjectReference openvino_addon;
};

void init_class(Napi::Env env,
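The new `openvino_addon` reference, read together with `addon.setOpenvinoAddon(ovAddon)` and the `get_prototype_from_ov_addon` helper further down, suggests the GenAI addon borrows constructor prototypes from `openvino-node` so that tensors returned by `encode()` are regular `openvino-node` Tensor instances; this reading is an inference from the diff rather than documented behavior.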
14 changes: 14 additions & 0 deletions src/js/include/helper.hpp
@@ -37,9 +37,13 @@ ov::AnyMap js_to_cpp<ov::AnyMap>(const Napi::Env& env, const Napi::Value& value);
/** @brief A template specialization for TargetType std::string */
template <>
std::string js_to_cpp<std::string>(const Napi::Env& env, const Napi::Value& value);
template <>
int64_t js_to_cpp<int64_t>(const Napi::Env& env, const Napi::Value& value);
/** @brief A template specialization for TargetType std::vector<std::string> */
template <>
std::vector<std::string> js_to_cpp<std::vector<std::string>>(const Napi::Env& env, const Napi::Value& value);
template <>
std::vector<int64_t> js_to_cpp<std::vector<int64_t>>(const Napi::Env& env, const Napi::Value& value);
/** @brief A template specialization for TargetType GenerateInputs */
template <>
GenerateInputs js_to_cpp<GenerateInputs>(const Napi::Env& env, const Napi::Value& value);
@@ -58,6 +62,8 @@ ov::genai::StructuredOutputConfig::Tag js_to_cpp<ov::genai::StructuredOutputConfig::Tag>(const Napi::Env& env, const Napi::Value& value);
/** @brief A template specialization for TargetType ov::genai::StructuredOutputConfig::StructuralTag */
template <>
ov::genai::StructuredOutputConfig::StructuralTag js_to_cpp<ov::genai::StructuredOutputConfig::StructuralTag>(const Napi::Env& env, const Napi::Value& value);
template <>
ov::Tensor js_to_cpp<ov::Tensor>(const Napi::Env& env, const Napi::Value& value);
/**
* @brief Unwraps a C++ object from a JavaScript wrapper.
* @tparam TargetType The C++ class type to extract.
@@ -110,6 +116,12 @@ Napi::Value cpp_to_js<std::vector<size_t>, Napi::Value>(const Napi::Env& env, co

template <>
Napi::Value cpp_to_js<ov::genai::JsonContainer, Napi::Value>(const Napi::Env& env, const ov::genai::JsonContainer& json_container);

template <>
Napi::Value cpp_to_js<ov::Tensor, Napi::Value>(const Napi::Env& env, const ov::Tensor& tensor);

template <>
Napi::Value cpp_to_js<ov::genai::TokenizedInputs, Napi::Value>(const Napi::Env& env, const ov::genai::TokenizedInputs& tokenized_inputs);
/**
* @brief Template function to convert C++ map into Javascript Object. Map key must be std::string.
* @tparam MapElementType C++ data type of map elements.
@@ -130,3 +142,5 @@ bool is_chat_history(const Napi::Env& env, const Napi::Value& value);
std::string json_stringify(const Napi::Env& env, const Napi::Value& value);

Napi::Value json_parse(const Napi::Env& env, const std::string& value);

Napi::Function get_prototype_from_ov_addon(const Napi::Env& env, const std::string& ctor_name);
6 changes: 6 additions & 0 deletions src/js/include/tokenizer.hpp
@@ -15,6 +15,12 @@ class TokenizerWrapper : public Napi::ObjectWrap<TokenizerWrapper> {
Napi::Value get_eos_token_id(const Napi::CallbackInfo& info);
Napi::Value get_pad_token(const Napi::CallbackInfo& info);
Napi::Value get_pad_token_id(const Napi::CallbackInfo& info);
Napi::Value get_chat_template(const Napi::CallbackInfo& info);
Napi::Value get_original_chat_template(const Napi::CallbackInfo& info);
Napi::Value set_chat_template(const Napi::CallbackInfo& info);
Napi::Value supports_paired_input(const Napi::CallbackInfo& info);
Napi::Value encode(const Napi::CallbackInfo& info);
Napi::Value decode(const Napi::CallbackInfo& info);
private:
ov::genai::Tokenizer _tokenizer;
};
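
On the JavaScript side these are exposed as camelCase methods. A short chat-formatting sketch; the `applyChatTemplate` signature follows the interface previously declared in `llmPipeline.ts`, while the `getChatTemplate` name is assumed from the C++ declarations above:

```js
const tokenizer = pipe.getTokenizer();

// Format a chat history into a single prompt string
const prompt = tokenizer.applyChatTemplate(
  [{ role: "user", content: "Why is the Sun yellow?" }],
  true, // addGenerationPrompt
);

// Inspect the active template (method name assumed)
console.log(tokenizer.getChatTemplate());
```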
9 changes: 7 additions & 2 deletions src/js/lib/addon.ts
@@ -2,6 +2,8 @@ import { createRequire } from "module";
import { platform } from "node:os";
import { join, dirname, resolve } from "node:path";
import type { ChatHistory as IChatHistory } from "./chatHistory.js";
import type { Tokenizer as ITokenizer } from "./tokenizer.js";
import { addon as ovAddon } from "openvino-node";

export type EmbeddingResult = Float32Array | Int8Array | Uint8Array;
export type EmbeddingResults = Float32Array[] | Int8Array[] | Uint8Array[];
@@ -60,6 +62,8 @@ interface OpenVINOGenAIAddon {
TextEmbeddingPipeline: TextEmbeddingPipelineWrapper;
LLMPipeline: any;
ChatHistory: IChatHistory;
Tokenizer: ITokenizer;
setOpenvinoAddon: (ovAddon: any) => void;
}

// We need to use delayed import to get an updated Path if required
@@ -78,7 +82,8 @@ function getGenAIAddon(): OpenVINOGenAIAddon {
}

const addon = getGenAIAddon();
addon.setOpenvinoAddon(ovAddon);

export const { ChatHistory } = addon;
export const { TextEmbeddingPipeline, LLMPipeline, ChatHistory, Tokenizer } = addon;
export type ChatHistory = IChatHistory;
export default addon;
export type Tokenizer = ITokenizer;
2 changes: 2 additions & 0 deletions src/js/lib/index.ts
@@ -40,3 +40,5 @@ export const { LLMPipeline, TextEmbeddingPipeline } = PipelineFactory;
export { DecodedResults } from "./pipelines/llmPipeline.js";
export * from "./utils.js";
export * from "./addon.js";
export type { TokenizedInputs, EncodeOptions, DecodeOptions } from "./tokenizer.js";
export type { ChatMessage, ExtraContext, ToolDefinition } from "./chatHistory.js";
22 changes: 3 additions & 19 deletions src/js/lib/pipelines/llmPipeline.ts
@@ -1,30 +1,14 @@
import util from "node:util";
import addon, { ChatHistory } from "../addon.js";
import { ChatHistory, LLMPipeline as LLMPipelineWrap } from "../addon.js";
import { GenerationConfig, StreamingStatus, LLMPipelineProperties } from "../utils.js";
import { Tokenizer } from "../tokenizer.js";

export type ResolveFunction = (arg: { value: string; done: boolean }) => void;
export type Options = {
disableStreamer?: boolean;
max_new_tokens?: number;
};

interface Tokenizer {
/** Applies a chat template to format chat history into a prompt string. */
applyChatTemplate(
chatHistory: Record<string, any>[] | ChatHistory,
addGenerationPrompt: boolean,
chatTemplate?: string,
tools?: Record<string, any>[],
extraContext?: Record<string, any>,
): string;
getBosToken(): string;
getBosTokenId(): number;
getEosToken(): string;
getEosTokenId(): number;
getPadToken(): string;
getPadTokenId(): number;
}

/** Structure with raw performance metrics for each generation before any statistics are calculated. */
export type RawMetrics = {
/** Durations for each generate call in milliseconds. */
@@ -167,7 +151,7 @@ export class LLMPipeline {
async init() {
if (this.isInitialized) throw new Error("LLMPipeline is already initialized");

this.pipeline = new addon.LLMPipeline();
this.pipeline = new LLMPipelineWrap();

const initPromise = util.promisify(this.pipeline.init.bind(this.pipeline));
const result = await initPromise(this.modelPath, this.device, this.properties);
5 changes: 3 additions & 2 deletions src/js/lib/pipelines/textEmbeddingPipeline.ts
@@ -1,9 +1,10 @@
import util from "node:util";
import addon, {
import {
TextEmbeddingPipelineWrapper,
EmbeddingResult,
EmbeddingResults,
TextEmbeddingConfig,
TextEmbeddingPipeline as TextEmbeddingPipelineWrap,
} from "../addon.js";

export class TextEmbeddingPipeline {
@@ -29,7 +30,7 @@ export class TextEmbeddingPipeline {
async init() {
if (this.pipeline) throw new Error("TextEmbeddingPipeline is already initialized");

this.pipeline = new addon.TextEmbeddingPipeline();
this.pipeline = new TextEmbeddingPipelineWrap();

const initPromise = util.promisify(this.pipeline.init.bind(this.pipeline));
await initPromise(this.modelPath, this.device, this.config, this.ovProperties);