Open
Labels
bug: Something isn't working
Description
Version
0.0.4
Operating System
Windows
Python Version
3.8
What happened?
Hi, I ran the following CLI command: synthetic-data-kit -c synthetic_data_kit_config.yaml create "data/output/week3_chunk0.txt" --num-pairs 25 --type qa
with this YAML config file:
# Master configuration file for Synthetic Data Kit
paths:
  input:
    pdf: "data/pdf"
    html: "data/html"
    youtube: "data/youtube"
    docx: "data/docx"
    ppt: "data/ppt"
    txt: "data/txt"
  output:
    parsed: "data/output"
    generated: "data/generated"
    cleaned: "data/cleaned"
    final: "data/final"
llm:
  provider: "api-endpoint"
api-endpoint:
  api_base: "http://localhost:11434/v1"
  model: "llama2:latest" # Replace with the exact model name (run `ollama list` to verify)
# vllm:
#   api_base: "http://localhost:11434/api"
#   port: 8000
#   model: "llama3-3b-instruct"
#   max_retries: 3
#   retry_delay: 1.0
ingest:
  default_format: "txt"
  youtube_captions: "auto"
generation:
  temperature: 0.7
  top_p: 0.95
  chunk_size: 1022
  overlap: 64
  max_tokens: 512
  num_pairs: 25
cleanup:
  threshold: 1.0
  batch_size: 4
  temperature: 0.3
format:
  default: "jsonl"
  include_metadata: true
  pretty_json: true
prompts:
  summary: |
    Summarize this document in 3-5 sentences, focusing on the main topic and key concepts.
  qa_generation: |
    Create 25 question-answer pairs from this text for LLM training.
    Rules:
    1. Questions must be about important facts in the text
    2. Answers must be directly supported by the text
    3. Return JSON format only.
    Text:
    {text}
  qa_rating: |
    Rate each of these question-answer pairs for quality and return JSON:
    [
      {"question": "same question", "answer": "same answer", "rating": n}
    ]
The config uses the API endpoint provider format, but the command always returns an error:
synthetic-data-kit -c synthetic_data_kit_config.yaml create "data/output/week3_chunk0.txt" --num-pairs 25 --type qa
Loading config from: /Users/lv/Documents/Opensources/LAZYAI/backend/.venv/lib/python3.10/site-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: synthetic_data_kit_config.yaml
Config has LLM provider set to: api-endpoint
get_llm_provider returning: api-endpoint
Using api-endpoint provider
Loading config from: synthetic_data_kit_config.yaml
Config has LLM provider set to: api-endpoint
API_ENDPOINT_KEY from environment: Found
Using API key: From env var
Using API base URL: http://localhost:11434/v1
Using api-endpoint provider
Loading config from: synthetic_data_kit_config.yaml
Config has LLM provider set to: api-endpoint
Error: 'NoneType' object cannot be interpreted as an integer
Relevant log output
Steps to reproduce
- Update the YAML config file to use the API endpoint provider
- Run the CLI command: synthetic-data-kit -c synthetic_data_kit_config.yaml create "data/output/week3_chunk0.txt" --num-pairs 25 --type qa
- Observe the error: 'NoneType' object cannot be interpreted as an integer
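A note for triage: the error text is exactly what Python raises when None is passed where an integer is required, for example range(None). In the config above, max_retries and retry_delay appear only inside the commented-out vllm block, and generation has no batch_size, so one possible cause is that the tool reads an integer setting that is absent and gets None. This is a guess and has not been checked against the synthetic-data-kit source. The sketch below reproduces the message and scans a config dict for missing integer settings. The key list EXPECTED_INT_KEYS and both helper functions are hypothetical and only illustrate the idea.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical (section, key) pairs that a tool might expect to be integers.
# These are illustrative assumptions, not confirmed synthetic-data-kit requirements.
EXPECTED_INT_KEYS: List[Tuple[str, str]] = [
    ("api-endpoint", "max_retries"),
    ("generation", "batch_size"),
    ("generation", "num_pairs"),
]


def find_missing_int_settings(config: Dict[str, Any]) -> List[str]:
    """Return dotted names of expected integer settings that are absent or not ints."""
    missing = []
    for section, key in EXPECTED_INT_KEYS:
        value = (config.get(section) or {}).get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            missing.append(f"{section}.{key}")
    return missing


def reproduce_error_message() -> str:
    """Return the message Python produces when None is used where an int is required."""
    settings: Dict[str, Any] = {}          # stands in for a config section lacking the key
    retries = settings.get("max_retries")  # dict.get returns None for a missing key
    try:
        for _ in range(retries):           # range(None) raises TypeError
            pass
    except TypeError as exc:
        return str(exc)
    return ""


# A config dict shaped like the one in this report (only the relevant parts).
sample_config = {
    "api-endpoint": {"api_base": "http://localhost:11434/v1", "model": "llama2:latest"},
    "generation": {"temperature": 0.7, "chunk_size": 1022, "num_pairs": 25},
}

print(reproduce_error_message())
print(find_missing_int_settings(sample_config))
```

If this guess is right, adding the missing integer keys to the custom YAML file (for example max_retries under api-endpoint) would be a quick way to test it.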