[Bug]: Error when running the synthetic-data-kit CLI with an API endpoint

### Version

0.0.4

### Operating System

Windows

### Python Version

3.8

### What happened?

Hi, I ran the following CLI command: `synthetic-data-kit -c synthetic_data_kit_config.yaml create "data/output/week3_chunk0.txt" --num-pairs 25 --type qa`
with this YAML config file:
```
# Master configuration file for Synthetic Data Kit
paths:
  input:
    pdf: "data/pdf"
    html: "data/html"
    youtube: "data/youtube"
    docx: "data/docx"
    ppt: "data/ppt"
    txt: "data/txt"
  output:
    parsed: "data/output"
    generated: "data/generated"
    cleaned: "data/cleaned"
    final: "data/final"
llm:
  provider: "api-endpoint"

api-endpoint:
  api_base: "http://localhost:11434/v1"
  model: "llama2:latest"   # Replace with the exact model name (run `ollama list` to verify)


# vllm:
#   api_base: "http://localhost:11434/api"
#   port: 8000
#   model: "llama3-3b-instruct"
#   max_retries: 3
#   retry_delay: 1.0

ingest:
  default_format: "txt"
  youtube_captions: "auto"

generation:
  temperature: 0.7
  top_p: 0.95
  chunk_size: 1022
  overlap: 64
  max_tokens: 512
  num_pairs: 25

cleanup:
  threshold: 1.0
  batch_size: 4
  temperature: 0.3

format:
  default: "jsonl"
  include_metadata: true
  pretty_json: true

prompts:
  summary: |
    Summarize this document in 3-5 sentences, focusing on the main topic and key concepts.

  qa_generation: |
    Create 25 question-answer pairs from this text for LLM training.

    Rules:
    1. Questions must be about important facts in the text
    2. Answers must be directly supported by the text
    3. Return JSON format only.

    Text:
    {text}

  qa_rating: |
    Rate each of these question-answer pairs for quality and return JSON:
    [
      {"question": "same question", "answer": "same answer", "rating": n}
    ]
```

The config uses the API endpoint provider, but the command always returns this error:
```

synthetic-data-kit -c synthetic_data_kit_config.yaml create "data/output/week3_chunk0.txt" --num-pairs 25 --type qa
Loading config from: /Users/lv/Documents/Opensources/LAZYAI/backend/.venv/lib/python3.10/site-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: synthetic_data_kit_config.yaml
Config has LLM provider set to: api-endpoint
get_llm_provider returning: api-endpoint
Using api-endpoint provider
Loading config from: synthetic_data_kit_config.yaml
Config has LLM provider set to: api-endpoint
API_ENDPOINT_KEY from environment: Found
Using API key: From env var
Using API base URL: http://localhost:11434/v1
Using api-endpoint provider
Loading config from: synthetic_data_kit_config.yaml
Config has LLM provider set to: api-endpoint
Error: 'NoneType' object cannot be interpreted as an integer
```
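
For reference, this is the message Python raises when `None` is passed where an integer is required, for example `range(None)`. The sketch below is only a guess at the cause, not a confirmed diagnosis: it assumes the tool reads an integer setting such as `max_retries` from the `api-endpoint` section, which my config does not define (those keys only appear in the commented-out `vllm` block). The helper names `retry_count` and `reproduce` are hypothetical and are not taken from the synthetic-data-kit source.

```python
# Hypothetical illustration; none of this is synthetic-data-kit code.
# The dict mirrors the api-endpoint section of the YAML config above.
config = {
    "api-endpoint": {
        "api_base": "http://localhost:11434/v1",
        "model": "llama2:latest",
        # Assumption: an integer key such as max_retries is expected here.
    }
}


def retry_count(cfg):
    # dict.get returns None when the key is absent from the section.
    return cfg["api-endpoint"].get("max_retries")


def reproduce(cfg):
    """Return the TypeError message if the retry loop fails, else None."""
    try:
        for _ in range(retry_count(cfg)):
            pass
    except TypeError as exc:
        return str(exc)
    return None


message = reproduce(config)
print(message)
```

If this guess is right, adding `max_retries` and `retry_delay` under `api-endpoint` in the YAML file might work around the error, but I have not verified that.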

### Relevant log output

```shell

```

### Steps to reproduce

1. Update the YAML config file to use the API endpoint provider.
2. Run the CLI command `synthetic-data-kit -c synthetic_data_kit_config.yaml create "data/output/week3_chunk0.txt" --num-pairs 25 --type qa`
3. The command fails with: `Error: 'NoneType' object cannot be interpreted as an integer`
