Update on the development branch #1316
kaiyux announced in Announcements
Hi,
The TensorRT-LLM team is pleased to announce that we are pushing an update to the development branch (and the Triton backend) on March 19, 2024.
This update includes:
- Support `GptSession` without OpenMPI (Run GptSession without openmpi? #1220)
- Added Python bindings for the `executor` API, see the documentation and examples in `examples/bindings`
- Updated the GPT example, see `examples/gpt/README.md` for the latest commands
- Updated the Qwen example, see `examples/qwen/README.md` for the latest commands
- Moved the prompt tuning options to the `trtllm-build` command, to generalize the feature better to more models; use `trtllm-build --max_prompt_embedding_table_size` instead
- Changed the `trtllm-build --world_size` flag to the `--auto_parallel` flag; the option is used for the auto parallel planner only
- `AsyncLLMEngine` is removed; the `tensorrt_llm.GenerationExecutor` class is refactored to work both when launched explicitly with `mpirun` at the application level and when given an MPI communicator created by `mpi4py`
- `examples/server` is removed, see `examples/app` instead
- Fixed `SamplingConfig` tensors in `ModelRunnerCpp` (ModelRunnerCpp does not transfer SamplingConfig Tensor fields correctly #1183)
- Fixed `examples/run.py` loading only one line from `--input_file`
- Updated the C++ benchmarks, see `benchmarks/cpp/README.md`
- Updated the base Docker image for TensorRT-LLM to `nvcr.io/nvidia/pytorch:24.02-py3`
- Updated the base Docker image for the Triton backend to `nvcr.io/nvidia/tritonserver:24.02-py3`
- Added documentation for the `executor` API, see `docs/source/executor.md`

Thanks,
The TensorRT-LLM Engineering Team
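
To illustrate the two `trtllm-build` flag changes mentioned above, here is a minimal sketch; the checkpoint and output directories are placeholders, not paths from the announcement:

```shell
# Prompt tuning is now configured at build time via trtllm-build:
trtllm-build --checkpoint_dir ./ckpt \
             --output_dir ./engine \
             --max_prompt_embedding_table_size 1024

# --world_size is replaced by --auto_parallel, which only feeds the
# auto parallel planner:
trtllm-build --checkpoint_dir ./ckpt \
             --output_dir ./engine \
             --auto_parallel 2
```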
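
The two launch modes now supported by `tensorrt_llm.GenerationExecutor` can be sketched as follows; `my_app.py` stands for a hypothetical application script and is not part of the announcement:

```shell
# Mode 1: launch the whole application under MPI explicitly.
mpirun -n 2 python3 my_app.py

# Mode 2: run a plain Python process; inside my_app.py, hand
# GenerationExecutor an MPI communicator created with mpi4py, e.g.
#   from mpi4py import MPI
#   comm = MPI.COMM_WORLD
python3 my_app.py
```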