Labels: Lora/P-tuning (Parameter-Efficient Fine-Tuning (PEFT) like LoRA/P-tuning in TRTLLM: adapter use & perf.), Triton backend <NV> (Related to NVIDIA Triton Inference Server backend), bug (Something isn't working)
Description
System Info
NVIDIA Driver: 580.105.08
CUDA Version: 13.0
GPU: RTX 3090
Triton Server Image: nvcr.io/nvidia/tritonserver:25.05-trtllm-python-py3
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
- Build the TensorRT-LLM engine using the following command:
trtllm-build --checkpoint_dir $TLLM_MODEL_DIR \
--output_dir $TLLM_ENGINE_DIR \
--moe_plugin disable \
--max_beam_width ${MAX_BEAM_WIDTH} \
--max_batch_size 64 \
--max_input_len 1 \
--max_seq_len 300 \
--max_encoder_input_len 1000 \
--gemm_plugin ${INFERENCE_PRECISION} \
--bert_attention_plugin ${INFERENCE_PRECISION} \
--gpt_attention_plugin ${INFERENCE_PRECISION} \
--lora_plugin float16 \
--max_lora_rank 64 \
--lora_target_modules attn_q attn_k attn_v attn_dense mlp_h_to_4h mlp_4h_to_h cross_attn_q cross_attn_k cross_attn_v cross_attn_dense
- Deploy the generated engine using Triton Server with the tensorrtllm backend and inflight_fused_batching (a deployment sketch is shown after this list).
- Send a request to the deployed model (see the sample request below).
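For reference, a minimal sketch of the deployment step. It assumes the standard model repository layout and the fill_template.py helper from the tensorrtllm_backend repository; the parameter names, paths, and the encoder engine variable below are placeholders taken from those examples, not the exact values used in this report, and may differ between releases.

# Assumption: triton_model_repo was copied from
# tensorrtllm_backend/all_models/inflight_batcher_llm and its other
# config.pbtxt templates are already filled in. Parameter names follow
# the tensorrtllm_backend examples; check your backend version.
python3 tools/fill_template.py -i triton_model_repo/tensorrt_llm/config.pbtxt \
    "triton_backend:tensorrtllm,triton_max_batch_size:64,decoupled_mode:False,engine_dir:${TLLM_ENGINE_DIR},encoder_engine_dir:${TLLM_ENCODER_ENGINE_DIR},batching_strategy:inflight_fused_batching,max_beam_width:${MAX_BEAM_WIDTH}"

# Start Triton Server inside the 25.05-trtllm container.
tritonserver --model-repository=triton_model_repo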
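And a sample request sketch, assuming the ensemble model from the tensorrtllm_backend examples is exposed over HTTP on port 8000; the model name, the prompt, and the lora_task_id value are illustrative only.

# Assumption: ensemble model and default HTTP port from the
# tensorrtllm_backend examples; adjust the model name and fields as needed.
curl -X POST localhost:8000/v2/models/ensemble/generate -d '{
  "text_input": "translate English to German: How are you?",
  "max_tokens": 64,
  "lora_task_id": 0
}'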
Expected behavior
The TensorRT-LLM engine should support LoRA for encoder-decoder models, and the request should complete successfully.
Actual behavior
Triton Server reports the following error:
[TensorRT-LLM][ERROR] Encountered an error in forwardAsync function: Input tensor 'host_encoder_input_lengths' not found; expected shape: (-1) (/workspace/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:524)
1 0x7f467669df2b tensorrt_llm::runtime::TllmRuntime::setInputTensorsImpl(int, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr<tensorrt_llm::runtime::ITensor>, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<tensorrt_llm::runtime::ITensor> > > > const&, bool) + 827
2 0x7f46766a0ed6 tensorrt_llm::runtime::TllmRuntime::setInputTensors(int, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr<tensorrt_llm::runtime::ITensor>, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<tensorrt_llm::runtime::ITensor> > > > const&) + 70
Additional notes
- The same configuration works for decoder-only models with LoRA.
- The issue specifically affects encoder-decoder architectures when LoRA is enabled.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.