Closed
Labels
RFC (Request For Comments)
Description
Motivation.
Disaggregated Encoder
A disaggregated encoder runs the vision-encoder stage of a multimodal LLM in a process separate from the prefill/decode (PD) stage. Deploying the two stages in independent vLLM instances brings three practical benefits:
- Independent, fine-grained scaling
- Lower time-to-first-token (TTFT)
- Cross-process reuse and caching of encoder outputs
Proposed Change.
Encoder-side (producer):
- Within execute_model, when get_ec_transfer().is_producer is True, the runner enters the context manager maybe_get_ec_connector_output(..., encoder_cache=self.encoder_cache) in a with statement before running the multimodal encoder.
- The encode pass computes embeddings and writes them into encoder_cache[mm_hash].
- Immediately after finishing the encode for a given mm_hash, the runner calls maybe_save_ec_to_connector(self.encoder_cache, mm_hash), which invokes ECConnectorBase.save_caches(encoder_cache=..., mm_hash=...).
- On context exit, wait_for_save() is invoked (if enabled) to ensure the persisted encoder cache (EC) is durable and visible to consumers; get_finished(...) is then queried to report completion status back to the scheduler.
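The producer-side steps above can be sketched roughly as follows. This is a minimal toy model, not vLLM code: ToyECConnector, ec_connector_output, toy_encode, run_producer_step, the dict-based store, and the "finished_saving" key are all hypothetical stand-ins. Only the method names save_caches, wait_for_save, and get_finished are taken from the description, and their real signatures may differ.

```python
from contextlib import contextmanager


class ToyECConnector:
    """Hypothetical producer-side connector; NOT the real ECConnectorBase."""

    is_producer = True

    def __init__(self, store):
        self.store = store      # stand-in for an external store (assumption)
        self.pending = []       # mm_hashes written but not yet confirmed
        self.finished = set()   # mm_hashes confirmed as saved

    def save_caches(self, encoder_cache, mm_hash):
        # Copy one encoder output into the external store.
        self.store[mm_hash] = list(encoder_cache[mm_hash])
        self.pending.append(mm_hash)

    def wait_for_save(self):
        # A real connector might block on async writes; this toy is synchronous.
        self.finished.update(self.pending)
        self.pending.clear()

    def get_finished(self):
        # Hand the finished set to the caller and reset it.
        done, self.finished = self.finished, set()
        return done


@contextmanager
def ec_connector_output(connector, output):
    """Toy analogue of the context entered before the encoder runs."""
    try:
        yield output
    finally:
        # On context exit: make saves visible, then report completion.
        connector.wait_for_save()
        output["finished_saving"] = connector.get_finished()


def toy_encode(item):
    # Placeholder "encoder": maps raw bytes to floats in [0, 1].
    return [b / 255.0 for b in item]


def run_producer_step(connector, encoder_cache, mm_inputs):
    output = {}
    with ec_connector_output(connector, output):
        for mm_hash, item in mm_inputs.items():
            encoder_cache[mm_hash] = toy_encode(item)
            # Save immediately after this mm_hash has been encoded.
            connector.save_caches(encoder_cache=encoder_cache, mm_hash=mm_hash)
    return output
```

Saving per mm_hash rather than once per batch means each item is handed to the store as soon as its encode finishes, while the single wait_for_save() on exit keeps the completion report in one place.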
PD-side (consumer):
- For requests scheduled on PD, the scheduler supplies ec_connector_metadata that lists the mm_hash items needing loads.
- The runner binds this metadata and calls start_load_caches(encoder_cache=self.encoder_cache) prior to _gather_mm_embeddings, allowing the connector to populate encoder_cache[mm_hash] from the external store.
- _gather_mm_embeddings then reads the loaded tensors from encoder_cache and returns them as multimodal embeddings for the subsequent decoder input embedding construction.
- After the forward step, the runner clears the bound metadata; any completion info furnished by the connector is recorded in ECConnectorOutput so the scheduler can free resources when it is safe to do so.
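The consumer-side flow can be illustrated in the same spirit. Again this is only a sketch under assumptions: ToyECConsumer, bind_connector_metadata, clear_connector_metadata, gather_mm_embeddings, run_consumer_step, and the "finished_loading" key are hypothetical names chosen for illustration, and the metadata is modelled as a plain list of mm_hash strings. Only start_load_caches is named in the description above.

```python
class ToyECConsumer:
    """Hypothetical consumer-side connector; NOT the real vLLM class."""

    def __init__(self, store):
        self.store = store      # stand-in for the external store (assumption)
        self.metadata = None    # mm_hashes the scheduler asked us to load
        self.loaded = set()     # mm_hashes loaded during this step

    def bind_connector_metadata(self, mm_hashes):
        self.metadata = list(mm_hashes)

    def start_load_caches(self, encoder_cache):
        # Populate encoder_cache[mm_hash] for every item listed in the metadata.
        for mm_hash in self.metadata or []:
            if mm_hash not in encoder_cache and mm_hash in self.store:
                encoder_cache[mm_hash] = self.store[mm_hash]
                self.loaded.add(mm_hash)

    def clear_connector_metadata(self):
        self.metadata = None


def gather_mm_embeddings(encoder_cache, mm_hashes):
    # Read the (already loaded) embeddings in request order.
    return [encoder_cache[h] for h in mm_hashes]


def run_consumer_step(connector, encoder_cache, scheduled_hashes):
    connector.bind_connector_metadata(scheduled_hashes)
    try:
        # Loads must happen before the embeddings are gathered.
        connector.start_load_caches(encoder_cache=encoder_cache)
        embeddings = gather_mm_embeddings(encoder_cache, scheduled_hashes)
    finally:
        # Metadata is cleared once the step is over, even on failure.
        connector.clear_connector_metadata()
    # Completion info for the scheduler (toy analogue of ECConnectorOutput).
    output = {"finished_loading": set(connector.loaded)}
    connector.loaded.clear()
    return embeddings, output
```

In this toy version a hash that is missing from both the local cache and the store raises KeyError in gather_mm_embeddings; how the real runner handles a failed load is not specified in this RFC.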
Feedback Period.
No response
CC List.
No response
Any Other Things.
No response