
@majiayu000

Summary

  • Fixed _process_colbert_vecs so that it excludes the EOS token's vector
  • Changed the slice bound from tokens_num - 1 to tokens_num - 2

Problem

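The function kept the EOS token's vector in its output. tokens_num is the number of non-padding tokens, CLS and EOS included. Because colbert_embedding already drops CLS (via last_hidden_state[:, 1:]), slicing to tokens_num - 1 keeps the content tokens plus EOS; slicing to tokens_num - 2 drops EOS as well.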
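The off-by-one can be illustrated with a minimal sketch. It uses plain Python lists of toy token labels in place of the real numpy arrays and hidden states, and colbert_embedding / process_colbert_vecs here are simplified, hypothetical stand-ins that only reproduce the slicing, not FlagEmbedding's actual methods.

```python
def colbert_embedding(token_vecs):
    """Hypothetical stand-in: drop the CLS position, like last_hidden_state[:, 1:]."""
    return token_vecs[1:]


def process_colbert_vecs(colbert_vecs, attention_mask):
    """Hypothetical stand-in for the fixed slicing.

    tokens_num counts all non-padding tokens, CLS and EOS included.
    CLS is already gone, so the first tokens_num - 1 rows are the content
    tokens followed by EOS; stopping at tokens_num - 2 drops EOS too.
    """
    tokens_num = sum(attention_mask)
    return colbert_vecs[:tokens_num - 2]


# Toy padded sequence: each label stands in for one per-token vector.
tokens = ["<s>", "hello", "world", "</s>", "<pad>", "<pad>"]
mask = [1, 1, 1, 1, 0, 0]

vecs = colbert_embedding(tokens)       # CLS removed, padding still present
print(vecs[:sum(mask) - 1])            # ['hello', 'world', '</s>']  (old bound: EOS leaks in)
print(process_colbert_vecs(vecs, mask))  # ['hello', 'world']        (new bound)
```

With the old bound the slice ends one row too late because the CLS row was removed upstream but still counted in tokens_num; subtracting 2 accounts for both CLS and EOS.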

Test plan

  • Added unit tests verifying EOS exclusion
  • Tests pass locally

Fixes #1490

The _process_colbert_vecs function was incorrectly including the EOS
token. Since CLS is already excluded in colbert_embedding (via
last_hidden_state[:, 1:]), we need to use tokens_num - 2 instead of
tokens_num - 1 to also exclude the EOS token.

Fixes FlagOpen#1490

Signed-off-by: majiayu000 <[email protected]>

Development

Successfully merging this pull request may close these issues.

_process_colbert_vecs function includes eos token