Hi authors, thank you for your excellent work on ∞BENCH — it's an impressive and much-needed benchmark for evaluating long-context capabilities of LLMs.
While reading the paper and exploring the dataset, I noticed that the QA tasks (e.g., En.QA, Zh.QA, En.MC) do not include annotations for the evidence sentences or supporting spans in the long context that justify the correct answer.
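Concretely, I was imagining per-example annotations along the lines of the sketch below. This is purely illustrative — the base fields are only my rough understanding of the released jsonl, and the `evidence_spans` field (and its structure) is a hypothetical placeholder, not anything from the actual schema:

```python
# Illustrative sketch only: field names here (especially "evidence_spans")
# are placeholders I made up, not the actual ∞BENCH format.
example = {
    "id": 0,
    "context": "<the long document>",
    "input": "Who gave Alice the locket?",
    "answer": ["Her grandmother"],
    # Hypothetical addition: character offsets (or sentence indices) of the
    # passages in `context` that support the answer.
    "evidence_spans": [
        {"start": 10234, "end": 10417},
        {"start": 55102, "end": 55260},
    ],
}
```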
Do you have this information internally, or is there any plan to release it?
Thank you again for your great work; I'm looking forward to your thoughts!
Best regards,