We need these kernels, in priority order:
1. Batch-invariant all-reduce
2. Batch-invariant reduce-scatter
3. Batch-invariant MoE kernels
4. Batch-invariant quantized GEMMs
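
For context on what "batch invariant" asks of a reduction, here is a minimal CPU-only sketch. It is an illustration, not any library's implementation: `tree_sum` and `allreduce_fixed` are hypothetical names, and the ranks are simulated as plain Python lists. The idea shown is that floating-point addition is not associative, so a row's result stays bitwise identical across batch sizes only if the reduction order depends on rank position alone and never on how many rows are in the batch.

```python
from typing import List


def tree_sum(values: List[float]) -> float:
    """Sum in a fixed pairwise (binary-tree) order set only by position."""
    vals = list(values)
    while len(vals) > 1:
        # Add neighbours pairwise; an odd trailing element is carried up as-is.
        nxt = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]


def allreduce_fixed(per_rank: List[List[float]]) -> List[float]:
    """Simulated all-reduce: per_rank[r][b] is rank r's partial for row b.

    Every row is reduced across ranks in the same tree order, so the
    result for a row does not depend on the batch size.
    """
    n_rows = len(per_rank[0])
    return [tree_sum([rank[b] for rank in per_rank]) for b in range(n_rows)]


# Floating-point addition is not associative, so reduction order matters.
assert (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)

# Four simulated ranks, two rows in the batch.
per_rank = [[0.1, 1e16], [0.2, 1.0], [0.3, -1e16], [0.4, 1.0]]
batched = allreduce_fixed(per_rank)

# The same first row reduced on its own (batch size 1) matches bitwise.
alone = allreduce_fixed([[rank[0]] for rank in per_rank])
assert batched[0] == alone[0]
```

A real kernel would also have to pin the chunking, the communication algorithm, and the accumulation dtype so that none of them change with batch size; this sketch only covers the ordering aspect.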