Questions about KAdaptation implementation #6

@vishaal27

Description

Hi, thanks for the great work and releasing the code to reproduce it.

I have a few questions regarding the Kronecker adaptation forward pass through the adapter modules:

(1) The scaling factor you use for the KAdaptation is 1/5 times the scaling used in standard LoRA:

scale_factor = self.lora_attn_alpha / self.lora_attn_dim * 5

Is there a justification for this or is it simply an empirical magic number?
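For context, here is a minimal sketch of the two scaling conventions being compared. The variable values are illustrative only (the repo's actual defaults may differ); the point is just that the quoted line multiplies the standard LoRA scale `alpha / r` by an extra factor of 5:

```python
# Standard LoRA scales the low-rank update by alpha / r.
# Illustrative values, not the repo's defaults:
lora_attn_alpha = 16
lora_attn_dim = 8

standard_scale = lora_attn_alpha / lora_attn_dim          # alpha / r
kadaptation_scale = lora_attn_alpha / lora_attn_dim * 5   # alpha / r * 5, as in the quoted line
```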

(2) While forwarding through your adapter for the value matrix, it seems like you reuse the query weight matrix Wq (which, as I understand it, corresponds to A as defined in the paper). Is this a typo/bug?

"Perform kronecker adaptation to Q and K matrices"
if matrix == 'q':
if self.factorized_phm_rule:
phm_rule1 = torch.bmm(self.phm_rule1_left, self.phm_rule1_right)
H = kronecker_product_einsum_batched(phm_rule1, Wq).sum(0)
elif matrix == 'v':
if self.factorized_phm_rule:
phm_rule2 = torch.bmm(self.phm_rule2_left, self.phm_rule2_right)
H = kronecker_product_einsum_batched(phm_rule2, Wq).sum(0)

Shouldn't line 580 be H = kronecker_product_einsum_batched(phm_rule2, Wv).sum(0) instead?
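To make the question concrete, here is a self-contained sketch of what a batched Kronecker product helper like `kronecker_product_einsum_batched` presumably computes (the repo's actual implementation may differ), followed by the corrected 'v' branch:

```python
import torch

def kronecker_product_einsum_batched(A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    """Batched Kronecker product: (b, m, n) x (b, p, q) -> (b, m*p, n*q).

    kron(A, B)[i*p + k, j*q + l] = A[i, j] * B[k, l], per batch element.
    """
    b, m, n = A.shape
    _, p, q = B.shape
    # Outer product of matrix entries, then fold the index pairs together.
    out = torch.einsum('bij,bkl->bikjl', A, B)
    return out.reshape(b, m * p, n * q)

# Corrected 'v' branch as suggested above (names mirror the issue's snippet
# and are assumptions about the surrounding code):
#     phm_rule2 = torch.bmm(self.phm_rule2_left, self.phm_rule2_right)
#     H = kronecker_product_einsum_batched(phm_rule2, Wv).sum(0)
```

With the fix, the value adapter would use its own factorized rule `phm_rule2` together with `Wv` rather than `Wq`.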
