
Embedding+MLP , 99% accuracy, 3.03sec training #2

@vegansquirrel

Description

[Image: training accuracy curves]

Current approach: 12 epochs. The baseline MLP also rises quickly early on, but struggles in the final stages, around and beyond 90% accuracy.

The network learns a vector (embedding) for each possible integer value of a and b. It concatenates those two vectors and passes them through a small MLP that maps them to logits over mod_value classes.

embedding_dim = 16
AdamW: learning rate 5e-4, weight decay 1e-4

MLP head: a sequence of Linear + ReLU (+ optional Dropout) layers, with a final Linear whose output dimension is mod_value (the logits).
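A minimal PyTorch sketch of the architecture described above. Only embedding_dim = 16 and the AdamW settings come from the issue; mod_value, the hidden width, the dropout rate, and the use of a single shared embedding table are assumptions for illustration.

```python
# Sketch of the Embedding+MLP classifier: embed a and b, concatenate,
# map to logits over mod_value classes with a small MLP head.
import torch
import torch.nn as nn

class EmbeddingMLP(nn.Module):
    def __init__(self, mod_value: int, embedding_dim: int = 16,
                 hidden: int = 128, dropout: float = 0.1):
        super().__init__()
        # One embedding table shared by a and b (assumption; per-operand
        # tables would also match the description).
        self.embed = nn.Embedding(mod_value, embedding_dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * embedding_dim, hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, mod_value),  # final Linear -> logits
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Look up both operands and concatenate along the feature dimension.
        x = torch.cat([self.embed(a), self.embed(b)], dim=-1)
        return self.mlp(x)

mod_value = 97  # assumed modulus; not stated in the issue
model = EmbeddingMLP(mod_value)
# Optimizer settings as stated: AdamW, lr 5e-4, weight decay 1e-4
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=1e-4)

# Shape check on a random batch of 32 (a, b) pairs
a = torch.randint(0, mod_value, (32,))
b = torch.randint(0, mod_value, (32,))
logits = model(a, b)
print(logits.shape)  # torch.Size([32, 97])
```

Training would then apply cross-entropy between these logits and the target class, e.g. `(a + b) % mod_value` if the task is modular addition.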

Is this a record?
