- This project is just a self-test: an attempt to recreate the GPT-2 model architecture from memory
- Will also try to add a PyTorch `DataLoader` (later) instead of the custom data loading followed in the initial tutorial; see the sketch after this list
- Currently not intending to set up a training loop in this project; if I do, I will add a validation split too
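A minimal sketch of what the PyTorch-native replacement could look like, assuming the corpus is already one long 1-D tensor of token ids; the class and parameter names here are hypothetical:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class TokenDataset(Dataset):
    """Serves fixed-length (input, target) pairs from one long token stream."""
    def __init__(self, tokens: torch.Tensor, block_size: int):
        self.tokens = tokens
        self.block_size = block_size

    def __len__(self):
        # Last usable start index leaves room for the shifted target.
        return len(self.tokens) - self.block_size

    def __getitem__(self, idx):
        x = self.tokens[idx : idx + self.block_size]
        y = self.tokens[idx + 1 : idx + 1 + self.block_size]  # next-token targets
        return x, y

# Usage: yields (B, T) batches without manual pointer bookkeeping.
# loader = DataLoader(TokenDataset(tokens, block_size=1024), batch_size=8, shuffle=True)
```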
 
Partial Success
- Could remember the overall architecture (in the initial tryout I missed the final layer norm and the lm_head)
- Divided attention into a MultiHead module holding a list of Head modules on the first try
  - This is okay, but having all heads operate as a single batched matrix operation instead of a list is more efficient; see the sketch after this list
  - The list-of-heads version also deviates from the structure of the original model
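A sketch of the fused version, loosely following the GPT-2 layout: one shared qkv projection and one batched matmul for all heads, with the causal mask registered as a buffer (constructor parameters are assumptions):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """All heads computed in one batched matmul instead of a Python list of Head modules."""
    def __init__(self, n_embd: int, n_head: int, block_size: int):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)  # fused q, k, v projection
        self.c_proj = nn.Linear(n_embd, n_embd)
        # Causal mask kept as a non-trainable buffer so it moves with the module.
        self.register_buffer(
            "bias",
            torch.tril(torch.ones(block_size, block_size)).view(1, 1, block_size, block_size),
        )

    def forward(self, x):
        B, T, C = x.size()
        q, k, v = self.c_attn(x).split(C, dim=2)
        # (B, T, C) -> (B, n_head, T, head_dim): heads become a batch dimension.
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.bias[:, :, :T, :T] == 0, float("-inf"))  # hide future tokens
        att = F.softmax(att, dim=-1)
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)  # re-merge heads
        return self.c_proj(y)
```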
 
- Could not remember the code for the buffer that masks out future positions in the attention scores (the `register_buffer` call in the sketch above)
- Missed out on adding the residual connections in Block on the first try; see the sketch after this list
  - Not a very bad miss; I would have caught it if I had kept a diagram near me
- Need to compensate for the growth in activation variance caused by stacking multiple residual connections; see the note after the sketch below
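A sketch of the block with both residual connections, pre-norm style as in GPT-2; it reuses the CausalSelfAttention sketch above, and the 4x MLP expansion matches the original model:

```python
import torch.nn as nn

class Block(nn.Module):
    """Pre-norm transformer block: x flows through two residual branches."""
    def __init__(self, n_embd: int, n_head: int, block_size: int):
        super().__init__()
        self.ln_1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head, block_size)  # from the sketch above
        self.ln_2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        x = x + self.attn(self.ln_1(x))  # residual connection 1
        x = x + self.mlp(self.ln_2(x))   # residual connection 2
        return x
```

On the variance point: every block adds into the same residual stream, so activations grow with depth. GPT-2 handles this at init time rather than with an extra normalization layer, scaling the residual-projection weights by 1/sqrt(N), where N is the number of residual layers (two per block).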
 
 
Fixed all diffs for the basic model
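For reference, a sketch of the top-level wiring where the first-try misses (the final layer norm and the lm_head) live; the module names follow the Hugging Face GPT-2 checkpoint layout, but the constructor signature is an assumption:

```python
import torch
import torch.nn as nn

class GPT(nn.Module):
    def __init__(self, vocab_size: int, block_size: int, n_layer: int, n_head: int, n_embd: int):
        super().__init__()
        self.transformer = nn.ModuleDict(dict(
            wte=nn.Embedding(vocab_size, n_embd),  # token embeddings
            wpe=nn.Embedding(block_size, n_embd),  # learned positional embeddings
            h=nn.ModuleList(Block(n_embd, n_head, block_size) for _ in range(n_layer)),
            ln_f=nn.LayerNorm(n_embd),             # the final layer norm missed on the first try
        ))
        self.lm_head = nn.Linear(n_embd, vocab_size, bias=False)  # projection to logits

    def forward(self, idx):
        B, T = idx.size()
        pos = torch.arange(T, device=idx.device)
        x = self.transformer.wte(idx) + self.transformer.wpe(pos)
        for block in self.transformer.h:
            x = block(x)
        x = self.transformer.ln_f(x)
        return self.lm_head(x)
```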