Inside the GPT-3
What are you eating which needs that large of a napkin?
I’ve got a background in deep learning and I still struggle to understand the attention mechanism. I know it’s a key/value store but I’m not sure what it’s doing to the tensor when it passes through different layers.
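For what it's worth, here is a rough sketch of the "soft key/value lookup" reading of attention, in plain Python so the tensor shapes stay visible. It assumes a single head and skips the learned projections that produce the queries, keys, and values from the layer input in a real transformer. The names `softmax` and `attention` are made up for illustration and are not taken from any GPT-3 code. Each position's query is scored against every key, the scores become weights that sum to 1, and the output for that position is the weighted average of the values. The output has one row per input position, so the sequence keeps its length from layer to layer.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values, causal=False):
    """Single-head scaled dot-product attention on lists of vectors.

    queries, keys, values: one vector per sequence position.
    causal=True lets position i look only at positions 0..i,
    which is how decoder-only models such as GPT-3 are masked.
    """
    d_k = len(keys[0])
    out = []
    for i, q in enumerate(queries):
        ks = keys[: i + 1] if causal else keys
        vs = values[: i + 1] if causal else values
        # Similarity of this query to each visible key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in ks]
        weights = softmax(scores)
        # Weighted average of the value vectors: the "soft lookup" result.
        mixed = [sum(w * v[j] for w, v in zip(weights, vs))
                 for j in range(len(vs[0]))]
        out.append(mixed)
    return out

# Toy input: two positions with 2-dimensional vectors, reused as Q, K and V.
x = [[1.0, 0.0], [0.0, 1.0]]
full = attention(x, x, x)
masked = attention(x, x, x, causal=True)
```

In the toy run, each row of `full` is pulled toward the value whose key best matches that row's query, and with `causal=True` the first position can only see itself, so it gets its own value back unchanged.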