https://arxiv.org/abs/2106.11342
https://d2l.ai/
Google Colab notebook for the transformer model: https://colab.research.google.com/github/d2l-ai/d2l-pytorch-colab/blob/master/chapter_attention-mechanisms-and-transformers/transformer.ipynb