A transformer is a deep neural network architecture widely used in natural language processing tasks such as text generation, classification, translation, and sentiment analysis. Its core job is to encode a sequence of tokens, such as a sentence or paragraph, into a sequence of contextual vector representations that downstream layers can use for task-specific computations. Transformers perform this encoding with self-attention, a mechanism that lets the model weigh the relevance of every token to every other token in the sequence rather than passing information step by step through recurrent connections; positional encodings supply the word-order information that attention alone does not capture. Because transformers avoid explicit recurrence and convolutions, they parallelize well across a sequence, and models pretrained on large corpora transfer effectively to many NLP tasks, typically with some task-specific fine-tuning. This has driven significant improvements in state-of-the-art results on NLP benchmarks since the architecture was introduced in 2017 by Google researchers in "Attention Is All You Need." While variants exist, most transformer encoders follow the same layered structure: token embeddings combined with positional encodings feed a stack of encoder layers, each consisting of multi-head self-attention followed by a position-wise feedforward network, with every sub-layer wrapped in a residual (skip) connection plus layer normalization and dropout; the final representations are then passed to task-specific output layers.
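
To make the encoding step concrete, here is a minimal sketch of scaled dot-product self-attention inside a single encoder layer, written in plain NumPy. It shows one attention head and omits the multi-head split, positional encodings, dropout, and learned layer-norm parameters for brevity; the matrix names (`Wq`, `Wk`, `Wv`, `Wo`, `W1`, `W2`) and the tiny random example at the end are illustrative assumptions, not part of any particular library.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model) token representations; W*: (d_model, d_k) projections.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # pairwise token relevance
    weights = softmax(scores, axis=-1)          # attention weights, each row sums to 1
    return weights @ V                          # weighted mix of value vectors

def encoder_layer(X, p):
    # Sub-layer 1: multi-head attention (here a single head) + residual + layer norm.
    attn_out = self_attention(X, p["Wq"], p["Wk"], p["Wv"]) @ p["Wo"]
    X = layer_norm(X + attn_out)
    # Sub-layer 2: position-wise feedforward network + residual + layer norm.
    ffn_out = np.maximum(0, X @ p["W1"] + p["b1"]) @ p["W2"] + p["b2"]
    return layer_norm(X + ffn_out)

# Tiny illustrative example: 4 tokens, model width 8.
rng = np.random.default_rng(0)
d_model, seq_len = 8, 4
X = rng.normal(size=(seq_len, d_model))
params = {
    "Wq": rng.normal(size=(d_model, d_model)),
    "Wk": rng.normal(size=(d_model, d_model)),
    "Wv": rng.normal(size=(d_model, d_model)),
    "Wo": rng.normal(size=(d_model, d_model)),
    "W1": rng.normal(size=(d_model, 4 * d_model)),
    "b1": np.zeros(4 * d_model),
    "W2": rng.normal(size=(4 * d_model, d_model)),
    "b2": np.zeros(d_model),
}
print(encoder_layer(X, params).shape)  # (4, 8): one contextual vector per token
```

The output keeps one vector per input token, which is why a downstream classifier can pool these vectors (or read a designated token's vector) rather than relying on a single fixed-length summary of the whole sequence.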