Behind the Magic: How Tensors Drive Transformers

by Md Sazzad Hossain


Transformers have changed the way artificial intelligence works, especially in understanding language and learning from data. At the core of these models are tensors, a generalized form of the mathematical matrix that carries information through the network. As data moves through the different components of a Transformer, these tensors undergo a series of transformations that help the model make sense of inputs such as sentences or images. Learning how tensors work inside Transformers can help you understand how today's most capable AI systems actually process information.

What This Article Covers and What It Doesn’t

✅ This Article IS About:

  • The flow of tensors from input to output inside a Transformer model.
  • Ensuring dimensional coherence throughout the computational process.
  • The step-by-step transformations that tensors undergo in the different Transformer layers.

❌ This Article IS NOT About:

  • A general introduction to Transformers or deep learning.
  • The detailed architecture of Transformer models.
  • The training process or hyper-parameter tuning of Transformers.

How Tensors Work Inside Transformers

A Transformer consists of two main components:

  • Encoder: Processes input data, capturing contextual relationships to create meaningful representations.
  • Decoder: Uses these representations to generate coherent output, predicting each element sequentially.

Tensors are the fundamental data structures that pass through these components, undergoing a series of transformations that preserve dimensional coherence and ensure correct information flow.

Image from research paper: Transformer general architecture
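To make this encoder/decoder flow concrete before looking at the individual layers, here is a minimal sketch, assuming PyTorch and purely illustrative sizes (5 sentences, 12 source tokens, 9 target tokens, model width 512, none of which come from a trained model), that pushes dummy tensors through torch.nn.Transformer and prints the output shape:

```python
import torch
import torch.nn as nn

# Illustrative sizes only: 5 sentences, 12 source tokens, 9 target tokens, width 512
batch_size, src_len, tgt_len, d_model = 5, 12, 9, 512

model = nn.Transformer(d_model=d_model, nhead=8, batch_first=True)

src = torch.randn(batch_size, src_len, d_model)  # what the encoder consumes
tgt = torch.randn(batch_size, tgt_len, d_model)  # what the decoder consumes

out = model(src, tgt)
print(out.shape)  # torch.Size([5, 9, 512]) -- one output vector per target position
```

The point is only the shapes: every block inside the model keeps the [batch, sequence, width] layout intact.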

Input Embedding Layer

Before entering the Transformer, raw input tokens (words, subwords, or characters) are converted into dense vector representations by the embedding layer. This layer functions as a lookup table that maps each token to a vector, capturing semantic relationships with other words.

Image by author: Tensors passing through the embedding layer

For a batch of 5 sentences, each with a sequence length of 12 tokens and an embedding dimension of 768, the tensor shape is:

  • Tensor shape: [batch_size, seq_len, embedding_dim] → [5, 12, 768]

After embedding, positional encoding is added, ensuring that order information is preserved without altering the tensor shape.
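A minimal sketch of this step, assuming PyTorch, a learned nn.Embedding, and the sinusoidal positional encoding from the original Transformer paper (the vocabulary size of 30,000 is an assumption for illustration only):

```python
import math
import torch
import torch.nn as nn

batch_size, seq_len, embedding_dim = 5, 12, 768
vocab_size = 30_000  # assumed vocabulary size, not taken from the article

# Lookup table: token IDs -> dense vectors
embedding = nn.Embedding(vocab_size, embedding_dim)
token_ids = torch.randint(0, vocab_size, (batch_size, seq_len))
x = embedding(token_ids)                          # [5, 12, 768]

# Sinusoidal positional encoding; it is added, so the shape does not change
pos = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)           # [12, 1]
div = torch.exp(torch.arange(0, embedding_dim, 2, dtype=torch.float)
                * (-math.log(10000.0) / embedding_dim))               # [384]
pe = torch.zeros(seq_len, embedding_dim)
pe[:, 0::2] = torch.sin(pos * div)
pe[:, 1::2] = torch.cos(pos * div)

x = x + pe                                        # still [5, 12, 768]
print(x.shape)                                    # torch.Size([5, 12, 768])
```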

Modified image from research paper: current stage of the workflow

Multi-Head Attention Mechanism

One of the most important components of the Transformer is the Multi-Head Attention (MHA) mechanism. It operates on three matrices derived from the input embeddings:

  • Query (Q)
  • Key (K)
  • Value (V)

These matrices are generated using learnable weight matrices:

  • Wq, Wk, Wv of shape [embedding_dim, d_model] (e.g., [768, 512]).
  • The resulting Q, K, V matrices have dimensions [batch_size, seq_len, d_model] (see the sketch below).
Image by author: Table showing the shapes/dimensions of the embedding, Q, K, V tensors
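A sketch of the three projections under the same illustrative sizes (embedding_dim = 768 projected down to d_model = 512); the bias-free nn.Linear layers here stand in for the weight matrices Wq, Wk, Wv:

```python
import torch
import torch.nn as nn

batch_size, seq_len, embedding_dim, d_model = 5, 12, 768, 512

x = torch.randn(batch_size, seq_len, embedding_dim)   # output of the embedding layer

# Learnable weight matrices Wq, Wk, Wv of shape [embedding_dim, d_model]
W_q = nn.Linear(embedding_dim, d_model, bias=False)
W_k = nn.Linear(embedding_dim, d_model, bias=False)
W_v = nn.Linear(embedding_dim, d_model, bias=False)

Q, K, V = W_q(x), W_k(x), W_v(x)
print(Q.shape, K.shape, V.shape)   # each torch.Size([5, 12, 512])
```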

Splitting Q, K, V into Multiple Heads

For effective parallelization and improved learning, MHA splits Q, K, and V into multiple heads. Suppose we have 8 attention heads:

  • Each head operates on a subspace of size d_model / head_count.
Image by author: Multi-head attention
  • The reshaped tensor dimensions are [batch_size, seq_len, head_count, d_model / head_count].
  • Example: [5, 12, 8, 64] → rearranged to [5, 8, 12, 64] so that each head receives its own sequence slice (see the reshape sketch below).
Image by author: Reshaping the tensors
  • Each head therefore receives its own share Qi, Ki, Vi.
Image by author: Each Qi, Ki, Vi sent to a different head
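The reshape and transpose that hands each of the 8 heads its own 64-dimensional slice, shown here for Q only (K and V are reshaped the same way; sizes are the illustrative ones used above):

```python
import torch

batch_size, seq_len, d_model, head_count = 5, 12, 512, 8
d_head = d_model // head_count                       # 512 / 8 = 64

Q = torch.randn(batch_size, seq_len, d_model)        # K and V have the same shape

# [batch, seq, d_model] -> [batch, seq, heads, d_head] -> [batch, heads, seq, d_head]
Q_heads = Q.view(batch_size, seq_len, head_count, d_head).transpose(1, 2)
print(Q_heads.shape)                                 # torch.Size([5, 8, 12, 64])
```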

Attention Calculation

Each head computes attention using the scaled dot-product formula:

Attention(Q, K, V) = softmax(QKᵀ / √d_k) · V,  where d_k = d_model / head_count.

Once attention has been computed for all heads, the outputs are concatenated and passed through a linear transformation, restoring the initial tensor shape (see the sketch below).

Image by author: Concatenating the outputs of all heads
Modified image from research paper: current stage of the workflow
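A sketch of the per-head attention, the concatenation, and the final linear projection, written with plain tensor operations (PyTorch assumed; the nn.Linear output projection stands in for the output weight matrix):

```python
import math
import torch
import torch.nn as nn

batch_size, heads, seq_len, d_head = 5, 8, 12, 64
d_model = heads * d_head                                        # 512

Q = torch.randn(batch_size, heads, seq_len, d_head)
K = torch.randn(batch_size, heads, seq_len, d_head)
V = torch.randn(batch_size, heads, seq_len, d_head)

# Scaled dot-product attention, computed for all heads in parallel
scores = Q @ K.transpose(-2, -1) / math.sqrt(d_head)            # [5, 8, 12, 12]
weights = torch.softmax(scores, dim=-1)
context = weights @ V                                           # [5, 8, 12, 64]

# Concatenate the heads and restore the original [batch, seq, d_model] shape
context = context.transpose(1, 2).reshape(batch_size, seq_len, d_model)   # [5, 12, 512]
W_o = nn.Linear(d_model, d_model)
output = W_o(context)
print(output.shape)                                             # torch.Size([5, 12, 512])
```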

Residual Connection and Normalization

After the multi-head attention mechanism, a residual connection is added, followed by layer normalization:

  • Residual connection: Output = Embedding Tensor + Multi-Head Attention Output
  • Normalization: (Output − μ) / σ to stabilize training
  • The tensor shape remains [batch_size, seq_len, embedding_dim] (see the sketch below)
Image by author: Residual connection
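A minimal sketch of the residual-plus-normalization step, assuming PyTorch's nn.LayerNorm and, as the addition requires, a sub-layer output with the same width as its input:

```python
import torch
import torch.nn as nn

batch_size, seq_len, dim = 5, 12, 768    # the residual sum requires matching widths

x = torch.randn(batch_size, seq_len, dim)          # sub-layer input (embedding tensor)
attn_out = torch.randn(batch_size, seq_len, dim)   # stand-in for the multi-head attention output

norm = nn.LayerNorm(dim)
out = norm(x + attn_out)     # residual connection followed by layer normalization
print(out.shape)             # torch.Size([5, 12, 768]) -- shape unchanged
```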

Masked Multi-Head Attention in the Decoder

In the decoder, Masked Multi-Head Attention ensures that each token attends only to earlier tokens, preventing leakage of future information.

Modified image from research paper: Masked multi-head attention

This is achieved using a lower-triangular mask of shape [seq_len, seq_len] with −inf values in the upper triangle. Applying this mask ensures that the softmax function assigns zero weight to future positions (see the sketch below).

Image by author: Mask matrix
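A sketch of that mask, assuming PyTorch: a lower-triangular matrix is built with torch.tril, the upper triangle is filled with −inf, and the softmax then gives zero weight to every future position:

```python
import torch

seq_len = 12

# Lower-triangular boolean mask: True where a token is allowed to attend
allowed = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

# Additive mask: 0 where attention is allowed, -inf in the upper triangle
mask = torch.zeros(seq_len, seq_len).masked_fill(~allowed, float("-inf"))

scores = torch.randn(seq_len, seq_len)           # stand-in attention scores for one head
weights = torch.softmax(scores + mask, dim=-1)
print(weights[0])   # the first token attends only to itself: weight 1.0 at position 0
```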

Cross-Attention in Decoding

Since the decoder does not fully understand the input sentence on its own, it uses cross-attention to refine its predictions. Here:

  • The decoder generates queries (Qd) from its input ([batch_size, target_seq_len, embedding_dim]).
  • The encoder output serves as the keys (Ke) and values (Ve).
  • The decoder computes attention between Qd and Ke, extracting relevant context from the encoder's output (see the sketch below).
Modified image from research paper: Cross-attention
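A sketch of cross-attention using PyTorch's nn.MultiheadAttention, with queries taken from the decoder side and keys/values from the encoder output (the target length of 9 is an assumed example, not from the article):

```python
import torch
import torch.nn as nn

batch_size, src_len, tgt_len, d_model = 5, 12, 9, 512    # tgt_len = 9 is illustrative

encoder_out = torch.randn(batch_size, src_len, d_model)  # provides the keys and values
decoder_in = torch.randn(batch_size, tgt_len, d_model)   # provides the queries

cross_attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=8, batch_first=True)
context, attn_weights = cross_attn(query=decoder_in, key=encoder_out, value=encoder_out)

print(context.shape)        # torch.Size([5, 9, 512]) -- one context vector per target token
print(attn_weights.shape)   # torch.Size([5, 9, 12])  -- how each target token attends to the source
```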

Conclusion

Transformers use tensors to help them learn and make good decisions. As data moves through the network, these tensors go through different steps, such as being turned into numbers the model can understand (embedding), focusing on the important parts (attention), staying balanced (normalization), and passing through layers that learn patterns (feed-forward). These transformations keep the data in the right shape the whole time. By understanding how tensors move and change, we get a better picture of how AI models work and how they can understand and generate human-like language.
