Info

  • Consists of two RNN models (an encoder and decoder)
  • Encoder learns contextual information from the input words and hands this knowledge over to the decoder through a “context/thought vector”.
  • Decoder then consumes the context vector and generates the output sequence token by token.

Key Components

1. Encoder

  • The Encoder processes the input sequence $(x_1, x_2, \dots, x_T)$.
  • It encodes the sequence into a fixed-length context vector (also called the latent vector or hidden state), which summarizes the input sequence.
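
A minimal sketch of such an encoder, assuming PyTorch; the GRU cell, embedding size, hidden size, and class name are illustrative choices rather than anything prescribed by these notes:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src: torch.Tensor) -> torch.Tensor:
        # src: (batch, src_len) token ids
        embedded = self.embedding(src)        # (batch, src_len, emb_dim)
        outputs, hidden = self.rnn(embedded)  # hidden: (1, batch, hidden_dim)
        return hidden                         # fixed-length context vector
```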

2. Decoder

  • The Decoder takes the context vector from the Encoder and generates the output sequence one step at a time.
  • It predicts the next token based on the current hidden state, the context vector, and the previously generated tokens.
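
A matching decoder sketch under the same PyTorch assumption; it predicts one token per call, and the layer sizes mirror the hypothetical encoder above:

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_token: torch.Tensor, hidden: torch.Tensor):
        # prev_token: (batch, 1) previously generated token ids
        # hidden: (1, batch, hidden_dim) previous decoder state (or the context vector at step 0)
        embedded = self.embedding(prev_token)        # (batch, 1, emb_dim)
        output, hidden = self.rnn(embedded, hidden)  # one RNN step
        logits = self.out(output.squeeze(1))         # (batch, vocab_size)
        return logits, hidden
```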

How Seq2Seq Works

Step 1: Encoding

  1. The input sequence is passed through the Encoder (e.g., LSTM, GRU).
  2. At each time step $t$, the Encoder updates its hidden state based on the current input $x_t$ and the previous hidden state $h_{t-1}$: $h_t = f(x_t, h_{t-1})$, where $f$ is the RNN cell.
  3. The final hidden state $h_T$ (and, for an LSTM, optionally the cell state $c_T$) is passed to the Decoder.
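
The recurrence above can be sketched with a bare GRU cell (PyTorch assumed); the batch size, sequence length, and dimensions are made up for illustration:

```python
import torch
import torch.nn as nn

hidden_dim = 512
cell = nn.GRUCell(input_size=256, hidden_size=hidden_dim)

x = torch.randn(32, 10, 256)      # (batch, src_len, emb_dim) embedded inputs
h = torch.zeros(32, hidden_dim)   # h_0
for t in range(x.size(1)):
    h = cell(x[:, t, :], h)       # h_t = f(x_t, h_{t-1})
context = h                       # final hidden state h_T = context vector
```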

Step 2: Decoding

  1. The Decoder uses the context vector $c = h_T$ (the final hidden state of the Encoder) as its initial hidden state $s_0$.

  2. At each time step $t$, the Decoder predicts the next token $y_t$ based on:

    • The previous hidden state $s_{t-1}$,
    • The previously generated token $y_{t-1}$,
    • And optionally the context vector.
  3. The prediction at each step is given by $s_t = f(s_{t-1}, y_{t-1}, c)$ and $P(y_t \mid y_{<t}, x) = \mathrm{softmax}(g(s_t))$, where $g$ is a learned output projection.

  4. The predicted token $\hat{y}_t$ is fed back into the Decoder as input for the next time step (autoregressive generation).
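
A sketch of this autoregressive loop with greedy decoding, assuming PyTorch; the `<sos>`/`<eos>` token ids, maximum length, and layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden_dim = 1000, 256, 512
embedding = nn.Embedding(vocab_size, emb_dim)
cell = nn.GRUCell(emb_dim, hidden_dim)
out = nn.Linear(hidden_dim, vocab_size)

SOS, EOS, MAX_LEN = 1, 2, 20
h = torch.zeros(1, hidden_dim)       # context vector from the encoder (dummy zeros here)
token = torch.tensor([SOS])          # decoding starts from the <sos> token
generated = []
for _ in range(MAX_LEN):
    h = cell(embedding(token), h)    # s_t = f(s_{t-1}, y_{t-1})
    token = out(h).argmax(dim=-1)    # greedy choice of the next token
    if token.item() == EOS:
        break
    generated.append(token.item())   # predicted token is fed back in next iteration
```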


Key Concepts

1. Context Vector

  • The fixed-length vector (final hidden state of the Encoder) summarizes the entire input sequence.
  • Acts as the “memory” that the Decoder uses to generate the output sequence.

2. Teacher Forcing

  • During training, the Decoder is provided with the true previous token from the target sequence as input rather than its own prediction.
  • Helps stabilize and speed up training, but creates a mismatch between training and inference (exposure bias).
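
A sketch of a teacher-forced training step, assuming PyTorch: the ground-truth token at step $t$ is fed into the decoder instead of the model's own prediction. All names, shapes, and sizes here are illustrative:

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden_dim = 1000, 256, 512
embedding = nn.Embedding(vocab_size, emb_dim)
cell = nn.GRUCell(emb_dim, hidden_dim)
out = nn.Linear(hidden_dim, vocab_size)
criterion = nn.CrossEntropyLoss()

tgt = torch.randint(0, vocab_size, (32, 12))  # (batch, tgt_len) ground-truth target tokens
h = torch.zeros(32, hidden_dim)               # context vector from the encoder (dummy zeros here)
loss = 0.0
for t in range(tgt.size(1) - 1):
    h = cell(embedding(tgt[:, t]), h)         # input is the TRUE previous token, not the prediction
    loss = loss + criterion(out(h), tgt[:, t + 1])
loss = loss / (tgt.size(1) - 1)
loss.backward()                               # gradients for encoder/decoder parameters
```

At inference time the true tokens are unavailable, so the decoder must consume its own predictions, which is the source of the train/inference mismatch noted above.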