DeepSeek - Into the Unknown

Shared Conversation

Expert

This shared conversation is AI-generated, for reference only.

Thank you. To clarify, when you say "Input: The conversation has the context 'Write a short sentence about an animal.'", do you mean that a user of an AI chatbot asks it to write a short sentence about an animal? Then, when you say "Prediction 1: The model processes this instruction. The artisans' workshop generates its forecast. The highest probabilities for the first token are: 'The' (75%)," which forecast are you referring to? And what probabilities are you referring to? And when you talk about "Autoregression," what do you mean by "context"?

Yes, we'll go over each of those points. You've identified the key concepts for understanding the process.

1. On the "Input" and the "Context"

Yes, exactly. When I say "The conversation has the context 'Write a short sentence about an animal'", I mean that a user has typed that instruction into an AI chatbot.

At that point, the "context" is the entire text of the conversation. If it's the first interaction, the context contains only that instruction, technically formatted for the model. In the original Spanish example: [Usuario] Escribe una frase corta sobre un animal. [Asistente] (that is, "[User] Write a short sentence about an animal. [Assistant]").

The trailing [Asistente] tag is the marker that tells the model: "Now it's your turn, start generating." This entire context is the "shipment" sent to our City-AI customs.
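To make the formatting step concrete, here is a minimal sketch of how a context string like the one above could be assembled. The function name `build_context` and the plain-text tags are assumptions for illustration only; real chatbots use their own templates and special tokens.

```python
def build_context(turns, user_tag="[Usuario]", assistant_tag="[Asistente]"):
    """Flatten (role, text) turns into a single prompt string.

    Hypothetical helper: the tags mirror the example in the text,
    not the template of any real model.
    """
    parts = []
    for role, text in turns:
        tag = user_tag if role == "user" else assistant_tag
        parts.append(f"{tag} {text}")
    # The trailing assistant tag is the cue: "now it's your turn".
    parts.append(assistant_tag)
    return " ".join(parts)


context = build_context([("user", "Escribe una frase corta sobre un animal.")])
print(context)
```

With a single user turn, the result is just the instruction followed by the assistant marker, which is the "shipment" described above.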

2. On "Forecast" and "Probabilities"

Here's the technical core of the matter. The "forecast" and "probabilities" in the example are a didactic simplification, not the output of a real model, but they accurately describe what happens.

The forecast is the output of the softmax function applied in the model's output layer. That layer receives the context vector (the "final packet" that represents the meaning of the user's instruction), computes a raw score for every token in the vocabulary (which can hold 50,000, 100,000, or more tokens), and softmax converts those scores into a probability distribution.

Each probability is, literally, a number between 0 and 1 (or 0% and 100%) assigned to one token, and the probabilities over the whole vocabulary add up to 1. It is the model's statistical estimate of how likely that specific token is to be the next one in the sequence.
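As a rough illustration of how softmax turns raw scores into probabilities, here is a small sketch. The four-token vocabulary and the score values are invented for the example; a real model does the same thing over its full vocabulary.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    # Subtracting the maximum does not change the result and avoids overflow.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up vocabulary and raw scores, for illustration only.
tokens = ["El", "Un", "La", "<other>"]
scores = [3.0, 1.4, 0.8, -0.6]

probs = softmax(scores)
for token, p in zip(tokens, probs):
    print(f"{token}: {p:.1%}")
```

The token with the highest raw score ends up with the highest probability, and every token, however unlikely, still receives a value strictly between 0 and 1.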

When I write "El" ("The") with 75% in the example, I mean that:

The mathematical representation of the instruction "Write a sentence..." activates certain patterns in the model parameters.
These patterns determine that, when starting a sentence in Spanish after that instruction, the article "El" ("The") is an extremely common and grammatically correct option.
The model doesn't "know" Spanish, but during training it has seen millions of examples where a similar instruction is followed by an article like "El". The 75% probability reflects that statistical regularity.
The other tokens ("Un" with 15%, "La" with 8%) are also valid articles. The remaining 2% is divided among the other 99,997 tokens in the vocabulary, assuming a vocabulary of 100,000 tokens.
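The arithmetic behind that last point can be checked in a few lines. This sketch assumes a vocabulary of 100,000 tokens (the figure implied by "99,997 other tokens") and uses the illustrative percentages from the example, not values from a real model.

```python
# Illustrative probabilities from the example above.
top_tokens = {"El": 0.75, "Un": 0.15, "La": 0.08}
vocab_size = 100_000  # assumed vocabulary size

# Probability mass left over after the three articles.
remaining_mass = 1.0 - sum(top_tokens.values())
# Number of tokens that have to share that leftover mass.
remaining_tokens = vocab_size - len(top_tokens)
# Average share per remaining token, if the mass were spread evenly.
average_share = remaining_mass / remaining_tokens

print(f"Remaining mass: {remaining_mass:.2%}")
print(f"Remaining tokens: {remaining_tokens:,}")
print(f"Average per remaining token: {average_share:.2e}")
```

In a real model the leftover mass is not spread evenly, but the order of magnitude shows how small each individual share is compared with the three articles.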

3. On "Autoregression" and the expanded "Context"

Autoregression is the fundamental loop. It literally means "regression upon itself," and implies that the output of the system in one step becomes the input of the system in the next step.

The "context" is the complete text that the model "sees" at each step. It is not static; it grows with each word generated. The complete sequence of the process is:

Step 1: The context is: "... [Asistente]". The model predicts and generates "El" ("The").
Step 2: The token "El" is concatenated to the end of the context. Now the context is: "... [Asistente] El". The model predicts and generates "gato" ("cat"), based not only on the initial instruction, but also on the fact that the last word generated was "El".
Step 3: The new context is: "... El gato". The model predicts and generates "duerme" ("sleeps").
Step 4: The new context is: "... El gato duerme" ("The cat sleeps"). The model predicts that the most likely token is ".", which closes the sentence.

Conceptually, each step is a complete new run of the model, which processes the instruction and all the text generated up to that point to decide on the next piece. The AI "rereads" the entire conversation, including the word it has just added, before deciding on the next one.
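The four steps above can be sketched as a loop. In this toy version a lookup table stands in for the neural network, the probabilities after the first step are invented, and the loop always picks the most likely token; all of these are simplifying assumptions, not how a production model is implemented.

```python
# Hypothetical stand-in for the model: maps the text after the assistant
# marker to next-token probabilities. All values are made up.
TOY_MODEL = {
    "[Asistente]": {"El": 0.75, "Un": 0.15, "La": 0.08},
    "[Asistente] El": {"gato": 0.60, "perro": 0.40},
    "[Asistente] El gato": {"duerme": 0.70, "come": 0.30},
    "[Asistente] El gato duerme": {".": 0.90, "mucho": 0.10},
}


def predict(context):
    """Return next-token probabilities for the current context."""
    # The toy table is keyed on everything from the last assistant marker on.
    key = context[context.rindex("[Asistente]"):]
    return TOY_MODEL[key]


def generate(context, max_steps=10):
    """Autoregressive loop: each output token is appended to the next input."""
    generated = []
    for _ in range(max_steps):
        probs = predict(context)              # a fresh run on the full context
        token = max(probs, key=probs.get)     # pick the most likely token
        generated.append(token)
        context = f"{context} {token}"        # the context grows by one token
        if token == ".":                      # the period closes the sentence
            break
    return context, generated


prompt = "[Usuario] Escribe una frase corta sobre un animal. [Asistente]"
final_context, generated = generate(prompt)
print(generated)
print(final_context)
```

Note that `predict` always receives the whole context, not only the last token: that is the "rereading" described above.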