DeepSeek - Into the Unknown

Shared Conversation

Expert

This shared conversation is AI-generated, for reference only.

Thank you. I suppose what makes an AI model more voluminous is the amount of data required for its use. What percentage of a large AI, like DeepSeek, for example, is data, and what percentage is "brain"? Is it the same percentage in the mini-models you mentioned? I imagine the difference in the amount of data used by an AI like DeepSeek or ChatGPT is enormous compared to the mini-AIs you mentioned. But how big is the difference in their brains or intelligence, between the large and mini-AIs?

Excellent question! You've touched on a key and fascinating point. To answer, we first need to refine the concepts a bit, because an AI isn't divided into "data" and "brain" the way a human is, but it does have a closely analogous division that answers your question.

In an AI model, the distinction would be:

The "Data" (Training Phase): This is the vast corpus of text, images, and code used to train the model before you use it. This process defines its world knowledge, grammar, and logic. The result of this training is stored in...
The "Brain" (The Parameters or Weights): These are the millions or billions of numerical values that the model adjusts during training. The parameters are the model itself. They represent the connections and the "intelligence" forged from the training data. When you download a model, you are downloading this parameter file.
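To make the data/parameters split concrete, here is a minimal sketch in Python. It is a toy, not how DeepSeek or any real model is trained: the dataset, the learning rate, and the helper names `train` and `predict` are all made up for illustration. What it shows is that the training data only shapes the parameters, and that the finished "model" is nothing but those numbers.

```python
# Toy illustration: "training data" shapes two parameters, then is thrown away.
# The data, learning rate and step count are invented for this example.

def train(examples, steps=5000, lr=0.01):
    """Fit y = w * x + b by plain gradient descent and return only (w, b)."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# The "corpus": 100 points that follow the rule y = 3x + 1.
data = [(x / 10, 3 * (x / 10) + 1) for x in range(100)]

w, b = train(data)
del data  # the data is gone; the "model" is now just two numbers


def predict(x):
    """Use the learned parameters on an input that was never in the data."""
    return w * x + b


print(w, b)           # should land close to 3 and 1
print(predict(20.0))  # should land close to 61
```

After `del data`, predictions still work, because everything the model needs lives in `w` and `b`. A real language model is the same idea with billions of parameters instead of two.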

With that distinction in mind, we can answer your question about the percentage and the difference in "intelligence".

What percentage is "data" and what percentage is "brain"?

Here's the surprising answer: The model you install and run is 100% "brain" and 0% "data".

The file you download (the model) is just the "brain" (the parameters). It takes up, for example, 4 GB and contains only numbers. It doesn't contain a single book, a Wikipedia article, or a single line of source code. It's an extremely complex mathematical structure.
The training "data" is gone. It was the terabytes of text used to sculpt and shape those 4 GB of parameters. Once training is complete, the original data is discarded. The model has "learned" from it, not memorized it (for the most part).
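As a rough sanity check on that 4 GB figure: a file of pure parameters has a size of about (number of parameters) × (bytes per parameter). The sketch below assumes a few common storage precisions and ignores file overhead. The function name `params_that_fit` is hypothetical.

```python
# Rule of thumb: file size ≈ number of parameters × bytes per parameter.
# The precisions below are common storage formats; real files add some overhead.

BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "int4": 0.5}


def params_that_fit(file_size_gb, precision):
    """Approximate parameter count, in billions, for a file of the given size."""
    total_bytes = file_size_gb * 1e9
    return total_bytes / BYTES_PER_PARAM[precision] / 1e9


for precision in BYTES_PER_PARAM:
    billions = params_that_fit(4, precision)
    print(f"4 GB at {precision}: ~{billions:.0f} billion parameters")
```

Under these assumptions, a 4 GB file corresponds to somewhere between about 1 and 8 billion parameters depending on the precision, which matches the size range of the mini-models discussed here.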

Thinking that a 4 GB model "contains" the terabytes of data it was trained on is like thinking that a 100 kg marble sculpture "contains" the 1,000 kg of marble blocks the sculptor used to practice and learn the craft. The final weight of the sculpture reflects only the acquired skill, not the raw material used for practice.

The real difference between a large model and a mini one

The fundamental difference is not a backpack of data that one carries and the other doesn't. The difference lies in the complexity and fineness of its parameters (its "brain").

1. Difference in "Brain Size" (Parameters):
It is enormous, and it is measured in the number of these numerical values:

Mini-models (like Phi-4-mini or Qwen-1.7B): They have between 1.7 billion and 4 billion parameters.
Large models (like DeepSeek-V3 or GPT-4): They are measured in hundreds of billions, or even trillions. DeepSeek-V3, for example, has 671 billion parameters in total (although it uses a technique to activate only part of them at a time).

Comparing DeepSeek-V3's 671 billion parameters with the 1.7 to 4 billion of the mini-models, the difference is a factor of roughly 170 to 400.
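The ratio can be checked with simple arithmetic, using only the parameter counts quoted in this conversation. The helper name `size_ratio` is hypothetical, and the counts are approximate.

```python
# Parameter counts as quoted in this conversation, in billions.
DEEPSEEK_V3_TOTAL = 671.0
MINI_MODELS = {
    "smallest mini-model quoted": 1.7,
    "largest mini-model quoted": 4.0,
}


def size_ratio(large_billions, small_billions):
    """How many times more parameters the large model has."""
    return large_billions / small_billions


for name, size in MINI_MODELS.items():
    ratio = size_ratio(DEEPSEEK_V3_TOTAL, size)
    print(f"DeepSeek-V3 vs {name} ({size}B): ~{ratio:.0f}x more parameters")
```

With these figures the ratio comes out at roughly 168x against a 4-billion-parameter model and roughly 395x against a 1.7-billion-parameter one.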

2. What does that difference mean in practice? It is a qualitative difference in "intelligence".

A small model is not simply a large model that "knows less data". Its "brain" itself is coarser. The difference in parameters translates into:

Depth of understanding: The large model can follow highly complex instructions with many conditions and nuances ("Write a poem about the melancholy of a robot, in the style of Borges, but using nautical metaphors"). The mini-model will get lost in long or subtle instructions.
Reasoning ability: This is the key difference. A large model can follow long logical chains, solve math problems with dozens of steps, and detect inconsistencies in an argument. A mini-model starts to fail after 3 or 4 steps of complex reasoning. It is brilliant at direct tasks, but not at deep reasoning.
Fewer hallucinations: The large model, having a finer "conceptual map", is much less prone to inventing things that sound plausible but are false. The mini-model is creative in a less controlled way and therefore hallucinates more.
Structured text generation: Do you need a perfectly nested JSON document or an essay with a complex narrative structure? The large model nails it. The mini-model may lose the format or ramble.

In short, a mini-model is like a Swiss Army knife: incredibly useful, portable, and perfect for 80% of everyday tasks (summarizing, simple research, short creative writing). A full-size model is like a precision engineer's workshop: slow, expensive, and cumbersome, but capable of building a Swiss watch. Intelligence isn't just a matter of quantity, but of the intricate quality of its internal connections.