I mean, you’ve never seen a purple elephant with a tennis racket. None of that exists in the data set since elephants are neither purple nor tennis players. Exposure to all the individual elements allows for generation of concepts outside the existing data, even though they don’t exist in reality or in the data set.
Ok.
Try to get an image generator to create an image of a tennis racket, with all racket-like objects or relevant sport data removed from the training data.
Explain the concept to it with words alone, accurately enough to get something that looks exactly like the real thing. Maybe you can give it pictures, but one won’t really be enough; you’ll basically have to give it that chunk of training data you removed.
That’s the problem you’ll run into the second you want to realize a new game genre.
There are more forms of guidance than just raw words. Just off the top of my head, there’s inpainting, outpainting, controlnets, prompt editing, and embeddings. The researchers who pulled this off definitely didn’t do it with text prompts.
Obviously.
But at what point does that guidance just become the dataset you removed from the training data?
To get it to run Doom, they used Doom.
To realize a new genre, you’ll “just” have to make that game the old-fashioned way first.
The whole point is that it didn’t know the concepts beforehand, and no, it doesn’t become the dataset. What the model picks up from the training data is baked into its weights during training; once training ends, the weights are locked in and the dataset is never relevant again.
Or you could train a more general model. These things happen in steps; research is a process.
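To make the "weights are locked in" part concrete, here is a minimal sketch in plain Python. Everything in it is made up for illustration: `TinyModel`, the learning rate, and the toy data are assumptions, not anything from the Doom work. It only shows that training writes the weight while inference just reads it.

```python
class TinyModel:
    """Toy model y = w * x with a single weight (hypothetical, for illustration)."""

    def __init__(self):
        self.w = 0.0

    def train_step(self, x, y, lr=0.01):
        # Training is the only place the weight is written.
        grad = 2 * (self.w * x - y) * x  # derivative of (w*x - y)^2 with respect to w
        self.w -= lr * grad

    def predict(self, x):
        # Inference only reads the weight.
        return self.w * x


model = TinyModel()
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # samples of y = 2x
for _ in range(200):
    for x, y in data:
        model.train_step(x, y)

w_after_training = model.w
outputs = [model.predict(x) for x in (10.0, -5.0, 0.5)]  # inputs not in the data
assert model.w == w_after_training  # feeding it inputs did not change the model
```

This says nothing either way about how well a real model handles concepts missing from its training data; it only illustrates that guidance given at inference time never touches the weights.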
You are completely missing what I’m saying.
I know the input doesn’t alter the model, that’s not what I mean.
And “general” models are only “general” in the sense that they are massively bloated and still crap at dealing with shit that they weren’t trained on.
And no, “comprehending” new concepts by palette swapping something and smashing two existing things together isn’t the kind of creativity I’m saying these systems are incapable of.
What kind of creativity are you talking about then? I’ve also never heard of a bloated model. Which models are bloated?
Bloated, as in large and heavy. More expensive, more power hungry, less efficient.
I already brought it up. They can’t deal with something completely new.
When you discuss what you want with a human artist or programmer or whatever, there is a back-and-forth process where both parties explain and ask until comprehension is achieved, and this improves the result. The creativity on display is the kind that can unfold and realize a complex idea based on simple explanations even when it is completely novel.
It doesn’t matter if the programmer has played games with regenerating health before, one can comprehend and implement the concept based on just a couple of sentences.
Now how would you do the same with a “general” model that didn’t have any games that work like that in the training data?
My point is that “general” models aren’t a thing. Not really. We can make models that are really, really big, but they remain very bad at filling in gaps in reality that weren’t in the training data. They don’t start magically putting two and two together and comprehending all the rest.
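For reference, the regenerating health example really does fit in a couple of sentences: "health refills on its own once you have gone a few seconds without taking damage". A rough sketch of one way a programmer might turn that into code is below. The `Health` class, the five second delay, and the regen rate are all assumptions picked for illustration, not how any particular game does it.

```python
from dataclasses import dataclass


@dataclass
class Health:
    """Hypothetical regenerating-health component; all values are illustrative."""
    current: float = 100.0
    maximum: float = 100.0
    regen_delay: float = 5.0   # seconds without damage before regen starts
    regen_rate: float = 10.0   # health restored per second while regenerating
    since_hit: float = 0.0     # seconds elapsed since the last damage

    def damage(self, amount: float) -> None:
        # Taking damage lowers health and restarts the regen timer.
        self.current = max(0.0, self.current - amount)
        self.since_hit = 0.0

    def update(self, dt: float) -> None:
        # Called once per frame with the elapsed time in seconds.
        if self.current <= 0.0:
            return  # dead characters do not regenerate
        self.since_hit += dt
        if self.since_hit >= self.regen_delay:
            self.current = min(self.maximum, self.current + self.regen_rate * dt)


player = Health()
player.damage(40.0)   # health drops to 60
player.update(4.0)    # still inside the delay, no regen yet
player.update(1.0)    # delay reached, one second of regen applied
```

Details like whether regen starts exactly at the delay or one frame later are the kind of thing that gets settled in the back and forth described above.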
Do you have any examples of how they fail? There are plenty of ways to explain new concepts to models.
https://arxiv.org/abs/2404.19427 https://arxiv.org/abs/2406.11643 https://arxiv.org/abs/2403.12962 https://arxiv.org/abs/2404.06425 https://arxiv.org/abs/2403.18922 https://arxiv.org/abs/2406.01300