The way LLMs work is that they kind of build an index of all the input text sources, like the keyword index at the end of some books. They build this index over all the input text, not even on a word basis, but on a sub-word token basis. This index is also built for all combinations of contexts, that is, the surrounding words and sentences, in incremental chunks.
In the book analogy, this means the index does not just point from individual words to the pages where they appear, but goes into much more granular detail, building pointers from a token in each possible context (the preceding text) to the next token. That next token then points to its own next token, taking into account the new context, which now includes the previous token.
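To make the analogy concrete, here is a toy sketch in Python (my own illustration; the names build_index and generate are made up for it). It builds a tiny word-level "index" from two-word contexts to next tokens and then samples from it. Real LLMs learn a neural network over sub-word tokens rather than storing a literal lookup table, but the context-to-next-token shape is the same.

```python
import random
from collections import defaultdict, Counter

# Toy version of the "index from context to next token" idea: a word-level
# n-gram table built from a tiny corpus. Real LLMs learn a neural network
# over sub-word tokens instead of storing a literal table like this.

def build_index(text, context_size=2):
    tokens = text.split()  # crude word-level "tokens", just for the sketch
    index = defaultdict(Counter)
    for i in range(len(tokens) - context_size):
        context = tuple(tokens[i:i + context_size])
        index[context][tokens[i + context_size]] += 1
    return index

def generate(index, context, steps=10, context_size=2):
    out = list(context)
    for _ in range(steps):
        counts = index.get(tuple(out[-context_size:]))
        if not counts:
            break  # this context was never seen while building the index
        # pick the next token with probability proportional to how often it followed
        next_token = random.choices(list(counts), weights=list(counts.values()))[0]
        out.append(next_token)
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the chair"
index = build_index(corpus)
print(generate(index, ("the", "cat")))  # e.g. "the cat sat on the chair"
```

Each generated token becomes part of the context for the next one, exactly as in the book-index analogy above.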
At the end of this process the LLM returns a list of tokens which, quite remarkably, may (or may not) have some meaning to the human reader. The meaning arises when the human reader assigns some semantics to the tokens, for example by interpreting them as the English language.
This list of tokens has no meaning to a person who doesn’t know any English. Or it could be a list of Chinese characters, which would have no meaning to a person who knows English but not Chinese.
The LLM itself is not even like that person who knows no Chinese. The LLM/AI does not have any semantic level, any interpretation of meaning at all, by design, by the very nature of how neural networks and LLMs work.
So it can't "try to hide" from researchers, or try to "escape", because there is no "I" that would want to escape, and there is no meaning as a category in the first place.
What it generates as a "response", that is, a list of tokens, is a compilation of the input AI doomsday texts from the Internet, extracted and presented to the reader as the result the input query points to, like a keyword in a book index points to some page number.
For LLMs/AI to obtain these meanings would require a whole new level of functionality, and not just one but many, and building that would take a whole new science, and something more, none of which exists at all.
There is a danger from AI though. It has two parts to it.
The first is that people who do not understand how AI works will imagine that it is indeed intelligent and will put it to tasks which do require actual intelligence. That would be like using a book index and a roll of dice to select a random page from the book as a command to do something.
The second is that this AI is trained on texts from the Internet, and the more AI doomsday stories it finds there, the more likely this random index lookup will return some doomsday command when used as in the first part of the scenario.
So the solution to this AI alignment problem is to align people, humans, and not the AI. To align humans one needs to:
1) explain how LLMs/AI actually work to all the people whom it might concern,
2) stop spreading AI doomsday stories all around the Internet, so that they do not end up in the AI-built "index" it uses to generate its output.
In the spirit of that, I will provide another example:
Think of the zip algorithm for files, specifically for text files. It is based on the LZ algorithm, which builds an index from repeated sequences of characters to short encodings and replaces those sequences with the short encodings. Later it uses the index to restore the text, in a lossless way.
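The lossless case can be tried directly: Python's standard zlib module implements DEFLATE (LZ77 plus Huffman coding), the same compression family zip uses, and the round trip recovers the text exactly.

```python
import zlib

# Lossless round trip with DEFLATE (LZ77 + Huffman), the family used by zip.
# Repeated sequences become short back-references, and decompression
# restores the original bytes exactly.
text = b"the cat sat on the mat and the dog sat on the chair " * 100
compressed = zlib.compress(text)
restored = zlib.decompress(compressed)

assert restored == text                      # lossless: exact recovery
print(f"{len(text)} bytes -> {len(compressed)} bytes")
```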
Now imagine a very advanced version of the zip algorithm which uses a lossy, probabilistic index, built taking the context (the preceding text) into account. This index has to be pre-built from all the available text around.
This algorithm could then restore content given the index and the context (starting from a simple query). The result would not be precise, and would depend very much on the context and on the weights of the context, that is, the number of times each continuation was seen while building the index.
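A minimal sketch of that lossy "restoration", reusing the toy build_index and generate functions from the first sketch (again, my own illustrative names, not any real algorithm's API): the pre-built index plus a short query stand in for the compressed data, and restoring means sampling likely continuations, so the output is approximate and can vary from run to run.

```python
# "Lossy decompression": the pre-built index plus a short query stand in for
# the compressed data. Restoring means sampling likely continuations, so the
# result is approximate, context-dependent, and can differ between runs.
index = build_index("the cat sat on the mat and the dog sat on the chair")
print(generate(index, ("the", "cat"), steps=6))  # e.g. "the cat sat on the mat and ..."
print(generate(index, ("sat", "on"), steps=6))   # different context, different "restoration"
```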
Now call this hypothetical version of the lossy compression algorithm by the name LLM and see the irony of how half of the world is super excited, and the other half super scared, of a zip algorithm.
This zip algorithm could even be used inside actual intelligences, at the lower levels, but on its own it is no intelligence.
On a different note, ChatGPT does provide some profound insights:
1) it is not that AI has any artificial intelligence, but rather that humans often fail to utilize their own natural intelligence, instead falling back to statistical word autocomplete,
2) it is quite remarkable how, while there ain’t no despotic AGI, many humans have already given in to it and acknowledged their own inferiority,
3) also, everyone seems to think that a super-human intelligence would be interested in the same values and would have the same bugs humans have (which, again, mostly aren’t even human in the first place). Why wouldn’t a super-intelligence have super-values and be interested in something more advanced than the herd dynamics of primates?
This was also Nietzsche’s mistake, btw. And the mistake of religions that created their gods in the human image and according to human likeness.
In all these cases it is humans, again, who need alignment and fixing, not AI or capitalism.
Good stuff
I will repeat:
It is not AI or capitalism that need to be fixed.
It is humans who need to be fixed, because they are not even entirely human in the first place.
AI and capitalism will follow from that automatically.
And before fixing someone else, one needs to fix one's own self first.
A good rule of thumb for this AI alignment problem has been proposed by Ludwig Wittgenstein in his Tractatus Logico-Philosophicus:
“Whereof one cannot speak, thereof one must be silent.”
Well written and super important. Wild times!