One of the ways to mitigate the core issue of an LLM, which is confabulation/inaccuracy, is to have a layer of either confirmation or simply forgiveness intrinsic to the task. Use the favor test. If you asked a friend to do you a favor and perform these actions, they’d give you results that you can either/both look over yourself to confirm they’re correct enough, or you’re willing to simply live with minor errors. If that works for you, go for it. But if you’re doing something that absolutely 100% must be correct, you are entirely dependent on independently reviewing the results.
But one thing Apple is doing is training LLMs with action semantics, so you don’t have to think of its output as strictly textual. When you’re dealing with computers, the term “language” is much looser than you or I tend to understand it. You can have a “grammar” that is inclusive of the entirety of the English language but also includes commands and parameters, for example. So it will kinda speak English, but augmented with the ability to access data and perform actions within iOS as well.
How would it do that? Would LLMs not just take input as voice or text and then guess an output as text?
Wouldn’t the text output that is suppose to be commands for action, need to be correct and not a guess?
It’s the whole guessing part that makes LLMs not useful, so imo they should only be used to improve stuff we already need to guess.
One of the ways to mitigate the core issue of an LLM, which is confabulation/inaccuracy, is to have a layer of either confirmation or simply forgiveness intrinsic to the task. Use the favor test. If you asked a friend to do you a favor and perform these actions, they’d give you results that you can either/both look over yourself to confirm they’re correct enough, or you’re willing to simply live with minor errors. If that works for you, go for it. But if you’re doing something that absolutely 100% must be correct, you are entirely dependent on independently reviewing the results.
But one thing Apple is doing is training LLMs with action semantics, so you don’t have to think of its output as strictly textual. When you’re dealing with computers, the term “language” is much looser than you or I tend to understand it. You can have a “grammar” that is inclusive of the entirety of the English language but also includes commands and parameters, for example. So it will kinda speak English, but augmented with the ability to access data and perform actions within iOS as well.