A calculator app is also incapable of working with letters; does that show that the calculator is unreliable?
What it shows, albeit poorly, is that LLMs offer confident answers in situations where those answers are likely wrong. But it'd be much better to demonstrate that with examples that aren't based on inherent technological limitations.
The difference is that Google decided this was a task best suited for their LLM.
If someone had sought out an LLM specifically for this question, and Google hadn't marketed their LLM as an assistant you can ask questions, you'd have a point.
But that’s not the case, so alas, you do not have a point.