If an LLM can’t be trusted with a fast food order, I can’t imagine what it is reliable enough for. I really expected this to be the easy use case for these things.
It sounds like most orders still worked, so I guess we’ll see if other chains come to the same conclusion.
Capping waters fixes that one specific issue but not the problem.
A suspicious order isn’t easy to define, and nobody who has ever worked in software development would underestimate the infinite ways a user can break software.
There are machine learning algorithms for anomaly detection, though. They actually work decently well because exploits like this do in fact differ significantly from regular orders. Because they treat every anomaly as an attempted exploit, their false negative rate is rather low, while their false positive rate can be a bit higher.
Taco Bell can build a decently large training set from all its recorded orders (which must all be valid and non-malicious), so they shouldn’t have much trouble developing such a model.
If an anomaly is detected, make a human verify it is indeed an irregular order.
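To make that concrete, here is a minimal sketch of the "flag it, then let a human check" idea. It is not anything Taco Bell actually runs; the function names, the sample history, and the threshold of 6 are all made up for illustration. It scores an order by how far its total item count sits from the historical median, measured in units of median absolute deviation (MAD), which is a simple stand-in for a real anomaly detection model.

```python
from statistics import median


def fit_baseline(order_sizes):
    """Return (median, MAD) of historical order sizes (items per order)."""
    med = median(order_sizes)
    mad = median(abs(size - med) for size in order_sizes)
    return med, mad


def is_anomalous(size, baseline, threshold=6.0):
    """Flag an order whose size is more than `threshold` MADs from the median.

    The threshold is an assumed value, not a tuned one.
    """
    med, mad = baseline
    # Guard against a zero MAD (e.g. every historical order had the same size).
    scale = mad if mad > 0 else 1.0
    return abs(size - med) / scale > threshold


def route_order(size, baseline):
    """Send anomalous orders to a person instead of rejecting them outright."""
    return "human_review" if is_anomalous(size, baseline) else "auto_accept"


# Hypothetical history: number of items in past orders.
history = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12]
baseline = fit_baseline(history)

print(route_order(4, baseline))
print(route_order(18000, baseline))
```

The point of routing to `human_review` rather than cancelling is that a false positive only costs a few seconds of someone's attention.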
Surely if the person making the order sees 18,000 waters they would think, “hold on, this doesn’t seem right, maybe I should ask the customer if they really want 18,000 waters”?
The same applies for the ice cream with bacon on it which was mentioned in the article. I believe a lot of these could be resolved with a bit of common sense.
If you think bacon on ice cream is weird enough to cancel an order, I can only imagine you’ve never worked a customer service job.
Does it, though? Unlike the 18,000 waters, if I were working a drive through I wouldn’t even blink at an order for bacon ice cream. Heck, I might make a little extra to try it for myself!
Sure, in the most extreme cases it would be obvious to the crew. But simply making mistakes at a higher rate than humans will result in a lot of unhappy customers.
Sure, but how do you distill this into a rule a computer can follow? “Suspicious” is not an objectively measurable thing that a program can just check against.
I think the easiest way would be to collect order data for at least a good number of months, if not a couple of years, and use it as a baseline of what a typical human order looks like. Anything that deviates too far from that baseline gets handled by a human until someone can validate it as a good order. I imagine you could get false positives for new menu items, though, unless you set a reasonable rule for items that have never appeared in the dataset before.
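Roughly what I have in mind, as a sketch only: every name and number here (`build_baseline`, `needs_human`, the `slack` multiplier, the `new_item_cap`) is my own assumption, and a real system would want something smarter than "largest quantity ever seen". It builds a per-item baseline from past orders and applies a separate conservative cap to items that have never shown up in the data.

```python
from collections import defaultdict


def build_baseline(orders):
    """Map each item to the largest quantity seen in any historical order."""
    max_qty = defaultdict(int)
    for order in orders:
        for item, qty in order.items():
            max_qty[item] = max(max_qty[item], qty)
    return dict(max_qty)


def needs_human(order, baseline, slack=2, new_item_cap=5):
    """Return True if any line item deviates too far from the baseline.

    slack: how many times the historical maximum is still accepted (assumed).
    new_item_cap: quantity allowed for items missing from the data (assumed),
    so a brand-new menu item does not trigger a review on every order.
    """
    for item, qty in order.items():
        if item not in baseline:
            if qty > new_item_cap:
                return True
        elif qty > baseline[item] * slack:
            return True
    return False


# Hypothetical historical orders: item name -> quantity.
past_orders = [
    {"water": 1, "taco": 3},
    {"taco": 10},
    {"water": 4},
]
baseline = build_baseline(past_orders)

print(needs_human({"water": 18000}, baseline))
print(needs_human({"taco": 12}, baseline))
print(needs_human({"new_burrito": 2}, baseline))
```

Once a new item has a few weeks of real orders behind it, it just becomes part of the baseline like everything else.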
There is a very finite number of ways to mess with this. They just need a button that sends a report to the engineers describing how the system got messed with, and eventually they’ll have a complete list. I really doubt it’d take long to iron out the vast majority of the ways people can think of.
This isn’t something you can input arbitrary text into. It’s fixed, so that joke doesn’t apply: you can’t do an SQL injection here.
Close one, a joke was related to but not a perfect match for the present situation. Something terrible could have happened like… Uh…
Let me get back to you on that.
Sounds like you’ve never programmed before.
If what I was saying was false, there would be the same issues with the touchscreen ones.