The most profitable computer vision system I've looked at in the last year watches loaves of sourdough come out of a bakery oven and flags the ones with an under-bloomed crust. One camera, one decision, one job. A person used to do it, and she was good, but it was her fifth task on the line and tired eyes miss things at the end of a shift. Now the camera flags, she confirms, and wastage has dropped. That is the entire system.
Every conference demo I've ever sat through shows the opposite: a city street, a swarm of bounding boxes, a voiceover about autonomous vehicles. Every deployment I've seen actually earn its budget looks like the bakery. That gap is the most important thing to understand before you scope a project, because it tells you exactly where vision is strong and where it quietly falls apart.
It is very good at three things. Spotting repeated anomalies in a controlled setting — same lighting, same angle, same subject, thousands of examples. Counting things people get bored counting — cells on a slide, fish in a pen, items on a shelf. And catching things people can't catch fast enough — a hairline crack, a drowsy driver, an early lesion.
It is bad at three things, and they're the mirror image. Anything needing common sense about what's outside the frame. Conditions that swing wildly — weather, light, framing. And any situation where being wrong is both rare and catastrophic, because that's exactly the case you'll never have enough training examples for.
Nearly every vision project that struggles has wandered from the first list into the second. Self-driving is hard not because cars are hard but because the open world is hard, and the model meets a situation it has never seen roughly every afternoon.
So here's what I tell people in the first meeting, and it rarely involves the model at all.
Start absurdly narrow. Don't build "quality inspection for the line." Build "defect type X on station 4." Ship it, prove it, then grow. Teams that try to boil the ocean ship nothing for a year.
Spend on the camera and the lighting before the network. I've rescued more stalled pilots with a better sensor and a cheap light panel than with any amount of model work. A clean, consistent image makes a mediocre model look brilliant; a bad image makes a state-of-the-art model useless.
Budget for drift, because it is not optional. Lighting shifts with the seasons. A new supplier's material reflects differently. The model degrades and nobody notices until the false-negative rate has been climbing for a month. Someone has to own re-checking it, on a schedule, forever.
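To make "re-checking it, on a schedule" concrete, here is a minimal sketch of one way to do it. It assumes someone hand-audits a small sample of items each week and records both the human verdict and the model's verdict. The names (`AuditSample`, `false_negative_rate`, `weeks_over_threshold`), the weekly cadence, and the 5-point tolerance are all hypothetical choices for illustration, not a standard.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class AuditSample:
    """One item a person re-checked by hand (hypothetical record shape)."""
    week: int              # week number in which the audit happened
    truly_defective: bool  # the human verdict, treated as ground truth
    model_flagged: bool    # whether the model flagged the item


def false_negative_rate(samples: Sequence[AuditSample]) -> Optional[float]:
    """Share of truly defective items the model failed to flag.

    Returns None when the sample contains no defective items,
    because the rate is undefined in that case.
    """
    defects = [s for s in samples if s.truly_defective]
    if not defects:
        return None
    missed = sum(1 for s in defects if not s.model_flagged)
    return missed / len(defects)


def weeks_over_threshold(
    samples: Sequence[AuditSample],
    baseline_fnr: float,
    tolerance: float = 0.05,  # assumed margin; pick one that fits your line
) -> List[int]:
    """Return the weeks whose false-negative rate exceeds baseline + tolerance."""
    by_week: Dict[int, List[AuditSample]] = {}
    for s in samples:
        by_week.setdefault(s.week, []).append(s)

    alerts = []
    for week in sorted(by_week):
        fnr = false_negative_rate(by_week[week])
        if fnr is not None and fnr > baseline_fnr + tolerance:
            alerts.append(week)
    return alerts
```

The baseline would be the false-negative rate you measured when the system was accepted. The point of the sketch is only that the comparison runs every week against hand-checked items, so a slow climb shows up as an alert instead of a surprise a month later.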
And get the operator into the design on day one. The person whose work you're augmenting will tell you in five minutes why your perfectly accurate system is useless on the floor. Ignore them and you'll ship something correct and unused — the most expensive outcome there is.
None of this is glamorous, and that's the point. The deployments that matter — a handheld retinal scanner catching diabetic retinopathy in a clinic with no specialist, a sorting line diverting tonnes from landfill, camera-trap footage turning six months of a student's counting into one overnight run — none of them trended anywhere. They just worked, on a narrow problem, with good light, and someone watching for drift.
If you're starting out: pick the narrowest visible problem you have, spend on optics, ship something small fast, and let the people closest to the work decide what's worth building next. Do that and you'll be ahead of most teams with ten times the budget.