February 20, 2021 · NLP

Natural language processing: what it does, what it fakes

Bhaskar Paratey
CEO & Founder

For decades, the smart approach to machine language understanding was to write down the rules — dictionaries, grammar parsers, hand-crafted logic — and it never really worked. Language is too slippery for rules. The same sentence shifts meaning with tone, context, who's speaking and what happened yesterday. The breakthrough was giving up on rules entirely: stop telling the machine how language works, show it an enormous amount of language, and let it learn the regularities itself.

That's the whole story of modern NLP, and it's worth being clear-eyed about what it bought us. Today's systems don't understand language the way you do. They've learned statistical patterns deep enough to translate, summarise, answer questions, fix grammar, and carry a surprising fraction of the conversations people now have with computers. Pattern, not comprehension. Hold onto that distinction — it predicts exactly where these systems shine and where they fall on their faces.
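To make "pattern, not comprehension" concrete, here's a toy sketch in Python. It's my illustration, not how any real product is built: `train_bigrams` and `generate` are hypothetical names, and the corpus is three invented sentences. The model counts which word follows which, then writes by sampling from those counts. Today's large systems use neural networks trained on vastly more text, but they are broadly in the same family: predict a plausible next word from patterns seen in training.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for every word, how often each other word follows it."""
    words = text.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows: dict[str, Counter], start: str, length: int, seed: int = 0) -> list[str]:
    """Write text by repeatedly sampling a next word in proportion to its count.

    Nothing here knows what a cat or a mat is; it only knows what tended to come next.
    """
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        options = follows.get(out[-1])
        if not options:  # dead end: this word was never followed by anything
            break
        candidates, counts = zip(*options.items())
        out.append(rng.choices(candidates, weights=counts, k=1)[0])
    return out


# Hypothetical toy corpus, invented for this example.
corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat chased the dog .")
model = train_bigrams(corpus)
print(model["sat"])  # Counter({'on': 2})
print(" ".join(generate(model, "the", 8)))
```

Even this tiny model can produce sequences like "the dog sat on the mat", which never appears in its training text: fluent, plausible, and grounded in nothing that actually happened. Scale the same idea up by many orders of magnitude and you get far better writing, with the same blind spot.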

Where they earn their keep, mostly out of sight:

Translation is the obvious one — over a billion translations a day through one service alone, now good enough to change how people travel and how diasporas stay in touch with home. Imperfect, and every professional translator will tell you about the moment the machine dropped something that mattered. Good enough to be life-changing anyway.

Autocomplete and autocorrect, NLP so well integrated you forget it's there. That's the mark of decent engineering: it doesn't feel clever because it never gets in your way.

Search, which has to grasp that "best Italian restaurant near me" means the food, not the country. The ranking that gets that right is NLP doing real work.

Spam filtering is, pound for pound, probably the highest-value NLP deployment on earth, and nobody has ever thanked it. The reason your inbox isn't drowning is decades of statistical text classifiers quietly winning a war you never see.

And accessibility — live captioning, text-to-speech, plain-language rewriting — which has genuinely changed daily life for people previous technology served badly. This is the part I'd defend hardest if the budget were being cut.

Where it still falls over, and you should know before you ship:

Low-resource languages. The big models train on trillions of words, overwhelmingly English. For many African, South Asian and indigenous languages the tools are markedly worse. Closing that gap is one of the most important and least glamorous problems in the field, and it won't close by itself.

Nuance, sarcasm, code-switching. People mix languages mid-sentence and lean on cultural half-references constantly. Systems still stumble here. They're improving — but so are we, and we've had a few millennia of practice.

Context over long stretches. Far better than ten years ago, but they still lose the thread across very long documents, across sessions, or when the context lives outside the text entirely.

A word on the large models, because everyone wants one. GPT, Claude, Llama and the rest are NLP turbo-charged, trained on roughly the whole readable internet, and they do things that were science fiction five years ago. They are also not oracles. They make things up. They carry the biases of their training data. They don't know things the way a librarian knows things — they model how humans write about things. Treat them as statistical writers and they're extraordinary tools. Treat them as search engines and they'll hand you confident, plausible, wholly invented nonsense, and you'll deserve what follows.

That mental model is the single most useful thing in this post. Statistical writer, not oracle. Get it right and language technology is one of the most quietly transformative things we've built. Get it wrong and you'll ship a liar with excellent grammar.

Bhaskar Paratey
CEO & Founder

Bhaskar founded Partech Systems after three decades of building software that had to work the first time — newsroom systems at Reuters, case-management for government departments, and a long run of enterprise projects since. He started the company because he was tired of watching good technology fail for boring, human reasons. He writes here about where AI actually earns its keep, and where it doesn't.