Author:

Waleed Mohsen

When it comes to healthcare quality, most leaders are just guessing

Ask any healthcare executive and they'll (hopefully) say care quality is a top priority. They'll likely point to a long list of protocols, accreditations, trainings and surveys to show their organization is doing all it can to ensure patient safety, drive positive outcomes and generally give patients the best possible experience.

But ask how they know that those rules are being followed, that those trainings are sticking, that the feedback is reliable…and the conversation may get uncomfortable.

The truth is, for most healthcare organizations, care quality is less a verified reality than an educated guess. Any responsible organization will have rules in place. But far fewer can actually point to evidence that adherence is consistent across providers and patient interactions.

Now AI is in exam rooms and on the other end of telehealth calls, documenting encounters and interacting directly with patients. And it's scaling quickly. Yet the gap between protocols and proof remains massive.

Flying fast and blind isn’t a good idea in any field, much less when patient lives hang in the balance.

But if everyone agrees safety and quality are important, why can’t more organizations show they’re actually keeping a close eye on it?

The data on limited data

So how bad is the gap between assumption and reality? The numbers are stark.

Based on a Verbal survey, nearly 70% of healthcare organizations conduct quality assurance on patient interactions once per month or less. Nearly one in five have no formal QA program at all. And even those that are auditing interactions only review a fraction of them. 

In the conversations I’ve had with dozens of leaders at major health systems, I’ve found that even in well-resourced organizations, manual quality and compliance auditing rarely goes beyond 5% coverage. That means 95% or more of what actually happens between providers and patients is simply never reviewed.

But it gets worse: Based on Verbal’s survey, 96% of QA reviews rely on clinical chart notes rather than the actual interaction (audio recording or transcript). So most organizations are: A) Only looking at a tiny fraction of interactions and B) Only measuring adherence by checking the clinician's self-reported account of the encounter.

In short: A big chunk of healthcare orgs aren’t even looking. And those that are can’t see much.

Imagine a manufacturing plant that checks one widget out of every thousand and uses that sample to certify the entire production line. In most industries, that would be considered reckless. In healthcare, it's standard practice.

Let me be clear, though: This doesn’t actually surprise me. 

The traditional model of manual auditing was never built to scale. A single chart auditor reviewing 20 charts a day cannot keep pace with a platform facilitating tens of thousands of patient interactions a month. The math has never made sense.

AI and workforce shortages: A perfect storm

Most clinical and operational leaders are acutely aware that their QA coverage is inadequate. But the constraint has always felt structural. Not enough staff, not enough time, not enough budget to do more than a token sample.

Unfortunately, the problem's only getting worse.

The industry is facing a projected shortage of tens of thousands of nurses, with hospital RN turnover hovering around 16%. As experienced clinicians leave, institutional knowledge walks out the door with them. 

New staff are placed in high-acuity roles without adequate mentorship, and studies show that even moderate turnover correlates with higher rates of missed care, from patient feeding to medication administration. One study even found nurses in high-turnover environments missed more than a third of documentation tasks.

Now, partly in response to these labor shortages, AI has entered the picture, promising to reduce administrative burden, curb "pajama time," and expand access to care.

To be clear, AI scribes, documentation tools, and care agents are delivering real value. But these tools also introduce a new category of risk that most organizations are simply not yet equipped to manage. 

For example, while AI scribes have been shown to have lower error rates than older speech-recognition dictation systems, they also come with unique failure modes: hallucinations, critical omissions, misattributions and misinterpretations. Many of these errors appear plausible yet have no basis in reality, a risk that is practically unavoidable with large language models. In fact, one study found that nearly 40% of AI scribe hallucinations were harmful or clinically concerning in some way.

Phantom exams, undocumented diagnoses and other errors are being dropped straight into medical records, threatening organizational survival and, more importantly, patient safety.

And while AI tool vendors may promise “99% accuracy,” research has shown that AI models have a hard time self-correcting, with an average blind spot rate of nearly 65%. 

In other words, models are grading their own homework, and they’re still failing. 

It makes sense, then, that bodies like the Coalition for Health AI (CHAI) and URAC are formalizing accreditation frameworks that explicitly reject vendor self-attestation.

Now think back to those QA numbers: nearly 20% of organizations have no program whatsoever, and even well-resourced organizations leave 95% of interactions unreviewed. And that's only counting human-to-human interactions.

The scale promised by AI could easily turn validating healthcare quality from a serious problem into an urgent crisis.

A call for independent observability

High reliability organizations are preoccupied with failure. They treat near-misses as signals, not lucky saves.

But you can't manage what you can't see. 

Clinical quality oversight needs to move from periodic, retrospective sampling to continuous, independent monitoring. 

Not because regulators are demanding it (though increasingly they are), but because the alternative is healthcare leaders who cannot answer a fundamental question: Is the care we're delivering consistent with the standards we've set?

The tools to monitor care quality at scale — across human providers and AI agents alike — exist. 

The question is: Will we treat closing the gap like a strategic priority? Or cross our fingers and wait until something goes wrong?


Co-founder & CEO at Verbal | UCSF Rosenman Innovator
