How MassMutual and Mass General Brigham turned AI pilot sprawl into production results

Enterprise AI programs rarely fail because of bad ideas. More often, they stall in ungoverned pilot mode and never reach production. At a recent VentureBeat event, technology leaders from MassMutual and Mass General Brigham explained how they avoided this trap, and what the results look like when discipline replaces sprawl.

At MassMutual, the results are concrete: a 30% increase in developer productivity, IT help desk resolution times cut from 11 minutes to one, and customer service calls shortened from 15 minutes to just one or two.

"We always start with: Why do we care about this problem?" Sears Merritt, MassMutual's head of enterprise technology and experience, said at the event. "If we solve the problem, how will we know we've solved it? And what value is associated with that?"

Defining metrics, establishing strong feedback loops

MassMutual, a 175-year-old company serving millions of policyholders and customers, has put AI into production across the business—customer support, IT, customer acquisition, underwriting, service, claims and other areas.

Merritt said his team follows the scientific method, starting with a hypothesis and testing whether there is an outcome that will tangibly move the business forward. Some ideas are great but may not be "business feasible" because of factors such as a lack of data or access, or regulatory constraints.

"We're not going to move forward with an idea until we're crystal clear on how we're going to measure and define success," Merritt said.

Ultimately, it is up to individual departments and leaders to define what quality means: they choose a metric and set the minimum level of quality that must be met before a tool is put into the hands of teams and partners.

This starting point creates a fast feedback loop. "The things we find that slow us down are where there's no shared clarity about what outcome we're trying to achieve," which can lead to confusion and constant readjustment, Merritt said. "We don't go into production until we have a business partner who says, 'Yes, this works.'"

His team is strategic about evaluating emerging tools and is "extremely rigorous" when testing and measuring what "good" means. For example, they run trust evaluations to reduce hallucination rates, set thresholds and evaluation criteria, and monitor features and outputs for drift.
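To make the idea of thresholds and drift monitoring concrete, here is a minimal sketch in Python. It is not MassMutual's implementation; the class `QualityGate`, the function `drift_exceeded`, the metric name `grounded_answer_rate` and every number below are hypothetical, chosen only to illustrate a minimum-quality gate and a simple mean-shift drift check.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class QualityGate:
    """Hypothetical release gate: a metric name plus its minimum acceptable score."""
    metric: str
    minimum: float

    def passes(self, scores: list[float]) -> bool:
        # An empty evaluation set can never justify a release.
        return bool(scores) and mean(scores) >= self.minimum


def drift_exceeded(baseline: list[float], recent: list[float], tolerance: float) -> bool:
    """Flag drift when the recent mean moves more than `tolerance` away from the baseline mean."""
    return abs(mean(recent) - mean(baseline)) > tolerance


# Illustrative values only (assumptions, not figures from the event).
gate = QualityGate(metric="grounded_answer_rate", minimum=0.95)
release_ok = gate.passes([1.0, 1.0, 0.9, 1.0])
drifted = drift_exceeded([0.97, 0.96, 0.98], [0.90, 0.91, 0.89], tolerance=0.05)
print(release_ok, drifted)
```

In practice a team would likely use a statistical test rather than a raw difference of means, but the shape is the same: agree on the metric and the threshold first, then let the check decide.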

Merritt also operates with a no-commitment policy, meaning the company does not lock itself into any particular model. It has what he calls an "incredibly heterogeneous" technology environment, combining best-of-breed models with COBOL-powered mainframes. This flexibility is no accident. His team built common layers of services, microservices and APIs that sit between the AI layer and everything below it, so that when a better model comes along, replacing the old one does not mean starting over.

As Merritt explained, "the best of breed today could be the worst of breed tomorrow, and we don't want to be left behind."
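One way to picture that kind of insulating layer is a small interface that business code depends on, with concrete models plugged in behind it. The sketch below is an assumption about the general pattern, not a description of MassMutual's services; `TextModel`, `EchoModel`, `UpperModel` and `ModelGateway` are hypothetical names, and the two "models" are trivial stand-ins so the example runs without any vendor SDK.

```python
from typing import Protocol


class TextModel(Protocol):
    """The only contract business code is allowed to depend on."""
    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in for one vendor's model (hypothetical)."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UpperModel:
    """Stand-in for a newer model that replaces the first (hypothetical)."""
    def complete(self, prompt: str) -> str:
        return prompt.upper()


class ModelGateway:
    """Common service layer: callers use the gateway and never touch a model directly."""

    def __init__(self, model: TextModel) -> None:
        self._model = model

    def swap(self, model: TextModel) -> None:
        # Replacing the model is a one-line change; callers are unaffected.
        self._model = model

    def summarize_ticket(self, ticket: str) -> str:
        return self._model.complete(f"Summarize: {ticket}")


gateway = ModelGateway(EchoModel())
before = gateway.summarize_ticket("vpn down")
gateway.swap(UpperModel())
after = gateway.summarize_ticket("vpn down")
print(before, "|", after)
```

The design choice is that `summarize_ticket` and its callers never change when the model does; only the object handed to `swap` is different.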

Weeding instead of letting a thousand flowers bloom

Mass General Brigham (MGB), for its part, took a more spray-and-pray approach, at least at first.

About 15,000 researchers in the nonprofit health system have used AI, ML and deep learning over the past 10 to 15 years, CTO Nalan "Sri" Sriraman said at the same VB event.

But last year, he made a bold choice: His team shut down the scattered, ungoverned AI pilots. At first, "we were following the thousand-flowers-blooming [methodology], but we didn't have a thousand flowers, we probably had a few dozen flowers trying to bloom," he said.

Like Merritt's team at MassMutual, MGB stepped back and took a more holistic view, examining why it was building particular tools for specific departments and workflows. The team asked what capabilities it wanted and needed, and what investments those capabilities would require.

Sriraman's team also talked to its major platform vendors (Epic, Workday, ServiceNow and Microsoft) about their roadmaps. It was a "pivotal moment," he noted, as the team realized it was building internal tools that vendors already provided or planned to release.

As Sriraman said, "Why are we building it ourselves? We're already on the platform. It's going to be in the workflow. Use it."

However, the market is still in its infancy, which can lead to difficult decisions. "The analogy I'll give is when you ask six blind men to touch an elephant and say what does that elephant look like?" Sriraman said. "You'll get six different answers."

There was nothing wrong with that, he noted; it’s just that everyone is discovering and experimenting as the landscape continues to change.

Instead of tolerating a Wild West environment, Sriraman's team is rolling out Microsoft Copilot to users across the business and using a "small landing zone" where they can safely test more complex products and control token usage.

They have also begun to "deliberately embed AI champions" in business groups. "It's kind of like the opposite of letting a thousand flowers bloom, carefully planting and nurturing them," Sriraman said.

Observability is another important consideration; he described real-time dashboards that track model drift and safety and allow IT teams to manage AI "a little more pragmatically." Health monitoring is critical for AI systems, he noted, and his team has established principles and policies around AI use, along with least-privilege access.

In clinical settings, safeguards are absolute: AI systems never issue the final decision. "There will always be a physician or physician assistant in the loop to close the solution," Sriraman said. He cites the generation of radiology reports as one area where AI is heavily used, but where the radiologist always signs off.

Sriraman was clear about the don'ts: "Don't put PHI [protected health information] into Perplexity. So simple, right?"

Just as important, there must be safety mechanisms. "We need a big red button, kill it," Sriraman emphasized. "We don't put anything into the operational setup without that."
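A "big red button" can be as simple as a shared flag that every automated step checks before it runs. The following Python sketch is one hedged illustration of that idea, not MGB's mechanism; `KillSwitch` and `run_agent_steps` are hypothetical names, and the steps are placeholder functions.

```python
import threading
from typing import Callable


class KillSwitch:
    """Hypothetical 'big red button': once tripped, no further steps may start."""

    def __init__(self) -> None:
        self._tripped = threading.Event()

    def trip(self) -> None:
        self._tripped.set()

    @property
    def tripped(self) -> bool:
        return self._tripped.is_set()


def run_agent_steps(steps: list[Callable[[], str]], kill_switch: KillSwitch) -> list[str]:
    """Run steps in order, checking the switch before each one."""
    completed: list[str] = []
    for step in steps:
        if kill_switch.tripped:
            break  # stop immediately; remaining steps never start
        completed.append(step())
    return completed


switch = KillSwitch()


def draft_report() -> str:
    return "drafted"


def operator_hits_button() -> str:
    switch.trip()  # simulates a human pressing the button mid-run
    return "stopped by operator"


def send_report() -> str:
    return "sent"


results = run_agent_steps([draft_report, operator_hits_button, send_report], switch)
print(results)
```

Because `threading.Event` is thread-safe, the same switch could be tripped from a monitoring thread or an operator console while the loop is running; the loop only needs to check it before starting each step.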

Ultimately, while agentic AI is a transformative technology, the enterprise approach to it should not be dramatically different. "There is nothing new about it," Sriraman said. "You can replace the word BPM [business process management] from the 1990s and 2000s with AI. The same concepts apply."
