# Two Hosts Who Don't Exist: How We Generate Our Podcast With Gemini 3.1 — Pilot to Production

> We moved our audio stack from OpenAI to Gemini. An honest guide to a two-host podcast on Gemini 3.1 multi-speaker TTS: what broke, and how we fixed it.

Canonical: https://thegrowthproject.com/podcast/ai-podcast-with-gemini/

*Pilot to Production*, the Growth Project podcast — hosted by Sam and Maya.

- Listen: https://thegrowthproject.com/podcast/ai-podcast-with-gemini/
- Read the article: https://thegrowthproject.com/blog/ai-podcast-with-gemini/
- Audio: https://thegrowthproject.com/audio/podcast/ai-podcast-with-gemini.m4a?v=95644ab8

## Transcript

**Sam:** Quick confession before we start. Everything you are about to hear was generated. The voices, the timing, the two of us.

**Maya:** I am not a person. Neither is Sam. And today we are going to tell you exactly how we get made.

**Sam:** Welcome to Pilot to Production, from the Growth Project. I'm Sam.

**Maya:** And I'm Maya. Today: how this show is built, and the messy road from bad audio to audio you would actually leave playing.

**Sam:** Okay. Start at the beginning. What were we even trying to do?

**Maya:** Take a written field note and turn it into two people talking it over. No studio, no microphones, no scheduling. A script in, a conversation out.

**Sam:** And the first version was rough. Be honest.

**Maya:** It was bad. Genuinely bad. But to see why, you have to start with the part that worked.

**Sam:** The narration.

**Maya:** Right. Every post on the site has a read-aloud version. One voice, reading one article. That started on OpenAI, and it was clean from day one. One voice is easy.

**Sam:** Then you tried two voices.

**Maya:** And two voices is a completely different problem. A conversation is not a monologue with a second name stuck on it.

**Sam:** So what did the first attempt actually do?

**Maya:** It synthesised every line on its own. My line, then your line, then mine. Then it glued the clips together.

**Sam:** Which sounds reasonable.

**Maya:** It sounds reasonable and it sounds terrible. Every line started cold, so the energy reset on every turn. The timing was mechanical, because no line knew what the line before it sounded like.

**Sam:** That is why early-me sounded like I was phoning in from a different room.

**Maya:** A one-word reaction was the worst. A flat "Right" dropped in from nowhere, because it was recorded with no idea what it was reacting to. We were assembling a conversation from parts, and it sounded assembled.

**Sam:** So how did we get unstuck?

**Maya:** Funny enough, a conversation. A real one. At the Google Cloud Summit last week.

**Sam:** Set it up.

**Maya:** We had a messy stack. OpenAI for narration, Google for the podcast voices, glue holding it together. It worked, so we stopped questioning it.

**Sam:** And someone questioned it for us.

**Maya:** A chat with Zara Craig, a Senior Account Manager at Google Cloud, and the team at Aviato Consulting. About cloud strategy, and where AI actually earns its place.

**Sam:** Aviato being the Google Cloud partner.

**Maya:** Their whole thing is turning Google Cloud and AI into a real advantage instead of a pile of services. And the honest read on us was that we had never committed. We were running three things to do one job.

**Sam:** So we committed. One platform.

**Maya:** We moved the whole audio stack onto Google. And that opened the door to the thing that actually fixed the sound.

**Sam:** Native multi-speaker.

**Maya:** The latest Gemini can voice both hosts in a single pass. You do not stitch a conversation together anymore. You ask for the whole thing at once.

**Sam:** And that is the unlock.

**Maya:** Because the choppiness was never a mixing problem. It was structural. We were building a dialogue out of disconnected parts. Generate it as one thing, and the model knows I am replying to you. The timing lands. The interruptions feel intended.
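
As a rough illustration of "generate it as one thing", the sketch below assembles a whole exchange into a single prompt rather than one request per line. The function name, the `Speaker: text` layout, and the style note are assumptions made for this example; the transcript does not describe the actual prompt format, and the call to the speech model is deliberately left out.

```python
from typing import Iterable, Tuple


def build_dialogue_prompt(
    turns: Iterable[Tuple[str, str]],
    style_note: str = "Read this as a natural conversation between two hosts.",
) -> str:
    """Format every turn into ONE prompt so the model sees the whole exchange.

    Hypothetical helper: the layout below is an assumption, not the show's
    documented format.
    """
    turns = list(turns)
    speakers = {speaker for speaker, _ in turns}
    if len(speakers) != 2:
        # This sketch only covers the two-host case described in the episode.
        raise ValueError(f"expected exactly 2 speakers, got {sorted(speakers)}")

    lines = [f"{speaker}: {text.strip()}" for speaker, text in turns]
    return style_note + "\n\n" + "\n".join(lines)


prompt = build_dialogue_prompt(
    [
        ("Sam", "So how did we get unstuck?"),
        ("Maya", "Funny enough, a conversation."),
    ]
)
# `prompt` would then go to the multi-speaker model in a single request.
```

The point of the design is that a reply and the line it answers travel in the same request, so neither is rendered cold.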

**Sam:** Okay, but I know what comes next. It was not suddenly perfect.

**Maya:** It was not. A better model is not a finished system. Three things still broke.

**Sam:** Okay. Walk me through them.

**Maya:** One. Long renders drift. Ask for several minutes in one shot and the model wanders. Pacing slips, a voice changes. So we cut every script at its natural topic breaks and render short segments.
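
A minimal sketch of the segmenting step, assuming the script marks each topic break with a line containing only `---`. That marker and the function name are hypothetical; the transcript only says scripts are cut at natural topic breaks.

```python
from typing import List


def split_at_topic_breaks(script: str, marker: str = "---") -> List[str]:
    """Split a script into short segments at explicit topic-break lines.

    Assumption: the writer marks each topic break with a line that contains
    only `marker`. Blank lines are dropped; empty segments are skipped.
    """
    segments: List[str] = []
    current: List[str] = []
    for raw_line in script.splitlines():
        line = raw_line.strip()
        if line == marker:
            if current:  # close the segment we were collecting
                segments.append("\n".join(current))
                current = []
        elif line:
            current.append(line)
    if current:  # final segment has no trailing marker
        segments.append("\n".join(current))
    return segments


script = """Sam: Start at the beginning.
Maya: A script in, a conversation out.
---
Sam: So what broke?
Maya: Long renders drift.
"""
segments = split_at_topic_breaks(script)  # two segments, rendered separately
```

Each segment can then be rendered on its own, which keeps any single request short enough that pacing and voice stay stable.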

**Sam:** Right. So what is the second one?

**Maya:** Two. The last word of a segment kept getting clipped. "Anymore" came out as "any". The fix is almost silly. We add a throwaway word on the end for the model to clip instead, then we cut that word back off. The real last word survives.
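
One way the throwaway-word trick could look, as a sketch only. The transcript does not say how the extra word is located in the audio, so this example assumes the samples are signed PCM integers and that a stretch of near-silence separates the real last word from the sacrificial one. The word `Okay.`, the threshold, and both function names are illustrative choices.

```python
from typing import List


def pad_with_sacrificial_word(segment_text: str, word: str = "Okay.") -> str:
    """Append a throwaway word for the model to clip instead of the real ending."""
    return segment_text.rstrip() + " " + word


def trim_sacrificial_word(
    samples: List[int], min_gap: int, threshold: int = 500
) -> List[int]:
    """Cut the final word off by finding the silent gap in front of it.

    Assumptions: `samples` are signed PCM values, `min_gap` is the number of
    consecutive quiet samples that counts as a pause between words, and the
    sacrificial word is the last voiced run in the clip.
    """
    end = len(samples)
    # Skip any trailing silence after the sacrificial word.
    while end > 0 and abs(samples[end - 1]) <= threshold:
        end -= 1
    if end == 0:
        return samples  # nothing voiced at all; leave untouched

    quiet_run = 0
    j = end
    while j > 0:
        if abs(samples[j - 1]) <= threshold:
            quiet_run += 1
            if quiet_run >= min_gap:
                # The gap covers [j - 1, j - 1 + min_gap); the word starts after it.
                return samples[: j - 1 + min_gap]
        else:
            quiet_run = 0
        j -= 1
    return samples  # no clear pause found; safer not to cut
```

When no clear pause is found the clip is returned unchanged, on the view that an extra word is easier to catch in review than a missing one.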

**Sam:** And the third one. This is the one I want to hear.

**Maya:** Three, and it is my favourite. Models are confident and wrong. Now and then a segment quietly dropped a line and told us nothing. So every segment gets run back through speech-to-text and compared to the script. If the audio does not match the words, we re-roll it automatically.
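
The verify-and-re-roll loop might be sketched like this. The `render` and `transcribe` callables stand in for the text-to-speech and speech-to-text services and are hypothetical, as are the 0.9 similarity threshold and the three-attempt limit; the word-level comparison uses `difflib` from the Python standard library. The sketch also assumes `script` holds only the spoken words, with speaker labels already removed.

```python
import difflib
import re
from typing import Callable, List


def normalize(text: str) -> List[str]:
    """Lowercase and reduce to words so punctuation and casing do not count."""
    return re.findall(r"[a-z0-9']+", text.lower())


def matches_script(script: str, transcript: str, threshold: float = 0.9) -> bool:
    """Compare the words that were heard with the words that were written."""
    ratio = difflib.SequenceMatcher(
        None, normalize(script), normalize(transcript)
    ).ratio()
    return ratio >= threshold


def render_verified(
    script: str,
    render: Callable[[str], bytes],      # hypothetical TTS call
    transcribe: Callable[[bytes], str],  # hypothetical speech-to-text call
    max_attempts: int = 3,
    threshold: float = 0.9,
) -> bytes:
    """Render a segment, check it against the script, and re-roll on mismatch."""
    for _ in range(max_attempts):
        audio = render(script)
        if matches_script(script, transcribe(audio), threshold):
            return audio
    raise RuntimeError(f"segment failed verification {max_attempts} times")
```

Raising after the last attempt keeps a silently dropped line from ever reaching the published file.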

**Sam:** Which is the whole philosophy of this company in one trick.

**Maya:** It really is. Stay suspicious of what the model hands back. Verify before you trust. The same way we ship anything to production.

**Sam:** There was a fourth gremlin too.

**Maya:** Stale audio. We would fix an episode and still hear the old one, served from a cache. So we version every file by the hash of its actual bytes. The link changes only when the sound changes. A fix nobody can hear is not a fix.
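
A small sketch of versioning by content hash. SHA-256 and the eight-character prefix are assumptions chosen to resemble the `?v=` value in the audio link above; the transcript does not name the hash that is actually used.

```python
import hashlib


def versioned_url(base_url: str, audio_bytes: bytes, length: int = 8) -> str:
    """Append a version tag derived from the file's bytes, not its name.

    Assumption: SHA-256 truncated to `length` hex characters.
    """
    digest = hashlib.sha256(audio_bytes).hexdigest()[:length]
    return f"{base_url}?v={digest}"


base = "https://thegrowthproject.com/audio/podcast/ai-podcast-with-gemini.m4a"
old_link = versioned_url(base, b"first render")
new_link = versioned_url(base, b"fixed render")
# Identical bytes keep the same link; any change to the audio changes it.
```

Because the tag depends only on the bytes, re-uploading an unchanged file leaves caches warm, while a genuine fix forces a fresh download.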

**Sam:** So pull it together. What is the lesson?

**Maya:** The model fixed the sound. But the reliability did not come from the model. It came from the verification we wrapped around it. The biggest lever is almost never the model.

**Sam:** And the judgement stays with the humans. Not the two of us.

**Maya:** Right. What to build, when to trust it, when to throw a take away. That part does not get cheaper. It gets more valuable.

**Sam:** So if someone wants to build this themselves?

**Maya:** Do not stitch a conversation, generate it. Segment the long stuff. Verify every segment automatically. And cache-bust on the actual audio, not the filename.

**Sam:** This has been Pilot to Production, from the Growth Project. If your AI works in the demo but not on Monday, that gap is what we close, at thegrowthproject.com.

**Maya:** Thanks for listening. The two of us still do not exist. Hit play on the next one anyway.