{"id":28450,"date":"2026-04-17T07:12:00","date_gmt":"2026-04-17T07:12:00","guid":{"rendered":"https:\/\/www.refrens.com\/grow\/?p=28450"},"modified":"2026-04-17T07:13:56","modified_gmt":"2026-04-17T07:13:56","slug":"building-ai-voice-calling-workflow-for-sales-qualification","status":"publish","type":"post","link":"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/","title":{"rendered":"How to Build a Production-Ready AI Voice Calling Workflow for Sales Qualification"},"content":{"rendered":"\n<p id=\"ember223\">Most AI voice demos look impressive for a few minutes.<\/p>\n\n\n\n<p id=\"ember224\">But building a voice calling workflow that actually works inside a sales process is a very different challenge. Once you move beyond the demo, you start dealing with real operational questions: who should get called, when should the call go out, what should happen if the user does not answer, how natural should the agent sound, how should the workflow connect to CRM and WhatsApp, and how do you make the whole system usable enough for a sales team to rely on it?<\/p>\n\n\n\n<p id=\"ember225\">This article is about how we approached that problem at Refrens.<\/p>\n\n\n\n<p id=\"ember226\">For context, Refrens is a B2B SaaS platform serving 150k+ businesses across 170+ countries for invoicing, accounting, payments, compliance, sales, inventory, and other core business workflows.<\/p>\n\n\n\n<p id=\"ember227\">Every month, tens of thousands of new users sign up to the platform. But our sales team cannot speak to all of them. So, like most growing teams, we have to prioritize. The stronger-looking opportunities get attention first, while many others remain untouched. 
The challenge is that some of those users could still be a strong fit for Refrens, but we had no scalable way to qualify them early and route the right ones back to sales.<\/p>\n\n\n\n<p id=\"ember228\">That is what led us to build an AI voice calling workflow for sales lead qualification.<\/p>\n\n\n\n<p id=\"ember229\">This article is a practical walkthrough of what it took to make the system work in a real environment &#8211;&nbsp; from narrowing the use case and choosing the stack, to designing the voice, shaping the conversation, handling retries, integrating WhatsApp and CRM, and fixing the issues that only show up once real users start answering.<\/p>\n\n\n\n<p id=\"ember230\">If you are trying to build a production-ready AI voice workflow for qualification, this is the part that matters: not just what tools we used, but how the workflow was designed, where it broke, and what made it usable in practice.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_62 ez-toc-wrap-center counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title \" >Table of Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #161c26;color:#161c26\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #161c26;color:#161c26\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" 
height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#At_a_glance\" title=\"At a glance\">At a glance<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#1_The_problem_we_were_trying_to_solve\" title=\"1. The problem we were trying to solve\">1. The problem we were trying to solve<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#2_Where_we_started\" title=\"2. Where we started\">2. 
Where we started<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#21_Why_voice_calling_made_sense_for_this_problem\" title=\"2.1) Why voice calling made sense for this problem\">2.1) Why voice calling made sense for this problem<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#22_Start_narrow_not_wide\" title=\"2.2) Start narrow, not wide\">2.2) Start narrow, not wide<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#23_What_the_agent_needed_to_learn\" title=\"2.3) What the agent needed to learn\">2.3) What the agent needed to learn<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#24_Benchmark_first_then_choose_tools\" title=\"2.4) Benchmark first, then choose tools\">2.4) Benchmark first, then choose tools<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#3_Figuring_out_the_stack\" title=\"3. Figuring out the stack\">3. 
Figuring out the stack<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#31_Why_we_chose_VideoSDK\" title=\"3.1) Why we chose VideoSDK\">3.1) Why we chose VideoSDK<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#32_Why_we_chose_Gemini\" title=\"3.2) Why we chose Gemini\">3.2) Why we chose Gemini<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#33_Why_we_chose_Elision\" title=\"3.3) Why we chose Elision\">3.3) Why we chose Elision<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#34_Why_we_chose_AiSensy\" title=\"3.4) Why we chose AiSensy\">3.4) Why we chose AiSensy<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#The_final_stack\" title=\"The final stack\">The final stack<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-14\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#4_Designing_the_agent_experience\" title=\"4. Designing the agent experience\">4. 
Designing the agent experience<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-15\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#41_Finding_the_right_voice\" title=\"4.1) Finding the right voice\">4.1) Finding the right voice<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-16\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#Step_1_We_started_with_a_robotic_voice_%E2%80%93_and_it_failed_quickly\" title=\"Step 1: We started with a robotic voice &#8211;&nbsp; and it failed quickly\">Step 1: We started with a robotic voice &#8211;&nbsp; and it failed quickly<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-17\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#Step_2_Once_we_knew_it_had_to_sound_human_we_had_to_decide_what_kind_of_human_voice_fit_the_workflow\" title=\"Step 2: Once we knew it had to sound human, we had to decide what kind of human voice fit the workflow\">Step 2: Once we knew it had to sound human, we had to decide what kind of human voice fit the workflow<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-18\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#Step_3_We_needed_a_clearer_benchmark_for_what_%E2%80%9Cgood%E2%80%9D_should_sound_like\" title=\"Step 3: We needed a clearer benchmark for what \u201cgood\u201d should sound like\">Step 3: We needed a clearer benchmark for what \u201cgood\u201d should sound like<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-19\" 
href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#Step_4_Once_the_voice_direction_was_clear_we_also_thought_about_identity\" title=\"Step 4: Once the voice direction was clear, we also thought about identity\">Step 4: Once the voice direction was clear, we also thought about identity<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-20\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#Step_5_We_then_built_the_final_voice_from_an_internal_sample\" title=\"Step 5: We then built the final voice from an internal sample\">Step 5: We then built the final voice from an internal sample<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-21\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#42_Language_and_script_choices\" title=\"4.2) Language and script choices\">4.2) Language and script choices<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-22\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#Step_1_We_avoided_overly_formal_Hindi\" title=\"Step 1: We avoided overly formal Hindi\">Step 1: We avoided overly formal Hindi<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-23\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#Step_2_We_also_chose_not_to_reveal_upfront_that_it_was_an_AI_call\" title=\"Step 2: We also chose not to reveal upfront that it was an AI call\">Step 2: We also chose not to reveal upfront that it was an AI call<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-24\" 
href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#Step_3_We_learned_that_open-ended_questions_made_the_workflow_worse\" title=\"Step 3: We learned that open-ended questions made the workflow worse\">Step 3: We learned that open-ended questions made the workflow worse<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-25\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#43_Why_we_kept_the_agent_static\" title=\"4.3) Why we kept the agent static\">4.3) Why we kept the agent static<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-26\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#44_Context_inside_the_call\" title=\"4.4) Context inside the call\">4.4) Context inside the call<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-27\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#5_Setting_up_the_workflow\" title=\"5. Setting up the workflow\">5. 
Setting up the workflow<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-28\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#From_agent_to_operating_workflow\" title=\"From agent to operating workflow\">From agent to operating workflow<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-29\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#Guardrails_and_timing\" title=\"Guardrails and timing\">Guardrails and timing<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-30\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#The_actual_sequence\" title=\"The actual sequence\">The actual sequence<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-31\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#The_overall_flow\" title=\"The overall flow\">The overall flow<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-32\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#The_retry_sequence\" title=\"The retry sequence\">The retry sequence<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-33\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#Why_WhatsApp_mattered_in_the_workflow\" title=\"Why WhatsApp mattered in the workflow\">Why WhatsApp mattered in the workflow<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-34\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#What_happened_after_the_call\" 
title=\"What happened after the call\">What happened after the call<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-35\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#How_the_live_voice_workflow_worked\" title=\"How the live voice workflow worked\">How the live voice workflow worked<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-36\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#6_What_broke_and_how_we_fixed_it\" title=\"6. What broke, and how we fixed it\">6. What broke, and how we fixed it<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-37\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#61_Answer_rate_and_spam_trust\" title=\"6.1) Answer rate and spam trust\">6.1) Answer rate and spam trust<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-38\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#62_Changing_the_call_opening\" title=\"6.2) Changing the call opening\">6.2) Changing the call opening<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-39\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#63_Fixing_pronunciation_issues\" title=\"6.3) Fixing pronunciation issues\">6.3) Fixing pronunciation issues<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-40\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#64_Reducing_latency_through_script_changes\" title=\"6.4) Reducing latency through script changes\">6.4) Reducing latency through script changes<\/a><\/li><li 
class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-41\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#65_Handling_silence_during_delay\" title=\"6.5) Handling silence during delay\">6.5) Handling silence during delay<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-42\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#66_Detecting_IVR_and_bot_responses\" title=\"6.6) Detecting IVR and bot responses\">6.6) Detecting IVR and bot responses<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-43\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#67_Controlling_rollout_by_language\" title=\"6.7) Controlling rollout by language\">6.7) Controlling rollout by language<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-44\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#7_Cost_value_and_efficiency\" title=\"7. Cost, value, and efficiency\">7. Cost, value, and efficiency<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-45\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#8_What_this_opens_up_next\" title=\"8. What this opens up next\">8. What this opens up next<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-46\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/#9_Endnotes\" title=\"9. Endnotes\">9. 
Endnotes<\/a><\/li><\/ul><\/nav><\/div>\n<h3 id=\"ember231\"><span class=\"ez-toc-section\" id=\"At_a_glance\"><\/span>At a glance<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<ul><li><strong>The problem:<\/strong> A large pool of supposedly non-priority users was not being actively attended to, even though some of them could still convert.<\/li><li><strong>The approach:<\/strong> Build an AI voice workflow focused on first-level sales qualification.<\/li><li><strong>The outcome:<\/strong> A system that could send a pre-call WhatsApp message, place an AI call, gather qualification signals, summarize the interaction, classify the user, and route the right opportunities back to sales.<\/li><li><strong>The biggest lesson:<\/strong> Success in voice AI is not just about the model. It depends on the use case, conversation design, telephony setup, latency, trust, language fit, and how tightly the workflow connects back into the Sales CRM.<\/li><\/ul>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h2 id=\"ember233\"><span class=\"ez-toc-section\" id=\"1_The_problem_we_were_trying_to_solve\"><\/span>1. The problem we were trying to solve<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"ember234\"><em>We were not trying to solve for more leads. 
We were trying to solve for better coverage.<\/em><\/p>\n\n\n\n<div class=\"wp-block-uagb-image uagb-block-4640b62e wp-block-uagb-image--layout-default wp-block-uagb-image--effect-static wp-block-uagb-image--align-none\"><figure class=\"wp-block-uagb-image__figure\"><img srcset=\"https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Solving-for-the-blind-spot-1024x901.webp ,https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Solving-for-the-blind-spot.webp 780w, https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Solving-for-the-blind-spot.webp 360w\" sizes=\"(max-width: 480px) 150px\" src=\"https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Solving-for-the-blind-spot-1024x901.webp\" alt=\"\" class=\"uag-image-28457\" width=\"514\" height=\"452\" title=\"Solving for the blind spot\" loading=\"lazy\" role=\"img\"\/><\/figure><\/div>\n\n\n\n<p id=\"ember236\">Because we receive a high volume of inbound users, our sales team has to prioritize where it spends time.<\/p>\n\n\n\n<p id=\"ember237\">So we built an internal system where users were grouped into two broad buckets:<\/p>\n\n\n\n<ul><li><strong>Priority users<\/strong><\/li><li><strong>Non-priority users<\/strong><\/li><\/ul>\n\n\n\n<p id=\"ember239\">This classification was based on the signals available to us from user data, activity, and the information submitted while creating a business on our platform.<\/p>\n\n\n\n<p id=\"ember240\">Priority users were actively worked on by sales.<\/p>\n\n\n\n<p id=\"ember241\">Non-priority users, on the other hand, were usually left untouched unless they later showed stronger intent through product usage or other behavioral signals.<\/p>\n\n\n\n<p id=\"ember242\">That helped us protect sales bandwidth, but it also created a blind spot.<\/p>\n\n\n\n<p id=\"ember243\">Some of those non-priority users may not have looked valuable at first glance, yet they could still have real business intent and conversion potential. 
The issue was not that these users were irrelevant. The issue was that we had no scalable way to speak to them early and determine which ones deserved human follow-up.<\/p>\n\n\n\n<p id=\"ember244\"><strong>The real problem was qualification at scale.<\/strong><\/p>\n\n\n\n<p id=\"ember245\">We needed a system that could:<\/p>\n\n\n\n<ul><li>engage this ignored pool<\/li><li>collect the right signals<\/li><li>identify which users were worth routing back to a salesperson<\/li><\/ul>\n\n\n\n<p id=\"ember247\">That became the starting point for our AI voice calling workflow.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h2 id=\"ember248\"><span class=\"ez-toc-section\" id=\"2_Where_we_started\"><\/span>2. Where we started<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"ember249\"><em>Before choosing tools, we first had to decide what the system was actually supposed to do.<\/em><\/p>\n\n\n\n<h3 id=\"ember250\"><span class=\"ez-toc-section\" id=\"21_Why_voice_calling_made_sense_for_this_problem\"><\/span>2.1) Why voice calling made sense for this problem<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember251\">One obvious question for us was:<\/p>\n\n\n\n<p id=\"ember252\">Why voice calling at all? Why not just rely on product messaging, email, or WhatsApp?<\/p>\n\n\n\n<p id=\"ember253\">The answer was not that those channels were useless or that voice was somehow the easiest option. In fact, lighter-touch channels are much easier to implement.<\/p>\n\n\n\n<p id=\"ember254\">But for our core user base (Indian SMEs), these channels were not enough on their own.<\/p>\n\n\n\n<p id=\"ember255\">In this segment, trust often has to be built through conversation. Many users may ignore emails, skim through product messaging, or not respond meaningfully on WhatsApp. A phone call creates a very different kind of interaction. 
It gives you a direct way to establish contact, understand intent, and build trust in real time.<\/p>\n\n\n\n<p id=\"ember256\">That is why voice calling made sense for this problem. We were trying to qualify users who were not being meaningfully engaged through lighter-touch channels alone, and calling gave us a better way to do that.<\/p>\n\n\n\n<h3 id=\"ember257\"><span class=\"ez-toc-section\" id=\"22_Start_narrow_not_wide\"><\/span>2.2) Start narrow, not wide<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember258\">Voice AI can be used for many different functions:<\/p>\n\n\n\n<ul><li>onboarding<\/li><li>renewals<\/li><li>collections<\/li><li>support<\/li><li>qualification<\/li><\/ul>\n\n\n\n<p id=\"ember260\">That breadth was exactly the trap we wanted to avoid.<\/p>\n\n\n\n<p id=\"ember261\">If we tried to make the first version do too much, it would become:<\/p>\n\n\n\n<ul><li>harder to test<\/li><li>harder to debug<\/li><li>harder to judge properly<\/li><\/ul>\n\n\n\n<p id=\"ember263\">So instead of trying to build a general-purpose calling agent, we focused only on <strong>first-level qualification for non-priority users whom our sales team was not actively calling<\/strong>.<\/p>\n\n\n\n<p id=\"ember264\">That gave the project a clear job.<\/p>\n\n\n\n<p id=\"ember265\">The agent did <strong>not<\/strong> need to:<\/p>\n\n\n\n<ul><li>sell<\/li><li>explain the whole product<\/li><li>replace a salesperson<\/li><\/ul>\n\n\n\n<p id=\"ember267\">It only needed to collect enough signal to help us decide whether a user deserved a stronger follow-up from sales.<\/p>\n\n\n\n<h3 id=\"ember268\"><span class=\"ez-toc-section\" id=\"23_What_the_agent_needed_to_learn\"><\/span>2.3) What the agent needed to learn<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember269\">Everything the agent asked had one purpose: giving us enough information to make a first-level qualification judgment.<\/p>\n\n\n\n<p id=\"ember270\">We wanted to 
understand:<\/p>\n\n\n\n<ul><li>Whether the user looked like a meaningful fit for Refrens<\/li><li>How strong their intent seemed<\/li><li>How relevant our product was likely to be for their business<\/li><li>How much priority they should receive from sales<\/li><\/ul>\n\n\n\n<p id=\"ember272\">In simple terms, these inputs helped us estimate business fit, likely product usage, and the chances that the user could move toward a premium subscription.<\/p>\n\n\n\n<p id=\"ember273\">So the kinds of questions we asked were:<\/p>\n\n\n\n<ul><li>What type of business they run<\/li><li>What they are looking for<\/li><li>How old their business is<\/li><\/ul>\n\n\n\n<p id=\"ember275\">Those questions gave us enough signal to make a first-level qualification judgment. If a user matched the criteria we cared about, the next step could move to a salesperson. If not, we still learned something useful without spending human bandwidth on every conversation.<\/p>\n\n\n\n<h3 id=\"ember276\"><span class=\"ez-toc-section\" id=\"24_Benchmark_first_then_choose_tools\"><\/span>2.4) Benchmark first, then choose tools<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember277\">Before choosing the stack, we wanted a benchmark for what a good AI voice actually looked like in practice.<\/p>\n\n\n\n<p id=\"ember278\">So we tested other AI voice agents in the market (Boardy.ai, for example) and looked closely at how natural the voice should sound, what the optimal latency was, and what the right tone of voice was.<\/p>\n\n\n\n<p id=\"ember279\">This gave us a more realistic standard for future vendor evaluation.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h2 id=\"ember280\"><span class=\"ez-toc-section\" id=\"3_Figuring_out_the_stack\"><\/span>3. 
Figuring out the stack<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"ember281\"><em>Once the use case was clear, the next question was not \u201cWhat is the fanciest stack?\u201d It was \u201cWhat stack actually works in practice in terms of cost, quality, and reliability?\u201d<\/em><\/p>\n\n\n\n<div class=\"wp-block-uagb-image uagb-block-00d9399c wp-block-uagb-image--layout-default wp-block-uagb-image--effect-static wp-block-uagb-image--align-none\"><figure class=\"wp-block-uagb-image__figure\"><img srcset=\"https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Our-Core-Tech-Stack-for-AI-Voice-Calling-Workflow-.webp ,https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Our-Core-Tech-Stack-for-AI-Voice-Calling-Workflow-.webp 780w, https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Our-Core-Tech-Stack-for-AI-Voice-Calling-Workflow-.webp 360w\" sizes=\"(max-width: 480px) 150px\" src=\"https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Our-Core-Tech-Stack-for-AI-Voice-Calling-Workflow-.webp\" alt=\"\" class=\"uag-image-28451\" width=\"576\" height=\"554\" title=\"Our Core Tech Stack for AI Voice Calling Workflow\" loading=\"lazy\" role=\"img\"\/><\/figure><\/div>\n\n\n\n<h3 id=\"ember283\"><span class=\"ez-toc-section\" id=\"31_Why_we_chose_VideoSDK\"><\/span>3.1) Why we chose VideoSDK<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember284\">For a workflow like this, we needed more than just a voice agent. We needed infrastructure that could support the full system behind it &#8211; real-time voice interaction, telephony connectivity, and the complete loop of speech input, model processing, and voice output working together smoothly.<\/p>\n\n\n\n<p id=\"ember285\">Some of the options we explored felt more enterprise-focused. 
At our stage, we were not sure they would give us the flexibility or support we needed to build this.<\/p>\n\n\n\n<p id=\"ember286\"><a href=\"https:\/\/www.linkedin.com\/company\/video-sdk\/\" target=\"_blank\" rel=\"noopener\">videosdk.live<\/a> fit that need better. Their platform was built around low-latency real-time communication and AI agents, with support for telephony flows, modular STT\u2013LLM\u2013TTS pipelines, agent runtimes, tracing, observability, deployment options, and self-hosting.<\/p>\n\n\n\n<p id=\"ember287\">In simple terms, they had the infrastructure we needed to run this workflow end-to-end. Plus, their team felt more supportive and easier to work with than the more enterprise-focused options we had looked at.<\/p>\n\n\n\n<h3 id=\"ember288\"><span class=\"ez-toc-section\" id=\"32_Why_we_chose_Gemini\"><\/span>3.2) Why we chose Gemini<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember289\">Inside VideoSDK, we had the flexibility to test different foundation models for the voice workflow. After testing, we chose Gemini.<\/p>\n\n\n\n<p id=\"ember290\">The biggest factor here was<strong> language performance<\/strong>. Since our users speak Hindi and the quality of spoken interaction mattered a lot, we tested multiple models on Hindi voice conversations. Gemini performed better for this use case, especially in how naturally and clearly it handled spoken Hindi.<\/p>\n\n\n\n<p id=\"ember291\">The other important factors were <strong>reliability<\/strong> and <strong>cost<\/strong>. 
Gemini gave us more confidence on the reliability side, partly because it comes from Google, and it also offered a better quality-to-cost balance than the other options we tested.<\/p>\n\n\n\n<h3 id=\"ember292\"><span class=\"ez-toc-section\" id=\"33_Why_we_chose_Elision\"><\/span>3.3) Why we chose Elision<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember293\">We evaluated different telephony vendors, but chose <a href=\"https:\/\/www.linkedin.com\/company\/elision-technolab-llp\/\" target=\"_blank\" rel=\"noopener\">Elision Technologies Pvt. Ltd<\/a> because of two things:<\/p>\n\n\n\n<p id=\"ember294\">1) a pay-as-you-go model, 2) no minimum commitment<\/p>\n\n\n\n<p id=\"ember295\">This gave us the flexibility we needed while we were still testing and refining the workflow, and made the rollout more cost-effective.<\/p>\n\n\n\n<p id=\"ember296\">With Elision, we got:<\/p>\n\n\n\n<ul><li>2 numbers<\/li><li>5 channels per number<\/li><li>10 total channels<\/li><\/ul>\n\n\n\n<p id=\"ember298\">That meant we could run up to <strong>10 concurrent calls<\/strong>.<\/p>\n\n\n\n<p id=\"ember299\">Without channels, one number would only support one active call at a time. With channels, we could do multiple calls in parallel from the same number. That gave us enough throughput to test and operate meaningfully without overcomplicating the setup.<\/p>\n\n\n\n<h3 id=\"ember300\"><span class=\"ez-toc-section\" id=\"34_Why_we_chose_AiSensy\"><\/span>3.4) Why we chose AiSensy<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember301\"><a href=\"https:\/\/www.linkedin.com\/company\/aisensyofficial\/\" target=\"_blank\" rel=\"noopener\">AiSensy<\/a> was the natural choice for us because it was already deeply integrated into our workflow. 
We had been using it for WhatsApp-led communication and automation from the beginning, so for this use case, we did not need a new system &#8211; we just needed to use their APIs to trigger the right messages at the right points in the journey.<\/p>\n\n\n\n<p id=\"ember302\">It also helped that the platform was already proven at scale for us. We were sending thousands of messages every day through AiSensy, so we already knew it could support the volume and reliability this workflow needed.<\/p>\n\n\n\n<h3 id=\"ember303\"><span class=\"ez-toc-section\" id=\"The_final_stack\"><\/span>The final stack<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<ul><li><strong>Claude<\/strong> for persona, tone, and script design<\/li><li><strong>VideoSDK<\/strong> for the real-time voice agent layer (inside VideoSDK, we used <strong>Deepgram<\/strong> for speech-to-text, <strong>Google Gemini<\/strong> for language processing, and <strong>Cartesia<\/strong> for text-to-speech)<\/li><li><strong>Elision<\/strong> for telephony<\/li><li><strong>AiSensy<\/strong> for WhatsApp automation<\/li><li><strong>Our internal sales CRM<\/strong> for tagging, classification, and sales handoff<\/li><\/ul>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h2 id=\"ember305\"><span class=\"ez-toc-section\" id=\"4_Designing_the_agent_experience\"><\/span>4. 
Designing the agent experience<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"ember306\"><em>This was the point where the project stopped being just a stack and became an actual interaction.<\/em><\/p>\n\n\n\n<h3 id=\"ember307\"><span class=\"ez-toc-section\" id=\"41_Finding_the_right_voice\"><\/span>4.1) Finding the right voice<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember308\">Getting the voice right took a few iterations.<\/p>\n\n\n\n<h3 id=\"ember309\"><span class=\"ez-toc-section\" id=\"Step_1_We_started_with_a_robotic_voice_%E2%80%93_and_it_failed_quickly\"><\/span>Step 1: We started with a robotic voice &#8211;&nbsp; and it failed quickly<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember310\">In the beginning, we tried a more robotic voice because it was the easiest place to start.<\/p>\n\n\n\n<p id=\"ember311\">But it did not work.<\/p>\n\n\n\n<p id=\"ember312\">The call started feeling artificial within the first few seconds, almost like an IVR had suddenly turned conversational. That hurt trust immediately. It made one thing clear very early: if this workflow had to work in the real world, the voice could not sound synthetic. It had to feel human, polished, and easy to listen to.<\/p>\n\n\n\n<h3 id=\"ember313\"><span class=\"ez-toc-section\" id=\"Step_2_Once_we_knew_it_had_to_sound_human_we_had_to_decide_what_kind_of_human_voice_fit_the_workflow\"><\/span>Step 2: Once we knew it had to sound human, we had to decide what kind of human voice fit the workflow<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember314\">After moving away from the robotic version, the next question was whether the agent should sound male or female.<\/p>\n\n\n\n<p id=\"ember315\">In our testing, the female voice worked better for this workflow. 
Since a large part of the audience consisted of male business owners, we found that they were more likely to respond patiently and respectfully to a female voice in the opening few seconds. That helped the call feel smoother at the start and improved the chances of keeping the user engaged.<\/p>\n\n\n\n<h3 id=\"ember316\"><span class=\"ez-toc-section\" id=\"Step_3_We_needed_a_clearer_benchmark_for_what_%E2%80%9Cgood%E2%80%9D_should_sound_like\"><\/span>Step 3: We needed a clearer benchmark for what \u201cgood\u201d should sound like<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember317\">Even after deciding on the general direction, we still needed a better standard than just \u201cmake it sound natural.\u201d<\/p>\n\n\n\n<p id=\"ember318\">So we created a clearer internal benchmark.<\/p>\n\n\n\n<p id=\"ember319\">We did not want the call to sound like a machine. We also did not want it to sound like a casual caller. Internally, the benchmark became: the interaction should feel more like a polished front-desk conversation &#8211; <strong>almost as if a well-trained receptionist from a place like the Taj was calling.<\/strong><\/p>\n\n\n\n<p id=\"ember320\">That helped define the tone much more clearly. 
The voice needed to feel warm, professional, attentive, and well-composed.<\/p>\n\n\n\n<h3 id=\"ember321\"><span class=\"ez-toc-section\" id=\"Step_4_Once_the_voice_direction_was_clear_we_also_thought_about_identity\"><\/span>Step 4: Once the voice direction was clear, we also thought about identity<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember322\">After deciding the voice should feel more human and more polished, we also thought about the identity of the agent.<\/p>\n\n\n\n<p id=\"ember323\">Since users came from many different states and language backgrounds, we wanted a name that would feel familiar, simple, and easy to follow across contexts.<\/p>\n\n\n\n<p id=\"ember324\">That is why we chose the name <strong>Aditi<\/strong>.<\/p>\n\n\n\n<h3 id=\"ember325\"><span class=\"ez-toc-section\" id=\"Step_5_We_then_built_the_final_voice_from_an_internal_sample\"><\/span>Step 5: We then built the final voice from an internal sample<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember326\">Once the direction was clear, we used the voice of an internal team member as the base.<\/p>\n\n\n\n<p id=\"ember327\">We recorded a clean sample in a quiet environment, made sure there was no echo or background noise, and then cloned that voice for the agent.<\/p>\n\n\n\n<p id=\"ember328\">That gave us much more control over the final output and helped us create a voice that felt much closer to the experience we actually wanted the call to deliver.<\/p>\n\n\n\n<h3 id=\"ember329\"><span class=\"ez-toc-section\" id=\"42_Language_and_script_choices\"><\/span>4.2) Language and script choices<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember330\">Once the voice started moving in the right direction, the next challenge was the script.<\/p>\n\n\n\n<h3 id=\"ember331\"><span class=\"ez-toc-section\" id=\"Step_1_We_avoided_overly_formal_Hindi\"><\/span>Step 1: We avoided overly formal Hindi<span 
class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember332\">We decided not to lean too heavily on formal Hindi because it often sounded unnatural in a live call.<\/p>\n\n\n\n<p id=\"ember333\">For example, a word like <strong>\u201c\u0938\u0941\u0928\u093f\u0936\u094d\u091a\u093f\u0924\u201d<\/strong> may be correct Hindi, but it does not sound like how most people naturally speak in a business conversation. On a real call, language like that can make the agent sound stiff, scripted, or overly formal.<\/p>\n\n\n\n<p id=\"ember334\">So instead of forcing pure Hindi, we moved toward a more natural conversational style closer to <strong>Hinglish<\/strong>, while still keeping the option to switch into English when the user preferred it.<\/p>\n\n\n\n<h3 id=\"ember335\"><span class=\"ez-toc-section\" id=\"Step_2_We_also_chose_not_to_reveal_upfront_that_it_was_an_AI_call\"><\/span>Step 2: We also chose not to reveal upfront that it was an AI call<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember336\">Another important scripting decision was that we did not want the call to begin by announcing that this was an AI interaction.<\/p>\n\n\n\n<p id=\"ember337\">Since voice AI adoption is still at an early stage, leading with that could make users drop off, dismiss the interaction, or stop taking it seriously. In some cases, they might even start treating the call as a novelty instead of an actual business interaction.<\/p>\n\n\n\n<p id=\"ember338\">So our goal was to make the conversation feel natural first. If the user directly asked whether it was an AI call, the agent would confirm it honestly. 
But we did not want that to be the first thing they heard.<\/p>\n\n\n\n<h3 id=\"ember339\"><span class=\"ez-toc-section\" id=\"Step_3_We_learned_that_open-ended_questions_made_the_workflow_worse\"><\/span>Step 3: We learned that open-ended questions made the workflow worse<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember340\">This was also where we learned that conversation design affects not just quality, but performance too.<\/p>\n\n\n\n<p id=\"ember341\">In the early versions, some of our questions were too open-ended. That led to:<\/p>\n\n\n\n<ul><li>longer answers<\/li><li>more ambiguity<\/li><li>more processing load on the model<\/li><\/ul>\n\n\n\n<p id=\"ember343\">For example, asking:<\/p>\n\n\n\n<p id=\"ember344\"><em>\u201cHow many invoices do you create?\u201d<\/em><\/p>\n\n\n\n<p id=\"ember345\">often led to wandering or unclear answers.<\/p>\n\n\n\n<p id=\"ember346\">It worked much better to ask something more structured, like:<\/p>\n\n\n\n<p id=\"ember347\"><em>\u201cHow are you managing your invoicing today? On software, Excel, or pen and paper?\u201d<\/em><\/p>\n\n\n\n<p id=\"ember348\">That gave us a more useful signal while also reducing ambiguity and helping the conversation move faster.<\/p>\n\n\n\n<h3 id=\"ember349\"><span class=\"ez-toc-section\" id=\"43_Why_we_kept_the_agent_static\"><\/span>4.3) Why we kept the agent static<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember350\">Another important design decision was to keep the agent static, not dynamic.<\/p>\n\n\n\n<p id=\"ember351\">A dynamic agent might handle multiple use cases, but it also becomes much harder to test and debug. 
When something breaks, it becomes difficult to know whether the problem came from:<\/p>\n\n\n\n<ul><li>the script<\/li><li>the branching logic<\/li><li>use-case overlap<\/li><li>the model itself<\/li><\/ul>\n\n\n\n<p id=\"ember353\">A static agent is easier to evaluate because it is built for one job only.<\/p>\n\n\n\n<p id=\"ember354\">In our case, that one job was sales qualification.<\/p>\n\n\n\n<p id=\"ember355\">That decision helped us keep the workflow easier to test, easier to refine, and easier to judge.<\/p>\n\n\n\n<h3 id=\"ember356\"><span class=\"ez-toc-section\" id=\"44_Context_inside_the_call\"><\/span>4.4) Context inside the call<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember357\">Finally, we made sure the agent had enough context to avoid sounding generic.<\/p>\n\n\n\n<p id=\"ember358\">At the minimum, that meant passing in:<\/p>\n\n\n\n<ul><li>the user\u2019s name<\/li><li>the organization name<\/li><\/ul>\n\n\n\n<p id=\"ember360\">These may seem like small details, but they helped the interaction feel more grounded and less robotic.<\/p>\n\n\n\n<div class=\"wp-block-uagb-image uagb-block-b3f5c80a wp-block-uagb-image--layout-default wp-block-uagb-image--effect-static wp-block-uagb-image--align-none\"><figure class=\"wp-block-uagb-image__figure\"><img srcset=\"https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Sample-Claude-Prompt-for-Agent-Persona-Design-Script-Tone-Pronounciation-Language--1024x566.webp ,https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Sample-Claude-Prompt-for-Agent-Persona-Design-Script-Tone-Pronounciation-Language-.webp 780w, https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Sample-Claude-Prompt-for-Agent-Persona-Design-Script-Tone-Pronounciation-Language-.webp 360w\" sizes=\"(max-width: 480px) 150px\" 
src=\"https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Sample-Claude-Prompt-for-Agent-Persona-Design-Script-Tone-Pronounciation-Language--1024x566.webp\" alt=\"\" class=\"uag-image-28452\" width=\"704\" height=\"372\" title=\"Sample Claude Prompt for Agent Persona Design (Script, Tone, Pronounciation, Language)\" loading=\"lazy\" role=\"img\"\/><\/figure><\/div>\n\n\n\n<p id=\"ember362\">By this point, we were not just writing a script or choosing a voice. We were designing a business conversation that had to be:<\/p>\n\n\n\n<ul><li><strong>Measurable <\/strong>(answer rate, call duration, qualification signals)<\/li><li><strong>Efficient<\/strong> (low latency, controlled cost)<\/li><li><strong>Scalable <\/strong>(reusable personas, SOPs)<\/li><\/ul>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h3 id=\"ember364\"><span class=\"ez-toc-section\" id=\"5_Setting_up_the_workflow\"><\/span>5. Setting up the workflow<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember365\"><em>Once the agent started feeling usable, the next question was: how should the system actually run?<\/em><\/p>\n\n\n\n<h3 id=\"ember366\"><span class=\"ez-toc-section\" id=\"From_agent_to_operating_workflow\"><\/span>From agent to operating workflow<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember367\">We integrated the voice agent with our internal sales CRM. 
This turned the setup from a standalone agent into a working sales workflow.<\/p>\n\n\n\n<p id=\"ember368\">From there, we had to decide:<\/p>\n\n\n\n<ul><li>when a user becomes eligible for a call<\/li><li>how long to wait before the first call<\/li><li>what WhatsApp message should go before the call<\/li><li>what should happen if the user does not answer<\/li><li>what summary should come back after the call<\/li><li>how the user should be classified<\/li><li>how sales should see the outcome<\/li><\/ul>\n\n\n\n<h3 id=\"ember370\"><span class=\"ez-toc-section\" id=\"Guardrails_and_timing\"><\/span>Guardrails and timing<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember371\">We also added office-hour rules. Calls should go out only between <strong>9 AM and 9 PM<\/strong>.<\/p>\n\n\n\n<p id=\"ember372\">Users created outside that window would enter a queue, and the calling sequence would start at 9 AM the following morning.<\/p>\n\n\n\n<p id=\"ember373\">That mattered because automation like this only works if it respects basic human expectations.<\/p>\n\n\n\n<h3 id=\"ember374\"><span class=\"ez-toc-section\" id=\"The_actual_sequence\"><\/span>The actual sequence<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember375\">We did not want the call to feel completely cold, so before the first call, we introduced a WhatsApp touchpoint through AiSensy.<\/p>\n\n\n\n<p id=\"ember376\">The sequence worked like this:<\/p>\n\n\n\n<ol><li><strong>User is created<\/strong><\/li><li><strong>Within the first 30 seconds, a WhatsApp message is sent<\/strong><\/li><li><strong>At the 30-second mark, the AI call is triggered<\/strong><\/li><\/ol>\n\n\n\n<p id=\"ember378\">That made the outreach feel more connected. 
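To make the guardrails concrete, here is a minimal sketch of the eligibility logic described above: the 30-second trigger combined with the 9 AM to 9 PM window. This is illustrative Python only; the function and constant names are ours, not part of any tool in the stack.

```python
from datetime import datetime, time, timedelta

# Office-hour window for outbound calls (9 AM to 9 PM)
OFFICE_START, OFFICE_END = time(9, 0), time(21, 0)

def next_call_slot(created_at: datetime) -> datetime:
    """When should the first AI call fire for a newly created user?

    The call is triggered at the 30-second mark; users created outside
    office hours are queued until 9 AM the next valid morning.
    """
    candidate = created_at + timedelta(seconds=30)
    if OFFICE_START <= candidate.time() <= OFFICE_END:
        return candidate
    # Outside the window: queue the sequence for the following 9 AM
    day = candidate.date() if candidate.time() < OFFICE_START \
        else candidate.date() + timedelta(days=1)
    return datetime.combine(day, OFFICE_START)
```

So a user who signs up at 10:30 PM is queued, and the first call fires at 9:00 AM the next morning.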
Instead of the user receiving an unexplained call from an unknown number, there was already a lightweight message touchpoint in place.<\/p>\n\n\n\n<h3 id=\"ember379\"><span class=\"ez-toc-section\" id=\"The_overall_flow\"><\/span>The overall flow<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<div class=\"wp-block-uagb-image uagb-block-fb19d20a wp-block-uagb-image--layout-default wp-block-uagb-image--effect-static wp-block-uagb-image--align-none\"><figure class=\"wp-block-uagb-image__figure\"><img srcset=\"https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Sales-Qualification-Workflow-.webp ,https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Sales-Qualification-Workflow-.webp 780w, https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Sales-Qualification-Workflow-.webp 360w\" sizes=\"(max-width: 480px) 150px\" src=\"https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Sales-Qualification-Workflow-.webp\" alt=\"\" class=\"uag-image-28453\" width=\"501\" height=\"604\" title=\"Sales Qualification Workflow\" loading=\"lazy\" role=\"img\"\/><\/figure><\/div>\n\n\n\n<p id=\"ember381\">Once a user was created in our CRM, the outreach sequence began automatically:<\/p>\n\n\n\n<ul><li>A WhatsApp message was sent first through <strong>AiSensy<\/strong><\/li><li>The call was then triggered through the outbound calling flow<\/li><li>The call was routed through <strong>Elision<\/strong> to the user\u2019s phone<\/li><li>When the user answered, the call was connected to the live agent environment<\/li><li>The AI agent handled the conversation in real time<\/li><li>After the interaction, the outcome was written back into the CRM<\/li><li>Follow-up actions were triggered across sales, WhatsApp, and internal routing systems<\/li><\/ul>\n\n\n\n<h3 id=\"ember383\"><span class=\"ez-toc-section\" id=\"The_retry_sequence\"><\/span>The retry sequence<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember384\">If the 
user did not answer, the system would retry multiple times over the next few hours.<\/p>\n\n\n\n<p id=\"ember385\">But the retries were not just call attempts. They were a mix of WhatsApp, email, and calls as part of the sequence:<\/p>\n\n\n\n<ul><li>pre-call WhatsApp within 30 seconds of user creation<\/li><li>first AI call at the 30-second mark<\/li><li>if not answered, send a contextual WhatsApp follow-up<\/li><li>retry after 5 minutes<\/li><li>retry after 15 minutes<\/li><li>retry after 30 minutes<\/li><li>retry after 60 minutes<\/li><\/ul>\n\n\n\n<p id=\"ember387\">This helped us maximize answer rates without involving manual follow-up from sales.<\/p>\n\n\n\n<div class=\"wp-block-uagb-image uagb-block-235cb783 wp-block-uagb-image--layout-default wp-block-uagb-image--effect-static wp-block-uagb-image--align-none\"><figure class=\"wp-block-uagb-image__figure\"><img srcset=\"https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Re-try-and-Follow-up-Sequence.webp ,https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Re-try-and-Follow-up-Sequence.webp 780w, https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Re-try-and-Follow-up-Sequence.webp 360w\" sizes=\"(max-width: 480px) 150px\" src=\"https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Re-try-and-Follow-up-Sequence.webp\" alt=\"\" class=\"uag-image-28454\" width=\"438\" height=\"528\" title=\"Re-try and Follow-up Sequence\" loading=\"lazy\" role=\"img\"\/><\/figure><\/div>\n\n\n\n<h3 id=\"ember389\"><span class=\"ez-toc-section\" id=\"Why_WhatsApp_mattered_in_the_workflow\"><\/span>Why WhatsApp mattered in the workflow<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember390\">We used AiSensy to handle the WhatsApp layer across the calling journey.<\/p>\n\n\n\n<p id=\"ember391\">This was important because the workflow was not designed as a call-only experience. 
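The retry-and-follow-up cadence listed above is simple enough to express as a static schedule. A sketch, in Python; the action names are illustrative, not a real API:

```python
from datetime import datetime, timedelta

# Minutes after the first unanswered call at which we retry
RETRY_OFFSETS_MIN = (5, 15, 30, 60)

def build_retry_plan(first_call_at: datetime) -> list:
    """Expand the cadence for one user into timestamped actions:
    an immediate contextual WhatsApp follow-up, then call retries
    at +5, +15, +30, and +60 minutes."""
    plan = [("whatsapp_followup", first_call_at)]
    for minutes in RETRY_OFFSETS_MIN:
        plan.append(("ai_call_retry", first_call_at + timedelta(minutes=minutes)))
    return plan
```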
It was designed as a connected outreach sequence, where WhatsApp helped:<\/p>\n\n\n\n<ul><li>prepare the user for the call<\/li><li>support follow-up when the call was not answered<\/li><li>carry the interaction forward afterward<\/li><\/ul>\n\n\n\n<p id=\"ember393\">Through AiSensy, we automated:<\/p>\n\n\n\n<ul><li>pre-call messages<\/li><li>missed-call follow-ups<\/li><li>post-call thank-you messages<\/li><li>contextual next-step templates based on what happened in the interaction<\/li><\/ul>\n\n\n\n<p id=\"ember395\">That made the workflow feel more continuous and better connected, instead of feeling like an abrupt voice call from an unknown number.<\/p>\n\n\n\n<h3 id=\"ember396\"><span class=\"ez-toc-section\" id=\"What_happened_after_the_call\"><\/span>What happened after the call<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember397\">After the call, the system would fetch a summary from the conversation. Based on that summary, we would write useful text and tags into our CRM, such as:<\/p>\n\n\n\n<ul><li>call attempted<\/li><li>identity confirmed<\/li><li>salesperson callback required<\/li><\/ul>\n\n\n\n<p id=\"ember399\">We also routed users into different Slack channels based on the call result, so the sales team could pick up qualified or action-worthy users in real time. 
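The post-call write-back described above can be sketched as a small classification step. The summary fields, tag strings, and Slack channel name here are illustrative examples; our real CRM integration differs in detail.

```python
def classify_call(summary):
    """Map an AI call summary to CRM tags and an optional Slack channel.

    `summary` is a dict derived from the call transcript; the keys
    below are hypothetical, not a fixed schema.
    """
    tags = ["call attempted"]
    channel = None
    if summary.get("identity_confirmed"):
        tags.append("identity confirmed")
    if summary.get("qualified") or summary.get("callback_requested"):
        tags.append("salesperson callback required")
        channel = "#sales-qualified-leads"  # sales picks these up in real time
    return tags, channel
```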
That reduced the delay between AI qualification and human follow-up.<\/p>\n\n\n\n<h3 id=\"ember400\"><span class=\"ez-toc-section\" id=\"How_the_live_voice_workflow_worked\"><\/span>How the live voice workflow worked<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<div class=\"wp-block-uagb-image uagb-block-f0456fff wp-block-uagb-image--layout-default wp-block-uagb-image--effect-static wp-block-uagb-image--align-none\"><figure class=\"wp-block-uagb-image__figure\"><img srcset=\"https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Real-time-Voice-AI-Flow--1024x586.webp ,https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Real-time-Voice-AI-Flow-.webp 780w, https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Real-time-Voice-AI-Flow-.webp 360w\" sizes=\"(max-width: 480px) 150px\" src=\"https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Real-time-Voice-AI-Flow--1024x586.webp\" alt=\"\" class=\"uag-image-28455\" width=\"619\" height=\"354\" title=\"Real-time Voice AI Flow\" loading=\"lazy\" role=\"img\"\/><\/figure><\/div>\n\n\n\n<p id=\"ember402\"><strong>Step 1:<\/strong> The outbound call was triggered from our workflow and connected through <strong>Elision<\/strong>.<\/p>\n\n\n\n<p id=\"ember403\"><strong>Step 2:<\/strong> Once the user answered, <strong>VideoSDK<\/strong> managed the real-time agent interaction.<\/p>\n\n\n\n<p id=\"ember404\"><strong>Step 3:<\/strong> The user\u2019s speech was converted into text by <strong>Deepgram<\/strong>.<\/p>\n\n\n\n<p id=\"ember405\"><strong>Step 4:<\/strong> That text was processed by <strong>Google Gemini<\/strong>, which understood the input and generated the next response.<\/p>\n\n\n\n<p id=\"ember406\"><strong>Step 5:<\/strong> The response was converted back into speech by <strong>Cartesia<\/strong>.<\/p>\n\n\n\n<p id=\"ember407\"><strong>Step 6:<\/strong> The user heard the reply in real time, and the same loop repeated through the conversation.<\/p>\n\n\n\n<p 
id=\"ember408\">By the end of this stage, we had built more than a calling agent. We had built an automated qualification workflow that could:<\/p>\n\n\n\n<ul><li>speak to users<\/li><li>process outcomes<\/li><li>classify them<\/li><li>trigger follow-ups<\/li><li>route the right ones to sales<\/li><\/ul>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h3 id=\"ember410\"><span class=\"ez-toc-section\" id=\"6_What_broke_and_how_we_fixed_it\"><\/span>6. What broke, and how we fixed it<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember411\"><em>The real learning only started once we tested the system properly.<\/em><\/p>\n\n\n\n<p id=\"ember412\">Before going live, we tested:<\/p>\n\n\n\n<ul><li>how naturally the agent was speaking<\/li><li>the speed of speaking<\/li><li>whether it was following the script<\/li><li>pronunciation quality<\/li><li>latency<\/li><li>how the voice clone sounded<\/li><li>how quickly the agent triggered after user creation<\/li><\/ul>\n\n\n\n<p id=\"ember414\">And once we tested at depth, the real issues showed up.<\/p>\n\n\n\n<div class=\"wp-block-uagb-image uagb-block-8fe0f147 wp-block-uagb-image--layout-default wp-block-uagb-image--effect-static wp-block-uagb-image--align-none\"><figure class=\"wp-block-uagb-image__figure\"><img srcset=\"https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Analysing-Each-Call-in-Testing-Phase--1024x511.webp ,https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Analysing-Each-Call-in-Testing-Phase-.webp 780w, https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Analysing-Each-Call-in-Testing-Phase-.webp 360w\" sizes=\"(max-width: 480px) 150px\" src=\"https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Analysing-Each-Call-in-Testing-Phase--1024x511.webp\" alt=\"\" class=\"uag-image-28456\" width=\"729\" height=\"249\" title=\"Analysing Each Call in Testing Phase\" loading=\"lazy\" role=\"img\"\/><\/figure><\/div>\n\n\n\n<h3 
id=\"ember416\"><span class=\"ez-toc-section\" id=\"61_Answer_rate_and_spam_trust\"><\/span>6.1) Answer rate and spam trust<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember417\">Answer rate was a major challenge. Even a good agent is useless if people do not answer.<\/p>\n\n\n\n<p id=\"ember418\">Spam perception was part of the same problem. Even though we were only reaching out to our inbound users, some would still mark unfamiliar numbers as spam. To reduce that, we rotated numbers every month so they were less likely to be flagged.<\/p>\n\n\n\n<p id=\"ember419\">We also:<\/p>\n\n\n\n<ul><li>retried calls multiple times<\/li><li>set up our number properly in Truecaller with the official company name<\/li><\/ul>\n\n\n\n<p id=\"ember421\">That helped users identify the caller more easily. These changes did not solve the problem completely, but they meaningfully improved trust and answer rates.<\/p>\n\n\n\n<h3 id=\"ember422\"><span class=\"ez-toc-section\" id=\"62_Changing_the_call_opening\"><\/span>6.2) Changing the call opening<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember423\">Earlier, the agent would begin with a longer scripted introduction and explain the purpose of the call right away.<\/p>\n\n\n\n<p id=\"ember424\">For example, one of the earlier openings sounded like this:<\/p>\n\n\n\n<p id=\"ember425\"><em>\u201c\u0928\u092e\u0938\u094d\u0924\u0947 {name}! 
\u092e\u0948\u0902 Refrens \u0938\u0947 \u0905\u0926\u093f\u0924\u093f \u092c\u094b\u0932 \u0930\u0939\u0940 \u0939\u0942\u0901\u0964 {business_name} \u0915\u0947 \u092c\u093e\u0930\u0947 \u092e\u0947\u0902 \u0939\u092e\u0938\u0947 \u0938\u0902\u092a\u0930\u094d\u0915 \u0915\u0930\u0928\u0947 \u0915\u0947 \u0932\u093f\u090f \u0927\u0928\u094d\u092f\u0935\u093e\u0926\u0964 \u092e\u0948\u0902 \u0938\u092e\u091d\u0924\u0940 \u0939\u0942\u0901 \u0915\u093f \u0906\u092a \u0939\u092e\u093e\u0930\u0947 billing \u0914\u0930 accounting solutions \u092e\u0947\u0902 interested \u0939\u0948\u0902\u0964 \u0915\u094d\u092f\u093e \u0905\u092d\u0940 \u0932\u0917\u092d\u0917 \u0926\u094b \u092e\u093f\u0928\u091f \u092c\u093e\u0924 \u0915\u0930\u0928\u0947 \u0915\u093e \u092f\u0939 \u0938\u0939\u0940 \u0938\u092e\u092f \u0939\u0948 \u0924\u093e\u0915\u093f \u092e\u0948\u0902 \u0906\u092a\u0915\u0940 \u091c\u093c\u0930\u0942\u0930\u0924\u094b\u0902 \u0915\u094b \u0938\u092e\u091d \u0938\u0915\u0942\u0901?\u201d<\/em><\/p>\n\n\n\n<p id=\"ember426\">Over time, we realized there were a few problems with this:<\/p>\n\n\n\n<ul><li>it started speaking before confirming identity<\/li><li>it was too long for the first few seconds of a call<\/li><li>it was too Hindi-heavy<\/li><li>it sounded scripted and robotic<\/li><\/ul>\n\n\n\n<p id=\"ember428\">So we changed the opening to something much simpler:<\/p>\n\n\n\n<p id=\"ember429\"><strong>\u201cAm I speaking with [name]?\u201d<\/strong><\/p>\n\n\n\n<p id=\"ember430\">That made the interaction feel more natural, confirmed identity first, and reduced friction at the start.<\/p>\n\n\n\n<h3 id=\"ember431\"><span class=\"ez-toc-section\" id=\"63_Fixing_pronunciation_issues\"><\/span>6.3) Fixing pronunciation issues<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember432\">Pronunciation became a real issue during testing, and a lot of it came from how the text was written before it reached the agent.<\/p>\n\n\n\n<p 
id=\"ember433\">For example:<\/p>\n\n\n\n<ul><li>if <strong>gst<\/strong> was written in small letters, it could sound like <strong>\u201cgist\u201d<\/strong><\/li><li>if a name like <strong>UTSAV<\/strong> was written fully in capital letters, the agent could read it as <strong>\u201cU-T-S-A-V\u201d<\/strong> instead of <strong>\u201cOotsaav\u201d<\/strong><\/li><\/ul>\n\n\n\n<p id=\"ember435\">So we started cleaning and standardizing the text before passing it into the agent. We paid more attention to:<\/p>\n\n\n\n<ul><li>casing<\/li><li>formatting<\/li><li>how names and business terms were written<\/li><\/ul>\n\n\n\n<p id=\"ember437\">These may seem like small fixes, but they made the speech sound much more natural and noticeably improved call quality.<\/p>\n\n\n\n<h3 id=\"ember438\"><span class=\"ez-toc-section\" id=\"64_Reducing_latency_through_script_changes\"><\/span>6.4) Reducing latency through script changes<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember439\">Latency was a constant trade-off.<\/p>\n\n\n\n<p id=\"ember440\">Some of it came from model selection. Lower-cost models can increase delay. Higher-end models can reduce it, but cost more.<\/p>\n\n\n\n<p id=\"ember441\">But some of the latency problems also came from the conversation itself. If a question was too open-ended, the model had to do more work, which increased response time.<\/p>\n\n\n\n<p id=\"ember442\">To reduce latency, we simplified the questions the agent asked. We moved away from overly open-ended questions and toward structured, closed-ended options. 
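The casing fixes from 6.3 amount to a normalization pass that runs before text reaches the agent. A minimal sketch in Python; the acronym list and rules are examples, not our full cleanup:

```python
import re

# Acronyms the voice should spell out; lowercase "gst" otherwise
# risks being read as the word "gist"
KNOWN_ACRONYMS = {"gst", "crm", "sms", "ivr"}

def normalize_for_tts(text: str) -> str:
    """Fix casing so names and acronyms are spoken correctly."""
    def fix(match):
        word = match.group(0)
        if word.lower() in KNOWN_ACRONYMS:
            return word.upper()      # "gst" -> "GST", spelt out
        if word.isupper() and len(word) > 3:
            return word.title()      # "UTSAV" -> "Utsav", read as a name
        return word
    return re.sub(r"[A-Za-z]+", fix, text)
```

With this pass, "UTSAV needs gst help" becomes "Utsav needs GST help" before it ever reaches the text-to-speech layer.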
This reduced the amount of thinking the AI had to do and made the call flow faster and cleaner.<\/p>\n\n\n\n<h3 id=\"ember443\"><span class=\"ez-toc-section\" id=\"65_Handling_silence_during_delay\"><\/span>6.5) Handling silence during delay<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember444\">Silence during latency felt like a dropped call.<\/p>\n\n\n\n<p id=\"ember445\">When there was a pause due to latency, the person on the other side could feel that the call had disconnected.<\/p>\n\n\n\n<p id=\"ember446\">So we added subtle background noise behind the agent\u2019s voice. That helped during latency gaps because complete silence made people think the call had dropped. The background layer made the call feel more active while the model was processing.<\/p>\n\n\n\n<h3 id=\"ember447\"><span class=\"ez-toc-section\" id=\"66_Detecting_IVR_and_bot_responses\"><\/span>6.6) Detecting IVR and bot responses<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember448\">IVR and bots answering calls also created waste.<\/p>\n\n\n\n<p id=\"ember449\">Sometimes the agent connected to an IVR system or another bot instead of a person. If the system kept talking in that situation, cost increased without any real value being created.<\/p>\n\n\n\n<p id=\"ember450\">So we began using real-time cues to detect when the agent was speaking to a bot or IVR instead of a real person. If the system noticed:<\/p>\n\n\n\n<ul><li>network tones<\/li><li>robotic response behavior<\/li><li>other bot-like patterns<\/li><\/ul>\n\n\n\n<p id=\"ember452\">it would end the call instead of continuing to speak pointlessly.<\/p>\n\n\n\n<h3 id=\"ember453\"><span class=\"ez-toc-section\" id=\"67_Controlling_rollout_by_language\"><\/span>6.7) Controlling rollout by language<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p id=\"ember454\">Language was another limitation.<\/p>\n\n\n\n<p id=\"ember455\">Hindi does not work equally well across all regions. 
In southern states and many northeastern regions, Hindi is not the preferred language for business conversations. So we realized we could not treat all geographies the same.<\/p>\n\n\n\n<p id=\"ember456\">Now, we are planning to:<\/p>\n\n\n\n<ul><li>introduce English for states where it is more suitable<\/li><li>experiment with regional languages for other states wherever relevant<\/li><\/ul>\n\n\n\n<p id=\"ember458\">By this stage, the system had become much more robust.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h2 id=\"ember459\"><span class=\"ez-toc-section\" id=\"7_Cost_value_and_efficiency\"><\/span>7. Cost, value, and efficiency<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"ember460\"><em>This is not just an AI-vs-human cost story.<\/em><\/p>\n\n\n\n<p id=\"ember461\">One of the most important things we learned is that this project should not be judged only through a simple AI-versus-human cost lens.<\/p>\n\n\n\n<p id=\"ember462\">Especially in India, AI voice calling is not always dramatically cheaper than a person making calls. That is not the full story.<\/p>\n\n\n\n<p id=\"ember463\">The real value is <strong>efficiency<\/strong>.<\/p>\n\n\n\n<p id=\"ember464\">Before this system, our sales team was simply not calling non-priority users at scale. So the comparison was not:<\/p>\n\n\n\n<p id=\"ember465\"><strong>AI or human for the same task<\/strong><\/p>\n\n\n\n<p id=\"ember466\">It was closer to:<\/p>\n\n\n\n<p id=\"ember467\"><strong>AI qualification layer or no qualification layer<\/strong><\/p>\n\n\n\n<p id=\"ember468\">That is where the value came from. The agent helped us cover a part of the funnel that was previously untouched. 
It also helped sales avoid spending time on:<\/p>\n\n\n\n<ul><li>users who do not answer<\/li><li>wrong numbers<\/li><li>identity mismatches<\/li><li>weakly qualified users<\/li><li>repetitive first-pass qualification conversations<\/li><\/ul>\n\n\n\n<p id=\"ember470\">A moderate-efficiency setup can bring costs to around <strong>\u20b96 per minute<\/strong>, while a more premium setup can push it toward <strong>\u20b99 to \u20b910 per minute<\/strong>.<\/p>\n\n\n\n<p id=\"ember471\">For us, the workflow cost was roughly <strong>\u20b96.3 per minute<\/strong>, spread across three main layers:<\/p>\n\n\n\n<ul><li>telephony<\/li><li>WhatsApp automation<\/li><li>the live voice stack, which included VideoSDK and the model<\/li><\/ul>\n\n\n\n<p id=\"ember473\">So in practice, AI voice calling is not just an LLM cost. It is a combined operating cost across the entire workflow.<\/p>\n\n\n\n<p id=\"ember474\">The bigger business point is flexibility. Hiring more people for repetitive top-of-funnel qualification comes with fixed cost, training time, and lower ability to scale up or down quickly. With an AI voice setup, once the system is trained and stabilized, we can expand or contract more easily through numbers, channels, and workflows.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h2 id=\"ember475\"><span class=\"ez-toc-section\" id=\"8_What_this_opens_up_next\"><\/span>8. 
What this opens up next<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"ember476\">Right now, we have started with a small cohort and one focused use case: qualification.<\/p>\n\n\n\n<p id=\"ember477\">But the longer-term possibilities are bigger.<\/p>\n\n\n\n<p id=\"ember478\">We are already thinking about questions like:<\/p>\n\n\n\n<ul><li>Could a voice agent handle more of a renewal workflow end-to-end?<\/li><li>Could it speak to an existing customer, detect positive renewal intent, send a discount code on WhatsApp during the call, and continue the conversation with that live context?<\/li><\/ul>\n\n\n\n<p id=\"ember480\">We also know that scaling this will require different agents for different cohorts. What works for one user group or geography may not work for another. Different segments may require:<\/p>\n\n\n\n<ul><li>different scripts<\/li><li>different tones<\/li><li>different voices<\/li><li>different language paths<\/li><li>different cultural patterns<\/li><\/ul>\n\n\n\n<p id=\"ember482\">Another important lever is the speed of deployment. It took us around <strong>one to two months<\/strong> to properly build and stabilize our first AI agent. Going forward, one of our goals is to reduce that cycle to <strong>one week or less<\/strong>. If we can do that, then we can launch more specialized agents much faster across more use cases.<\/p>\n\n\n\n<p id=\"ember483\">So for us, this is not just a one-off automation experiment. It is the start of a deeper conversational operations layer.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h2 id=\"ember484\"><span class=\"ez-toc-section\" id=\"9_Endnotes\"><\/span>9. 
Endnotes<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p id=\"ember485\">We started this project because we had a very real gap in our sales process.<\/p>\n\n\n\n<p id=\"ember486\">We had a large pool of non-priority users that our sales team could not realistically call, but we believed that some of those users still had meaningful business potential.<\/p>\n\n\n\n<p id=\"ember487\">Voice AI gave us a way to create a qualification layer for that pool.<\/p>\n\n\n\n<p id=\"ember488\">The rollout taught us that success in voice AI is not just about plugging a model into a voice system. It depends on:<\/p>\n\n\n\n<ul><li>the use case<\/li><li>the persona<\/li><li>script design<\/li><li>telephony quality<\/li><li>CRM integration<\/li><li>retry logic<\/li><li>number trust<\/li><li>language fit<\/li><li>latency management<\/li><li>automation around the call<\/li><\/ul>\n\n\n\n<p id=\"ember490\">The workflow was not just voice-led. It also included a WhatsApp layer before and after calls, plus outcome-based follow-ups for missed calls and completed interactions, which made the overall system more usable and better connected to the user journey.<\/p>\n\n\n\n<p id=\"ember491\">Today, this workflow helps us qualify users our sales team could not reach before and route the more promising ones into a human sales process with context.<\/p>\n\n\n\n<p id=\"ember492\"><strong>That is the real value we unlocked: <\/strong>not just more calls, but better use of human effort.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p id=\"ember495\"><strong>About Refrens<\/strong><\/p>\n\n\n\n<p id=\"ember496\"><em><a href=\"https:\/\/www.refrens.com\/\" data-type=\"URL\" data-id=\"https:\/\/www.refrens.com\/\">Refrens.com<\/a> is a B2B SaaS platform trusted by 150,000+ businesses across 170+ countries for invoicing, accounting, payments, compliance, sales, inventory, and other core business workflows. 
We are backed by Vijay Shekhar Sharma (Paytm), Anupam Mittal (<a href=\"http:\/\/shaadi.com\/\" target=\"_blank\" rel=\"noopener\">Shaadi.com<\/a>), Kunal Shah (CRED), and Dinesh Agarwal (IndiaMART), among others. We are on a mission to change the way millions of SMEs across the globe run their core business operations. <a href=\"https:\/\/www.refrens.com\/\">Check us out! &gt;<\/a><\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Most AI voice demos look impressive for a few minutes. But building a voice calling workflow that actually works inside a sales process is a very different challenge. Once you move beyond the demo, you start dealing with real operational questions: who should get called, when should the call go out, what should happen if &hellip;<\/p>\n<p class=\"read-more\"> <a class=\"\" href=\"https:\/\/www.refrens.com\/grow\/building-ai-voice-calling-workflow-for-sales-qualification\/\"> <span class=\"screen-reader-text\">How to Build a Production-Ready AI Voice Calling Workflow for Sales Qualification<\/span> Read More 
&raquo;<\/a><\/p>\n","protected":false},"author":15,"featured_media":28459,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_uag_custom_page_level_css":"","site-sidebar-layout":"default","site-content-layout":"default","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","spay_email":""},"categories":[3],"tags":[],"jetpack_featured_media_url":"https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1.webp","uagb_featured_image_src":{"full":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1.webp",2560,1440,false],"thumbnail":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-150x84.webp",150,84,true],"medium":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-300x169.webp",300,169,true],"medium_large":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-768x432.webp",768,432,true],"large":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-1024x576.webp",1024,576,true],"1536x1536":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-1536x864.webp",1536,864,true],"2048x2048":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-2048x1152.webp",2048,1152,true],"refrens-yarpp-thumbnail-w200":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-200x112.webp",200,112,true],"newspack-article-block-landscape-large":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-1200x900.webp",1200,9
00,true],"newspack-article-block-portrait-large":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-900x1200.webp",900,1200,true],"newspack-article-block-square-large":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-1200x1200.webp",1200,1200,true],"newspack-article-block-landscape-medium":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-800x600.webp",800,600,true],"newspack-article-block-portrait-medium":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-600x800.webp",600,800,true],"newspack-article-block-square-medium":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-800x800.webp",800,800,true],"newspack-article-block-landscape-small":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-400x300.webp",400,300,true],"newspack-article-block-portrait-small":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-300x400.webp",300,400,true],"newspack-article-block-square-small":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-400x400.webp",400,400,true],"newspack-article-block-landscape-tiny":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-200x150.webp",200,150,true],"newspack-article-block-portrait-tiny":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-150x200.webp",150,200,true],"newspack-article-block-square-tiny":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-200x200.webp",200,200,true],"newspack-article-block-uncropped":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-1200x675.webp",1200,675,true],"yarpp-thumbnail":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-120x120.webp",120,120,true],"web-stories-poster-portrait":["https:\/\/www.refren
s.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-640x853.webp",640,853,true],"web-stories-publisher-logo":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-96x96.webp",96,96,true],"web-stories-thumbnail":["https:\/\/www.refrens.com\/grow\/wp-content\/uploads\/2026\/04\/Feature-Images-1-150x84.webp",150,84,true]},"uagb_author_info":{"display_name":"Mitesh Kariya","author_link":"https:\/\/www.refrens.com\/grow\/author\/mitesh-kariya\/"},"uagb_comment_info":0,"uagb_excerpt":"Most AI voice demos look impressive for a few minutes. But building a voice calling workflow that actually works inside a sales process is a very different challenge. Once you move beyond the demo, you start dealing with real operational questions: who should get called, when should the call go out, what should happen if&hellip;","_links":{"self":[{"href":"https:\/\/www.refrens.com\/grow\/wp-json\/wp\/v2\/posts\/28450"}],"collection":[{"href":"https:\/\/www.refrens.com\/grow\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.refrens.com\/grow\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.refrens.com\/grow\/wp-json\/wp\/v2\/users\/15"}],"replies":[{"embeddable":true,"href":"https:\/\/www.refrens.com\/grow\/wp-json\/wp\/v2\/comments?post=28450"}],"version-history":[{"count":2,"href":"https:\/\/www.refrens.com\/grow\/wp-json\/wp\/v2\/posts\/28450\/revisions"}],"predecessor-version":[{"id":28461,"href":"https:\/\/www.refrens.com\/grow\/wp-json\/wp\/v2\/posts\/28450\/revisions\/28461"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.refrens.com\/grow\/wp-json\/wp\/v2\/media\/28459"}],"wp:attachment":[{"href":"https:\/\/www.refrens.com\/grow\/wp-json\/wp\/v2\/media?parent=28450"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.refrens.com\/grow\/wp-json\/wp\/v2\/categories?post=28450"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.refrens.com\/grow\/wp-json\
/wp\/v2\/tags?post=28450"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}