What This Call Analyser Is
A call analyser is a system that helps us understand recorded conversations. At a very basic level, it takes an audio recording of a call and turns it into text. Once the call is in text form, it becomes much easier to review, search, summarise, and learn from.
The goal is not to judge calls or monitor people. It is to understand conversations better.
- What customers are asking.
- Where confusion happens.
- What patterns repeat over time.
This idea already existed in the team. We were using external tools to do this, and they worked fine. The challenge was cost.
Why We Needed an In-House Version
We were using OpenAI-based APIs to process calls. These services charge based on how much text is processed. As call volume increased, the cost increased steadily. Longer calls meant more text, and more text meant higher bills. Technically, nothing was wrong. Financially, it started to feel unsustainable. So instead of removing call analysis completely, we decided to explore whether we could handle most of it internally and use paid services only where they truly mattered. That is where my work on this started.
The Overall Flow of the System
Before going into technical details, it helps to understand the flow in simple terms.
- A call recording is uploaded
- The audio is converted into text
- The text is analysed and summarised
- The results are stored and reviewed later
This does not happen in real time. Calls are processed after they end, in the background, whenever a worker is free to pick them up. That single decision shapes most of the technical choices we made.
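The flow above can be sketched as a small background worker loop. This is a simplified illustration, not our production code: the queue is an assumed in-memory stand-in (a real system might use a database table or message broker), and the two function bodies are placeholders for the actual transcription and analysis steps.

```python
import queue

# Hypothetical in-memory job queue: uploaded recordings wait here
# until a worker is free to pick them up.
call_queue: "queue.Queue[str]" = queue.Queue()

def transcribe(audio_path: str) -> str:
    # Placeholder for the speech-to-text step (faster-whisper in our setup).
    return f"transcript of {audio_path}"

def analyse(transcript: str) -> dict:
    # Placeholder for the analysis step (Llama 3 via Ollama in our setup).
    return {"summary": transcript, "rating": None, "disposition": None}

def process_one(audio_path: str) -> dict:
    # The whole pipeline for a single call: audio -> text -> results.
    return analyse(transcribe(audio_path))

def worker_loop() -> None:
    # Runs in the background: picks up the next call whenever free.
    while True:
        audio_path = call_queue.get()
        results = process_one(audio_path)  # stored for later review
        call_queue.task_done()
```

Because nothing here is real-time, the worker can run on cheap hardware and simply drain the queue at its own pace.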
Speech to Text
For converting audio to text, we used faster-whisper. This is an open-source implementation based on OpenAI’s Whisper model, but it runs locally. Running locally means the audio does not need to be sent to an external service, which helps reduce cost and keeps data within our system.
It is fast, reliable for our use case, and works well for offline processing.
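A minimal sketch of this step, assuming the faster-whisper Python package is installed. The model size, device, and compute type below are illustrative choices for cheap offline batch work, not our exact production settings.

```python
def join_segments(texts) -> str:
    # faster-whisper yields transcript segments lazily;
    # join them into a single transcript string.
    return " ".join(t.strip() for t in texts)

def transcribe_call(audio_path: str, model_size: str = "small") -> str:
    # Imported inside the function so this module still loads
    # in environments where faster-whisper is not installed.
    from faster_whisper import WhisperModel

    # int8 on CPU keeps memory use modest, which suits offline processing.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)
    return join_segments(segment.text for segment in segments)
```

Since the work happens after the call ends, a slower but cheaper CPU configuration is perfectly acceptable here.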
Text Analysis and Summarisation
For analysing the text, we used Llama 3 via Ollama.
Llama 3 is a language model that can read text and understand context. Ollama allows this model to run locally on our machines instead of in the cloud.
Instead of asking it open-ended questions, we prompt it for very specific, structured outputs.
In simple terms, the system generates:
- A quick summary of the call
- A basic rating, which gives a rough idea of how the call went
- A disposition, which means a short label describing what the call was mainly about
These outputs help us quickly understand the intent of the call and get a general sense of the team’s performance, without listening to the entire recording.
All of this happens internally, without sending the data outside our system.
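One way to get these structured outputs is to ask the model for strict JSON and parse its reply defensively. A sketch, assuming the ollama Python package and a pulled llama3 model; the prompt wording and JSON key names here are illustrative, not our exact prompts.

```python
import json

def build_prompt(transcript: str) -> str:
    # Asking for strict JSON keeps the output easy to store and compare.
    return (
        "Read this call transcript and reply with JSON only, using the keys "
        '"summary" (one sentence), "rating" (an integer from 1 to 5), and '
        '"disposition" (a short label for what the call was mainly about).\n\n'
        + transcript
    )

def parse_analysis(raw: str) -> dict:
    # Parse the model's reply and keep only the fields we store.
    data = json.loads(raw)
    return {
        "summary": str(data.get("summary", "")),
        "rating": int(data.get("rating", 0)),
        "disposition": str(data.get("disposition", "")),
    }

def analyse_call(transcript: str) -> dict:
    # Imported inside the function so this module still loads
    # in environments where the ollama package is not installed.
    import ollama

    response = ollama.chat(
        model="llama3",  # assumes `ollama pull llama3` has been run
        messages=[{"role": "user", "content": build_prompt(transcript)}],
        format="json",  # ask Ollama to constrain the reply to valid JSON
    )
    return parse_analysis(response["message"]["content"])
```

Keeping prompt-building and parsing as separate pure functions also makes them easy to test without a running model.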
Why a Separate Microservice
We did not integrate this logic into the main application.
Instead, we created a separate microservice. A microservice is a small independent application that focuses on one responsibility. In this case, only call processing.
We did this for a few reasons:
- Call analysis uses more CPU and memory than normal requests
- We did not want heavy processing to affect the main application
- It allows easier changes and experiments without risking core functionality
This separation made the system safer and easier to manage.
Deployment and Cost Decisions
The service is containerised, meaning it is packaged with everything it needs to run. This makes it predictable and easier to deploy. Since we do not need real-time processing, we run this service locally instead of on expensive cloud infrastructure. Calls are processed in batches, in the background. This choice alone reduced deployment and usage costs significantly.
Known Limitations
This system is not perfect. Because processing is not real-time, insights are delayed. For our current needs, that is acceptable. If real-time analysis becomes important in the future, parts of this system will need to change. For now, we consciously chose cost efficiency and stability over speed; the right trade-off depends entirely on your needs.
Takeaways and Learnings
This project taught me a few important lessons.
First, architecture is about trade-offs, not perfection.
Every decision has a cost, whether financial, technical, or operational.
Second, breaking a system into small steps makes complex problems manageable.
Once I stopped thinking in terms of “AI call analysis” and started thinking in terms of “audio to text, then text analysis”, everything became clearer.
Third, building something internally forces you to understand it deeply.
There were gaps in my knowledge. I had to read, test, and revise multiple times.
That learning was the most valuable part.
Advice to Anyone Building Something Similar
If you are thinking of building an in-house system like this, start simple.
- Do not try to match external tools feature by feature.
- Focus on what you actually need.
- Measure cost early.
- Accept limitations instead of hiding them.
And most importantly, do not wait until you feel fully ready. You learn the most while building.
Final Thoughts
This was not about building something impressive. It was about building something practical, affordable, and good enough for now. I am still learning, and this system will evolve. But this experience helped me understand how real-world constraints shape technical decisions.
That alone made it worth doing.
Would love to hear your thoughts on it. That’s why we have a comments section 😉 Feel free to share.
Check out my portfolio: Dheeraj Verma
