During my 15-year journey in the world of software development, I spent the vast majority of my time designing and implementing systems using microservices architecture. I’ve been involved in multiple migrations from monolithic systems to microservices, as well as building new solutions from scratch. Along the way, my teams and I made plenty of mistakes. Some were quite trivial and easy to correct; others involved decisions that were difficult to reverse, or changes that required significant effort to fix. More than once, I found myself in situations where it seemed that working with this architecture was pointless and too challenging.
Fortunately, many of those failures taught me the right lessons, along with the realization that it’s not just about the failures: we also succeeded and got many things right. In this article, I share key takeaways from my experience, hopefully giving you the confidence to implement microservices architecture successfully. Let’s get started!
Improper Service Boundary Designs
The first lesson addresses a common problem that I’ve encountered repeatedly in my projects: inappropriate service boundaries. In theory, these boundaries should align with business domains to minimize coupling and maximize cohesion. As is often the case, the theory is correct. I’ve seen firsthand how incorrect service boundaries lead to significant issues: developers face constant headaches, errors become frequent, time-to-market increases, and ultimately, client or user satisfaction decreases. This is the complete opposite of what a microservices architecture should deliver.
But why does this happen? In my opinion, the problem lies in the fact that while theory clearly defines what we want to achieve – namely, breaking services along business boundaries – it fails to provide a clear methodology for “how” to do it effectively. There is a surprising lack of guidance on how to properly partition a business domain into microservices to meet these goals. I’ve heard countless stories from colleagues about microservices projects that either failed or came dangerously close to failure due to this challenge. And I’ve personally been in the trenches, making mistakes in this area, more than once. Let me share two key lessons that cover this problem.
When Buzzwords Backfire – Lessons from “Project-Alpha”
I was working as a software developer on a project involving a large monolithic system. For the purposes of this article, I will refer to it as “project-alpha”, since it was my first time with microservices, and I’ll reference it throughout later chapters. At the time, I had no experience with microservice architectures, especially when it came to migrating a monolith. To make it even more thrilling, no one on our project team had any experience with microservices at that time.
At some point, a decision was made to migrate our architecture to microservices. The rationale was that the system was rapidly growing both in terms of customers and the number of features expected to be developed, which naturally meant a larger team working on the project. Looking back, I believe that the main reason for adopting a microservices architecture was the “hype” surrounding the technology at that time. All the conferences, meetups, podcasts, and so on were about microservices, much like what we see with AI today. It was an exciting topic (buzzword?), and everyone was eager to jump aboard. But in reality, we had no idea what we were getting ourselves into.
The first attempt at microservices involved implementing new functionality as a separate deployment unit, with the monolith communicating via a REST API. On the surface, it didn’t seem like a bad approach. However, no one had properly analyzed how this new service would interact with the many existing functionalities within the monolith. At that point, all we knew was that the service was meant to handle a completely new concept within the system, and isolating it as its own microservice seemed logical. So, we did as planned.
We ended up with a big ball of mud, with a microservice responsible for managing the new concept glued onto the monolith. So far, not so bad, right? Well, that was just the beginning. It quickly became apparent that there were very few business scenarios in which this new concept could operate in isolation. As requirements unfolded, we discovered that many functionalities in the monolith depended on the data stored in the microservice for this new concept. This tight coupling meant that critical business operations couldn’t function without the service, undermining the very goal of a microservices architecture.
Instead of simplifying our system, we introduced significant complexity. We had to deal with challenges such as HTTP communication and its associated issues, additional infrastructure requirements, and the need to adopt specialized tools for microservices, to name just a few. Yet, we didn’t gain any real benefits. Had we implemented the new concept within the monolith as a separate module, the solution would have been simpler: fewer HTTP calls, reduced potential for errors, and faster delivery of the feature.
So, what went wrong? In my opinion, the root issue was a lack of proper analysis regarding the functionality. We failed to map the business boundaries and design the solution accordingly. As a result, we blindly pursued microservices without fully understanding the consequences. And we ended up with a distributed big ball of mud, instead of properly distributed microservices. We paid the price: longer development times, more errors, and overall headaches from overengineering.
Before adopting microservices, I encourage you to spend time performing a deep analysis of how the projected business functionalities rely on each other, the type and degree of coupling they represent, and how it all translates into service boundaries.
Naive Boundaries Based on Identified Concepts
The second story involves similar issues, but this time, it didn’t occur during the transition from a monolith to microservices. It happened while designing an entirely new part of the system. This component was responsible for registering and verifying business accounts on an e-commerce platform. During initial discussions with the product owner, the following process stages were identified:
- Account registration – the user provides an email address, username, and password. Once submitted, the account is registered for the given email.
- Account verification – the user supplies company details, banking information, and completes a verification payment. At this stage, the company has limited access to the platform.
- Document submission – the user sends the required company documents to unlock all platform features.
For context, this solution was developed at a company that already had microservices and had well-established processes for quickly creating and deploying new microservices with all the necessary features (observability, service discovery, auditing, etc.). This environment created a strong temptation to implement each domain concept as a separate microservice, and that was the case here. It was decided to split the solution into three microservices: account registration, verification, and document handling.
Initially, apart from the overhead of HTTP communication and resource overuse, there weren’t many drawbacks. However, problems soon began to emerge. The business process proved to be very unstable, requiring frequent changes, from minor adjustments to major variations depending on the registration country or company type. It quickly became apparent that the initially designed boundaries were far from ideal. Process changes required updates across multiple services, even when only one part of the process was affected (due to regional customizations). This interdependency made truly independent deployments impossible. Instead of enabling flexibility, changes in one service introduced new dependencies, leading to cascading failures.
So, what’s the key lesson from this story? The simpler and more effective approach would have been to start with a modular monolith. That is, a single deployment unit composed of three modules. Over time, we could have observed its behavior to validate whether the proposed boundaries were correct. In our case, it became clear that the boundaries were far from perfect and didn’t align with the business domains. Adjusting them within a monolith would have been much easier than doing so in a microservices architecture. Later, if necessary, those modules could have been split into separate microservices, depending on other factors. In our situation, the solution would have worked perfectly as a single deployment unit.
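To illustrate, a modular monolith for this process might look like the following sketch. The module and method names are hypothetical, chosen only to mirror the three process stages from the story: the boundaries are explicit, but since everything runs in one process, reshaping them is a refactoring, not a redeployment.

```python
# One deployment unit, three modules with explicit public interfaces.
# All names here are illustrative, not the project's actual code.

class RegistrationModule:
    def __init__(self) -> None:
        self.accounts: dict[str, str] = {}

    def register(self, email: str) -> None:
        self.accounts[email] = "registered"

class VerificationModule:
    # Depends on the registration module through a plain in-process
    # reference: no HTTP, no broker, yet the boundary stays visible.
    def __init__(self, registration: RegistrationModule) -> None:
        self.registration = registration

    def verify(self, email: str) -> None:
        if email not in self.registration.accounts:
            raise ValueError(f"unknown account: {email}")
        self.registration.accounts[email] = "verified"

class DocumentModule:
    def __init__(self, registration: RegistrationModule) -> None:
        self.registration = registration

    def submit_documents(self, email: str) -> None:
        if self.registration.accounts.get(email) != "verified":
            raise ValueError("account must be verified first")
        self.registration.accounts[email] = "fully_active"
```

If observation later shows the boundaries are wrong, moving a method between these classes is trivial compared to redrawing the API surface of three deployed services.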
Another issue I observed was a lack of deep understanding of the domain and its potential changes. We built a basic understanding based on a few discussions with the product owner, but that wasn’t sufficient. The product owner often doesn’t have complete knowledge of all aspects of the business and its domain. Instead, we should have organized analytical sessions with the business stakeholders (such as an Event Storming workshop) to gain a deeper understanding of the domain and its processes. This approach would likely have revealed that our initial implementation was merely a basic version of a process that could vary based on multiple factors, prompting a need for different boundaries and a revised design.
As I mentioned at the beginning of this chapter, I’ve encountered many real-life examples of similar issues where service boundaries were designed incorrectly, whether while creating a brand-new microservices solution or migrating a monolith to microservices. They all share a common root cause: a lack of proper domain understanding. This is why I always emphasize the importance of gaining a deep understanding of the business before writing any piece of code. In the microservices world, this is even more critical, as modifying service boundaries later is both difficult and costly.
Avoid Overuse of Synchronous Communication
When it comes to inter-service communication, we typically encounter two types: synchronous and asynchronous. Synchronous communication is deceptively simple, well-known, and often the first choice for integrating microservices. In contrast, asynchronous communication is more complex to implement and requires additional effort in areas such as observability, error handling, and managing eventual consistency, which is usually inherent in this approach. So it doesn’t come for free, but in my experience, it should be the default choice for push-type inter-service communication (commands and events). Let’s see why.
Synchronous Approach
What are the challenges associated with synchronous communication patterns in a microservice architecture? Consider, for example, a basic order creation process on an e-commerce website. In the happy path, an order is created only if the item is available in inventory and the payment is successful. Once the order is confirmed, the product is deducted from the inventory. Let’s assume that this process is distributed across several microservices.
Synchronous communication between these services introduces several challenges:
- Temporal coupling – this occurs when one service requires another to be up and running simultaneously in order to correctly process business operations. In our scenario, if any microservice is unavailable or unable to process the request, the entire order creation process fails. This is a highly undesirable outcome for business operations. That was also the case in Project-Alpha, where the monolith required the microservice to process business operations.
- Increased latency – creating an order involves multiple inter-service connections, each adding overhead. As a result, the process is slower compared to a monolithic implementation. Website users don’t like waiting.
- Cascading failures – if, for example, the Payment Gateway service encounters an infrastructure error (e.g. an inability to connect to the database), the error propagates from the Payment Service to the Order Service. Each service in the chain must handle the error. It results in unnecessary overhead.
- Scaling challenges – during peak periods (like Black Week), the system must scale to handle increased requests. With synchronous communication, scaling a single service is not always sufficient. Usually, all services involved in the synchronous communication chain must be scaled together, which is far less efficient.
Additionally, due to the nature of distributed systems, transaction management becomes complicated in such cases. Imagine that we first deduct an item from the inventory and then attempt to process the payment. If the payment fails, whether due to technical issues or business reasons (e.g., incorrect credit card information), we cannot easily roll back the changes in the database. However, this issue is not unique to synchronous communication, as it also occurs with asynchronous methods. Thus, this topic is not the focus of these lessons.
However, synchronous communication is not inherently bad, and in some cases, it’s the right choice. First, it’s simple to implement, and I dare say everyone knows how. It also works perfectly for queries when you need to retrieve data from another service.
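To make the temporal coupling and rollback problem concrete, here is a minimal sketch. The service classes, SKUs, and the simulated Payment Gateway outage are all hypothetical, and network calls are replaced by in-process method calls for brevity: a single failing dependency fails the whole order and leaves the inventory reservation dangling.

```python
class PaymentError(Exception):
    """Raised when the Payment Gateway cannot process a charge."""

class InventoryService:
    def __init__(self) -> None:
        self.stock = {"sku-1": 10}

    def reserve(self, sku: str) -> None:
        if self.stock.get(sku, 0) <= 0:
            raise RuntimeError(f"{sku} out of stock")
        self.stock[sku] -= 1

class PaymentService:
    def charge(self, order_id: str, amount: float) -> None:
        # Simulate an infrastructure failure in the Payment Gateway;
        # with synchronous calls it propagates straight to the caller.
        raise PaymentError("payment gateway unreachable")

class OrderService:
    """Creates orders by calling the other services in a synchronous chain."""

    def __init__(self, inventory: InventoryService,
                 payments: PaymentService) -> None:
        self.inventory = inventory
        self.payments = payments

    def create_order(self, order_id: str, sku: str, amount: float) -> str:
        self.inventory.reserve(sku)              # downstream service 1
        self.payments.charge(order_id, amount)   # downstream service 2
        return order_id
```

Running `create_order` here fails with `PaymentError`, yet the stock count has already been decremented: exactly the dangling state described above, with no built-in way to roll it back.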
Asynchronous Approach
Now, let’s examine how asynchronous communication can help in this scenario. There are several ways to implement it, but for this example, I have chosen to use a message broker to transmit events from the Order Service to other services. In a happy path scenario, the order creation process might follow this flow:
- When a user places an order, the Order Service creates the order and emits an OrderCreated event to the message broker. The user immediately receives confirmation that the order was created successfully.
- The Payment Service listens for the OrderCreated event and processes the payment (using the Payment Gateway).
- The Inventory Service also listens for this event and deducts the item from the inventory.
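The flow above can be sketched with a toy in-memory broker. Everything here is illustrative: the class names and dictionary-based “persistence” are hypothetical, and the toy broker delivers events synchronously for brevity, whereas a real broker (RabbitMQ, Kafka, etc.) would deliver them asynchronously.

```python
from collections import defaultdict
from typing import Callable

class MessageBroker:
    """Toy in-memory stand-in for a real message broker."""

    def __init__(self) -> None:
        self.handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self.handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self.handlers[topic]:
            handler(event)

class OrderService:
    def __init__(self, broker: MessageBroker) -> None:
        self.broker = broker
        self.orders: dict[str, str] = {}

    def place_order(self, order_id: str, sku: str, amount: float) -> str:
        self.orders[order_id] = "created"   # persist first, then emit
        self.broker.publish("OrderCreated",
                            {"order_id": order_id, "sku": sku, "amount": amount})
        return "order confirmed"            # the user gets an immediate answer

class PaymentService:
    def __init__(self, broker: MessageBroker) -> None:
        self.charged: list[str] = []
        broker.subscribe("OrderCreated", self.on_order_created)

    def on_order_created(self, event: dict) -> None:
        self.charged.append(event["order_id"])  # call the Payment Gateway here

class InventoryService:
    def __init__(self, broker: MessageBroker) -> None:
        self.stock = {"sku-1": 10}
        broker.subscribe("OrderCreated", self.on_order_created)

    def on_order_created(self, event: dict) -> None:
        self.stock[event["sku"]] -= 1
```

Note that the Order Service knows nothing about its consumers: adding a third subscriber (say, a notification service) requires no change to the order-creation code.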
How does this solution address the aforementioned problems?
- Decoupling services improves resilience: eliminating the temporal coupling between the Order Service and both the Payment Service and the Inventory Service improves the system’s resilience. In the event of a temporary outage of, say, the Payment Service, users can still create orders. The payment will simply be processed later, once the Payment Service is restored.
- Reduced latency for order creation process: from the user’s perspective, the process is much faster. There is minimal communication overhead, as the order only needs to be persisted in the Order Service’s database and an event emitted.
- Failures in downstream services are not propagated upstream: for example, if the Payment Service encounters technical issues, the event processing can be retried without affecting the Order Service.
- Each service can be scaled independently: for instance, we can scale up the Order Service without having to scale the Payment Service. With a message broker serving as a buffer, the system can handle a higher volume of orders.
While this solution sounds promising, as always, there is no free lunch. Using this approach entails some costs and drawbacks:
- It requires a messaging infrastructure: i.e., a message broker, which involves setup and maintenance costs. However, once established, it can be reused across multiple scenarios.
- The message broker becomes a single point of failure: if it goes down, all business operations halt. Special measures must be taken to reduce the likelihood of this happening.
- Error handling may be more complex compared to synchronous communication: for example, if a payment fails because the user’s credit card is locked, the Payment Service must communicate with the Order Service to cancel the order (for example, using another event).
- Communication with users becomes more complicated: if a payment fails, we must inform the user that the order has been canceled or that different credit card details are required, even though the order was already created and the user received confirmation. This typically necessitates asynchronous communication methods such as push notifications or emails.
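The compensating flow from the error-handling point above can be sketched as follows, assuming a hypothetical PaymentFailed event and minimal pub/sub plumbing in place of a real broker:

```python
from collections import defaultdict

# Minimal pub/sub plumbing (stand-in for a real broker).
handlers = defaultdict(list)

def publish(topic: str, event: dict) -> None:
    for handler in handlers[topic]:
        handler(event)

# The Order Service has already confirmed this order to the user.
orders = {"order-1": "created"}

def on_payment_failed(event: dict) -> None:
    # Compensating action: cancel the already-confirmed order. In a real
    # system the user would also be notified via email or push notification.
    orders[event["order_id"]] = "cancelled"

handlers["PaymentFailed"].append(on_payment_failed)

def process_payment(order_id: str, card_accepted: bool) -> None:
    """Payment Service side: emit a failure event instead of raising."""
    if not card_accepted:
        publish("PaymentFailed", {"order_id": order_id, "reason": "card locked"})
```

The key difference from the synchronous version is that the failure travels as an event the Order Service reacts to, rather than as an exception it must catch in the middle of a request.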
Based on my experience with microservices, asynchronous communication may initially seem complex to implement and might require additional infrastructure, but in the long run, it delivers better resilience, system stability, and scalability. When combined with event-driven architecture, it becomes a powerful tool. In fact, I consider asynchronous communication the default choice for all push-type interactions (commands and events), resorting to synchronous methods only when there is a compelling reason to do so.
Again, you should analyze the feature you’re about to develop to determine whether using asynchronous communication will pay off. In some cases, such as for non-critical paths or during prototyping, using async may result in overengineering. Last but not least, please avoid the trap of overusing async methods for queries. That’s usually the wrong choice.
Missing or Inappropriate Tools
Do you remember “project-alpha” from the earlier lessons? That project taught me a lot. Another key lesson I want to share is that working with microservice architecture is extremely challenging without the right tools. Especially when migrating from a monolith to microservices, failing to adjust your toolbox for this architecture can turn your project into a nightmare. Let’s explore this in more detail.
In the project discussed in this lesson, the decision was made to adopt microservices. You might recall that a new service was created alongside the monolith. You probably assumed it was deployed in Kubernetes or another orchestrator, as best practices suggest. However, that wasn’t the case. At that time, a decision was made to develop a deployment solution involving Kubernetes, but for reasons unknown to me, it was continually delayed. This was also before cloud services like EKS became widely used. Nevertheless, the new service still needed to be deployed. Since the original monolith was deployed to a Tomcat server, we decided to deploy the new service as a separate .war file. The drawbacks of this approach, namely the lack of failure isolation and deployment independence, are self-evident.
But that was just the tip of the iceberg. Not only was the deployment platform missing, but we also lacked any log aggregation tools, not to mention tracing. Troubleshooting was done by directly accessing logs on the Tomcat server and jumping between different .war files and their instances – a truly terrible experience.
Moreover, the monitoring and alerting tools were not equipped to support microservices. In reality, we were completely blind and deaf when it came to understanding what was happening within the system, the communication between the monolith and the new microservice, its performance, and so on.
Additionally, there were no service discovery mechanisms in place. The monolith’s instances had the hostname and port of the new microservice defined in their configuration separately for each instance.
I know this all sounds wild. However, I’m sharing these experiences to raise awareness about the problems caused by missing or inappropriate tools when adopting microservices. This architecture is far more complex than a monolith and requires more sophisticated tools for deployment, observability, maintenance, and more. If you decide to migrate to microservices, make sure to include adjusting or creating your toolbox in the roadmap.
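For contrast, even a minimal service registry would have removed those per-instance hardcoded hostnames. Here is a toy sketch; the API is hypothetical, and real deployments would rely on Consul, Eureka, or Kubernetes DNS rather than hand-rolled code:

```python
class ServiceRegistry:
    """Toy registry: services register their addresses at startup,
    and clients resolve them at call time instead of via static config."""

    def __init__(self) -> None:
        self.instances: dict[str, list[str]] = {}

    def register(self, service: str, address: str) -> None:
        self.instances.setdefault(service, []).append(address)

    def resolve(self, service: str) -> str:
        addresses = self.instances.get(service)
        if not addresses:
            raise LookupError(f"no registered instances of {service}")
        return addresses[0]  # a real client would load-balance here
```

With this in place, moving a service to a new host means re-registering one address, not editing the configuration of every monolith instance.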
Organizational and Process Changes Are Also Crucial
All of the lessons so far focus on a technical dimension. They are, to varying degrees, issues that software engineers can address or approach differently to ensure the successful implementation of microservices, except in cases where employing this architecture causes indefensible overengineering.
However, there is another set of challenges that programmers cannot easily fix – let’s call them organizational issues. These might include:
- flawed organizational processes
- an incorrect mindset among team members
- problems with team organization
Essentially, any factor unrelated to technology that prevents us from fully realizing the benefits of a microservice architecture. Throughout my journey with microservices, I have encountered several such challenges. In this chapter, I will describe two common ones.
Deployment Process from a Bygone Era
If you’ve been in the industry for, say, 12+ years, you probably remember those “old days” of massive deployments. You and your team worked for several weeks or even months, and your code was deployed to production only once a quarter, semi-annually, or, in extreme cases, even once a year. But what a celebration it was!
Anyway, that era of producing software has faded into oblivion. Nowadays, we’re agile. We create software iteratively, gather feedback from users and the business, and adjust the software in short cycles. The benefits of microservice architecture, such as the ability to execute independent deployments, break code into smaller projects, and isolate failures, truly support the modern approach to software development.
However, my experience shows that this isn’t always the case. I once worked on a project for a large corporation, serving customers in nearly every country. Our goal was to build a data integration platform for various departments and systems within the company, as well as for integration with third parties. The data flowed from one system – be it an internal department or an external partner – to another; there were thousands of such integrations.
From a technical perspective, using microservices for this project was the right approach. The platform handled enormous traffic – hundreds of thousands of users worldwide – with a critical need for scalability and failure isolation (an error in a single integration shouldn’t affect others). There was also a demand for independent deployments of the integrations. The codebase was enormous, with 60 developers contributing to the platform’s development.
In my opinion, the decision to use microservices to build this platform was a good one. What went wrong? The company implemented microservices without adapting the mindset and processes to fully leverage this architecture and its benefits. The biggest problem was the deployment process. You would expect that at such a scale with microservices, you could deploy new integrations or changes several times a day (or at least several times a week). Nothing could be further from the truth, as the deployment process was literally borrowed from systems that were deployed, let’s say, only once a quarter. Even if the team could develop and test an integration in a day or two, the deployment process took a minimum of three days – yes, three days.
What caused deployment delays? To deploy a microservice (or a change to it), developers were required to prepare extensive documentation detailing the microservice and the integration it handled. This documentation was then submitted to a “Configuration Manager,” who was responsible for reviewing it, conducting audits, and generating additional documentation. Finally, the changes were reviewed by a group with the final “go/no-go” authority over the integration.
One might argue that it’s beneficial for changes to be audited, well documented, and confirmed by a dedicated review team, and I agree. However, this could be done far more efficiently without hindering the team’s speed, causing frustration, and increasing time to market. We made multiple attempts to improve this process, but ultimately, it came down to the mindset of those in charge. They were unwilling to accept that things could be done more efficiently in a modern microservices architecture.
The lesson from this story is that unlocking the full potential of microservices requires not only technical changes to the architecture but also adjustments to the organization’s processes and the mindset of its people. I also recommend measuring the process using, for example, the “deployment frequency” and “lead time for changes” DORA metrics to monitor its health and identify areas for improvement.
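If you collect deployment timestamps, those two DORA metrics are cheap to compute. A sketch with made-up records (the data below is purely illustrative):

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical records: (commit time, production-deploy time) per change.
deployments = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 4, 9, 0)),
    (datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 5, 15, 0)),
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 11, 9, 0)),
]

def deployment_frequency(deploys, period_days: int) -> float:
    """Deployments per day over the observed period."""
    return len(deploys) / period_days

def lead_time_for_changes(deploys) -> timedelta:
    """Average time from commit to production deployment."""
    seconds = mean((deployed - committed).total_seconds()
                   for committed, deployed in deploys)
    return timedelta(seconds=seconds)
```

For the sample data, the lead time averages just over three days: a number like that, tracked over time, makes the cost of a heavyweight deployment process visible to the people who own it.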
Inefficient Team Organization
One of the greatest benefits of using microservices, in my opinion, is reducing programmers’ cognitive load by modularizing the code into separate codebases and assigning specific teams to each microservice. As a programmer, you no longer need to understand and work with the entire codebase, as was necessary with a monolith (at least in one without well-designed modules). Instead, you focus on a specific piece of the solution. This approach truly improves team efficiency and enhances developer productivity.
However, when migrating from a monolith to microservices, it is crucial to carefully design not only the microservice boundaries but also the team organization and the responsibilities each team will have. It’s very easy to simply transfer the mindset and workflow from a monolithic environment to a microservices architecture, which ultimately can make our lives more complicated rather than easier. In this lesson, I will share a personal story about how my team and I made this mistake and how it complicated our work.
Initially, the project was supported by a monolith, with four teams all contributing to the same codebase.
At some point, a decision was made to divide the system into microservices. This time, we approached it correctly by conducting a careful analysis of how to partition it into business modules and by using the Strangler Fig Pattern to iteratively extract these modules from the monolith. What we failed to do was redesign the team organization and reassign responsibilities after the migration. Instead of clearly defining each team’s responsibilities by assigning one (or two) microservices per team, we simply replicated the monolithic working model in the new architecture: every team still worked across all of the microservices.
What problems did this cause?
- Cognitive overload: every developer still needed to understand all parts of the code, which became even more challenging as the code was split across multiple codebases.
- Increased coordination overhead: issues such as merge conflicts and deployment conflicts became more frequent.
- Unclear separation of responsibilities:
- Which team should handle incidents occurring in a specific service?
- Which team ensures quality and coding standards in a specific service?
- Scattered business logic: because everyone knew all the microservices, there was a tendency to spread the business logic across multiple services.
Working in such an environment was far from ideal, so we quickly realized that we needed to fix the issue. We reallocated responsibilities so that each team had a clear assignment for a specific module. Please note that the assignment is not necessarily “one-to-one,” which is perfectly acceptable when it is a conscious decision driven by technical requirements.
What’s the lesson? It’s not enough to design the microservice boundaries during a migration; we must also plan how the teams will work with them. This planning should be done upfront to ensure success.
Takeaways
In this article, I shared the most important lessons from my professional experience with microservice architecture. Here are the key takeaways that should improve your experience with microservices:
- Gain a deep understanding of the business – thoroughly understand your business and its processes. This will help you design proper microservice boundaries and reduce the risk of being caught off guard by changes that could drastically alter your solution.
- Start with a modular monolith – before fully committing to microservices, consider starting with a modular monolith. It’s much easier to adjust boundaries when the modules are not distributed.
- Justify the use of microservices – ensure that microservices are the right choice for your project. Collect and confirm the driving factors behind their adoption, and don’t blindly chase buzzwords.
- Equip yourself with the right tools – invest in the right solutions early on. Working with microservices without the proper tools and an efficient infrastructure can be extremely challenging.
- Avoid overusing synchronous communication – while it is easy to implement, excessive reliance on it can introduce unnecessary complexity.
- Rethink challenging processes – identify and refine any processes that might complicate working with microservices.
- Restructure teams if necessary – consider reorganizing teams and their responsibilities to enhance efficiency in a microservices environment.
Remember, microservice architecture is just a tool. It should assist you, not hinder you.