Microservices: Monitoring in a Production Environment with Log Aggregators

2o5P...cmGf

16 Jan 2023

I have worked in many environments and companies, each with its own way of monitoring: sometimes poor, sometimes amazing. One thing we can’t deny is that monitoring is essential in a production environment.

Imagine you have many services that communicate with each other, and one of them becomes unavailable or starts throwing exceptions for some reason. How will you know about it?

This is where monitoring becomes important.

There are many ways to monitor services: using a terminal (which I don’t recommend, especially in a production environment), or using tools that give us dashboards, log aggregation, alerting, tracing, metrics, and others that I’m going to cover in this series about monitoring.

In this first story of the series, I’ll show why it is important to adopt log aggregation tools in a production environment.

Single Microservice / Monolithic Service — Single Server

In this scenario it is easier to monitor a service: we can probably read the logs locally on the host using command-line tools.

If a user reports an error, the monitoring should let us see that error and tell us what went wrong, for example in a request or when saving something to the database.

Single Microservice / Monolithic Service — Multiple Servers

Now imagine we need to scale out and run multiple copies of the same service on separate hosts. We still want to monitor all of them at the same time, as before, and this is trickier. We could use an SSH multiplexer to run commands on multiple hosts, but that is not a great option. In this scenario we could use a log aggregator (I’ll say more about that below).

Multiple Microservices — Multiple Servers

If you have many microservices communicating with each other, both asynchronously and synchronously, this is a hard scenario to monitor.

Think about a small piece of a delivery system…

When the app receives a request to deliver something, let’s call it an ORDER. After an order is created, with its delivery address and collection point, a route has to be created and then allocated to a driver.

There are a lot of possible errors here:

Maybe there are no drivers available
An error occurred while creating a new delivery order
Some drivers can’t send telemetry for some reason
An error occurred while creating a new route, etc.

In this situation, we can use some building blocks to make our delivery system observable.

Log aggregation

Logs are very important to help us understand what’s happening with our system in production.

Well, with many servers running many microservices, it’s impractical to follow the logs on each machine using SSH multiplexing. To solve this problem we can use a log aggregator.

A log aggregator collects the logs from each server and forwards them, sorted, to a central store that stakeholders can query.
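To make the idea concrete, here is a minimal sketch of the "collect and sort" step. It is not how any particular aggregation product works: it assumes each host already produces its lines in time order and that every line starts with an ISO-8601 timestamp. The host names and log lines are hypothetical.

```python
import heapq
from datetime import datetime


def parse_timestamp(line):
    # Assumption: the first whitespace-separated token is an ISO-8601 timestamp.
    return datetime.fromisoformat(line.split(" ", 1)[0])


def aggregate(*host_logs):
    """Merge per-host log streams (each already in time order) into one ordered list."""
    return list(heapq.merge(*host_logs, key=parse_timestamp))


# Hypothetical log lines from two hosts.
host_a = [
    "2023-01-13T15:31:00 host-a delivery-order-service INFO Created order",
    "2023-01-13T15:31:07 host-a route-service ERROR Could not allocate route",
]
host_b = [
    "2023-01-13T15:31:02 host-b route-service INFO Creating route",
]

merged = aggregate(host_a, host_b)
for line in merged:
    print(line)
```

A real aggregator also ships the logs over the network and indexes them for search. The sketch only shows why a single, time-ordered view is easier to read than one terminal per host.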

Trust me: consider using a log aggregator, especially if you want to adopt a microservice architecture.

A log aggregation tool as a prerequisite for implementing a microservice architecture.
Newman, Sam. Building Microservices (p. 313). O’Reilly Media. Kindle Edition.

Once your logs are aggregated, you’ll want to run queries to retrieve specific information. For this to work, it is important to give your logs a consistent format.
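One common way to get a consistent format is to emit every log entry as a JSON object with the same fields. The sketch below uses only Python’s standard logging module. The field names (timestamp, service, level, message) are my own choice for illustration, not a requirement of any tool.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line with a fixed set of fields."""

    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "service": record.name,  # assumption: the logger name is the service name
            "level": record.levelname,
            "message": record.getMessage(),
        }
        return json.dumps(entry)


def build_logger(service_name):
    logger = logging.getLogger(service_name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # replace handlers so repeated calls do not duplicate output
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


logger = build_logger("delivery-order-service")
logger.info("Created order")
```

With every service writing the same fields, a query such as "all ERROR entries from route-service" becomes a simple filter instead of a text search.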

Another useful technique is the correlation ID.

A correlation ID is typically used when a business transaction generates multiple requests that need to be tracked together.
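As a rough sketch of how this can work in practice: the first service generates an ID, every log entry includes it, and the ID travels to the next service with the request. The header name X-Correlation-ID and the handle_request function are assumptions for illustration; your framework may do this differently.

```python
import logging
import uuid
from contextvars import ContextVar

# Holds the correlation ID of the request currently being handled.
correlation_id = ContextVar("correlation_id", default="-")

HEADER = "X-Correlation-ID"  # assumed header name, not a formal standard


class CorrelationIdFilter(logging.Filter):
    """Attach the current correlation ID to every log record."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True


def build_logger(service_name):
    logger = logging.getLogger(service_name)
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(name)s %(levelname)s "
        "correlationId=%(correlation_id)s - %(message)s"
    ))
    handler.addFilter(CorrelationIdFilter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(headers, logger):
    # Reuse the caller's ID if present, otherwise start a new transaction.
    cid = headers.get(HEADER) or str(uuid.uuid4())
    correlation_id.set(cid)
    logger.info("Handling request")
    # Headers to send along on any downstream call.
    return {HEADER: cid}


order_logger = build_logger("delivery-order-service")
route_logger = build_logger("route-service")

outgoing = handle_request({}, order_logger)   # first hop: a new ID is generated
handle_request(outgoing, route_logger)        # second hop: the same ID is reused
```

Both services log the same correlationId value, so their entries can be tied back to a single business transaction.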

Going back to our delivery system example: while a delivery order is being created, an error happens when allocating the route because no drivers are available. With a correlation ID, it is easier to investigate the cause by writing a query and filtering by correlation ID.

13-01-2023 15:31:00z delivery-order-service INFO correlationId=d6f845cb-30e6-44ad-85c9-1b84ca2dd3b1 - Created order externalId - 99f08ae1-66dd-4ded-a955-c33f9471b409

13-01-2023 15:31:02z route-service INFO correlationId=d6f845cb-30e6-44ad-85c9-1b84ca2dd3b1 - Creating Route externalId - 38fc5b6f-2084-47be-b21f-456c53d21c81

13-01-2023 15:31:07z route-service ERROR correlationId=d6f845cb-30e6-44ad-85c9-1b84ca2dd3b1 - Error - to allocate the route - 38fc5b6f-2084-47be-b21f-456c53d21c81
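In a log aggregation tool this filtering is a query in the tool’s own language. The sketch below only imitates the idea in plain Python: it parses lines in the format shown above and keeps the ones with a given correlation ID. The regular expression and function names are mine, and the telemetry-service line is a hypothetical unrelated entry added to show the filter working.

```python
import re

# Assumed layout: "<date> <time> <service> <LEVEL> correlationId=<id> - <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+ \S+) (?P<service>\S+) (?P<level>[A-Z]+) "
    r"correlationId=(?P<correlation_id>\S+) - (?P<message>.*)$"
)


def parse_line(line):
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def find_by_correlation_id(lines, wanted_id):
    """Return the parsed entries that belong to one business transaction."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e and e["correlation_id"] == wanted_id]


ORDER_ID = "d6f845cb-30e6-44ad-85c9-1b84ca2dd3b1"

LOGS = [
    f"13-01-2023 15:31:00z delivery-order-service INFO correlationId={ORDER_ID}"
    " - Created order externalId - 99f08ae1-66dd-4ded-a955-c33f9471b409",
    f"13-01-2023 15:31:02z route-service INFO correlationId={ORDER_ID}"
    " - Creating Route externalId - 38fc5b6f-2084-47be-b21f-456c53d21c81",
    # Hypothetical entry from another transaction; it should be filtered out.
    "13-01-2023 15:31:03z telemetry-service INFO"
    " correlationId=00000000-0000-0000-0000-000000000000 - Position received",
    f"13-01-2023 15:31:07z route-service ERROR correlationId={ORDER_ID}"
    " - Error - to allocate the route - 38fc5b6f-2084-47be-b21f-456c53d21c81",
]

trail = find_by_correlation_id(LOGS, ORDER_ID)
for entry in trail:
    print(entry["timestamp"], entry["service"], entry["level"], entry["message"])
```

The result is the full trail of one order across services, ending with the ERROR entry in route-service.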

Image from Logz.io

Alerting Metrics

Many log aggregation tools provide built-in alerting capabilities, which notify you when a critical event occurs in your logs.

This is a real advantage: as software engineers we should have a monitoring routine, but even better is to have something that notifies us when anything goes wrong in our application.

These tools let us set up alerts based on many conditions, for example when a specific pattern is detected, when a field in a log entry exceeds a threshold, or when a certain log message appears. Alerts can then be delivered via Slack or e-mail.
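Each tool has its own way of defining alerts, so the following is only a sketch of the idea, not any product’s API. It models two of the conditions above (a pattern in the message and a field exceeding a threshold) as small rules that are evaluated against parsed log entries. The rule names, the duration_ms field, and the sample entries are hypothetical, and notify just prints where a real setup would call Slack or send an e-mail.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class AlertRule:
    name: str
    matches: Callable[[dict], bool]


def pattern_rule(name, pattern):
    """Fire when the log message matches a regular expression."""
    regex = re.compile(pattern)
    return AlertRule(name, lambda entry: bool(regex.search(entry.get("message", ""))))


def threshold_rule(name, field, limit):
    """Fire when a numeric field in the log entry exceeds a limit."""
    return AlertRule(name, lambda entry: entry.get(field, 0) > limit)


def evaluate(rules, entries):
    return [(rule.name, entry) for entry in entries for rule in rules if rule.matches(entry)]


def notify(alerts, send=print):
    # Assumption: `send` stands in for a Slack or e-mail integration.
    for name, entry in alerts:
        send(f"[ALERT] {name}: {entry['service']} - {entry['message']}")


# Hypothetical parsed log entries.
entries = [
    {"service": "route-service", "message": "Error allocating route", "duration_ms": 120},
    {"service": "delivery-order-service", "message": "Created order", "duration_ms": 2500},
    {"service": "route-service", "message": "Creating route", "duration_ms": 80},
]

rules = [
    pattern_rule("route allocation failed", r"[Ee]rror.*route"),
    threshold_rule("slow request", "duration_ms", 1000),
]

alerts = evaluate(rules, entries)
notify(alerts)
```

Keeping rules as data makes it easy to add a new condition without touching the evaluation loop. That is roughly what the alert configuration screens of these tools offer.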

Conclusion

Monitoring is essential in many situations: for applications on a single server and for applications running on multiple servers, since any application will one day need to scale. It matters most of all in the context of a microservice architecture.

This story only focuses on showing how good it is to have a log aggregator in your production environment.