Muhammad Shujaat Siddiqi: Event Sourcing


Thursday, October 31, 2013

EventSource & Performance for High Volume Events

.NET Framework 4.5 introduced the EventSource API for writing events through the ETW infrastructure. Building on ETW makes writing events extremely fast, and the events can be consumed by any ETW consumer. If the event provider is not enabled, the events are simply dropped. If a session is already established, they are written to the ETW session buffers, from where consumers can read them. The event provider (EventSource) does not have to wait until the messages are consumed, and this holds regardless of the number of consumers registered for a particular type of event. We have discussed ETW tools and their usage with the EventSource API before [Reference].

As discussed above, the choice of ETW infrastructure makes the EventSource API extremely fast, but we can improve on it further. Remember that these recommendations mainly pay off for high-volume events. For low-frequency events you would still get a performance improvement, but it might not be noticeable.

Optimization # 1: Use IsEnabled() before WriteEvent()
ETW controllers can enable a provider before the provider is registered. When the provider then registers itself and starts emitting events, it is enabled for the waiting session automatically and its events are forwarded to the session buffers. To prove this, we can start the Semantic Logging Service (from the Semantic Logging Application Block) and then run our application; the service should still be able to consume the events. If the provider is not enabled, on the other hand, the ETW infrastructure simply drops the events. We can improve on this by checking whether the event provider is enabled before calling the WriteEvent() method of the base class. The base class performs the same check inside WriteEvent(), so checking first just lets us skip that call entirely.

When an event provider (generated for an EventSource) is enabled, the EventSource receives a command to enable itself. We can react to this command in our custom EventSource by overriding the OnEventCommand() method. There is no need to call the base class method from the override, as the virtual method has an empty implementation.
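The override can be sketched as follows. This is a minimal illustration, not code from the original post: the `CommandAwareEventSource` type, its provider name, and the `CommandCount` member are hypothetical. The count is kept in a field with a read-only accessor because EventSource may treat extra void instance methods (including property setters) as events.

```csharp
using System.Diagnostics.Tracing;

// Hypothetical event source, used only for illustration.
[EventSource(Name = "Demo-CommandAwareEventSource")]
public sealed class CommandAwareEventSource : EventSource
{
    public static readonly CommandAwareEventSource Log = new CommandAwareEventSource();

    // Number of commands received from controllers or listeners so far.
    private int _commandCount;

    public int CommandCount
    {
        get { return _commandCount; }
    }

    // Called by the framework when a controller or listener sends a command
    // (for example, to enable or disable this source).
    protected override void OnEventCommand(EventCommandEventArgs command)
    {
        _commandCount++;
    }

    [Event(1, Level = EventLevel.Informational)]
    public void Ping()
    {
        WriteEvent(1);
    }
}
```

The `command.Command` property of the argument tells which command was sent; the sketch only counts the calls.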

EventSource caches its enabled state in a private field and uses this field to check whether it is enabled. The EventSource type has no property that exposes this, but it provides the IsEnabled() method for exactly that purpose. Seeing a method rather than a property, you might worry that it does extra work and slows things down further. As the development team suggests, it is just a field fetch. We can verify this by opening the framework assembly from the Windows folder in a decompiler such as dotPeek.

In fact, there are two overloads of the method. One is parameterless. The other is more interesting: it lets us check whether the provider is enabled for a particular level and keyword.
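Both overloads can be used as guards inside the event methods. The following is a minimal sketch; the `GuardedEventSource` type, its events, and the `Diagnostics` keyword are hypothetical names chosen for illustration.

```csharp
using System.Diagnostics.Tracing;

// Hypothetical event source, used only for illustration.
[EventSource(Name = "Demo-GuardedEventSource")]
public sealed class GuardedEventSource : EventSource
{
    public static readonly GuardedEventSource Log = new GuardedEventSource();

    // Keyword definitions follow the EventSource convention of a nested Keywords class.
    public static class Keywords
    {
        public const EventKeywords Diagnostics = (EventKeywords)1;
    }

    [Event(1, Level = EventLevel.Informational)]
    public void RequestStarted(string url)
    {
        // Parameterless overload: is anything listening at all?
        if (IsEnabled())
        {
            WriteEvent(1, url);
        }
    }

    [Event(2, Level = EventLevel.Verbose, Keywords = Keywords.Diagnostics)]
    public void CacheProbe(string key, int hits)
    {
        // Level and keyword overload: skip the call unless a session asked for
        // verbose events with the Diagnostics keyword.
        if (IsEnabled(EventLevel.Verbose, Keywords.Diagnostics))
        {
            WriteEvent(2, key, hits);
        }
    }
}
```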

Optimization # 2: Convert static field fetch to local member fetch
This suggestion is not about the EventSource implementation but about how the EventSource instance is used at the logging site, that is, the code from which we call the EventSource's methods. As we have seen in several examples in this blog, EventSource implementations are usually singletons [See singleton], so every call at the logging site involves a static variable fetch.

Since a static variable fetch is more expensive than an instance member fetch, we can assign the singleton instance to an instance member of the calling type and then use that member at the logging site.
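A minimal sketch of the idea, assuming a singleton exposed through a static `Log` field; the `OrderEventSource` and `OrderProcessor` types are hypothetical.

```csharp
using System.Diagnostics.Tracing;

// Hypothetical singleton event source, used only for illustration.
[EventSource(Name = "Demo-OrderEventSource")]
public sealed class OrderEventSource : EventSource
{
    public static readonly OrderEventSource Log = new OrderEventSource();

    [Event(1, Level = EventLevel.Informational)]
    public void OrderProcessed(int orderId)
    {
        if (IsEnabled())
        {
            WriteEvent(1, orderId);
        }
    }
}

// Hypothetical logging site.
public sealed class OrderProcessor
{
    // The singleton reference is copied once into an instance field.
    private readonly OrderEventSource _log = OrderEventSource.Log;

    public int Processed { get; private set; }

    public void Process(int orderId)
    {
        Processed++;

        // Before: OrderEventSource.Log.OrderProcessed(orderId);  (static fetch on every call)
        _log.OrderProcessed(orderId);
    }
}
```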

Optimization # 3: Minimize the number of EventSources in the application
In a previous post, we discussed the mapping between EventSource and ETW event providers. The framework registers an event provider with the ETW infrastructure for every EventSource in the application, using the ETW EventRegister API. This registration is what allows external controllers to pass commands to the EventSource.

The framework also maintains a list of the EventSources in the application domain. We can get this list using the static GetSources() method of the EventSource type.
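As a small sketch, the list can be enumerated like this. The `InventorySampleSource` and `EventSourceInventory` types are hypothetical; other sources created by the framework or by libraries may also appear in the result.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Tracing;

// Hypothetical event source, present only so that the list is not empty.
[EventSource(Name = "Demo-InventorySampleSource")]
public sealed class InventorySampleSource : EventSource
{
    public static readonly InventorySampleSource Log = new InventorySampleSource();

    [Event(1, Level = EventLevel.Informational)]
    public void Ping()
    {
        WriteEvent(1);
    }
}

public static class EventSourceInventory
{
    // Returns the names of the EventSource instances currently known to the framework.
    public static List<string> GetNames()
    {
        var names = new List<string>();
        foreach (EventSource source in EventSource.GetSources())
        {
            names.Add(source.Name);
        }
        return names;
    }
}
```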

Both of these tasks are performed when an EventSource is created, which typically happens at startup. Because the cost grows with the number of EventSources, more event sources in your application mean a slower startup. The remedy is to minimize the number of EventSources in your application. I would definitely not suggest an EventSource for each type in your application. There are two options to resolve this.

The first option is to use partial types. A partial type can span different files in the same assembly, and the compiler combines all of the definitions into one consolidated type. We generally organize our types into folders in a project, where each folder holds types belonging to the same group. Each folder can then contain a partial definition of the EventSource with the event methods for the types in that folder.
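A minimal sketch of the partial type approach follows. The two parts are shown in one listing, with comments marking the files they would live in; the type name, folder names, and events are hypothetical. Note that event IDs must stay unique across all the parts.

```csharp
using System.Diagnostics.Tracing;

// File: Orders/ApplicationEventSource.Orders.cs (hypothetical path)
[EventSource(Name = "Demo-ApplicationEventSource")]
public sealed partial class ApplicationEventSource : EventSource
{
    public static readonly ApplicationEventSource Log = new ApplicationEventSource();

    [Event(1, Level = EventLevel.Informational)]
    public void OrderPlaced(int orderId)
    {
        if (IsEnabled())
        {
            WriteEvent(1, orderId);
        }
    }
}

// File: Billing/ApplicationEventSource.Billing.cs (hypothetical path)
public sealed partial class ApplicationEventSource
{
    [Event(2, Level = EventLevel.Informational)]
    public void InvoiceIssued(int invoiceId)
    {
        if (IsEnabled())
        {
            WriteEvent(2, invoiceId);
        }
    }
}
```

The application still registers a single provider, while each folder keeps its own event methods close to the types that use them.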

Most real-life solutions span more than one project. In this case, we can still minimize the number of EventSources by defining one for each group of types, in order to improve startup performance. Each event source takes care of the instrumentation requirements for a group of types in your application. You can group them logically or in whatever way suits you.

Optimization # 4: Avoid fallback WriteEvent method with Object Array Parameter
The EventSource type has various overloads of the WriteEvent() method, which we call from the [Event] methods of our EventSource. One of these overloads takes a params object array. If no other overload matches your call, the compiler automatically falls back to this one.

We should avoid the fallback overload as much as possible for performance reasons, as it is reported to be 10-20 times more expensive. The arguments have to be converted to object (boxing any value types), an array has to be allocated and filled with them, and the method then has to inspect the arguments at run time in order to serialize them.

To avoid the fallback overload, we can add a new WriteEvent() overload to our EventSource whose parameters match the call. Note that this is overloading rather than overriding: the new method sits alongside the overloads of the base class.
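One way to write such an overload is to pass the arguments to the protected WriteEventCore() method. The sketch below assumes an event with a (string, int, long) payload, a combination with no built-in overload; the `FastEventSource` type and its event are hypothetical, and the code has to be compiled with unsafe code enabled.

```csharp
using System;
using System.Diagnostics.Tracing;

// Hypothetical event source, used only for illustration.
[EventSource(Name = "Demo-FastEventSource")]
public sealed class FastEventSource : EventSource
{
    public static readonly FastEventSource Log = new FastEventSource();

    [Event(1, Level = EventLevel.Informational)]
    public void TransferCompleted(string file, int chunks, long bytes)
    {
        if (IsEnabled())
        {
            // Binds to the overload below instead of WriteEvent(int, params object[]).
            WriteEvent(1, file, chunks, bytes);
        }
    }

    // Custom overload: hands the payload to WriteEventCore without boxing the
    // arguments or allocating an object array.
    [NonEvent]
    private unsafe void WriteEvent(int eventId, string arg1, int arg2, long arg3)
    {
        if (arg1 == null)
        {
            arg1 = string.Empty;
        }

        fixed (char* arg1Ptr = arg1)
        {
            EventData* data = stackalloc EventData[3];
            data[0].DataPointer = (IntPtr)arg1Ptr;
            data[0].Size = (arg1.Length + 1) * sizeof(char); // include the null terminator
            data[1].DataPointer = (IntPtr)(&arg2);
            data[1].Size = sizeof(int);
            data[2].DataPointer = (IntPtr)(&arg3);
            data[2].Size = sizeof(long);
            WriteEventCore(eventId, 3, data);
        }
    }
}
```

The order and types of the entries in the data array must match the parameters of the [Event] method, since that signature is what describes the payload to consumers.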

Sunday, August 18, 2013

SLAB - Reactive Event Sourcing

In the previous post we saw how ETW events generated using EventSource can be viewed in PerfView. We can also consume these events inside our application using EventListener, another new abstract type provided in .NET Framework 4.5. Its abstract member is the OnEventWritten() method, and it also has a virtual method, OnEventSourceCreated(). A custom event listener can override both members. In this post's listener, the overrides just write debug messages when they are called; during development, these messages can be seen in Visual Studio's Debug Output window.

The listener's OnEventWritten() method is called when the EventSource writes an event. Before that, we need to specify which level of events from the EventSource we are interested in. This is called enabling the events: the listener's EnableEvents() method is called with the EventSource and an EventLevel.

The messages in the Output window show both OnEventSourceCreated() and OnEventWritten() being called.
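A listener along these lines can be sketched as follows. The `ListenerSampleSource` and `DebugEventListener` types are hypothetical, and the `ReceivedEventIds` list is an addition that makes the received events easy to inspect from code.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Tracing;

// Hypothetical event source, used only for illustration.
[EventSource(Name = "Demo-ListenerSampleSource")]
public sealed class ListenerSampleSource : EventSource
{
    public static readonly ListenerSampleSource Log = new ListenerSampleSource();

    [Event(1, Level = EventLevel.Informational)]
    public void Started(string component)
    {
        if (IsEnabled())
        {
            WriteEvent(1, component);
        }
    }
}

// Hypothetical listener that writes debug messages for both callbacks.
public sealed class DebugEventListener : EventListener
{
    // IDs of the events received so far.
    public readonly List<int> ReceivedEventIds = new List<int>();

    // Called once for every EventSource the framework knows about.
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        Debug.WriteLine("Source created: " + eventSource.Name);
    }

    // Called for every event written by a source this listener has enabled.
    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        ReceivedEventIds.Add(eventData.EventId);
        Debug.WriteLine("Event " + eventData.EventId + " from " + eventData.EventSource.Name);
    }
}
```

To use it, create a `DebugEventListener`, call `EnableEvents(ListenerSampleSource.Log, EventLevel.Informational)` on it, and then call `ListenerSampleSource.Log.Started(...)`; the event is delivered to `OnEventWritten()`.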

Semantic Logging Application Block
The Semantic Logging Application Block (SLAB) is provided in Enterprise Library 6 to support structured logging. It builds on EventSource to support logging events to destinations such as SQL Server and Azure tables. Like the other application blocks, it is available as a NuGet package.

SLAB provides a custom implementation of EventListener named ObservableEventListener. The listener implements IObservable, so it allows subscriptions from event sinks. The sinks are IObservers that observe these event listeners.
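The relationship between an observable listener and its sinks can be sketched with base class library types alone. The following is a simplified, hypothetical stand-in for the idea and not SLAB's implementation: `SimpleObservableListener`, `CollectingSink`, and `ObservableSampleSource` are invented names, and the sketch publishes EventWrittenEventArgs rather than SLAB's own entry type. It never calls OnError() or OnCompleted().

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Tracing;

// Hypothetical event source, used only for illustration.
[EventSource(Name = "Demo-ObservableSampleSource")]
public sealed class ObservableSampleSource : EventSource
{
    public static readonly ObservableSampleSource Log = new ObservableSampleSource();

    [Event(1, Level = EventLevel.Informational)]
    public void Heartbeat(int sequence)
    {
        if (IsEnabled()) { WriteEvent(1, sequence); }
    }
}

// Simplified stand-in for an observable listener (not the SLAB type).
public sealed class SimpleObservableListener : EventListener, IObservable<EventWrittenEventArgs>
{
    private readonly List<IObserver<EventWrittenEventArgs>> _observers =
        new List<IObserver<EventWrittenEventArgs>>();

    public IDisposable Subscribe(IObserver<EventWrittenEventArgs> observer)
    {
        lock (_observers) { _observers.Add(observer); }
        return new Subscription(_observers, observer);
    }

    // Forward every event to the observers subscribed at this moment.
    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        IObserver<EventWrittenEventArgs>[] snapshot;
        lock (_observers) { snapshot = _observers.ToArray(); }
        foreach (IObserver<EventWrittenEventArgs> observer in snapshot)
        {
            observer.OnNext(eventData);
        }
    }

    // Disposing the subscription removes the observer from the list.
    private sealed class Subscription : IDisposable
    {
        private readonly List<IObserver<EventWrittenEventArgs>> _list;
        private readonly IObserver<EventWrittenEventArgs> _observer;

        public Subscription(List<IObserver<EventWrittenEventArgs>> list,
                            IObserver<EventWrittenEventArgs> observer)
        {
            _list = list;
            _observer = observer;
        }

        public void Dispose()
        {
            lock (_list) { _list.Remove(_observer); }
        }
    }
}

// A sink is just an observer; this one records the event IDs it sees.
public sealed class CollectingSink : IObserver<EventWrittenEventArgs>
{
    public readonly List<int> EventIds = new List<int>();

    public void OnNext(EventWrittenEventArgs value) { EventIds.Add(value.EventId); }
    public void OnError(Exception error) { }
    public void OnCompleted() { }
}
```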

The application block also provides a specialization of EventSource, named SemanticLoggingEventSource.

In-Process Vs Out-Process
SLAB can be used in-process or out-of-process. For in-process use, we reference the libraries through the NuGet package. For out-of-process use, it is hosted as a Windows service, whose installer is available as a separate download. In the previous post, we discussed how to use out-of-process logging with SLAB.

Event Text Formatter
SLAB supports JSON and XML based text formatting. The types are in the Microsoft.Practices.EnterpriseLibrary.SemanticLogging.Formatters namespace of the Microsoft.Practices.EnterpriseLibrary.SemanticLogging assembly, which is installed by the EnterpriseLibrary.SemanticLogging NuGet package.

Event Sinks & Factories
SLAB also provides event sinks and their factories. All of these event sinks are IObservers.

Let's install the EnterpriseLibrary.SemanticLogging NuGet package. The package also takes care of installing its dependencies, which consist only of the Newtonsoft.Json package.

Let's also install the Rx-Main NuGet package so that we can subscribe using Reactive Extensions based subscribers.

In this post's example we subscribe to ObservableEventListener. The subscriber receives the messages in its OnNext() method; the example also supplies OnCompleted() and OnError() for the subscriber.

When the example runs, the messages logged through the event source are delivered to the listener. Since we are subscribed to the listener, we receive those messages in OnNext(), where we simply write them to the Debug window.


Friday, May 10, 2013

Command Query Responsibility Segregation [ CQRS ] - An Introduction

Bertrand Meyer introduced the idea of CQS [Command Query Separation] in his famous book Object-Oriented Software Construction. He discussed the responsibilities of an object's methods: each method should be either a command or a query, never both. As described here, asking a question should not change the answer; it should just give us the answer back. If you remember, a few months back we discussed a side-effect-free, purely functional approach to designing methods. CQS helps us design our types and their behaviors. We divide our methods into two categories: those which modify state (commands) and those which return data (queries). Queries are free of side effects. Martin Fowler has also discussed the approach and its limitations in certain scenarios, but it is generally considered a principle and is widely applicable.
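As a small illustration of the principle, here is a hypothetical `Account` type (not from the original post) whose methods are each either a command or a query.

```csharp
using System;

// Hypothetical type, used only to illustrate CQS.
public sealed class Account
{
    private decimal _balance;

    // Query: returns data and changes nothing, so asking twice gives the same answer.
    public decimal GetBalance()
    {
        return _balance;
    }

    // Command: changes state and returns nothing.
    public void Deposit(decimal amount)
    {
        if (amount <= 0)
        {
            throw new ArgumentOutOfRangeException("amount");
        }
        _balance += amount;
    }
}
```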

CQRS takes CQS to the next level and applies it across the whole domain. It is the abbreviation of Command Query Responsibility Segregation. Some people call it a pattern, others call it an architectural style. We can leave the classification for some other time, but there are some scholarly discussions [Distributed Object Computing Group - Washington University] which can help us classify it. Since it doesn't have a documented solution, I am more inclined towards classifying it as an architectural style.

When to use CQRS?
The name CQRS was coined by Udi Dahan and Greg Young. CQRS is applicable only to collaborative domains, where a large set of users works on a small set of data and the users interact with each other only through this small set of data. The system gives no immediate feedback, as a command might not be processed immediately. We need durable handling of commands: they must be persisted so that they survive system failures, and a command shouldn't be discarded until it has been processed completely. There must also be a mechanism for handling duplicate commands.
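One common way to handle duplicate commands is to give every command a unique ID and remember which IDs have been handled. The sketch below shows only that part, under stated assumptions: the `ReserveSeatCommand` and `ReserveSeatHandler` types are hypothetical, and the handled IDs are kept in memory, whereas a durable system would persist them together with the command's effects.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical command; the ID identifies one logical request across redeliveries.
public sealed class ReserveSeatCommand
{
    public ReserveSeatCommand(Guid commandId, string seat)
    {
        CommandId = commandId;
        Seat = seat;
    }

    public Guid CommandId { get; private set; }
    public string Seat { get; private set; }
}

// Hypothetical handler that ignores commands it has already applied.
public sealed class ReserveSeatHandler
{
    // In-memory only in this sketch; not durable.
    private readonly HashSet<Guid> _handledCommandIds = new HashSet<Guid>();
    private readonly List<string> _reservations = new List<string>();

    public int ReservationCount
    {
        get { return _reservations.Count; }
    }

    public void Handle(ReserveSeatCommand command)
    {
        // HashSet.Add returns false when the ID is already present, i.e. a duplicate delivery.
        if (!_handledCommandIds.Add(command.CommandId))
        {
            return;
        }
        _reservations.Add(command.Seat);
    }
}
```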

Like any other architectural style, CQRS is not a silver bullet. Adopting CQRS deprives users of immediate feedback, so it should be applied very carefully. On the other hand, dividing a system into commands and queries makes it highly scalable and also improves maintainability.

The concepts of CQRS are simple; the change in mindset is the most difficult part. So it is always good to come from a DDD mindset, which opens up your mind to designing in terms of the domain as a whole. It should also help you understand much of the material put together by CQRS gurus, who reference DDD as a prerequisite. You can also use CQRS at the component level instead of across the whole domain, so it need not be a top-level architecture.

Why Segregate Commands and Queries?
The Interface Segregation Principle [ISP] is one example of segregation. We can just as well apply the Single Responsibility Principle [SRP], viewing commands and queries as separate responsibilities. See how well SOLID fits into every design and architecture discussion? :) Let's take a step-by-step approach to understanding CQRS, similar to our discussion of Windows Identity Foundation, where we argued why it is better to design security around claims.

As discussed above, CQRS is different from CQS, but the definitions of commands and queries stay the same. Commands change the state of the system, and queries read the state of the system without causing any side effects.

Significance of Data
Data plays a significant role in an enterprise's everyday business needs. Modern organizations use software systems to ease data management, and these systems are designed to satisfy the business needs of our customers. Organizations which can efficiently create the data they need, and have the ability to process it, become successful because they are able to make the right decisions at the right time. Yes, knowledge is no longer just a virtue; it is power now.

These systems have a client-facing front-end. You can read MVVM Survival Guide to study further how to design client applications using the MVVM architectural style. The front-end makes calls to a back-end system. For simple applications this back-end might simply be a Database Management System [DBMS]. In this case, client applications make the necessary calls in the supported query language [SQL]. They might also use DBMS-specific programming features, including stored procedures and functions. The choice of DBMS is really important, as data is of the highest priority in any system.

Enterprise Operations are the key
These systems work perfectly for simple applications involving CRUD [Create, Read, Update, Delete] operations. But organizations have complex business processes which require the interaction of multiple systems. These business processes have complex workflow and security requirements which don't quite fit the client / DBMS based approach. Enterprise applications are designed for authorized users to perform operations, so it is better to see these systems as a collection of operations and use cases. Back in my procedural programming days, I remember saying that a program is a set of functions. It is these operations for which we design our applications, so our design must focus on them and be built around them. Remember seeing a litany of applications back in the earlier days of Information Technology? The moment we get our heads away from the CRUD approach and start thinking about enterprise operations, we start adding real value to the enterprise application infrastructure. It is a paradigm shift and involves a lot of unlearning, which is the most difficult part. This is the core of domain driven design.

Classification of Operations
Building on the idea of command and query separation, we can easily divide enterprise operations into commands and queries. They can be provided to client applications as separate command and query services. Dividing our application into a set of commands enables us to provide a task-based implementation. In DDD language, sharing one model between the two is the Shared Kernel approach to context mapping, if we consider commands and queries as separate bounded contexts. Both teams must agree on any change to this shared model, and there should be very strict continuous integration tests to ensure that. Any breaking change caused by a requirement on one side should be planned so that the other team can make the necessary adjustments to accommodate it.

Mostly we have different requirements for command and query models. Command models are less forgiving and have strict validation requirements. Query models, on the other hand, are more denormalized. For reporting purposes, query models generally also have built-in caching mechanisms, which are not very common in command models.

Command and Query Interfaces for Same Model
[Fowler] has discussed the simplest implementation: providing separate command and query interfaces, with the model types implementing both. You might have seen the same business objects library shared between different projects and applications; these are examples of this implementation. I have seen this often enough, and it works in most situations. This style is more useful with Dependency Injection, where the concrete command services depend on the command interfaces and the query services depend on the query interfaces, and the actual model types are injected.
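A minimal sketch of this arrangement, with constructor injection done by hand rather than through a container. All of the names (`IInventoryQueries`, `IInventoryCommands`, `Inventory`, and the two services) are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Query side of the model: reads only.
public interface IInventoryQueries
{
    int GetQuantity(string sku);
}

// Command side of the model: state changes only.
public interface IInventoryCommands
{
    void AddStock(string sku, int quantity);
}

// One model type implements both interfaces.
public sealed class Inventory : IInventoryQueries, IInventoryCommands
{
    private readonly Dictionary<string, int> _stock = new Dictionary<string, int>();

    public int GetQuantity(string sku)
    {
        int quantity;
        return _stock.TryGetValue(sku, out quantity) ? quantity : 0;
    }

    public void AddStock(string sku, int quantity)
    {
        if (quantity <= 0)
        {
            throw new ArgumentOutOfRangeException("quantity");
        }
        _stock[sku] = GetQuantity(sku) + quantity;
    }
}

// The command service sees only the command interface.
public sealed class StockCommandService
{
    private readonly IInventoryCommands _commands;

    public StockCommandService(IInventoryCommands commands)
    {
        _commands = commands;
    }

    public void ReceiveDelivery(string sku, int quantity)
    {
        _commands.AddStock(sku, quantity);
    }
}

// The query service sees only the query interface.
public sealed class StockQueryService
{
    private readonly IInventoryQueries _queries;

    public StockQueryService(IInventoryQueries queries)
    {
        _queries = queries;
    }

    public bool IsAvailable(string sku)
    {
        return _queries.GetQuantity(sku) > 0;
    }
}
```

Both services can be given the same `Inventory` instance, yet neither can call the other side's methods through the interface it depends on.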

Separate Models for Commands and Queries
As discussed above, although initially it might seem that we need the same model for both, studying the process in more detail mostly reveals that different domain experts are only using similar terminology while actually referring to different concepts. To design for these different concepts, we can build altogether different models. This is the same approach as Separate Ways in Domain Driven Design. It is also possible for both models to inherit from a Shared Kernel based model for the overlapping concepts and then develop their specific implementations on top of it. With separate models, we are saved from the difficulties of planning changes to a shared kernel: any change in the domain model can be planned separately by each team. These changes should still result in model updates that provide a correct and complete view of the system.

Scaling Command and Query Services
Depending on the system we are modeling, commands and queries might have different scaling requirements. Command operations might be more frequent than queries, or the other way around. To handle them properly, we can deploy the command and query services separately. In a monitoring application, for example, we will definitely get more status updates (commands) than queries to determine the state of the things being monitored. The separate deployments can very well use the same model if that fits the requirement. In this case, since they are separate applications, we need a way to share the model between them. Historically, such libraries have been shared through a source control repository, and each application picks up the most recent version of the library when it is built. This sharing can also be automated with NuGet packages.

Alternatively, they can use completely different models, like Separate Ways bounded contexts. In our monitoring example, the query service would mostly serve management information system reports. It might also be a complex event processing system designed specifically to ensure immediate action, such as one for the intensive care units of a hospital, which require an immediate response to these events.

Separate Reporting Databases
The data generated by commands is more transactional in nature. Query services might not need the same granularity; queries often require more aggregated data, which can be denormalized. It is possible to create a separate database for reporting purposes, holding pre-processed data for data warehousing requirements. The processing can run overnight in batches. This is generally read-only data, often with a reduced historical view to keep queries fast. The data can be organized as an OLAP cube for easier slicing and dicing.
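The pre-processing step can be pictured as a batch job that turns transactional records into aggregated, read-only rows. The sketch below works on in-memory collections purely for illustration; the `SaleRecord`, `DailySalesSummary`, and `ReportingBatch` types are hypothetical, and a real job would read from the transactional database and write to the reporting one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical transactional record produced by the command side.
public sealed class SaleRecord
{
    public SaleRecord(DateTime soldAt, string product, decimal amount)
    {
        SoldAt = soldAt;
        Product = product;
        Amount = amount;
    }

    public DateTime SoldAt { get; private set; }
    public string Product { get; private set; }
    public decimal Amount { get; private set; }
}

// Hypothetical denormalized row for the reporting side.
public sealed class DailySalesSummary
{
    public DailySalesSummary(DateTime day, int saleCount, decimal total)
    {
        Day = day;
        SaleCount = saleCount;
        Total = total;
    }

    public DateTime Day { get; private set; }
    public int SaleCount { get; private set; }
    public decimal Total { get; private set; }
}

public static class ReportingBatch
{
    // Aggregates individual sales into one summary row per calendar day, oldest first.
    public static List<DailySalesSummary> Summarize(IEnumerable<SaleRecord> sales)
    {
        return sales
            .GroupBy(s => s.SoldAt.Date)
            .OrderBy(g => g.Key)
            .Select(g => new DailySalesSummary(g.Key, g.Count(), g.Sum(s => s.Amount)))
            .ToList();
    }
}
```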

Related Patterns
There are other related patterns which are generally discussed in the same texts as CQRS. We will try to discuss them in a future post. They are as follows:

Event Sourcing
Process Manager
Eventual Consistency

Further Readings
http://cqrsjourney.github.com
http://pundit.cloudapp.net
http://www.cqrsinfo.com/category/programming/
http://martinfowler.com/bliki/CQRS.html
http://cqrs.files.wordpress.com/2010/11/cqrs_documents.pdf
