How to write code within the dataflow paradigm

As you know, there are at least two popular programming paradigms: imperative and functional. But there is another very interesting paradigm, which I call “dataflow”. In this post I want to explain why it is good and how to use it to build web-related services.

The key point of imperative programming is writing code as a sequence of statements that change (mutate) the state of the program step by step. In functional programming you have no state; instead you have a “chain” of (mathematical) functions that the data is passed through, transforming it “on the fly” from input to output.

Compared with these approaches, the “dataflow” paradigm does not work with a sequence of steps. It only assumes catching events that carry new states of the data and reacting to them by creating other data.

It imposes very strict limits on operations with data: within a web application, you have only a CR*D-like API (here CR*D means CRUD without the “U” operation). In the pure ideology, you can only create, read, and destroy data. (In real life, you sometimes have to update it inside the system for performance reasons; we will discuss this later.)
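To make the CR*D idea concrete, here is a minimal sketch of an in-memory store that can only create, read, and destroy records, and that notifies listeners when something is created. The names Store, Record, and on_create are my own assumptions for illustration; they do not come from any real library.

```python
from dataclasses import dataclass
from itertools import count
from types import MappingProxyType
from typing import Any, Callable, Dict, List, Mapping


@dataclass(frozen=True)
class Record:
    """An immutable piece of data: once created, it is never updated."""
    id: int
    kind: str
    attrs: Mapping[str, Any]  # wrapped in a read-only view by Store.create


class Store:
    """Hypothetical store with create / read / destroy and no update."""

    def __init__(self) -> None:
        self._records: Dict[int, Record] = {}
        self._listeners: Dict[str, List[Callable[[Record], None]]] = {}
        self._ids = count(1)

    def on_create(self, kind: str, listener: Callable[[Record], None]) -> None:
        """Subscribe to the event "a record of this kind was created"."""
        self._listeners.setdefault(kind, []).append(listener)

    def create(self, kind: str, **attrs: Any) -> Record:
        record = Record(next(self._ids), kind, MappingProxyType(dict(attrs)))
        self._records[record.id] = record
        # React to the new data; a listener typically creates other data.
        for listener in self._listeners.get(kind, []):
            listener(record)
        return record

    def read(self, record_id: int) -> Record:
        return self._records[record_id]

    def destroy(self, record_id: int) -> None:
        del self._records[record_id]
```

A listener registered with on_create("State", ...) can call store.create("ModerationRequest", ...) in response, which is the whole "react to new data by creating other data" loop in miniature.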

Let me show this with some examples:

Example 1: comments with moderation.

Imagine that you have to build a web system (website) where users can comment on anything, but each comment must pass through a moderation system. When a user creates or “updates” a comment, other users should see only the moderated (previous) version of the comment.

See how it works:

When a user creates a comment for the first time, exactly two objects are created: a Comment object as a dumb container for the states, and the first State object with the comment body (text).

As described, we need to “send” the comment to moderation. When we receive the event that a new State of the Comment was created, we react to it by creating a new object: a ModerationRequest (related to the State).

When a moderator reviews the comment body (State) and approves it, they actually create a new ModerationVerdict object with the attributes {approved:true}. The system listens for this event and marks the corresponding comment State as moderated and approved. Of course, in this case you mutate the data, but it is just a small workaround to reduce the complexity of the system.

Note that the moderator can create a negative verdict with {approved:false} or even destroy an inappropriate comment State. But the latter approach is not a good idea, because you can lose valuable data (which could be used for audit purposes).

When the user “updates” their comment, they only create a new State (with new body text), and the whole process repeats as described.

In all cases, other users will see only the last moderated and approved State of the comment.
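The moderation flow above can be sketched roughly as follows. This is only an illustration under my own assumptions: the CommentSystem class and its method names are hypothetical, everything lives in memory, and the Comment container is reduced to a plain comment_id. The single allowed mutation (marking a State as approved) is commented in the code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class State:
    id: int
    comment_id: int
    body: str
    # The one mutation used as a workaround: None means "not moderated yet".
    approved: Optional[bool] = None


@dataclass(frozen=True)
class ModerationRequest:
    state_id: int


@dataclass(frozen=True)
class ModerationVerdict:
    state_id: int
    approved: bool


class CommentSystem:
    """Hypothetical in-memory system; every action creates a new object."""

    def __init__(self) -> None:
        self.states: List[State] = []
        self.requests: List[ModerationRequest] = []
        self.verdicts: List[ModerationVerdict] = []
        self._next_comment_id = 1

    def create_comment(self, body: str) -> int:
        comment_id = self._next_comment_id
        self._next_comment_id += 1
        self.create_state(comment_id, body)  # the first State of the comment
        return comment_id

    def create_state(self, comment_id: int, body: str) -> State:
        """Both "create" and "update" of a comment end up here."""
        state = State(id=len(self.states) + 1, comment_id=comment_id, body=body)
        self.states.append(state)
        # Reaction to the "State created" event: ask for moderation.
        self.requests.append(ModerationRequest(state_id=state.id))
        return state

    def create_verdict(self, state_id: int, approved: bool) -> ModerationVerdict:
        verdict = ModerationVerdict(state_id, approved)
        self.verdicts.append(verdict)
        # Reaction to the "Verdict created" event: mark the State.
        # State ids are 1-based and States are never removed here.
        self.states[state_id - 1].approved = approved
        return verdict

    def visible_body(self, comment_id: int) -> Optional[str]:
        """Return the body of the last approved State, or None."""
        for state in reversed(self.states):
            if state.comment_id == comment_id and state.approved:
                return state.body
        return None
```

With this sketch, an edited comment keeps showing its previous approved text until a moderator creates a positive verdict for the new State.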

Example 2: search.

In typical web systems, the user sends you a request with search parameters, then you parse it and prepare the answer in real time.

With dataflow you do something different: the user creates a new object called SearchRequest, and the system receives it and reacts with a pack of found objects (as if they had just been created). The user receives this pack as the search results.
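A minimal sketch of this idea, assuming a naive "all words must appear" match over an in-memory list of documents. The SearchResult class and the on_search_request_created function are hypothetical names I use only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SearchRequest:
    id: int
    body: str


@dataclass(frozen=True)
class SearchResult:
    """A new object created as a reaction to a SearchRequest."""
    request_id: int
    document: str


def on_search_request_created(
    request: SearchRequest, documents: List[str]
) -> List[SearchResult]:
    """React to a new SearchRequest by creating a pack of SearchResult objects.

    Matching is deliberately naive: a document matches when every word of
    the request body occurs in it, ignoring case.
    """
    words = request.body.lower().split()
    return [
        SearchResult(request_id=request.id, document=document)
        for document in documents
        if all(word in document.lower() for word in words)
    ]
```

The results keep a reference to the request that produced them, so the relation between a request and its results can be stored and reused later.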

Example 3: user lives in the city.

The typical approach to storing information about a user’s city is to add a “city_id” attribute to the user. But that would require you to create a custom method like “update_user_city” and other dirty hacks. With immutable ideas, we do something different: just make a UserCity model (id, user_id, city_id, created_at), and create a new object of this model every time you need to store information about the user’s current or next city.

Note the “id” attribute in this model! It is important because it allows you to create references to these objects and to open a CRUD API for them.
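Here is a small sketch of the UserCity model with the four fields named above. The UserCityLog class and its methods are my own assumptions; in a real system this would be a database table, and here it is just a list in memory.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class UserCity:
    id: int
    user_id: int
    city_id: int
    created_at: datetime


class UserCityLog:
    """Hypothetical append-only log: records are created, never updated."""

    def __init__(self) -> None:
        self._records: List[UserCity] = []

    def create(self, user_id: int, city_id: int, created_at: datetime) -> UserCity:
        record = UserCity(len(self._records) + 1, user_id, city_id, created_at)
        self._records.append(record)
        return record

    def history(self, user_id: int) -> List[UserCity]:
        """All cities of the user, oldest first."""
        records = [r for r in self._records if r.user_id == user_id]
        return sorted(records, key=lambda r: r.created_at)

    def current_city(self, user_id: int) -> Optional[int]:
        """The city from the most recent record, or None if there is none."""
        history = self.history(user_id)
        return history[-1].city_id if history else None
```

There is no update_user_city here: "moving" a user is just one more create call, and the old records stay available as history.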


Now, why is this approach good?

0. Strict API. When you can use only CRUD (especially without the “U”), you can’t create dirty methods like “update_user_city”. With this limitation, you have to think before doing, and you avoid dirty hacks.

1. No uncontrolled changes. By design, a user cannot change their comment without moderation. You do not need to write other custom permission validations for it.

2. Drafts. As I write this post, WordPress stores its states as drafts. Only the last draft will be published.

3. Scalable without locks and race conditions. It does not matter how many working instances you have (one or a hundred) when you just create new data and avoid mutations.

4. Business intelligence out of the box. With users and cities, you have full information about user movements and the periods they lived in each city. You can easily separate travellers from homebodies and offer them different (very personal) ads. You can give them advice if several people come to the same city in the same week. And you can build more and more analytics as you wish.

5. Caching. If you calculate the relation between SearchRequest {body:“funny cats”} and the corresponding objects once, you can easily reuse this already prepared relation when another user searches for funny cats.

6. Subscribe to filters. If you get a new (very fresh) funny cat picture in your system, you can just iterate over the stored SearchRequests and send notifications to all users who were looking for funny cats last week.

7. History. History of user movements, history of user searches, history of document states with supervision, and so on.

8. Integration with the Flux (JavaScript) pattern out of the box. When you deal only with events of new states of data, you can easily push them to the frontend (via SSE or WebSocket channels) and refresh the user interface. In the other direction, the user’s interactions with the web interface produce events with new states of the data, and you can receive them in the usual way.
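Points 5 and 6 from the list (caching and subscribing to filters) can be sketched together, because both rely on the fact that SearchRequest objects are stored and not thrown away. This is only an illustration under my own assumptions: the SearchIndex class is hypothetical, matching is a naive word check, and the "last week" time window is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class SearchRequest:
    id: int
    user_id: int
    body: str


def matches(body: str, document: str) -> bool:
    """Naive match: every word of the body occurs in the document."""
    return all(word in document.lower() for word in body.lower().split())


class SearchIndex:
    """Hypothetical in-memory index that keeps every SearchRequest."""

    def __init__(self) -> None:
        self.requests: List[SearchRequest] = []
        self.documents: List[str] = []
        self.cache: Dict[str, List[str]] = {}  # lowercased body -> documents

    def create_search_request(self, user_id: int, body: str) -> List[str]:
        self.requests.append(SearchRequest(len(self.requests) + 1, user_id, body))
        key = body.lower()
        if key not in self.cache:
            # Calculated once; later requests with the same body reuse it.
            self.cache[key] = [d for d in self.documents if matches(body, d)]
        return list(self.cache[key])

    def create_document(self, document: str) -> Set[int]:
        """Store a new document and return the user ids to notify."""
        self.documents.append(document)
        # Keep the prepared relations up to date with the new document.
        for key, found in self.cache.items():
            if matches(key, document):
                found.append(document)
        # Iterate over stored requests to find interested users.
        return {r.user_id for r in self.requests if matches(r.body, document)}
```

In this sketch the second user who searches for funny cats gets the cached list, and everybody who searched for funny cats before is returned as a notification target when a new matching document is created.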

Author: dobryakov

I’m a developer with full-stack business experience in the web industry, from sales to top-level technical management. I am familiar with a wide spectrum of popular tools and methodologies for the whole life cycle of a software business, from communication with consumers to continuous delivery and long-term product support. I have production experience in building cloud and SaaS environments, designing distributed SOA systems, and DevOps. Feel free to contact me anytime by e-mail.