Tsoobame

software, japanese, kyudo and more

Understanding Dataloader

Dataloader is one of the most useful and clever packages in my toolbox.

I am going to set up an obviously naive example and follow the process of building a simple dataloader, to understand its elegance and how useful it is.

About the project

We are going to create a view and an API over a social network. Our user relations are:

User 1 friend of [ 2, 3 ]
User 2 friend of [ 1, 3 ]
User 3 friend of [ 1, 2, 4 ]
User 4 friend of [ 3, 5 ]
User 5 friend of [ 4 ]

The view shows the relation between users and their friends, up to N levels of friendship. We are not going to look much at it in this post.

Users data can be found here.

The only dependency will be express.

Initial Setup

datasource.js

The datasource allows us to retrieve one or multiple users by id. The contract is not arbitrary: it is already based on the real dataloader, so there will be minimal changes over the course of the post. Data is defined in a file within the project. The code is pretty simple:

const users = require('./users.json')


const getUsersFromFile = (ids) => ids.map(id => users.find(u => u.id === id))
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

async function loadMany(ids) {
    console.log(`GET /users?ids=${ids}`)

    await sleep(100)
    return getUsersFromFile(ids)
}

async function load(id) {
    const results = await loadMany([id])
    return results[0]
}

module.exports = {
    load,
    loadMany
}

The only interesting method is loadMany. We print every request to the simulated service so we can follow them in the console. There is also a delay before the promise resolves, to better simulate a remote call and make it easier to understand why dataloader is so good.

A very important requirement is that data must be returned to the caller in the right order and that every element must be returned (the ids and results arrays have the same length). This will become clear when we put the dataloader in place.
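
To see why this matters, compare a batch function that breaks the contract with one that respects it (a small sketch using the same users array the datasource loads; the real dataloader also accepts an Error in the position of a failed id):

// BAD: filter() silently drops missing ids and follows the order of `users`,
// not of `ids`, so results[idx] would no longer correspond to ids[idx].
const loadUsersBroken = (ids) => users.filter(u => ids.includes(u.id))

// GOOD: map over the requested ids, so the result has the same length and order.
// Missing users show up as null in their position.
const loadUsersOk = (ids) => ids.map(id => users.find(u => u.id === id) || null)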

resolver.js

The resolver uses the datasource it receives as a parameter to load friendship data about users. It receives the number of levels of friends we want to fetch and uses a recursive approach to load friends of friends until all levels are fetched.

async function getFriends(datasource, user, levels) {
    if (levels == 0) {
        return { id: user.id, name: user.name }
    }

    const friends = await datasource.loadMany(user.friends)

    return {
        ...user,
        friends: await Promise.all(
             friends.map(f => getFriends(datasource, f, levels - 1))
        )
    }
}


async function getUserWithFriends(datasource, id, levels = 1) {
    const user = await datasource.load(id)
    return getFriends(datasource, user, levels)
}

module.exports = { getUserWithFriends }

It uses a brute force approach on purpose. The code is simple but far from optimal. In a single method it looks obvious, but sometimes, when we are building GraphQL or similar APIs, or complex workflows, we end up making exactly this kind of brute force requests.

view.js

Nothing advanced. It just renders a user's friends in a nested way.

function render(user) {
    return `<div style="padding-left: 12px;background-color:#def">
            ${user.name}
            ${user.friends ? user.friends.map(u => render(u)).join('') : ""}
    </div>`
}


module.exports = {
    render
}

server.js


const express = require('express')
const PORT = 3000
const app = express()

const datasource = require('./datasource')
const resolver = require('./resolver')
const view = require('./view')


app.get(`/user-with-friends/:id`, async (req, res) => {
    const id = req.params.id
    const levels = req.query.levels || 1

    const user = await resolver.getUserWithFriends(datasource, id, levels)

    res.send(view.render(user))

})

app.listen(PORT, () => console.log(`Fakebook listening to ${PORT}`))


Run

node server.js

Test 1

We will render friends of user 1. Only 1 level:

http://localhost:3000/user-with-friends/1

If we check in our console we will find:

GET /users?ids=1
GET /users?ids=2,3

All good. We requested user 1 and their friends 2 and 3.

Test 2

Let's try by loading 3 levels:

http://localhost:3000/user-with-friends/1?levels=3

Things are getting interesting here:

GET /users?ids=1
GET /users?ids=2,3
GET /users?ids=1,3
GET /users?ids=1,2,4
GET /users?ids=2,3
GET /users?ids=1,2,4
GET /users?ids=2,3
GET /users?ids=1,3
GET /users?ids=3,5

We are loading data for users 1, 2, 3, 4 and 5, but we are doing 9 requests. We are requesting the same users again and again. We could easily improve the situation by adding some sort of per-request cache.

Cache per request

We are going to add a cache to the system. It will be empty at the start of each request, so we do not need to worry about expirations. The benefits will be:

  • We do not request the same resource from the remote source twice during the same request.
  • As a side effect, if we try to get the same resource twice during the same request, we get the same data, so mutations of the resources in the middle of a request will not produce incoherent results.

cache.js

Simple cache implementation:


function make(loadManyFn) {

    const cache = {}

    async function loadMany(ids) {
        const notCachedIds = ids.filter(id => !cache[id])

        if (notCachedIds.length > 0) {
            const results = await loadManyFn(notCachedIds)
            notCachedIds.forEach((id, idx) => cache[id] = results[idx])
        }

        return ids.map(id => cache[id])
    }

    return {
        load: async id => {
            const results = await loadMany([id])
            return results[0]
        },
        loadMany
    }

}

module.exports = { make }

The cache needs a function to retrieve multiple items by id (or, in general, by key). It checks which data is already cached and requests only the ids that are not found.

It implements the same contract as the datasource.

server.js

Let's add this line to the server:

const cache = require('./cache')

And replace this line:

const user = await resolver.getUserWithFriends(datasource, id, levels)

with:

const user = await resolver.getUserWithFriends(cache.make(datasource.loadMany), id, levels)

Run

Let's run the server again and test the previous request:

http://localhost:3000/user-with-friends/1?levels=3

GET /users?ids=1
GET /users?ids=2,3
GET /users?ids=4
GET /users?ids=4
GET /users?ids=5

We reduced the number of requests from 9 to 5, which is pretty good. But wait, what happened here? Why are we requesting id=4 twice?

If we unnest the request flow based on how nodejs works (and how we implemented our resolver) this is what happened:

  • 1 - Load user 1 => GET /users?ids=1
    • 2 - Load friends of 1: [2,3]=> GET /users?ids=2,3
      • 3.1. Load friends of 2: [1,3] => all cached
        • 4.1. Load friends of 1 : [2,3] => all cached
        • 4.2. Load friends of 3 : [1,2,4] => GET /users?ids=4
      • 3.2. Load friends of 3: [1,2,4] => GET /users?ids=4
        • 4.3. Load friends of 1: [2,3] => all cached
        • 4.4. Load friends of 2: [1,3] => all cached
        • 4.5. Load friends of 4: [3,5] => GET /users?ids=5

At 3.1 we already had all friends of user 2 cached, so the code went straight to 4.2, which ran in parallel with 3.2. Both were waiting for the same user (4) and therefore made the same request twice.

So with our simple cache, we did not reduce the requests to the minimum we wanted.

For example, if we did:

const users = await Promise.all([load(1), load(1)])

There would be 2 requests before the cache has data for id=1.

Let's fix this and produce the ideal:

GET /users?ids=1
GET /users?ids=2,3
GET /users?ids=4
GET /users?ids=5

Dataloader

Using the Node.js process.nextTick(...) we can postpone the execution of a given function until the current operation completes, before the event loop continues. It is useful, for example, to run a function after all variables are initialized.

From nodejs documentation

By using process.nextTick() we guarantee that apiCall() always runs its callback after the rest of the user's code and before the event loop is allowed to proceed.

Using it, we can accumulate all the keys that are requested during the same cycle (3.2 and 4.2 in the example above) and request them together at the end. In the next cycle we accumulate again the ones that depend on the previous results, and so on.
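
As a quick illustration of the behaviour we are after (assuming make is the factory defined below, datasource is the module from the beginning of the post, and this runs inside an async function), two loads issued in the same cycle end up in a single batched request:

const loader = make(datasource.loadMany)

// Both loads start in the same cycle, so only one request reaches the datasource:
// GET /users?ids=1,2
const [user1, user2] = await Promise.all([loader.load(1), loader.load(2)])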

This simple version of dataloader also incorporates the code for the per-request cache:


function make(loadManyFn) {

    const cache = {}
    let pending = []
    let scheduled = false
    function scheduleSearch() {
        if (pending.length > 0 && !scheduled) {
            scheduled = true
            Promise.resolve().then(() => process.nextTick(async () => {
                await runSearch()
                scheduled = false
            }))
        }
    }

    async function runSearch() {
        const pendingCopy = pending.splice(0, pending.length)
        pending = []

        if (pendingCopy.length > 0) {
            const results = await loadManyFn(pendingCopy.map(p => p.id))
            pendingCopy.forEach(({ resolve }, idx) => resolve((results[idx])))
        }

    }


    async function loadMany(ids) {
        const notCachedIds = ids.filter(id => !cache[id])

        if (notCachedIds.length > 0) {
            notCachedIds.forEach(id => {
                cache[id] = new Promise(resolve => {
                    pending.push({ id, resolve })
                })
            })

            scheduleSearch()
        }

        return Promise.all(ids.map(id => cache[id]))
    }

    return {
        load: async id => {
            const results = await loadMany([id])
            return results[0]
        },
        loadMany
    }

}


module.exports = { make }

Ignoring the part of the cache, the important bits are:

Accumulating requests

notCachedIds.forEach(id => {
    cache[id] = new Promise(resolve => {
        pending.push({ id, resolve })
    })
})

We add to the list of pending ids the ones that are not cached. We keep both the id and the resolve method, so we can resolve them afterwards with the right value. We cache the promise itself in the hashmap. This would also allow us to cache rejected promises, for example, so we do not request the same failing id over and over. It is not used in this implementation, though.
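
If we wanted that behaviour, a possible extension (not part of this implementation, just a sketch) would be to keep the reject function as well and let runSearch reject the promises whose result is an Error:

// Hypothetical extension: keep reject too, so a failing id rejects its cached promise.
notCachedIds.forEach(id => {
    cache[id] = new Promise((resolve, reject) => {
        pending.push({ id, resolve, reject })
    })
})

// ...and in runSearch:
// pendingCopy.forEach(({ resolve, reject }, idx) =>
//     results[idx] instanceof Error ? reject(results[idx]) : resolve(results[idx]))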

Scheduling the request

 function scheduleSearch() {
        if (pending.length > 0 && !scheduled) {
            scheduled = true
            Promise.resolve().then(() => process.nextTick(async () => {
                await runSearch()
                scheduled = false
            }))
        }
    }

That is where the magic happens. This function is short, but it is the most important one: we schedule/delay the actual request until after all the promise declarations of the current cycle.

    async function runSearch() {
        const pendingCopy = pending.splice(0, pending.length)
        pending = []

        if (pendingCopy.length > 0) {
            const results = await loadManyFn(pendingCopy.map(p => p.id))
            pendingCopy.forEach(({ resolve }, idx) => resolve((results[idx])))
        }

    }

We clone the pending ids (so new ones can be accumulated while the search completes) and call loadManyFn so we can resolve the promises we had pending. Remember the requirement that loadMany returns all the elements, and in the right order? This is where it is needed: we can reference the results by index and resolve the right pending promises.

Let's run it!

Execution

Again the same request:

http://localhost:3000/user-with-friends/1?levels=3

That produces the following output:

GET /users?ids=1
GET /users?ids=2,3
GET /users?ids=4
GET /users?ids=5

Exactly what we wanted.

Conclusion

- Dataloader is a great package that should be in every developer's toolbox, especially for those implementing GraphQL or similar APIs.

- The resolvers in this example could be optimized, but sometimes our requests live in different files at different levels and depend on certain conditions. With Dataloader we can keep our file structure and code readability without damaging performance, both in response time to our client and in the number of requests spawned within our mesh.


Are you using Dataloader? Do you know any tool that accomplishes something similar? Do you know any other packages that, in your opinion, should be in every nodejs developer's toolbox?

Graphql Stitching - Part 1

I am going to write a short (?) post about how to create a simple API Gateway that exposes two services using GraphQL stitching. I am assuming some knowledge about GraphQL and Apollo Server.
We will use express, nodejs and apollo for the services, and a technique called schema stitching.
If you want to learn more about GraphQL you can go to the official site.

Why do we need API gateways and schema stitching?

I will write a whole post about the reasons we had to use GraphQL in our services and in our API Gateway.
Here I am offering a short explanation:
In real world scenarios we are creating independent and autonomous (micro)services. The less data they share, the less they need to call each other and the less coupled they are, the better.
Many times a service manages entities (or parts of entities) that hold an id of another entity but does not need to know more details. For example, an inventory service might manage productID and available units, but does not need to know the name of the product or its price.
The inventory service will be able to run all its operations and apply the rules it manages without requesting information from any other service.
Users, on the other hand, will need to see this scattered data together on one screen. In order to avoid too many requests from the UI, an API Gateway can offer a single endpoint where the UI can request the data needed for a specific functionality/screen in one request, and the Gateway can orchestrate the calls to other services, cache results if needed, etc.

Let's start working

Let's create a folder as the root for our project:
mkdir graphql-stitching
cd graphql-stitching


Creating the songs service

We are going to create a simple service that offers data about songs.

mkdir songs
cd songs
npm init -y
npm install express graphql graphql-tag cors apollo-server-express body-parser
We are going to create our schema first:
touch schema.js
schema.js
const { makeExecutableSchema } = require("graphql-tools");
const gql = require('graphql-tag')

const songs = [
    { id: 1, title: "I will always love you" },
    { id: 2, title: "Lose yourself" },
    { id: 3, title: "Eye of the tiger" },
    { id: 4, title: "Men in Black" },
    { id: 5, title: "The power of love" },
    { id: 6, title: "My Heart will go on" }
];

const typeDefs = gql`
    type Query {
        songs: [Song]
        song(songId: ID!): Song
    }
    type Song {
        id: ID
        title: String
    }
`;

const resolvers = {
    Query: {
        songs: () => {
            return songs;
        },
        song(parent, args, context, info) {
            return songs.find(song => song.id === Number(args.songId));
        }
    }
};

module.exports = makeExecutableSchema({
    typeDefs,
    resolvers
});


We define a list of songs, the type Song (id, title) and two queries: one for getting all songs and one for getting a song by id.

Let's create the api:
touch index.js
index.js:
const express = require('express')
const { ApolloServer } = require('apollo-server-express')
const cors = require('cors')
const schema = require('./schema')
const bodyParser = require('body-parser')

const app = express()
app.use(cors())
app.use(bodyParser.json())

const server = new ApolloServer({
    playground: {
        endpoint: '/api',
        settings: {
            'editor.cursorShape': 'block',
            'editor.cursorColor': '#000',
            'editor.theme': 'light'
        }
    },
    schema
})

server.applyMiddleware({ app, path: '/api' })

app.listen(3000, () => {
    console.log('Song services listening to 3000...')
})


We create a simple express service using Apollo Server to expose both the API and the playground to test our API.
node index.js
and open the songs API at http://localhost:3000/api
You will see the playground, so you can run the first query:
{
  songs{
    id 
    title
  }
}
You should be able to see the results.
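You can also try the query that fetches a single song by id (songId: 3 is just a sample value from the list above):
{
  song(songId: 3) {
    id
    title
  }
}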

Creating the movies service

We are going to follow the same process. From the root of our project:
mkdir movies
cd movies
touch index.js
touch schema.js
npm init -y
npm install express graphql cors apollo-server-express body-parser graphql-tag
index.js will be similar to the previous one; only the port number needs to be different:
const express = require('express')
const { ApolloServer } = require('apollo-server-express')
const cors = require('cors')
const schema = require('./schema')
const bodyParser = require('body-parser')

const app = express()
app.use(cors())
app.use(bodyParser.json())

const server = new ApolloServer({
    playground: {
        endpoint: '/api',
        settings: {
            'editor.cursorShape': 'block',
            'editor.cursorColor': '#000',
            'editor.theme': 'light'
        }
    },
    schema
})

server.applyMiddleware({ app, path: '/api' })

app.listen(3001, () => {
    console.log('Movie services listening to 3001...')
})


Schema will be very similar:
const { makeExecutableSchema } = require("graphql-tools");
const gql = require('graphql-tag')

const movies = [
    { id: 1, title: "The Bodyguard", mainSongId: 1 },
    { id: 2, title: "8 Mile", mainSongId: 2 },
    { id: 3, title: "Rocky III", mainSongId: 3 },
    { id: 4, title: "Men in Black", mainSongId: 4 },
    { id: 5, title: "Back to the Future", mainSongId: 5 },
    { id: 6, title: "Titanic", mainSongId: 6 }
];

const typeDefs = gql`
    type Query {
        movies: [Movie]
        movie(movieId: ID!): Movie
    }
    type Movie {
        id: ID!
        title: String!
        mainSongId: ID!
    }
`;

const resolvers = {
    Query: {
        movies: () => {
            return movies;
        },
        movie(parent, args, context, info) {
            return movies.find(movie => movie.id === Number(args.movieId));
        }
    }
};

module.exports = makeExecutableSchema({
    typeDefs,
    resolvers
});


The difference is that a movie has a reference to a song, specifically mainSongId. Since both services are isolated and autonomous, the movie service does not know where the songs service is or what data a song holds. It only knows that a movie has a main song and holds its ID.

If we run the project in the same way
node index.js
we can see the playground and run our test queries.
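For example, fetching one movie by id (movieId: 2 is just a sample value from the list above):
{
  movie(movieId: 2) {
    id
    title
    mainSongId
  }
}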

Let's start the interesting part: our API gateway

We are going to create the same files. From project root:
mkdir apigateway
cd apigateway
touch index.js
touch schema.js
npm init -y
npm install express graphql cors apollo-server-express body-parser graphql-tag apollo-link-http node-fetch

The schema will be created based on the schemas of the other services: we are going to stitch them together and expose them in the API gateway.
schema.js
const {
    introspectSchema,
    makeRemoteExecutableSchema,
    mergeSchemas,
} = require("graphql-tools")
const { createHttpLink } = require("apollo-link-http");
const fetch = require("node-fetch");

const MoviesUrl = 'http://localhost:3001/api'
const SongsUrl = 'http://localhost:3000/api'

async function createServiceSchema(url) {
    const link = createHttpLink({
        uri: url,
        fetch
    });
    const schema = await introspectSchema(link);
    return makeRemoteExecutableSchema({
        schema,
        link
    });
}

async function createSchemas() {
    const movieSchema = await createServiceSchema(MoviesUrl);
    const songsSchema = await createServiceSchema(SongsUrl);

    return mergeSchemas({ schemas: [songsSchema, movieSchema] })
}

module.exports = createSchemas()


 
As you can see in the code, the schema is generated by requesting the schemas of both APIs and merging them.
One difference is that now we need to request this data before we can start the api gateway, so the index.js will be slightly different:
const express = require('express')
const { ApolloServer } = require('apollo-server-express')
const cors = require('cors')
const createSchema = require('./schema')
const bodyParser = require('body-parser')

const app = express()
app.use(cors())
app.use(bodyParser.json())

createSchema.then(schema => {
    const server = new ApolloServer({
        playground: {
            endpoint: '/api',
            settings: {
                'editor.cursorShape': 'block',
                'editor.cursorColor': '#000',
                'editor.theme': 'light'
            }
        },
        schema
    })
    server.applyMiddleware({ app, path: '/api' })
    app.listen(4000, () => {
        console.log('Graphql listening to 4000...')
    })
})

Before starting the listener, the schema is requested and merged so we can expose it in our api.

We need to run the previous services in order to be able to execute this one. From the root of the project:
node movies/index.js &
node songs/index.js &
node apigateway/index.js
If we go to the API gateway playground (http://localhost:4000/api) we can query movies and songs in the same query:
{
  movies{
    id
    title
    mainSongId
  }
  
  songs {
    id
    title
  }
}

This was an introduction to schema stitching. In part 2 I will show some more concepts and real-case scenarios, like extending the services' schemas in the API gateway with custom resolvers, and how to optimize them using dataloaders.

If you have any questions about graphql schema stitching or about api gateway in general, please add your comment or contact me.


User Registration - Layered Architecture (OOP)

This post can be read independently but is part of a series of posts about Ports and Adapters and Functional programming. If you want to read it, you can access through this link.


User Registration

We need to implement a service that will manage the user registration for a web application. The system needs to allow users to get registered and to confirm their registration and email address through a link that will be sent in an automatic email. 

The emails are sent by a different service that contains this functionality. The User Service will need to publish events to the event bus in order to communicate with the rest of the system and will need to store the information about the user in its own database.

Note: we are going to simulate the database and the messaging system because it goes beyond the goal of this post.


Architecture

We are going to follow a simple Layered Architecture, from HTTP API to Database / Event Bus:


We are going to focus on the elements within the dotted box:

  • HTTP Facade: offers the HTTP interface for the clients to connect and access the functionality of the service.
  • Service: contains the business of the service. Executes the behaviour defined by the domain rules.
  • Repository: offers the methods required for storing and retrieving the data from the database. 
  • Events adapter: offers the methods required for publish and subscribe to events in the event bus. In this case, the user service will only publish them.

Coding

You can find the code related to this post in GitHub

I am going to start the development from top to bottom since it is easier to reason about the layers in Layered Architecture.

HTTP Facade

The HTTP Facade will contain the entry point to the User Service. As we said we will need two different methods: registerUser and confirmUser.

Note: since I want to discuss the concepts, I am not using REST or OData or GraphQL APIs; they would require a whole (big) post on their own.

Our initial version of the Http Facade will be as simple as:

const express = require('express');
const app = express();
app.post('/registerUser', (req, res) => {
});
app.post('/confirmUser', (req, res) => {
});
app.listen(3000, () => {
    console.log('Listening to port 3000');
});

As you can see with express it is very easy to implement the facade. We just need to choose the endpoints of the api (both path and verbs).

Of course now the HTTP Facade is doing nothing, but we still do not have a service to call. We are going to include the User Service to see what the final picture of this layer will look like.

As shown in the diagram, the User Service will have some dependencies on the repository and the events adapter. We are going to inject those dependencies in the class constructor, so we can abstract the service from the implementation of these dependencies and test it better by mocking them in the tests.

We could have a factory taking care of injecting the dependencies into the service, or a bootstrapper configuring the dependency map through an IoC library (like infusejs, for example), but for simplicity we are going to initialise and inject the dependencies in the HttpFacade.

The registerUser method would look like:

app.post('/registerUser', async (req, res) => {
    const userService = new UserService(new UserRepository(), new EventAdapter());
    const userData = req.body;  // Using body-parser
    const result = await userService.registerUser(userData);

    res.send(result);
});

The code simply does the following:

  1. Initialise a new UserService passing its dependencies in the constructor.
  2. Retrieve the data sent to the api endpoint.
  3. Call the right method in the UserService.
  4. Return the result to the caller (we are ignoring potential errors to make the code simpler).
The confirmUser endpoint will do something similar, just calling a different method in the service (see the sketch below). The UserService initialisation will be the same, and UserRepository or EventAdapter could have dependencies of their own, so it would be nice to have a factory or use an IoC library, as mentioned before.
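
A minimal sketch of what that endpoint could look like, following the same pattern as registerUser (error handling is ignored here as well):

app.post('/confirmUser', async (req, res) => {
    const userService = new UserService(new UserRepository(), new EventAdapter());
    const userData = req.body;  // Using body-parser
    const result = await userService.confirmUser(userData);

    res.send(result);
});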

We are going to create the User Service next.

User Service

The user service contains all the domain or business logic. It needs to apply the rules, validations, etc. to ensure that the actions we are trying to perform are correct before persisting the result (or new state) in the chosen storage system and notifying the rest of the system about the operation.

module.exports = class UserService {
    constructor(repository, eventAdapter) {
        this.repository = repository;
        this.eventAdapter = eventAdapter;
    }

    async registerUser(userData) {
        if(!userData) {
            throw new Error('The user cannot be null.');
        }

        validateUserName(userData.userName);
        validatePassword(userData.password);

        const user = {
            userName: userData.userName,
            password: userData.password,
            state: 'PENDING'
        };

        const newUser = await this.repository.save(user);
        this.eventAdapter.publish('user.registered', newUser);
        return newUser;
    }

    async confirmUser(userData) {
        if(!userData) {
            throw new Error('The user cannot be null.');
        }

        let user = await this.repository.getByNameAndPassword(userData.userName, userData.password);
        user.state = 'CONFIRMED';

        await this.repository.save(user);
        this.eventAdapter.publish('user.confirmed', user);
        return user;
    }
}

function validatePassword(password) {
    // Validate length, strength, etc
}

function validateUserName(userName) {
    // validate length, format, etc.
}

The code is simple but here is what this class is doing:
constructor
  • Stores the dependencies so they can be used whenever needed.
registerUser:
  • Validates that the user data is correct.
  • Sets some fields based on the domain rules: a user starts as PENDING until the email is confirmed.
  • Persists the information in the database.
  • If everything above is correct it will publish a new event to notify the rest of the system about the new user.
confirmUser: (see registerUser)

Although it is beyond this topic, I wanted to write a short comment about events:
In this example we are using events to communicate between different services. In this specific case, we might need to send an email to the user who just registered so they can confirm their email address. Since one application can send emails for different reasons, or from different actions that might be implemented in different services, it makes sense to have a separate service for emails. It is very important to understand that an event represents something that happened. We should take special care not to send the event before knowing that the action was performed and persisted, otherwise we could provoke a lot of inconsistencies in the system (for example, sending an email to a user that was not persisted in the DB, or even one that had an error in a field and did not pass validation).

In our example EventAdapter and Repository are doing nothing, so we will not show the code in this post. But again you can download the full code from GitHub
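
If you want to run the example without fetching the repository, a minimal in-memory sketch of both dependencies could look like the following (this is my own illustration, not the code from GitHub):

// Hypothetical in-memory stand-ins, just enough for the UserService to work.
class UserRepository {
    constructor() { this.users = []; }

    async save(user) {
        if (user.id) {                                           // update an existing user
            const idx = this.users.findIndex(u => u.id === user.id);
            this.users[idx] = user;
            return user;
        }
        const saved = { id: this.users.length + 1, ...user };    // insert a new one
        this.users.push(saved);
        return saved;
    }

    async getByNameAndPassword(userName, password) {
        return this.users.find(u => u.userName === userName && u.password === password);
    }
}

class EventAdapter {
    publish(eventName, payload) {
        console.log(`event published: ${eventName}`, payload);
    }
}

module.exports = { UserRepository, EventAdapter };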

Unit Tests

We are going to test the service to ensure that the functionality does what we expect and follows the requirements. We are going to install chai, chai-as-promised and mocha for this ( npm install chai chai-as-promised mocha --save-dev ).

In order to test the UserService in isolation we will need to mock (create predictable versions of) the dependencies. We can use tools like sinon for this task, but in this case I am going to create very basic mocks.

We are going to test the registerUser method only as an example.

In the code below you will see that we need chai and chai-as-promised to create our tests.

The first test will just ensure that registering a user with null data will throw an exception. We need to use the "rejectedWith" assertion because userService returns a Promise when running the registerUser method.

const chai = require('chai');
const UserService = require('./user-service');
const chaiAsPromised = require('chai-as-promised')

chai.use(chaiAsPromised);
const expect = chai.expect;


describe('user-service', () => {
    describe('registerUser', () => {
        it('If user data is null, registerUser should throw an exception', () => {
            const userService = new UserService();
            return expect(userService.registerUser(null)).to.be.rejectedWith(Error);
        });
    })
})

A more interesting test would be the happy case. We will assume that the user data is correct and that the repository and the eventAdapter work properly. For that we will need to create mocks of the dependencies used by UserService.

We create two helper methods we will use for testing the happy case:

const doNothing = () => {};
const returnArgument = (arg) => arg;

We test that saving the user data returns the same data plus the new state. We do not check the newly generated id since it is the repository's responsibility to create it.

it('If user data is correct and the user can be saved and the event published the service will return the new user', async () => {
    const userData = {
        userName: 'user-name',
        password: 'some-password'
    }
            
    const userService = new UserService({ save: returnArgument }, { publish: doNothing });
    const newUser = await userService.registerUser(userData);

    expect(newUser.userName).to.equal(userData.userName);
    expect(newUser.password).to.equal(userData.password);
    expect(newUser.state).to.equal('PENDING');
 });


We can create more tests by thinking about what could occur when using the registerUser method and deciding what the right reaction should be in each case. For example, what if the user could not be saved? What if the user could be saved but the event could not be published? etc. One such test is sketched below.
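
As an illustration, here is a hedged sketch of the first case, using a repository whose save always fails; with the current implementation we expect registerUser to reject and publish to never be called:

it('If the user cannot be saved, registerUser should reject and publish no event', async () => {
    const failingSave = () => Promise.reject(new Error('db error'));
    let published = false;

    const userService = new UserService(
        { save: failingSave },
        { publish: () => { published = true; } }
    );

    await expect(userService.registerUser({ userName: 'user-name', password: 'some-password' }))
        .to.be.rejectedWith(Error);
    expect(published).to.equal(false);
});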

Final questions

We did not add any logging to our service. If we wanted to use a logger module, should we inject it through the constructor? Or should we just import it from node_modules?
What about other dependencies like lodash or uuid? Should we inject all the utility libraries through the constructor? Should we allow modules to import other modules directly?

Sometimes deciding what should be injected and what should just be imported is not easy.
One possible criterion is to inject all modules that produce non-deterministic behaviour and just import the ones that produce deterministic responses. It would mean that all modules executing I/O operations, generating uuids or random values, etc. should be injected.

The logger, for example: if we assume that it writes only to console.log, we could consider it (almost) deterministic, but if it can write to files or send events, then it is clearly non-deterministic and therefore it should be injected into every module using it.

Ports and adapters architecture and functional programming (part 1)

Last week I watched a video about functional architecture and how it could help projects be more maintainable:

Functional architecture - The pits of success - Mark Seemann


Although I did not agree with some of the statements and some of the C# examples, I got (even more) curious about the topic.

In my project we are starting to use RabbitMQ as the message broker for our business events, both for service communication and for analytics. For example, we want to make our logger publish events whenever a message is above a configured threshold (error by default).

When explaining these ideas to my team, one of my team members was worried: the logger, which was already performing side effects (but more or less safe ones), would start emitting events. He proposed detaching the functionality from the side effects. So again, functional programming was knocking at my door.

Functional Programming

I started reading about functional programming and found some very interesting posts about applying it in javascript. I am not an expert on functional programming, but there are some basic concepts that are key to making software that is testable and less prone to errors. Before learning the more complex concepts (monads, functors, etc.) I want to understand the basic ones and try to apply them to a vanilla javascript project.

Side Effects

A side effect is any change or action performed outside the boundaries of a given function. By this definition, modifying a global variable, writing to the database, sending an event through an event bus, writing to a log file, changing the DOM, etc. are all side effects. Any of those things could produce a failure or an unexpected change, and therefore any function performing any of those actions becomes non-deterministic and more difficult to test (more on this to come).


Pure vs Impure functions

A pure function is a deterministic function that always returns the same value when given the same input and that does not perform any side effects. So basically a pure function gets some values, performs some calculations and returns a result: always the same one given the same parameters.

As you might suppose, an impure function is any function that performs side effects, so it is not deterministic. Its output, success or failure, etc. depend on external factors.

As a result, testing a pure function is easy and testing an impure function not so much; depending on the number or nature of the side effects, they can become very complex to test. The small sketch below illustrates the difference.
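
A tiny illustration of the difference (my own example, not from the talk):

// Pure: same input, same output, nothing outside the function is touched.
const applyDiscount = (price, discount) => price - price * discount;

// Impure: it reads the clock and writes to the console, so two calls with the
// same arguments behave differently and are harder to test.
const applyDiscountAndLog = (price, discount) => {
    const result = price - price * discount;
    console.log(`[${new Date().toISOString()}] discounted price: ${result}`);
    return result;
};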



Ports and adapters architecture

Ports and adapters, or hexagonal architecture, tries to separate the core of the application from all the external elements it needs to connect to. An application will need to access the database, send emails, send events through a message broker, receive http requests, write log messages to files, etc.



It proposes four different types of elements:

- Application: the core, the functionality that models the domain behaviours and data.

- Ports: entry (or exit) points to the application; the public interface of the application. What methods it provides for inputting information and what methods it needs to send resulting data to external services.

- Adapters: the elements responsible for getting data from or sending data to specific external elements. For example, a module to access the database, a module to send events to an event bus, a logger to write messages to disk, etc.

- External elements: elements that are not part of our system but that we need to interact with in order to perform the complete functionality. These are the UI, database, message brokers, etc.


Combining everything

Putting it all together, it is easy to map the concepts from functional programming and from ports and adapters, to get an idea of how nicely they fit and how natural it should be to create a functional application with a ports and adapters architecture.

- The application shall be composed of only pure functions. That provides the predictability and testability we need for the most important part of the system, and we can reason about it more easily due to its deterministic nature.

- The adapters shall contain all the functions that access the external systems, i.e. that perform side effects. So the impure (non-deterministic) functions are kept at the boundaries of the application. A small sketch of this split follows.
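
As a hedged preview of what the next posts will materialise (the names and shapes below are my own assumptions, not the final code): the pure core decides what should happen, and the impure adapter executes it.

// Pure core: builds the new state and the events to publish, performs no I/O.
const registerUser = (userData) => ({
    user: { ...userData, state: 'PENDING' },
    events: [{ name: 'user.registered', payload: userData }]
});

// Impure adapter: takes the core's decision and performs the side effects.
async function registerUserHandler(repository, eventBus, userData) {
    const { user, events } = registerUser(userData);
    const saved = await repository.save(user);
    events.forEach(e => eventBus.publish(e.name, { ...e.payload, id: saved.id }));
    return saved;
}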


Conclusion

The idea of joining functional programming with ports and adapters seems very reasonable and logical, but without a clear, real example the concepts cannot be tested.

In my next posts I will create an application in three different flavours so we can materialise these ideas and find out if they really make sense. The application will be a nodejs service accessing a database, writing log messages and sending business events to an event broker (all the external systems will be simulated). The flavours will be:

- Layered architecture: the HTTP layer will call the service layer, which will call the repository and the event bus.

- Ports and adapters - Object Oriented: the core application will contain the domain logic and will get the adapters by dependency injection.

- Ports and adapters - Functional: the core application will contain only pure functions; the side effects will be performed by the adapters.

All of them will contain unit tests, since I can foresee that in this regard we will see a lot of differences.


Do you have any experience working with functional programming? What are your thoughts about it?


What am I doing?

Although I am supposed to post content about software, Kyudo and Japan, I feel I should write a post about my current circumstances. My life has changed a lot, and I have had no time to write in this blog as regularly as I would wish.

3 months ago I made a big change in my life. I moved from Barcelona (Spain) to Tokyo (Japan), changed from a big-huge company (Roche Diagnostics) to a start-up (Libra Inc.), changed from a Microsoft-based project (C#, TFS, Visual Studio) to a Javascript-based project (Typescript, nodejs, Jenkins, etc.) and changed from Software Architect to Web developer.



Why did I do that?

First of all, because of the experience. Leaving aside the cultural change, the language and the distance, I thought that my new position/company could make me grow as a professional. The pace in a start-up is exhausting. The main goal is the time to market of new features, based on feedback gathered from users, so the development-release cycle is much faster. I also wanted to work with technologies like Javascript now that it has become so popular and there are so many options (when I started working with JS, there was no such thing as jQuery around).

What am I doing now?

I am learning and updating my knowledge about the technologies being used. I am also contributing my experience in test automation and continuous integration to the company: we are starting to create pipelines based on Jenkins that some day (soon) will lead us to Continuous Deployment, or at least will put us in a position to decide whether we want to do that or not.
I am also helping to set up agile methodologies (Scrum in this case) so we have a defined, predictable and measurable release cycle.

What else?

I am also starting a project with an ex-coworker I admire a lot. It is a mobile application developed using Xamarin and, although it is not 100% decided, we will most likely use a nodejs backend in Azure, using docker containers to run the different services.

Summary

I am in a moment of change and learning, professionally as well as personally.
I hope this phase will make me a better engineer and a more mature person.


I promise that my next post will be related to software. If you want me to post about any specific topic of the ones I mentioned in this post, just let me know.

Thanks for reading!

How to become a good developer

Some days ago, someone in my company asked me a fair-but-not-easy-to-answer question: 

How do you think I can become a good software developer ?

Without much time to prepare to this question my first answer was about trying to build a real project. You can read and follow a lot of tutorials (of course they are important) but you will not really learn until you put them into practise in a real project. 

If your company does not use a given technology you want to learn, just start your own project using it. It does not need to be successful in terms of business, but needs to be real. 

After this quick answer, I started to think about when I began to feel I was becoming a mature software developer. This is the outcome of my thoughts:


Understand that the code you write is for others to read

This is something very basic, but sometimes we forget it. The code we write today will be read and maintained by others, or by ourselves in the future. Also, the effort a company (or a group of developers) puts into developing an application is very small compared to the effort it will invest in maintaining it (solving issues, adding features, improving some areas, etc.).

Some basic rules for accomplishing this are:

  • Do not rely on comments to explain the code. Code needs to be self-explanatory, so by reading the code everyone should understand WHAT is being done. Comments should be used only when the WHY of what we are doing is not clear, or to explain lines of code that might look like potential bugs to others.
  • We all like to write perfect algorithms and super-optimal code, but it usually makes code difficult to read. Do not put too much magic in your code, and if you need it, enclose it in methods with clear names.
  • Perform code reviews frequently (on every push, for example). The reviewer should understand the code without many explanations from the developer who wrote it, so make sure that variable and method names are clear.
  • Use an external tool to validate the code style. There are plenty of them, and they ensure that the code is uniform throughout the project.

Decide clear naming for methods and classes

This is related to the previous aspect, but it has special importance. When you think about how to name a method or a class, you are deciding its responsibility and its boundaries, so it will help you design a modular system.

The name of a class, method, etc. needs to make clear to everyone what it holds. For this to be true, we need to ensure that the contents of the class or the behaviour of the method, its inputs and its outputs, match the name we choose. If they do not, or if they change over time, do not be afraid of changing the name so it stays accurate.

Examples of bad names include the typical *Helper. This name states whom it is helping, but it does not help to understand what the class holds. For example, StringHelper could hold validation methods, formatting methods, constants, localisation, etc.

Learn about Unit Testing and keep it in mind when developing

When unit testing classes and methods, you need to think about the boundaries of those classes and the responsibilities of each method.

This includes what output it needs to deliver to the caller and how it should react to a given input, and it will raise lots of "what if" questions that will make the code very robust.

It will also help you think about the responsibilities this class needs to have and what responsibilities should be delegated to a different one, creating the dependency map in an evolutionary way and making dependency injection something natural.

Conclusions

As you can see, my main concerns about good developers are not about technical skills but about clarity in the code and well-structured projects.

If all this is accomplished, a team can maintain the code easily; they can look for better algorithms for the complex calculation areas in the code, or look for third-party libraries that can fulfil some of the dependencies we might encounter.

Of course we can become experts on some technologies, some tactics or some infrastructures, but first we need to change our mindset and work for others. Not only for our customers, but also for other developers (providing easy-to-read code), for other projects in our company (offering ways of reusing our code through libraries) or even for 3rd party companies (by offering APIs).


What would you recommend so that junior developers can become good software engineers? 



Spring Cloud - Introduction

I am starting to learn about the Spring Cloud framework and the Netflix OSS. I will not add much information since I am a newbie here, but I just wanted to share a video (well, actually a list of videos) that shows its capabilities, i.e. how easy it is to build distributed applications using the framework.

Of course this ease has a price, since you get tightly bound to the framework. But if you want to be highly productive and you care about the business, and not so much about controlling the infrastructure at a low level, this could be a perfect way to go.

You can see the first video with the introduction by Matt Stine here:


You can see the whole playlist (7 videos) clicking here


Oyaji no senaka - おやじの背中

In order not to lose my level of Japanese listening, even in times when I have no time for studying, I try to listen to as much Japanese as possible. Of course I cannot meet Japanese friends every day, so I search for other available resources ;).

In anime the characters tend to speak too casually or use dirty language, so I try to avoid it. On the other hand, the news, movies and some series (doramas) use real Japanese with different degrees of formality. They are also a good source for understanding the culture and behavior of people in Japan.

In a previous post, I showed the NHK Easy news, which contains real news explained in an easy way for learning Japanese.

This time I want to introduce you to "Oyaji no senaka". It is a 10-episode series that tells different stories (each episode is independent) about the relationship between a father and his children.


It is a very important topic, since in Japan the father is often seen as the one working and bringing money home, but not participating as a real family member. Fortunately this is changing a lot nowadays.

I found this series very interesting for both practising Japanese and understanding Japanese society.

I hope you like it and find it useful :)


Event sourcing: introduction by Greg Young

Some days ago I watched a video about event sourcing and CQRS where Greg Young introduces the concepts in a very nice way.

I remember a project where we had to record every action that every user performed in the system, so we had the usual database with the current state of the system plus a historical table with all the actions performed.

Event sourcing is about assuming that all the information a system holds is the result of the events that produced changes in that system, so those events are worth storing. Instead of focusing on the current state of the data, it focuses on the process that led to the current state. For regulated environments it is a must to have all these events stored, so they fall naturally into the event sourcing pattern, but many other businesses can benefit a lot from using event sourcing.

This way of working is not new. Banks, lawyers, human resources, etc. have been working this way since long before information was managed by something called software. The current state of your account is just the result of executing all the events/actions that affected it. No one would want to know their savings from a table with the current state and without all the movements on the account.

It has a lot of advantages, like being able to go back in time and see how the system was at a given point in time, to calculate the evolution of a system, and to analyze the data in a deeper way than just having the current state. So for business intelligence reports and dashboards it is very useful.

It also has some ugly disadvantages. Since you focus on storing the changes that are made to your system, you need to replay all the events in order to calculate the current state of the system. That is not very performant if you need to show this information to your users in a reasonable time. To minimize this you can create snapshots, so you only need to replay the events from the latest snapshot onwards, but it would still be too expensive to query the data to show it on screen. That is why event sourcing cannot work without CQRS. A small sketch of the replay idea follows.
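
As a small illustration of the replay idea (a sketch with a made-up bank account, not taken from the talk): the current state is a fold over the events, optionally starting from a snapshot.

// Events describe what happened; the state is derived by replaying them.
const events = [
    { type: 'deposited', amount: 100 },
    { type: 'withdrawn', amount: 30 },
    { type: 'deposited', amount: 50 }
];

const apply = (balance, event) =>
    event.type === 'deposited' ? balance + event.amount : balance - event.amount;

// Full replay from the beginning...
const balance = events.reduce(apply, 0); // 120

// ...or from a snapshot, replaying only the events recorded after it.
const snapshot = { balance: 70, lastEventIndex: 1 };
const current = events
    .slice(snapshot.lastEventIndex + 1)
    .reduce(apply, snapshot.balance); // 120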

I will talk about CQRS in a new post, but there is also a small introduction to it in the attached video. If you want to know more about event sourcing and to learn from someone more knowledgeable, here is the video:

Greg Young - CQRS and Event Sourcing - Code on the Beach 2014



Are you using event sourcing in your companies? Did you find any problems difficult to solve?

Self-Hosted OWIN - Hello World!

We are going to create a new self-hosted website using owin.

0) Requirements:

  • Visual Studio 201x (I am using the VS 2015) in "Run as Administrator" mode (required to open ports).
  • Internet connection (assumed that you have it since you are reading this post).

1) Create a new Console application:



2) Install the Owin packages:

Open the Package Manager Console by clicking the Tools menu and going to Tools/Nuget Package Manager/Package Manager Console

Execute the following command:
Install-Package Microsoft.Owin.SelfHost

It will install the Owin libraries and reference them in your project.

3) Configure the application to host the web:

Add the following content to the Program.cs class:
At the top, with the usings:
using Microsoft.Owin;
using Microsoft.Owin.Hosting;

Above the namespace declaration:
[assembly: OwinStartup(typeof(Owin.HelloWorld.Startup))]

Add the Startup class :
    public class Startup
    {
        public void Configuration(IAppBuilder app)
        {
            app.Run(context =>
            {
                context.Response.ContentType = "text/plain";
                return context.Response.WriteAsync("Hello, world.");
            });
        }
    }

Add the content to the body of the Main method:
        static void Main(string[] args)
        {
            using (WebApp.Start("http://localhost:9000"))
            {
                Console.WriteLine("Press [enter] to quit...");
                Console.ReadLine();
            }
        }

4) Check it!

Press F5 to execute the Console Application in Debug mode.

5) Further information: