I decided to write a series of articles on software architecture, partly as a way to document my thoughts and knowledge on related topics. This article covers cohesion.
Wikipedia defines cohesion as “the degree to which the elements inside a module belong together”, but the concept is applicable not only to modules.
Depending on their language of choice, software engineers and architects may mean different things when they talk about cohesion: in an object-oriented language they usually talk about classes or modules, while in a functional or procedural language they may refer to the cohesion of a function or routine. In every case, whether the unit is a function, a class, or a module, cohesion refers to the degree to which the inner elements of that unit belong together. I have also heard people refer to the same characteristic as “strongness”.
To keep the article short, I will focus on classes in an object-oriented language (the examples use Java).
A class contains methods and fields that contribute to its cohesion. Let’s look at some of the types of cohesion a class can have.
The worst type is coincidental cohesion: a class that holds a random collection of methods. For example, the following Utils class has helper methods for Strings and Dates that are completely unrelated to each other.
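import java.util.Date;

class Utils {
    // True when the whole string consists of digits only.
    public static boolean hasStringNumber(String str) {
        return str.matches("\\d+");
    }

    // Returns the current date and time.
    public static Date getCurrentDate() {
        return new Date();
    }
    ...
}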
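One way to see that the grouping is coincidental: the two helpers can be moved into separate classes and neither one notices. The sketch below assumes Java 8 or later; the class names `StringChecks` and `DateHelpers` are hypothetical and only illustrate grouping helpers by subject.

```java
import java.util.Date;

// Hypothetical split of the Utils class: each class groups helpers
// around a single subject instead of mixing strings and dates.
final class StringChecks {
    private StringChecks() {} // static helpers only, no instances

    // True when the whole string consists of one or more digits.
    static boolean hasStringNumber(String str) {
        return str.matches("\\d+");
    }
}

final class DateHelpers {
    private DateHelpers() {} // static helpers only, no instances

    // Returns the current date and time.
    static Date getCurrentDate() {
        return new Date();
    }
}
```

Nothing had to change inside the methods to make this split, which is a sign that they never belonged together.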
Another type is logical cohesion: a class whose methods logically belong to the same category but do completely different things. For example, a Parser class that parses both JSON and XML files:
class Parser {
    public static void parseXmlFile(String filePath) {
        ...
    }

    public static void parseJsonFile(String filePath) {
        ...
    }
    ...
}
Temporal cohesion is when things are related to each other only by “time”, for example a class that initializes a bunch of things at the start-up of an application.
class GameInitializer {
    public static void initializeScoreBoard() {
        ...
    }

    public static void initializePlayers() {
        ...
    }
    ...
}
There are a few more types. Sequential cohesion is when the output of one method becomes the input of another. Communicational cohesion is when the methods operate on the same data and each contributes to the overall output, e.g. mapping over an array and transforming it.
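Neither of these two types has an example so far, so here is a rough sketch of both. The classes `CsvLineSum` and `Temperatures` are hypothetical and invented purely for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sequential cohesion: each method consumes what the previous one produced.
final class CsvLineSum {
    // Step 1: split a comma-separated line into trimmed tokens.
    static List<String> tokenize(String line) {
        return Arrays.stream(line.split(","))
                .map(String::trim)
                .collect(Collectors.toList());
    }

    // Step 2: takes the tokens from tokenize and parses them as integers.
    static List<Integer> toNumbers(List<String> tokens) {
        return tokens.stream().map(Integer::parseInt).collect(Collectors.toList());
    }

    // Step 3: takes the numbers from toNumbers and adds them up.
    static int sum(List<Integer> numbers) {
        return numbers.stream().mapToInt(Integer::intValue).sum();
    }
}

// Communicational cohesion: the methods are independent of each other,
// but they all work on the same data (the celsius array).
final class Temperatures {
    private final double[] celsius;

    Temperatures(double... celsius) {
        this.celsius = celsius.clone();
    }

    // Maps over the shared array and transforms every value.
    double[] toFahrenheit() {
        return Arrays.stream(celsius).map(c -> c * 9 / 5 + 32).toArray();
    }

    // Reads the same array to compute an aggregate (NaN when it is empty).
    double average() {
        return Arrays.stream(celsius).average().orElse(Double.NaN);
    }
}
```

In `CsvLineSum` the methods form a chain, e.g. `sum(toNumbers(tokenize("1, 2, 3")))`, while in `Temperatures` the methods could be called in any order because the only thing linking them is the data they share.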
The strongest form is functional cohesion: all the methods of a class are related to each other and essential to how the class functions.
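As a rough illustration, consider a small stack of integers. The `IntStack` class is a hypothetical example, not taken from any library; every method reads or updates the same state, and removing any one of them would leave the class unable to do its job.

```java
import java.util.Arrays;

// Hypothetical example of functional cohesion: all methods serve one purpose
// (a last-in, first-out stack) and all of them depend on the same two fields.
final class IntStack {
    private int[] items = new int[8];
    private int size = 0;

    // Adds a value on top, growing the backing array when it is full.
    void push(int value) {
        if (size == items.length) {
            items = Arrays.copyOf(items, size * 2);
        }
        items[size++] = value;
    }

    // Removes and returns the most recently pushed value.
    int pop() {
        if (size == 0) {
            throw new IllegalStateException("stack is empty");
        }
        return items[--size];
    }

    boolean isEmpty() {
        return size == 0;
    }
}
```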
Although there are various types of cohesion and various flavors of its definition, I find the following quote from Larry Constantine the clearest way of communicating the concept.
“Attempting to divide a cohesive module would only result in increased coupling and decreased readability.” - Larry Constantine
Note that the quote characterizes cohesion indirectly, by what happens when a cohesive unit is broken up, rather than by describing cohesion itself.
Despite the subjectiveness of the attribute, there are a few metrics and techniques for measuring cohesion. One of them is the Chidamber and Kemerer Lack of Cohesion in Methods (LCOM) metric, which can be summarized by the following equation:
$$\mathrm{LCOM} = \begin{cases} |P| - |Q| & \text{if } |P| > |Q| \\ 0 & \text{otherwise} \end{cases}$$
P increases by one for every pair of methods that do not access a common field, and Q increases by one for every pair of methods that do access a common field.
Consider the following class with private fields a and b. Some methods access only a, others access only b. Many pairs of methods therefore share no field, which raises the LCOM score and indicates a lack of cohesion.
class DummyClass {
    private double a;
    private double b;

    public DummyClass(double a, double b) {
        this.a = a;
        this.b = b;
    }

    // Instance methods (not static), so that they can access the fields a and b.
    public void doSomethingWithA() {
        ...
    }

    public void doSomethingElseWithA() {
        ...
    }

    public void doSomethingElseWithA(double c) {
        ...
    }

    public void doSomethingWithB() {
        ...
    }

    public void doSomethingWithB(double c) {
        ...
    }
}
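To make the score concrete, here is a minimal sketch that counts method pairs for DummyClass. It rests on two assumptions: which field each method touches is inferred from the method names, since the bodies are elided, and the constructor is left out of the count. The `LcomSketch` class is a hypothetical helper, not a real analysis tool, and it uses `List.of` and `Set.of`, so it needs Java 9 or later.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical helper: computes LCOM from the set of fields each method accesses.
final class LcomSketch {
    // fieldsPerMethod holds one entry per method: the names of the fields it uses.
    static int lcom(List<Set<String>> fieldsPerMethod) {
        int p = 0; // pairs of methods with no field in common
        int q = 0; // pairs of methods with at least one field in common
        for (int i = 0; i < fieldsPerMethod.size(); i++) {
            for (int j = i + 1; j < fieldsPerMethod.size(); j++) {
                if (Collections.disjoint(fieldsPerMethod.get(i), fieldsPerMethod.get(j))) {
                    p++;
                } else {
                    q++;
                }
            }
        }
        // LCOM is P - Q when P is larger than Q, otherwise 0.
        return Math.max(p - q, 0);
    }

    public static void main(String[] args) {
        // Assumed field usage for DummyClass: three methods use a, two use b.
        List<Set<String>> dummyClass = List.of(
                Set.of("a"), Set.of("a"), Set.of("a"),
                Set.of("b"), Set.of("b"));
        System.out.println(lcom(dummyClass)); // prints 2
    }
}
```

Under these assumptions the five methods form ten pairs. Six pairs combine an a-method with a b-method and share nothing (P = 6), while four pairs share a field (Q = 4), so LCOM = 6 - 4 = 2. If every method used both fields, P would be 0 and the score would drop to 0.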
The classes that usually have cohesion problems are utility classes (or service classes, but that is a different topic, since what a service class is often differs per framework). LCOM is a useful tool for guiding certain refactorings, but like any metric it is not perfect: it can find a structural lack of cohesion, but not a logical one. It cannot determine whether methods logically fit together in the same class.