Blog

My first experiment writing code in Scala

(This post was written on Nov 17, 2016. For some reason I forgot to publish it at that time).

For the past few days, I have been assigned a small programming task which should be done in Scala. This is part of a hiring process, and I have been asked to implement the server side of a tic-tac-toe game in Scala, providing a REST API.

This is the first time I have ever written code in Scala. So although I am not a professional Scala developer yet, I thought I should write about my experience here.

The Good

I really like the language. It is simple and straightforward. The principle of least astonishment is respected, and I had very few surprises when I wanted to see how something is done, which is really good.

One of the positive points is the fact that there is no separation between primitive and object-based data types. So we have “Int” and not “int” + “Integer” (like Java). This helps a lot.

I also liked the way classes are defined: like a function which accepts a set of inputs:

class Game(val player1: String, val player2: String) { ... }

Also, the `var` and `val` separation really encourages the developer to make a conscious decision about the immutability of the data they are working with. It makes you think in a new way.

In Scala, you can use Java libraries easily without any pain. That gives you access to thousands of valuable frameworks and modules.

The Bad

Well, this section is mostly personal preference.

In Scala (unlike C, Java and similar languages), you have to follow the “variable_name: type” rule, which is not natural for me, considering the fact that I have worked for many years with C-like languages which use the “type variable_name” notation.

The “=” notation used when defining a function that returns a value does not seem natural to me. Usually, you use “=” when you want to assign a value to a variable, but functions are not variables!

Same for the “<-” notation which is used to iterate over a collection.

I also noticed there are no “break/continue” keywords in Scala. Well, it is possible to “simulate” them using a set of built-in helpers and exceptions, but seriously, exceptions were not invented to break out of a for loop! You should not use them to control program flow.

In defense of deterministic password managers

tldr; Although they have some shortcomings, deterministic password managers are a great idea with a lot of benefits!

Today I read this post, which was trying to convince people to switch to stateful password managers that store all passwords in some central data storage, locally or in the cloud. I have to admit that the article had some valid points, but overall it did a poor job of convincing me that deterministic password managers are bad.

First, let me explain what a non-deterministic (stateful) password manager is.

Normally, when you register on a website, you use a password manager to keep track of all of your passwords. So after signing up for a new email account (e.g. mahdix@gmail.com), you store the email address and its password (which is chosen by you) in the password manager’s database.

[Image: How a password manager works. Taken from the Zoho.com website]
Obviously, in this scheme, the central database becomes the most important component and needs extra protection. Where will this database be stored? Either on the service provider’s servers, which are out of your control, or locally on your laptop or mobile phone. So if there is a data breach or your mobile phone is infected with a virus, your passwords can end up in the wrong hands! Of course, there are a number of security measures to protect the confidentiality of this data, but as long as you have no control over it, there is a risk.

What is the solution? There has recently been a new way to manage passwords which I call deterministic password management (or DPM for short). This means that there is no central password database. Everything is inside your head! There is no need for a cloud-based server to store your passwords. Generally, you have a basic passphrase which I call the “master password”. All other passwords are generated uniquely using a formula based on the “master password” and some other public information. As a result, you don’t need to memorize 100 passwords or save them anywhere! Just memorize the master password and the rest can be automated.

For example, for the website “gmail.com” and username “mahdix”, the password will be calculated using a formula like this:

password = Hash(master_password, 'gmail.com', 'mahdix')

[Image: password managers with no storage]

The most important advantage of this scheme is that there is no need to store sensitive data anywhere! I would like to stress the term “sensitive” because I will refer to this feature later. The other advantage is that your passwords will all be strong, because they are generated using a hash function, which makes the result semi-random. There will no longer be passwords like “letmein” or “admin” in this scheme.
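
To make this idea concrete, here is a minimal sketch in Java of how such a generator could work. The names and the use of a single SHA-256 pass are my own illustration, not the algorithm of any particular tool (real DPMs use slow key-derivation functions and more careful encoding):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

public class DeterministicPasswordSketch {
    // Derive a site password from the master password plus public information.
    public static String generate(String masterPassword, String site, String username) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] hash = digest.digest((masterPassword + "|" + site + "|" + username)
                .getBytes(StandardCharsets.UTF_8));
        // Encode the hash and truncate it so the result can be typed as a password
        return Base64.getEncoder().encodeToString(hash).substring(0, 16);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(generate("my-master-password", "gmail.com", "mahdix"));
    }
}

The same master password and the same public inputs always produce the same site password, which is exactly why nothing needs to be stored.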

What are the flaws of Deterministic Password Managers?

Now, back to the initial point! The blog post discusses some reasons why this type of password management is not secure and, as the author claims, “has fatal flaws”.

There are four points discussed in the article:

  1. DPMs cannot accommodate varying password policies without keeping state: I disagree with this one. As long as the hash function is complex enough (which it normally is), the generated password can conform to every sane password policy. The examples the author provides, “password must be at least 8 characters long” or “password must be lowercase only”, either are satisfied by default or don’t make sense! I have never seen a website which insists a password must be lowercase.
  2. DPMs cannot handle revocation of exposed passwords without keeping state: I agree with this one, but see the point below.
  3. DPMs cannot store existing secrets: Agreed, but it can be fixed without losing security. See the point below.
  4. An exposed master password alone exposes all of your site passwords: I disagree with this point because it is not an issue of DPMs per se. If your email account’s password is exposed, it will probably have a similar effect, since most sites let you reset passwords via email. This is a natural side effect of using email.

State vs Sensitive data

I would like to differentiate between the state of a password manager and sensitive data (e.g. passwords). Of course, for a traditional password manager like 1Password, these are the same: the state of the password manager is the collection of stored passwords, so it is sensitive data by definition.

But for DPMs this is not the case. You can have some basic and minimal state which is not sensitive. What does that mean? It means that this state can be publicly exposed and available without losing the security of any of your accounts. The only important security property of state in this sense is “integrity”: it should not be altered without your consent. If this property is satisfied, then there is no problem for a DPM to have publicly available state.

For example, you can store this state in your Google Drive or Dropbox or even on GitHub! If you want to maintain a higher level of security, you can encrypt that information using the “master password”, but that is not required.

Now, how can using state address the two problems stated above: Password revocation and storing existing secrets?

How to store existing secrets in a DPM?

Suppose that you already have an account with password EP. Now you want to switch to a DPM. Can you keep this password? It is possible using State.

Suppose that the DPM expects the password of this account to be CP, where CP is the result of the hash formula of the system. It is enough to store “XP = CP ^ EP” in the State storage of the system (^ represents the XOR operation). Now, it is impossible to recover any information about “EP” by having only “XP”. But the user can easily calculate “EP” by computing “CP” and then “EP = CP ^ XP” (you can use any other function with a similar property).
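
Here is a small Java sketch of the XOR trick (the passwords are hypothetical values; the only assumption is that CP and EP are handled as byte sequences of equal length):

import java.nio.charset.StandardCharsets;

public class ExistingPasswordSketch {
    // XP = CP ^ EP can be stored publicly; EP is recovered later as EP = CP ^ XP.
    static byte[] xor(byte[] a, byte[] b) {
        byte[] out = new byte[Math.min(a.length, b.length)];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) (a[i] ^ b[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] cp = "computed-password".getBytes(StandardCharsets.UTF_8); // CP: output of the hash formula
        byte[] ep = "my-existing-pass!".getBytes(StandardCharsets.UTF_8); // EP: pre-existing password
        byte[] xp = xor(cp, ep);        // XP: safe to keep in the public State storage
        byte[] recovered = xor(cp, xp); // gives back EP whenever it is needed
        System.out.println(new String(recovered, StandardCharsets.UTF_8));
    }
}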

Using this method, a user can store an existing password without revealing any sensitive information. The same approach can be used to renew a password: just keep a public parameter (like a counter) in the State data. It won’t expose anything sensitive and can be used to renew the password. Just increment the counter and the password will be:

Password = Hash(master_password, website, user_id, counter)

This can easily be incorporated into any DPM and provide the ability to renew passwords or add pre-existing ones.

Of course, there will always be a trade-off. So alongside the above-mentioned advantages, there are some drawbacks too. Most notably, if your master password is exposed or stolen, all of your accounts will be at risk. This is the natural consequence of using a “master password”.

Why do I prefer statically typed programming languages?

tldr;  Dynamic typing sucks!

I have been working with a dynamically typed programming language, Perl, for the past 1.5 years. I have also worked with Java and C# during my career, and at this point I can really say that static typing makes the life of developers much easier.

So let me first explain the difference between these two terms and then I will talk about my reasons.

Dynamic Typing

Dynamically typed programming languages are languages where the type of the data is not specified in the source code but is determined when the program is executed. So the developer is not responsible for thinking about whether some variable should hold a number, a string or an instance of a DateTime object. They just use a variable name. When the code is executed, the runtime environment (interpreter, JIT compiler, …) will deduce the type which is most appropriate for that variable.

For example, the developer can write something like this, in Perl:

my $birth_date = Some::Module::get_birth_date();
$birth_date->plus_time_interval('18y');
my $day_of_week = $birth_date->day_of_week;

In the above code, the types of `$birth_date` and `$day_of_week` are not explicitly specified. When someone reads this piece of code, it clearly implies that `$day_of_week` should be an integer number, but in a large codebase this is not always obvious from looking at the code. So, for example, you really don’t know the type of `$birth_date` unless you read the source code of the `get_birth_date` function, and you can’t know whether `plus_time_interval` expects an integer, a string or something else unless you have used that function before.

Now assume someone is responsible for maintaining or fixing a bug in a 100K SLOC codebase. When investigating the flow of execution of the code, it sometimes becomes really hard and complicated to find out the real type of a variable.

Examples of dynamically typed programming languages are Perl, Python, Ruby and JavaScript.

Static typing

On the other hand, in statically typed programming languages, the developer is responsible for explicitly stating the expected type of all variables and function outputs. This limits the scope of valid operations on a variable and helps the compiler (or possibly the interpreter) check for invalid operations. For example, when the developer defines an integer variable, they cannot treat that variable as a string or assign a string literal to it, because the compiler will catch such errors. An example of statically typed code:

int x = 12;
int y = x + 5;
Customer c = find_customer("id", y);
c.name = "Mahdi";
c.save();

In the above code, the types of the variables `x` and `y` and the input/output of the function `find_customer` are all specified statically within the source code. You cannot pass an integer as the first argument of `find_customer` and compile the code, because you will get a compiler error.

So writing statically typed code seems harder, because the developer should write more code and keep track of variable types in his mind while writing it. Right? Yes! That is right if and only if you want to write a few lines of code, keep them for a relatively short period of time and ignore all of it afterward. But if you want to maintain a relatively large code base over a longer period of time (which is the case most of the time in the software industry), this burden proves to have some benefits.

Examples of statically typed programming languages include C, C++, Java and C#.

Why static typing?

As I pointed out in the previous section, unless you want to write a few lines of code, keep it for a short period of time and discard everything afterward, statically typed programming languages have a clear advantage over dynamically typed ones.

The point is, the source code of a piece of software is written once, but it will be read many times (for review, bug fixes, refactoring, migrations, optimizations, …). So the basic idea in static typing is that you write some more code and take care of some more things while writing, and in exchange, you (or others) will have a much easier life when reading it. Another clear benefit is that the compiler can help you find lots of possible errors before releasing your code to the production environment.

I have had numerous cases of the `Can't locate object method 'xyz' via package "ABC::DEF"` error on a production server (because in the code, someone was trying to call a method on a variable which was expected to be of type T, but at runtime it had another type). That is because we, human beings, make mistakes. And computers are there to help us prevent those mistakes. If we just ignore their help (and use dynamic typing), then this is what happens next :-)

But don’t get me wrong. There are cases where dynamic typing is the preferred choice when writing a software system. Generally speaking, if the code base is relatively small (say 1-10K SLOC) with a small development team (~5 people), then dynamic typing has its benefits: developers can write the code much faster and the time-to-market will be much shorter. This is what normally happens in startups, where founders want to deliver the product/prototype as soon as possible.

I have seen some people say that we can achieve the same advantages of static typing in a dynamically typed programming language by writing good tests. This is possible in theory, but if you implement all of the test cases required for this purpose, then you will be doing what the computer is supposed to do: manually doing something which can be automated. And by the way, the resulting code base will become much larger because of all those tests, which eliminates one of the important advantages of dynamic typing: more concise code with less boilerplate.

What is wrong with Object Oriented Programming and what is right with Functional Programming?

I have been reading numerous articles about OOP and FP and comparisons of the two recently. There are lots of differences and pros/cons for each of them, but one important and interesting thing I recently realized is how tangled behavior and data are in OOP. I have been writing OO code for years, but I had not noticed this until I read about FP and the way it handles data and behavior.

When you write software in an OO approach, you must attach every behavior to a single, isolated data entity (or class) in the business domain of the application. For example, when writing code to send email, you have a data entity called “Email Message” and a behavior “Send a message”. There is no way for you to define these two separately. By separately, I mean being able to invoke the “Send Email Message” functionality without an “Email Message” object. Of course, you need a message when you want to send one, but this is a difference in the way we organize and approach things in OOP. When writing software in an OO approach, you MUST define each behavior under one and exactly one class.

The question is, what if we cannot fit a behavior into this pattern? What if it doesn’t make sense, or it is confusing, to attach a behavior to one and only one class?

Considering the above example about sending an email message, what happens if we have other entities in the application, like Sender, Recipient and Storage? Then where should I put the “Send email message” functionality? Each one of the below candidates could be an acceptable place for this behavior:

  1. You can put “Send” method in “Message” class: msg1.Send();
  2. It can also be placed inside Recipient class: recipient.ReceiveMessage(msg1);
  3. It can also be part of Sender class: sender1.SendMessage(msg1);
  4. It can be part of a separate class. For example MessageBroker: MessageBroker.getInstance().SendMessage(msg1);

This is really confusing and IMHO unneeded complexity.

The advantage of the Functional Programming approach is that you don’t have to bind each behavior to one class. They are completely separated, so you can define and use them separately. This type of modeling is more consistent with the way we think about software-world concepts.
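
As a rough sketch of the contrast (hypothetical names, written in Java for consistency with the rest of this blog), the data stays as plain data and the behavior lives in a free-standing function that simply takes what it needs:

// Data is just data: no behavior attached to it.
record EmailMessage(String subject, String body) {}
record Recipient(String address) {}

class Mailer {
    // The behavior is a standalone function (a static method is the closest thing
    // Java offers) that takes the data it needs, instead of having to belong to
    // Message, Sender or Recipient.
    static void send(EmailMessage msg, Recipient to) {
        System.out.println("Sending '" + msg.subject() + "' to " + to.address());
    }

    public static void main(String[] args) {
        Mailer.send(new EmailMessage("Hello", "Hi there"), new Recipient("someone@example.com"));
    }
}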

Of course, for things which exist in the physical world this is less of an issue (a car is the only object which can perform the “Start” functionality). But for more abstract concepts which can only be found in the software world, I think the FP approach makes more sense.

 

What is Apache Maven and why do we need it?

I was going to write a blog post about Spring framework but I thought maybe it’s better to start with Maven as the first step.

With the increase in the number of Java libraries, frameworks and projects, there was a need for a standard way of describing how a piece of software should be built and what its dependencies are. Normally a Java project uses a lot of frameworks and tools (e.g. logging, database, encryption, math, …). We need a standard (machine-readable) way of describing these dependencies and how to compile and build the project.

Using Maven enforces a specific directory structure on your project. There is a specific directory path for your source code, runtime resource files, and so on. Another change in your project is adding a file named pom.xml which describes dependencies and build options.

I will explain some basic concepts in the Maven world, then will proceed to create a new project based on Maven.

Artifact

An Artifact is a JAR file which is available for use in a Java project. Artifacts can be downloaded from a Maven repository (a publicly available website), but normally this is done automatically by the Maven command-line tool.

Each artifact represents a small piece of software that can be re-used in a software project. For example, there are artifacts for logging and encryption. There are a number of Maven repositories available, but the best-known repository is the one provided by Maven itself, which can be searched and browsed at http://search.maven.org/. You can search for artifacts and download them. You also have a local repository which contains JAR files downloaded from remote repositories or installed locally.

Each artifact is uniquely identified by using 3 elements: group-id, artifact-id and version.

Group-id represents the name of the company or organization which has produced the artifact (e.g. org.apache.logging.log4j). Artifact-id is a unique string for the artifact (e.g. log4j), and the version is a version string (e.g. 1.2.17). When you specify your project’s dependencies, you have to write their group-id, artifact-id and version.

You can also tell Maven to produce an artifact when compiling your project. The result will be a .jar file including metadata about the artifact. You can then install this jar file into a Maven repository and use it as a dependency in other projects.

Build lifecycle

A build lifecycle is a set of steps (phases) which Maven performs to build the project. You can define your own lifecycle, but the default lifecycle is used most of the time. Here is a list of the most important phases in the default lifecycle:

  1. validate: Make sure project structure is correct and all required data are available
  2. compile: Compile source code files
  3. test: Run unit test files
  4. package: Package compiled files in a distributable format (e.g. JAR)
  5. install: Install package into the local repository.

There is also another important lifecycle named “clean”. This lifecycle cleans up artifacts created during the build lifecycle.

You can call the Maven command-line tool and give it the name of the phase or lifecycle to execute:

mvn clean package

The above command will run the clean lifecycle and then run the default lifecycle up to the ‘package’ phase. The target of the operation is the project in the current directory.

Directory structure

Maven expects a standard project structure in your project. You have to follow this structure:

Suppose the root directory of the project is named ‘myapp’:

  • myapp/pom.xml
  • myapp/src/main/java/com/mahdix/App.java: Source code files should reside in this location and below, according to your package names.
  • myapp/src/test/java/com/mahdix/Test.java: Location for test files

You run the Maven command-line utility in the root directory of the project, where the pom.xml file is located.

POM.xml

Here is the general structure of a pom.xml file:

<project xmlns="http://maven.apache.org/POM/4.0.0">
 <modelVersion>4.0.0</modelVersion>
 
 <groupId>com.mahdix</groupId>
 <artifactId>my-app</artifactId>
 <version>1.0</version>
 <packaging>jar</packaging>
 
 <name>Test Application</name>
 <url>http://www.mahdix.com</url>
 
 <dependencies>
   <dependency>
     <groupId>junit</groupId>
     <artifactId>junit</artifactId>
     <version>4.8.2</version>
   </dependency>
</dependencies>
</project>

Here is an explanation of the important parts:

  • modelVersion: Specifies the model version of the Maven tool we use; this should always be 4.0.0
  • groupId, artifactId, version: These tags specify information about the current project
  • dependencies: This tag lists the dependencies of the project. Maven will automatically fetch, download and install the required JAR files according to the dependency list provided. You just need to specify the artifacts that you need.

 

Tutorial: Java Persistence API (JPA)

 

The Java Persistence API, or JPA, is a standard specification and a set of classes which helps developers write a provider-independent persistence layer in their applications. JPA has taken the best ideas from different persistence providers (e.g. Hibernate, TopLink, etc.).

The general workflow for JPA is:

  1. Define POJO classes corresponding to database entities. You annotate these classes to let JPA find them at runtime.
  2. Add a persistence.xml file which describes your POJO classes and database connection information.
  3. Write the code to work with database entities (CRUD operations) using JPA classes and JPQL language (which is similar to SQL).

Defining Entities

Entity classes are simple Java classes. These classes don’t need to inherit from a specific class. All you need to do is annotate the class, add relevant fields representing database table columns, and add getter/setter methods.

You can also annotate fields to indicate primary key or give some more information about database representation of the field.

Below is an example of a student entity class:

import javax.persistence.*;

@Entity
@Table(name = "TBL_STUDENT")
public class Student {
    @Id
    @Column(name="ST_ID", nullable=false)
    @GeneratedValue
    private long id;

    @Column(name="FIRST_NAME")
    private String firstName;

    @Column(name="LAST_NAME")
    private String lastName;

    @Column(name="AVG_SCORE")
    private float avgScore;

    //getter and setter for fields go here
}

Note that we use Java annotations to represent metadata explaining more information about the entity.

  • @Column annotation is used to define the database column name for the class field. You can omit this if the names are the same.
  • @Entity is used to declare this class as a DB entity (and @Table maps it to a specific table name).
  • @Id is used to mark the primary key of the entity.
  • @GeneratedValue is used to declare that the JPA provider should automatically generate a value for this field upon insertion.

JPA has a lot of other annotations to explain the type of data that can be stored in a field, table inheritance, composite primary key, entity relationships and much more. These will be explained in a future post.

Adding persistence.xml

In this XML file, you declare some metadata which cannot be written in the code easily. This is a sample XML file:

<?xml version="1.0" encoding="UTF-8"?>
<persistence>
 <persistence-unit name="mytest" transaction-type="RESOURCE_LOCAL">
  <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
  <class>test.Student</class>
  <properties>
  <property name="javax.persistence.jdbc.driver" value="org.sqlite.JDBC" />
  <property name="javax.persistence.jdbc.url" value="jdbc:sqlite:sample.db" />
  <property name="eclipselink.ddl-generation" value="drop-and-create-tables" />
  <property name="eclipselink.ddl-generation.output-mode" value="database" />
  </properties>
 </persistence-unit>
</persistence>

The XML file structure is almost the same across different projects. Two parts of the file are particularly important:

  • The name attribute of the persistence-unit is a unique string which is used in the code to refer to this XML file.
  • The provider tag indicates which provider will be used at runtime to talk to the database.

The properties section of the XML file contains database connection information (e.g. database driver type, database name, …).

The code

The last step is using JPA classes to get a connection to the database and working with your POJO classes and EntityManager to do CRUD operations.

The basic flow is:

  1. Getting an instance of EntityManagerFactory class through Persistence class. Here you specify the persistence-unit name from XML file.
  2. Getting an instance of EntityManager class through EntityManagerFactory.
  3. Working with the EntityManager instance and its EntityTransaction instance to create, read, update or delete POJO classes.

Here is the basic code:

//step 1
EntityManagerFactory factory = Persistence.createEntityManagerFactory("mytest");
//step 2
EntityManager entityManager = factory.createEntityManager();
//step 3
EntityTransaction txn = entityManager.getTransaction();
txn.begin();

//step 4 - CRUD operations
...

//step 5 - finalization
txn.commit(); //you need this line if you have changed some db data
entityManager.close();
factory.close();

CRUD operations

The CRUD operations become pretty straightforward considering the fact that you just need to work with your POJO classes as a proxy to the database table rows.

  • Create: You instantiate a POJO class, set field values and call the persist method of EntityManager.
  • Read: You create a JPQL query and get the result as a List<POJOClass>.
  • Find: Call the find method on EntityManager with the primary key value of the row. The result will be an instance of the POJO class.
  • Update: First find the row, then update the POJO class fields.
  • Delete: Find the row, then call the EntityManager.remove method.

Below is sample code for CRUD operations:

//CREATE a new row
Student newStudent = new Student();
newStudent.setFirstName("mahdi");

entityManager.persist(newStudent);

//UPDATE a row
Student existing = entityManager.find(Student.class, 19L);
existing.setLastName("Mohammadinasab");

//DELETE a row
Student toDelete = entityManager.find(Student.class, 19L);
entityManager.remove(toDelete);

//READ rows
Query q = entityManager.createQuery("SELECT s FROM Student s");
List<Student> students = q.getResultList();

Note that in the update process you don’t need to call any JPA method. Just update the POJO fields, and when you commit the transaction, the data will be updated.

For the query which reads rows from a database, JPQL language is used. I will explain more about JPQL syntax and how to handle relationships (one-to-many and many-to-many) in a future post.

 

Cassandra Internals: Bloom filters

In a previous post I explained SSTables and their role in the persistent storage of data in Cassandra.

Suppose that the following query is received on a node (news_id is the primary key):

SELECT * FROM tblNewsItems WHERE news_id = '8bacfe891fa89bfab98d9e99f9a9';

How can the node determine whether this specific row exists in its local data? As we know, we have in-memory tables (memtables), but they are used as a caching mechanism and not the real storage. So a lookup in these in-memory structures won’t be enough, as they don’t hold the whole range of data.

Another solution is to read the SSTable and check whether it contains the key that we want. Even using effective search algorithms like binary search, this will be too expensive, because it involves a lot of disk reads, which may not be efficient enough for a system with a lot of data and a lot of read queries.

The solution is to use a Bloom filter data structure. This is an in-memory, probabilistic data structure which can tell you whether an element is a member of a set or not. Here the set is the Cassandra table (which is actually a set of rows) and the element is the primary key we are looking for. So we have these features:

  1. It is in-memory, so it will be fast and efficient.
  2. It is probabilistic, meaning the result is not guaranteed to be correct. There is a high chance that the result is correct, and this is enough for us. This is the trade-off we pay to have such an efficient in-memory data structure. As a result, if the bloom filter says a row is inside a set, there is a small probability that this is not correct. But the other side of the result is guaranteed to be true: if it says some row is NOT in the set, it definitely is not.

How a bloom filter works

A bloom filter is a bit-map containing n bits and a set of hash functions which are expected to be independent. Let’s assume we have a bloom filter with 100 bits and 3 hash functions (f1, f2 and f3). In practice the size of a bloom filter will be much larger (MBs or GBs). Each of the hash functions receives an arbitrary input (a row’s primary key) and outputs a number between 1 and n. You can easily achieve this output by using the modulo operator with common hash functions.

There are two algorithms: one for inserting data into a bloom filter and one for checking whether something exists in the set or not. These algorithms are pretty simple and straightforward (a small code sketch follows the two lists below). To insert data x into the bloom filter:

  1. Apply hash functions on x. So we will have a=f1(x), b=f2(x) and c=f3(x).
  2. Set bits numbered a,b and c in the bloom filter to one (Note that they may have already been set to one but this doesn’t matter).

To check whether data y exists in the bloom filter:

  1. Apply hash functions on y: So we will have d=f1(y), e=f2(y), f=f3(y).
  2. Check bits numbered d, e and f in the bit-map of the bloom filter. Are they all set to one? If so, the row probably exists in the set. If even one of those bits is zero, the row definitely does not exist in the set.
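
Here is a minimal Java sketch combining both algorithms. The seeded hash functions are deliberately simplistic and not truly independent, and this is not Cassandra's actual implementation, just an illustration of the idea:

import java.util.BitSet;

public class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int[] seeds = {7, 11, 13}; // one seed per "hash function"

    public SimpleBloomFilter(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    // Derive an index in [0, size) from the key and a seed
    private int index(String key, int seed) {
        return Math.floorMod(key.hashCode() * seed + seed, size);
    }

    public void add(String key) {
        for (int seed : seeds) {
            bits.set(index(key, seed)); // set bits a, b and c
        }
    }

    public boolean mightContain(String key) {
        for (int seed : seeds) {
            if (!bits.get(index(key, seed))) {
                return false; // definitely not in the set
            }
        }
        return true; // probably in the set
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(100);
        filter.add("8bacfe891fa89bfab98d9e99f9a9");
        System.out.println(filter.mightContain("8bacfe891fa89bfab98d9e99f9a9")); // true
        System.out.println(filter.mightContain("some-other-key"));               // almost certainly false
    }
}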

How are false positives handled?

A false positive for a bloom filter means a case where the filter indicates that a row exists in the table but it actually does not. Remember that this is a probabilistic data structure, so these cases may happen.

If there is a false positive, we won’t know about it until we scan the SSTable. So generally, if the bloom filter indicates the row exists in the table, Cassandra will scan the SSTable looking for that specific row. If, after the scan completes, the row is not found, this will be recorded as a false positive.

You can run the ‘nodetool cfstats’ command on a Cassandra node to view a lot of statistics about the node. Some of those stats are for bloom filters and show you the memory consumed by bloom filters and the number of false positives.

 

Cassandra Consistency Levels

In this post, I will explain more about data consistency in Cassandra. In a non-distributed system, it rarely happens that some data is missing or corrupt. But in a large scale distributed database system, this will be the norm rather than the exception. So there should be mechanisms to detect, handle and fix these situations. Customizable consistency levels are one of those mechanisms.

Write Consistency

In the last post, I explained what happens when data is written in Cassandra. But how does Cassandra determine whether a write operation is successful or not?

Each write operation can specify the consistency level it needs. This is a number which determines how many replicas have to send a success reply to the coordinator for the whole operation to be considered successful.

If for example this is set to ONE, with a replication factor of 3, the coordinator will ask three nodes to store the data and will return a success status upon receiving the first success response from any of these three nodes. This, in the worst case, may mean that the other two nodes have failed to write the data. This is not something which happens normally but in a large system, with a lot of nodes and a lot of data flying around, things may go wrong.

We can use the consistency level to adjust the trade-off between performance (lower consistency level = faster response time) and reliability (higher consistency level = protection against lost or corrupt writes).

Read Consistency

Just as with write operations, we can specify a consistency level when reading data. When data is written in Cassandra, it is written to more than one node (refer to this and this post). Now, when we want to read the data back, how should we proceed? Which of those nodes should be contacted, and what if some of the contacted nodes don’t return a response?

The read consistency level determines how the coordinator node should respond to cases where some of the nodes don’t reply to a READ request or reply too late. For example, if the read consistency is set to ALL, this means that the coordinator should wait to get a response from all replicas. This provides the highest level of reliability but the lowest performance. You can set it to TWO or THREE so the coordinator will wait for the two or three nearest replicas to return a response.
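
As an illustration, with the DataStax Java driver (this assumes the 3.x API; the keyspace, table columns and values are hypothetical), the consistency level can be chosen per statement:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class ConsistencyLevelExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("news_keyspace"); // hypothetical keyspace

        // Read: accept the first replica's answer (fast, weaker guarantee)
        SimpleStatement select = new SimpleStatement(
                "SELECT * FROM tblNewsItems WHERE news_id = '8bacfe891fa89bfab98d9e99f9a9'");
        select.setConsistencyLevel(ConsistencyLevel.ONE);
        session.execute(select);

        // Write: wait for a majority of replicas (slower, stronger guarantee)
        SimpleStatement insert = new SimpleStatement(
                "INSERT INTO tblNewsItems (news_id, title) VALUES ('8bacfe891fa89bfab98d9e99f9a9', 'hello')");
        insert.setConsistencyLevel(ConsistencyLevel.QUORUM);
        session.execute(insert);

        cluster.close();
    }
}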

Example

The figure above shows a cluster with 12 nodes and a replication factor of 3. This means that each row of data will be written to 3 nodes (R1, R2 and R3 in the figure). When a client asks the coordinator (node 10) to read data with a consistency level of ONE, it contacts the nearest replica (R1) for the data. In the background, it will make sure R2 and R3 have the most recent data and, if not, a read repair operation will be initiated.

Handling Inconsistency

If a coordinator gets different data from different replicas, which one should it pick? The answer is that Cassandra timestamps all data, so the coordinator can easily determine the most recent version, which will be returned to the client. After this happens, the coordinator also starts a read repair process in the background to make sure all replicas have up-to-date data.

 

How Cassandra Writes Data – Part 2

This is the second part in a two-part series about the internals of Apache Cassandra for writing data. Click here to read the first part.

Flow of execution

The process begins with a client sending a request containing an INSERT statement to the coordinator. Let’s assume we have a table named table1 with the definition below:

CREATE TABLE table1 (id int, name text, PRIMARY KEY (id));

Also assume the cluster consists of five nodes, which we will call node1, ..., node5, and the replication factor is 3. The cluster has a partitioner algorithm which, given the primary key of the row, outputs a big random-looking number. We call this number the identifier of the row. We assume an identifier is a 128-bit number, which means it will be between 0 and 2^128-1 (max number). Now consider this as a range (0, max). Upon cluster configuration, this range is divided into five equal sub-ranges (because the cluster has five nodes):

R1=(0, max/5), R2=(max/5+1, 2*max/5), ... .

Each node will have its own range. R1 will be assigned to node1, R2 to node2 etc.

All nodes in the cluster know about other nodes and their corresponding range.

Coordinator receives this CQL statement:

INSERT INTO table1(id, name) VALUES (100, 'mahdi');

First, it applies the partitioner algorithm on ‘100’ (the primary key). The result will be a number. It then determines the corresponding range (Ri, i=1,...,5) within which the number lies. Let’s assume the number lies within the R2 range, so node2 will receive the first copy of the row. But we have a replication factor of 3, which means the data needs to be stored on three different nodes. With the simple replication strategy, the additional nodes are the next nodes after the original receiver of the data, which are node3 and node4 in our example. So the coordinator will send 3 requests to node2, node3 and node4 to store the values for the new row in their local data (there is another, more complex strategy called NetworkTopologyStrategy; see here for more information).

Note that node1 and node5 know nothing about the new row, but the cluster as a whole contains the new data, and you can later query this data from any of the nodes in the cluster.
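
Here is a simplified Java sketch of the idea described above: hash the primary key into a 128-bit identifier, map it to one of the five ranges, and pick the next nodes on the ring as extra replicas. Real Cassandra partitioners and replication strategies are more involved; this only illustrates the mechanism:

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class SimplePartitioner {
    static final int NODES = 5;
    static final int REPLICATION_FACTOR = 3;

    // A stand-in for the partitioner: hash the primary key into a 128-bit number
    static BigInteger identifier(String primaryKey) throws Exception {
        byte[] hash = MessageDigest.getInstance("MD5")
                .digest(primaryKey.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(1, hash); // value in [0, 2^128 - 1]
    }

    public static void main(String[] args) throws Exception {
        BigInteger max = BigInteger.ONE.shiftLeft(128); // 2^128
        BigInteger id = identifier("100");              // primary key of the new row

        // Which of the five equal sub-ranges does the identifier fall into?
        int owner = id.multiply(BigInteger.valueOf(NODES)).divide(max).intValue();

        // With the simple strategy, replicas are the owner plus the next nodes on the ring
        for (int i = 0; i < REPLICATION_FACTOR; i++) {
            System.out.println("replica on node" + ((owner + i) % NODES + 1));
        }
    }
}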

SSTable, commit log and memtable

In a node which belongs to a Cassandra Cluster, there are 3 components which are used to store data:

  • SSTable: The storage of the database on persistent disk (e.g. hard disk). When data is written to this storage it is permanently persisted, but the problem is that writing to this storage is expensive. That’s why we need the other two components.
  • Memtable: For each table in the database, there is a memory space allocated to store its data. It is extremely fast because it’s in memory, but it is not reliable, because in case of a problem on the node, all its data will be lost.
  • Commit log: This is a persistent, append-only file written to local storage which contains a copy of all the actions applied on the database. This data can be used to recover writes that had not yet been flushed to an SSTable in case of a problem on the node.

When data is written to the database, it is appended to the commit log and written to the memtable. After that, a successful response is sent to the requester indicating the write operation completed successfully. Note that the data is not yet written to an SSTable, but it is on persistent storage (the commit log), so it is safe. Periodically, the node flushes memtables to SSTables, which writes all updates to the SSTable (the final permanent storage for the data).
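
A toy sketch of this write path (a hypothetical class, not Cassandra code) could look like this:

import java.io.FileWriter;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

public class WritePathSketch {
    // Memtable: fast in-memory copy of the data, lost if the node crashes
    private final Map<String, String> memtable = new TreeMap<>();
    // Commit log: append-only file on disk, used to recover unflushed writes after a crash
    private final FileWriter commitLog;

    public WritePathSketch(String commitLogPath) throws IOException {
        this.commitLog = new FileWriter(commitLogPath, true); // append mode
    }

    public void write(String key, String value) throws IOException {
        commitLog.write(key + "=" + value + "\n"); // durable record of the mutation
        commitLog.flush();
        memtable.put(key, value);                  // fast in-memory copy
        // A real node acknowledges the client here and flushes the memtable
        // to an SSTable on disk later, in the background.
    }
}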