Unit & Integration Testing in Cassandra

November 5, 2014

Cassandra is a distributed database, so it is inherent in its nature that you are going to have more than one node in your cluster. When we run unit tests against MySQL there are a few options available, and one of them is to use an embedded MySQL server, which is probably the simplest way to gain a high level of isolation in the tests. But with Cassandra things are a bit different. The most common options you will find for adding unit or integration tests to a project that uses Cassandra are:

  1. Use mocking libraries.
  2. Use an embedded Cassandra server.
  3. Use a real Cassandra cluster.

I will briefly explain each option and how we apply it in our projects.

Mocking Libraries

Personally, I don't like mocking database drivers in general, because it adds complexity to your code and is a common source of errors. In the specific case of Cassandra, if you are using the DataStax Java driver, at some point you have to mock the ResultSet object returned by the execute method:

ResultSet results = session.execute("SELECT * FROM example WHERE id = 2cc9ccb7-6221-4ccb-8387-f22b6a1b354d;");

The problem here is that the ResultSet class does not implement an interface and has a private constructor, so you cannot extend it and override its methods to create a different implementation. Mocking this type of object is not straightforward, but it is still possible. If you are interested, there is a nice post by Christopher Baley which explains in detail how to do it. Even if you are not going to use this type of testing with Cassandra, it is a general technique that could be useful in other situations.

Our development team is not very big, and because of that (and other factors, such as the level of error our clients can tolerate weighed against the cost of implementing and maintaining the tests) we often do not include specific unit tests for the database repository layer. Instead, we only include tests for our service layer, and obviously some of the operations in the service layer involve requesting or updating data in the database. In this way the repository layer is tested indirectly through integration tests, so mocking the driver is not a good and reliable solution in our case.

UPDATE: since version 2.0 of the DataStax Java driver was released (February 2014), some classes have been exposed as interfaces in order to facilitate testing, and one of them is ResultSet (see the news about DataStax driver 2.0).
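To illustrate why an interface makes this easier, here is a minimal sketch using only the JDK. The names `Rows`, `QueryRunner`, `ExampleRepository` and `FakeRows` are hypothetical stand-ins invented for this example; they are not the DataStax driver API. The point is only that code depending on an interface can be tested with a small hand-written double, without any mocking tricks.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a driver result type (NOT the DataStax API).
interface Rows extends Iterable<String> {
    boolean isExhausted();
}

// Hypothetical stand-in for a driver session (NOT the DataStax API).
interface QueryRunner {
    Rows execute(String cql);
}

// Code under test: it depends only on the two interfaces above.
final class ExampleRepository {
    private final QueryRunner runner;

    ExampleRepository(QueryRunner runner) {
        this.runner = runner;
    }

    // Counts the rows returned for the given id.
    int countRows(String id) {
        int count = 0;
        for (String ignored : runner.execute("SELECT * FROM example WHERE id = " + id)) {
            count++;
        }
        return count;
    }
}

// Hand-written test double backed by an in-memory list.
final class FakeRows implements Rows {
    private final List<String> rows;

    FakeRows(String... rows) {
        this.rows = Arrays.asList(rows);
    }

    @Override
    public Iterator<String> iterator() {
        return rows.iterator();
    }

    @Override
    public boolean isExhausted() {
        return rows.isEmpty();
    }
}
```

In a test, the repository can then be built with a lambda such as `cql -> new FakeRows("a", "b")` in place of a real session.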

Embedded Cassandra Server

Another option is to use an embedded Cassandra server which is created and deleted on each test execution. When we started writing unit tests for our code that accesses Cassandra, the best-known and most common option was the Cassandra Unit project created by Jeremy Sevellec. It is conceptually similar to DBUnit for relational databases.

After using Cassandra Unit, I have to say that it is an excellent project. It allows you to start an embedded Cassandra server per unit test and load data from CQL files, which makes pre-populating the data really easy.

A very nice feature of the project is its integration with Spring, which allows you to use Cassandra Unit together with the Spring TestContext Framework. This is the base class we use in our projects to create the unit test cases that need access to Cassandra:

@RunWith(SpringJUnit4ClassRunner.class)
@TestExecutionListeners({ CassandraUnitTestExecutionListener.class })
@EmbeddedCassandra
@ContextConfiguration(classes = { ValidationConfig.class })
public abstract class BaseTransactionalSpringContextTestLegacy extends AbstractTransactionalJUnit4SpringContextTests {

    // Placeholder test method
    @Test
    public void xxx_xxx() {
    }
}

The @EmbeddedCassandra annotation starts the server. It is possible to specify a CQL file to load using the @CassandraDataSet annotation, but we did not use that option because at the beginning we were using column families with dynamic column names, something that you cannot use with CQL. Instead, we use an XML file located in the classpath to load the data. You can see an extract of the file here:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<keyspace xmlns="http://xml.dataset.cassandraunit.org">
<name>snpaware_test</name>
<columnFamilies>
<columnFamily>
<name>IndividualMorphsIndividuals</name>
<keyType>UTF8Type</keyType>
<comparatorType>UTF8Type</comparatorType>
<defaultColumnValueType>UTF8Type</defaultColumnValueType>
<gcGraceSeconds>0</gcGraceSeconds>
<readRepairChance>0</readRepairChance>
<row>
<key>1:2</key>
<column>
<name>1:1</name>
<value>0.75:1:15:2:1:0:175</value>
</column>

But then we faced a problem. On the development workstations the tests ran correctly, but on our Jenkins server they did not. The problem was that when the tests started to run, the embedded Cassandra server was not yet completely started or the data was not completely loaded, and the tests then failed with a connection error. We solved this with the following code in the setup of our tests:

public void setUpTestCase() {
    try {
        // Load the data into the embedded server
        DataLoader dataLoader = new DataLoader("TestCluster", "127.0.0.1:9171");
        dataLoader.load(new ClassPathXmlDataSet("cassandra-test-dataset.xml"));
        // Wait until the data is loaded and the server is responding
        await().atMost(180, SECONDS).until(checkCassandraIsAlive());
    } catch (Exception e) {
        LOG.error("Error starting In Memory Cassandra Database", e);
    }
}
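The body of `checkCassandraIsAlive()` is not shown here. As a rough, JDK-only sketch of the same idea, the helper below polls a TCP port until it accepts connections or a timeout expires. The class `PortProbe` and its methods are hypothetical names for this example, and treating "alive" as "the port accepts a TCP connection" is an assumption: a real check would probably also run a query to confirm that the data has been loaded.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: waits for a server port to start accepting connections.
final class PortProbe {
    private PortProbe() {
    }

    // Returns true if a TCP connection to host:port succeeds within the connect timeout.
    static boolean isOpen(String host, int port, int connectTimeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Polls the port every pollMillis until it is open or timeoutMillis has elapsed.
    // Returns false on timeout or if the waiting thread is interrupted.
    static boolean awaitOpen(String host, int port, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (isOpen(host, port, 1000)) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

With the address used by the data loader, a call might look like `PortProbe.awaitOpen("127.0.0.1", 9171, 180_000, 500)`.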

The DataLoader class is included in the Cassandra Unit project. It is probably not the most elegant solution, but it works for us. Everything was fine while we were using Hector as the Java client library. But at some point we decided to migrate our client code to Astyanax instead of Hector. Cassandra Unit uses Hector internally, so we had to spend a few days playing with the Maven configuration to avoid conflicts between the two Java client libraries and also with the embedded Cassandra server of Cassandra Unit. This is the configuration that worked for us:

<!-- ================================================================= -->
<!-- Astyanax - Cassandra client -->
<!-- ================================================================= -->

<dependency>
<groupId>com.netflix.astyanax</groupId>
<artifactId>astyanax</artifactId>
<version>${astyanax.core.version}</version>
<exclusions>
<exclusion>
<artifactId>persistence-api</artifactId>
<groupId>javax.persistence</groupId>
</exclusion>
<exclusion>
<groupId>javax.servlet</groupId>
<artifactId>servlet-api</artifactId>
</exclusion>
<exclusion>
<artifactId>jetty</artifactId>
<groupId>org.mortbay.jetty</groupId>
</exclusion>
<exclusion>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
</exclusion>
<exclusion>
<artifactId>netty</artifactId>
<groupId>org.jboss.netty</groupId>
</exclusion>

</exclusions>
</dependency>

<!-- ================================================================= -->
<!-- Cassandra data unit -->
<!-- ================================================================= -->
<dependency>
<groupId>org.cassandraunit</groupId>
<artifactId>cassandra-unit</artifactId>
<version>2.0.2.1</version>
<scope>test</scope>
<exclusions>
<exclusion>
<groupId>com.googlecode.concurrentlinkedhashmap</groupId>
<artifactId>concurrentlinkedhashmap-lru</artifactId>
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>org.cassandraunit</groupId>
<artifactId>cassandra-unit-spring</artifactId>
<version>2.0.2.1</version>
<scope>test</scope>
</dependency>

<dependency>
<groupId>org.apache.cassandra</groupId>
<artifactId>cassandra-all</artifactId>
<version>1.2.0</version>
<scope>compile</scope>
<exclusions>
<exclusion>
<artifactId>persistence-api</artifactId>
<groupId>javax.persistence</groupId>
</exclusion>
<exclusion>
<groupId>javax.servlet</groupId>
<artifactId>servlet-api</artifactId>
</exclusion>
<exclusion>
<artifactId>jetty</artifactId>
<groupId>org.mortbay.jetty</groupId>
</exclusion>
<exclusion>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
</exclusion>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.cassandra</groupId>
<artifactId>cassandra-thrift</artifactId>
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>com.github.stephenc</groupId>
<artifactId>jamm</artifactId>
<version>0.2.5</version>
<scope>test</scope>
</dependency>

One of the problems of Cassandra Unit is that you cannot test failures in the replication system, because the embedded server behaves like a single node. But what really made us change our testing approach was the need to include code in our application to query a different version of Cassandra. We then started to use the DataStax Java driver, which at the time had a dependency (not any more) on the cassandra-all 2.0 artifact, and we could not include both versions of cassandra-all in the same project.

At that point we decided to try another approach to testing our code.

UPDATE: there are two other solutions similar to Cassandra Unit which could be useful: Stubbed Cassandra and NoSQLUnit. I have only used these frameworks in demo projects, so I do not have much to say about them beyond what is covered in their own documentation.

Use a Real Cassandra Cluster

This solution is pretty simple: use a real cluster to run the tests. In our case, we populate the data needed for the test in the @Before method and truncate all the tables in the @After method. To avoid conflicts between developers, each of us uses a different keyspace for testing. We use two types of configuration:

  • A cluster local to the developer workstation: we use the ccm tool to simulate a local cluster.
  • A remote cluster, with one keyspace per developer.

Each developer chooses one configuration or the other according to personal preference. The main advantage of this method is that the unit tests are totally isolated from each other and the data is populated using CQL, which is sometimes easier to write than XML files. And obviously the tests reproduce a situation very similar to the one you are going to have in the production environment. The main disadvantage is that the running time of the tests increases notably.
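As a small illustration of the per-developer keyspace and the truncate-after-each-test idea, here is a JDK-only sketch. The class `TestKeyspace`, the `test_` prefix and the table names are assumptions made up for this example, not our actual code. It only builds the CQL strings; executing them against the cluster is left to whichever driver session the test uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for per-developer test keyspaces.
final class TestKeyspace {
    private TestKeyspace() {
    }

    // Derives a keyspace name from a user name, e.g. "Alice.Smith" -> "test_alice_smith".
    // Characters other than lowercase letters, digits and underscores are replaced,
    // so the result can be used as an unquoted CQL identifier.
    static String nameFor(String userName) {
        String safe = userName.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9_]", "_");
        return "test_" + safe;
    }

    // Builds one TRUNCATE statement per table, qualified with the keyspace.
    static List<String> truncateStatements(String keyspace, List<String> tables) {
        List<String> statements = new ArrayList<>();
        for (String table : tables) {
            statements.add("TRUNCATE " + keyspace + "." + table + ";");
        }
        return statements;
    }
}
```

A test base class could call `TestKeyspace.nameFor(System.getProperty("user.name"))` once and run the generated statements in its @After method.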

This is the method we are currently using in our projects.

————————————————————

References:

Mocking Iterable Objects Generically

Datastax Java driver version 2.0

Cassandra Unit

Hector

Astyanax

Ccm


Category: Databases, Development, Java
