Internals of RESP - Redis Serialization Protocol

I am Amit Shekhar, I have taught and mentored many developers, and their efforts landed them high-paying tech jobs, helped many tech companies in solving their unique problems, and created many open-source libraries being used by top companies. I am passionate about sharing knowledge through open-source, blogs, and videos.

In this blog, we are going to learn about the internals of the Redis Serialization Protocol (RESP).

Before jumping into the internals of RESP, we should know what exactly the term protocol stands for.

The term protocol stands for the language used between a server and a client for communication during networking.

Here, the language is a set of rules (specifications) that determine how data is transmitted between a server and a client.

Both the client and the server should understand the set of defined rules to encode and decode the information that they need. The client should encode the data before sending it to the server, and the server should be able to decode the data and take action. Similarly, the server encodes the response data before sending it to the client, and the client should be able to decode and use it.

So, we should have two implementations for any protocol as follows:

Server-Side Implementation: To accept any request from the client, decode the data, process it, and send back the encoded response to the client.
Client-Side Implementation: To make any request to the server with the encoded data, and then decode the response coming back from the server.

Inside each implementation, we mostly write the code that encodes and decodes the data as per the set of rules defined in the protocol.

Now, we know the term protocol. It's time to learn about RESP.

Redis Serialization Protocol (RESP) is a text-based serialization protocol that was designed for Redis but can also be used by other applications.

As we know, any protocol needs two implementations, one for the server and one for the client. For Redis, we have the following:

Redis-Server: The official Redis server is written in C, and RESP-compatible servers have also been written in many other widely used languages.
Redis-Client: Similarly, Redis-Client implementations are available in almost all widely used languages.

Let me show you some pseudo code for using the Redis-Server and Redis-Client.

Starting the Redis-Server:

redisServer = RedisServer()
redisServer.start("127.0.0.1:6379")

This is how the Redis-Server gets started to accept any request from the Redis-Client.

Using the Redis-Client:

redisClient = RedisClient()
redisClient.connectWithRedisServer("127.0.0.1:6379")

// This is how we can set any key-value pair.
redisClient.set("MyKey", "MyValue")

// This is how we can get the value for any key.
redisClient.get("MyKey")

In this process, the Redis-Client encodes the information as text and sends it to the Redis-Server. The Redis-Server should then be able to decode it and take action.

Similarly, when the Redis-Server sends back the response, it encodes the response data as text, and the Redis-Client should be able to decode and use it.

This text data transfer happens through RESP, which is what we are going to learn about. RESP is designed to be simple to encode and fast to parse.

As RESP is a protocol, it must have a specification.

Specifications of RESP:

Whatever we exchange as a request or response between the client and the server is encoded using RESP.

A RESP message that gets transferred between the client and the server looks like the following:

+PING\r\n

Let's break it down to understand.

{Prefix}{Data}{CRLF}

+PING\r\n

Prefix: +
Data: PING
CRLF: \r\n

Prefix: In RESP, the prefix is the first byte of the message, and it identifies the data type. Here it is + and it represents Simple Strings.

RESP supports the following data types, each identified by its own prefix:

Simple Strings: The prefix is +.
Bulk Strings: The prefix is $.
Integers: The prefix is :.
Arrays: The prefix is *.
Errors: The prefix is -.

Do not worry, we will learn about all of the above data types of RESP in detail later in this blog.

Data: In the RESP, after the prefix, we have the data. In the example above, as the prefix gave us the information that the data type is Simple Strings, we know that our data is a type of Simple Strings. With all the given information including the below-explained CRLF, we can extract the data as "PING".

CRLF: It stands for the two special characters Carriage Return (\r) and Line Feed (\n). Historically, Carriage Return moves the cursor back to the start of the line, and Line Feed moves it down to a new line. In RESP, the CRLF combination acts as the terminator that signals where each part of a message ends.
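The {Prefix}{Data}{CRLF} breakdown above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `RESP_TYPES` and `split_frame` are made up for this example and are not part of Redis or any client library, and it only handles single-line frames like the one shown.

```python
# Prefix-to-type table for the five data types listed above.
RESP_TYPES = {
    b"+": "Simple String",
    b"$": "Bulk String",
    b":": "Integer",
    b"*": "Array",
    b"-": "Error",
}

CRLF = b"\r\n"


def split_frame(frame: bytes) -> tuple[str, bytes]:
    """Split a single-line RESP frame into (type name, data).

    Hypothetical helper for illustration only.
    """
    prefix, rest = frame[:1], frame[1:]
    if prefix not in RESP_TYPES:
        raise ValueError(f"unknown RESP prefix: {prefix!r}")
    end = rest.find(CRLF)
    if end == -1:
        raise ValueError("frame is not terminated by CRLF")
    # Everything between the prefix and the first CRLF is the data.
    return RESP_TYPES[prefix], rest[:end]


print(split_frame(b"+PING\r\n"))  # ('Simple String', b'PING')
```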

Now, let's understand the various data types and how the data will get parsed.

Simple Strings

The prefix is +. It is used for short strings such as "OK", "PING", and "PONG". A Simple String cannot contain a CR or LF character, because the first CRLF ends it.

Suppose we are sending the following.

+PING\r\n

By parsing the first byte, we know that the data type is Simple Strings. Then, we read the string until we reach the CRLF.

So, we get the parsed value which is the actual string as "PING".

Bulk Strings

The prefix is $. Bulk strings are different from Simple Strings in that they can contain anything - newlines, control characters, or even valid RESP.

Bulk Strings are used in order to represent a single binary-safe string up to 512 MB in length.

Suppose we are sending the following.

$12\r\nAmit Shekhar\r\n

Let's break it down to understand.

$12\r\nAmit Shekhar\r\n

{Prefix}{Length}{CRLF}{Data}{CRLF}

First, by reading the first byte, we know that it is of type Bulk Strings. So, we know that the bytes up to the next CRLF give the length of the data that we have to read.

So, we get 12 as the length of the actual string data.

Finally, we read the next 12 bytes to get the actual string data as "Amit Shekhar", and then consume the trailing CRLF.

Also, we can represent a Null value using a special variation in this Bulk Strings type.

$-1\r\n

If we get -1 as the length of the bulk string, we should treat it as a null value.
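The length-prefixed reading described above, including the null case, could look roughly like this in Python. The function name `parse_bulk_string` is an assumption made for this sketch, and it expects the whole frame to already be in memory.

```python
from typing import Optional

CRLF = b"\r\n"


def parse_bulk_string(frame: bytes) -> Optional[bytes]:
    """Parse one Bulk String frame; return None for the null form ($-1)."""
    if frame[:1] != b"$":
        raise ValueError("not a Bulk String")
    header_end = frame.index(CRLF)        # end of the {Length} part
    length = int(frame[1:header_end])
    if length == -1:
        return None                       # $-1\r\n means null
    start = header_end + len(CRLF)
    data = frame[start:start + length]    # read exactly `length` bytes
    trailer = frame[start + length:start + length + len(CRLF)]
    if len(data) != length or trailer != CRLF:
        raise ValueError("truncated or malformed Bulk String")
    return data


print(parse_bulk_string(b"$12\r\nAmit Shekhar\r\n"))  # b'Amit Shekhar'
```

Because the data is read by length and not by searching for CRLF, a payload such as `b"$4\r\na\r\nb\r\n"` is parsed correctly even though it contains a CRLF itself.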

Integers

The prefix is :. It is used for integers.

:10\r\n

Let's break it down to understand.

:10\r\n

{Prefix}{Data}{CRLF}

It is very simple to understand and can be simply parsed to get the actual integer data as 10.
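As a small sketch of that parsing step in Python (the name `parse_integer` is made up for this example), note that the data may also carry a leading minus sign for negative numbers:

```python
def parse_integer(frame: bytes) -> int:
    """Parse one RESP Integer frame such as b":10\\r\\n"."""
    if frame[:1] != b":" or not frame.endswith(b"\r\n"):
        raise ValueError("not a RESP Integer")
    # Strip the 1-byte prefix and the 2-byte CRLF, then convert the digits.
    return int(frame[1:-2])


print(parse_integer(b":10\r\n"))  # 10
```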

Arrays

The prefix is *. It is used for arrays. The elements of an array are themselves values of the RESP types covered in this blog, including other arrays. Let's understand it by an example.

*3\r\n$3\r\nSET\r\n$5\r\nmykey\r\n$7\r\nmyvalue\r\n

Let's break it down to understand.

*3\r\n$3\r\nSET\r\n$5\r\nmykey\r\n$7\r\nmyvalue\r\n

{Prefix}{Size}{CRLF}{Bulk Strings}{Bulk Strings}{Bulk Strings}

Prefix: *
Size: 3. It denotes that the array contains 3 elements, each of which is one of the RESP types. In this example, all three elements are Bulk Strings.

And, we know how to read and parse the Bulk Strings.

After parsing, we will have the following:

SET
mykey
myvalue

From here, we can infer that the client is asking to set the key-value pair.

Similarly, this array type can be used for getting the value corresponding to the key.

*2\r\n$3\r\nGET\r\n$5\r\nmykey\r\n

Let's break it down to understand.

*2\r\n$3\r\nGET\r\n$5\r\nmykey\r\n

{Prefix}{Size}{CRLF}{Bulk Strings}{Bulk Strings}

After parsing, we will have the following:

GET
mykey

From here, we can infer that the client is asking to get the value corresponding to the key "mykey".
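Going the other way, a client has to produce these arrays of Bulk Strings. Here is a minimal Python sketch of that encoding step. The function name `encode_command` is hypothetical, and it assumes every argument is a string that can be encoded as UTF-8.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP Array of Bulk Strings."""
    parts = [b"*%d\r\n" % len(args)]          # {Prefix}{Size}{CRLF}
    for arg in args:
        data = arg.encode("utf-8")
        # {Prefix}{Length}{CRLF}{Data}{CRLF}; length is counted in bytes.
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


print(encode_command("GET", "mykey"))
# b'*2\r\n$3\r\nGET\r\n$5\r\nmykey\r\n'
```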

Also, we can represent a Null array using a special variation in this Arrays type.

*-1\r\n

If we get -1 as the length of the array, we should treat it as a null array.

Errors

The prefix is -. It is used for error replies.

-ERR unknown command "GETT"\r\n

Let's break it down to understand.

-ERR unknown command "GETT"\r\n

{Prefix}{Data}{CRLF}

It is also very simple to understand.

It can be easily derived that the error message is as follows:

ERR unknown command "GETT"
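In Redis replies, the first word of an error message (here ERR) is by convention the kind of error, and the rest is the human-readable message. A sketch of separating the two in Python follows. The name `parse_error` is an assumption for this example, and the split on the first space reflects that convention, not a separate rule of the wire format.

```python
def parse_error(frame: bytes) -> tuple[str, str]:
    """Split a RESP Error frame into (kind, message)."""
    if frame[:1] != b"-" or not frame.endswith(b"\r\n"):
        raise ValueError("not a RESP Error")
    text = frame[1:-2].decode("utf-8")     # drop the prefix and the CRLF
    kind, _, message = text.partition(" ")  # first word is the error kind
    return kind, message


print(parse_error(b'-ERR unknown command "GETT"\r\n'))
# ('ERR', 'unknown command "GETT"')
```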

This was all about how the different data types are parsed.

Like HTTP, RESP runs on top of a TCP connection: a Redis client typically connects to the Redis server over TCP.
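Putting the five types together, a single recursive function can decode any of the messages shown in this blog. The following Python sketch is one possible way to do it, not how any particular Redis client does it. The names `parse` and `RespError` are made up here, and the code assumes the buffer already holds one or more complete messages.

```python
CRLF = b"\r\n"


class RespError(Exception):
    """Hypothetical wrapper used to represent a RESP Error value."""


def parse(buf: bytes, pos: int = 0):
    """Parse one RESP value starting at buf[pos]; return (value, next_pos)."""
    prefix = buf[pos:pos + 1]
    line_end = buf.index(CRLF, pos)       # every type starts with one line
    line = buf[pos + 1:line_end]
    pos = line_end + len(CRLF)

    if prefix == b"+":                    # Simple String
        return line.decode("utf-8"), pos
    if prefix == b"-":                    # Error (returned, not raised)
        return RespError(line.decode("utf-8")), pos
    if prefix == b":":                    # Integer
        return int(line), pos
    if prefix == b"$":                    # Bulk String
        length = int(line)
        if length == -1:
            return None, pos              # null Bulk String
        data = buf[pos:pos + length]
        return data, pos + length + len(CRLF)
    if prefix == b"*":                    # Array
        count = int(line)
        if count == -1:
            return None, pos              # null Array
        items = []
        for _ in range(count):
            item, pos = parse(buf, pos)   # elements are RESP values too
            items.append(item)
        return items, pos
    raise ValueError(f"unknown RESP prefix: {prefix!r}")


value, _ = parse(b"*3\r\n$3\r\nSET\r\n$5\r\nmykey\r\n$7\r\nmyvalue\r\n")
print(value)  # [b'SET', b'mykey', b'myvalue']
```

Returning the next position along with the value is what lets the Array branch walk through its elements one after another.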
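To see the bytes travel over a connection without needing a running Redis server, the sketch below uses Python's `socket.socketpair()` as a stand-in for the TCP connection. One end plays the client and the other plays a toy server that only understands PING. The function name `ping_over_socket` and the toy server logic are assumptions made for this illustration.

```python
import socket


def ping_over_socket() -> bytes:
    """Send a RESP-encoded PING over a connected socket pair.

    socketpair() stands in for a real TCP connection to a Redis server.
    """
    client, server = socket.socketpair()
    try:
        # Client side: PING encoded as an Array with one Bulk String.
        client.sendall(b"*1\r\n$4\r\nPING\r\n")

        # Toy server side: decode by direct comparison and reply.
        # A real server would parse the request and loop on recv().
        request = server.recv(1024)
        if request == b"*1\r\n$4\r\nPING\r\n":
            server.sendall(b"+PONG\r\n")   # Simple String reply
        else:
            server.sendall(b"-ERR unknown command\r\n")

        # Client side: read the encoded reply.
        return client.recv(1024)
    finally:
        client.close()
        server.close()


print(ping_over_socket())  # b'+PONG\r\n'
```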

Then, the question arises: Why not use HTTP in Redis?

Answer: We could use HTTP, but for the particular use case that Redis has, where text-based data needs to be transferred between the client and the server, RESP is used for the following reasons:

High performance: Simple text data is fast to parse.
Human readable: RESP is plain text, so it is easy to read and debug, yet it can be implemented to perform comparably to a binary protocol (which is not human-readable).
Simple Implementation: You can quickly write the implementation in any language.

That's it for now.

Thanks

Amit Shekhar
