Why build a database?

Most people these days use a database. Databases are pretty much the centerpiece of every web application. Almost everyone knows how to use them, since they’re made so easy to use these days. You learn the most popular ORM for your language and that’s good enough for your data needs.

For most people that’s enough, but when you get to a higher scale, whether that means large datasets, very high throughput, or complex queries, it becomes obvious that you need to dig deeper.

So why spend a bunch of time learning this now? You can just learn it when you need to, if you need to.

Well, because you might need to at 6pm, before going to dinner, on the first day of your holiday. (Yes, I’m speaking from experience.) I was at a tech conference when I received a call from the CEO of my company: “Everything is down, nothing is working”. After a quick look, I saw that the error was coming from Redis, which we used in almost every aspect of our app.

Redis::CommandError: MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.

I hadn’t set vm.overcommit_memory = 1 on the server. Redis forks a child process to do a background save, and without that setting the system wouldn’t promise the child enough memory, so the save failed even though the machine still had 10GB of free memory.
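
A check like this at startup would have flagged the problem early. Here’s a minimal sketch, assuming a Linux host where the setting is exposed at /proc/sys/vm/overcommit_memory. The helper name parseOvercommit is made up for this example and is not part of Redis.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseOvercommit parses the contents of /proc/sys/vm/overcommit_memory,
// which holds a single integer: 0 (heuristic), 1 (always allow), 2 (strict).
// Hypothetical helper, named for this example only.
func parseOvercommit(contents string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(contents))
}

func main() {
	// Assumption: Linux, with the sysctl exposed under /proc.
	raw, err := os.ReadFile("/proc/sys/vm/overcommit_memory")
	if err != nil {
		fmt.Println("could not read overcommit setting:", err)
		return
	}
	mode, err := parseOvercommit(string(raw))
	if err != nil {
		fmt.Println("unexpected value:", err)
		return
	}
	if mode != 1 {
		fmt.Printf("vm.overcommit_memory is %d; background saves may fail under memory pressure\n", mode)
	}
}
```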

It was an interesting lesson, but I would’ve preferred not learning it the hard way. The more you know, the more you can avoid disaster.

I’m not going to assume any knowledge of how databases work, so let’s start with the basic building blocks.

We’ll start with something that acts a bit like Redis. Actually, something simpler than Redis: only its GET, SET and DEL operations.

Let’s look at what we need:

  • A long-lived process
  • It needs to accept network connections
  • Data that every connection can access
  • A data structure to store this effectively
  • A protocol for getting, setting and deleting data

Let’s build it!

NOTE: I will only paste snippets; you can consult this page to get the full version.

The process

Let’s keep our promise of simplicity and go with the simplest way to make an API over the network: an HTTP API.

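func main() {
	// listenAddress is defined in the full source.
	fmt.Printf("Listening on %s\n", listenAddress)
	// ListenAndServe blocks until the server fails, which keeps the process alive.
	log.Fatal(http.ListenAndServe(listenAddress, nil))
}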

This gives us our long-running process and our ability to accept network connections. It doesn’t do much, but we’ll get to that in a bit.

The data

It’s pretty clear what needs to go here. A map (or hash, or hashmap) is very commonly used for key/value storage. So let’s just use Go’s built-in map data structure.

// Data is the in-memory key/value store shared by every request handler.
var Data map[string]string

func main() {
	Data = make(map[string]string)
	...
}
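
One thing to keep in mind: net/http serves each request on its own goroutine, and a plain Go map is not safe for concurrent writes. The snippets in this post stick with the bare map for simplicity. What follows is only a sketch of one way to guard it, assuming a read/write mutex is good enough. Store, NewStore, Get, Set and Del are names made up for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a hypothetical wrapper around a map that is safe to share
// between goroutines.
type Store struct {
	mu   sync.RWMutex
	data map[string]string
}

// NewStore returns an empty, ready-to-use Store.
func NewStore() *Store {
	return &Store{data: make(map[string]string)}
}

// Get takes a read lock, so many readers can proceed at once.
func (s *Store) Get(key string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.data[key]
	return v, ok
}

// Set takes the write lock, excluding readers and other writers.
func (s *Store) Set(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
}

// Del removes a key; deleting a missing key is a no-op.
func (s *Store) Del(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.data, key)
}

func main() {
	s := NewStore()
	var wg sync.WaitGroup
	// Simulate many requests writing at the same time.
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			s.Set(fmt.Sprintf("key%d", n), "value")
		}(i)
	}
	wg.Wait()
	v, ok := s.Get("key42")
	fmt.Println(v, ok) // value true
}
```

The read/write split lets GET requests run in parallel while writes are serialized.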

The API

This is the most complicated part of the whole database at this point, and it’s still pretty trivial.

In this case, I decided to go with the REST actions for the API (except that a POST can overwrite an existing key instead of only creating one).

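func DBMethod(w http.ResponseWriter, req *http.Request) {
	// Everything after /db/ is the key. URL.Path excludes the query string.
	key := strings.TrimPrefix(req.URL.Path, "/db/")

	if key == "" {
		NotFoundError(w)
		return
	}

	switch req.Method {
	case http.MethodGet:
		if value, ok := Data[key]; ok {
			io.WriteString(w, value)
		} else {
			NotFoundError(w)
		}
	case http.MethodPost, http.MethodPut, http.MethodPatch:
		// The request body is stored verbatim as the value.
		body, err := io.ReadAll(req.Body)
		if err != nil {
			ServerError(w, err)
			log.Println(err)
			return
		}
		Data[key] = string(body)

		w.WriteHeader(http.StatusOK)
	case http.MethodDelete:
		// Deleting a missing key is a no-op, so this always succeeds.
		delete(Data, key)
		w.WriteHeader(http.StatusOK)
	default:
		// Reject any other method instead of silently returning 200.
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
	}
}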

func main() {
	...
	http.HandleFunc("/db/", DBMethod)
	...
}

You can test it out with a few curl commands. This one stores a value under the potato key:

curl -X POST http://localhost:5050/db/potato -d "Some value!"

You can then fetch it right back with:

curl http://localhost:5050/db/potato
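
The same round trip can be done from Go, which is closer to how another process would actually talk to the database. This is a rough sketch, assuming the server is listening on localhost:5050 as in the curl commands. keyURL, set and get are hypothetical helper names, and the text/plain content type is an arbitrary choice since the handler ignores it.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// keyURL builds the URL for a key, e.g. http://localhost:5050/db/potato.
func keyURL(base, key string) string {
	return strings.TrimSuffix(base, "/") + "/db/" + key
}

// set stores value under key with a POST, like the first curl command.
func set(base, key, value string) error {
	resp, err := http.Post(keyURL(base, key), "text/plain", strings.NewReader(value))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("set %q: unexpected status %s", key, resp.Status)
	}
	return nil
}

// get fetches the value for key with a GET, like the second curl command.
func get(base, key string) (string, error) {
	resp, err := http.Get(keyURL(base, key))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("get %q: unexpected status %s", key, resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	// Assumption: the server from this post is running locally on port 5050.
	const base = "http://localhost:5050"
	if err := set(base, "potato", "Some value!"); err != nil {
		fmt.Println("set failed:", err)
		return
	}
	value, err := get(base, "potato")
	if err != nil {
		fmt.Println("get failed:", err)
		return
	}
	fmt.Println(value)
}
```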

We’re done!

Congrats! You made a simple database.

It has a lot of issues and isn’t very useful for now. You can share data across processes, but for starters, all your data will be gone as soon as you shut it down.

We’ll handle durability in the next post. In the meantime, if you’d like, try to figure out what else is wrong with it and mention it in the comments below.