If you came here to read the book, you can safely skip ahead to the first chapter. If you still wish to read this, you may not understand many of the terms mentioned here. Don't get overwhelmed - take it as a little motivation for why learning Node.js is worth it.
"Node.js is slow." This is a statement you may have heard often, perhaps whispered in developer circles or debated on online forums. Node.js has been unfairly criticized for not being suitable for high-performance applications. This chapter aims to disprove that myth permanently.
The tech community often praises languages like Go or Rust for their blazing fast™ execution, and often doubts whether JavaScript and its server-side partner, Node.js, can keep up. Common criticisms range from Node.js being single-threaded to the fundamental limitations of JavaScript itself. But what if I told you that these critiques are not just overly simplistic, but often misleading?
Why is Node.js perceived as slow? Is it the single-threaded architecture? Or perhaps the fact that it uses JavaScript - a language originally designed for client-side web development - as its backbone? These are just a couple of the many misconceptions that contribute to the underestimation of Node.js. But as you'll soon discover, Node.js can not only stand toe-to-toe with competing backend technologies but, under the right circumstances, can even perform as well as other "fast" languages.
We'll take a couple of "fast" frameworks and put them up against Node.js.
Elysia claims to be 18x faster than Express.
I personally love Axum. It's one of the fastest and most developer-friendly server-side frameworks for Rust.
Express is always used to establish the baseline in any benchmark. I would love to be proven wrong, but I do not believe Elysia can be 18 times faster than Express. Maybe 18x faster at "waiting for I/O".
Velocy is an HTTP library that I am building as part of this book. For context, this GitHub repo serves as a book that I am writing - Learn Node.js the Hard Way. Velocy is still an unfinished product, as it was started just 7 days ago. I have managed to build a performant Router class that uses the Radix Tree data structure to manage routes and extract path params. Our current router provides enough functionality for Velocy to be able to compete in this benchmark. You can find the source of Velocy - here
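To give you a rough idea of the technique, here is a simplified, illustrative sketch - a segment-level trie rather than a true character-level radix tree, and not Velocy's actual source. Routes are stored as a tree of path segments, so matching a URL is a single walk down the tree instead of a linear scan over every registered route:

```js
// Illustrative sketch only - not Velocy's actual implementation.
class RouteNode {
  constructor() {
    this.children = new Map(); // static segment -> RouteNode
    this.paramChild = null;    // child for a dynamic ":param" segment
    this.paramName = null;
    this.handler = null;
  }
}

class SimpleRouter {
  constructor() {
    this.root = new RouteNode();
  }

  addRoute(path, handler) {
    let node = this.root;
    for (const segment of path.split("/").filter(Boolean)) {
      if (segment.startsWith(":")) {
        if (!node.paramChild) node.paramChild = new RouteNode();
        node.paramChild.paramName = segment.slice(1);
        node = node.paramChild;
      } else {
        if (!node.children.has(segment)) node.children.set(segment, new RouteNode());
        node = node.children.get(segment);
      }
    }
    node.handler = handler;
  }

  findRoute(path) {
    let node = this.root;
    const params = {};
    for (const segment of path.split("/").filter(Boolean)) {
      if (node.children.has(segment)) {
        node = node.children.get(segment); // exact (static) match wins
      } else if (node.paramChild) {
        params[node.paramChild.paramName] = segment; // capture the path param
        node = node.paramChild;
      } else {
        return null; // no route matches
      }
    }
    return node.handler ? { handler: node.handler, params } : null;
  }
}

// Usage:
const router = new SimpleRouter();
router.addRoute("/bench/:name", () => {});
console.log(router.findRoute("/bench/testing").params); // { name: 'testing' }
```

A real radix tree additionally compresses shared prefixes so lookups touch fewer nodes, but the walk-the-tree idea is the same.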
The question is, how are we going to benchmark? Let's take bun's approach. If you're not aware, bun uses uWebSockets under the hood for its web server. Let us see what the creator of uWebSockets has to say about benchmarking:
Why "hello world" benchmarking?
Contrary to popular belief, "hello world benchmarks" are the most accurate and realistic gauges of performance for the kind of applications µWebSockets is designed for:

- IO-gaming (latency)
- Signalling (memory overhead)
- Trading (latency)
- Finance (latency)
- Chatting (memory overhead)
- Notifications (memory overhead)
According to them, hello world benchmarks give a real-world result. However, I don't think that is true. Real-world applications often involve much more than just sending a "hello world" message. There are usually additional layers of complexity - security, data access, business logic, etc. - that these benchmarks do not account for. In all the examples above, there is a layer in between, i.e. a database, which should not be ignored when talking about "real world benchmarks".
Since "hello world" benchmarks test the absolute bare minimum, they can sometimes give a false perception of performance. For example, a framework might excel at basic tasks but falter when handling more complex operations.
But nevertheless, we're going to use elysia/bun's way of benchmarking:

- Get the path parameter, i.e. extract `name` from the endpoint `/bench/:name`.
- Get the query parameter, i.e. extract `id` from the endpoint `/bench/:name?id=10`.
- Set a couple of headers on the response.
- Set a custom header on the response.
- Set the `content-type`, `connection` and `keep-alive` headers explicitly, so the response headers are the same for every framework.
- Serialize. Serialization is a bit more CPU-intensive than something like I/O. For that reason, instead of returning a JavaScript object, we're going to return a JSON (serialized) response back to the client. Bun claims that `JSON.stringify()` in bun is 3.5x faster than in Node.js. That should give our Node.js servers a significant performance hit, right? (If you want to sanity-check stringify throughput yourself, see the sketch right after this list.)
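Here's the aforementioned sketch - a quick, admittedly unscientific micro-benchmark of raw `JSON.stringify()` throughput. It is an illustration only, not part of the benchmark suite:

```js
// Illustration only: measure raw JSON.stringify throughput for the same
// tiny payload our servers will return.
const payload = { name: "testing", id: "10" };

let chars = 0; // accumulate output length so the calls can't be optimized away
const start = process.hrtime.bigint();
for (let i = 0; i < 1_000_000; i++) {
  chars += JSON.stringify(payload).length;
}
const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;

console.log(`1M JSON.stringify calls: ${elapsedMs.toFixed(1)} ms (${chars} chars)`);
```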
We will examine the source of all frameworks that are going to be used for benchmarking, along with relevant information such as their versions.
Bun version - 0.8.1
import { Elysia } from "elysia";
const headers = {
"x-powered-by": "benchmark",
"content-type": "application/json",
connection: "keep-alive",
"keep-alive": "timeout=5",
};
const app = new Elysia().get("/bench/:name", (c) => {
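// `c` is the handler context; assigning to c.set.headers sets the response headers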
c.set.headers = headers;
return JSON.stringify({
name: c.params.name,
id: c.query.id,
});
});
app.listen(3000);
rustc version - 1.73.0-nightly
axum version - 0.6.20
Release flags:
[profile.release]
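# These flags trade compile time for runtime performance: opt-level = 3 is maximum
# optimization, codegen-units = 1 gives the optimizer a whole-crate view, lto = "thin"
# enables (thin) link-time optimization, and panic = 'abort' removes unwinding overhead.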
opt-level = 3
codegen-units = 1
panic = 'abort'
lto = "thin"
debug = false
incremental = false
overflow-checks = false
Code:
use axum::{
extract::{Path, Query}, response::IntoResponse, routing::get, Router,
};
use serde_json::json;
use std::{collections::HashMap, net::SocketAddr};
const HEADERS: [(&'static str, &'static str); 4] = [
("x-powered-by", "benchmark"),
("content-type", "application/json"),
("connection", "keep-alive"),
("keep-alive", "timeout=5"),
];
async fn handle_request(
Path(name): Path<String>,
Query(query): Query<HashMap<String, String>>,
) -> impl IntoResponse {
(
HEADERS,
json!({
"name": name,
"id": query.get("id")
})
.to_string(),
)
}
#[tokio::main]
async fn main() {
let app = Router::new().route("/bench/:name", get(handle_request));
let port_number: u16 = str::parse("3000").unwrap();
let addr = SocketAddr::from(([127, 0, 0, 1], port_number));
axum::Server::bind(&addr)
.serve(app.into_make_service())
.await
.unwrap();
}
express version - 4.18.2
const express = require("express");
const app = express();
const headers = {
"x-powered-by": "benchmark",
"content-type": "application/json",
connection: "keep-alive",
"keep-alive": "timeout=5",
};
// We want to make sure that the response body is the same for every framework, to keep it fair.
app.disable("x-powered-by");
app.disable("etag");
app.get("/bench/:name", (req, res) => {
const { name } = req.params;
const id = req.query.id;
// res.end() does not set the content-length implicitly, so the `transfer-encoding: chunked` is assumed.
// Manually setting the `content-length` to get rid of it.
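// (response.length works here because the payload is ASCII-only; for arbitrary strings,
// Buffer.byteLength(response) is the correct byte count for the content-length header.)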
const response = JSON.stringify({ name, id });
headers["content-length"] = response.length;
res.writeHead(200, headers);
res.end(response);
});
app.listen(3000);
velocy version - 0.0.12
const { createServer, Router } = require("velocy");
const router = new Router();
const headers = {
"x-powered-by": "benchmark",
"content-type": "application/json",
connection: "keep-alive",
"keep-alive": "timeout=5",
};
router.get("/bench/:name", (req, res) => {
const { name } = req.params;
const id = req.queryParams.get("id");
// Same, setting content-length to get rid of `transfer-encoding: chunked`.
const response = JSON.stringify({ name, id });
headers["content-length"] = response.length;
res.writeHead(200, headers);
res.end(response);
});
createServer(router).listen(3000);
For this round, we're only going to test the single-instance performance of all the servers. This is representative of the typical benchmarks that get produced to claim "Node.js is slower than X technology".
Testing machine - Macbook, M1 Max
Testing tool - wrk
Testing method -
wrk -t1 -c256 -d30s --latency http://localhost:3000/bench/testing?id=10
That is: 1 thread, 256 concurrent connections, a 30-second run, with the latency distribution recorded.
Elysia:
Running 30s test @ http://localhost:3000/bench/testing?id=10
1 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.67ms 387.30us 18.15ms 89.67%
Req/Sec 149.89k 5.24k 155.66k 93.67%
Latency Distribution
50% 1.56ms
75% 1.62ms
90% 2.07ms
99% 3.18ms
4472519 requests in 30.01s, 0.87GB read
Socket errors: connect 6, read 92, write 0, timeout 0
Requests/sec: 149047.56
Transfer/sec: 29.57MB
Axum:
Running 30s test @ http://localhost:3000/bench/testing?id=10
1 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 631.35us 211.19us 11.70ms 95.24%
Req/Sec 210.04k 11.71k 225.95k 75.00%
Latency Distribution
50% 592.00us
75% 625.00us
90% 676.00us
99% 1.49ms
6271071 requests in 30.01s, 1.21GB read
Socket errors: connect 6, read 90, write 0, timeout 0
Requests/sec: 208938.22
Transfer/sec: 41.45MB
Express:
Running 30s test @ http://localhost:3000/bench/testing?id=10
1 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 8.65ms 1.12ms 50.16ms 93.79%
Req/Sec 29.08k 1.68k 30.70k 96.33%
Latency Distribution
50% 8.29ms
75% 8.97ms
90% 9.26ms
99% 12.29ms
867945 requests in 30.01s, 172.17MB read
Socket errors: connect 6, read 115, write 0, timeout 0
Requests/sec: 28923.88
Transfer/sec: 5.74MB
A little math leads us to an obvious conclusion - Express is ~5.1 times slower than Elysia (149,047 ÷ 28,923 ≈ 5.15), not 18x as they claimed.
Velocy:
Running 30s test @ http://localhost:3000/bench/testing?id=10
1 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 3.00ms 610.89us 36.12ms 98.97%
Req/Sec 84.12k 4.43k 86.08k 97.33%
Latency Distribution
50% 2.95ms
75% 3.00ms
90% 3.10ms
99% 3.65ms
2511165 requests in 30.01s, 498.13MB read
Socket errors: connect 6, read 117, write 0, timeout 0
Requests/sec: 83689.19
Transfer/sec: 16.60MB
Axum is clearly the fastest server framework out of the four we looked at. Elysia is fast too, but not as fast as it claims to be. Many people think Express is slow, but that's not really true. Express only slows down as you add more features or middleware - which can be said of any other web framework as well.
My own web framework, Velocy, did really well! Plus, it only uses 10 MB of memory when idle. Many people assume that Node.js uses a lot of memory, but that's not always true. It's mostly software built on top of Electron that uses a lot of memory, not server-side apps - if they are written properly. My goal with Velocy is to create a backend framework without ever doing an `npm install` - i.e. no dependencies. That is what this book is all about.
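As an aside, you can check the memory footprint of any Node.js process using the standard library's `process.memoryUsage()` - a trivial sketch:

```js
// Print this process's resident set size and V8 heap usage, in MB.
const toMB = (bytes) => (bytes / 1024 / 1024).toFixed(1);
const { rss, heapUsed } = process.memoryUsage();
console.log(`rss: ${toMB(rss)} MB, heapUsed: ${toMB(heapUsed)} MB`);
```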
Now, it's time to test our apps as if they were running in a production environment.
Often, developers make bold claims about the speed of their web server frameworks, touting metrics like "8.5 times faster than Fastify" or "20 times quicker than Express." However, these claims often lack context - faster at doing what? Running `console.log` statements? Handling HTTP requests? Brewing virtual coffee? Sending emojis blazingly fast? Additionally, many of these benchmarks are skewed, tailored to conditions that favor their particular framework.
For instance, a frequent argument is, "My framework uses only one process, while yours uses two. Therefore, double my framework's performance numbers." This sort of reasoning neglects the real-world conditions under which these frameworks would operate in a production environment.
Now, let's consider a practical scenario: Imagine you have a multi-core AWS instance. Using just a single core would be a waste of resources. Node.js inherently provides the capability to create a "cluster," allowing you to take full advantage of multi-core processors. So why not utilize it? We are not using any external tools, we're utilizing the Node.js standard library.
A significant critique of Bun's benchmark numbers is its lack of support for clustering. As a result, Bun can end up only marginally faster than a Node.js application configured with clustering. Bun's primary selling point is its "speed", but that comes with the caveat of no multi-core utilization, making it less practical for production settings.
While you could run multiple Bun instances and set up a load balancer, that's not the point. Node.js offers a seamless interface for clustering, which is one of its major advantages. Bun may claim to be fast, but without support for multi-core optimization, it falls short of being a fully scalable solution (for me) for production environments, without the added hassle of setting up a load balancer.
Axum's code remains unchanged. Axum runs on top of the tokio runtime, which by default creates a thread pool and handles new connections/requests using a work-stealing scheduler.
Unfortunately, bun does not support working with node's cluster module. When you try to fork, it throws an error:
NotImplementedError: node:cluster is not yet implemented in Bun. Track the status & thumbs up the issue: https://github.com/oven-sh/bun/issues/2428
code: "ERR_NOT_IMPLEMENTED"
So elysia's code remains unchanged as well.
The cluster module provides a utility function, fork(), that spawns a new worker process. When a new connection comes in, the master process can distribute it to one of the workers in a round-robin fashion. Here are the clustered versions of the express and velocy servers:
const express = require("express");
const cluster = require("cluster");
if (cluster.isMaster) {
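// The primary process only forks the workers; each worker runs the server set up in the else branch.
// (cluster.isMaster still works but is a deprecated alias of cluster.isPrimary in modern Node.js.)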
for (let i = 0; i < 2; i++) {
cluster.fork();
}
} else {
const headers = {
"x-powered-by": "benchmark",
"content-type": "application/json",
connection: "keep-alive",
"keep-alive": "timeout=5",
};
const app = express();
app.disable("x-powered-by");
app.disable("etag");
app.get("/bench/:name", (req, res) => {
const { name } = req.params;
const id = req.query.id;
const response = JSON.stringify({ name, id });
headers["content-length"] = response.length;
res.writeHead(200, headers);
res.end(response);
});
app.listen(3000);
}
const { createServer, Router } = require("velocy");
const cluster = require("cluster");
if (cluster.isMaster) {
for (let i = 0; i < 2; i++) {
cluster.fork();
}
} else {
const router = new Router();
const headers = {
"x-powered-by": "benchmark",
"content-type": "application/json",
connection: "keep-alive",
"keep-alive": "timeout=5",
};
router.get("/bench/:name", (req, res) => {
const { name } = req.params;
const id = req.queryParams.get("id");
const response = JSON.stringify({ name, id });
headers["content-length"] = response.length;
res.writeHead(200, headers);
res.end(response);
});
createServer(router).listen(3000);
}
Axum and Elysia's results remain unchanged. We'll look at the new numbers for express and velocy.
Note: I am only spawning 2 workers for both express and velocy.
Express (clustered):
Running 30s test @ http://localhost:3000/bench/testing?id=10
1 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 5.13ms 2.10ms 176.20ms 97.84%
Req/Sec 50.56k 1.27k 52.55k 82.00%
Latency Distribution
50% 4.92ms
75% 5.15ms
90% 5.80ms
99% 7.78ms
1508700 requests in 30.01s, 299.27MB read
Socket errors: connect 0, read 92, write 0, timeout 0
Requests/sec: 50275.21
Transfer/sec: 9.97MB
Velocy (clustered):
Running 30s test @ http://localhost:3000/bench/testing?id=10
1 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 2.08ms 5.39ms 148.18ms 99.66%
Req/Sec 139.68k 7.99k 146.00k 96.68%
Latency Distribution
50% 1.71ms
75% 1.97ms
90% 2.21ms
99% 3.49ms
4183269 requests in 30.10s, 829.81MB read
Socket errors: connect 0, read 131, write 0, timeout 0
Requests/sec: 138956.60
Transfer/sec: 27.56MB
Graphs
The max latency is too high because requests have to be distributed to the workers: all worker processes in a cluster share the same server port, and - depending on the scheduling policy - either the master (round-robin) or the OS scheduler decides which worker handles an incoming connection. This can cause contention and variability in response times. Some workers may get stuck with long-running requests, thereby delaying other incoming requests.
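As an aside, Node.js lets you pick this scheduling behaviour through `cluster.schedulingPolicy`; the clustered benchmark code above simply uses the default:

```js
const cluster = require("cluster");

// Must be set before the first fork() call (or via the NODE_CLUSTER_SCHED_POLICY
// environment variable).
// SCHED_RR:   the primary accepts connections and hands them to workers
//             round-robin (the default on every platform except Windows).
// SCHED_NONE: workers accept connections directly and the OS decides, which
//             can lead to uneven load across workers.
cluster.schedulingPolicy = cluster.SCHED_RR;
```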
Let's ignore max latency and zoom in for some exciting results -
Velocy, our young, immature Node.js framework (in yellow), is performing almost the same as the super fast bun server (in blue)!
The memory usage is a bit higher because there are now 3 separate processes - one master and two workers. Still, not much of a difference.
If you require high performance for both CPU- and IO/resource-intensive tasks, languages like Rust, Go, C++ (with Drogon), or Zig are more suitable. Node.js excels at handling demanding IO workloads and is far from slow. We observed how a relatively new web framework nearly matched the speed of Bun, which claims to be significantly faster than Node.js alternatives. Benchmarks should not be the sole factor in choosing a tool.
What truly matters are the tooling, community, library ecosystem, job market, and other factors. In terms of server-side programming, Node.js can be considered a top choice. For projects requiring high security, such as managing a wallet system, or demanding sub-microsecond latency, Axum is my preferred framework.
Additionally, with Node.js, you can work on both server and client applications using a single language, which is an often-overlooked advantage.
TBD: Create benchmarks using `rewrk` too.