Hacker News Re-Imagined

Dragonflydb – A modern replacement for Redis and Memcached

  • 914 points
  • 1 month ago

  • @avielb
  • Created a post


@maxpert 1 month

Replying to @avielb 🎙

How does this compare to other multithreaded, Redis-protocol-compatible servers? KeyDB is one key player: https://docs.keydb.dev/


@thesuperbigfrog 1 month

Replying to @avielb 🎙

So, per the license only non-production use is allowed until June 2027 when this release changes to an Apache license?

https://github.com/dragonflydb/dragonfly/blob/main/LICENSE.m...

I understand wanting to protect your work from someone else turning it into a service, but I will need to get our org's legal team to review it first.


@mperham 1 month

Replying to @avielb 🎙

If it doesn't support macOS or Windows, how do you suggest developers on those platforms use it? An install guide for those platforms (even if it's mostly Docker commands) would be extremely useful.


@captainmarble 1 month

Replying to @avielb 🎙

I have two feature requests: 1. mutex/lock, 2. hashset-level TTL. Neat project anyway.


@throwaway888abc 1 month

Replying to @avielb 🎙

Impressive. Will give it a try for internal benchmarks.

Homepage: https://dragonflydb.io/

Benchmark: https://raw.githubusercontent.com/dragonflydb/dragonfly/main...


@metadat 1 month

Replying to @avielb 🎙

C++? I was expecting Rust!

I am spoiled.


@abhi12_ayalur 1 month

Replying to @avielb 🎙

Would this be able to eventually handle JSON/deep data similar to RedisJSON? For my team's use case, this is crucial and what we're using currently.


@Xeoncross 1 month

Replying to @avielb 🎙

I want to take a minute to appreciate and recognize the https://github.com/dragonflydb/dragonfly#background section.

A lot of projects say "faster" without giving some hint of the things they did to achieve this. "A novel fork-less snapshotting algorithm", "each thread would manage its own slice of dictionary data", and "core hashtable structure" are all important information that other projects often leave out.
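To make the "each thread would manage its own slice of dictionary data" idea concrete, here is a minimal sketch of shared-nothing partitioning. It is not Dragonfly's code: the `ShardedStore` class, the use of `crc32` as the hash, and the shard count are all assumptions made for illustration, and the sketch runs on a single thread.

```python
import zlib
from typing import Optional


class ShardedStore:
    """Toy shared-nothing store: every shard owns a private dict (hypothetical)."""

    def __init__(self, num_shards: int) -> None:
        # In a real shard-per-thread server, each dict would only ever be
        # touched by its owning thread, so the dict itself needs no lock.
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, key: str) -> int:
        # crc32 is an arbitrary stable hash picked for this sketch.
        return zlib.crc32(key.encode()) % len(self.shards)

    def set(self, key: str, value: str) -> None:
        self.shards[self.shard_for(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        return self.shards[self.shard_for(key)].get(key)


store = ShardedStore(num_shards=4)
store.set("user:1", "alice")
print(store.get("user:1"), "lives in shard", store.shard_for("user:1"))
```

The point of the design is that a single-key operation only ever involves one shard; the hard part, which this sketch leaves out, is coordinating operations that span several shards.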


@staticassertion 1 month

Replying to @avielb 🎙

Reminds me a bit of Scylladb with the focus on 'shard per core'. I've considered using Scylla as a cache as well, might try this out instead.


@renonce 1 month

Replying to @avielb 🎙

Can’t wait to see this become production-ready and feature-rich! I loved Redis Streams and the sheer power of it as a generic database. Redis is so undervalued for financial applications such as high-performance bookkeeping that it doesn’t even have an fsync operation. Hope someone will pick it up.


@jpomykala 1 month

Replying to @avielb 🎙

Why is this modern? Because there is a gradient on the landing page?


@mirzap 1 month

Replying to @avielb 🎙

What does "modern" even mean in this context? Is Redis not modern and somehow obsolete?


@nnx 1 month

Replying to @avielb 🎙

Benchmarks are curiously all based on Graviton2 or Graviton3 instances, it would be interesting to add some Intel/AMD instances to truly compare with Redis...


@dragosbulugean 1 month

Replying to @avielb 🎙

Would love to hear why C++.


@ed25519FUUU 1 month

Replying to @avielb 🎙

To me the focus on speed is a wash now. They’re all fast. I’d like to hear about easy cross-region replication and failover as well as effortless snapshot and restoring of backups.


@nurettin 1 month

Replying to @avielb 🎙

I use redis as a messaging server via xstream. It handles restarts well. Not sure if dragonflydb supports that.


@welder 1 month

Replying to @avielb 🎙

> Probably, the fastest in-memory store in the universe!

Redis is fast enough. Read/write speed isn't usually the bottleneck; limiting your data set to RAM is. I switched long ago to a disk-backed Redis clone (called SSDB) that solved all my scaling problems.


@sudarshnachakra 1 month

Replying to @avielb 🎙

I like the redis protocol compatibility and the HTTP compatibility, but from the initial skim through I guess you are using abseil-cpp and the home-grown helio (https://github.com/romange/helio) library.

Could you give me a one-liner on the helio library? Is it used as a fiber wrapper around the io_uring facility in the kernel? Can it be used as a standalone library for implementing fibers in application code?

Also, it seems that spinlocks have become a de facto standard in the DB world today, so thanks for not falling into the trap (because 90% of the users of any DB do not need spinlocks).

Another curious question would be - why not implement with seastar (since you're not speaking to disk often enough)?


@mamcx 1 month

Replying to @avielb 🎙

A side nitpick: I think it is dangerous to call anything a "db" if its data is not safely stored with ACID guarantees.

People don't read docs, nor do they know the consequences of words like "eventual" or "in memory", and they start using this kind of software as a primary data store instead of as a cache or ephemeral store...


@12thwonder 1 month

Replying to @avielb 🎙

I am amazed at how small the codebase is, and also pretty readable. great to see work like this, thank you!


@antirez 1 month

Replying to @avielb 🎙

I'm no longer involved in Redis, but I would love for people to receive clear information: from what the author of Dragonflydb is saying here, that is, that memcached and Dragonflydb have similar performance, I imagine that the numbers provided in the comparison with Redis were obtained with Redis on a single core and Dragonflydb running on all the cores of the machine. Now, it is true that Redis uses a core per instance, but it is also true that this is comparing apples to motorbikes. Multi-key operations are possible even with multiple instances (via key tags), so the author should compare N Redis instances (one per core) and report the numbers. Then they should say: "but our advantage is that you can run a single instance with this and that good thing". Moreover, I believe it would be fair to memcached to clearly state that they have the same performance.
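For readers unfamiliar with key tags, here is a small sketch of how they keep related keys together when data is spread over several instances. The slot computation follows the Redis Cluster rule (CRC16 of the key, or of the `{...}` section if a non-empty one exists, modulo 16384). The `instance_for` mapping of slots onto N standalone instances is a hypothetical stand-in for whatever a client or proxy would actually use.

```python
import binascii

NUM_SLOTS = 16384  # slot count used by Redis Cluster


def hash_tag(key: bytes) -> bytes:
    """Return the part of the key that is hashed.

    If the key has a non-empty section between the first '{' and the next
    '}', only that section is hashed, so related keys can be co-located.
    """
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: bytes) -> int:
    # binascii.crc_hqx with an initial value of 0 is the CRC16/XMODEM variant.
    return binascii.crc_hqx(hash_tag(key), 0) % NUM_SLOTS


def instance_for(key: bytes, num_instances: int) -> int:
    # Hypothetical mapping: give each instance a contiguous range of slots.
    return key_slot(key) * num_instances // NUM_SLOTS


for k in (b"{user1000}.following", b"{user1000}.followers", b"foo"):
    print(k.decode(), "-> slot", key_slot(k), "-> instance", instance_for(k, 8))
```

Because both `{user1000}` keys hash only the tag, they always land on the same instance, which is what makes a multi-key operation on them possible without cross-instance coordination.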

EDIT: another quick note: copy-on-write implementations in user space, done algorithmically, are cool in certain situations, but it must be checked what happens in the worst case. Because the good thing about kernel copy-on-write is that it is what it is, but it is easy to predict. Imagine an instance composed of just very large sorted sets: snapshotting starts, but there are a lot of writes, and all the sorted sets end up being duplicated in the process. When instead the sorted sets are able to remember their version because the data structure itself is versioned, you get two things: 1. more memory usage, 2. a lot more complexity in the implementation. I don't know what dragonflydb is using as algorithmic copy-on-write, but I would make sure to understand what the failure modes are with that algorithm, because it's a bit of a matter of physics: if you want to capture a snapshot at a given time T0 of a database, somehow changes must be accumulated, either at page level or at some other level.
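The "changes must be accumulated" point can be illustrated with a toy user-space snapshot. This is a generic sketch and makes no claim about what Dragonfly actually does: the `SnapshotStore` class and its fields are hypothetical. While a snapshot is in progress, the first write to a key that has not been serialized yet saves that key's T0 value in a side buffer.

```python
class SnapshotStore:
    """Toy key-value store with a point-in-time snapshot (hypothetical)."""

    def __init__(self):
        self.data = {}
        self.pending = set()   # keys that existed at T0 and are not yet serialized
        self.preserved = {}    # T0 values of pending keys modified since T0

    def _preserve(self, key):
        # Copy the old value only once, and only if the snapshot still needs it.
        if key in self.pending and key not in self.preserved:
            self.preserved[key] = self.data[key]

    def set(self, key, value):
        self._preserve(key)
        self.data[key] = value

    def delete(self, key):
        self._preserve(key)
        self.data.pop(key, None)

    def snapshot(self):
        # T0 is the moment this generator first runs (the first next() call).
        self.pending = set(self.data)
        self.preserved = {}
        for key in sorted(self.pending):
            if key in self.preserved:
                value = self.preserved.pop(key)  # modified since T0
            else:
                value = self.data[key]           # untouched since T0
            self.pending.discard(key)
            yield key, value


store = SnapshotStore()
for k, v in (("a", "1"), ("b", "2"), ("c", "3")):
    store.set(k, v)

snap = store.snapshot()
result = [next(snap)]        # serializes "a" at T0
store.set("a", "changed")    # already serialized: nothing is copied
store.set("c", "changed")    # still pending: the T0 value is preserved
store.set("d", "new")        # created after T0: not part of the snapshot
store.delete("b")            # still pending: the T0 value is preserved
result += list(snap)
print(result)
print(store.data)
```

The worst case described above shows up directly in this toy: if every pending key is written before the snapshot reaches it, `preserved` ends up holding a copy of the entire dataset.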

EDIT 2: fun fact, I didn't comment something about Redis for two years!


@manigandham 1 month

Replying to @avielb 🎙

Interesting project. Very similar to KeyDB [1] which also developed a multi-threaded scale-up approach to Redis. It's since been acquired by Snapchat. There's also Aerospike [2] which has developed a lot around low-latency performance.

1. https://docs.keydb.dev/

2. https://aerospike.com/


@didip 1 month

Replying to @avielb 🎙

Does it have a Helm chart? It will speed up adoption by a lot if a Helm chart is provided.


@wnzl 1 month

Replying to @avielb 🎙

This looks really good! I gave it a thorough look and it is definitely something I’d consider using. Maybe as a canary in one of our systems.

The youth of the product makes it a bit scary to use fully in mission-critical systems, given how many problems with Redis started to show up under proper load. But it is definitely on my watch list.


@romange 1 month

Replying to @avielb 🎙

Guys, I am the author of the project. Would love to answer any questions you have. Meanwhile, I will try to do it by replying to the comments below.


@tiffanyh 1 month

Replying to @avielb 🎙

FoundationDB should have been included in their perf comparison. It’s ACID compliant and a distributed Value/Key store.

For SSD based storage, it’s getting 50k reads/sec PER core and scales linearly with # of cores you have in your cluster. (They achieved 8MM reads/sec with 384 cores)

https://apple.github.io/foundationdb/performance.html


@chx 1 month

Replying to @avielb 🎙

I think the HTTP connection should be used to integrate webdis. https://github.com/nicolasff/webdis


@sirsinsalot 1 month

Replying to @avielb 🎙

What about it makes it a "modern" replacement rather than just a replacement? Is there something about Redis and Memcached that is "outdated" in the (relatively) short time span they've existed (compared to something like C)?


@lionkor 1 month

Replying to @avielb 🎙

Have Redis and Memcached aged so much we need a modern replacement? Or is this a webdevy 'modern' which just means the first commit is newer than redis' first commit?


@decidertm 1 month

Replying to @avielb 🎙

This looks very nice, we use KeyDB a lot. This looks like it could be a suitable alternative when there are more horizontal options.

I just deployed it on Northflank with your public docker image and wrote a guide here: https://northflank.com/guides/deploy-dragonfly-on-northflank... - works great!


@marmada 1 month

Replying to @avielb 🎙

Is io_uring the reason this is faster? I'm curious because redis is in memory right? And io_uring is mostly for disk ops, I assume?


@etaioinshrdlu 1 month

Replying to @avielb 🎙

It looks like it drops the single-threaded limitation of redis to achieve much better performance.

Could this architecture be extended to scale across multiple machines? What would be the benefits and costs of this?


@jimnotgym 1 month

Replying to @avielb 🎙

Modern? Eeek I think of Redis as modern (2009). I'm feeling old.


@jitl 1 month

Replying to @avielb 🎙

There are a lot of benchmarks against Redis, but where is the comparison to Memcached? Redis is quite slow for cache use-case already.


@girfan 1 month

Replying to @avielb 🎙

This looks like a cool project. Is there any support (or plan to support) I/O through kernel bypass technologies like RDMA? For example, the client reads the objects using 1-sided reads from the server, given it knows which address the object lives in. This could be really beneficial for reducing latency and CPU load.


@numlock86 1 month

Replying to @avielb 🎙

Looks cool, but LICENSE.md looks like a red flag. Too bad.


@throwaway787544 1 month

Replying to @avielb 🎙

Controversial opinion: we don't need more databases. We need better application design.

Why do you need a redis/memcache? Because you want to look up a shit-ton of random data quickly.

Why does it have to be random? Do you really need to look up any and all data? Is there not another more standard (and not dependent on a single db cluster) data storage and retrieval method you could use?

If you have a bunch of nodes with high memory just to store and retrieve data, and you have a bunch of applications with a tiny amount of memory.... Why not just split the difference? Deploy your apps to nodes with high amounts of memory, add parallel processing so they scale efficiently, store the data in memory closer to the applications, process in queues to prevent swamping the system and more reliable scaling. Or use an SSD array and skip storing it in memory, let the kernel VM take care of it.

If you're trying to "share" this memory between a bunch of different applications, consider if a microservice architecture would be better, or a traditional RDBMS with more efficient database design. (And fwiw, organic networks (as in biological) do not share one big pot of global state, they keep state local and pass messages through a distributed network)


@FrostKiwi 1 month

Replying to @avielb 🎙

Instantly confused this with a feature specific to DragonFlyBSD. Looking sweet though.


@judofyr 1 month

Replying to @avielb 🎙

Wow, this looks very nice!

I’ve seen the VLL paper before and I’ve wondered how well it would work in practice (and for what use cases). Does anyone know how they handle blocked transactions across threads? Is the locking done per-thread? If so, how do you detect/resolve deadlocks?

It would also be good to see a benchmark comparing single-thread performance between DragonflyDB and Redis. How much of the performance increase is due to being able to use all threads? And how does it handle contention? In Redis it’s easy to reason about because everything is done sequentially. How does DragonflyDB handle cases where (1) 95% of the traffic is GET/SET on a single key or (2) 90% of the traffic involves all shards (multi-key transactions)?


@bradhe 1 month

Replying to @avielb 🎙

To call Redis not modern seems...


@mfontani 1 month

Replying to @avielb 🎙

Aw, no hyperloglog support. So close for my redis use-case


@Thaxll 1 month

Replying to @avielb 🎙

In the picture, Redis tops out at 200k/second on an instance with 64 cores (r6g) and Dragonfly at 1400k/second. Redis is single-threaded and DF is not, but it only got 7.7x faster. How come?

If you run, let's say, 32 instances of Redis (not using HT) with CPU pinning, it will be much faster than DF, assuming the data is sharded/clustered.
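The arithmetic behind this argument can be spelled out as a back-of-the-envelope sketch. It takes the rounded figures quoted in the comment at face value (which give about 7x; the 7.7x presumably comes from the exact chart values) and assumes perfectly linear scaling for the multi-instance Redis case. That is an upper bound, not a measurement: it ignores network, client, and memory-bandwidth limits.

```python
redis_qps = 200_000         # single Redis instance, as quoted in the comment
dragonfly_qps = 1_400_000   # Dragonfly on the same 64-core machine, as quoted
cores = 64

speedup = dragonfly_qps / redis_qps
efficiency = speedup / cores  # fraction of ideal 64x scaling actually reached

redis_instances = 32
# Upper bound only: assumes every extra instance adds a full 200k ops/s.
linear_redis_qps = redis_instances * redis_qps

print(f"speedup over one Redis instance: {speedup:.1f}x")
print(f"parallel efficiency across {cores} cores: {efficiency:.1%}")
print(f"{redis_instances} Redis instances, if scaling were linear: {linear_redis_qps:,} ops/s")
```

Whether 32 pinned Redis instances get anywhere near that bound on real hardware is exactly what a benchmark would have to show.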


@romange 1 month

Replying to @avielb 🎙

I think I found a great analogy for the "N redises vs a single dragonfly" question.

In Israel, nowadays, the price of a watermelon in a local supermarket is 1.8$.

But if you go to farmers in the north, you can probably buy it for 30-50 cents. But then you would spend 3 hours in traffic and 20$ on gas.

So, Dragonfly is a local-supermarket that sells watermelons for 50 cents. mic drop.


@mayli 1 month

Replying to @avielb 🎙

How about running redis-benchmark and comparing the numbers?


@staticassertion 1 month

Replying to @avielb 🎙

This is really cool. Love a section on how things are designed with links to papers, always makes me feel way better about a project - especially one that has benchmarks.

Might try this out.


@PeterZaitsev 1 month

Replying to @avielb 🎙

Will be very interesting to see how it plays out. I have yet to see a database technology licensed completely under the BSL succeed with an Open Source strategy.

Elastic made its change after it was already very popular. MariaDB is the same story, and even more so: it only uses the BSL for "Enterprise" components, which have very little community adoption.

We do see other folks, however, such as Neon, picking a permissive license for their technology: https://github.com/neondatabase/neon

I think that for an Open Source project just starting up, the concern that "clouds will steal my lunch" is just stupid. If you're worth adopting by the clouds, you're in the top 0.1% of all Open Source projects and already "winning". You can revisit your license THEN; think about how to get to that point rather than creating adoption barriers early on.


@avinassh 1 month

Replying to @avielb 🎙

This is excellent!

I see only throughput benchmarks. Redis is single threaded, beating it at latency would have been far more impressive.

Do you have latency benchmarks at peak throughput?
