Hacker News Re-Imagined

I'm all-in on server-side SQLite

  • 1353 points
  • 16 days ago

  • @dpeck
  • Created a post

I'm all-in on server-side SQLite


@swaraj 16 days

Replying to @dpeck 🎙

Looks v cool, but I feel like I'm missing a big part of the story, how do 2 app 'servers/process' connect to same sqlite/litestream db?

Do you 'init' (restore) the db from each app process? When one app makes a write, is it instantly reflected on the other app's local sqlite?

Reply


@boesboes 15 days

Replying to @dpeck 🎙

How well does this scale for larger data sets? Could I use it with 100GB of data for instance?

Reply


@mwcampbell 16 days

Replying to @dpeck 🎙

Congratulations to Ben on getting a well-funded player like Fly to buy into this vision. I'm looking forward to seeing a complete, ready-to-deploy sample app, when the upcoming Litestream enhancements are ready.

I know that Fly also likes Elixir and Phoenix; they hired Chris McCord, after all. So would it make sense for Phoenix applications deployed in production on Fly to use SQLite and Litestream? Is support for SQLite in the Elixir ecosystem, particularly Ecto, good enough for this?

Reply


@downut 16 days

Replying to @dpeck 🎙

(I am attempting my first "as much as possible make the database do the work" app right now, after 35 years in the business. Yeah I started out on the scientific side, and then the sort of things SQLite is obviously great for.)

I do not understand how one implements the multi-role access system on top of SQLite that postgresql gives you for free.

Other than do it from scratch (eeek!) on the app side.

Just as an example, think of the smallest db backed factory situation you can imagine... as small as you like. There will need to be multiple roles if more than one role accesses the database tables.

Reply


@DeathArrow 15 days

Replying to @dpeck 🎙

With SQLite you embed the DB in the application. If I have 6 Kubernetes pods and the pod containing the writer dies, all other 5 pods will be useless.

Reply


@swlkr 16 days

Replying to @dpeck 🎙

The reduction in complexity from using sqlite + litestream as a server side database is great to see!

Reply


@InitEnabler 15 days

Replying to @dpeck 🎙

SQLite, has to be one of my favorite databases. It's always improving and the story behind it's creation is really quite something.

Reply


@netcraft 16 days

Replying to @dpeck 🎙

This is similar to what I hoped websql had eventually grown into. sqlite in the browser, but let me sync it up and down with a server. Every user gets their own database, the first time to the app they "install" the control and system data, then their data, then writes are synced to the server. If it became standard, it could be super easy - conflict resolution notwithstanding.

Reply


@tyingq 16 days

Replying to @dpeck 🎙

Dqlite is also interesting, and in a similar space. It seems to have evolved from the LXC/LXD team wanting a replacement for Etcd. It's Sqlite with raft replication and also a networked client protocol.

https://dqlite.io/docs/architecture

Reply


@RcouF1uZ4gsC 16 days

Replying to @dpeck 🎙

I love Litestream! It is so simple and it just works!

Congratulations, Ben, on making a great product and on the sale!

One thing I have had in the back of my mind, but have not had the time to pursue is using SQLite replication to make something similar to CloudFlare's durable objects but more open.

A "durable object" would be an SQLite database and some program that processes requests and accesses the SQLite database. There would be a runtime that transparently replicates the (database, program) pair where they are needed and routes to them.

That way, I can just start out locally developing my program with an SQLite database, and then run a command and have it available globally. At the same time, since it is just accessing an SQLite database, there would be much less risk of lockin.

Reply


@krts- 16 days

Replying to @dpeck 🎙

A great project with awesome implications. Well deserved, and the fly.io team are very pragmatic.

This will be even more brilliant than it already is when fly.io can get some slick sidecar/multi-process stuff.

I ended up back with Postgres after my misconfigs left me a bit burned with S3 costs and data stuff. But I think a master VM backed by persistent storage on fly with read replicas as required is maybe the next step: I love the simplicity of SQLite.

Reply


@foodstances 16 days

Replying to @dpeck 🎙

Just curious, is there any financial compensation/support going to Richard Hipp with all of this money changing hands?

When I see these startups making a business that is so heavily based on open-source software (like Tailscale on top of Wireguard), I have to wonder what these companies do to actually support the author(s) of the software that so much of their company is based on.

Reply


@otoolep 16 days

Replying to @dpeck 🎙

Congratulations to Ben! This project has been like a rocket ship.

Reply


@no_wizard 16 days

Replying to @dpeck 🎙

This a great and interesting offering! I think this fits well with fly.io and their model of computing.

I now wish that I had engaged with this idea that was very similar to litestream that I had about a year and half ago. I always thought SQLite just needed a distribution layer to be extremely effective as a distributed database of sorts. Its flat file architecture means its easy to provision, restore and backup. SQLite also has incremental snapshotting and re-producible WAL logs that can be used to do incremental backups, restores, writes etc. It just needs a "frontend" to handle those bits. Latency has gotten to the point where you can replicate a database by its continued snapshots (which is, on a high level, what litestream appears to be doing) being propagated out to object / blob storage. You could even achieve brute force consensus with this approach if you ran it in a truly distributed way (though RAFT is probably more efficient).

Reason I didn't do this? I thought to myself - why in the world in 2020 would someone choose to use SQLite at scale instead of something like Firebase, Spanner, Fauna, or even Postgres? So after I did an initial prototype (long gone, never pushed it to GitHub) I just felt like...there was no appetite for it.

Now I regret!

Just a long winded way of saying, congrats! This is awesome! Thanks for doing exactly what I wanted to do but didn't have the guts to follow through with.

Reply


@mrcwinn 16 days

Replying to @dpeck 🎙

I have really enjoyed using Fly. Great service and support.

Reply


@nojvek 15 days

Replying to @dpeck 🎙

Somebody needs to build litestream for duckdb (columnstore oriented sqlite like db).

That would be epic. DuckDB speed is crazy fast when it comes to aggregate/analysis queries.

Reply


@scwoodal 16 days

Replying to @dpeck 🎙

> According to the conventional wisdom, SQLite has a place in this architecture: as a place to run unit tests.

Be careful with this approach. Frameworks like Django have DB engine specific features[1]. When you start using them in your application you can no longer use a different DB (SQLite) to run your unit tests.

[1] https://docs.djangoproject.com/en/4.0/ref/contrib/postgres/f...

Reply


@seanwilson 16 days

Replying to @dpeck 🎙

SQLite uses dynamic types? Is this an issue in practice, especially for large apps? Don't you lose guarantees about your data which makes it messy to handle on the backend?

Context from https://www.sqlite.org/datatype3.html: "SQLite uses a more general dynamic type system. In SQLite, the datatype of a value is associated with the value itself, not with its container. The dynamic type system of SQLite is backwards compatible with the more common static type systems of other database engines in the sense that SQL statements that work on statically typed databases work the same way in SQLite. However, the dynamic typing in SQLite allows it to do things which are not possible in traditional rigidly typed databases. Flexible typing is a feature of SQLite, not a bug."

Reply


@PhineasRex 16 days

Replying to @dpeck 🎙

It's been a while since we reinvented the wheel, hasn't it.

Reply


@jchw 16 days

Replying to @dpeck 🎙

This is interesting! I like using Fly.io today, but I’m currently using a single node for most stuff with SQLite. Having some kind of failover and replication would be pretty awesome. I have yet to try Litestream and it does sound like there’s some issues to work out that could be pretty nasty, but I’ll definitely be watching.

Fly.io is very nice. It’s what I hoped Hyper.sh would be, except it isn’t dead. That said, there are a couple things I worry about… like, there’s no obvious way to resize disks, you pretty much need to make a new disk that’s larger, launch a new instance with it mounted, and transfer data from an existing instance. If it was automated, I probably wouldn’t care, though a zero downtime way of resizing disks would be a massive improvement. Another huge concern is just how good the free tier is. I actually am bothered that I basically don’t get billed. Hyper.sh felt a bit overpriced, and by comparison Fly.io does scale up in price but for small uses it feels like theft.

Reply


@rco8786 16 days

Replying to @dpeck 🎙

All of the action around SQLite recently is very exciting!

Reply


@ignoramous 15 days

Replying to @dpeck 🎙

Looking forward to ditching my PlanetScale plans for this!

> ...people use Litestream today is to replicate their SQLite database to S3 (it's remarkably cheap for most SQLite databases to live-replicate to S3).

Cloudflare R2 would make that even cheaper. Cloudflare set to open beta registration this week.

And if you squint just enough, you'd see R2, S3 et al are nosql KV store themselves, masquerading as disk drives, and used here to back-up a sql db...

> My claim is this: by building reliable, easy-to-use replication for SQLite, we make it attractive for all kinds of full-stack applications to run entirely on SQLite.

Disruption (? [0]) playing out as expected? That said, the world reliable is doing a lot of heavy lifting. Reliability in distributed systems is hard (well... easy if your definition of reliability is different ;) [1])

> And if you don't need the Postgres features, they're a liability.

Reminds me of WireGuard, and how it accomplishes so much more by doing so much less [2].

Congratulations Ben (but really, could have taken a chance with heavybit)!

----

[0] https://hbr.org/2015/12/what-is-disruptive-innovation

[1] God help me, the person on the orange site saying they need to run Jepson tests to verify Litestream WAL-shipping. Stand back! You don’t want to get barium sulfated!, https://twitter.com/tqbf/status/1510066302530072580

[2] "...there’s something like 100 times less code to implement WireGuard than to implement IPsec. Like, that is very hard to believe, but it is actually the case. And that made it something really powerful to build on top of*, https://www.lastweekinaws.com/podcast/screaming-in-the-cloud...

Reply


@kgeist 15 days

Replying to @dpeck 🎙

>But database optimization has become less important for typical applications. <..> As much as I love tuning SQL queries, it's becoming a dying art for most application developers.

We thought so, too, but as our business started to grow, we had to spend months, if not years, rewriting and fine-tuning most of our queries because every day there were reports about query timeouts in large clients' accounts... Some clients left because they were disappointed with performance. Another issue is growing the development team. We made the application stateless so we can spin up additional app instances at no cost, or move them around between nodes, to make sure the load is evenly distributed across all nodes/CPUs (often a node simply dies for some reason). Since they are stateless, if an app instance crashes or becomes unstable, nothing happens, no data is lost, it's just restarted or moved to a less busy node. DB instances are now managed by the SRE team which consists of a few very experienced devs, while the app itself (microservices) is written by several teams of varying experience and you worry less about the app bringing down the whole production because microservice instances are ephemeral and can be quickly killed/restarted/moved around. Simple solutions are attractive but I'd rather invest in a more complex solution from the very beginning, because moving away from SQLite to something like Postgres can be costlier than investing some time in setting up 3-tier if you plan your business to grow, otherwise eventually you can end up reinventing 3-tier, but with SQLite. But that's just my experience, maybe I'm too used to our architecture.

Reply


@bob1029 16 days

Replying to @dpeck 🎙

> SQLite isn't just on the same machine as your application, but actually built into your application process. When you put your data right next to your application, you can see per-query latency drop to 10-20 microseconds. That's micro, with a μ. A 50-100x improvement over an intra-region Postgres query.

This is the #1 reason my exuberant technical mind likes that we use SQLite for all the things. Latency is the exact reason you would have a problem scaling any large system in the first place. Forcing it all into one cache-coherent domain is a really good way to begin eliminating entire universes of bugs.

Do we all appreciate just how much more throughput you can get in the case described above? A 100x latency improvement doesn't translate directly into the same # of transactions per second, but its pretty damn close if your I/O subsystem is up to the task.

Reply


@vinay_ys 15 days

Replying to @dpeck 🎙

In the past two decades we have done this enough times to know better. Here's what we know:

1. Compute and storage should be decoupled because the compute vs storage hardware performance increases at different rate over generations of hardware and if our application is coupled, then choosing an efficient shape of the server hardware is very difficult.

2. We know making a single server highly reliable is very difficult (expensive) but making a bunch of servers in aggregate reliable is much much easier. Hence, we should spread our workload on a bunch of servers to reduce the blast radius of any one single server failing.

3. We know making a single server very big (scale vertically) and utilise it efficiently is also very difficult (again, read: expensive). But using a bunch of smaller servers efficiently is relatively easier and more cost effective. Here, big vs small is relative at any given point in time – the median/average size server is whatever is most popularly used – hence it is mass manufactured and sold at volume-pricing-margins and popular software has caught up to use it efficiently (read: linux kernel and popular server software).

4. We know data is ever growing and application is ever more hungry to use more data in 'smart' ways. Hence, overall size of data upon which we want to operate is ever increasing. Hence scalable data architectures are very crucial to keep up with the market competition. (Even if you believe your app can be dumb and simple, the market competition forces will move you towards becoming more data 'smart').

5. We know a lot of business models are viable only at huge scale of users. At smaller scales, the margins are so low that it isn't viable to operate. Again this is due to competition. Only scale operator survives. Hence, we know building architectures that doesn't scale to "millions of users" (even in enterprise software world) isn't viable anymore.

6. We know such scale brings more complexity – multi-tenancy, multiple regions, multiple jurisdictions etc. Internet world is becoming very complex, geo-politically etc. Multi-tenant usage based pricing models bring interesting challenges w.r.t usage metering, isolation, utilisation efficiency and security challenges. Multi-region and multi-jurisdiction brings interesting challenges w.r.t high-availability/continuity and traffic routing and cross-region data storage/replication along with encryption and key-management.

7. With all this, we have learned that layered architecture is critical to managing complexity while providing both feature agility and non-functional stability. Hence we know a lot of these complex capabilities should be solved by the lower layers in a reusable high-leverage way and not be tied to application layers. This is crucial for application layer to rapidly iterate on features to find product-market fit without destabilising these crucial non-functional core capabilities.

8. We know being able to refactor your application domain logic rapidly and efficiently is a super power for a startup hunting product market fit, for a big tech keeping up the innovation speed or any company in between just surviving the competition everyday. This refactoring super-power is crucial for keeping tech debt in control (and being able to take tech debt strategically) and not blowing up your engineering budget by having to hire like crazy (throwing bodies a the problem).

We know all this..and more.. but I'll stop here... for now.

Reply


@plesiv 16 days

Replying to @dpeck 🎙

I absolutely love this. I think so called n-tier architecture as a pattern should be aggressively battled in the attempt to reduce the n. Software is so much more reliable when the communication between different computational modules of the system are function calls as opposed to IPC calls. Why does everything that computes something or provides some data need to be a process? It doesn't.

Postgresql and every other server/process should have first class support for a single CLI command that: spins up the DB that slurps up the config and the data storage, takes the SQL command provided through the CLI arguments, runs it, returns results and terminates. Effectively, every server/process software should be a library first, since it's easy to make a server out of a library and the reverse is anything but.

Reply


@beck5 16 days

Replying to @dpeck 🎙

I have found it easy to overload SQLite with too many write operations (20+ Concurrently), is this typical behaviour referred to in the post, or a write heavy workload?

Reply


@3np 15 days

Replying to @dpeck 🎙

A common gotcha with sqlite and WAL is how it's not supported on networked filesystems, which will bite anyone trying to keep their data volumes replicated over glusterfs, ceph, and similar with corruption.

Let's say we're running a vendored application (forking it is not an option) utilizing WAL and want to store the db on one of those filesystems not traditionally suitable for WAL'd sqlite.

Would dropping in Litestream on the db allow us to do so safely?

Reply


@NeutralForest 16 days

Replying to @dpeck 🎙

There's something I don't understand, it says that the "data is next to the application", what does it mean? Where is stored and how is it accessed by the application?

Reply


@tybit 15 days

Replying to @dpeck 🎙

I think this architecture would be really powerful paired with the actor model to shard databases to nodes.

Reply


@pjmlp 15 days

Replying to @dpeck 🎙

If it comes with the same tooling as Oracle and SQL Server, I might think about using it server side, until then not really.

Reply


@rullopat 15 days

Replying to @dpeck 🎙

My question is: what would happen if my server blows up while Litestream is still streaming to S3?

Reply


@DeathArrow 15 days

Replying to @dpeck 🎙

If we don't need SQL capabilities of SQLite, we can use the file system as a document database. Rsync will take care of replication.

Reply


@splitrocket 15 days

Replying to @dpeck 🎙

There are a couple of interesting options in a similar space: BedrockDB ( https://bedrockdb.com/ ) Dqlite ( https://dqlite.io/ ) Rqlite ( https://github.com/rqlite/rqlite )

I'm interested in how this performs and particularly, what are the tradeoffs relative to the other options above.

Reply


@kukabynd 15 days

Replying to @dpeck 🎙

Great move, congrats to everyone involved. Fly is very promising player in the space. Pipeline looks amazing, and I’ll be trying more of your offerings down the road.

Reply


@anyfactor 16 days

Replying to @dpeck 🎙

Story time!

A client told me that they will use a DigitalOcean droplet for a web app. Because the database was very small I chose to use SQLite3.

After delivery the client said their devops guy wasn’t available they would like to deploy to Heroku. Heroku being a ephemeral cloud service couldn’t handle the same directory SQLite3 db I had there. The only solution was to use their Postgres database service.

For some reason, it was infuriating that I have to use a database like that to store few thousand rows of data. Moreover, I would have to rewrite a ton of stuff accommodate the change to Postgres.

I ended up using firestore.

---

I think something like this could have saved me a ton of hassle that day.

Reply


@mro_name 15 days

Replying to @dpeck 🎙

> The conventional wisdom could use some updating.

how true in so many fields.

Reply


@jasfi 15 days

Replying to @dpeck 🎙

Is there a good DB admin GUI that supports both SQLite and Postgres?

Reply


@coliveira 15 days

Replying to @dpeck 🎙

What I think interesting is that people write articles about technology architectures without even bothering trying to use the said architecture. I would be very interested in reading from someone who actually used sqlite in a large scale application in the way he described, and then tell what worked or not in this setup. Until then, this article is nothing more than a proposal, another kind of vaporware.

Reply


@dsincl12 15 days

Replying to @dpeck 🎙

Uhm... experience from a large project that used SQLite was that we where hit with SQLite only allowing one write transaction at a time. That is madness for any web app really.

Why do everyone seem so hyped on this when it can't really work properly IRL? If you have large amounts of data that need to be stored the app would die instantly, or leave all your users waiting for their changes to be saved.

What am I missing?

Reply


@CGamesPlay 15 days

Replying to @dpeck 🎙

I agree with this article! I even went so far as to write a Prisma-like SQL client generator that uses better-sqlite3 under the hood, so you get the nice API of Prisma and the synchronous performance of better-sqlite3. I’ve been using it for a few small projects, but I just released it at 1.0 yesterday.

https://github.com/CGamesPlay/rapid-cg

Reply


@ilrwbwrkhv 16 days

Replying to @dpeck 🎙

For how much?

Reply


@mtlynch 16 days

Replying to @dpeck 🎙

Super cool! Congrats, Ben!

I've been building all of my projects for the last year with SQLite + fly.io + Litestream. It's already such a great experience, but I'm excited to see what develops now that Litestream is part of fly.

Reply


@obiwanpallav1 15 days

Replying to @dpeck 🎙

In which scenario would you use litestream[1] vs rqlite[2]?

1 - https://github.com/benbjohnson/litestream

2 - https://github.com/rqlite/rqlite

Reply


@DeathArrow 15 days

Replying to @dpeck 🎙

>SQLite isn't just on the same machine as your application, but actually built into your application process. When you put your data right next to your application, you can see per-query latency drop to 10-20 microseconds. That's micro, with a μ. A 50-100x improvement over an intra-region Postgres query.

We will make up for those latency losses by throwing more microservices in our fat microservices architectures, add more message brokers in the flow. For sure will find a way to bring those milliseconds back. :)

Reply


@learndeeply 16 days

Replying to @dpeck 🎙

Since both Fly.io and Litestream founders are here - why not disclose the price?

Reply


@tiffanyh 16 days

Replying to @dpeck 🎙

@dang, the actual title is “ I'm All-In on Server-Side SQLite”

Maybe I missed it but where in the article does it say Fly acquired Litestream?

EDIT: Ben Johnson says he just joined Fly. Nothing about Fly “acquiring” Litestream.

https://mobile.twitter.com/benbjohnson/status/15237489883352...

Reply


@farmin 15 days

Replying to @dpeck 🎙

> The upcoming release of Litestream will let you live-replicate SQLite directly between databases, which means you can set up a write-leader database with distributed read replicas. Read replicas can catch writes and redirect them to the leader; most applications are read-heavy, and this setup gives those applications a globally scalable database.

Would this make lightstream a possible fit to sync a mobile device to a users own silo of data on 'server'? Would need a port of lightstream to Dart.

Reply


@onetom 15 days

Replying to @dpeck 🎙

+1 for SQLite! I've used it from Clojure, via HoneySQL, so no ORM, no danger of SQL injection. It was really wonderful!

https://github.com/seancorfield/honeysql

I used it to quickly iterate on the development of migration SQL scripts for a MySQL DB, which was running in production on RDS.

I might have switched to H2 DB later, because that was more compatible with MariaDB, but I could use the same Clojure code, representing the SQL queries, because HoneySQL can emit different syntaxes. Heck, we are even using it to generate queries for the SQL-variant provided by the QuickBooks HTTP API! :)

https://www.hugsql.org/ it's pretty good too, btw! it's just a bit too much magic for me personally :)

Also, you should really look into JetBrains database tooling, like the one in IntelliJ Ultimate or their standalone DataGrip product! It's freaking amazing, compared to other tools I tried. If you are an Emacs person, then I think even with some inferior shells to the command-line interfaces of the various SQL system, you can go very far a lot more conveniently, than thru some ORMs.

Either way, one secret to developing SQL queries comfortably is to utilize some more modern features, like the WITH clause, to provide test data to your queries: https://www.sqlite.org/lang_with.html

You can use it to just type up some static data, but you can also compute test data dynamically and even randomly!

Other little-known feature is the RETURNING clause for INSERT/UPDATE/DELETE: https://www.sqlite.org/lang_returning.html

It can highly simplify your host-code, which embeds SQL, because you don't have to introduce UUID keys everywhere, just so you can generate them without coordination.

Reply


@jrochkind1 16 days

Replying to @dpeck 🎙

While the title is about a business acquisition, the article is mostly about the technology itself -- replicating SQLite, suggested as a superior option to a more traditional separate-process rdbms, for real large-scale production workloads.

I'd be curious to hear reactions to/experiences with that suggestion/technology, inside or outside the context of fly.io.

Reply


@whazor 15 days

Replying to @dpeck 🎙

I like the idea. It indeed sounds faster to redirect all write API's via your own proxy to a single write instance remote (or maybe multiple via sharding).

Via Kubernetes you could have a cross region cluster that will deal with nodes going offline and like the author said, you would have a couple of seconds downtime with speeds nowadays. Which you could resolve by smarter frontends.

Reply


@LunaSea 16 days

Replying to @dpeck 🎙

I wonder if we'll ever see an embedded version of PostgreSQL?

Reply


@melony 16 days

Replying to @dpeck 🎙

Note that the popular Node.js ORM Prisma does not support WAL.

https://github.com/prisma/prisma/issues/3303

Reply


@DeathArrow 14 days

Replying to @dpeck 🎙

In terms of CAP theorem, you give up consistency and partition tolerance, leaving only availability.

For many, giving up consistency would be a big deal.

Reply


@aidenn0 16 days

Replying to @dpeck 🎙

The SQLite team has done a good job over the years establishing an ethos (in the rhetorical sense) of writing reliable software. The degree to which this can transfer to Lighstream is the degree to which Lightstream is intrusive on the SQLite code.

Another way of saying it: I trust the SQLite's team statements of stability for SQLite because of history and a track-record for following stringent development processes. The same is not true of the Lighstream team. Does anybody know how much any potential damage introduced by the Lightstream code could affect the integrity of my data on disk -- obviously replication added by Lightstream will be only as good as the Lighstream team makes it, but to what degree is the local data-store affected?

Reply


@quintes 16 days

Replying to @dpeck 🎙

What’s the use case here, a single web app with inproc db?

More complex use cases?

I remember I could do this on azure at one point in time with app services, not Sure if it’s still a thing.. but heavy writes and scaling of those types of apps would lead to to rethink this approach right?

Reply


@mkleczek 15 days

Replying to @dpeck 🎙

I guess I am in minority here now but... Embedding an SQL database in the application is really missing the point of having RDBMS. The goal of RDBMS is not merely to persist application data but to _share_ data between different applications.

And while current trend is to implement sharing by applications I expect this to change in the future as it is much more economical to use RDBMS to share data.

Reply


@jl6 15 days

Replying to @dpeck 🎙

Perhaps you could avoid the need for an additional replication tool if you happened to have some kind of synchronous stretch clustered SAN storage on which to place the SQLite database file. Moving HA to the infra layer?

Reply


@swah 13 days

Replying to @dpeck 🎙



@OOPMan 15 days

Replying to @dpeck 🎙

Cool technical marketing blog story bro

Reply


@paulhodge 16 days

Replying to @dpeck 🎙

Wow Litestream sounds really interesting to me. I was just starting on an architecture, that was either stupid or genius, of using many SQLite databases on the server. Each user's account gets their own SQLite file. So the service's horizontal scaling is good (similar to the horizontal scaling of a document DB), and it naturally mitigates data leaks/injections. Also opens up a few neat tricks like the ability to do blue/green rollouts for schema changes. Anyway Litestream seems pretty ideal for that, will be checking it out!

Reply


@Maksadbek 16 days

Replying to @dpeck 🎙

SQLite is known for having many various extentions. If the streaming replication is so important, why didn't sqlite authors create such one before ?

Reply


@daniel_iversen 15 days

Replying to @dpeck 🎙

For most people's purposes I'd assume that ease-of-use, ease-of maintenance, relatively good speed, safe, documented, feature rich and scalable is important. I like SQLite, and while it's cool that they've fixed some big things around safety and clustering, it still seems like a "below-bare-minimum" choice for a lot of production systems, or am I just being old school? MariaDB (/ MySQL) really has a whole lot of good features that I thought would just make it a safer choice? What do people think and why?

Reply


@nh2 15 days

Replying to @dpeck 🎙

The article doesn't seem to discuss one of the most fundamental guarantees of current-day DB-application interaction:

Acknowledged writes must not be lost.

For example, if a user hits "Delete my account", and gets a confirmation "You account was deleted", that answer must be final. It would be bad if the account reappeared afterwards. Similarly, if a user uploads some data, and gets a confirmation (say via HTTP 200), they should be able to assume that the data was durably stored on the other side, and that they can delete it locally.

Most applications make this assumption, and that makes sense: Otherwise you could never know how how much longer a client needs to hold onto the data until being sure that the DB stored it.

This can only be achieved reliably with a server-side network roundtrip on write ("synchronous replication"), because a single machine can fry any time.

The approach presented in the article does not provide this guarantee. It provides low latency by writing to the local SSD, acknowledging the write to the client, and then performing "asynchronous replication" with some delay afterwards. If the server dies after the local SSD write, but before the WAL is shipped, the acknowledged write will be lost. It will still be on the local SSD, but that is not of much use if the server's mainboard is fried (long time to recovery) and another server with old data takes over as the source of truth.

This is why I think it's justified that some other commenters call this approach a "cache" when compared with a multi-AZ DB cluster doing synchronous replication.

The Litestream approach seems to provide roughly the same properties as postgres-on-localhost with async replication turned on. (I also wonder if that would be an interesting implementation of this approach for Fly.io -- it should provide similar microsecond latency while also providing all features that Postgres has.)

As I understand it, Fly.io provides Postgres with synchronous replication (kurt wrote "You can also configure your postgres to use synchronous replication", https://community.fly.io/t/early-look-postgresql-on-fly-we-w...), and https://fly.io/docs/reference/postgres/#high-availability explains that it uses Stolon, which does support synchronous replication if you turn it on. But the "Postgres on Fly" page doesn't seem to explain whether sync or async is the default, and how exactly I can turn on sync mode on Fly.

So I think it would be helpful if the article stated clearly "this is asynchronous replication", thus making clear that it will likely forget acknowledged writes on machine failure, and maybe link to Fly's Postgres offering that provides more guarantees.

Reply


@steve_gh 15 days

Replying to @dpeck 🎙

Thank you Ben! This is exactly what I need for the data science and analytics problems I work on. We import data from a variety of sources via an ETL process, but we want to distribute the data analytics to multiple read-only process nodes.

This gives is the speed is SQLite plus easy replication and a single source of truth.

Chapeau!!!

Reply


@kondro 16 days

Replying to @dpeck 🎙

Curious about the costs of this. Wouldn't it cost at least $13/month just in PutObject request costs to replicate Sqlite to S3 at the default of 1 sync per second? Or is it smart enough to only sync if there have been additions to the WAL?

Reply


@zsims 15 days

Replying to @dpeck 🎙

> It was reasonable to overlook this option 170 years ago, when the Rails Blog Tutorial was first written.

Woah. Rails is really old

Reply


@ok_dad 16 days

Replying to @dpeck 🎙

I was just about to start using this for a project, I hope the license won’t change.

Congrats to the author though, no matter what! I wish everyone could be so successful.

Reply


@wasd 16 days

Replying to @dpeck 🎙

Fly is putting together a pretty great team and interesting tech stack. It's the service I see as a true disruptor to Heroku because it's doing something novel (not just cheaper).

I'm still a little murky on the tradeoffs with Fly (and litestream). @ben / @fly, you should write a tutorial on hosting a todo app using rails with litestream and any expected hurdles at different levels of scale (maybe comparing to Heroku).

Reply


@pbowyer 16 days

Replying to @dpeck 🎙

Not surprised. Congratulations Ben!

Reply


@endisneigh 16 days

Replying to @dpeck 🎙

What’s an example of a popular app (more than 100K users) that uses lite stream? Curious to see how this looks like in production

Reply


@fareesh 15 days

Replying to @dpeck 🎙

For me the dream seems to be a relational, real-time (with optionally configurable JSON/HTML snippet updates going to client applications), with extremely good latency, offline sync, etc. Bonus if the client can pick the fields it wants a-la graphql. Some sort of Rails + Hotwire + Firebase combination which works with web pages and apps alike.

Reply


@kall 16 days

Replying to @dpeck 🎙

I am as obsessed with sub 100ms responses as the people at fly.io, so I think the one writer and many, many readers architecture is smart and fits quite a few applications. When litestream adds actual replication it will get really exciting.

> it won't work well on ephemeral, serverless platforms or when using rolling deployments

That's... a lot of new applications these days.

Reply


@thdxr 16 days

Replying to @dpeck 🎙

in practice how do you make a single application node the writer?

do you now need your nodes to be clustered + electing a leader and shipping writes there?

know fly.io did this with PG + Elixir but BEAM makes this type of stuff pretty easy

Reply


@0xbadcafebee 15 days

Replying to @dpeck 🎙

Things I would like in a database:

- All changes stored as diff trees with signed cryptographic hashes. I want to check out the state of the world at a specific commit, write a change, a week later write another change, revert the first change 3 weeks later. And I want it atomic and side-loaded with no performance hit or downtime.

- Register a Linux container as a UDF or stored procedure. Use with pub/sub to create data-adjacent arbitrary data processing of realtime data

- Fine-grained cryptographically-verified least-privilege access control. No i/o without a valid short-lived key linked to rules allowing specific record access.

- Virtual filesystem. I want to ls /db/sql/SELECT/name/IN/mycorp/myproduct/mysite/users/logged-in/WHERE/Country/EQUALS/USA. (Yes, this is stupid, but I still want it. I don't want to ever have to figure out how to connect to another not-quite-compatible SQL database again.)

Reply


About Us

site design / logo © 2022 Box Piper