Hacker News Re-Imagined

Elfshaker: Version control system fine-tuned for binaries

  • 491 points
  • 11 hours ago

  • @jim90
  • Created a post
  • 102 comments



@mrich • 10 hours

Replying to @jim90 🎙

I'm guessing this does not yield that high compression for release builds, where code can be optimized across translation units? Likewise when a commit changes a header that is included in many cpps?



@nh2 • 10 hours

Replying to @jim90 🎙

I experimented with something similar with a Linux distribution's package binary cache.

Using `bup` (deduplicating backup tool using git packfile format) I deduplicated 4 Chromium builds into the size of 1. It could probably pack thousands into the size of a few.

Large download/storage requirements for updates are one of NixOS's few drawbacks, and I think deduplication could solve that pretty much completely.

Details: https://github.com/NixOS/nixpkgs/issues/89380
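
For readers unfamiliar with how this kind of deduplication can work, here is a minimal sketch of content-defined chunking with a content-addressed chunk pool. It is not bup's actual algorithm or on-disk format: the gear-style rolling hash, the `MASK` value, and the names `chunks`, `store`, and `restore` are all assumptions chosen for illustration.

```python
import hashlib
import random

# Hypothetical parameters for illustration only; real tools use other values.
MASK = (1 << 11) - 1                      # cut roughly every 2 KiB on average
_rng = random.Random(0)                   # fixed seed keeps the table stable
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunks(data: bytes):
    """Split data at boundaries chosen by a rolling hash of recent bytes.

    Because a boundary depends only on the bytes just before it, inserting
    a few bytes early in a file shifts later boundaries along with the
    content, so later chunks stay byte-identical.
    """
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h & MASK == 0:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]


def store(data: bytes, pool: dict) -> list:
    """Add data to the pool; return the list of chunk hashes (the recipe)."""
    recipe = []
    for chunk in chunks(data):
        key = hashlib.sha256(chunk).hexdigest()
        pool.setdefault(key, chunk)       # identical chunks are stored once
        recipe.append(key)
    return recipe


def restore(recipe: list, pool: dict) -> bytes:
    """Rebuild the original bytes from a recipe."""
    return b"".join(pool[key] for key in recipe)
```

Storing two nearly identical builds through the same `pool` should cost little more than storing one, since only the chunks around the changed region are new.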



@veselink1 • 10 hours

Replying to @jim90 🎙

One of the authors here. We've opened a Q&A discussion on GitHub: https://github.com/elfshaker/elfshaker/discussions/58



@thristian • 11 hours

Replying to @jim90 🎙

This seems very much like the Git repository format, with loose objects being collected into compressed pack files - except I think Git has smarter heuristics about which files are likely to compress well together. It would be interesting to see a comparison between this tool and Git used to store the same collection of similar files.



@londons_explore • 7 hours

Replying to @jim90 🎙

I'd like to see a version of this built into things like IPFS.

It seems obvious that whenever something is saved into IPFS, there might be a similar object already stored. If there is, go make a diff, and only store the diff.
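
As a rough illustration of "only store the diff", the sketch below compresses a new object against a similar, already-stored one by using the old object as a zlib preset dictionary. This is a stand-in technique, not anything IPFS or elfshaker is known to do, and the function names are hypothetical. Deflate only looks back 32 KiB, so a real system would more likely use a dedicated binary delta tool such as xdelta or bsdiff.

```python
import zlib


def compress_against(new: bytes, similar: bytes) -> bytes:
    """Compress `new`, letting zlib reference byte runs found in `similar`."""
    compressor = zlib.compressobj(level=9, zdict=similar)
    return compressor.compress(new) + compressor.flush()


def decompress_against(blob: bytes, similar: bytes) -> bytes:
    """Recover the new object; needs the same `similar` bytes used to compress."""
    decompressor = zlib.decompressobj(zdict=similar)
    return decompressor.decompress(blob) + decompressor.flush()
```

The stored blob is only useful together with the object it was compressed against, which is the same trade-off any diff-based store makes: smaller storage in exchange for a dependency between objects.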



@mal10c • 7 hours

Replying to @jim90 🎙

This project reminded me of something I've been looking for for a while - although it's not exactly what I'm looking for...

I use SolidWorks PDM at work to control drawings, BOMs, test procedures, etc. In all honesty, PDM does an alright job when it works, but when I have problems with our local server, all hell breaks loose and worst case, the engineers can't move forward.

In that light, I'd love to switch to another option. Preferably something decentralized just to ensure we have more backups. Git almost gets us there but doesn't include things like "where used."

All that being said, am I overlooking some features of Elfshaker that would fit well into my hopes of finding an alternative to PDM?

I also see there's another HN thread that asks the question I'm asking - just not through the lens of Elfshaker: https://news.ycombinator.com/item?id=20644770



@erichocean • 2 hours

Replying to @jim90 🎙

Seems like the Nix people would be interested in enabling this kind of thing for Nix packages…



@lxpz • 10 hours

Replying to @jim90 🎙

This should be integrated with Cargo to reduce the size of the target directories which are becoming ridiculously large.



@lxe • 5 hours

Replying to @jim90 🎙

> There are many files,

> Most of them don't change very often so there are a lot of duplicate files,

> When they do change, the deltas of the [binaries] are not huge.

We need this but for node_modules



@i_like_waiting • 10 hours

Replying to @jim90 🎙

Thanks, this seems like it could be a good solution for storing daily DB backups. I didn't know I needed it, but it seems like I do.



@goodpoint • 8 hours

Replying to @jim90 🎙

I'm surprised nobody mentioned git-annex. It does the same using git for metadata. It's extremely efficient.



@jankotek • 10 hours

Replying to @jim90 🎙

Does it make sense to turn it into a FUSE filesystem, with transparent deduplication?



@mhx77 • 7 hours

Replying to @jim90 🎙

Somewhat related (and definitely born out of a very similar use case): https://github.com/mhx/dwarfs

I initially built this for having access to 1000+ Perl installations (spanning decades of Perl releases). The compression in this case is not quite as impressive (50 GiB to around 300 MiB), but access times are typically in the millisecond region.



@tttsxhub • 9 hours

Replying to @jim90 🎙

Why does it depend on the CPU architecture?



@henvic • 10 hours

Replying to @jim90 🎙

Interesting. I wonder if this can also be [ab]used to, say, deliver deltas of programs, so that you can have faster updates, but maybe it doesn't make sense.

https://en.wikipedia.org/wiki/Binary_delta_compression



@wlll • 11 hours

Replying to @jim90 🎙

Related, and impressive: https://github.com/elfshaker/manyclangs

> manyclangs is a project enabling you to run any commit of clang within a few seconds, without having to build it.

> It provides elfshaker pack files, each containing ~2000 builds of LLVM packed into ~100MiB. Running any particular build takes about 4s.



@yincrash • 9 hours

Replying to @jim90 🎙

Could this be useful for packing Xcode's DerivedData folder for caching in CI builds?



@svilen_dobrev • 9 hours

Replying to @jim90 🎙

Will any of this work for (compressed) variants of audio? They're never the same.



@cyounkins • 8 hours

Replying to @jim90 🎙

Cool! I wonder how this would compare to ZFS deduplication.



@bogwog • 10 hours

Replying to @jim90 🎙

Does this work well with image files? (PNG, JPEG, etc)



@ghoul2 • 6 hours

Replying to @jim90 🎙

If I already have, let's say, a 100 MB pack file containing 200 builds of clang, and then I import the 201st build into that pack file, is it possible to send a small delta of the updated pack file to someone who already has the older pack file (with 200 builds), so that they can apply the delta to the old pack and get the new pack containing 201 builds?



@carlmr • 11 hours

Replying to @jim90 🎙

I find the description a bit confusing. Is there an example where we can see the usage?



@xpe • 8 hours

Replying to @jim90 🎙

Never shake a baby elf!



@0942v8653 • 9 hours

Replying to @jim90 🎙

Does it do any architecture-specific processing, i.e. BCJ filter? Or is there a generic version of this? The performance seems quite good.



@999900000999 • 25 minutes

Replying to @jim90 🎙

Would love to see this work with Unity projects.

Right now git lfs takes up so much space when storing files locally.



@dilap • 11 hours

Replying to @jim90 🎙

Huh, interesting, could you maybe use this as an in-repo alternative to something like git-lfs?



@axismundi • 6 hours

Replying to @jim90 🎙

Does it work on Intel Macs?


