I'm guessing this doesn't yield very high compression for release builds, where code can be optimized across translation units? Likewise when a commit changes a header that is included in many .cpp files?
I experimented with something similar with a Linux distribution's package binary cache.
Using `bup` (a deduplicating backup tool that uses the git packfile format), I deduplicated 4 Chromium builds into the size of 1. It could probably pack thousands into the size of a few.
Large download/storage requirements for updates are one of NixOS's few drawbacks, and I think deduplication could solve that pretty much completely.
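A minimal sketch of chunk-level deduplication of the kind described, under stated assumptions: `bup` splits data with its own rolling checksum, whereas this sketch swaps in a Gear-style rolling hash, and `chunks`, `ChunkStore`, the 32-byte window and the ~4 KiB average chunk size are all hypothetical choices. Because cut points depend only on nearby content, an edit in one place changes only the chunks around it, and every other chunk of a new build is already in the store.

```python
import hashlib
import random

# Hypothetical parameters: a 32-bit Gear rolling hash with a cut point wherever
# the top 12 bits are zero (roughly one cut per 4 KiB of random-looking input).
_rng = random.Random(42)
GEAR = [_rng.getrandbits(32) for _ in range(256)]
CUT_MASK = 0xFFF00000


def chunks(data: bytes):
    """Yield content-defined chunks of `data`."""
    h, start = 0, 0
    for i, byte in enumerate(data):
        # Each step shifts older bytes left; after 32 steps they fall off the
        # 32-bit word, so h depends only on the last 32 bytes of content.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if (h & CUT_MASK) == 0:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]


class ChunkStore:
    """Content-addressed store: each distinct chunk is kept exactly once."""

    def __init__(self):
        self.blobs = {}  # SHA-256 hex digest -> chunk bytes

    def put(self, data: bytes) -> list:
        """Store `data`; return the list of chunk IDs needed to rebuild it."""
        ids = []
        for chunk in chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            self.blobs.setdefault(digest, chunk)  # duplicate chunks cost nothing
            ids.append(digest)
        return ids

    def get(self, ids: list) -> bytes:
        """Reassemble an object from its chunk IDs."""
        return b"".join(self.blobs[d] for d in ids)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.blobs.values())
```

Storing a build and then a slightly edited copy of it should cost only a little more than the first build alone, since only the chunks around each edit are new.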
One of the authors here; we've opened a Q&A discussion on GitHub: https://github.com/elfshaker/elfshaker/discussions/58.
This seems very much like the Git repository format, with loose objects being collected into compressed pack files - except I think Git has smarter heuristics about which files are likely to compress well together. It would be interesting to see a comparison between this tool and Git used to store the same collection of similar files.
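For reference on the Git side of that comparison: Git names every object by a hash of its contents, so identical files collapse into one object. Loose objects are zlib-compressed one at a time; delta compression between similar objects only happens when `git gc` or `git repack` writes a pack file, and that delta-selection step is not shown here. A minimal sketch of the object naming, assuming the default SHA-1 object format (`git_blob_id` and `loose_blob` are illustrative helper names, not part of any Git library):

```python
import hashlib
import zlib


def git_blob_id(content: bytes) -> str:
    """Object ID Git computes for a file's contents (SHA-1 object format)."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def loose_blob(content: bytes) -> tuple:
    """Return (path, bytes) that Git writes for a loose blob object."""
    header = b"blob %d\x00" % len(content)
    oid = hashlib.sha1(header + content).hexdigest()
    # Loose objects live at .git/objects/<first 2 hex digits>/<remaining 38>
    # and are zlib-compressed individually, without deltas.
    return f".git/objects/{oid[:2]}/{oid[2:]}", zlib.compress(header + content)
```

For example, `git_blob_id(b"hello\n")` matches what `echo hello | git hash-object --stdin` prints.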
I'd like to see a version of this built into things like IPFS.
It seems obvious that whenever something is saved into IPFS, a similar object might already be stored. If there is one, compute a diff and store only the diff.
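A minimal sketch of that idea, with loudly stated assumptions: Python's `difflib.SequenceMatcher` stands in for a real binary delta encoder, the store is a plain in-memory dict rather than anything IPFS provides, and `DeltaStore`, `make_delta`, `apply_delta` and the 0.5 similarity threshold are all hypothetical. The brute-force similarity scan over every stored object is the part a real system would have to replace (for example with similarity hashing) to scale.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.5  # hypothetical cut-off for "similar enough"


def make_delta(base: bytes, new: bytes) -> list:
    """Encode `new` as copy/insert operations relative to `base`."""
    ops = []
    matcher = SequenceMatcher(None, base, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # bytes already present in base
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # literal bytes missing from base
        # "delete": bytes only in base, nothing to emit
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the new object from `base` and a delta from make_delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


class DeltaStore:
    """Hypothetical store that saves each object as a delta against the most
    similar object already stored, or in full if nothing is similar enough."""

    def __init__(self):
        self.objects = {}  # key -> (base key or None, full bytes or delta ops)

    def get(self, key: str) -> bytes:
        base_key, payload = self.objects[key]
        if base_key is None:
            return payload
        return apply_delta(self.get(base_key), payload)

    def put(self, key: str, data: bytes) -> None:
        best_key, best_ratio = None, 0.0
        for other in self.objects:  # brute-force similarity search
            ratio = SequenceMatcher(None, self.get(other), data, autojunk=False).ratio()
            if ratio > best_ratio:
                best_key, best_ratio = other, ratio
        if best_key is None or best_ratio < SIMILARITY_THRESHOLD:
            self.objects[key] = (None, data)
        else:
            self.objects[key] = (best_key, make_delta(self.get(best_key), data))
```

Putting an object and then a slightly modified copy stores the second one as a handful of copy operations plus only the changed bytes.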
This project reminded me of something I've been looking for for a while - although it's not exactly what I'm looking for...
I use SolidWorks PDM at work to control drawings, BOMs, test procedures, etc. In all honesty, PDM does an alright job when it works, but when I have problems with our local server, all hell breaks loose and, in the worst case, the engineers can't move forward.
In that light, I'd love to switch to another option. Preferably something decentralized just to ensure we have more backups. Git almost gets us there but doesn't include things like "where used."
All that being said, am I overlooking some features of Elfshaker that would fit well into my hopes of finding an alternative to PDM?
I also see there's another HN thread that asks the question I'm asking - just not through the lens of Elfshaker: https://news.ycombinator.com/item?id=20644770
Seems like the Nix people would be interested in enabling this kind of thing for Nix packages…
This should be integrated with Cargo to reduce the size of `target` directories, which are becoming ridiculously large.
> There are many files,
> Most of them don't change very often so there are a lot of duplicate files,
> When they do change, the deltas of the [binaries] are not huge.
We need this, but for node_modules.
Thanks, this seems like it could be a good solution for storing daily DB backups. I didn't know I needed it, but it seems like I do.
I'm surprised nobody has mentioned git-annex. It does the same thing, using git for metadata. It's extremely efficient.
Does it make sense to turn it into a FUSE filesystem with transparent deduplication?
Somewhat related (and definitely born out of a very similar use case): https://github.com/mhx/dwarfs
I initially built this to have access to 1000+ Perl installations (spanning decades of Perl releases). The compression in this case is not quite as impressive (50 GiB to around 300 MiB), but access times are typically in the millisecond range.
Why does it depend on the CPU architecture?
Interesting. I wonder if this could also be [ab]used to, say, deliver deltas of programs for faster updates, but maybe that doesn't make sense.
Related, and impressive: https://github.com/elfshaker/manyclangs
> manyclangs is a project enabling you to run any commit of clang within a few seconds, without having to build it.
> It provides elfshaker pack files, each containing ~2000 builds of LLVM packed into ~100MiB. Running any particular build takes about 4s.
Could this be useful for packing Xcode's DerivedData folder for caching in CI builds?
Will any of this work for (compressed) audio files? They're never the same...
Cool! I wonder how this would compare to ZFS deduplication.
Does this work well with image files? (PNG, JPEG, etc)
If I already have, let's say, a 100MB pack file containing 200 builds of clang, and I then import a 201st build into that pack file, is it possible to send a small delta of the updated pack file to someone who already has the older pack (with 200 builds), so that they can apply the delta to the old pack and get the new pack containing 201 builds?
I find the description a bit confusing. Is there an example where we can see the usage?
Never shake a baby elf!
Does it do any architecture-specific processing, e.g. a BCJ filter? Or is there a generic version of this? The performance seems quite good.
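For context on the question: a BCJ ("branch/call/jump") filter, such as the x86 one in xz, rewrites the relative operand of each CALL instruction into an absolute target address, so repeated calls to the same function produce identical bytes that a compressor can match. This is a simplified sketch, not xz's filter (which also handles E9 JMPs and applies extra heuristics before converting); it treats every 0xE8 byte as a CALL, and it makes no claim about whether elfshaker does anything like this. `bcj_x86` is a hypothetical name.

```python
def bcj_x86(data: bytes, encode: bool) -> bytes:
    """Simplified BCJ-style x86 filter (a sketch, not xz's exact filter).

    Every 0xE8 byte is treated as a CALL opcode followed by a little-endian
    rel32 operand. Encoding turns the operand into an absolute target address
    (relative to the start of `data`); decoding reverses it.
    """
    buf = bytearray(data)
    i = 0
    while i + 5 <= len(buf):
        if buf[i] == 0xE8:
            operand = int.from_bytes(buf[i + 1:i + 5], "little")
            next_ip = i + 5  # rel32 is measured from the next instruction
            if encode:
                operand = (operand + next_ip) & 0xFFFFFFFF
            else:
                operand = (operand - next_ip) & 0xFFFFFFFF
            buf[i + 1:i + 5] = operand.to_bytes(4, "little")
            i += 5  # skip the operand so encode and decode visit the same bytes
        else:
            i += 1
    return bytes(buf)
```

After encoding, two CALLs to the same function at different offsets carry identical operand bytes, and decoding restores the original exactly because only operand bytes change and both directions skip them.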
Would love to see this work with Unity projects.
Right now Git LFS takes up so much space when storing files locally.
Huh, interesting, could you maybe use this as an in-repo alternative to something like git-lfs?
Does it work on Intel Macs?