So, I’m selfhosting immich, the issue is we tend to take a lot of pictures of the same scene/thing to later pick the best, and well, we can have 5~10 photos which are basically duplicates but not quite.
Some duplicate finding programs put those images at 95% or more similarity.

I’m wondering if there’s any way, probably at file system level, for the same images to be compressed together.
Maybe deduplication?
Have any of you guys handled a similar situation?

  • Lodra@programming.dev
    link
    fedilink
    English
    arrow-up
    1
    ·
    10 days ago

    That basic idea is roughly how compression works in general. Think zip, tar, etc. files. Identify snippets of highly used byte sequences and create a “map of where each sequence is used. These methods work great on simple types of data like text files where there’s a lot of repetition. Photos have a lot more randomness and tend not to compress as well. At least not so simply.

    You could apply the same methods to multiple image files but I think you’ll run into the same challenge. They won’t compress very well. So you’d have to come up with a more nuanced strategy. It’s a fascinating idea that’s worth exploring. But you’re definitely in the realm of advanced algorithms, file formats, and storage devices.

    That’s apparently my long response for “the other responses are right”

    • 31337@sh.itjust.works
      link
      fedilink
      English
      arrow-up
      1
      ·
      10 days ago

      Yeah, the image bytes are random because they’re already compressed (unless they’re bitmaps, which is not likely).