Comments (8)

danielhatton commented on July 23, 2024

Thanks. Being a bit of a kludger, I might just write a wrapper script that takes my list of n maildir folders, and invokes mdedup the requisite n(n-1)/2 times to compare them all pairwise. Although n(n-1)/2 for my dataset is about 500, so it'll take a while to run.
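A minimal sketch of such a wrapper, assuming that `mdedup` accepts two maildir paths as positional arguments and that any further options are supplied through the hypothetical `base_cmd` parameter; check `mdedup --help` for the installed version before relying on it:

```python
import itertools
import subprocess
import sys


def pairwise_commands(folders, base_cmd=("mdedup",)):
    """Yield one argv list per unordered pair of folders.

    For n folders this produces n(n-1)/2 commands. `base_cmd` is an
    illustrative parameter: put the executable and whatever options
    you need there (none are assumed here).
    """
    for first, second in itertools.combinations(folders, 2):
        yield [*base_cmd, first, second]


def run_all(folders, base_cmd=("mdedup",)):
    """Run every pairwise command in sequence, stopping on the first failure."""
    for argv in pairwise_commands(folders, base_cmd):
        print("running:", " ".join(argv), file=sys.stderr)
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Usage: python pairwise_dedup.py FOLDER_1 FOLDER_2 ... FOLDER_N
    run_all(sys.argv[1:])
```

Each pair runs as a separate process, so peak memory is bounded by the two largest folders rather than by the whole tree, at the cost of reading every folder n-1 times.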

from mail-deduplicate.

danielhatton commented on July 23, 2024

Thinking a bit further about those numbers, it looks like mdedup is trying to hold the entire contents of the maildir tree in RAM at once, which is surprising given that what it's trying to do is compare hashes of certain headers.
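For comparison, here is a rough sketch of what a low-memory approach could look like: read only the header block of each message, hash a few headers, and keep nothing but the digests. This is not how mail-deduplicate is implemented; the header list in `HEADERS` and the function names are assumptions made for illustration, and only the standard-library `mailbox`, `email`, and `hashlib` modules are used.

```python
import hashlib
import mailbox
from collections import defaultdict
from email.parser import BytesParser

# Assumption: an illustrative set of headers, not mail-deduplicate's actual list.
HEADERS = ("Date", "From", "To", "Subject", "Message-ID")


def read_header_block(fp):
    """Read lines up to the first blank line, leaving the body unread."""
    lines = []
    for line in iter(fp.readline, b""):
        if line in (b"\n", b"\r\n"):
            break
        lines.append(line)
    return b"".join(lines)


def header_digest(raw_headers):
    """Hash the selected headers after collapsing runs of whitespace."""
    msg = BytesParser().parsebytes(raw_headers, headersonly=True)
    canonical = "\n".join(
        f"{name}: {' '.join(str(msg.get(name, '')).split())}" for name in HEADERS
    )
    return hashlib.sha256(canonical.encode("utf-8", "replace")).hexdigest()


def index_maildir(path):
    """Map each header digest to the list of maildir keys that share it."""
    box = mailbox.Maildir(path, create=False)
    index = defaultdict(list)
    for key in box.iterkeys():
        fp = box.get_file(key)
        try:
            index[header_digest(read_header_block(fp))].append(key)
        finally:
            fp.close()
    return index
```

Any digest that maps to more than one key marks a candidate set of duplicates. Memory use grows with the number of messages (one digest and one key each), not with their total size.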

kdeldycke commented on July 23, 2024

Yes, the mail-deduplicate implementation is quite naive and chokes on any non-trivial volume of mail. The goal for this CLI was to make it work first, before making it performant. We haven't reached that stage yet, which is why implementing a cache has been proposed for several years; see #87.

I do not have any time to work on mail-deduplicate right now. But feel free to propose PRs! :)

kdeldycke commented on July 23, 2024

When in doubt, brute force it. If it works, it's not a kludge. And machine time is cheaper than developer time. 😁

kdeldycke commented on July 23, 2024

Still, the commit history of that project indicates there's a non-zero chance of me refreshing the code base once a year. So if you're patient, you might see a new release of mail-deduplicate in a couple of months.

kdeldycke commented on July 23, 2024

@shirosaki just proposed PR #562 to reduce the memory usage of mail-deduplicate. I just merged it upstream and will try to cut a release today.

kdeldycke commented on July 23, 2024

Just released mail-deduplicate 7.3.0, with performance enhancements from @shirosaki.

I will close this issue for now, then.

github-actions commented on July 23, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
