Comments (8)
Thanks. Being a bit of a kludger, I might just write a wrapper script that takes my list of n maildir folders and invokes mdedup the requisite n(n-1)/2 times to compare them all pairwise. Although n(n-1)/2 for my dataset is about 500, so it'll take a while to run.
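A rough sketch of such a wrapper, in case it helps anyone else. It assumes mdedup accepts the maildir paths as positional arguments; the function names and the `extra_args` pass-through are made up for illustration, and any strategy or action options would have to be supplied there by the user. It defaults to a dry run that only prints the commands.

```python
import itertools
import shlex
import subprocess


def pairwise_commands(folders, extra_args=()):
    """Build one mdedup command line per unordered pair of folders.

    Assumption: mdedup takes the maildir paths as positional arguments.
    `extra_args` is a hypothetical pass-through for whatever options you use.
    """
    return [
        ["mdedup", *extra_args, first, second]
        for first, second in itertools.combinations(folders, 2)
    ]


def run_all(folders, extra_args=(), dry_run=True):
    """Print (dry run) or execute each pairwise command in turn."""
    for command in pairwise_commands(folders, extra_args):
        if dry_run:
            # Show the command exactly as a shell would need it quoted.
            print(shlex.join(command))
        else:
            # Stop at the first failing invocation.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Three hypothetical folders give 3 * 2 / 2 = 3 invocations.
    run_all(["Mail/inbox", "Mail/archive", "Mail/lists"])
```

With n folders this produces n(n-1)/2 commands, so 32 folders would already mean 496 runs.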
from mail-deduplicate.
Thinking a bit further about those numbers, it looks like mdedup is trying to hold the entire contents of the maildir tree in RAM at once, which is surprising given that what it's trying to do is compare hashes of certain headers.
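To illustrate what I mean, here is a minimal sketch of comparing header hashes while keeping only one message in memory at a time. This is not how mail-deduplicate is implemented; it uses only the standard library, and the header list and function names are assumptions chosen for the example.

```python
import hashlib
import mailbox
from collections import defaultdict
from email.parser import BytesHeaderParser

# Hypothetical header set; not necessarily the headers mdedup hashes.
HEADERS = ("Message-ID", "Subject", "Date")


def header_digest(raw):
    """Hash the selected headers of one raw message (bytes)."""
    headers = BytesHeaderParser().parsebytes(raw)
    canonical = "\n".join(
        f"{name}:{str(headers.get(name, '')).strip()}" for name in HEADERS
    )
    return hashlib.sha256(canonical.encode("utf-8", "replace")).hexdigest()


def find_duplicates(maildir_paths):
    """Map each digest seen more than once to its (path, key) locations."""
    seen = defaultdict(list)
    for path in maildir_paths:
        box = mailbox.Maildir(path, factory=None, create=False)
        for key in box.iterkeys():
            # Only the current message is read; afterwards just the
            # digest and its location are retained.
            seen[header_digest(box.get_bytes(key))].append((path, key))
        box.close()
    return {digest: locs for digest, locs in seen.items() if len(locs) > 1}
```

Memory use then grows with the number of messages (one digest and location each) rather than with their total size.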
Yes, the mail-deduplicate implementation is quite naive and chokes on a non-trivial number of mails. The goal of this CLI was first to have it work before making it performant. We haven't reached that stage yet, which is why implementing a cache has been proposed for several years; see #87.
I do not have any time to work on mail-deduplicate right now. But feel free to propose PRs! :)
When in doubt, brute force it. If it works, it's not a kludge. And machine time is cheaper than developer time. 😁
Still, the commit history of that project indicates there's a non-zero chance of me refreshing the code base once a year. So if you're patient, you might see a new release of mail-deduplicate in a couple of months.
@shirosaki just proposed PR #562 to reduce the memory usage of mail-deduplicate. I just merged it upstream and will try to cut a release today.
Just released mail-deduplicate 7.3.0, with performance enhancements from @shirosaki.
I will close this issue for now, then.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.