
Comments (8)

pechfunk commented on August 25, 2024

It looks like the problem is in the DedupMail constructor which tries to auto-detect which of the superclasses is the one that contributes Message-ness.

    def __init__(self, message=None):
        """Initialize a pre-parsed ``Message`` instance the same way the default
        factory in Python's ``mailbox`` module does.
        """
        # Hunt down in our parent classes (excluding ourselves) the first one inheriting
        # from the mailbox.Message class. That way we can get to the original factory.
        orig_message_klass = None
        for klass in inspect.getmro(self.__class__)[1:]:
            if issubclass(klass, mailbox.Message):
                orig_message_klass = klass
                break
        assert orig_message_klass

        # Call the original object initialization from the message class that
        # inherits from mailbox.Message.
        super(orig_message_klass, self).__init__(message)

Now when the search finds a Message-like class orig_message_klass, the call super(orig_message_klass, self) starts the method lookup at the class that follows orig_message_klass in the MRO, so orig_message_klass's own __init__ is skipped. For Maildir messages this means the plain Message constructor gets called, but MaildirMessage's does not.
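To make the effect concrete, here is a minimal reproduction sketch. BuggyMail and BuggyMaildirMail are hypothetical stand-ins, not the project's real classes; the sketch assumes the real mail class is combined with mailbox.MaildirMessage through multiple inheritance in roughly this way.

```python
import inspect
import mailbox


class BuggyMail:
    """Hypothetical stand-in that keeps only the constructor logic quoted above."""

    def __init__(self, message=None):
        orig_message_klass = None
        for klass in inspect.getmro(self.__class__)[1:]:
            if issubclass(klass, mailbox.Message):
                orig_message_klass = klass
                break
        assert orig_message_klass
        # super(K, self) looks up __init__ starting *after* K in the MRO,
        # so K.__init__ itself never runs.
        super(orig_message_klass, self).__init__(message)


class BuggyMaildirMail(BuggyMail, mailbox.MaildirMessage):
    """Hypothetical combination of the mixin with the stdlib Maildir message."""


mail = BuggyMaildirMail()
has_subdir = hasattr(mail, "_subdir")
print(has_subdir)  # prints False: MaildirMessage.__init__ was skipped
```

Calling mail.get_subdir() on such an object raises the AttributeError seen in the stacktrace reported in this thread.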

I've tried to repair the clever construction in PR #222. I'm not sure that the cleverness is necessary here, with only a handful of message classes to support and little innovation going on in the field of Mbox dialects in general. But at least mdedup runs for me again!
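For illustration only, one possible way to avoid the skipped constructor is to call the detected class's __init__ directly instead of going through super(). This is a sketch under that assumption and not necessarily what the PR does; FixedMail and FixedMaildirMail are hypothetical names.

```python
import inspect
import mailbox


class FixedMail:
    """Hypothetical stand-in showing one way to reach the right constructor."""

    def __init__(self, message=None):
        # Find the first parent class (excluding ourselves) that is a mailbox.Message.
        orig_message_klass = next(
            klass
            for klass in inspect.getmro(type(self))[1:]
            if issubclass(klass, mailbox.Message)
        )
        # Call that class's own __init__, rather than its successor's in the MRO.
        orig_message_klass.__init__(self, message)


class FixedMaildirMail(FixedMail, mailbox.MaildirMessage):
    """Hypothetical combination of the mixin with the stdlib Maildir message."""


mail = FixedMaildirMail()
subdir = mail.get_subdir()
print(subdir)  # prints new
```

With only a handful of message classes to support, an explicit mapping from box type to message class would be a simpler alternative to this auto-detection.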

from mail-deduplicate.

alisraza commented on August 25, 2024

I think this is likely the same as #135 as @kaz-yos pointed out.

Description:

Running the following command:

"$basedir"/code/forks/mail-deduplicate/.venv/bin/mdedup \
	--input-format maildir \
	--size-threshold 0 \
	--content-threshold 0 \
	--strategy discard-all-but-one \
	--action move-selected \
	--export "$output_path" \
	--export-format maildir \
	--verbosity debug \
	"$mail_source_1" "$mail_source_2"

Yields (truncated output):

● Phase #3 - Perform action on selected mails
Perform move-selected action...
232 mails selected for action.
Creating new maildir box at [$output_path] ...
debug: Locking box...
debug: Move <MaildirDedupMail ["$mail_source_1"]:[NNNNNNNNNN].[NNNNN]_[NNN].["$hostname"],U=[NNN]> form ["$mail_source_1"] to ["$output_path"]...

With stacktrace:

  File "["$basedir"]/code/forks/mail-deduplicate/mail_deduplicate/cli.py", line 388, in mdedup
    perform_action(dedup)
  File "["$basedir"]/code/forks/mail-deduplicate/mail_deduplicate/action.py", line 114, in perform_action
    method(dedup)
  File "["$basedir"]/code/forks/mail-deduplicate/mail_deduplicate/action.py", line 62, in move_selected
    box.add(mail)
  File "[~]/.pyenv/versions/3.7.10/lib/python3.7/mailbox.py", line 300, in add
    subdir = message.get_subdir()
  File "[~]/.pyenv/versions/3.7.10/lib/python3.7/mailbox.py", line 1537, in get_subdir
    return self._subdir
AttributeError: 'MaildirDedupMail' object has no attribute '_subdir'

Debugging Information:

Coding is a side hobby and I haven't looked at Python code for a while, but from stepping through the code, my best guess is this: when the mail object is created as a subclass, it may be running the __init__ function of the Python standard library's Message class rather than that of the MaildirMessage class. For reference, the MaildirMessage class and its __init__ function are:

class MaildirMessage(Message):
    """Message with Maildir-specific properties."""

    _type_specific_attributes = ['_subdir', '_info', '_date']

    def __init__(self, message=None):
        """Initialize a MaildirMessage instance."""
        self._subdir = 'new'
        self._info = ''
        self._date = time.time()
        Message.__init__(self, message)

However, based on the stacktrace, when I look at action.py in the move_selected function:

def move_selected(dedup):
    # truncated [...]
            box.add(mail)
            dedup.sources[mail.source_path].remove(mail.mail_id)
            logger.info(f"{mail!r} copied.")
    # truncated [...]

When pausing at box.add(mail), not only does the box object have the mailbox.Maildir class, but the mail object has the MaildirDedupMail class, which appears to be correct, although it is indeed missing the mail._subdir attribute. I would need more time to look into how mail is instantiated, but I hope the information thus far is somewhat helpful. I may be slow to respond in the next few days, but I appreciate anyone who is able to look into this issue.
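When paused at a breakpoint like this, a small helper can confirm which format-specific attributes are missing. The sketch below is a hypothetical debugging aid, not part of mdedup. It relies on the private _type_specific_attributes list shown in the MaildirMessage class above, which is an implementation detail of the standard library and may change.

```python
import mailbox


def missing_type_attributes(mail):
    """Return the format-specific attributes that `mail` should have but lacks."""
    expected = getattr(type(mail), "_type_specific_attributes", [])
    return [name for name in expected if name not in vars(mail)]


# A correctly initialised message has everything it needs.
ok = mailbox.MaildirMessage()
ok_missing = missing_type_attributes(ok)
print(ok_missing)  # prints []

# Simulate the suspected failure: only the base Message constructor runs.
broken = mailbox.MaildirMessage.__new__(mailbox.MaildirMessage)
mailbox.Message.__init__(broken)
broken_missing = missing_type_attributes(broken)
print(broken_missing)  # lists '_subdir' among the missing attributes
```

Running missing_type_attributes(mail) on the MaildirDedupMail object in the debugger would show whether _info and _date are absent as well, which would support the guess that MaildirMessage.__init__ never ran.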

Additional Information:

Code running with cwd as "$basedir"/code/forks/mail-deduplicate.
Virtual environment created with poetry install in .venv subdir.

poetry --version
# Poetry version 1.1.4
python --version
# Python 3.7.10
pyenv version
# 3.7.10 (set by "$basedir"/code/forks/mail-deduplicate/.python-version)
"$basedir"/code/forks/mail-deduplicate/.venv/bin/mdedup --version
# mdedup 6.1.3
# {'username': '-', 'guid': '82f4afc3ac75c9fa8c7849ab3364986', 'hostname': '-', 'hostfqdn': '-', 'uname': {'system': 'Linux', 'node': '-', 'release': '5.10.16-arch1-1', 'version': '#1 SMP PREEMPT Sat, 13 Feb 2021 20:50:18 +0000', 'machine': 'x86_64', 'processor': ''}, 'linux_dist_name': 'arch', 'linux_dist_version': 'Arch', 'cpu_count': 8, 'fs_encoding': 'utf-8', 'ulimit_soft': 8192, 'ulimit_hard': 524288, 'cwd': '-', 'umask': '0o2', 'python': {'argv': '-', 'bin': '-', 'version': '3.7.10 (default, Feb 18 2021, 17:50:07) [GCC 10.2.0]', 'compiler': 'GCC 10.2.0', 'build_date': 'Feb 18 2021 17:50:07', 'version_info': [3, 7, 10, 'final', 0], 'features': {'openssl': 'OpenSSL 1.1.1j  16 Feb 2021', 'expat': 'expat_2.2.8', 'sqlite': '3.34.1', 'tkinter': '', 'zlib': '1.2.11', 'unicode_wide': True, 'readline': True, '64bit': True, 'ipv6': True, 'threading': True, 'urandom': True}}, 'time_utc': '2021-02-19 10:10:34.969315', 'time_utc_offset': -5.0, '_eco_version': '1.0.1'}

For convenience, corresponding JSON:

{
	"username": "-",
	"guid": "82f4afc3ac75c9fa8c7849ab3364986",
	"hostname": "-",
	"hostfqdn": "-",
	"uname": {
		"system": "Linux",
		"node": "-",
		"release": "5.10.16-arch1-1",
		"version": "#1 SMP PREEMPT Sat, 13 Feb 2021 20:50:18 +0000",
		"machine": "x86_64",
		"processor": ""
	},
	"linux_dist_name": "arch",
	"linux_dist_version": "Arch",
	"cpu_count": 8,
	"fs_encoding": "utf-8",
	"ulimit_soft": 8192,
	"ulimit_hard": 524288,
	"cwd": "-",
	"umask": "0o2",
	"python": {
		"argv": "-",
		"bin": "-",
		"version": "3.7.10 (default, Feb 18 2021, 17:50:07) [GCC 10.2.0]",
		"compiler": "GCC 10.2.0",
		"build_date": "Feb 18 2021 17:50:07",
		"version_info": [3, 7, 10, "final", 0],
		"features": {
			"openssl": "OpenSSL 1.1.1j  16 Feb 2021",
			"expat": "expat_2.2.8",
			"sqlite": "3.34.1",
			"tkinter": "",
			"zlib": "1.2.11",
			"unicode_wide": true,
			"readline": true,
			"64bit": true,
			"ipv6": true,
			"threading": true,
			"urandom": true
		}
	},
	"time_utc": "2021-02-19 10:10:34.969315",
	"time_utc_offset": -5.0,
	"_eco_version": "1.0.1"
}

Thank you!


kdeldycke commented on August 25, 2024

little innovation in the field of Mbox dialects going on in general

Indeed! I apologize for that part being well over-engineered. I wanted it to be future-proof, with the vague idea of extending it to other sources of mail (Gmail? S3?). But it ended up increasing complexity with little benefit.

Anyway, thanks a lot @pechfunk for diving deep into the root cause and proposing a fix! I just merged it upstream and will try to cut a new release.


dschrempf commented on August 25, 2024

So it turns out this was caused by me executing mdedup from outside the virtual environment. Pretty stupid that this can actually be done :).


dschrempf commented on August 25, 2024

Sorry, I have to reopen. This was not my fault. The error does not happen when using -n.


kaz-yos commented on August 25, 2024

Same as #135?

I got the same error with version 6.1.2.

mdedup 6.1.2
{'username': '-', 'guid': '7d002aa8ff457a7721f6a7ad164505f', 'hostname': '-', 'hostfqdn': '-', 'uname': {'system': 'Darwin', 'node': '-', 'release': '20.3.0', 'version': 'Darwin Kernel Version 20.3.0: Thu Jan 21 00:07:06 PST 2021; root:xnu-7195.81.3~1/RELEASE_X86_64', 'machine': 'x86_64', 'processor': 'i386'}, 'linux_dist_name': '', 'linux_dist_version': '', 'cpu_count': 12, 'fs_encoding': 'utf-8', 'ulimit_soft': 256, 'ulimit_hard': 9223372036854775807, 'cwd': '-', 'umask': '0o2', 'python': {'argv': '-', 'bin': '-', 'version': '3.7.1 (default, Oct 23 2018, 14:07:42) [Clang 4.0.1 (tags/RELEASE_401/final)]', 'compiler': 'Clang 4.0.1 (tags/RELEASE_401/final)', 'build_date': 'Oct 23 2018 14:07:42', 'version_info': [3, 7, 1, 'final', 0], 'features': {'openssl': 'OpenSSL 1.1.1d  10 Sep 2019', 'expat': 'expat_2.2.6', 'sqlite': '3.25.3', 'tkinter': '8.6', 'zlib': '1.2.11', 'unicode_wide': True, 'readline': True, '64bit': True, 'ipv6': True, 'threading': True, 'urandom': True}}, 'time_utc': '2021-02-10 23:31:27.276025', 'time_utc_offset': -5.0, '_eco_version': '1.0.1'}


kaz-yos commented on August 25, 2024

@alisraza, thanks for the detailed investigation!


github-actions commented on August 25, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
