Need more flexibility in defining Subject Line and S-D regex in .rc file (Was: RegEx suddenly does not match anymore) about dupreport HOT 16 CLOSED

MrFloppy commented on July 20, 2024
Need more flexibility in defining Subject Line and S-D regex in .rc file (Was: RegEx suddenly does not match anymore)

from dupreport.

Comments (16)

HandyGuySoftware commented on July 20, 2024

Hi @MrFloppy , apologies for the error.

Looking at the source code, the comments aren't very helpful:

# Correct subject but delimeter not found. Something is wrong.

Can you please paste the log file section where the error is occurring, starting with the line that reads:

'Processing next message on server XXX. Protocol=XXX' and ending with the same line after the error occurs? That will give the full log for processing that particular email message.

Also, please include the dupreport.rc config lines for the following:

subjectregex = XXX
srcregex = XXX
destregex = XXX
srcdestdelimiter = XXX

Let's see if that gives any better clue as to what's happening.

MrFloppy commented on July 20, 2024

Nothing to apologize for ;). Most probably the error is somewhere on my side.

Yes, I went through the sources, too.

Here is the extract from the log file:

[2022-02-14T17:14:48.687994][NOTICE][EmailServer][processNextMessage]Processing next message on server XXX. Protocol=imap
[2022-02-14T17:14:48.688020][NOTICE][EmailServer][connect]Connecting to email server 'XXX'
[2022-02-14T17:14:48.708485][NOTICE][EmailServer][extractHeaders]Extracting headers from (Date: Mon, 14 Feb 2022 17:05:14 +0100
Subject: Duplicati Success, Backup report for OMV_traefik-HiDrive_traefik
Message-Id: <ARA44UDU2GU4.YF02Q8K44EZH3@6b40b52cee17>
Content-Transfer-Encoding: 7bit

)
[2022-02-14T17:14:48.708557][NOTICE][EmailServer][extractHeaders]Header fields extracted: [{'date': 'Mon, 14 Feb 2022 17:05:14 +0100', 'subject': 'Duplicati Success, Backup report for OMV_traefik-HiDrive_traefik', 'message-id': '<ARA44U$
[2022-02-14T17:14:48.709589][NOTICE][EmailServer][processNextMessage]Correct subject 'Duplicati Success, Backup report for OMV_traefik-HiDrive_traefik' but regex doesn't match ^Duplicati (Success|Error|Warn), Backup report for (\w*)-(\w$
[

I stated the config options above. Somehow they got scrambled:

subjectregex = ^Duplicati (Success|Error|Warn), Backup report for
srcregex = \w*
destregex = \w*
srcdestdelimiter = -

Thank you for your prompt response and help!

HandyGuySoftware commented on July 20, 2024

Would it be possible to run dupReport again using the '-v7' option and post the same segment? That will give a fuller (debug) log picture and hopefully help uncover the problem.

After you've done that, you might try changing your options in the .rc file to the following:

srcregex = \w+
destregex = \w+

That slightly changes the way the program searches for the source and destination. That's what I use on my systems and it seems to work well.
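For context, the difference between the two quantifiers can be seen directly in Python's `re` module. This is only an illustration with made-up patterns, not dupReport's actual parsing code: `\w*` accepts an empty source or destination, while `\w+` requires at least one word character on each side of the delimiter.

```python
import re

# Illustrative patterns only; not taken from dupReport's source.
star = re.compile(r"^report for (\w*)-(\w*)$")
plus = re.compile(r"^report for (\w+)-(\w+)$")

empty_names = "report for -"  # source and destination both missing
normal = "report for OMV_traefik-HiDrive_traefik"

print(bool(star.match(empty_names)))  # True: \w* tolerates empty names
print(bool(plus.match(empty_names)))  # False: \w+ needs at least one character
print(plus.match(normal).groups())    # ('OMV_traefik', 'HiDrive_traefik')
```

For subjects that always carry non-empty names, as in this thread, both variants capture the same text.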

MrFloppy commented on July 20, 2024

First I changed the parameters. The result stays the same.

Below is the relevant part of the log file:

[2022-02-15T09:02:47.145099][NOTICE][EmailServer][connect]Connecting to email server 'XXX'
[2022-02-15T09:02:47.145153][DEBUG][EmailServer][connect]serverconnect=[<imaplib.IMAP4_SSL object at 0x7f6543111850>] keepalive=[False]
[2022-02-15T09:02:47.170080][DEBUG][EmailServer][processNextMessage]Server.fetch(): retVal=[OK] data=[[(b'1 (BODY[HEADER.FIELDS (DATE SUBJECT MESSAGE-ID CONTENT-TRANSFER-ENCODING)] {204}', b'Date: Tue, 15 Feb 2022 09:02:33 +0100\r\nSubject: Duplicati Success, Backup report for OMV_traefik-HiDrive_traefik\r\nMessage-Id: <IRFXIPP13GU4.6QXS9AZ4L14J2@6b40b52cee17>\r\nContent-Transfer-Encoding: 7bit\r\n\r\n'), b')']]
[2022-02-15T09:02:47.170135][NOTICE][EmailServer][extractHeaders]Extracting headers from (Date: Tue, 15 Feb 2022 09:02:33 +0100
Subject: Duplicati Success, Backup report for OMV_traefik-HiDrive_traefik
Message-Id: <IRFXIPP13GU4.6QXS9AZ4L14J2@6b40b52cee17>
Content-Transfer-Encoding: 7bit

)
[2022-02-15T09:02:47.170188][NOTICE][EmailServer][extractHeaders]Header fields extracted: [{'date': 'Tue, 15 Feb 2022 09:02:33 +0100', 'subject': 'Duplicati Success, Backup report for OMV_traefik-HiDrive_traefik', 'message-id': '<IRFXIPP13GU4.6QXS9AZ4L14J2@6b40b52cee17>', 'content-transfer-encoding': '7bit'}]
[2022-02-15T09:02:47.170222][DEBUG][EmailServer][processNextMessage]Next Message: headers=[{'date': 'Tue, 15 Feb 2022 09:02:33 +0100', 'subject': 'Duplicati Success, Backup report for OMV_traefik-HiDrive_traefik', 'messageId': '<IRFXIPP13GU4.6QXS9AZ4L14J2@6b40b52cee17>', 'content-transfer-encoding': '7bit'}]
[2022-02-15T09:02:47.170668][DEBUG][EmailServer][processNextMessage]subjectregex='[^Duplicati (Success|Error|Warn), Backup report for]' srcregex='[\w+]' srcdestdelimiter='[-]' destregex='[\w+]'
[2022-02-15T09:02:47.171203][NOTICE][EmailServer][processNextMessage]Correct subject 'Duplicati Success, Backup report for OMV_traefik-HiDrive_traefik' but regex doesn't match ^Duplicati (Success|Error|Warn), Backup report for (\w+)-(\w+) . Skipping message.
[2022-02-15T09:02:47.171241][NOTICE][EmailServer][processNextMessage]Processing next message on server XXX. Protocol=imap

I used regex101 to validate the regex and it does match on that side. Somehow the regex seems to be working incorrectly.

HandyGuySoftware commented on July 20, 2024

I'll take a look at what you sent. Is it just this one message that's causing problems or is the program failing for all emails you are receiving?

MrFloppy commented on July 20, 2024

It is affecting all messages/emails. To keep the log short, I ran just one backup job manually and then ran dupReport manually afterwards.

HandyGuySoftware commented on July 20, 2024

OK, so I have found the problem. If you change your subjectregex to the following:

subjectregex = ^Duplicati \w*, Backup report for

That should get you up and running again. The underlying root cause is an interesting use case I hadn't considered before, but that will take some time to unravel and recode. Hopefully this fix works for you. Please let me know.

MrFloppy commented on July 20, 2024

Thanks for your great support. Now it works again for the time being.

I am just surprised that an update broke this, as the same configuration had been working every day for several months. Could a library be causing this problem?

HandyGuySoftware commented on July 20, 2024

In the latest update I changed the way the subject line is parsed based on the suggestion of a user. Unfortunately I didn't think of enough use cases to test (such as yours) and that directly led to the bug.

If you're interested in more detail:

Your subjectregex gets translated within the program as ^Duplicati (Success|Error|Warn), Backup report for (\w+)-(\w+). In this specification it is looking to extract 3 things (noted by the parentheses in the spec):

  1. (Success|Error|Warn)
  2. (\w+)
  3. (\w+)

When the program parses the incoming email subject ("Duplicati Success, Backup report for OMV_traefik-HiDrive_traefik") it does find those three things:

  1. Success
  2. OMV_traefik
  3. HiDrive_traefik

Unfortunately, dupReport only wants to see two things matched (the source and destination), not three, so it throws the error and moves on. By changing the subjectregex to ^Duplicati \w*, Backup report for, the full translation internally becomes: ^Duplicati \w*, Backup report for (\w+)-(\w+). Thus it is only trying to extract two things instead of three, and the code starts working again.
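The group count described above can be checked with Python's `re` module. This is a minimal sketch, not dupReport's code. The last pattern uses a non-capturing group `(?:...)`, which is standard Python regex syntax that keeps the alternation without adding a captured element; whether dupReport accepts it in `subjectregex` is an assumption and has not been tested here.

```python
import re

subject = "Duplicati Success, Backup report for OMV_traefik-HiDrive_traefik"

# Pattern as reported in the log: three capturing groups.
original = re.compile(r"^Duplicati (Success|Error|Warn), Backup report for (\w+)-(\w+)")
# Workaround from this thread: two capturing groups.
workaround = re.compile(r"^Duplicati \w*, Backup report for (\w+)-(\w+)")
# Untested alternative (assumption): (?:...) groups without capturing.
non_capturing = re.compile(r"^Duplicati (?:Success|Error|Warn), Backup report for (\w+)-(\w+)")

for name, pattern in [
    ("original", original),
    ("workaround", workaround),
    ("non_capturing", non_capturing),
]:
    match = pattern.search(subject)
    # pattern.groups is the number of capturing groups in the compiled pattern.
    print(name, pattern.groups, match.groups())
```

All three patterns match the subject; only the number of captured elements differs (three for the original, two for the other two).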

How to handle this in the code is something I hadn't considered before (specifically, the position of the job name in the subject and the presence of additional captured elements, like the job status). This workaround will keep you going while I think of a better long-term fix.

MrFloppy commented on July 20, 2024

Thanks for the detailed explanation. That is exactly the kind of cause I expected ;). If I can be of any help, please let me know!

HandyGuySoftware commented on July 20, 2024

Issue summary:

Prior to 3.0.7 (Issue #174) subject parsing was handled in 3 separate parts:

  1. Extract subject regex
  2. Extract source regex
  3. Extract destination regex

3.0.7 optimized the parsing process and now assumes the subject is in the form:

Subject: <subject regex> <source><delimiter><destination>

However, if the source-destination comes before the subject regex the program will be unable to parse the subject.
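As a rough sketch of that assumed layout, the four options can be combined into one pattern as shown below. The helper name `build_subject_pattern` is hypothetical, and the concatenation is inferred from the pattern printed in the log earlier in this thread, not copied from dupReport's source. The second half uses a made-up subject and `subjectregex` to show why a source-destination pair placed in front cannot be parsed.

```python
import re

def build_subject_pattern(subjectregex, srcregex, delimiter, destregex):
    # Hypothetical helper: append "(source)delimiter(destination)" to the subject regex.
    return f"{subjectregex} ({srcregex}){delimiter}({destregex})"

# Source-destination after the subject text: parses.
pattern = build_subject_pattern(r"^Duplicati \w*, Backup report for", r"\w+", "-", r"\w+")
trailing = "Duplicati Success, Backup report for OMV_traefik-HiDrive_traefik"
print(pattern)
print(re.search(pattern, trailing).groups())

# Source-destination before the subject text (made-up example): the subject
# regex alone is found, but the combined pattern expects the pair after it.
lead_subjectregex = r"Duplicati Backup report"
lead_pattern = build_subject_pattern(lead_subjectregex, r"\w+", "-", r"\w+")
leading = "OMV_traefik-HiDrive_traefik Duplicati Backup report"
print(re.search(lead_subjectregex, leading) is not None)  # subject is recognized
print(re.search(lead_pattern, leading))                   # but the pair is not found
```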

In practice, this is more of a theoretical problem: it is not something anyone has asked for. Therefore, the issue will be closed until such time as it becomes a problem.

DorianVasco commented on July 20, 2024

I do not understand the regex rules. From the log:
Correct subject 'Duplicati Success, Backup report for Test testordner-nextcloud' but regex doesn't match ^Duplicati \w*, Backup report for (\w+)-(\w+) . Skipping message.

My config file includes these values:

subjectregex = ^Duplicati \w*, Backup report for
srcregex = \w+
destregex = \w+
srcdestdelimiter = -

The regex tester actually looks OK:
(screenshot)

What else can I try?

HandyGuySoftware commented on July 20, 2024

Looks good to me as well. Can you re-run the program in verbose mode (-v7) and post the log output? That might give a clue as to what's happening.

HandyGuySoftware commented on July 20, 2024

Ah, figured it out.

Your regex is: '^Duplicati \w*, Backup report for (\w+)-(\w+)'

But your subject line is: 'Duplicati Success, Backup report for Test testordner-nextcloud'

The extra "Test" in the subject is causing the regex to fail. Take that out and the subject line should be fully parsable. Please let me know if that works.
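The failure can be reproduced with Python's `re` module; this is only an illustrative sketch, not dupReport's own code. `\w` matches letters, digits, and underscores but not spaces, so after `Test` the pattern expects the `-` delimiter and finds a space instead. The second subject assumes the job has been renamed to `testordner-nextcloud`.

```python
import re

pattern = re.compile(r"^Duplicati \w*, Backup report for (\w+)-(\w+)")

with_space = "Duplicati Success, Backup report for Test testordner-nextcloud"
renamed = "Duplicati Success, Backup report for testordner-nextcloud"

print(pattern.search(with_space))        # None: the space after "Test" blocks the match
print(pattern.search(renamed).groups())  # ('testordner', 'nextcloud')
```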

HG

HandyGuySoftware commented on July 20, 2024

Hi Dorian, please let me know if you're still having the regex problem with your email subjects. I hope my suggestion allowed you to fix it.

DorianVasco commented on July 20, 2024

Indeed, that seems to fix it. Thanks a lot, I am really not much into regex :)
