spamexperts / pyzor



Pyzor is a Python implementation of a networked spam-blocking system that uses spam signatures to identify spam.

License: GNU General Public License v2.0

Python 98.82% Shell 0.28% CSS 0.16% HTML 0.74%


pyzor's Issues

Offer simple web service for whitelist requests

Since the public server doesn't permit unauthenticated whitelisting, it would be useful for there to be some way for people to request whitelisting other than emailing the mailing list.

This could ask for the exact message and the digest (and generate an error if they do not match) so that it's clear that the message is legitimately ham (and we can provide assurances around privacy).

I'm thinking of just a simple page asking for the relevant information (with the digest check being the only really dynamic bit) and then emailing the developers with the appropriate information. Since we're spread around the world, this ought to get reasonably quick action.

Ensure that we close all file pointers explicitly

We need to ensure that we close all file pointers explicitly rather than relying on automatic closing, which only works under CPython.

For example, this currently causes a problem when running under PyPy: the PID file is not closed, so the data is not written to disk.
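A minimal sketch of the PID-file case (the function name `write_pid_file` is hypothetical, not pyzord's actual code): the `with` block closes, and therefore flushes, the file when it exits, on PyPy as well as CPython.

```python
import os


def write_pid_file(path):
    """Write the current process ID to *path* (hypothetical helper).

    The with-block closes the file as soon as it exits, so the data
    reaches the disk without waiting for garbage collection, which
    PyPy does not perform eagerly.
    """
    with open(path, "w") as pid_file:
        pid_file.write("%d\n" % os.getpid())
```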

Non-breaking spaces

I recently received a spam containing a non-breaking space (encoded as =C2=A0 in quoted-printable UTF-8, if that is relevant). When running pyzor predigest, the non-breaking space is kept in the predigest output. I don't know whether spammers do this, but they could randomly replace spaces with non-breaking spaces before sending mail to generate a different fingerprint each time and evade detection.

I believe that simply changing

    ws_ptrn = re.compile(r'\s')

to

    ws_ptrn = re.compile(r'\s', flags=re.UNICODE)

would address this (including all the other Unicode space characters), but at the cost of breaking compatibility with signatures from older versions of pyzor.
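A quick way to see the difference, assuming Python 3, where `str` patterns are Unicode-aware by default and `re.ASCII` approximates the old ASCII-only behaviour (the sample text is made up):

```python
import re

text = "buy\u00a0now"  # U+00A0 NO-BREAK SPACE between the words

# ASCII-only whitespace class: the NBSP is not treated as whitespace.
ascii_ws = re.compile(r"\s", flags=re.ASCII)
# Unicode-aware whitespace class: the behaviour proposed above.
unicode_ws = re.compile(r"\s", flags=re.UNICODE)

print(ascii_ws.sub("", text))    # NBSP survives
print(unicode_ws.sub("", text))  # buynow
```

Because the stripped text differs, the resulting digest differs too, which is the compatibility cost mentioned above.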

Inconsistency between accounts and addresses

The pyzor client supports addresses whose port is given as a string, for example:

>>> str(client.ping(address=("public.pyzor.org", "24441")))
'Code: 200\nDiag: OK\nPV: 2.1\nThread: 65482\n\n'

However, the address must then match the accounts exactly: if the accounts use an integer port instead, the correct account is not used. The Client API should be more lenient here.
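A minimal sketch of a more lenient lookup, assuming accounts are kept in a dict keyed by `(host, port)`; `normalize_address` and `get_account` are hypothetical helpers, and the string account is a placeholder for a real account object:

```python
def normalize_address(address):
    """Return (host, port) with the port coerced to an int (hypothetical)."""
    host, port = address
    return host, int(port)


def get_account(accounts, address, default=None):
    # Look the account up under the normalized key, so "24441" and
    # 24441 select the same entry.
    return accounts.get(normalize_address(address), default)


# Normalize once when the accounts are loaded as well.
accounts = {normalize_address(("public.pyzor.org", 24441)): "alice"}
```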

Add support for one-step increments to the Redis engine.

That is, the same support we added to the MySQL engine in #23, but for the Redis engine.

With the way records are currently encoded in the Redis engine, one-step increments are not possible. We'll need to change the records from strings to hashes, and also provide a way to migrate the database.

The existing migration script can be used. A simple check should be performed when the pyzord server starts: look for a "version" entry in the database and check whether it matches the current implementation. If not, an error message should be displayed with instructions for migrating the database.
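A sketch of what a one-step report could look like with redis-py, assuming a hypothetical layout of one hash per digest (the key prefix and field names below are made up, not the engine's current format). `HINCRBY` increments on the server, so no read-modify-write round trip is needed:

```python
import time


def report(client, digest, now=None):
    """Record one spam report for *digest* (hypothetical hash layout).

    *client* is a redis-py client; the three commands are queued in a
    pipeline and sent together.
    """
    now = int(time.time() if now is None else now)
    key = "pyzord.digest.%s" % digest  # hypothetical key prefix
    pipe = client.pipeline()
    pipe.hincrby(key, "r_count", 1)     # atomic server-side increment
    pipe.hsetnx(key, "r_entered", now)  # only written the first time
    pipe.hset(key, "r_updated", now)
    pipe.execute()
```

`HSETNX` only writes `r_entered` if it is missing, which preserves the original entry date.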

Improve Pyzor client error codes

We should improve the pyzor client script so that it returns appropriate error codes when, for example, the server is unreachable, making the result easy to parse.

For example, in case of a timeout, the pyzor script does not output anything:

$ pyzor -d ping
2014-07-21 15:23:14,018 WARNING No accounts are setup.  All commands will be executed by the anonymous user.
2014-07-21 15:23:14,018 DEBUG sending: 'Op: ping\nThread: 63596\nPV: 2.1\nUser: anonymous\nTime: 1405945394\nSig: b9e0f69deb13376b4517911e09f01abd526f39e7\n\n'
2014-07-21 15:23:19,024 ERROR ('127.0.0.1', 24441)  TimeoutError: Reading response timed-out.

Or in case the command is not valid:

$ pyzor -d pign
2014-07-21 15:26:53,056 WARNING No accounts are setup.  All commands will be executed by the anonymous user.
2014-07-21 15:26:53,056 ERROR Unknown command: pign
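One possible shape for this, as a sketch only: the exit-code values and the `exit_code_for` helper are hypothetical, and would need to be chosen so they don't clash with any codes the script already returns.

```python
# Hypothetical exit codes, for illustration only.
EXIT_OK = 0
EXIT_USAGE = 2        # unknown command or bad arguments
EXIT_UNREACHABLE = 3  # no server answered (e.g. timeout)


def exit_code_for(responses, unknown_command=False):
    """Map the outcome of a client run to a process exit code.

    *responses* holds one entry per server, with None standing for a
    server that timed out or could not be reached.
    """
    if unknown_command:
        return EXIT_USAGE
    if not responses or all(r is None for r in responses):
        return EXIT_UNREACHABLE
    return EXIT_OK
```

The script would then end with `sys.exit(exit_code_for(...))`, so callers can branch on the status instead of parsing log lines.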

AttributeError: 'module' object has no attribute 'get_digest'

On http://pyzor.readthedocs.org/en/release-1-0-0/pyzor.client.html the documentation says:

>>> digest = pyzor.digest.get_digest(msg)

I get:

>>> import pyzor.digest
>>> pyzor.digest.get_digest
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'get_digest'

WL-Entered is always equal to Entered

This should reflect the date of the first whitelist, not the date when the message was first reported as spam.

For example:

$ pyzor info < testausta.eml 
public.pyzor.org:24441  (200, 'OK')
    Count: 6
    Entered: Thu Sep 25 02:39:13 2014
    Updated: Sat Jan 17 16:52:32 2015
    WL-Count: 12
    WL-Entered: Thu Sep 25 02:39:13 2014
    WL-Updated: Thu Jan 29 16:15:45 2015
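A simplified sketch of keeping the two dates apart, using a hypothetical `Record` class rather than pyzor's own: `wl_entered` is set on the first whitelist and never copied from `r_entered`.

```python
from datetime import datetime


class Record:
    """Hypothetical record with separate spam and whitelist dates."""

    def __init__(self):
        self.r_count, self.r_entered, self.r_updated = 0, None, None
        self.wl_count, self.wl_entered, self.wl_updated = 0, None, None

    def r_increment(self, now=None):
        if now is None:
            now = datetime.now()
        self.r_count += 1
        if self.r_entered is None:  # first spam report
            self.r_entered = now
        self.r_updated = now

    def wl_increment(self, now=None):
        if now is None:
            now = datetime.now()
        self.wl_count += 1
        if self.wl_entered is None:  # first whitelist, independent of r_entered
            self.wl_entered = now
        self.wl_updated = now
```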

Create a base abstract class for server engines

The server engines in the pyzor.engines package all follow a certain pattern and must implement certain methods.

In order to make the package more consistent, and to make adding more engines easier, we should create a base abstract class from which all engines should inherit.
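A minimal sketch of such a base class using Python 3's `abc` module. The method set (mapping-style access plus a default two-step `report`) is an assumption for illustration, not the engines' actual interface; `DictEngine` is a hypothetical in-memory engine:

```python
import abc


class BaseEngine(abc.ABC):
    """Hypothetical abstract base for pyzord storage engines."""

    @abc.abstractmethod
    def __getitem__(self, digest): ...

    @abc.abstractmethod
    def __setitem__(self, digest, record): ...

    @abc.abstractmethod
    def __delitem__(self, digest): ...

    def get(self, digest, default=None):
        try:
            return self[digest]
        except KeyError:
            return default

    def report(self, digest):
        # Generic read-modify-write; engines with atomic increments
        # (MySQL, Redis hashes) can override this with a single step.
        record = self.get(digest, {"r_count": 0})
        record["r_count"] += 1
        self[digest] = record


class DictEngine(BaseEngine):
    """In-memory engine implementing only the abstract methods."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, digest):
        return self._data[digest]

    def __setitem__(self, digest, record):
        self._data[digest] = record

    def __delitem__(self, digest):
        del self._data[digest]
```

An engine that forgets one of the abstract methods fails at instantiation with `TypeError`, rather than at the first request.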

gdbm: ValueError: time data ... does not match format '%Y-%m-%d %H:%M:%S.%f'

The gdbm backend assumes that dates are stored in the format "%Y-%m-%d %H:%M:%S.%f" which is usually true:

>>> str(datetime.datetime(2014, 8, 14, 11, 42, 17, 1337))
'2014-08-14 11:42:17.001337'

However, if a record is inserted exactly "on the second", the microseconds are removed:

>>> str(datetime.datetime(2014, 8, 14, 11, 42, 17, 0))
'2014-08-14 11:42:17'

On busy servers this happens occasionally, and after a restart the database cannot be loaded:

Traceback (most recent call last):
  File "/usr/bin/pyzord", line 4, in <module>
    __import__('pkg_resources').run_script('pyzor==0.8.0', 'pyzord')
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 534, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1445, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/lib/python2.7/site-packages/pyzor-0.8.0-py2.7.egg/EGG-INFO/scripts/pyzord", line 389, in <module>

  File "/usr/lib/python2.7/site-packages/pyzor-0.8.0-py2.7.egg/EGG-INFO/scripts/pyzord", line 363, in main

  File "build/bdist.linux-x86_64/egg/pyzor/engines/gdbm_.py", line 40, in __init__
  File "build/bdist.linux-x86_64/egg/pyzor/engines/gdbm_.py", line 100, in start_reorganizing
  File "build/bdist.linux-x86_64/egg/pyzor/engines/gdbm_.py", line 65, in apply_method
  File "build/bdist.linux-x86_64/egg/pyzor/engines/gdbm_.py", line 111, in _really_reorganize
  File "build/bdist.linux-x86_64/egg/pyzor/engines/gdbm_.py", line 71, in _really_getitem
  File "build/bdist.linux-x86_64/egg/pyzor/engines/gdbm_.py", line 143, in decode_record
  File "build/bdist.linux-x86_64/egg/pyzor/engines/gdbm_.py", line 162, in decode_record_1
  File "build/bdist.linux-x86_64/egg/pyzor/engines/gdbm_.py", line 23, in <lambda>
  File "/usr/lib/python2.7/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data '2014-07-03 14:59:31' does not match format '%Y-%m-%d %H:%M:%S.%f'
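A sketch of two complementary fixes, assuming dates are stored as strings in this format: accept both forms when reading, and always write microseconds when encoding (`strftime` with `%f` emits `000000` for zero, unlike `str()`). The function names are hypothetical:

```python
from datetime import datetime

DATE_FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")


def decode_date(value):
    """Parse a stored timestamp with or without microseconds."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized date: %r" % (value,))


def encode_date(value):
    """Always include microseconds, even when they are zero."""
    return value.strftime("%Y-%m-%d %H:%M:%S.%f")
```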

Can't run pyzor

I installed this on Ubuntu, and am just trying to get this to run.

I tried installing via pip, and it said it worked, but it didn't actually work:

$ pip install pyzor
Downloading/unpacking pyzor
  Downloading pyzor-1.0.0.tar.gz
  Running setup.py (path:/tmp/pip-build-L9bD5P/pyzor/setup.py) egg_info for package pyzor

Installing collected packages: pyzor
  Running setup.py install for pyzor
    changing mode of build/scripts-2.7/pyzor from 664 to 775
    changing mode of build/scripts-2.7/pyzord from 664 to 775
    changing mode of build/scripts-2.7/pyzor-migrate from 664 to 775

    changing mode of /home/ace/.local/bin/pyzor-migrate to 775
    changing mode of /home/ace/.local/bin/pyzor to 775
    changing mode of /home/ace/.local/bin/pyzord to 775
Successfully installed pyzor
Cleaning up...

Then I got this:

$ pyzor
The program 'pyzor' is currently not installed. You can install it by typing:
sudo apt-get install pyzor

So I did what it suggested:

$ sudo apt-get install pyzor

And it seemed to install:

$ sudo apt-get install pyzor
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following extra packages will be installed:
  python-gdbm python-support
Suggested packages:
  python-gdbm-dbg
The following NEW packages will be installed:
  python-gdbm python-support pyzor
0 upgraded, 3 newly installed, 0 to remove and 3 not upgraded.
Need to get 67.6 kB of archives.
After this operation, 428 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://us.archive.ubuntu.com/ubuntu/ wily/main python-gdbm amd64 2.7.9-1 [11.9 kB]
Get:2 http://us.archive.ubuntu.com/ubuntu/ wily/universe python-support all 1.0.15 [26.7 kB]
Get:3 http://us.archive.ubuntu.com/ubuntu/ wily/universe pyzor all 1:0.5.0-2fakesync1 [29.1 kB]
Fetched 67.6 kB in 0s (466 kB/s)
Selecting previously unselected package python-gdbm.
(Reading database ... 211423 files and directories currently installed.)
Preparing to unpack .../python-gdbm_2.7.9-1_amd64.deb ...
Unpacking python-gdbm (2.7.9-1) ...
Selecting previously unselected package python-support.
Preparing to unpack .../python-support_1.0.15_all.deb ...
Unpacking python-support (1.0.15) ...
Selecting previously unselected package pyzor.
Preparing to unpack .../pyzor_1%3a0.5.0-2fakesync1_all.deb ...
Unpacking pyzor (1:0.5.0-2fakesync1) ...
Processing triggers for man-db (2.7.4-1) ...
Setting up python-gdbm (2.7.9-1) ...
Setting up python-support (1.0.15) ...
Setting up pyzor (1:0.5.0-2fakesync1) ...
Processing triggers for python-support (1.0.15) ...

But when I tried to run anything, I get this:

$ pyzor ping
Traceback (most recent call last):
  File "/usr/bin/pyzor", line 8, in <module>
    pyzor.client.run()
AttributeError: 'module' object has no attribute 'run'

Am I doing something wrong or is there an error? I'm running Ubuntu 15.10.

Script to migrate data between engine types.

It would be useful to have a script that allows migrating data between the various engine types (e.g. migrating the MySQL data to Redis).

This doesn't necessarily have to be deployed.

nice move

Thank you for moving here.

Add support for logging to a Sentry server

It would be nice if you could set a DSN for a Sentry server in the configuration and, if Raven were installed, have pyzor log to that server (perhaps with a log level in the configuration as well).

Consider switching to python3.3

Now that the code works with Python 3.3 (after running 2to3), we could consider switching the code to native Python 3.3 and supporting Python 2.7 with the 3to2 tool.

This would make supporting both versions considerably easier, because Python 3 imposes more (useful) restrictions, especially in distinguishing between Unicode strings and byte arrays.

Update release

Hello.

Please add a new release to https://github.com/SpamExperts/pyzor/releases.

Add support for pre-forking

It seems that handling requests in separate threads/processes doesn't work well under real traffic conditions (although benchmarking PyPy with multi-threading did show some promising results).

We should try adding support for pre-forking and see how that performs. This means we will likely need to hack around SocketServer.UDPServer, or even drop it completely and implement our own version.

Do increments (report/whitelist) in a single step.

We currently increment the report and whitelist counts in two steps: first we get the record from the database engine, then we reinsert it. Some back-end engines (such as MySQL, and Redis if we use hashes instead of strings) support doing this in a single step.

We should change the way the commands are dispatched and have the MySQL engine use this enhancement.

We'll do MySQL first because it's easiest, hopefully in time for 0.9 release. With Redis we'll need to change the data structure, and we'll also have to provide a migration tool. (we'll do that as well in a future version).
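A sketch of a single-step report as one SQL statement. It uses the standard library's `sqlite3` (the upsert syntax needs SQLite 3.24 or later) as a stand-in for MySQL so it runs anywhere; the table and column names are hypothetical. In MySQL the same idea is expressed with `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3

# Hypothetical schema for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS digests (
    digest    TEXT PRIMARY KEY,
    r_count   INTEGER NOT NULL DEFAULT 0,
    r_entered TEXT,
    r_updated TEXT
)
"""

# Insert a new row, or increment the existing one, in one statement.
REPORT_SQL = """
INSERT INTO digests (digest, r_count, r_entered, r_updated)
VALUES (?, 1, ?, ?)
ON CONFLICT(digest) DO UPDATE SET
    r_count = r_count + 1,
    r_updated = excluded.r_updated
"""


def report(conn, digest, now):
    # The database performs the increment; no SELECT is needed first.
    with conn:  # commits on success
        conn.execute(REPORT_SQL, (digest, now, now))
```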

Create a suite of benchmarking and stress tests for the pyzor server

In order to evaluate various improvements and new technologies we are considering adding to the pyzor server we would require a suite of tests.

Test fragility because send() is not mocked

This build broke, not because of the change, but because of a timing issue.

The tests don’t mock out the send() method, so they assume that send() will take place in the same second as the expected request is constructed. That’s fragile, so it will sometimes break, as it did here.

We probably don’t want to mock send(), because that will change what the tests are testing too much. It might be simplest to mock time.time() itself.
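A sketch of the `time.time()` approach with `unittest.mock`. `build_request` is a hypothetical stand-in for the code under test, and the frozen timestamp is arbitrary:

```python
import time
import unittest
from unittest import mock


def build_request(op):
    # Stand-in for the client code: stamps the request with the
    # current time, like the "Time:" field of a real request.
    return "Op: %s\nTime: %d\n\n" % (op, int(time.time()))


class RequestTimeTest(unittest.TestCase):
    @mock.patch("time.time", return_value=1405945394.0)
    def test_time_is_frozen(self, _mock_time):
        # Every call to time.time() sees the same value, so the test no
        # longer depends on both steps happening within the same second.
        expected = "Op: ping\nTime: 1405945394\n\n"
        self.assertEqual(build_request("ping"), expected)
```

send() itself stays real, so the tests still exercise what they did before.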

Delisting

Sorry if this is the wrong place to ask.

I got a report saying that I am listed in Pyzor:
-1.985 PYZOR_CHECK Listed in Pyzor (http://pyzor.sf.net/)

How can I request delisting?
I am lost in this technology.

Delist from Pyzor

Hello, I have an issue.
Checking mail-tester.com shows me that I'm listed in Pyzor.
"PYZOR_CHECK Listed in Pyzor (http://pyzor.sf.net/)"

But I don't have the required information to complete the white list form. (http://public.pyzor.org/whitelist/)
I need the Pyzor digest, and Raw message. I don't have those, and I don't know where to find that information.

Is there anyone I could contact for help on this, or any way to get the required information?

Whitelisting problem

My server is listed as spam on Pyzor. When I try to whitelist it using your form, it responds: "Message not reported as spam." I don't know how to proceed. What's wrong?

Thanks

Adjust the digest algorithm to get some unique data when the normalized message is empty

There are various situations where after normalization the message ends up empty. For example this happens when the message is short and/or only contains links.

In this case we would still want to attempt to create a unique signature for the messages. This is, however, difficult because we don't have too much to go on.

time data '2016-03-23T15:23' does not match format '%Y-%m-%d %H:%M'

SERIALIZER.PY

from rest_framework import serializers
from sponsorapp.models import Sponsormodel
from datetime import datetime
import datetime as dtime
from dateutil.parser import parse
import pdb
from rest_framework.serializers import ValidationError 


class SponsorSerializer(serializers.ModelSerializer):

    offer_date = serializers.DateTimeField(default =None)
    sponsorship_date = serializers.DateTimeField(default =None)


    class Meta:
        model = Sponsormodel
        fields = ('sponsorship_date' , 'offername' , 'offer_date')
        #   pdb.set_trace()

    def validate_offername(self , value):
        offername = value
        if(len(offername) > 9):
            raise ValidationError("Please enter number of characters less than 9 or 9 ")

        return value    


    def validate_sponsorship_date(self , value):

        data = self.get_initial()
        offerdate = data.get('offer_date')


        offerdate = dtime.datetime.strptime(offerdate,'%Y-%m-%d %H:%M').strftime('%Y-%m-%d %H:%M')
        sponsorshipdate = value


        if(sponsorshipdate > offerdate):
            raise ValidationError("Check the dates")

        return value

MODELS.PY



from django.db import models



class Sponsormodel(models.Model):
    sponsorship_date = models.DateTimeField(default =None)
    offername = models.CharField(max_length = 100)
    offer_date = models.DateTimeField(default = None)

    def __str__(self):
        return self.offername
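For reference, a sketch of the two likely fixes, outside Django: the incoming string uses the ISO 8601 `T` separator, so the format string has to contain it, and the comparison should be between two `datetime` objects rather than a `datetime` and the string returned by `strftime()` (that comparison raises `TypeError` in Python 3). If the serializer hands over a timezone-aware value, the parsed date would also need to be made aware. The function name is hypothetical:

```python
from datetime import datetime

# Matches strings such as "2016-03-23T15:23".
OFFER_DATE_FORMAT = "%Y-%m-%dT%H:%M"


def sponsorship_after_offer(sponsorship_date, raw_offer_date):
    """Return True if *sponsorship_date* falls after the offer date.

    Both sides are datetime objects; the parsed value is not turned
    back into a string with strftime().
    """
    offer_date = datetime.strptime(raw_offer_date, OFFER_DATE_FORMAT)
    return sponsorship_date > offer_date
```

In `validate_sponsorship_date`, the `ValidationError` would then be raised when this returns True.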

crash while parsing spam message

Hi,

I got the following traceback while trying to check a message with pyzor.

$ pyzor -s mbox check < spam.mbox
Traceback (most recent call last):
  File "/usr/bin/pyzor", line 408, in <module>
    main()
  File "/usr/bin/pyzor", line 152, in main
    if not dispatch(client, servers, config):
  File "/usr/bin/pyzor", line 237, in check
    for digested in get_input_handler(style):
  File "/usr/bin/pyzor", line 181, in _get_input_mbox
    tfile.write(sys.stdin.read().encode("utf8"))
  File "/usr/lib/python3.4/codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 2649: invalid start byte

I'm using pyzor 1.0.0 with python 3.4.3. I can provide the spam message that caused the crash if needed.

BR
Atanas
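A sketch of reading the input without decoding it, assuming Python 3: `sys.stdin.buffer` yields bytes, so no codec is applied and a stray `0xa3` byte cannot raise `UnicodeDecodeError` at this point. `copy_stdin_to_tempfile` is a hypothetical helper, not the script's `_get_input_mbox`:

```python
import sys
import tempfile


def copy_stdin_to_tempfile(stream=None):
    """Copy raw mbox bytes into a temporary file without decoding them.

    Returns the file name; the caller is responsible for removing it.
    """
    if stream is None:
        stream = sys.stdin.buffer  # binary view of stdin
    tfile = tempfile.NamedTemporaryFile(delete=False)  # opened in binary mode
    with tfile:
        tfile.write(stream.read())
    return tfile.name
```

Decoding then happens later, per message part, where the declared charset is known.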

Is the licence "GPLv2 only" or "GPLv2 or later"

Fedora really cares about getting the licensing absolutely correct, and we'd like to know if the "any later version" clause in v2 of the GPL applies. In an old version (0.5.0) usage.html seemed to indicate that this was the case, and Fedora had the license as "GPLv2 or later". But usage.html is gone now, and the license section of the docs at http://pyzor.readthedocs.org/en/release-1-0-0/introduction.html#license seems to indicate "GPLv2 only". I know it's minor, but could you clarify?

Add an "all but anonymous" pseudo-user for the access file

Rather than listing every username other than anonymous, it would be convenient to have a "non-anonymous" or "authenticated" (or whatever name) string that could be used in the access file.

For example, this could be used to provide whitelist access to any user other than anonymous users.

Use python-future for 2 and 3 Python compatibility

Right now the Pyzor code is compatible with both Python 2 and Python 3, but only if the code is converted with 2to3 first. When installing with pip this is done automatically; however, it would be better to use python-future so that no conversion is required.

Expiring not working while using --detach

When pyzord daemonizes itself with the --detach option, the Timer thread that handles expiry is killed.

We need to start expiring after detaching.
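A sketch of the ordering, with hypothetical `serve`/`detach`/`expire` names. After `fork()`, only the thread that called it exists in the child, so the expiry thread has to be started after detaching:

```python
import threading


def start_expiry(expire, interval):
    """Run *expire* every *interval* seconds on a daemon thread.

    Returns an Event; setting it stops the loop.
    """
    stop = threading.Event()

    def loop():
        # Event.wait() returns False on timeout and True once stop is set.
        while not stop.wait(interval):
            expire()

    threading.Thread(target=loop, daemon=True).start()
    return stop


def serve(detach, expire, interval, detached=False):
    # Detach first (this forks), then start the thread in the
    # surviving process.
    if detached:
        detach()
    return start_expiry(expire, interval)
```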

Add support for batching reports to the Pyzor server.

For long-running daemons/operations (such as the new forwarder system), the Pyzor client could be improved to batch the reports (or whitelists) sent to the server. The server will also need to adjust its protocol to accept multiple digests.

ISTM that the simplest way to do this is by appending more Op-Digest headers to the request.
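A sketch of what a batched request could look like. The format is hypothetical: it extends the single-digest request seen in the client's debug output with repeated `Op-Digest` headers, and leaves out signing (`Sig`). Since the request is header-formatted, the standard `email` parser can recover all values:

```python
import email


def build_batch_request(op, digests, thread, user="anonymous", pv="2.1"):
    """Build a request carrying several digests (hypothetical format)."""
    lines = ["Op: %s" % op]
    lines.extend("Op-Digest: %s" % d for d in digests)
    lines += ["Thread: %d" % thread, "PV: %s" % pv, "User: %s" % user]
    return "\n".join(lines) + "\n\n"


def parse_digests(request):
    # get_all() returns every value of a repeated header, in order.
    return email.message_from_string(request).get_all("Op-Digest", [])
```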

DCC like technology additions

It could be interesting to add similar concepts used by "Distributed Checksum Clearinghouses" (http://www.rhyolite.com/dcc/) to the Pyzor filtering system.

Error digesting badly formed message (decode() argument 1 must be string without null bytes)

~ # pyzor check < 1ZKiRG-0007ja-DV--7132194856063680649
Traceback (most recent call last):
  File "/usr/local/bin/pyzor", line 408, in <module>
    main()
  File "/usr/local/bin/pyzor", line 152, in main
    if not dispatch(client, servers, config):
  File "/usr/local/bin/pyzor", line 237, in check
    for digested in get_input_handler(style):
  File "/usr/local/bin/pyzor", line 175, in _get_input_msg
    digested = digester(msg).value
  File "/usr/local/lib/python2.7/site-packages/pyzor/digest.py", line 82, in __init__
    for payload in self.digest_payloads(msg):
  File "/usr/local/lib/python2.7/site-packages/pyzor/digest.py", line 160, in digest_payloads
    payload = payload.decode(charset, errors)
TypeError: decode() argument 1 must be string without null bytes, not str

Probably something to do with this:

Content-Type: text/plain;
    charset="iso-8859-1^@^@^@Content-Transfer-Encoding: quoted-printable

We need to remove any '\x00' (NUL characters) when processing messages.
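A sketch of the sanitising step (hypothetical helpers): cut the charset at the first NUL, since in this message the NULs run straight into the next header, fall back to a safe codec if the result is unknown, and strip NUL bytes from the payload:

```python
import codecs


def clean_charset(charset, fallback="ascii"):
    """Return a usable codec name from a possibly corrupted charset."""
    # Drop everything from the first NUL onwards, then stray quotes.
    charset = charset.split("\x00", 1)[0].strip().strip('"')
    try:
        codecs.lookup(charset)
    except LookupError:
        return fallback
    return charset


def strip_nuls(payload):
    # Remove NUL bytes from a raw payload before decoding it.
    return payload.replace(b"\x00", b"")
```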

TypeError: unsupported operand type(s) for %: 'bytes' and 'tuple'

A user reported the following backtrace in https://bugzilla.redhat.com/show_bug.cgi?id=1288853
(Please ignore the fact that the reporter has no interpersonal skills whatsoever.)

Traceback (most recent call last):
File "/usr/bin/pyzor", line 408, in <module>
  main()
File "/usr/bin/pyzor", line 152, in main
  if not dispatch(client, servers, config):
File "/usr/bin/pyzor", line 239, in check
  send_digest(digested, mock_runner, servers)
File "/usr/bin/pyzor", line 262, in send_digest
  _send_digest(runner, servers[0], digested)
File "/usr/bin/pyzor", line 253, in _send_digest
  runner.run(server, (digested, server))
File "/usr/lib/python3.4/site-packages/pyzor/client.py", line 258, in run
  response = self.routine(*args, **kwargs)
File "/usr/lib/python3.4/site-packages/pyzor/client.py", line 122, in _mock_check
  pyzor.proto_version))
TypeError: unsupported operand type(s) for %: 'bytes' and 'tuple'

I'm pretty sure this is simply Python 3-incompatible code. What I don't yet understand is why I can't reproduce it. I'll have to dig into this further.
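One possible explanation, offered tentatively: `%` formatting for `bytes` only exists since Python 3.5 (PEP 461), so `b"..." % (...)` raises exactly this `TypeError` on Python 3.4, as in the traceback, but works on newer versions. A sketch of a version-independent approach (format as text, then encode); the function name is hypothetical and the fields follow the response shown earlier for `ping`:

```python
def build_mock_response(code, diag, proto_version, thread):
    """Format the response as text, then encode it.

    str %-formatting works on every Python 3 version, unlike
    bytes %-formatting, which needs Python 3.5 or later.
    """
    text = "Code: %s\nDiag: %s\nPV: %s\nThread: %s\n\n" % (
        code, diag, proto_version, thread)
    return text.encode("utf-8")
```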

Add support for Unix sockets

It would be nice if we could support listening/connecting on Unix sockets as well.

Check compatibility with PyPy3

We need to check if Pyzor is compatible with PyPy3 and automatically run tests on Travis-CI.

Whitelist from website does not work?

Hello,

Every single message coming from my server is being marked as "listed" on Pyzor.

Please find attached one of those emails that was sent using the https://www.mail-tester.com website.
false-positive.txt

You can see the report here: https://www.mail-tester.com/web-1rRmDh

I installed Pyzor using Debian and followed your documentation to generate the digest of this email file.

pyzor digest < false-positive.txt
a3917acbee3f33d744611512992355f721fdfdb7

I then went to the form to whitelist it, and the website reports "Digest does not match message."

Am I missing something obvious here?
Thank you in advance,

pyzor.client docstring refers to function that does not exist

The module docstring for pyzor.client refers on line 22 to a function pyzor.digest.get_digest that does not exist.

To get a digest (of an email.message.Message object, or similar):
>>> digest = pyzor.digest.get_digest(msg)

I had to read the pyzor script to figure out how to get a digest. It would be nice to have up-to-date documentation.

Also, I noticed there's a typo in the docstring for ClientRunner.handle_response on line 266 in client.py.

Thanks

Have the pyzor client use threads when connecting to multiple servers

Rather than sending requests to multiple servers sequentially, we could use threads to send them all at once and then collate the final result.
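A minimal sketch with `concurrent.futures`; `query` is a hypothetical stand-in for the per-server client call, and failures are collated alongside results:

```python
from concurrent.futures import ThreadPoolExecutor


def query_all(servers, query):
    """Run *query(server)* for every server concurrently.

    Returns a dict of server -> result, with the exception instance as
    the value for servers whose call failed.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        futures = {pool.submit(query, s): s for s in servers}
        for future, server in futures.items():
            try:
                results[server] = future.result()
            except Exception as exc:  # keep failures in the collated result
                results[server] = exc
    return results
```

Total latency becomes roughly that of the slowest server rather than the sum over all of them.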

Whitelist content "test"

This is not really an issue.

But it's the tenth time a customer tells me "the mail server is not properly configured" because they send a message to mail-tester.com with just "test" in the content.

Maybe it would be nice to whitelist this content.

RSS/ATOM feed

Please consider adding a release feed, so that pyzor release notifications could be received through an RSS reader.

Create a Travis-CI configuration

We should start using Travis-CI to run the tests. To run the full suite of tests we need the following libraries:

MySQLdb
redis
gdbm

As a pre-execution step we (still) require:

a test.conf file that contains the MySQL connection details (perhaps this should just be hardcoded, since the password in Travis-CI is an empty string)
a test database (tests could perhaps be improved to create this database)

Running the tests on Python 3 also requires converting the source code with 2to3.

We should also take this opportunity to test the compatibility with PyPy.

UnicodeDecodeError

Yet another unicode decode problem that needs to be solved.

File "pyzor/digest.py", line 59, in __init__
    lines.append(norm.encode("utf8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 7: ordinal not in range(128)

Listed on Pyzor

Hi there!

I don't know why I'm listed on Pyzor and I don't know how to get removed. Can you please help me?
My domain is: upnetworks.com.br
I'm following this link: http://www.mail-tester.com/web-wb0iPJ

Thanks

SELinux is preventing pyzor from getattr access on the file /usr/bin/rpm

I'm running a Fedora 21 system and have noticed that SELinux complains with this error message quite frequently. Sometimes it is preceded by the following error but not every time:

python[22066]: detected unhandled Python exception in '/usr/bin/pyzor'

Below are some more specifics regarding the system:
kernel-4.1.5-100.fc21.x86_64
pyzor-0.5.0-10.fc21.noarch
Python 2.7.8 (default, Apr 15 2015, 09:26:43)
[GCC 4.9.2 20150212 (Red Hat 4.9.2-6)] on linux2

If it should have access then I want to file a bug report with the SELinux folks but I figured it made sense to start here. Can someone tell me why pyzor is attempting to access this file?

Thanks!

Pyzor nailed a legit email from Twitter - how to report?

A legit email from Twitter ("you have new followers") got falsely marked as spam by SpamAssassin, due in part to Pyzor for some odd reason (see below). How do I report this or figure out which link triggered it?

Content analysis details:   (5.3 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-1.0 ALL_TRUSTED            Passed through trusted hosts only via SMTP
 3.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                            [score: 1.0000]
 1.1 URI_HEX                URI: URI hostname has long hexadecimal sequence
 0.2 BAYES_999              BODY: Bayes spam probability is 99.9 to 100%
                            [score: 1.0000]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not necessarily valid
 1.4 PYZOR_CHECK            Listed in Pyzor (http://pyzor.sf.net/)
 0.0 T_DKIM_INVALID         DKIM-Signature header exists but is not valid

Skip digest of empty payload (or static whitelist)

Originally reported by gryphius.

We're seeing a lot of pyzor "false positives" from messages with attachments but little or no body text. These messages are all different but generate the same digest, da39a3ee5e6b4b0d3255bfef95601890afd80709, which is the SHA-1 sum of the empty string. It looks like this is the digest produced when all content is stripped out by the pyzor normalizer.

current public.pyzor.org result for this hash:
public.pyzor.org:24441 (200, 'OK') 159015 5706
pyzord could perhaps treat this special hash as statically whitelisted (without clients needing to submit it to the whitelist first) and always return a zero hit count.

This would be especially helpful in spamassassin setups, where only the hitcount is checked ( https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6108 )
If hardcoding this hash is not an option, you could perhaps add a config option to read a static whitelist from a file.

Attached is a quick & dirty patch we're using to skip this hash.
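A sketch of the static-whitelist idea (not the attached patch): `check_counts` and `lookup` are hypothetical names, and the whitelist set could equally be loaded from a file:

```python
import hashlib

# SHA-1 of the empty string: the digest of a body normalized to nothing.
EMPTY_DIGEST = hashlib.sha1(b"").hexdigest()

# Hypothetical static whitelist.
STATIC_WHITELIST = frozenset([EMPTY_DIGEST])


def check_counts(digest, lookup):
    """Return (r_count, wl_count), zeroed for statically whitelisted digests.

    *lookup* stands in for the normal database lookup.
    """
    if digest in STATIC_WHITELIST:
        return 0, 0
    return lookup(digest)
```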

Strip content in <style> and <script> tags in HTML normalisation

When a message has a lot of <style> content, often the pre-digest is filled with this instead of something that actually identifies the content. It would be better to remove this in the same way that the tags themselves are removed. (<script> is uncommon since it's generally ignored, but we might as well remove that as well).
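A sketch using the standard library's `html.parser`, not pyzor's own normaliser: text inside `<style>` and `<script>` is dropped along with the tags. `TextExtractor` and `html_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping everything inside <style>/<script>."""

    SKIP = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:  # ignore CSS/JS bodies
            self.parts.append(data)


def html_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left behind by the removed tags.
    return " ".join(" ".join(parser.parts).split())
```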

Don't send whitelist requests if the message is already whitelisted.

We should add a check to the whitelist request web service (http://public.pyzor.org/whitelist/) that the message has actually been reported as spam and isn't already whitelisted.

If either of these conditions is not met, we should show an appropriate error message.

Correctly handle timezones in the MySQL server tests.

When running the tests in a non-UTC timezone, the MySQL server tests fail:

    self.fail("Delta %s is too big: %s, %s" % (delta , date1, date2))
E   AssertionError: Delta 18000.877231 is too big: 2016-01-15 09:41:12, 2016-01-15 14:41:12.877231

We need to fix the tests or the code to handle this as well.
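One way to make the test comparison timezone-safe, as a sketch: normalise both values to UTC before subtracting. Treating naive values as UTC is an assumption that must match what the code under test and the MySQL session actually use; otherwise a local-time value is off by the UTC offset, like the 18000 s (five-hour) delta above. `assert_close` is a hypothetical helper.

```python
from datetime import datetime, timezone


def _to_utc(value):
    # Assumption: naive datetimes are already in UTC.
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


def assert_close(date1, date2, max_delta=3.0):
    """Fail if the two datetimes are more than *max_delta* seconds apart."""
    delta = abs((_to_utc(date1) - _to_utc(date2)).total_seconds())
    if delta > max_delta:
        raise AssertionError("Delta %s is too big: %s, %s" % (delta, date1, date2))
```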
