anonip's People

Contributors

datenreisen, ganti, htgoebel, mvoehringer, open-dynamix, ukleinek, znerol

anonip's Issues

Apache Error: (OS 193)%1 ist keine zulässige Win32-Anwendung. (English: "%1 is not a valid Win32 application.")

Hello,

thanks for the detailed answers. Unfortunately it still doesn't work. :/

So once more, step by step:

    • Test whether the CLI tool runs:

$ echo "192.168.0.123" | anonip.py
192.168.0.0

The echo command doesn't work that way on Windows. What does work is:

C:\Apache24\Anonip-main>python anonip.py --input test.log
10.xxx.xxx.0 [11/Jan/2022:09:06:26 +0100] "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36 Edg/96.0.1054.62"
10.xxx.xxx.0 [11/Jan/2022:09:06:26 +0100] "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36 Edg/96.0.1054.62"

The IP address is anonymized here, so the script runs through. However, the anonymized IP addresses are not written to test.log; they are only printed to the console.

    • Specify the full path to Anonip in the Apache config.

I did that. It doesn't help.

    • Explicitly specify the Python executable in the Apache config. "(OS 193)%1 ist keine zulässige Win32-Anwendung." ("is not a valid Win32 application") suggests that this could help:

CustomLog "|/usr/bin/python3.10 Anonip-main/anonip.py --ipv4mask 8 --output logs/test.log" combined

I don't understand what this is supposed to achieve, or rather I don't understand why there is no "/" after python3.10. In the end, Anonip would then just be located in a different folder. Python is installed globally for all users, since I have admin rights.

So I tried it like this:

CustomLog "C:/Program Files (x86)/Python37-32 Anonip-main/anonip.py --ipv4mask 8 --output logs/test.log" combined

=> that doesn't help either; there isn't even an error message in error.log

    • Set up a virtualenv and then specify the Python executable of that virtualenv:

CustomLog "|/path/to/virtualenv/bin/python3.10 Anonip-main/anonip.py --ipv4mask 8 --output logs/test.log" combined

I don't know how that is supposed to work. Besides, Python is installed globally anyway and the environment variable is set, so it should run as is.

    • A shot in the dark: it might be an architecture mismatch, e.g. 64-bit Python.

As I said, I tried Python in both 32-bit and 64-bit. The server is a 64-bit server, and Apache is installed as the 64-bit version.

  • Anonip version(s): 1.0.0 and 1.1.0
  • Python version(s): Python37-32, Python37 (64-bit), Python 3.8.8
  • OS version: Windows Server 2016 Standard, version 1607
  • Apache version:

C:\Apache24\bin>httpd -v
Server version: Apache/2.4.41 (Win64)
Apache Lounge VS16 Server built: Aug 9 2019 16:46:32

  1. Can you make a call under Apache to "C:\somewhere\python.exe c:\here\anonip.py"?

I don't understand that.

[RFC] Regex based IP detection

Rationale

Our column-based approach of specifying the location of an IP address is not flexible enough to cover all use cases.

A good example of such a use case can be seen in this issue. Since it's not possible to configure the log format for error logs in nginx, Anonip can't reliably detect IP addresses.

Proposal

I propose an alternative regex matching IP detection.

I don't intend to match IP addresses with regexes! But I'd like to provide a way to point Anonip to the locations of IP addresses with a regex.

This alternative approach should be provided alongside the already existing column-based approach.

When using the new --regex argument, the arguments --column and --delimiter will become obsolete.

--replace can still be used for cases where a group matches but its content is not a valid IP address.

Example

The regexes provided in the examples are simplified and should just illustrate the proposed feature. For production environments you want to have more robust ones.

Let's use the log line from the aforementioned issue:

2020/03/05 19:27:43 [error] 1253#1253: *15347 open() "/usr/share/nginx/html/favicon.ico" failed (2: No such file or directory), client: XXX.XXX.XXX.XXX, server: address.tld, request: "GET /favicon.ico HTTP/1.1", host: "address.tld"

With the new feature in place, we could do:

$ ./anonip.py --regex ".* client\: ([^,]+), .*"

This would then match the provided log line and capture the IP address (XXX.XXX.XXX.XXX) into the first group.

In order to find all IP addresses, Anonip would then iterate over all available matched groups (just one in this example).
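The proposal can be sketched in plain Python roughly as follows. This is a minimal illustration under stated assumptions, not Anonip's actual implementation: `mask_ip` and `anonymize_line` are hypothetical helper names, the prefix lengths (20 bits kept for IPv4, 44 for IPv6) are arbitrary example values, and the log line is shortened. The regex only points at candidate substrings; deciding whether a group really is an IP address is left to the `ipaddress` module.

```python
import ipaddress
import re


def mask_ip(text, v4_prefix=20, v6_prefix=44):
    """Return the masked address as a string, or None if text is not an IP."""
    try:
        addr = ipaddress.ip_address(text.strip())
    except ValueError:
        return None
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets ip_network() zero the host bits for us.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


def anonymize_line(line, pattern):
    """Mask every capture group of `pattern` that holds a valid IP address."""
    match = pattern.match(line)
    if match is None:
        return line
    # Collect the spans of all groups, drop those that did not take part in
    # the match, and edit from right to left so earlier offsets stay valid.
    spans = [match.span(i) for i in range(1, pattern.groups + 1)]
    for start, end in sorted((s for s in spans if s[0] != -1), reverse=True):
        masked = mask_ip(line[start:end])
        if masked is not None:
            line = line[:start] + masked + line[end:]
    return line


pattern = re.compile(r".* client: ([^,]+), .*")
line = ('2020/03/05 19:27:43 [error] 1253#1253: *15347 open() failed, '
        'client: 192.0.2.77, server: address.tld')
print(anonymize_line(line, pattern))
```

Groups that do not contain a valid address are left untouched here; that is the point at which an option such as --replace could take over.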

More involved example

Let's say we still want to handle above log line, but additionally we expect lines in the following format:

1970-01-01 - somefixedstring: XXX.XXX.XXX.XXX - exception foo - XXX.XXX.XXX.XXX

Note the two IP addresses.

This can be handled in one single regex:

$ ./anonip.py --regex "(?:.*, client\: ([^,]+), .*|.* - somefixedstring\: ([^,]+) - .* - ([^,]+))"
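To see which groups take part in a match for each of the two line formats, the alternation can be tried out in Python. This is only a sketch: `captured` is a hypothetical helper, the sample addresses come from the documentation ranges, the nginx line is shortened, and the colons are written without the backslash, which Python's `re` does not require.

```python
import re

pattern = re.compile(
    r"(?:.*, client: ([^,]+), .*|.* - somefixedstring: ([^,]+) - .* - ([^,]+))"
)


def captured(line):
    """Return the contents of all groups that took part in the match."""
    match = pattern.match(line)
    if match is None:
        return []
    return [group for group in match.groups() if group is not None]


nginx_line = ('2020/03/05 19:27:43 [error] open() failed (2: No such file), '
              'client: 192.0.2.1, server: address.tld')
other_line = ('1970-01-01 - somefixedstring: 192.0.2.1 - exception foo - '
              '198.51.100.7')

print(captured(nginx_line))
print(captured(other_line))
```

Only one branch of the alternation matches per line, so the groups of the other branch come back as `None` and have to be skipped.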

Considerations

This opens the door to very verbose, hard-to-read commands for running Anonip against certain logs.

For more advanced users, though, it would fill the current gap: log files in formats that Anonip cannot parse today.

Doesn't handle binary logfile content

When processing a logfile that contains binary parts, the following exception gets thrown:

Traceback (most recent call last):
  File "anonip.py", line 508, in <module>
    main()
  File "anonip.py", line 491, in main
    for line in anonip.run(input_file):
  File "anonip.py", line 161, in run
    line = input_file.readline()
  File "/usr/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 1040: invalid start byte

While obviously processing purely binary content isn't the target of this project, this issue arose while anonymizing an nginx error.log which contained the following line:

2022/01/09 05:21:49 [info] 58271#58271: *55771 client sent invalid method while reading client request line, client: 192.0.2.0, server: foo.example.org, request: "<binary rubbish>"

Note that there's even an IP address in that line that needs to be anonymized!

So maybe the file shouldn't be read as UTF-8, or as string at all for that matter, but as bytes?
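One alternative to switching to bytes entirely is to keep decoding as UTF-8 but with the `surrogateescape` error handler, which maps undecodable bytes to placeholder code points and restores them on encoding. The following is a sketch of that idea, not a statement about how Anonip handles input; the sample line is made up.

```python
import io

raw = b'client: 192.0.2.0, request: "\xbf\xff binary rubbish"\n'

# Strict UTF-8 decoding fails on the invalid start byte 0xbf.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With surrogateescape the line can be read as text ...
stream = io.TextIOWrapper(
    io.BytesIO(raw), encoding="utf-8", errors="surrogateescape"
)
line = stream.readline()

# ... and written back byte for byte with the same error handler.
roundtrip = line.encode("utf-8", errors="surrogateescape")

print(strict_ok, "192.0.2.0" in line, roundtrip == raw)
```

The same `errors="surrogateescape"` argument is accepted by the built-in `open()`, so input and output files could be opened that way; the output side must use the same handler, otherwise writing the placeholders fails.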

Error with IPv6 short form notation + port

When processing ErrorLog entries from Apache 2.4 in the following format (IPv6 short form notation + port)

2001:db8:1::ab9:C0A8:102:46824 [Wed Jul 06 21:28:43 2022] [error] [pid 68812] mod_proxy_fcgi.c(887): AH01071: Got error 'Primary script unknown'

I get the following error:

$ echo "2001:db8:1::ab9:C0A8:102:46824" | ./anonip.py
WARNING:__main__:'2a06' does not appear to be an IPv4 or IPv6 network
2a06:6440:0:2c80::1:46824

When I remove the port, it works:

$ echo "2001:db8:1::ab9:C0A8:102" | ./anonip.py
2001:db8::

This occurs with Python 3.8 and 3.9. Can this be fixed?
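A possible approach, sketched here with a hypothetical helper `split_trailing_port`, is to try the token as an address first and, only if that fails, retry without the last colon-separated segment. This is an assumption about how such a fix could look, not the fix adopted by the project.

```python
import ipaddress


def split_trailing_port(token):
    """Return (address, port_suffix); address is None if nothing parses."""
    try:
        return ipaddress.ip_address(token), ""
    except ValueError:
        pass
    # Retry without the last ":<digits>" segment, treating it as a port.
    head, sep, tail = token.rpartition(":")
    if sep and tail.isdigit():
        try:
            return ipaddress.ip_address(head), sep + tail
        except ValueError:
            pass
    return None, ""


print(split_trailing_port("2001:db8:1::ab9:C0A8:102:46824"))
print(split_trailing_port("192.168.2.1:16852"))
```

The heuristic has a known blind spot: without brackets, a string such as 2001:db8::1:80 is itself a valid IPv6 address, so a short port cannot be told apart from the last group of the address.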

Python 2.7 did not detect IPv6 addresses

echo "test.de 2001:0db8:85a3:0000:0000:8a2e:0370:7334 - - [21/Nov/2019:11:07:56 +0000] - - - -" | /usr/local/sbin/anonip.py --column 2
WARNING:__main__:u'2001' does not appear to be an IPv4 or IPv6 network
test.de 2001:0db8:85a3:0000:0000:8a2e:0370:7334 - - [21/Nov/2019:11:07:56 +0000] - - - -

See #33 for a possible fix.

Exclude some addresses from anonymization

I currently use anonip to zero the last byte of the addresses in the logs (I have IPv4 addresses only, for now), changing e.g. 123.123.123.123 to 123.123.123.0 (the default behaviour).

Now I'd like to exclude my "own" addresses from log analysis. Let's say the external IP addresses of our company are 123.123.123.122 and 123.123.123.123 (or 123.123.123.122/31). I'd like to leave these addresses intact so that e.g. AWStats can recognize them as "internal" and ignore them (SkipHosts configuration setting).

This is similar to (but much simpler than) the --skip-private option, which excludes Special-Use Addresses from masking.

Currently I'd need to tell AWStats to ignore 123.123.123.0 altogether, missing e.g. all requests from 123.123.123.128/25. Or do very complicated things. It shouldn't be necessary to parse the logs for particular IP addresses more than once, right?

I propose a --skip option which could be used independently from --skip-private; it could add to a list or take more than one value.
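What such an option would have to do can be sketched with the standard `ipaddress` module. The names `SKIP` and `anonymize` are hypothetical, the sketch is IPv4-only, and `keep_bits=24` simply mirrors the "last byte" masking described in this issue.

```python
import ipaddress

# Networks whose addresses should be passed through unchanged.
SKIP = [ipaddress.ip_network("123.123.123.122/31")]


def anonymize(ip_text, skip=SKIP, keep_bits=24):
    """Mask an IPv4 address unless it belongs to one of the skip networks."""
    addr = ipaddress.ip_address(ip_text)
    if any(addr in network for network in skip):
        return ip_text
    masked = ipaddress.ip_network(f"{addr}/{keep_bits}", strict=False)
    return str(masked.network_address)


for ip in ("123.123.123.122", "123.123.123.123", "123.123.123.200"):
    print(ip, "->", anonymize(ip))
```

Because `SKIP` is a list of networks, it covers both forms mentioned above: several single addresses or one CIDR range.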

Examples for nginx and systemd

Using anonip with nginx could be made much easier for the admin by leveraging systemd. I have already prepared the required information, which includes two systemd unit files. Just to avoid unnecessary work:

  • Should this be put into the Readme, into a separate document, or into some directory?
  • If into a directory: what should it be called?

Too broad regex replacement

Consider the following slightly modified access log line from the first regex test:

3.3.3.3 - - [20/May/2015:21:05:01 +0000] "GET /723.3.3.357 HTTP/1.1" 200 13358 "-" "useragent"

The requested URI could be an OID from an SNMP MIB or something like that.

The current implementation would replace the 3.3.3.3 in that URI, even though it has nothing to do with the client IP address and isn't, in fact, an IP address at all:

3.3.0.0 - - [20/May/2015:21:05:01 +0000] "GET /723.3.0.057 HTTP/1.1" 200 13358 "-" "useragent"

Note that this depends on the real client address being contained in the URI. The following line

2.2.2.2 - - [20/May/2015:21:05:01 +0000] "GET /723.3.3.357 HTTP/1.1" 200 13358 "-" "useragent"

doesn't modify the URI:

2.2.0.0 - - [20/May/2015:21:05:01 +0000] "GET /723.3.3.357 HTTP/1.1" 200 13358 "-" "useragent"

Hence this could even reveal the real client IP address if only 723.3.3.357 makes sense in that place, and 723.3.0.057 doesn't.

I'll open a PR in a minute that - among other things - addresses this problem by only replacing the groups where they actually matched and modifies the first test case to highlight this problem.
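The difference between the two replacement strategies can be reproduced in a few lines of Python. This is an illustration of the problem, not the code from the PR; the pattern and the hard-coded masked value are assumptions made for the example.

```python
import re

line = ('3.3.3.3 - - [20/May/2015:21:05:01 +0000] '
        '"GET /723.3.3.357 HTTP/1.1" 200 13358 "-" "useragent"')
pattern = re.compile(r"^(\S+) ")

match = pattern.match(line)
original = match.group(1)
masked = "3.3.0.0"  # stand-in for the anonymized address

# Naive approach: replaces every occurrence of the matched text,
# including the one that happens to sit inside the requested URI.
naive = line.replace(original, masked)

# Span-based approach: replaces only the characters the group matched.
start, end = match.span(1)
precise = line[:start] + masked + line[end:]

print(naive)
print(precise)
```

The naive variant turns the URI into /723.3.0.057, while the span-based variant leaves it as /723.3.3.357.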

Piped anonymization in Apache not working

Hey,
for days now I have been trying to get anonip.py working with piped logs in apache2. I'm really frustrated, since I've tried so many things, from changing the permissions of several files and folders, to trying every possible notation in the access.conf, to even rewriting anonip.py itself, but to no avail. The IP addresses in access.log don't change at all.
Did I miss something?

Current Behavior:

Adding a custom log pipe to Apache with anonip.py as the target doesn't change the IP addresses in access.log. The entries simply don't change.

Expected Behavior:

Changed IP addresses in access.log, from something like:

192.168.137.95 - - [28/Jul/2022:15:51:14 +0200] "GET / HTTP/1.1" 200 1162 ....

to

192.168.[another number].[yet another number] - - [28/Jul/2022:15:51:14 +0200] "GET / HTTP/1.1" 200 1162 ...

Steps To Reproduce:

  1. Add CustomLog "|/var/www/anonip.py --ipv4mask 12 --output /var/log/apache2/access.log" combined to /etc/apache2/apache2.conf
    so the whole logging part of the config file looks like:

LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog "|/var/www/anonip.py --ipv4mask 12 --output /var/log/apache2/access.log" combined
LogFormat "%h %l %u %t \"%r\" %>s %O" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent

  2. Restart Apache with sudo systemctl restart apache2
  3. Connect to the webserver

So far there are no error messages. The only error.log entries are:
[Thu Jul 28 15:54:40.131208 2022] [mpm_prefork:notice] [pid 46167] AH00169: caught SIGTERM, shutting down
[Thu Jul 28 15:55:11.728647 2022] [mpm_prefork:notice] [pid 46218] AH00163: Apache/2.4.41 (Ubuntu) configured -- resuming normal operations
[Thu Jul 28 15:55:11.728716 2022] [core:notice] [pid 46218] AH00094: Command line: '/usr/sbin/apache2'

Environment:

  • OS: Ubuntu 20.04 x64 (VMImage running on VMWare Workstation Player)
  • Apache Version: Server version: Apache/2.4.41 (Ubuntu) ; Server built: 2022-06-14T13:30:55
  • Python Version: 3.8

Anything else:

However, calling the script by hand, like:
/path/to/script/anonip.py --ipv4mask 12 < /var/log/apache2/access.log --output /home/user/foobar.log
or /path/to/script/anonip.py --ipv4mask 12 < /var/log/apache2/access.log --output /var/log/apache2/access.log
works just fine.

Anonip doesn't want to run

I followed the instructions and configured it properly. I restarted the server, and even rebooted the system, yet it doesn't run. I checked the running processes and found that it is a zombie process on my system: 2955 0.0 0.0 0 0 ? Z 13:14 0:00 [anonip.py]
Killing the process does not help either. Any ideas?

IP addresses with ports don't get masked

I have some Apache logs that look like

[Tue Apr 21 21:10:08.859997 2020] [php7:warn] [pid 16519] [client 89.154.188.26:57424] PHP Warning:

That's in the error.log. The combined log looks fine, but there is no port after the IP address there.
A quick test with Python 2 looks like:

192.168.2.1
--> 192.168.0.0
192.168.2.1:16852
--> 192.168.2.1:16852

nginx error logs not masked

I have trouble getting anonip to mask IPs in my nginx error.log file.

When I execute anonip as root, I receive the following error message:

# anonip < /var/log/nginx/error.log
WARNING:anonip:'2020' does not appear to be an IPv4 or IPv6 network
2020/03/05 19:27:43 [error] 1253#1253: *15347 open() "/usr/share/nginx/html/favicon.ico" failed (2: No such file or directory), client: XXX.XXX.XXX.XXX, server: address.tld, request: "GET /favicon.ico HTTP/1.1", host: "address.tld"

With XXX.XXX.XXX.XXX being some IP address (not masked) and address.tld the domain of the server.

I already tried the --delimiter and --column options:

# anonip --delimiter "," --column 2 < /var/log/nginx/error.log
WARNING:anonip:' client' does not appear to be an IPv4 or IPv6 network

Somehow I would need to tell anonip to look behind "client:", but that's not possible.

I thought anonip would work with error logs out of the box. Or have there been some recent changes in the way nginx is formatting the error logs?

I am using nginx version nginx/1.14.0 (Ubuntu) and anonip 1.0.0 (installed through pip3).

Could it be made to work with ErrorLog too?

I also tried it with ErrorLog, but it doesn't replace the IP address in lines with a format like this:
[Wed Jun 27 20:37:49.123456 2018] [cgi:error] [pid 1234] [client 222.111.222.111:12345] script not found or unable to stat: /var/www/cgi-bin/test.php5
It would be nice if it could work here too. Thanks.

Example for --regex

Hello,

I'm working only with IPv4 and I have error logs like the following:
[Mon Jul 18 17:54:15.281165 2022] [ssl:info] [pid 32202] [client 11.22.33.44:11388] Some Text bla...

I tried
cat error.log | egrep -o '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
and successfully got my IPs.

Next I tried
anonip.py --input error.log -4 8 --regex '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' -d
and got
DEBUG:__main__:Regex did not match!
The same happens with:
anonip.py --input error.log -4 8 --regex '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}' -d

Any ideas?

Regards,
Heiko
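One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the pattern is applied from the start of each line (as `re.match` does in Python) and the addresses are taken from capture groups, as in the RFC example `.* client\: ([^,]+), .*` elsewhere on this page, then a pattern without a leading `.*` and without parentheses cannot succeed, even though `egrep -o` finds the same expression anywhere in the line. The difference can be checked in plain Python:

```python
import re

line = ("[Mon Jul 18 17:54:15.281165 2022] [ssl:info] [pid 32202] "
        "[client 11.22.33.44:11388] Some Text bla...")

# The pattern from the question: no capture group, no leading ".*".
bare = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

# A variant that matches from the start of the line and captures the
# address (without the port) in group 1.
anchored = re.compile(r".*\[client (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")

found_by_search = bare.search(line) is not None  # like egrep -o
found_by_match = bare.match(line) is not None    # anchored at line start
captured_ip = anchored.match(line).group(1)

print(found_by_search, found_by_match, captured_ip)
```

If that assumption holds, passing the anchored variant to --regex would be the thing to try next.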

Python 2.7.17 yields an error while Python 3 runs fine.

I use Apache and have several Python versions installed.
"python" defaults to the "2.7.17" binary.

Using anonip in Apache2 yields:

Traceback (most recent call last):
  File "/pathto/anonip.py", line 422, in <module>
    main()
  File "/pathto/anonip.py", line 406, in main
    print(line, file=output_file)
TypeError: write() argument 1 must be unicode, not str
AH00106: piped log program '/pathto/anonip.py --ipv4mask 8 --ipv6mask 64 --column 2 --output /var/log/apache2/access.log' failed unexpectedly

According to some research, this may be due to the str/unicode difference in this line: under Python 2 the output stream expects unicode but print passes a str, whereas under Python 3 every str is already Unicode.
This might be fixed by adding a conversion for that case.

In my case I just changed the shebang to python3 and now it works fine for me.

Import anonip has side effects

When using the anonip module from Python applications, a logger instance is created at import time and basicConfig is called immediately. Regrettably logging.basicConfig modifies global state, and thus will result in side effects for the calling application.

Consider the following very simple Python script:

import logging
 
logging.basicConfig(level=logging.DEBUG)
logging.debug('This is a debug message')
logging.error('This is an error message')

Output:

DEBUG:root:This is a debug message
ERROR:root:This is an error message

Importing anonip breaks the logging configuration of the script:

import logging

from anonip import Anonip
 
logging.basicConfig(level=logging.DEBUG)
logging.debug('This is a debug message')
logging.error('This is an error message')

Now the debug line is missing from the output:

ERROR:root:This is an error message

In order to fix this, I propose to move the call to 'basicConfig' into 'main()'.
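The proposed layout could look roughly like this. It is a sketch of the general pattern, not the actual anonip source: the logger name `anonip_sketch` and the `debug` parameter are hypothetical.

```python
import logging

# Creating a named logger has no global side effects; no handlers or
# levels are touched at import time.
logger = logging.getLogger("anonip_sketch")


def main(debug=False):
    """Entry point for command-line use only."""
    # Global logging configuration happens here, so applications that
    # merely import the module keep control over their own setup.
    logging.basicConfig(level=logging.DEBUG if debug else logging.WARNING)
    logger.debug("debug logging enabled")


if __name__ == "__main__":
    main()
```

With this structure, an application that imports the module and then calls `logging.basicConfig(level=logging.DEBUG)` itself gets its own configuration applied, because nothing has configured the root logger before.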
