cz-nic / convey

CSV processing and web related data types mutual conversion

License: GNU General Public License v3.0

Python 99.39% Shell 0.61%
aggregation base64 converting-units csirt csv e-mail json ods regular-expressions spreadsheet webservice whois xlsx

convey's People

Contributors

e3rd, nicki-krizek, oskar456

convey's Issues

Convey for RT?

23 March 2016: Pavel and I should sit down with one of the Zuzkas to find out whether convey will be used for RT, or whether they will migrate to RTIR.

OTRS depth analysis

(over the summer)

lab ticket_search_2016-06-15_15-50.csv -> every IP

  • the TicketFreeText2 column contains IPs; take them and enrich the row with an ASN column
    • if there are multiple IP addresses: 1) I can ignore it (Pavel said), 2) multiple ASNs separated by commas? 3) multiple ASN columns?
    • the output is simply another CSV
  • so it is enough to add a second feature to convey. Currently there is "Split into files for sending via OTRS"; now there will also be "enrich with ASN (/IP/country/abusemail)".
    • It will probably need some checkbox dialog.

So the functionality is: either adding a column (for an ASN or netname column I need an IP; for a CMS column I need a URL or hostName), or splitting by e-mail (a mail, IP, URL or hostName column is required).

multiple IP per hostname

A hostname can have multiple IPs, so we must be able to duplicate the row for every record.

But how?

Alternative conversion method?
Global flag?
(Are there any similar conversions that can output multiple values?)
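One possible shape for the row duplication discussed above, as a minimal sketch. The function name `expand_rows`, the `resolve` callable and the `FAKE_DNS` table are hypothetical and only stand in for whatever lookup convey actually performs; the sketch also assumes that a hostname without any IP keeps its row once, with an empty value.

```python
import csv
import io


def expand_rows(rows, hostname_index, resolve):
    """Yield one output row per (input row, resolved IP) pair.

    `resolve` is a hypothetical callable mapping a hostname to a list of IPs.
    Rows whose hostname resolves to nothing are kept once, with an empty IP.
    """
    for row in rows:
        ips = resolve(row[hostname_index]) or [""]
        for ip in ips:
            yield row + [ip]


# Stub resolver standing in for a real DNS lookup (assumption for the demo).
FAKE_DNS = {
    "example.com": ["192.0.2.1", "192.0.2.2"],
    "example.org": ["198.51.100.7"],
}

source = io.StringIO("example.com,a\nexample.org,b\nunknown.test,c\n")
expanded = list(expand_rows(csv.reader(source), 0, lambda h: FAKE_DNS.get(h, [])))
for row in expanded:
    print(",".join(row))
```

Whether this should be a global flag or a per-conversion option remains the open question; the sketch only shows that the duplication itself can live in a small generator.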

file descriptors

  • It pays off to keep many file descriptors open.
# Prefer keeping the data in a list and writing it in a loop over joining it into one string and writing that (the latter takes twice as long).
# Tested on a local RAM disk and on a remote network drive.

"""
Bigtext length 42743940
network drive: 1 loop, best of 1: 1.06 s per loop
network drive: 1 loop, best of 1: 1.39 s per loop
network drive: 1 loop, best of 1: 5min 51s per loop open(..,"w")
network drive: 1 loop, best of 1: 100 * (6min 37s) per loop open(..,"a")
"""

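import timeit

import lorem  # third-party package

path = "/tmp/ram/test"
# path = "/mnt/csirt/_test/test"

koeficient = 30000
# Both names must be module-level: x() needs bigtext, y() and z() need sentences.
sentences = [lorem.text() for _ in range(koeficient)]
bigtext = " ".join(sentences)

print("Bigtext length", len(bigtext))


def x():
    # One open, one write of the joined string.
    with open(path, "w") as f:
        f.write(bigtext)


def y():
    # One open, one write per list item.
    with open(path, "w") as f:
        for s in sentences:
            f.write(s)


def z():
    # Reopen the file for every item (switch "w" to "a" for the append variant).
    for s in sentences:
        with open(path, "w") as f:
            f.write(s)


# Plain-Python replacement for the IPython magic "%timeit -n 1 -r 1 fn()".
for fn in (x, y, z):
    print(fn.__name__, timeit.timeit(fn, number=1), "s")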

"1.2.3.4.port" format support

Sometimes I see an IP address in this format, with the port appended.
But is it in every row (easy), or only in some random rows?
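If the format shows up only in some rows, each value has to be checked individually. A minimal sketch of such a check, assuming IPv4 only and a fifth dot-separated part as the port; `split_ip_port` is a hypothetical helper name, not part of convey.

```python
import ipaddress


def split_ip_port(value):
    """Return (ip, port) for '1.2.3.4.80', or (ip, None) for a plain '1.2.3.4'.

    Raises ValueError when the value matches neither form.
    """
    parts = value.strip().split(".")
    if len(parts) == 5:
        ip, port = ".".join(parts[:4]), int(parts[4])
        if not 0 <= port <= 65535:
            raise ValueError(f"port out of range: {value!r}")
    elif len(parts) == 4:
        ip, port = ".".join(parts), None
    else:
        raise ValueError(f"not an IPv4 address: {value!r}")
    ipaddress.IPv4Address(ip)  # validates the octets, raises ValueError if bad
    return ip, port


print(split_ip_port("1.2.3.4.443"))
print(split_ip_port("1.2.3.4"))
```

Because the function accepts both forms, it could be applied to every row regardless of whether the port suffix is present everywhere or only occasionally.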

refresh webservice

Since uwsgi lets the application cache the whois results, we have to refresh the cache:

  • add a systemd config file that restarts the service every day or so
  • add a parameter fresh=all that makes convey reset the whois cache
  • add a parameter fresh=1 that refreshes the prefix of the current query

nicer dialogs

sourceParser structured something like:

def print_info():
    print("delimiter, header, ru (100), cn (200)")

def is_yes(user_input):
    return user_input.lower() in ("y", "yes")

def determineIp(user_input=None):
    if not user_input:
        print("what's the ip")  # first call: ask the question
    else:
        ...  # second call: store the input

for fn in [determineIp, questions...]:
    print_info()
    fn()
    s = input()
    if s == "x":
        break  # cancel
    else:
        fn(user_input=s)

OTRS http log errors

  • mailSender:86: the check if b'P\xc5\x99edat - Tiket - OTRS'.decode("utf-8") in title is probably not always the OK state. I could log it.

UI design

I want to be able

  • to do the analysis on its own
  • to do the splitting into files on its own (possibly with a different conveying method), without redoing whois
  • to open the config file from the menu
  • to launch the program with a different config file

optional speed ups

  • if there are lots of prefixes, let's check faster whether an IP belongs to one: instead of a dict of prefixes (csv.ranges), keep an ordered dict of the first addresses of these prefixes, and compare the IP only with the nearest first address. (Could it work?)
  • multithreading whois
  • compile re searches?
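The first idea could work roughly as sketched below. This is only an illustration under stated assumptions: the prefixes do not overlap, addresses are compared as integers rather than as strings, and `PrefixIndex` is a hypothetical name that does not reflect how csv.ranges is actually stored.

```python
import bisect
import ipaddress


class PrefixIndex:
    """Sorted list of non-overlapping networks, searched by first address."""

    def __init__(self, cidrs):
        self.networks = sorted(
            (ipaddress.ip_network(c) for c in cidrs),
            key=lambda n: int(n.network_address),
        )
        self.starts = [int(n.network_address) for n in self.networks]

    def lookup(self, ip):
        """Return the network containing `ip`, or None."""
        address = ipaddress.ip_address(ip)
        # Rightmost network whose first address is <= the IP.
        i = bisect.bisect_right(self.starts, int(address)) - 1
        if i >= 0 and address in self.networks[i]:
            return self.networks[i]
        return None


index = PrefixIndex(["10.0.0.0/8", "192.0.2.0/24", "141.138.192.0/20"])
print(index.lookup("141.138.197.1"))
print(index.lookup("11.0.0.1"))
```

Each lookup costs one binary search plus one containment test, instead of a scan over all prefixes. With overlapping prefixes the nearest first address is not necessarily the right network, so that case would need extra handling.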

different line length

If two lines have a different number of fields, the latter should be moved to the invalid lines.

For example, we received this malformed line that is to be sent to the cn country, although cn appears only as the very last field. Hence the 7th field is the IP "217..." and the IP wrongly serves as the country code.

2018-07-25 02:40:00.415945 CET +0000,217.31.201.4,3306,122.114.46.138,4033,Win32:Malware-gen,https://www.virustotal.com/search/?query=d6362bdf13a789790e7cad2018-07-26 04:52:25.463007 CET +0000,217.31.201.4,3306,103.55.13.68,56860,unknown,https://www.virustotal.com/search/?query=ab27f6c7634e9efc13fb2db29216a0a8
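A minimal sketch of such a check. It deviates slightly from the wording above: instead of comparing each line with the previous one, it assumes that the most common field count in the file is the correct one. `split_by_field_count` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter


def split_by_field_count(text, delimiter=","):
    """Return (valid_rows, invalid_rows); valid rows have the most common length."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    if not rows:
        return [], []
    expected = Counter(len(row) for row in rows).most_common(1)[0][0]
    valid = [row for row in rows if len(row) == expected]
    invalid = [row for row in rows if len(row) != expected]
    return valid, invalid


sample = "a,192.0.2.1,cz\nb,192.0.2.2,cn\nc,192.0.2.3,extra,fields,cn\n"
valid, invalid = split_by_field_count(sample)
print(len(valid), "valid,", len(invalid), "invalid")
```

Reading the whole input first costs memory on large files; a streaming variant could take the expected count from the first line or the header instead.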

support stdin for text input

I'd find it useful if I could pass text input to convey using stdin. For example:

$ echo "acddf" | convey -f reg
$ cat file* | convey -f reg

I tried using the -i option as well, but convey crashes either way.

Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/convey/__main__.py", line 18, in main
    Controller()
  File "/usr/lib/python3.8/site-packages/convey/controller.py", line 249, in __init__
    self.source_new_column(target_type, add, source_field, source_type, custom)
  File "/usr/lib/python3.8/site-packages/convey/controller.py", line 538, in source_new_column
    *custom, target_type = Preview(source_field, source_type, target_type).reg()
  File "/usr/lib/python3.8/site-packages/convey/wizzard.py", line 298, in reg
    self.search = self.reset_session().prompt('Regular match: ', **options, default=self.search,
  File "/usr/lib/python3.8/site-packages/prompt_toolkit/shortcuts/prompt.py", line 797, in prompt
    return run_sync()
  File "/usr/lib/python3.8/site-packages/prompt_toolkit/shortcuts/prompt.py", line 786, in run_sync
    return self.app.run(inputhook=self.inputhook, pre_run=pre_run2)
  File "/usr/lib/python3.8/site-packages/prompt_toolkit/application/application.py", line 736, in run
    return run()
  File "/usr/lib/python3.8/site-packages/prompt_toolkit/application/application.py", line 710, in run
    return f.result()
  File "/usr/lib/python3.8/site-packages/prompt_toolkit/eventloop/future.py", line 151, in result
    raise self._exception
  File "/usr/lib/python3.8/site-packages/prompt_toolkit/eventloop/coroutine.py", line 92, in step_next
    new_f = coroutine.throw(exc)
  File "/usr/lib/python3.8/site-packages/prompt_toolkit/application/application.py", line 685, in _run_async2
    result = yield f
  File "/usr/lib/python3.8/site-packages/prompt_toolkit/eventloop/coroutine.py", line 92, in step_next
    new_f = coroutine.throw(exc)
  File "/usr/lib/python3.8/site-packages/prompt_toolkit/application/application.py", line 637, in _run_async
    result = yield From(f)
EOFError
{'epilog': 'To launch a web service see README.md.', 'column_help': 'COLUMN is ID the column (1, 2, 3...), the exact column name, field type name or its usual name.', 'parser': ArgumentParser(prog='convey', usage=None, description='Data conversion swiss knife', formatter_class=<class 'convey.controller.SmartFormatter'>, conflict_handler='error', add_help=True), 'csv_flags': [('otrs_id', 'Ticket id'), ('otrs_num', 'Ticket num'), ('otrs_cookie', 'OTRS cookie'), ('otrs_token', 'OTRS token')], 'flag': ('otrs_token', 'OTRS token'), 'args': Namespace(compute_preview=None, config=None, csirt_incident=False, csv_processing=False, debug=False, delete=None, delete_whois_cache=False, delimiter=None, dig=None, disable_external=False, field=None, field_excluded=None, file=False, file_or_input=None, fresh=False, header=False, headless=False, input=False, json=False, multiple_cidr_ip=None, multiple_hostname_ip=None, nmap=None, no_header=False, otrs_cookie=None, otrs_id=None, otrs_num=None, otrs_token=None, output=None, quiet=False, quote_char=None, show_uml=None, single_detect=False, single_query=False, sort=None, split=None, user_agent=None, verbose=False, version=False, web=None, whois=None, whois_ttl=None, yes=False), 'module': 'dig', 'c': [], 'add': True, 'custom': [], 'target_type': Type(reg), 'm': None, 'source_field': <Field Listen=[::1]:53 (Datagram)\nListen=[::1]:53 (Stream)\nListen=127.0.0.1:53 (Datagram)\nListen=127.0.0.1:53 (Stream)\nListen=[::]:53 (Datagram)\nListenStream=192.0.2.1:443\nListenStream=[::beef]:443\nListenStream=[::1]:853\nListenStream=127.0.0.1:853\nListenStream=[::]:888\nListen=[::1]:853 (Stream)\nListen=127.0.0.1:853 (Stream)\nListen=[::]:888 (Stream)\nListenStream=[::1]:8453\nListenStream=127.0.0.1:8453\nListenStream=/unix/path\n(any_ip)>, 'source_type': Type(quoted_printable), 'self': <convey.controller.Controller object at 0x7f31f3e56760>, 'task': ['reg']}

proki base64 un/pack

methods = [("base64", "text") , ... ]

Usecase:

> ./convey.py events.zip --macro-proki-unpack-raw
unzipping...
> I want incident-contact.
From what field?
> from column base64
Unpacked base64 seems to be ip, okay?
> no, it's not always an ip, autodetect
processing rows:
base64 -> text -> autodetected ip -> whois -> incident-contact
base64 -> text -> autodetected mail -> it's ok
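The per-row autodetection in this use case could look like the sketch below. It is an assumption-laden illustration: `unpack_and_detect` is a hypothetical name, the e-mail test is deliberately crude, and the real type detection in convey is not shown here.

```python
import base64
import binascii
import ipaddress


def unpack_and_detect(value):
    """Decode base64 text and guess its type: 'ip', 'mail' or 'text'.

    Returns (decoded_text, type_name), or (None, None) if decoding fails.
    """
    try:
        text = base64.b64decode(value, validate=True).decode("utf-8").strip()
    except (binascii.Error, UnicodeDecodeError):
        return None, None
    try:
        ipaddress.ip_address(text)
        return text, "ip"
    except ValueError:
        pass
    # Crude e-mail heuristic, good enough for a sketch only.
    if "@" in text and " " not in text:
        return text, "mail"
    return text, "text"


for raw in (b"192.0.2.1", b"abuse@example.com", b"hello world"):
    print(unpack_and_detect(base64.b64encode(raw).decode("ascii")))
```

Rows detected as an IP would then continue through whois to incident-contact, while rows detected as a mail are already usable, matching the two paths listed above.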

lacnic quota

When analysing a large number of IPs, I hit the LACNIC "query rate limit exceeded" error and have to wait 5 minutes. The program gets stuck.

A config flag "mark the row as invalid (default for file processing) / wait 5 minutes" could resolve this sufficiently.

For the web service, the default may be either waiting 5 minutes or printing a quota error message.
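The two behaviours of the proposed flag can be sketched as follows. Everything named here is hypothetical: the `QuotaExceeded` exception, the `on_quota` values and the `lookup` callable are placeholders, since how convey detects the LACNIC limit is not described in this issue.

```python
import time


class QuotaExceeded(Exception):
    """Raised by the (hypothetical) lookup when the registry rate limit is hit."""


def lookup_with_quota_policy(lookup, ip, on_quota="invalid", wait=300, sleep=time.sleep):
    """Call lookup(ip); on QuotaExceeded either give up or wait and retry once.

    Returns (result, is_valid). With on_quota="invalid" the row is marked
    invalid immediately; with on_quota="wait" the call sleeps `wait` seconds
    and retries a single time (a second failure propagates to the caller).
    """
    try:
        return lookup(ip), True
    except QuotaExceeded:
        if on_quota == "invalid":
            return None, False
        sleep(wait)
        return lookup(ip), True
```

File processing would call it with `on_quota="invalid"` and collect the failed rows for later reprocessing, while the web service could choose `on_quota="wait"` or translate the exception into an error message.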

wrong netname when pings between registrars

whois 141.138.197.1 asks ARIN, which answers "ask RIPE".

So the output contains two netnames etc.

There are two Origin fields, the first one empty. Convey matches '\norigin(.*)\d+', so this is all right. However, convey takes the first netname, which is ARIN's placeholder for the RIPE range (RIPE-ERX-141) rather than the real one (XLIS-VPS).

Maybe convey should take the last found value (even for abusemail?)?

$ whois 141.138.197.1

#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/resources/registry/whois/tou/
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/resources/registry/whois/inaccuracy_reporting/
#
# Copyright 1997-2019, American Registry for Internet Numbers, Ltd.
#


NetRange:       141.0.0.0 - 141.255.255.255
CIDR:           141.0.0.0/8
NetName:        RIPE-ERX-141
NetHandle:      NET-141-0-0-0-0
Parent:          ()
NetType:        Early Registrations, Maintained by RIPE NCC
OriginAS:       
Organization:   RIPE Network Coordination Centre (RIPE)
RegDate:        1993-04-30
Updated:        2009-05-18
Comment:        These addresses have been further assigned to users in
Comment:        the RIPE NCC region.  Contact information can be found in
Comment:        the RIPE database at http://www.ripe.net/whois
Ref:            https://rdap.arin.net/registry/ip/141.0.0.0

ResourceLink:  https://apps.db.ripe.net/search/query.html
ResourceLink:  whois.ripe.net


OrgName:        RIPE Network Coordination Centre
OrgId:          RIPE
Address:        P.O. Box 10096
City:           Amsterdam
StateProv:      
PostalCode:     1001EB
Country:        NL
RegDate:        
Updated:        2013-07-29
Ref:            https://rdap.arin.net/registry/entity/RIPE

ReferralServer:  whois://whois.ripe.net
ResourceLink:  https://apps.db.ripe.net/search/query.html

OrgTechHandle: RNO29-ARIN
OrgTechName:   RIPE NCC Operations
OrgTechPhone:  +31 20 535 4444 
OrgTechEmail:  [email protected]
OrgTechRef:    https://rdap.arin.net/registry/entity/RNO29-ARIN

OrgAbuseHandle: ABUSE3850-ARIN
OrgAbuseName:   Abuse Contact
OrgAbusePhone:  +31205354444 
OrgAbuseEmail:  [email protected]
OrgAbuseRef:    https://rdap.arin.net/registry/entity/ABUSE3850-ARIN


#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/resources/registry/whois/tou/
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/resources/registry/whois/inaccuracy_reporting/
#
# Copyright 1997-2019, American Registry for Internet Numbers, Ltd.
#



Found a referral to whois.ripe.net.

% This is the RIPE Database query service.
% The objects are in RPSL format.
%
% The RIPE Database is subject to Terms and Conditions.
% See http://www.ripe.net/db/support/db-terms-conditions.pdf

% Note: this output has been filtered.
%       To receive output for a database update, use the "-B" flag.

% Information related to '141.138.197.0 - 141.138.197.255'

% Abuse contact for '141.138.197.0 - 141.138.197.255' is '[email protected]'

inetnum:        141.138.197.0 - 141.138.197.255
netname:        XLIS-VPS
remarks:        VPS42
remarks:        INFRA-AW
descr:          XL Internet Services B.V.
country:        NL
admin-c:        XLIS-RIPE
tech-c:         XLIS-RIPE
status:         ASSIGNED PA
mnt-by:         XLIS-NL-MNT
mnt-domains:    XLIS-NL-MNT
created:        2011-09-22T15:37:50Z
last-modified:  2012-01-09T17:51:32Z
source:         RIPE

role:           XL Internet Services Hostmaster
address:        XL Internet Services BV
address:        Oostmaaslaan 71 (15th floor)
address:        3063 AN Rotterdam
address:        The Netherlands
phone:          +31 10 270 94 70
fax-no:         +31 10 433 44 60
abuse-mailbox:  [email protected]
nic-hdl:        XLIS-RIPE
admin-c:        NOC193-RIPE
tech-c:         NOC193-RIPE
remarks:        ------------------------------------------------
remarks:        E-mail is the preferred contact method!
remarks:        ------------------------------------------------
remarks:        Please use one of the following addresses:
remarks:        [email protected] - for abuse notification
remarks:        [email protected] - for technical questions
remarks:        [email protected] - for anything else
remarks:        ------------------------------------------------
mnt-by:         XLIS-NL-MNT
created:        2007-01-11T12:57:00Z
last-modified:  2016-06-17T14:34:50Z
source:         RIPE # Filtered

% Information related to '141.138.192.0/20AS35470'

route:          141.138.192.0/20
descr:          XL Network
origin:         AS35470
mnt-by:         XLIS-NL-MNT
created:        2011-07-01T13:45:00Z
last-modified:  2011-07-01T13:45:00Z
source:         RIPE

% This query was served by the RIPE Database Query Service version 1.94.1 (BLAARKOP)
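Taking the last found value, as suggested above, is a small change when the fields are matched with a regular expression. A minimal sketch, assuming a simple `netname:` line format; the pattern and the `last_netname` name are illustrative and not convey's actual code.

```python
import re

# Case-insensitive so that both ARIN's "NetName:" and RIPE's "netname:" match.
NETNAME = re.compile(r"^netname:[ \t]*(\S+)", re.IGNORECASE | re.MULTILINE)


def last_netname(whois_text):
    """Return the last netname found in the whois output, or None."""
    matches = NETNAME.findall(whois_text)
    return matches[-1] if matches else None


sample = """NetName:        RIPE-ERX-141
OriginAS:
Found a referral to whois.ripe.net.
netname:        XLIS-VPS
origin:         AS35470
"""
print(last_netname(sample))  # XLIS-VPS
```

Whether "last wins" is also right for the abuse e-mail is less clear, because the referred registry may print several contact addresses after the relevant one.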

nicer progress bar

Can I clear the screen with either of these?
print(chr(27) + "[2J")
import sys
sys.stderr.write("\x1b[2J\x1b[H")

While whoising, refresh the screen every 10 s or every 100 requests and print a summary, so that the user sees how many IPs are from China etc.
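The "every 10 s or 100 requests" rule can be sketched as a small counter object. `ProgressReporter` and its parameters are hypothetical names; the escape sequence is the same one quoted above (erase the screen, then move the cursor home).

```python
import sys
import time
from collections import Counter


class ProgressReporter:
    """Redraw a per-country summary every `interval` seconds or `every` requests."""

    def __init__(self, interval=10.0, every=100, stream=sys.stderr, clock=time.monotonic):
        self.interval, self.every = interval, every
        self.stream, self.clock = stream, clock
        self.countries = Counter()
        self.since_refresh = 0
        self.last_refresh = clock()

    def record(self, country):
        """Count one finished whois request; return True if the screen was redrawn."""
        self.countries[country] += 1
        self.since_refresh += 1
        due = (self.since_refresh >= self.every
               or self.clock() - self.last_refresh >= self.interval)
        if due:
            self.refresh()
        return due

    def refresh(self):
        # ANSI: erase the whole screen, then move the cursor to the top-left corner.
        self.stream.write("\x1b[2J\x1b[H")
        for country, count in self.countries.most_common():
            self.stream.write(f"{country}: {count}\n")
        self.stream.flush()
        self.since_refresh = 0
        self.last_refresh = self.clock()
```

The whois loop would call `reporter.record(country)` after each request. Writing to stderr keeps the summary out of any data piped from stdout.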

autofetching csirtmails

This feature is not really needed. We have csirtmails in a .csv file and it's roughly okay.
Closing for now.

This can also be extended to automatic lookup:
preferably search at
trusted introducer - csv (certificate)
search by country - FR,
type, preferably national or government (there are also hybrids such as national-government); if it is a bank, ignore it
search at first.org and try to look it up there
http://www.first.org/members/teams

then look for static contacts.

03_01_Cert_De

Script error?
11 k addresses
Does it work now, after all the updates?
It went straight to using the "béčka" (the B ones)!
ipsCzFound was negative!

better mail templates

Add the possibility to have more template variables.

Put some URL into textVars or something similar...

headless launch

headless launch: a flag defines the action, with no interaction. E.g. automatically add a Country column from IP column 2 and print to stdout.

Clean up?

If reanalysing, logs in the directory from the previous analysis are not cleaned up.
I fear deleting files.

new processing flow

The current depth-analysis branch means a complete rework of the processing flow. What remains?

  • roughly working
  • remember settings between launches?
  • revive editing templates
  • revive OTRS support
  • macros (running with a --splitted-by abuse-contact --process --send otrs flag or something similar)
  • invalid lines ( & reprocessing)
  • unknown mails ( & reprocessing)
  • check this can replace current production version

Next:

  • recheck SMTP sending
  • clean up sourceParser attributes
  • every added field should be working (especially abuse-contact)
  • SMTP support
  • every method should be working (url->hostname etc, not sure if correctly implemented) (few are still not implemented)
  • cleaned up main file convey.py (no more Controller in there)
  • nicer menus?
  • if not splitting, resolve unknown won't work cause no "unknowns" file is created
  • If processed for the second time, the output is written to the file a second time. Should I delete the file on every reprocessing? ✓ How to handle double reprocessing? (E.g. invalid lines would need to be reprocessed with another delimiter etc. :o )
  • Release to PyPI (first claim the project name :( when it's possible (in months) ) or as a deb package
  • if a single file has been processed, we open Convey directly for it (preferably without closing Convey) (maybe just change self.csv and self.csv.settings or so?)
  • add a regex column?

whois timeout

Check more thoroughly whether a popen call can be given a timeout. Or check whether multithreading can be reimplemented now.

Problem:
whois -h whois.arin.net 199.175.52.138
This query ends with the string "Found a referral to rwhois.vpscheap.net:4321.", then it waits for 10-20 seconds and finishes with "Timeout." I don't want to wait 20 seconds in my script.
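In Python 3, the standard library can put a time limit on a child process directly. A minimal sketch using `subprocess.run` with its `timeout` argument; `run_with_timeout` is a hypothetical wrapper, and the demo uses the Python interpreter as a stand-in command so that it runs without a whois binary.

```python
import subprocess
import sys


def run_with_timeout(command, timeout):
    """Run `command`; return its stdout, or None if it exceeds `timeout` seconds."""
    try:
        completed = subprocess.run(command, capture_output=True, text=True,
                                   timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before raising.
        return None
    return completed.stdout


# Stand-in commands for the demo (assumption: no whois binary is required).
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
fast = [sys.executable, "-c", "print('ok')"]
print(run_with_timeout(fast, timeout=5))
print(run_with_timeout(slow, timeout=0.5))
```

For the query above, the call would be `run_with_timeout(["whois", "-h", "whois.arin.net", "199.175.52.138"], timeout=5)`. Note that this sketch discards the partial output printed before the referral stalls; keeping it would mean reading it from the `TimeoutExpired` exception instead of returning None.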

json api superfluous?

The JSON API may be completely superfluous.
Check whether any of the records not using -r (those for which the -r flag wasn't enough) benefit from the JSON API.

CIDR

0.0.0.0/XY fails, data type of CIDR is missing

Korean KRNIC

Whoising IP 125.129.170.2 does not return a row with country.

impossible to exit interactive mode for reg

I'm not able to cleanly exit the interactive mode for -f reg using typical KeyboardInterrupt (with C-c), nor can I close the application with C-d.

Holding down C-c instead of hitting it sometimes works, but otherwise I get the errors below.

From what I saw in the code, it seems the issue is prompt_toolkit.PromptSession consumes these keypresses and it doesn't raise any exception.

Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/convey/__main__.py", line 18, in main
    Controller()
  File "/usr/lib/python3.8/site-packages/convey/controller.py", line 249, in __init__
    self.source_new_column(target_type, add, source_field, source_type, custom)
  File "/usr/lib/python3.8/site-packages/convey/controller.py", line 568, in source_new_column
    path = graph.dijkstra(target_type, start=source_type, ignore_private=True)
  File "/usr/lib/python3.8/site-packages/convey/graph.py", line 37, in dijkstra
    visited = {target: 0}
TypeError: unhashable type: 'list'
{'epilog': 'To launch a web service see README.md.', 'column_help': 'COLUMN is ID the column (1, 2, 3...), the exact column name, field type name or its usual name.', 'parser': ArgumentParser(prog='convey', usage=None, description='Data conversion swiss knife', formatter_class=<class 'convey.controller.SmartFormatter'>, conflict_handler='error', add_help=True), 'csv_flags': [('otrs_id', 'Ticket id'), ('otrs_num', 'Ticket num'), ('otrs_cookie', 'OTRS cookie'), ('otrs_token', 'OTRS token')], 'flag': ('otrs_token', 'OTRS token'), 'args': Namespace(compute_preview=None, config=None, csirt_incident=False, csv_processing=False, debug=False, delete=None, delete_whois_cache=False, delimiter=None, dig=None, disable_external=False, field=None, field_excluded=None, file=False, file_or_input='all', fresh=False, header=False, headless=False, input=False, json=False, multiple_cidr_ip=None, multiple_hostname_ip=None, nmap=None, no_header=False, otrs_cookie=None, otrs_id=None, otrs_num=None, otrs_token=None, output=None, quiet=False, quote_char=None, show_uml=None, single_detect=False, single_query=False, sort=None, split=None, user_agent=None, verbose=False, version=False, web=None, whois=None, whois_ttl=None, yes=False), 'module': 'dig', 'c': [], 'add': True, 'custom': [], 'target_type': Type(reg), 'm': None, 'source_field': <Field Listen=[::1]:53 (Datagram)(None)>, 'source_type': Type(plaintext), 'self': <convey.controller.Controller object at 0x7f48e9b97760>, 'task': ['reg']}

cms scans

csirt.cz cms scanner may be transformed to a convey module

step sending

While sending, offer the possibility to send the mails one by one. Something like:

1× mail send. Send another ([email protected])? [Yes/yes to All/Cancel]

When cancelled, show the mails that have already been sent and those that are still going to be sent. Implement the possibility to resume sending when interrupted.

  • Ability to send to testing mail without having to change config.ini.

regular match (aside csv)

Some files are not CSV at all.
The administrator should be able to write a regular expression that would capture the IP from every line.
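A minimal sketch of this idea, assuming the administrator supplies a pattern whose first capture group is the value of interest. `extract` and `DEFAULT_PATTERN` are hypothetical names, and the sample log line is made up for the demo.

```python
import re

# Deliberately loose IPv4 pattern; it does not check that octets are <= 255.
DEFAULT_PATTERN = r"\b(\d{1,3}(?:\.\d{1,3}){3})\b"


def extract(lines, pattern=DEFAULT_PATTERN):
    """Yield (line, first_captured_group or None) for every line."""
    regex = re.compile(pattern)
    for line in lines:
        match = regex.search(line)
        yield line, (match.group(1) if match else None)


log = [
    "Jul 25 02:40:00 host sshd[1]: Failed password from 203.0.113.9 port 4033",
    "no address here",
]
for line, ip in extract(log):
    print(ip)
```

Lines for which the pattern finds nothing come back with None, so they can be routed to the invalid lines just like malformed CSV rows.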

test issue

We forked here from GitLab. How will it manage two issue systems?


First GitLab issue:
the string "binární soubor (standardní vstup) odpovídá" (Czech for "binary file (standard input) matches") instead of an e-mail
Instead of an e-mail for the address 188.75.172.28, Katka got the text "binární soubor (standardní vstup) odpovídá". Whois was probably returning binary data. I found something similar on the net and put strings before every grep.
