logpai / logparser

A machine learning toolkit for log parsing [ICSE'19, DSN'16]

License: Other

Python 79.79% Perl 12.02% C 7.97% Shell 0.23%
log log-mining log-analysis log-parser log-parsing anomaly-detection benchmark

logparser's People

Contributors

dependabot[bot], gaiusyu, github-actions[bot], isuruboyagane15, jinyang88, joehithard, pinjiahe, rustamtemirov, shilinhe, siyuexi, thomasryck, zhujiem

logparser's Issues

OpenStack testing

Hello,

Regarding Loglizer:

  1. I ran the Drain preprocessing on Openstack_2k.log, but I get different structured and templates csv files compared with the ones you have under logparser/logs/OpenStack. I used Drain.py and ran the script benchmark/Drain_benchmark.py.

  2. I cannot find any dataloader.py for OpenStack, similar to the one for HDFS. Can you provide this?

A sample of my generated structured csv can be seen here:
https://pastebin.com/LarMQwpd

and generated templates csv:
https://pastebin.com/2BBxLzv2

Thank you for your time.

No code for POP

In your TDSC (2018) paper "Towards automated log parsing for large-scale log data analysis", you said your team had published the source code of POP. However, I cannot find it.

Not able to reproduce results using benchmark

Hello,
I am trying to run the benchmark files for Spell as well as Drain, but I get the error: ImportError: cannot import name 'Spell' from 'logparser' (C:\Users\Dell\anaconda3\lib\site-packages\logparser\__init__.py).
Please guide me on how to handle this error. I have created the environment as described in the instructions.
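In case it matters, this is how I currently try to import the parsers from a local clone instead of the pip-installed package (a sketch only; the clone path below is hypothetical):

import sys

# Hypothetical workaround sketch: put a local clone of the repo on the path and import the
# parsers the way the benchmark scripts do, bypassing the incomplete site-packages install.
sys.path.append(r'C:\Users\Dell\logparser')  # hypothetical path to my local checkout
from logparser import Drain, Spell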

Windows installation guide

Please share a detailed Windows installation guide: build dependencies, and step-by-step instructions for running on Windows, with and without Docker.

Drain: in step 3, is the number of internal nodes depth-2?

In the paper, step 3 says: "The number of internal nodes that Drain traverses in this step is (depth − 2), where depth is the parse tree parameter restricting the depth of all leaf nodes."
But in the code, the number of internal nodes is depth − 2 − 1.
Can you give me an explanation?

Unable to reproduce benchmark results

Hi,

I have been trying to recreate the performance benchmarks described in the extended Drain paper (https://arxiv.org/pdf/1806.04356.pdf). I am using the benchmark/Drain_benchmark.py script on the dev branch. I am using the full log files instead of the 2k versions and have commented out the F1-measure and accuracy computation. The full script is at https://pastebin.com/aLVhkTeQ. On my MacBook Pro (2.2 GHz i7), the HDFS task takes ~3200 seconds as opposed to the ~100 seconds shown in the paper. Should I be changing some settings/CPU?

Task list

  1. Fix TODO in SHISO_demo
  2. Finish docs/reference.md
  3. Finish benchmark settings, especially regex, and double-check parameters (e.g. Spell)

Drain vs DrainV1

Hello,

I'm playing with Drain and DrainV1.

Is Drain.py (which seems to be an interesting new version of the algorithm introduced in your paper) mature, or still a work in progress?

Have you benchmarked the two approaches?

Thanks a lot for your work and for your help ;-)

Cannot find how to create the logTemplateMap.csv file

Hello,

I was working with your SVM_BGL.py file, which asks for a logTemplateMap.csv. From looking into your logparser library, I am not able to figure out where this file is generated. Any help would be appreciated.

Thank you

Drain: Linux log parsing length

Hello authors,
I'm currently trying to parse the Linux dataset that you sent me (the full version of the log file, not the 2k version) and I'm encountering an error when trying to use the Drain_demo.py file on it. I didn't change anything in the Drain_demo other than the log format to match the Linux one and I still encounter this error.

This is the traceback:

Traceback (most recent call last):
  File "main.py", line 84, in <module>
    main()
  File "main.py", line 41, in main
    pre_processing()
  File "main.py", line 48, in pre_processing
    unstructured_to_structured()
  File "main.py", line 80, in unstructured_to_structured
    parser.parse(log_file)
  File "/home/kupperu/Documents/CompSci/project/logparser/Drain/Drain.py", line 255, in parse
    self.load_data()
  File "/home/kupperu/Documents/CompSci/project/logparser/Drain/Drain.py", line 291, in load_data
    self.df_log = self.log_to_dataframe(os.path.join(self.path, self.logName), regex, headers, self.log_format)
  File "/home/kupperu/Documents/CompSci/project/logparser/Drain/Drain.py", line 304, in log_to_dataframe
    for line in fin.readlines():
  File "/home/kupperu/.pyenv/versions/3.6.9/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf7 in position 4536: invalid start byte

How would I go about fixing this? Apologies if this is a basic error; I'm quite new to this field.

Also, I'm currently using Python 3.6.9.
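In case it helps, the raw file reads fine if I explicitly handle the few undecodable bytes (a rough workaround sketch, not a fix to Drain itself; the file name is just an assumption about the dataset):

# Hypothetical workaround: replace the non-UTF-8 bytes when reading the raw log, then feed
# the cleaned copy to Drain_demo.py as usual.
with open('Linux.log', encoding='utf-8', errors='replace') as fin, \
        open('Linux_clean.log', 'w', encoding='utf-8') as fout:
    fout.writelines(fin.readlines())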

Thanks

Bug in LKE

Line 275 in LKE.py causes a bug in Python 3. Please change it

# from
if str(type(group[i]))=="<type 'list'>":
# to
if type(group[i])==list:

Installation procedure

Hi,
I have two questions about logparser.

First, the installation procedure is absent.
I cannot find clear installation instructions using conda/pip. Can you please share them?

Second, support for Python 3.
Python 2.7 is approaching EOL, and it seems natural to me to have compatibility with the current version of Python.

Thanks for your comments in advance!

Python 3.7's re module is incompatible with the code

Python 3.7's re module is incompatible with the code. I found a solution: just change import re to import regex as re, but you should run pip3 install regex on the command line first.
The original error was attached as a screenshot, which is not reproduced here.
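In code, the change described above is just the following (assuming the third-party regex package has been installed with pip3 install regex):

# Drop-in replacement for the stdlib re module, as described above.
import regex as re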

Parsing in a streaming manner

Hello,
I need to implement Drain (or Spell, but Drain seems to have better results) to parse logs in a streaming and timely manner, to build a real-time predictive maintenance pipeline.
I have read the code several times; it seems that a streaming mode, i.e. a centralized server that hosts a log parser, is not implemented right now, but the existing functions may help to build it.
To add new logs to previously parsed logs, I think the variable "logCluL" (in the Drain code) needs to be reloaded from the template csv file of the old logs, and the tree rebuilt from the structured csv file. After that, new logs are added to the tree, and each log becomes a new line appended to the structured and template csv files.
This approach seems fine for a few runs of the parser, but not for a streaming setup where a server hosts the log parser.
In my opinion, such a server needs to keep the tree with the templates in memory (RAM) at all times, while writing the parsed logs to a database in real time (to limit RAM usage). Maybe some blocks of the Drain code can help with that.
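What I have in mind is persisting the parser state directly instead of rebuilding it from the csv files (a rough sketch, assuming the cluster objects held in logCluL are picklable):

import pickle

# Rough sketch: persist the in-memory cluster list (the logCluL variable in the Drain code)
# between runs, so the long-running server never has to rebuild the tree from csv files.
def save_state(logCluL, path='drain_state.pkl'):
    with open(path, 'wb') as f:
        pickle.dump(logCluL, f)

def load_state(path='drain_state.pkl'):
    with open(path, 'rb') as f:
        return pickle.load(f)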
Do you have another approach to suggest?
Have you implemented any online streaming mode that could help me?
Thank you for your answer!

Generating the ground truth data

Hello,

I plan to use the Drain algorithm for processing my logs. I used it and it works fine, generating the structured and template files. Now I would like to check the quality of the templates using your evaluation program, and here I need your guidance. It expects the ground-truth files as input. The question is how to generate them most easily. My plan would be to use your Drain parser to generate the structured data, then take the message part of it, group it (e.g. in Excel), and assign event IDs to each distinct group.
Would that be a reasonable approach?

Kind Regards
Kamil


Drain EventId displays hexadecimal values

I am using the Drain log parser and I observed that the EventId is always written to the .csv files as the hexadecimal form of the hash. Can it instead be represented as events 1, 2, 3, ... and so on? I am using Python 3.7.
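This is the kind of post-processing I have in mind (a sketch against the *_structured.csv that Drain writes; the file name is just an example and I only assume it has an EventId column):

import pandas as pd

# Hypothetical post-processing: renumber the hash-based EventIds as 1, 2, 3, ... in order of
# first appearance in the structured output.
df = pd.read_csv('HDFS_2k.log_structured.csv')
id_map = {eid: i + 1 for i, eid in enumerate(df['EventId'].unique())}
df['EventId'] = df['EventId'].map(id_map)
df.to_csv('HDFS_2k.log_structured_renumbered.csv', index=False)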

device log parsing

I would like to support our device log, similar to the Zookeeper log, but our log has some custom formatting. Could you please give directions on how to write a parser for a proprietary log format?

Sort Output

Excellent work! How do you sort the output, say by occurrences descending?
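Something like this is what I am after (a sketch against the generated templates csv, which I understand has an Occurrences column):

import pandas as pd

# Hypothetical sketch: sort the generated *_templates.csv by occurrence count, descending.
df = pd.read_csv('HDFS_2k.log_templates.csv')
df.sort_values('Occurrences', ascending=False).to_csv('templates_sorted.csv', index=False)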

thanks

[Question] Finding best Drain parameters

I was thinking of using genetic algorithms to find the best Drain parameters. Do you think it is pointless to use such time-consuming algorithms for this problem, or could it be worth it? Did you use optimization algorithms to find the best parameters when benchmarking Drain? If yes, which one?

How do we get parameterised key-value pairs in Drain?

Hi,

I have a requirement where, after parsing, I want the values in a log line to be automatically associated with keys. For instance, if the log line is:
Sent 200KB from 1.1.1.1 to 2.2.2.2
I would like to create a JSON object with the following details:
"properties": {
    "bytes_sent": "200KB",
    "srcIpV4": "1.1.1.1",
    "destIpV4": "2.2.2.2"
}

Can you please suggest whether we can do something like this with Drain?
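For context, the closest I can get today is a hand-written named-group regex per template, which obviously does not scale (a sketch only; the key names are my own):

import json
import re

# Hypothetical per-template extraction: map the wildcard positions of one known template to
# named keys via a regex with named capture groups.
line = 'Sent 200KB from 1.1.1.1 to 2.2.2.2'
pattern = re.compile(r'Sent (?P<bytes_sent>\S+) from (?P<srcIpV4>\S+) to (?P<destIpV4>\S+)')
match = pattern.match(line)
if match:
    print(json.dumps({'properties': match.groupdict()}, indent=2))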

An escape error from re (re.error: bad escape \s at position 0) in "utils/logloader.py" on Python 3.7

Dear authors,
Thanks for your work. When I ran the file "MoLFI_demo.py" on Python 3.7, I got the following error.

Traceback (most recent call last):
  File "~/anaconda3/envs/tf/lib/python3.7/sre_parse.py", line 1021, in parse_template
    this = chr(ESCAPES[this][1])
KeyError: '\\s'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "MoLFI_demo.py", line 14, in <module>
    parser.parse(log_file)
  File "../logparser/MoLFI/MoLFI.py", line 41, in parse
    loader = logloader.LogLoader(self.log_format, self.n_workers)
  File "../logparser/utils/logloader.py", line 38, in __init__
    self.headers, self.regex = self._generate_logformat_regex(self.logformat)
  File "../logparser/utils/logloader.py", line 79, in _generate_logformat_regex
    splitter = re.sub(' +', '\s+', splitters[k])
  File "~/anaconda3/envs/tf/lib/python3.7/re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "~/anaconda3/envs/tf/lib/python3.7/re.py", line 309, in _subx
    template = _compile_repl(template, pattern)
  File "~/anaconda3/envs/tf/lib/python3.7/re.py", line 300, in _compile_repl
    return sre_parse.parse_template(repl, pattern)
  File "~/anaconda3/envs/tf/lib/python3.7/sre_parse.py", line 1024, in parse_template
    raise s.error('bad escape %s' % this, len(this))
re.error: bad escape \s at position 0

I'd be grateful if you could help me.
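Update: escaping the backslash in the replacement string makes the error go away for me locally (a sketch of the one-line change, not necessarily the right upstream fix; the sample format string is made up):

import re

# Minimal reproduction of the change: in Python 3.7+ the *replacement* string is parsed for
# escapes, so a bare '\s' raises "bad escape"; escaping the backslash restores the behaviour.
splitters = ['<Date>  <Time>']                    # hypothetical fragment of a log format
splitter = re.sub(' +', r'\\s+', splitters[0])    # was: re.sub(' +', '\s+', splitters[0])
print(splitter)                                   # -> <Date>\s+<Time>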

UnicodeDecodeError while parsing the log file

While using an adb log file, logparser throws a UnicodeDecodeError. Kindly help me resolve it.

UnicodeDecodeErrorTraceback (most recent call last)
<ipython-input-39-37c1bebf74a5> in <module>()
     19 
     20 parser = Drain.LogParser(log_format, indir=input_dir, outdir=output_dir,  depth=depth, st=st, rex=regex)
---> 21 parser.parse(log_file)
/home/jupyter/logparser/demo/logparser/logparser/Drain/Drain.py in parse(self, logName)
    283             os.makedirs(self.savePath)
    284 
--> 285         self.outputResult(logCluL)
    286 
    287         print('Parsing done. [Time taken: {!s}]'.format(datetime.now() - start_time))
/home/jupyter/logparser/demo/logparser/logparser/Drain/Drain.py in outputResult(self, logClustL)
    202             template_str = ' '.join(logClust.logTemplate)
    203             occurrence = len(logClust.logIDL)
--> 204             template_id = hashlib.md5(template_str.encode('utf-8')).hexdigest()[0:8]
    205             for logID in logClust.logIDL:
    206                 logID -= 1
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 71: ordinal not in range(128)

Problem with the templates.csv generated by Spell.py

Hi, @jimzhu

The templates.csv generated by Drain.py works fine with regexmatch.py, i.e. running Drain_demo.py and then logmatch_demo.py in order. However, the templates generated by Spell_demo.py do not work with regexmatch.py.

I compared the templates created by Drain.py with those created by Spell.py. The big difference is the '*' placeholder.

Here are example templates for HDFS_2k.log.

The templates corresponding to Drain_demo.py
EventId,EventTemplate,Occurrences
dc2c74b7,PacketResponder <*> for block <*> terminating,311
5d5de21c,BLOCK* NameSystem.addStoredBlock: blockMap updated: <*> is added to <*> size <*>,314
e3df2680,Received block <*> of size <*> from <*>,292
09a53393,Receiving block <*> src: <*> dest: <*>,292

The templates corresponding to Spell_demo.py
EventId,EventTemplate,Occurrences
5b95a418,* * for * * ,331
3949a200,BLOCK* * * * *,659
67f0912e,Received block * of size * *,294
f7a1dcba,Receiving block * src * dest *,292
a8cc20dc,Deleting block * file ,263
94eb1881,Served block * to *,80

Am I missing something that leads to this problem? Thanks

Unable to reproduce running time of parsing method.

Hello,

I am trying to reproduce your results about the running time provided by your paper on Drain (extended version) https://arxiv.org/pdf/1806.04356.pdf.
Unfortunately, I cannot reproduce the results for the running time of the IPLoM and Spell methods. According to the paper, I should find a running time of 447.14s for Spell and 140.57s for IPLoM. Using your implementations of IPLoM and Spell with the parameters provided by your benchmark tools, I get 817s for IPLoM and 34339s for Spell.

Do you have any idea where I did something wrong?

SLCT, error cython

Hi,
I tried to use the SLCT demo with the project's Docker image, but the symbol _Py_ZeroStruct is not defined in the SLCT library shipped in the repo:

root@b63e6065c724:/logparser/logparser/SLCT/demo/SLCT_demo_BGL# python precision_10_times.py 
Traceback (most recent call last):
  File "precision_10_times.py", line 6, in <module>
    from SLCT_complete import *
  File "/logparser/logparser/SLCT/demo/SLCT_demo_BGL/SLCT_complete.py", line 5, in <module>
    from logparser import slct
ImportError: ../../logparser/slct.so: undefined symbol: _Py_ZeroStruct

To solve this problem, I tried to recompile the library (after fixing the import), but hit another problem:

root@b63e6065c724:/logparser/logparser/SLCT# python3 setup.py 
Building extenstion modules...
==============================================
logparser/slct/slct.pyx: cannot find cimported module '.cslct'
Compiling logparser/slct/slct.pyx because it changed.
[1/1] Cythonizing logparser/slct/slct.pyx

Error compiling Cython file:
------------------------------------------------------------
...
from cpython.string cimport PyString_AsString
from libc.stdlib cimport *
from .cslct cimport *
^
------------------------------------------------------------

logparser/slct/slct.pyx:3:0: relative cimport beyond main package is not allowed

Error compiling Cython file:
------------------------------------------------------------
...
    length = len(content)
    cdef int c_argc = <int> length

    cdef char ** c_argv = to_cstring_array(content)

    mainFunction(c_argc, c_argv)  #invoke the original C function
   ^
------------------------------------------------------------

logparser/slct/slct.pyx:22:4: undeclared name not builtin: mainFunction

Error compiling Cython file:
------------------------------------------------------------
...
    length = len(content)
    cdef int c_argc = <int> length

    cdef char ** c_argv = to_cstring_array(content)

    mainFunction(c_argc, c_argv)  #invoke the original C function
                        ^
------------------------------------------------------------

logparser/slct/slct.pyx:22:25: Cannot convert 'char **' to Python object
Traceback (most recent call last):
  File "setup.py", line 20, in <module>
    ext_modules = cythonize(ext_modules)
  File "/root/anaconda3/lib/python3.6/site-packages/Cython/Build/Dependencies.py", line 1039, in cythonize
    cythonize_one(*args)
  File "/root/anaconda3/lib/python3.6/site-packages/Cython/Build/Dependencies.py", line 1161, in cythonize_one
    raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: logparser/slct/slct.pyx

How should the SLCT demo be used? What is the process to compile and run it?

Thank you in advance

Typo in benchmark

A few benchmark Python files have a typo where the output file name says "bechmark" instead of "benchmark". This makes it hard to load the data for analysis later.

Links with Loglizer project

Hello,

I am slightly confused about the links between this project and the Loglizer project. It seems to me that the whole goal of parsing logs is to use the output as input to some anomaly detection algorithm, supervised or unsupervised, and that this is handled by the Loglizer project.

However, the input of the Loglizer project does not match the output of the Logparser project. Furthermore, all demos of the Logparser project now use HDFS.log, which seemingly does not have any labeling information, whereas Loglizer expects labels.

More generally, is testing the complete Logparser + Loglizer chain possible today with the code and data provided? If not, are you working on another project to process the output of Logparser?

Thanks for your help.

How to represent variable log input format

Dear authors,

First of all, thanks for your work.
This is more of a question than an issue, but here it goes:

I'm trying to parse cloud foundry logs, using for example Drain or Spell.
I want to adapt your implementation to be as general as possible, for my Master's thesis, and use it for the parsing part.
Unfortunately, the experiments you've conducted using only 2,000 lines of log output bear little relation to a realistic use case with 100k to 1M lines of log output.

I am using the log format that you provide in the benchmark files for Cloud Foundry:
<Logrecord> <Date> <Time> <Pid> <Level> <Component> \[<ADDR>\] <Content>
The regex list provided there does not do well and produces a completely blown-up template file.
So I'm using the following regex list, to which I added some entries myself:

regex = [
        r'((\d+\.){3}\d+,?)+',                     # comma-separated IP addresses
        r'/.+?\s',                                 # paths up to the next whitespace
        r'\d+',                                    # bare numbers
        r'\[.*?\]',                                # bracketed fields (non-greedy)
        r'\[.*\]',                                 # bracketed fields (greedy)
        r'\[.*\] \[.*\]',                          # pairs of bracketed fields
        r'(/|)([0-9]+\.){3}[0-9]+(:[0-9]+|)(:|)',  # IP
        r'(?<=[^A-Za-z0-9])(\-?\+?\d+)(?=[^A-Za-z0-9])|[0-9]+$',  # Numbers
        r'\(\/.*\)'                                # parenthesized paths
    ]

I end up with a template file that's like the one I've attached.
openstack_val_normal_n2_templates.csv.zip

Still, as you can see in the templates, for example for EventIds 1, 2, 3 (I've changed your md5 EventIds to ascending ones), directories are not being parsed well, so if I had a log file for my model with different VM instance names, this wouldn't work.
Of course, I could add a regex for directories, but I'm not sure that's the right approach.

Do you have any suggestions on how to improve the situation?

How do you actually use your model in a more realistic use case? Do you have a larger collection of regexes?

Help would be much appreciated.

Python 3.7 incompatible with the code

Hello, I am running Python 3.7 on macOS and the code does not run.

This is the error I get when I try to run, for example, LogSig:
python LogSig_demo.py
finished loading
Parsing file: ../logs/HDFS/HDFS_2k.log
Loading logs...
Traceback (most recent call last):
  File "/Users/eeaplal/anaconda3/lib/python3.7/sre_parse.py", line 1021, in parse_template
    this = chr(ESCAPES[this][1])
KeyError: '\s'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "LogSig_demo.py", line 15, in <module>
    parser.parse(log_file)
  File "../logparser/LogSig/LogSig.py", line 267, in parse
    self.loadLog()
  File "../logparser/LogSig/LogSig.py", line 43, in loadLog
    headers, regex = self.generate_logformat_regex(self.para.logformat)
  File "../logparser/LogSig/LogSig.py", line 254, in generate_logformat_regex
    splitter = re.sub(' +', '\s+', splitters[k])
  File "/Users/eeaplal/anaconda3/lib/python3.7/re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/Users/eeaplal/anaconda3/lib/python3.7/re.py", line 309, in _subx
    template = _compile_repl(template, pattern)
  File "/Users/eeaplal/anaconda3/lib/python3.7/re.py", line 300, in _compile_repl
    return sre_parse.parse_template(repl, pattern)
  File "/Users/eeaplal/anaconda3/lib/python3.7/sre_parse.py", line 1024, in parse_template
    raise s.error('bad escape %s' % this, len(this))
re.error: bad escape \s at position 0

On some Linux machines the code runs with Python 3.6.
I searched a lot and found this discussion, which seems relevant to the problem, although I do not know how to fix it:
emmett-framework/emmett#227

Any support will be appreciated.
Alex

How to handle logs separated by special characters (| and ,) in Drain

Hi Team,

I have a few log samples which belong to the same log pattern, i.e. they should have only one template because of the nature of their pattern. When I run them through the Drain algorithm, I see that each log line generates a new template. This is happening because the splitting is done on spaces. But if the log line is split on the special characters, we get different templates. Is there any way we can fix this as part of the code? Maybe split by | or comma and then by space (see the sketch after the attachments)?

For instance checkout the following log files as attached.
Comma_Separated_Logs.txt
Pipe_Separated_Logs.txt
Same_Pattern_logs.txt

The Same_Pattern_logs.txt file is an interesting one, as the pattern of the log lines is the same but it still generates 2 different templates. This is because some lines have a few extra tokens.
I read in the Drain research paper that most log events have the same number of tokens, and that when they don't, there is a way to handle them in post-processing. Can you please suggest what post-processing we can add here so that the same patterns get attached to the same log group?
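The simplest fix I can think of is to normalize the delimiters before the lines ever reach Drain (a sketch of the pre-processing I mean, not something the library does today as far as I know):

import re

# Hypothetical pre-processing: turn '|' and ',' into spaces so that Drain's whitespace
# tokenization sees the same tokens for lines that only differ in delimiter.
def normalize_delimiters(line):
    return re.sub(r'[|,]+', ' ', line)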

Is there any way to load templates before parsing in Drain?

Hi Team,

I have a use case where I need to load/reload a few templates before I proceed to parse the logs. These templates were generated by Drain when run against a particular set of events, and now I would like to reload them before parsing another batch of events. Could you suggest some ways to do this?

Templates of LogMatch question

Hi,

I am running the LogMatch log parser using the sample files that you have (HDFS_2k.log).

I understand that you match the templates that you give as input to the log parser against the log lines and generate structured logs. But how did you obtain this HDFS_2k.log_templates.csv file, given that you use it as input to the log parser?

Also, after parsing HDFS_2k.log, the templates I get as output (HDFS_2k.log_structured.csv) differ from the ones you have. Am I missing anything?

Thank you for your time.

How to extract the matched data during parsing process

@jimzhu Excellent work!

Since logparser is able to output structured data and template expressions containing '<*>', I would like to know how to write a function that takes a line and a template expression as input and returns the list of matched elements. The definition of the function is as follows:

def extract_data(line, expression):
    return list_of_matched_data

For example:

input:
#line: [52187162.990775] sd 0:0:7:0: attempting task abort! scmd(ffff88219e405b80)
#expression: [<*>.<*>] <*> <*> attempting task abort! <*>
output:
['52187162', '990775', 'sd', '0:0:7:0', 'scmd(ffff88219e405b80)']
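The closest I have come on my own is to turn the template into a regex with one capture group per wildcard (a minimal sketch, assuming the <*> wildcard style of Drain's output):

import re

# Hypothetical sketch of extract_data: escape the template, replace each <*> wildcard with a
# non-greedy capture group, and return the captured substrings for a matching line.
def extract_data(line, expression):
    pattern = re.escape(expression).replace(re.escape('<*>'), '(.*?)')
    match = re.fullmatch(pattern, line.strip())
    return list(match.groups()) if match else []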

Thanks a lot

Bug in Spell.py SimpleLoopMatch()

The original code snippet is below:

    def SimpleLoopMatch(self, logClustL, seq):
        retLogClust = None

        for logClust in logClustL:
            if float(len(logClust.logTemplate)) < 0.5 * len(seq):
                continue
            
            #If the template is a subsequence of seq
            it = iter(seq)
            if all(token in seq or token == '*' for token in logClust.logTemplate):
                return logClust

        return retLogClust

But in Spell's article, the explanation of the steps is as follows:

1) Simple loop approach. A naive method is to simply loop
through strs. For each stri, maintain a pointer pi pointing to
the head of stri, and another pointer pt pointing to the head
of σ. If the characters (or tokens in our case) pointed to by pi
and pt match, advance both pointers; otherwise only advance
pointer pt. When pt has gone to the end of σ, check if pi has
also reached the end of stri. A pruning can be applied, which
is to skip stri if its length is less than ½|σ|. The worst time
complexity for this approach is O(m · n).

So you should compare the input string and the template strings (in the LCAObjects) in order; in your implementation, you only check whether each token exists in the input string.
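A sketch of the order-preserving check I think was intended (note the unused it = iter(seq) in the original, which suggests a consuming membership test was the plan; this is not a tested patch):

def is_template_subsequence(template, seq):
    # 'token in it' consumes the iterator up to the match, so template tokens must appear
    # in seq in order, not merely anywhere in seq.
    it = iter(seq)
    return all(token == '*' or token in it for token in template)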

Have a nice day.

parsed templates

As mentioned in LogAnomaly:
The front 50% (according to the timestamps of logs) of the BGL dataset is used as the training set, which includes 257 log templates, and the rest 50% involving 503 templates is used as the testing set.

However, I got 1834 templates from Drain and 3000+ from Spell. Did I do something wrong here? Or should I filter out templates that occur only once?
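Or is the idea simply to drop the singleton templates, something like this (a sketch against the generated templates csv; the file name is just an example)?

import pandas as pd

# Hypothetical filtering: keep only templates that occur more than once in the parsed output.
df = pd.read_csv('BGL.log_templates.csv')
frequent = df[df['Occurrences'] > 1]
print(len(df), '->', len(frequent))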

Error in slct

Hi,
I got this error when running the SLCT_demo file:

Traceback (most recent call last):
  File "slct.py", line 14, in <module>
    parser.parse(log_file)
  File "/home/z/Desktop/2kHDFS/HDFs/logparser/SLCT/SLCT.py", line 33, in parse
    SLCT(self.para, self.log_format, self.rex)
  File "/home/z/Desktop/2kHDFS/HDFs/logparser/SLCT/SLCT.py", line 46, in SLCT
    stderr=subprocess.STDOUT, shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 223, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command 'gcc -o ../logparser/SLCT/slct -O2 ../logparser/SLCT/cslct.c' returned non-zero exit status 1

More Typo

Android is spelled "Andriod" in the log dataset and in a lot of the parser files.

Drain does not extract text patterns?

Thanks for putting this together, team. I have been trying to use the Drain algorithm specifically and came across this issue.

'user=mike ip=unknown-ip-addr cmd=Metastore shutdown complete',
'user=mike ip=unknown-ip-addr cmd=Shutting down the object store',
'user=smith ip=unknown-ip-addr cmd=Metastore shutdown complete',
'user=smith ip=unknown-ip-addr cmd=Shutting down the object store',
'user=jackson ip=unknown-ip-addr cmd=Metastore shutdown complete',
'user=jackson ip=unknown-ip-addr cmd=Shutting down the object store',
'user=bob ip=unknown-ip-addr cmd=Metastore shutdown complete',
'user=bob ip=unknown-ip-addr cmd=Shutting down the object store'

So, ideally, the patterns look similar, i.e. of the form

user=<*> ip=<*> cmd=<*>

But the Drain algorithm does not pick this up. I have tried several values of sim_th, depth, and max_children.
It does pick up user and masks it, but fails for ip and cmd and other similar text which might not be part of any dictionary. Is there a way to mask these automatically other than writing regexes?

Am I missing something? Can someone help?
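For completeness, the only thing that works for me today is masking the values after '=' with the rex parameter, which is exactly the regex route I was hoping to avoid (a sketch only; the log format, parameters, and file name are placeholders):

from logparser import Drain

# Hypothetical masking regex passed via Drain's rex parameter: strip the first token after
# every 'key=' so the remaining 'user= ip= cmd=' structure collapses into fewer templates.
log_format = '<Content>'                 # placeholder: the sample lines are just the message
regex = [r'(?<==)\S+']
parser = Drain.LogParser(log_format, indir='./', outdir='./result/',
                         depth=4, st=0.5, rex=regex)
parser.parse('sample.log')               # placeholder file holding the lines above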

tuning param for RegexMatch

Hello,

With respect to the RegexMatch log parser: if workers > 1, I suppose what this really does is process the raw file in parallel, i.e. faster.
I would like to ask whether there is any parameter for this log parser that we can tune to change the number of templates we get?

Thank you.

Frequency vs. Cardinality in Get_Mapping_Position function of IPLoM implementation

In Get_Mapping_Position, line 514, the comment says: "If the frequency of the freq_card>1 then", which is also aligned with the algorithm in the original paper.

# If the frequency of the freq_card>1 then
if maxIdx > 1:

However, in the code maxIdx is used, which, to my understanding, represents the cardinality itself and not its frequency. I believe the frequency value is in maxCount, and therefore maxCount should be used to check whether it is greater than 1. Or am I misinterpreting the implementation?
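In other words, I would have expected the check to be on the count rather than on the cardinality value itself (a one-line sketch of what I mean, not a tested patch):

# If the frequency of the most common cardinality is greater than 1 then ...
if maxCount > 1: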
