Comments (8)

ankit-nassa commented on September 4, 2024

The problem in my case is that I have different log sources: most of them are space-separated, while some use special characters such as pipes or commas as separators. Is there any way I can handle this at runtime so that it works for all kinds of patterns?

Also, for my second query about logs with the same pattern: can you comment on how to make them fall into the same template group?

PinjiaHe commented on September 4, 2024

You could define different delimiters for different log sources. Another option is to define multiple delimiters:

import re

# delimiters for whitespace, "=", and ":"
delimiters = r'=|:|\s+'
wordL = re.split(delimiters, logmessage.strip())
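
For the pipe- and comma-separated sources mentioned above, the same approach could be extended. A minimal sketch, assuming a delimiter set that also covers "|" and "," (the sample log lines below are made up for illustration):

import re

# Assumed delimiter set: whitespace, "=", ":", "|" and "," (not a logparser default)
delimiters = r'=|:|\||,|\s+'

space_separated = 'Connection closed by 10.10.34.11 port 22'
pipe_separated = 'INFO|Connection closed|10.10.34.11|22'

print(re.split(delimiters, space_separated.strip()))
# ['Connection', 'closed', 'by', '10.10.34.11', 'port', '22']
print(re.split(delimiters, pipe_separated.strip()))
# ['INFO', 'Connection', 'closed', '10.10.34.11', '22']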

For same-pattern logs with extra tokens, you could merge different templates/log groups by comparing their template strings. If two templates are similar, their corresponding groups can be merged into one group.
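
As a rough illustration of that idea (not logparser code: the token-overlap similarity, the 0.8 threshold, and the dict-based groups with a 'logIDs' list are all assumptions):

def similar_enough(template_a, template_b, threshold=0.8):
    # token-overlap similarity relative to the shorter template (illustrative metric)
    shared = len(set(template_a) & set(template_b))
    return shared / min(len(template_a), len(template_b)) >= threshold

def merge_groups(group_a, group_b):
    # keep group_a's template and concatenate the log ID lists
    group_a['logIDs'] += group_b['logIDs']
    return group_a

# e.g. the length-4 and length-8 templates from the example below would merge:
# similar_enough(['A=<*>', 'B=<*>', 'C=<*>', 'D=<*>'],
#                ['A=<*>', '<*>', 'B=<*>', '<*>', 'C=<*>', '<*>', 'D=<*>', '<*>'])  # True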

ankit-nassa commented on September 4, 2024

Hi @PinjiaHe ,

Regarding the case of same-pattern logs with extra tokens: can you please help me understand how to merge different templates/log groups by comparing the template strings?
For instance:

  1. A=a B=b C=c D=d (length 4)
  2. A=a' a B=b' b C=c' c D=d' d (length 8)

The above two log lines fall under different first-layer nodes, length 4 and length 8 respectively. What part of the code should we modify to handle these cases? As per my understanding, we create the first layer based on token count, then look for a template matching the log line and create one if it doesn't exist. But we search for a template only within the bucket of the same first-layer node, not across all buckets, to improve performance.

Appreciate your help in this :)

PinjiaHe commented on September 4, 2024

"But we search for a template only within the bucket of the same first-layer node, not across all buckets, to improve performance."

Right, but if you want to merge two groups, you need to add some code that links the two groups to a single group (i.e., add another layer). In that case, they will output the same template. This may need some coding effort. We have implemented a similar method in the following code:

def adjustOutputCell(self, logClust, logClustL):
    similarClust = None
    lcs = []
    similarity = -1
    logClustLen = len(logClust.logTemplate)

    # Find the most similar existing cluster (with a different length /
    # different output cell) using the longest common subsequence (LCS).
    for currentLogClust in logClustL:
        currentClustLen = len(currentLogClust.logTemplate)
        if currentClustLen == logClustLen or currentLogClust.outcell == logClust.outcell:
            continue
        currentlcs = self.LCS(logClust.logTemplate, currentLogClust.logTemplate)
        currentSim = float(len(currentlcs)) / min(logClustLen, currentClustLen)
        if currentSim > similarity or (currentSim == similarity and len(currentlcs) > len(lcs)):
            similarClust = currentLogClust
            lcs = currentlcs
            similarity = currentSim

    # If the best match exceeds the merge threshold, merge the output cells:
    # move the log IDs over, re-point the parents, and deactivate the old cell.
    if similarClust is not None and similarity > self.para.mt:
        similarClust.outcell.logIDL = similarClust.outcell.logIDL + logClust.outcell.logIDL
        removeOutputCell = logClust.outcell
        for parent in removeOutputCell.parentL:
            similarClust.outcell.parentL.append(parent)
            parent.outcell = similarClust.outcell
        removeOutputCell.logIDL = None
        removeOutputCell.active = False

If all your logs are in the format "A=a B=b ...", we would suggest finding the keys "A, B, ..." first by inspecting word frequency.
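
A minimal sketch of that frequency check, assuming a "key=value" convention (the sample lines are the ones from the example earlier in this thread):

from collections import Counter

sample_lines = [
    "A=a B=b C=c D=d",
    "A=a' a B=b' b C=c' c D=d' d",
]

key_counts = Counter()
for line in sample_lines:
    for token in line.split():
        if '=' in token:
            key_counts[token.split('=', 1)[0]] += 1

# Tokens that appear as "key=" in (almost) every line are likely the constant
# keys "A, B, C, D"; everything else is a variable value.
print(key_counts.most_common())  # [('A', 2), ('B', 2), ('C', 2), ('D', 2)]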

ankit-nassa commented on September 4, 2024

Hi @PinjiaHe
Thanks for the reply. Is there any doc that explains what exactly is done in DrainJournal? Also, why was this approach of finding a similar cluster not used in Drain, as compared to DrainJournal? Was there a performance cost or some logical issue?

PinjiaHe commented on September 4, 2024

A draft could be found here.

We did not implement it in Drain (ICWS17) because most of the logs we encountered can be handled without merging the groups. Additionally, it's possible for the merging mechanism to wrongly merge two groups, leading to other parsing errors. Thus, choosing a suitable merge-threshold parameter is important.

ankit-nassa commented on September 4, 2024

Thanks @PinjiaHe. I shall try out the suggestions you have given against my log files and see how it goes. One last query: currently Drain creates a new root node and a new logCluL for each log file. Is there any way we can make these global, so that one root node and one logCluL are maintained across all the logs and log files we parse through Drain?

PinjiaHe commented on September 4, 2024

The easiest way is to merge the log files first.

You could also modify the log_to_dataframe method to let it accept a list of file names as a parameter and read them all.

def log_to_dataframe(self, log_file, regex, headers, logformat):
    """ Function to transform log file to dataframe
    """
    log_messages = []
    linecount = 0
    with open(log_file, 'r') as fin:
        for line in fin.readlines():
            try:
                # Extract the header fields defined by the log format regex
                match = regex.search(line.strip())
                message = [match.group(header) for header in headers]
                log_messages.append(message)
                linecount += 1
            except Exception as e:
                # Skip lines that do not match the expected log format
                pass
    logdf = pd.DataFrame(log_messages, columns=headers)
    logdf.insert(0, 'LineId', None)
    logdf['LineId'] = [i + 1 for i in range(linecount)]
    return logdf
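
A minimal sketch of that modification, assuming the surrounding LogParser class and the pandas import stay as in logparser (the method name log_files_to_dataframe is made up here, not part of logparser):

def log_files_to_dataframe(self, log_files, regex, headers, logformat):
    """Variant that accepts a list of log file paths and reads them all
    into a single dataframe (sketch only).
    """
    log_messages = []
    for log_file in log_files:
        with open(log_file, 'r') as fin:
            for line in fin:
                try:
                    match = regex.search(line.strip())
                    log_messages.append([match.group(h) for h in headers])
                except Exception:
                    # skip lines that do not match the format
                    pass
    logdf = pd.DataFrame(log_messages, columns=headers)
    logdf.insert(0, 'LineId', [i + 1 for i in range(len(log_messages))])
    return logdf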
