hgascon commented on August 21, 2024

Hi @who3411, thanks for your interest and PR.
Are you saying that some messages are not assigned to a cluster? Why is this a problem?

from pulsar.

who3411 commented on August 21, 2024

@hgascon Thank you for your reply.

Are you saying that some messages are not assigned to a cluster?

Yes. In PRISMA, every message is assigned to a cluster. However, the variable clusters in cluster_generator.R does not cover every message. More precisely, only the unique messages are assigned to a cluster in that variable.

Why is this a problem?

A None cluster appears. In my environment, itunes-xbmc.pcap contains 822 messages, but the .cluster file has fewer than 822 entries, so many messages are mapped to a None cluster.

pulsar.core.DataHandler.clusterAssignments needs to map a cluster number to every message, but the current implementation does not. clusterAssignments is populated in pulsar.core.DataHandler._readClusterAssignments:

def _readClusterAssignments(self):
    path = "%s.cluster" % self.datapath
    if not os.path.exists(path):
        print "Error during clustering (not enough data?)"
        print "Cluster file not generated:", path
        print "Exiting learning module..."
        sys.exit(1)

    def clusterProcessor(clusterRow):
        return clusterRow[0]
    self.clusterAssignments = self._processData(path, clusterProcessor,
                                                self.N, skipFirstLine=False)
    assert(len(self.clusterAssignments) == self.N)
    self.Ncluster = len(set(self.clusterAssignments))

pulsar.core.DataHandler._processData maps line numbers in the .cluster file to cluster numbers. At this point the .cluster file should have 822 lines, but it currently has fewer, so a None cluster appears. The reason is visible in this part of the implementation of pulsar.core.DataHandler._processData:

def _processData(self, fname, process, init, skipFirstLine=False):
    f = file(fname, "r")
    data = csv.reader(f, delimiter="\t", quotechar=None, escapechar=None)
    if init is None:
        res = []
    else:
        res = [None] * init
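
To illustrate the effect described above, here is a minimal sketch in Python 3. It is not the pulsar code: read_assignments is a hypothetical stand-in, and it assumes (based on the description above) that the result list is pre-filled with None and then overwritten one line at a time. When the file has fewer lines than there are messages, the trailing entries stay None.

```python
import csv
import io


def read_assignments(cluster_text, n_messages):
    """Hypothetical stand-in for the reading logic: one cluster label per line.

    The list is pre-filled with None, as in the excerpt above, and each
    line of the file overwrites the entry at its own line index.
    """
    res = [None] * n_messages
    reader = csv.reader(io.StringIO(cluster_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for i, row in enumerate(reader):
        res[i] = row[0]
    return res


# Five messages in the capture, but only three lines in the cluster file
# (for example, because duplicates were removed before clustering).
cluster_text = "A\nB\nA\n"
assignments = read_assignments(cluster_text, 5)

# Entries that never received a label remain None and would later look
# like an extra "None" cluster.
missing = assignments.count(None)
n_clusters = len(set(assignments))
```

In this toy case two entries remain None, and counting distinct labels with set() reports three clusters although only two real labels exist.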

who3411 commented on August 21, 2024

Sorry, here is a supplement to my earlier comment.

Why is this a problem?

A None cluster appears, and an incorrect Markov model is built. As I mentioned before, the None cluster appears because the .cluster file currently has fewer than 822 lines. In practice, however, no None cluster exists. In addition, the .cluster file does not map a cluster number to every message. For these reasons, an incorrect Markov model is built.

hgascon commented on August 21, 2024

It seems that your data has many duplicates, which for efficiency reasons
are filtered out (see duplicateRemover). If you want to have a
one-to-one correspondence later, you need to "explode" the labels back
to the original size... which is already done by the public method
getMatrixFactorizationLabels. The method calcDatacluster is private and,
thus, not listed in the documentation of the Prisma package.
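
The idea of "exploding" labels back to the original size can be sketched as follows. This is only an illustration in Python, not the Prisma implementation of duplicateRemover or getMatrixFactorizationLabels; the function names deduplicate and explode_labels, the sample messages, and the labels are all hypothetical.

```python
def deduplicate(messages):
    """Return the unique messages plus, for every original message,
    the index of its representative in the unique list."""
    unique = []
    index_of = {}
    back_refs = []
    for msg in messages:
        if msg not in index_of:
            index_of[msg] = len(unique)
            unique.append(msg)
        back_refs.append(index_of[msg])
    return unique, back_refs


def explode_labels(unique_labels, back_refs):
    """Expand labels computed on unique messages to all original messages."""
    return [unique_labels[i] for i in back_refs]


messages = ["GET /a", "GET /b", "GET /a", "PLAY", "GET /b"]
unique, back_refs = deduplicate(messages)

# Hypothetical clustering result: one label per unique message.
unique_labels = ["c1", "c2", "c3"]

# One label per original message, so no entry is left unassigned.
labels = explode_labels(unique_labels, back_refs)
```

Writing the exploded labels, rather than the per-unique-message labels, gives a file with exactly one line per original message.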

who3411 commented on August 21, 2024

Thank you for the valuable information.
I overlooked the public method getMatrixFactorizationLabels. Sorry, I am not familiar with GitHub: should I send a new PR that uses getMatrixFactorizationLabels?

hgascon commented on August 21, 2024

Yes, please do.

who3411 commented on August 21, 2024

I have sent a new PR, #22, that uses getMatrixFactorizationLabels. I am sorry to trouble you, but I would really appreciate it if you could review it.
