.cluster file needs to be modified to relate each message to its cluster number (pulsar, closed)

hgascon commented on August 21, 2024

The .cluster file needs to be modified to relate each message to its cluster number.

from pulsar.

Comments (7)

hgascon commented on August 21, 2024

Hi @who3411, thanks for your interest and PR.
Are you saying that some messages are not assigned to a cluster? Why is this a problem?

who3411 commented on August 21, 2024

@hgascon Thank you for your reply.

Are you saying that some messages are not assigned to a cluster?

Yes. In PRISMA, every message is assigned to a cluster. However, the variable clusters in cluster_generator.R does not cover every message. More precisely, only the unique messages are assigned to a cluster there.

Why is this a problem?

A None cluster appears. In my environment, itunes-xbmc.pcap contains 822 messages, but the .cluster file does not contain 822 entries, so many messages are mapped to the None cluster.

pulsar.core.DataHandler.clusterAssignments needs to map a cluster number to every message, but the current implementation does not. clusterAssignments is populated in pulsar.core.DataHandler._readClusterAssignments:

def _readClusterAssignments(self):
    path = "%s.cluster" % self.datapath
    if not os.path.exists(path):
        print "Error during clustering (not enough data?)"
        print "Cluster file not generated:", path
        print "Exiting learning module..."
        sys.exit(1)

    def clusterProcessor(clusterRow):
        return clusterRow[0]
    self.clusterAssignments = self._processData(path, clusterProcessor,
                                                self.N, skipFirstLine=False)
    assert(len(self.clusterAssignments) == self.N)
    self.Ncluster = len(set(self.clusterAssignments))

pulsar.core.DataHandler._processData maps line numbers in the .cluster file to cluster numbers. The .cluster file should therefore have 822 lines, but it currently has fewer, so the None cluster appears. The reason is visible in the first part of pulsar.core.DataHandler._processData:

def _processData(self, fname, process, init, skipFirstLine=False):
    f = file(fname, "r")
    data = csv.reader(f, delimiter="\t", quotechar=None, escapechar=None)
    if init is None:
        res = []
    else:
        res = [None] * init
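
To make the effect concrete, here is a minimal Python 3 sketch of what the pre-sized list does when the file has fewer rows than messages. It is not pulsar's code: the excerpt above stops before the loop, so the positional fill in process_rows is an assumption, and the function name and data are hypothetical.

```python
def process_rows(rows, process, init):
    # Pre-size the result exactly as in the excerpt: one None per expected message.
    res = [None] * init
    # Assumption: each row is stored at the index of its line in the file.
    for index, row in enumerate(rows):
        res[index] = process(row)
    return res


# Hypothetical data: 5 messages were captured, but the .cluster file has only 3 rows.
cluster_rows = [["A"], ["B"], ["A"]]
assignments = process_rows(cluster_rows, lambda row: row[0], init=5)

print(assignments)            # ['A', 'B', 'A', None, None]
print(len(assignments))       # 5, so a length check against N still passes
print(len(set(assignments)))  # 3, because None is counted as a cluster
```

Under that assumption, the length assertion in _readClusterAssignments would not catch the short file, and Ncluster would include None as an extra cluster.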

who3411 commented on August 21, 2024

Sorry, I would like to add to the comment I sent earlier.

Why is this a problem?

A None cluster appears, and an incorrect Markov model is built. As I mentioned before, the None cluster appears because the .cluster file currently has fewer than 822 lines. In reality, no None cluster exists. In addition, the .cluster file does not map a cluster number to every message. For these reasons, an incorrect Markov model is built.
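
As a rough illustration of why missing labels distort the model, the sketch below counts first-order transitions over a label sequence. This is a simplified stand-in, not pulsar's actual Markov model builder. The function name and both label sequences are hypothetical.

```python
from collections import Counter


def transition_counts(labels):
    # Count how often each label is immediately followed by each other label.
    return Counter(zip(labels, labels[1:]))


# Hypothetical sequences for the same five messages.
complete = ["A", "B", "A", "A", "B"]         # every message has a cluster
incomplete = ["A", "B", "A", None, None]     # last two messages were never labeled

print(transition_counts(complete))
print(transition_counts(incomplete))
```

With the incomplete labels, the counts contain transitions into and out of a None state that does not correspond to any real cluster, and the real transitions for those messages are lost.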

hgascon commented on August 21, 2024

It seems that your data has many duplicates, which for efficiency reasons
are filtered out (see duplicateRemover). If you want to have a
one-to-one correspondence later, you need to "explode" the labels back
to the original size... which is already done by the public method
getMatrixFactorizationLabels. The method calcDatacluster is private and,
thus, not listed in the documentation of the Prisma package.
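
The idea of "exploding" labels back to the original size can be sketched in Python as follows. This is not the PRISMA implementation (which is in R) and does not reproduce getMatrixFactorizationLabels. The function names, messages, and labels are all hypothetical and only illustrate the general technique of remembering, for each original message, which unique message it collapsed into.

```python
def deduplicate(messages):
    # Return the unique messages in first-seen order, plus a remapper that
    # gives, for every original message, the index of its unique copy.
    unique = []
    position = {}
    remapper = []
    for message in messages:
        if message not in position:
            position[message] = len(unique)
            unique.append(message)
        remapper.append(position[message])
    return unique, remapper


def explode_labels(unique_labels, remapper):
    # Expand per-unique-message labels back to one label per original message.
    return [unique_labels[index] for index in remapper]


messages = ["GET /a", "GET /b", "GET /a", "GET /a", "PUT /c"]
unique, remapper = deduplicate(messages)

# Hypothetical result of clustering only the unique messages.
unique_labels = [1, 2, 3]

labels = explode_labels(unique_labels, remapper)
print(unique)    # ['GET /a', 'GET /b', 'PUT /c']
print(remapper)  # [0, 1, 0, 0, 2]
print(labels)    # [1, 2, 1, 1, 3]
```

Clustering then only has to process the unique messages, while the exploded list still has one label per original message.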

who3411 commented on August 21, 2024

Thank you for the valuable information.
I overlooked the public method getMatrixFactorizationLabels. Sorry, I am not familiar with GitHub. Should I submit a new PR that uses getMatrixFactorizationLabels?

hgascon commented on August 21, 2024

Yes, please do.

who3411 commented on August 21, 2024

I have submitted a new PR, #22, which uses getMatrixFactorizationLabels. I am sorry to trouble you, but I would really appreciate it if you could review it.
