Comments (7)
Hi @who3411, thanks for your interest and PR.
Are you saying that some messages are not assigned to a cluster? Why is this a problem?
@hgascon Thank you for your reply.
Are you saying that some messages are not assigned to a cluster?
Yes. In PRISMA, every message is assigned to a cluster. However, in the variable clusters in cluster_generator.R, some messages are not assigned to a cluster. To be precise, only unique messages are assigned a cluster in that variable.
Why is this a problem?
A None cluster appears. In my environment, itunes-xbmc.pcap contains 822 messages, but the .cluster file does not contain 822 entries, so many messages are mapped to the None cluster.
pulsar.core.DataHandler.clusterAssignments needs to map a cluster number to every message, but the current implementation does not. pulsar.core.DataHandler.clusterAssignments is filled in by pulsar.core.DataHandler._readClusterAssignments:
def _readClusterAssignments(self):
    path = "%s.cluster" % self.datapath
    if not os.path.exists(path):
        print "Error during clustering (not enough data?)"
        print "Cluster file not generated:", path
        print "Exiting learning module..."
        sys.exit(1)
    def clusterProcessor(clusterRow):
        return clusterRow[0]
    self.clusterAssignments = self._processData(path, clusterProcessor,
                                                self.N, skipFirstLine=False)
    assert(len(self.clusterAssignments) == self.N)
    self.Ncluster = len(set(self.clusterAssignments))
pulsar.core.DataHandler._processData maps each line of the .cluster file to a cluster number. At this point the .cluster file should have 822 lines, but it currently has fewer, so a None cluster appears. This happens because part of the implementation of pulsar.core.DataHandler._processData is:
def _processData(self, fname, process, init, skipFirstLine=False):
    f = file(fname, "r")
    data = csv.reader(f, delimiter="\t", quotechar=None, escapechar=None)
    if init is None:
        res = []
    else:
        res = [None] * init
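The effect can be reproduced in isolation. The sketch below is a simplified, hypothetical stand-in for the preallocate-then-fill pattern, not pulsar's actual _processData; it assumes the part of that method not shown above stores the result for line i of the .cluster file in res[i]. When the file has fewer lines than there are messages, the trailing entries stay None, the length check in _readClusterAssignments still passes, and None is counted as an extra cluster.

```python
from typing import Callable, List, Optional


def read_assignments(lines: List[str], n_messages: int,
                     process: Callable[[List[str]], str]) -> List[Optional[str]]:
    """Hypothetical stand-in for the preallocate-then-fill pattern.

    Assumption: line i of the .cluster file holds the cluster of message i.
    """
    res: List[Optional[str]] = [None] * n_messages  # preallocated, like res = [None] * init
    for i, line in enumerate(lines):
        row = line.rstrip("\n").split("\t")  # tab-separated, like the csv.reader call
        res[i] = process(row)
    return res


def cluster_processor(row: List[str]) -> str:
    return row[0]  # first column is the cluster label


# Hypothetical input: 5 messages, but only 3 lines in the .cluster file.
cluster_lines = ["c1\n", "c2\n", "c1\n"]
assignments = read_assignments(cluster_lines, 5, cluster_processor)

print(assignments)             # ['c1', 'c2', 'c1', None, None]
assert len(assignments) == 5   # the length check still passes...
missing = [i for i, c in enumerate(assignments) if c is None]
print(missing)                 # [3, 4]  ...but messages 3 and 4 have no cluster
print(len(set(assignments)))   # 3: 'c1', 'c2' and a spurious None "cluster"
```

A stricter check than comparing lengths would be to assert that no entry is None after reading the file.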
Sorry, here is a supplement to the comment I sent earlier.
Why is this a problem?
A None cluster appears, and an incorrect Markov model is built. As I mentioned before, the None cluster appears because the .cluster file currently has fewer than 822 lines, but in reality no None cluster exists. In addition, the .cluster file does not map a cluster number to every message. For these reasons, an incorrect Markov model is built.
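To illustrate why leftover None entries distort the model, here is a toy first-order transition count over a sequence of cluster labels. It is not how pulsar builds its Markov model, and the label sequences are hypothetical; it only shows that None becomes an extra state that absorbs transitions which should have gone to real clusters.

```python
from collections import Counter
from typing import Hashable, Iterable, Set


def transition_counts(labels: Iterable[Hashable]) -> Counter:
    """Count first-order transitions (a -> b) in a label sequence."""
    seq = list(labels)
    return Counter(zip(seq, seq[1:]))


def states(counts: Counter) -> Set[Hashable]:
    """All states that appear on either side of a transition."""
    return {s for pair in counts for s in pair}


# Hypothetical label sequences for one session of 5 messages.
true_labels = ["c1", "c2", "c1", "c2", "c1"]    # every message clustered
broken_labels = ["c1", "c2", "c1", None, None]  # tail left unassigned

good = transition_counts(true_labels)
bad = transition_counts(broken_labels)

print(good[("c1", "c2")], bad[("c1", "c2")])       # 2 1
print(bad[("c1", None)], bad[(None, None)])        # 1 1
print(None in states(good), None in states(bad))   # False True
```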
It seems that your data has many duplicates, which for efficiency reasons
are filtered out (see duplicateRemover). If you want a
one-to-one correspondence later, you need to "explode" the labels back
to the original size, which is already done by the public method
getMatrixFactorizationLabels. The method calcDatacluster is private and,
thus, not listed in the documentation of the Prisma package.
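The deduplicate-then-explode bookkeeping can be sketched in a few lines. PRISMA is an R package and its duplicateRemover and getMatrixFactorizationLabels are not reproduced here; the Python functions below (remove_duplicates, explode_labels) are hypothetical illustrations of the same idea: keep one copy of each distinct message plus an index from every original message to its unique representative, cluster only the unique messages, then map the labels back so there is one label per original message.

```python
from typing import Dict, List, Sequence, Tuple


def remove_duplicates(messages: Sequence[bytes]) -> Tuple[List[bytes], List[int]]:
    """Return the unique messages and, for each original message,
    the index of its unique representative."""
    unique: List[bytes] = []
    first_seen: Dict[bytes, int] = {}
    index_map: List[int] = []
    for msg in messages:
        if msg not in first_seen:
            first_seen[msg] = len(unique)
            unique.append(msg)
        index_map.append(first_seen[msg])
    return unique, index_map


def explode_labels(unique_labels: Sequence[int], index_map: Sequence[int]) -> List[int]:
    """Expand per-unique-message labels back to one label per original message."""
    return [unique_labels[j] for j in index_map]


messages = [b"GET /a", b"GET /b", b"GET /a", b"GET /a", b"POST /c"]
unique, index_map = remove_duplicates(messages)
print(len(unique), index_map)  # 3 [0, 1, 0, 0, 2]

# Hypothetical clustering result for the 3 unique messages.
unique_labels = [1, 1, 2]
labels = explode_labels(unique_labels, index_map)
print(labels)  # [1, 1, 1, 1, 2]
assert len(labels) == len(messages)
```

Keeping the index map is what makes the one-to-one correspondence recoverable: clustering runs on the smaller unique set, and the expansion afterwards is a simple lookup.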
Thank you for the valuable information.
I overlooked the public method getMatrixFactorizationLabels. Sorry, I am not very familiar with GitHub; should I send a new PR that uses getMatrixFactorizationLabels?
Yes, please do.
I have sent a new PR, #22, that uses getMatrixFactorizationLabels. Sorry to trouble you, but I would really appreciate it if you could take a look.