Comments (10)

tohojo commented on September 23, 2024
  1. I'd like to be able to feed multiple test files to ping_cdf and
    have all the results of all files summed up in 1 cdf trace rather than
    1 trace per file.

Well I did recently add something like that, actually. Take a look at
the 'cdf_combine' plots defined for some of the tests. It's basically a
mechanism to combine a bunch of data files into an aggregate plot, while
grouping them based on file names.

At the moment this feature is rather narrowly tailored to my own needs
when adding it: averaging several test runs into one plot. In particular
it does not have a "just use all the data points" aggregation mode.
However, extending it to support your use case should be fairly doable.

I have a plan to refactor the plotting code to, among other things, make
these kinds of features more accessible and more generally usable. I'll
be happy to consider your use case while doing so. Can you elaborate a
bit on what exactly you want to do? Is it 'just' a case of "take all the
data points from these data files and do a CDF of the union of them"?
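
For concreteness, a rough Python sketch of that kind of "union" CDF
might look like the following. The data-file layout assumed here (a
top-level "results" dict keyed by series name, with null entries for
missing samples) and the "Ping (ms) ICMP" series name are illustrative
guesses, not necessarily the actual format:

```python
import json

import matplotlib.pyplot as plt
import numpy as np

def union_cdf(filenames, series="Ping (ms) ICMP"):
    """Pool the ping samples from several data files into one CDF."""
    points = []
    for fn in filenames:
        with open(fn) as fp:
            data = json.load(fp)
        # Drop null entries (missing samples) before pooling.
        points.extend(v for v in data["results"][series] if v is not None)
    values = np.sort(points)
    # Empirical CDF: fraction of pooled samples <= each value.
    fractions = np.arange(1, len(values) + 1) / len(values)
    plt.plot(values, fractions)
    plt.xlabel(series)
    plt.ylabel("Cumulative probability")
    plt.show()
```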

  2. I'd like the option of processing traces wrt absolute time so that
    I could provide several traces that were taken sequentially and have
    them concatenated onto a single graph rather than show up as separate
    traces on the same graph.

Hmm, actually doing 'absolute time' would require the absolute offset to
be recorded in the data files, which it is not currently. The test time
is; but using that would probably result in some gaps between data
files, since there's some processing overhead, startup time, etc.

Of course, recording the start time is quite doable, but would prevent
you from plotting already captured data files. Perhaps falling back to
the test time as currently recorded could be an idea.

Another option could be to just add an "assume these test files are
recorded after each other and concatenate them on the time axis"
feature.

Which solution do you think would fit your use case best? :)

Thanks for all the work you have put into this tool.

You're welcome! Just happy it's useful to others as well!

-Toke

smithbone commented on September 23, 2024

On 06/17/2014 04:52 AM, Toke Høiland-Jørgensen wrote:

  1. I'd like to be able to feed multiple test files to ping_cdf and
    have all the results of all files summed up in 1 cdf trace rather than
    1 trace per file.

Well I did recently add something like that, actually. Take a look at
the 'cdf_combine' plots defined for some of the tests. It's basically a
mechanism to combine a bunch of data files into an aggregate plot, while
grouping them based on file names.

Ok. I'll take a look. That sounds close to what I want.

be happy to consider your use case while doing so. Can you elaborate a
bit on what exactly you want to do? Is it 'just' a case of "take all the
data points from these data files and do a CDF of the union of them"?

Just an easy method of looking at the frequency of the pings across all
the data files. Right now, with all the plots on one graph, you can sort
of tell from the width of the resulting traces, but it's not quite the
same. With 20 or 30 traces it's a big black blur.

I want to be able to see what the outliers are across a really long
period but without having to run a single 8 hour dataset.

  2. I'd like the option of processing traces wrt absolute time so that
    I could provide several traces that were taken sequentially and have
    them concatenated onto a single graph rather than show up as separate
    traces on the same graph.

Hmm, actually doing 'absolute time' would require the absolute offset to
be recorded in the data files, which it is not currently. The test time
is; but using that would probably result in some gaps between data
files, since there's some processing overhead, startup time, etc.

Of course, recording the start time is quite doable, but would prevent
you from plotting already captured data files. Perhaps falling back to
the test time as currently recorded could be an idea.

Another option could be to just add an "assume these test files are
recorded after each other and concatenate them on the time axis"
feature.

Which solution do you think would fit your use case best? :)

For my use case any of the above would work. :) I don't really need to
go back and plot old data files, so a new piece of metadata is fine.

I've actually tried concatenating all the data files into one big set
with some JSON command-line tools (like jq), but it's just a hair more
complex than I've been able to manage with those tools. And I don't
need it quite badly enough (yet) to write a Python program to do it. :)
It would just be a nice-to-have.
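
Roughly, the merge I was attempting amounts to something like this
Python sketch (the "x_values" and "results" keys are my guess at the
data-file layout, and it naively ignores any gaps between runs):

```python
import json

def concatenate(filenames):
    """Merge result files end-to-end on the time axis, assuming they
    were recorded one after another."""
    merged_x, merged_results, offset = [], {}, 0.0
    for fn in filenames:
        with open(fn) as fp:
            data = json.load(fp)
        merged_x.extend(x + offset for x in data["x_values"])
        for name, series in data["results"].items():
            merged_results.setdefault(name, []).extend(series)
        # Start the next file where this one ended.
        offset += data["x_values"][-1]
    return merged_x, merged_results
```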

The big use case is that after 8 hours of data (32 files) I want to look
at all of them and see if there were any hotspots. If so, then see at
about what time they occurred, and go look at those files in more detail.

Right now I just load them all up with --gui and walk through them.

Thanks again.

Richard A. Smith

tohojo commented on September 23, 2024

I want to be able to see what the outliers are across a really long
period but without having to run a single 8 hour dataset.

Right, that seems reasonable. Shouldn't be too hard to implement.

For my use case any of the above would work. :) I don't really need to
go back and plot old data files, so a new piece of metadata is fine.

Right-oh. Still would like to at least fall back to something else
gracefully, though. Don't have a nice way to express "this is a data
file, but it's slightly incompatible with this function" in the UI.

Since just concatenating all the data is the simplest to implement (and
will also work for the CDFs), I'll probably start with that, then see if
adding timestamping and using that turns out to be useful later on...

-Toke

smithbone commented on September 23, 2024

On 06/17/2014 11:04 AM, Toke Høiland-Jørgensen wrote:

I want to be able to see what the outliers are across a really long
period but without having to run a single 8 hour dataset.

Right, that seems reasonable. Shouldn't be too hard to implement.

For my use case any of the above would work. :) I don't really need to
go back and plot old data files, so a new piece of metadata is fine.

Right-oh. Still would like to at least fall back to something else
gracefully, though. Don't have a nice way to express "this is a data
file, but it's slightly incompatible with this function" in the UI.

Yeah. Dealing with multiple versions of data files and new features is
always a PITA.

Since just concatenating all the data is the simplest to implement (and
will also work for the CDFs), I'll probably start with that, then see if
adding timestamping and using that turns out to be useful later on...

Looking at the metadata you save now I see that you already have the
time saved in the TIME: field.

Could you do something as simple as just converting that value to
absolute seconds and adding it to the values recorded on the x-axis?

Having the epoch-style timestamp is kind of ugly for the time value, but
you could do something like sort the data by timestamp and then subtract
the lowest value from each data point to bring it back to a zero offset.
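
In other words, something like this sketch (the timestamp format for
the TIME field is a guess on my part):

```python
import json
from datetime import datetime

def absolute_x(filenames, fmt="%Y-%m-%dT%H:%M:%S.%f"):
    """Shift each file's x values by its recorded start time, then
    normalise so the earliest file starts at x == 0."""
    shifted = []
    for fn in filenames:
        with open(fn) as fp:
            data = json.load(fp)
        # Convert the TIME metadata to epoch seconds and add it to
        # every x value in the file.
        t0 = datetime.strptime(data["metadata"]["TIME"], fmt).timestamp()
        shifted.append((t0, [t0 + x for x in data["x_values"]]))
    # Sort by start time, then subtract the lowest value to bring the
    # combined series back to a zero offset.
    shifted.sort(key=lambda pair: pair[0])
    start = shifted[0][0]
    return [[x - start for x in xs] for _, xs in shifted]
```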

Richard A. Smith

tohojo commented on September 23, 2024

Yeah. Dealing with multiple versions of data files and new features is
always a PITA.

Quite. I'm already way too lax in this regard, but would like to at
least try to limit it somewhat... :)

Looking at the metadata you save now I see that you already have the
time saved in the TIME: field.

Yeah. That was my plan for the fallback. However, the value recorded in
the TIME field is the time when netperf-wrapper starts up. There can be
some gap between this value and the first data point actually output by
the test command, depending on lots of things, from test configuration
to simple startup times of the commands and monitor threads.

So the idea was to add a second timestamp corresponding to the absolute
value of x==0 and use that if available, falling back to the value in
TIME if not.
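
Something like this, where the "T0" key name is just a placeholder for
whatever the new field ends up being called:

```python
def start_time(metadata):
    # Hypothetical key names: prefer an exact x == 0 timestamp if the
    # data file records one, else fall back to the TIME field (which
    # is taken slightly before the first data point).
    return metadata.get("T0", metadata["TIME"])
```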

-Toke

smithbone commented on September 23, 2024

On 06/18/2014 08:39 AM, Toke Høiland-Jørgensen wrote:

Quite. I'm already way too lax in this regard, but would like to at
least try to limit it somewhat... :)

Over the years I've learned that I should just take the small complexity
hit and add a version or revision indicator to any file format or
database structure I make, no matter how simple or temporary I think
it's going to be. If I end up not using it, then I consider it a win for
my ability to predict the requirements and specification. :)
Most often, though, I find I need to rev it, and having a file version
indicator makes that so much nicer.

The switch to doing most of those types of things in Python vs. C also
helps, since it's so much easier to do parsing in Python.

So the idea was to add a second timestamp corresponding to the absolute
value of x==0 and use that if available, falling back to the value in
TIME if not.

Nod. Great. Let me know when you decide to work on this. I'll have
lots of data to test with.

Richard A. Smith

tohojo commented on September 23, 2024

Nod. Great. Let me know when you decide to work on this. I'll have
lots of data to test with.

Just committed the concatenation and absolute time plot features. Let me
know if you have any problems with it. :)

-Toke

smithbone commented on September 23, 2024

On 07/02/2014 08:48 AM, Toke Høiland-Jørgensen wrote:

Nod. Great. Let me know when you decide to work on this. I'll have
lots of data to test with.

Just committed the concatenation and absolute time plot features. Let me
know if you have any problems with it. :)

Appears to work perfectly and is exactly what I needed. Thank you!

Interestingly enough, --absolute-time without --concatenate gives a nice
display where every file is a different color/marker and the legend
shows the filename. Handy if you want to be able to go look at just
that section.

The only suggestion I would have would be a --no-markers or --no-symbols
or some other name where it only plots the lines and not the point
markers. With lots of short files the markers just clutter the image.

Thanks again for such a useful tool.

Richard A. Smith

tohojo commented on September 23, 2024

Appears to work perfectly and is exactly what I needed. Thank you!

Great!

Interestingly enough, --absolute-time without --concatenate gives a nice
display where every file is a different color/marker and the legend
shows the filename. Handy if you want to be able to go look at just
that section.

Yes, that just converts all x values to absolute UNIX time. Note that it assumes that the file names are supplied in chronological order. Not sure what will happen if they're not... Probably some spurious null data points will get added at the very least. Also, for all I know, matplotlib will blow up...

The only suggestion I would have would be a --no-markers or
--no-symbols
or some other name where it only plots the lines and not the point
markers. With lots of short files the markers just clutter the image.

Yeah, I see your point. Added a --no-markers option; will push the change once I get my laptop near an internet connection :)
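
Under the hood it's just the standard matplotlib marker toggle,
roughly like this sketch (illustrative only, not the actual plotting
code):

```python
import matplotlib.pyplot as plt

def plot_series(ax, x, y, use_markers=True):
    # marker="None" draws only the connecting line; with lots of
    # short files, per-point symbols otherwise clutter the plot.
    ax.plot(x, y, marker="o" if use_markers else "None")

fig, ax = plt.subplots()
plot_series(ax, [0, 1, 2], [10.0, 12.5, 11.0], use_markers=False)
plt.show()
```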

Thanks again for such a useful tool.

You're very welcome! :)

-Toke

smithbone commented on September 23, 2024

On 07/02/2014 01:31 PM, Toke Høiland-Jørgensen wrote:

The only suggestion I would have would be a --no-markers or
--no-symbols
or some other name where it only plots the lines and not the point
markers. With lots of short files the markers just clutter the image.

Yeah, I see your point. Added a --no-markers option; will push the
change once I get my laptop near an internet connection :)

Sweet. Thanks. With that it's perfect! :)

Richard A. Smith
