toshsan / caliper
Automatically exported from code.google.com/p/caliper
License: Apache License 2.0
(a) Create a Google spreadsheet representing this view I've configured
(b) Create a Google spreadsheet containing a raw data dump
How to do this with respect to authentication and such, I don't know, but this
would be an awesome feature, possibly more useful than just the ASCII output.
Original issue reported on code.google.com by [email protected]
on 16 Jun 2010 at 5:04
if set, new benchmark results would automatically open in a new tab in the
browser.
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 10:57
One result processor might wish to post results to BRRD or to Sponge, and
others might want to do god knows what.
One approach: do nothing, except spit out XML results to a filename
specified on the command line. Other tools can subsequently read that.
Possibly-nicer approach: let result processor classes be set in caliperrc or
wherever; then they could be passed real live Java objects instead of
spending a lot of effort re-parsing XML.
I see us just doing the first approach for now, but I don't know about the
long term... just filing this for future reference, I guess.
Original issue reported on code.google.com by [email protected]
on 19 Jan 2010 at 5:32
When we fork the VM, we pass only the classpath of the original process. This
works for most benchmarks, but cannot support benchmarks that load classes in
the bootclasspath.
Original issue reported on code.google.com by [email protected]
on 29 Jun 2010 at 8:41
Suppose I'm testing 4 x 3 different parameter values against 5 different
benchmarks (different time* methods in the same class) on 2 vms.
Currently, to get one measurement each, we'll run 4x3x5x2=120 vm
invocations. I think 10 would be enough -- vms times benchmarks, and let
each run handle all 4 x 3 parameter combinations for that (vm,benchmark)
pair.
The problem with the way it is today is that hotspot can optimize away
whole swaths of implementation code that doesn't happen to get exercised by
the *one* scenario we run it with. By warming up all 12 of these benchmark
instances, it should have to compile to something more closely resembling
real life (maybe). And with luck, we can avoid the expense of repeating
the warmup period 12 times over.
After warming up all the different scenarios and then starting to do trials
of one of them, I'm not sure if we need to worry about hotspot deciding to
*re*compile based on the new favorite scenario. If that happens, maybe it
makes sense for us to round-robin through the scenarios as we go......
we'll see.
I'm also not sure how concerned we need to be that the order the scenarios
are timed in can unduly affect the results. It could be that for each
"redundant" measurement we take, we vary up the order (e.g. we rotate it?)
in order to wash that out. Or maybe there's no problem with this; I dunno.
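The rotation idea can be sketched with Collections.rotate; the scenario names and pass count below are hypothetical, not Caliper's actual scheduling code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch: on each "redundant" measurement pass, rotate the scenario order by
// one slot so no scenario is always timed first (or last).
public class ScenarioRotation {
    public static List<List<String>> rotatedPasses(List<String> scenarios, int passes) {
        List<List<String>> result = new ArrayList<>();
        List<String> order = new ArrayList<>(scenarios);
        for (int i = 0; i < passes; i++) {
            result.add(new ArrayList<>(order));
            Collections.rotate(order, 1); // each element shifts one slot to the right
        }
        return result;
    }
}
```

With three scenarios and three passes, each scenario appears once in each position, which is the "wash it out" property being hoped for.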
Original issue reported on code.google.com by [email protected]
on 22 Jan 2010 at 10:53
Whenever I have a view of results that I'd like other people to see too, I
could click something to create a gadget URL, which I can then easily embed in
all kinds of things.
http://code.google.com/apis/gadgets/
Original issue reported on code.google.com by [email protected]
on 9 Jul 2010 at 12:32
Once we are logging the full "play-by-play", it'd be helpful to turn on
-verbose:gc and -XX:+PrintInlining (or whatever it's called) and options like
that, and watch for these messages in the child processes' stdout so they can
be logged. The goal for the logging is to really give you a clear picture of
"what's happening", and this is a key part of that.
Original issue reported on code.google.com by [email protected]
on 22 Jan 2010 at 12:43
A few ideas from issue 3,
Jesse:
Think of a means to show multiple results, either by reporting the standard
deviation, or ASCII box plots:
foo [--|-]----|
bar [-|--]
baz |-[--|--]
quux |---[---|--]--|
Elliott:
1. i wouldn't bother with ASCII-art graphs.
2. i might show scale factors like "2.3x" instead (though this might lead to me
wanting a way to @Annotate the method that represents my baseline).
3. i would always use the same time units. switching between ms and us is all
well and good, but it's a pain for this kind of use because it makes the output
less directly comparable.
4. i'd include a timestamp showing when the run was performed, and i'd include
a duration showing how long the benchmark run took.
5. i might include a hash of the test source, so i could easily distinguish
results i shouldn't necessarily be comparing.
As for copy & paste, i'll always want to copy & paste some text form, because
it's the only thing i trust to last and because it's an extra level of
indirection (which is fine for "show more detail" but not for an overview in a
check-in comment). the web's a nice optional extra, but that's all.
Original issue reported on code.google.com by [email protected]
on 19 Jan 2010 at 7:27
Caliper should never throw a stack trace out to the user's console unless
it originated from the user's code. We must catch anything that goes wrong
in our code and communicate it properly.
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 10:19
Users need to understand if one variant of their benchmark allocates more
memory than another. I believe we cannot assume that simply letting GC happen
during the timing loop will properly account for this cost.
We would almost certainly use this:
http://code.google.com/p/java-allocation-instrumenter/
And we'd do it as a separate test, not while actual benchmarking is happening.
We may or may not want to try to conceive this as a "pluggable measurer" as
described in http://code.google.com/p/caliper/issues/detail?id=4 ... what
matters is that we offer the feature somehow.
Original issue reported on code.google.com by [email protected]
on 8 Jul 2010 at 4:02
From Robert Henry:
I run Caliper for the first time on a loop that runs for 12 seconds.
OK, so I'm ambitious, and this is hardly a "micro" benchmark. I get:
An exception was thrown from the benchmark code.
com.google.caliper.ConfigurationException: Runtime 1.2537863806E10
out of range
at com.google.caliper.Caliper.warmUp(Caliper.java:71)
at com.google.caliper.InProcessRunner.run(InProcessRunner.java:54)
at com.google.caliper.InProcessRunner.main(InProcessRunner.java:68)
Original issue reported on code.google.com by [email protected]
on 30 Jun 2010 at 4:14
Patch attached.
I do see an exception when running:
Exception in thread "main"
com.google.caliper.UserException$ExceptionFromUserCodeException
at com.google.caliper.Runner.runOutOfProcess(Runner.java:161)
at com.google.caliper.Runner.run(Runner.java:46)
at tutorial.Tutorial1.main(Tutorial1.java:18)
Caused by: com.google.caliper.ConfigurationException: size has no values
at com.google.caliper.ScenarioSelection.prepareParameters(ScenarioSelection.java:132)
at com.google.caliper.ScenarioSelection.select(ScenarioSelection.java:69)
at com.google.caliper.Runner.runOutOfProcess(Runner.java:148)
... 2 more
Original issue reported on code.google.com by [email protected]
on 5 Jul 2010 at 4:35
Attachments:
there should be a "copy ASCII art to clipboard" option in the web UI.
Original issue reported on code.google.com by [email protected]
on 7 Jun 2010 at 7:21
Here's how we handle multiple implementations today:
http://code.google.com/p/caliper/source/browse/trunk/src/examples/SetContainsBenchmark.java?spec=svn91&r=91#41
@Param private Impl impl;
public enum Impl {
Hash {
@Override Set<Element> create(Collection<Element> contents) {
return new HashSet<Element>(contents);
}
},
LinkedHash {
@Override Set<Element> create(Collection<Element> contents) {
return new LinkedHashSet<Element>(contents);
}
},
. . .
For starters, it's cumbersome. Also, according to the issue I'm about to
file next :), it may make sense for us to run all the various parameter
combinations for a single implementation in just one VM invocation, but
this would not make sense across implementations.
What if, instead, we allow for this:
public void setUpFoo() { ... }
public int timeFoo() { ... }
public void setUpBar() { ... }
public int timeBar() { ... }
In fact, sometimes timeFoo() and timeBar() would be identical, so we could
even support
public void setUpFoo() { ... }
public void setUpBar() { ... }
public int time() { ... }
This might get confusing, but I think the behavior could be fairly simply
defined in terms of the method names. The benchmarks to run are the union
of the names that appear after "setUp" and those that appear after "time";
for each, run "setUpName" if it exists, else "setUp"; then run "timeName"
if it exists, else "time".
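That resolution rule could be sketched with reflection; the Example class and its method names below are hypothetical, following the proposed convention:

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

// Sketch of the proposed naming rule: the benchmarks to run are the union of
// the suffixes found after "setUp" and after "time"; a bare "setUp" or "time"
// would serve as the fallback for any name lacking a specific method.
public class MethodNameResolution {

    // A hypothetical benchmark class using the proposed convention: two named
    // setUp methods sharing one generic time method.
    public static class Example {
        public void setUpFoo() {}
        public void setUpBar() {}
        public int time(int reps) { return reps; }
    }

    public static Set<String> benchmarkNames(Class<?> benchmarkClass) {
        Set<String> names = new TreeSet<>();
        for (Method m : benchmarkClass.getDeclaredMethods()) {
            String n = m.getName();
            if (n.startsWith("setUp") && n.length() > "setUp".length()) {
                names.add(n.substring("setUp".length()));
            } else if (n.startsWith("time") && n.length() > "time".length()) {
                names.add(n.substring("time".length()));
            }
        }
        return names;
    }
}
```

For Example above, the resolved benchmarks would be Foo and Bar, each pairing its own setUp method with the shared time method.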
Original issue reported on code.google.com by [email protected]
on 22 Jan 2010 at 10:45
run 1:
benchmark name us logarithmic runtime
_Charset_forName UTF-16 1.43 ||||||||||||||||
_Charset_forName UTF-8 1.44 ||||||||||||||||
_Charset_forName UTF8 220.73 XXXXXXXXXXXXXXXXX|||||||||||
_Charset_forName ISO-8859-1 1.59 |||||||||||||||||
_Charset_forName 8859_1 1.42 ||||||||||||||||
_Charset_forName ISO-8859-2 1.58 |||||||||||||||||
_Charset_forName US-ASCII 1.52 |||||||||||||||||
_Charset_forName ASCII 1.40 ||||||||||||||||
_String_getBytes UTF-16 123.59 XXXXXXXXX||||||||||||||||||
_String_getBytes UTF-8 89.68 XXXXXXX|||||||||||||||||||
_String_getBytes UTF8 379.60 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
_String_getBytes ISO-8859-1 98.77 XXXXXXX|||||||||||||||||||
_String_getBytes 8859_1 107.22 XXXXXXXX|||||||||||||||||||
_String_getBytes ISO-8859-2 104.41 XXXXXXXX||||||||||||||||||
_String_getBytes US-ASCII 99.28 XXXXXXX|||||||||||||||||||
_String_getBytes ASCII 97.75 XXXXXXX|||||||||||||||||||
_new_String UTF-16 92.62 XXXXXXX|||||||||||||||||||
_new_String UTF-8 7.60 ||||||||||||||||||||
_new_String UTF8 7.59 ||||||||||||||||||||
_new_String ISO-8859-1 5.12 |||||||||||||||||||
_new_String 8859_1 97.34 XXXXXXX|||||||||||||||||||
_new_String ISO-8859-2 98.80 XXXXXXX|||||||||||||||||||
_new_String US-ASCII 88.92 XXXXXXX|||||||||||||||||||
_new_String ASCII 94.12 XXXXXXX|||||||||||||||||||
run 2 (with a small change so Charset.forName caches non-alias, non-canonical
names like "UTF8" too):
benchmark name ns logarithmic runtime
_Charset_forName UTF-16 1409 ||||||||||||||||||
_Charset_forName UTF-8 1443 ||||||||||||||||||
_Charset_forName UTF8 894 |||||||||||||||||
_Charset_forName ISO-8859-1 1546 ||||||||||||||||||
_Charset_forName 8859_1 896 |||||||||||||||||
_Charset_forName ISO-8859-2 1541 ||||||||||||||||||
_Charset_forName US-ASCII 1529 ||||||||||||||||||
_Charset_forName ASCII 1368 ||||||||||||||||||
_String_getBytes UTF-16 123319 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
_String_getBytes UTF-8 91527 XXXXXXXXXXXXXXXXXXXXXX|||||||
_String_getBytes UTF8 93828 XXXXXXXXXXXXXXXXXXXXXX|||||||
_String_getBytes ISO-8859-1 96223 XXXXXXXXXXXXXXXXXXXXXXX||||||
_String_getBytes 8859_1 97383 XXXXXXXXXXXXXXXXXXXXXXX||||||
_String_getBytes ISO-8859-2 97833 XXXXXXXXXXXXXXXXXXXXXXX||||||
_String_getBytes US-ASCII 97084 XXXXXXXXXXXXXXXXXXXXXXX||||||
_String_getBytes ASCII 98335 XXXXXXXXXXXXXXXXXXXXXXX||||||
_new_String UTF-16 90155 XXXXXXXXXXXXXXXXXXXXX||||||||
_new_String UTF-8 7595 X|||||||||||||||||||||
_new_String UTF8 7532 X|||||||||||||||||||||
_new_String ISO-8859-1 5083 X||||||||||||||||||||
_new_String 8859_1 92862 XXXXXXXXXXXXXXXXXXXXXX|||||||
_new_String ISO-8859-2 97338 XXXXXXXXXXXXXXXXXXXXXXX||||||
_new_String US-ASCII 93932 XXXXXXXXXXXXXXXXXXXXXX|||||||
_new_String ASCII 95531 XXXXXXXXXXXXXXXXXXXXXXX||||||
and now i have to use my brain to compare things because one set's all
microseconds and the other's all nanoseconds.
Original issue reported on code.google.com by [email protected]
on 7 Jun 2010 at 9:38
Multiple measurements per scenario (same measurer). For purposes of console
display, they can be averaged (?), but the xml result sent to the server
should contain all measurements; it's the job of the visualization package
to decide how to summarize all the data.
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 10:32
Unless running in dirty mode we should never show the user the result from
just a single measurement in a single VM run. This is likely to be
misleading. I'd prefer we default to five runs. Over time we can take a
statistical look at how much variance is really happening in these runs.
There are two ways to handle those five independent runs in the web UI:
(a) average them together and show one result, with those funny error bar
things.
(b) show them as five different runs, but give the user some ability to
combine runs into one with those funny error bar things.
I don't think there's much value in showing those error bar things (box
plots?) just based on the variations within a single vm run, like we
currently do.
I'm not sure I want to give the user the ability to arbitrarily pick runs
to combine into an uber-run; it would just be too easy to cheat and cherry-
pick the favorable ones.
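For option (a), the natural summary is a mean with a sample standard deviation feeding the error bars; a minimal sketch, not Caliper's actual statistics code:

```java
// Sketch: combine the five independent VM runs into a mean plus a sample
// standard deviation, which is what an "error bar" display would summarize.
public class RunStats {
    public static double mean(double[] runs) {
        double sum = 0;
        for (double r : runs) sum += r;
        return sum / runs.length;
    }

    public static double stdDev(double[] runs) {
        double m = mean(runs);
        double sq = 0;
        for (double r : runs) sq += (r - m) * (r - m);
        return Math.sqrt(sq / (runs.length - 1)); // sample std dev; needs >= 2 runs
    }
}
```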
Original issue reported on code.google.com by [email protected]
on 7 Jun 2010 at 5:53
I am working behind a gateway, and when uploading test results during a
Caliper run, the following error is output:
java.lang.RuntimeException: java.net.UnknownHostException:
microbenchmarks.appspot.com at
com.google.caliper.Runner.postResults(Runner.java:83)
It seems I need to set up a network proxy for Caliper, but how do I set it up?
I cannot find an answer in the Caliper wiki and docs.
Can you tell me how to do it?
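For what it's worth, the JVM honors standard proxy system properties (equivalently passed as -D flags on the java command line). This is standard Java networking behavior, not a documented Caliper feature, and the host/port values are placeholders:

```java
// Not a Caliper feature: these are the JVM's standard proxy system properties.
// They must be set before the first HTTP connection is made; the host and
// port here are placeholders for your gateway.
public class ProxySetup {
    public static void configure(String host, String port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", port);
        System.setProperty("https.proxyHost", host);
        System.setProperty("https.proxyPort", port);
    }
}
```

The same effect on the command line would be flags like -Dhttp.proxyHost=... -Dhttp.proxyPort=... when launching the runner.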
Original issue reported on code.google.com by [email protected]
on 1 Jul 2010 at 8:38
I think we are setting stdout and stderr to a null output stream. We might
want to capture user output somewhere instead. Also, the user might have
specified JVM flags that produce stdout output, and that will have to go
somewhere somehow. This is vague, but I had to get it captured here somehow.
Original issue reported on code.google.com by [email protected]
on 26 Jun 2010 at 12:08
I'd like to make it quite easy for developers to benchmark code on Android
devices.
One option is to use DalvikRunner. Currently we Dalvik folks have Caliper
running by plugging it into this framework. This is an internal tool owned by
the Dalvik team that knows how to do the following:
1. compile .java files (for a JVM or Dalvik VM)
2. build classpaths (including copying code to a phone when necessary)
3. fork an executable on an arbitrary VM (desktop or device)
4. inspect the result, and compare it with an expectations file
5. do the above in aggregate for large suites of code
6. report the results for consumption by Hudson
The nice thing about DalvikRunner is that it's easy to drive. It makes Java
programming kinda like scripting because it merges the compile and run
steps:
dalvikrunner com/foo/Foo.java
The drawback of this approach is that DalvikRunner is a logically separate
project, which means folks interested in Caliper+Android would need both
tools.
Original issue reported on code.google.com by limpbizkit
on 14 Jan 2010 at 3:02
Our current elapsed time measurer should be the default measurer only,
which the user could override at the command line (or caliperrc?). It may
suffice to do just one measurer for a run, but then again, it also might be
simpler overall to just treat it like every other variable -- specify as
many as you want and caliper will try them all.
Alternate measurers could be used for
- taking other kinds of measurements
- taking the same kind of measurement but in a different way
Either way, it's essentially the same deal. Every measurer needs a unique
name that will be reported. The measurement made might be a simple
"quantity + unit" like the elapsed-time measurer makes, or it could be
something more...
This issue is resolved once we have *some* alternate measurer to use, say
CPU time.
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 10:43
I want to be able to ignore certain columns in the web UI. For example, I ran
the same benchmark on the Sun VM and on the Dalvik VM, and wanted to see them
side by side to compare the general shapes of the charts. This could be done by
ignoring the "vm" column.
Original issue reported on code.google.com by [email protected]
on 25 Jun 2010 at 10:33
According to Chuck Rasbold, Caliper should definitely be specifying -Xbatch to
the subprocess so that JIT compilation will happen in the user thread instead
of in some background thread in parallel with timing. This also means that
when we start logging compilation events (issue 30), we'll actually know when
compilation happened, not just when it started.
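The flag could be injected wherever the forked command line is assembled; a sketch in which the class and argument layout are illustrative, not Caliper's actual Runner code:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: when building the forked VM's command line, insert -Xbatch so JIT
// compilation runs synchronously in the benchmark thread rather than in a
// background compiler thread running in parallel with timing.
public class ForkedVmCommand {
    public static List<String> command(String vm, String classpath, String mainClass) {
        List<String> args = new ArrayList<>();
        args.add(vm);
        args.add("-Xbatch"); // compile in the user thread, not a background thread
        args.add("-cp");
        args.add(classpath);
        args.add(mainClass);
        return args;
    }
}
```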
Original issue reported on code.google.com by [email protected]
on 9 Jun 2010 at 9:11
... and report this in the result XML.
Quite possibly, this version can just be "dev" when working with your own
local build, but when any official release is made, a real version number
needs to be set somehow. Probably not too hard to make ant do this.
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 11:06
new benchmark template
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 10:56
jesse's convinced me that i shouldn't be doing what i was trying to do the
other day (use "reps" as a convenient source of not-exactly-randomness),
but i think the exception could be improved. two suggestions:
1. show the actual value.
2. differentiate between the two cases, and maybe explain. presumably
sub-linear means "optimized out" but supra-linear means "you've made the
cost of an individual rep proportional to the total number of reps".
i can easily make these changes if you like, but i thought i'd check you
agree they make sense, since i've already been caught doing the wrong thing
here (and wasn't immediately convinced my benchmark was unreasonable,
though i now agree it was a bad idea).
Original issue reported on code.google.com by [email protected]
on 12 Jan 2010 at 7:11
I should have the ability to watch the full "play-by-play" of everything
caliper is up to. My preferred approach would be to write this information to
a file and always keep console output uber-neat and clean. I'm thinking
JUnit-style dots -- seriously.
The easiest thing to do would be to read a logfile setting from .caliperrc.
And not worry about rotating it etc. for now.
I don't see any benefit to using java.util.logging for this; and we might
still want to use j.u.l for regular debug-style logging that isn't really
user-oriented.
Original issue reported on code.google.com by [email protected]
on 21 Jan 2010 at 10:59
worse than issue 2, if the user code throws an exception, caliper *really*
needs to show me that.
i had something like:
public void timeDns(int reps) throws Exception {
InetAddress.getByName("unknown.host.exception.mtv.corp.google.com");
}
and got this:
FAIL examples.DnsBenchmark (EXEC_FAILED)
Executing examples.DnsBenchmark
Jan 15, 2010 10:22:41 PM com.ibm.icu4jni.util.Resources
createTimeZoneNamesFor
INFO: Loaded time zone names for en_US in 229ms.
java.lang.Throwable: stack dump
at java.lang.Thread.dumpStack(Thread.java:612)
at com.ibm.icu4jni.util.Resources.createTimeZoneNamesFor(Resources.java:250)
at com.ibm.icu4jni.util.Resources.access$200(Resources.java:34)
at
com.ibm.icu4jni.util.Resources$DefaultTimeZones.<clinit>(Resources.java:200)
at com.ibm.icu4jni.util.Resources.getDisplayTimeZone(Resources.java:163)
at java.util.TimeZone.getDisplayName(TimeZone.java:277)
at java.util.TimeZone.getDisplayName(TimeZone.java:250)
at java.util.Date.toString(Date.java:722)
at java.util.Properties.store(Properties.java:549)
at com.google.caliper.Runner.getExecutedByUuid(Runner.java:63)
at com.google.caliper.Runner.runOutOfProcess(Runner.java:170)
at com.google.caliper.Runner.run(Runner.java:75)
at com.google.caliper.Runner.main(Runner.java:214)
at dalvik.runner.CaliperRunner.test(CaliperRunner.java:28)
at dalvik.runner.TestRunner.run(TestRunner.java:76)
at dalvik.runner.CaliperRunner.main(CaliperRunner.java:36)
at dalvik.system.NativeStart.main(Native Method)
0% Scenario{vm=dalvikvm, benchmark=Dns} Jan
15, 2010 10:22:41 PM java.io.BufferedReader <init>
INFO: Default buffer size used in BufferedReader constructor. It would be
better to be explicit if an 8k-char buffer is required.
An exception was thrown from the benchmark code.
java.lang.NullPointerException
at
org.apache.harmony.luni.util.FloatingPointParser.parseDouble(FloatingPointParser
.java:263)
at java.lang.Double.parseDouble(Double.java:285)
at java.lang.Double.valueOf(Double.java:324)
at com.google.caliper.Runner.executeForked(Runner.java:137)
at com.google.caliper.Runner.runOutOfProcess(Runner.java:179)
at com.google.caliper.Runner.run(Runner.java:75)
at com.google.caliper.Runner.main(Runner.java:214)
at dalvik.runner.CaliperRunner.test(CaliperRunner.java:28)
at dalvik.runner.TestRunner.run(TestRunner.java:76)
at dalvik.runner.CaliperRunner.main(CaliperRunner.java:36)
Original issue reported on code.google.com by [email protected]
on 15 Jan 2010 at 10:41
compute a hash -- somehow -- that will change whenever the code that's
under test changes. haven't thought about this much. it could perhaps
only hash each class that actually gets loaded (then sum all these hashes,
so that loading order doesn't affect things?).
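A sketch of the order-independent idea, assuming we can obtain each loaded class's bytes; the digest choice and byte-folding are arbitrary, not a settled design:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: hash each loaded class's bytes individually, then *sum* the hashes.
// Addition commutes, so the combined hash ignores class-loading order.
public class CodeHash {
    public static long hashOf(byte[] classBytes) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(classBytes);
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xffL); // fold first 8 digest bytes into a long
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("SHA-256 is required to be present", e);
        }
    }

    public static long combined(byte[][] loadedClasses) {
        long sum = 0;
        for (byte[] c : loadedClasses) {
            sum += hashOf(c); // overflow is fine; we only need a stable fingerprint
        }
        return sum;
    }
}
```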
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 11:04
When a benchmark method's loop gets optimized out, or the warmup time is
long enough relative to the time for each execution, the reps int can overflow
and then become 0. The loop will then continue for a very long time, adding
very little to elapsedNanos each time. Caliper appears to hang.
I've attached a patch with a test that should expose the issue, both for the
optimized out case and the long warmup time/fast execution case. I've also
attached a possible fix for it.
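The wraparound is easy to demonstrate, and a guard at the point where reps grows would avoid it; the doubling strategy and cap value below are illustrative, not the attached patch's actual logic:

```java
// Sketch: if warmup keeps scaling reps up (e.g. doubling), a plain int can
// wrap around to Integer.MIN_VALUE, producing the hang described above.
// Computing the next value in long arithmetic and capping it prevents that.
public class RepsOverflow {
    static final int MAX_REPS = 1_000_000_000; // illustrative ceiling

    public static int nextReps(int reps) {
        long doubled = (long) reps * 2; // widen *before* multiplying
        return doubled > MAX_REPS ? MAX_REPS : (int) doubled;
    }
}
```

Without the widening, (1 << 30) * 2 in int arithmetic is Integer.MIN_VALUE, which is exactly the kind of silent wrap that produces the apparent hang.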
Original issue reported on code.google.com by [email protected]
on 29 Jun 2010 at 4:46
Attachments:
bring over code from labs.
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 10:52
On ctrl-shift-f10 ("run"), detect that this is a caliper class, and run it
via caliper.
Should bring up a dialog much like the run-application dialog, and just
show raw console output... not much fancier than that for now; in the
future we could make it more user-friendly in lots of ways (that should all
be filed separately).
Should need just one configuration setting, caliper_home.
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 10:56
Perhaps it should be opt-out, with a very simple one-liner to enable online
posting. When a user without a .caliperrc runs a benchmark, perhaps we could
print an opt-in-please advertisement after the results:
Warning: results not saved. To save results online automatically, run this once:
"caliper --always-save-results-online"
We could also direct them to a URL like
http://microbenchmarks.appspot.com/signin which would allow the local
workstation to associate itself with the user's Google account.
Original issue reported on code.google.com by limpbizkit
on 8 Jan 2010 at 7:15
Should we be able to report that the machine seemed otherwise occupied
during the test run, so the test results could be viewed as more "dirty"
than others?
Original issue reported on code.google.com by [email protected]
on 11 Jan 2010 at 8:32
When making a logarithmic bar graph, choose minimum and maximum bar lengths
(based on screen size only), then normalize the data so that the min and max
values map to those lengths.
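A minimal sketch of that normalization, assuming natural-log scaling and inclusive min/max bar lengths:

```java
// Sketch: map each value's logarithm linearly onto [minBars, maxBars] so the
// smallest value gets the minimum bar length and the largest the maximum,
// regardless of the data's absolute scale or units.
public class LogBars {
    public static int barLength(double value, double min, double max,
                                int minBars, int maxBars) {
        if (min == max) return maxBars; // degenerate case: all values equal
        double t = (Math.log(value) - Math.log(min))
                 / (Math.log(max) - Math.log(min)); // 0.0 at min, 1.0 at max
        return (int) Math.round(minBars + t * (maxBars - minBars));
    }
}
```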
Original issue reported on code.google.com by [email protected]
on 11 Jan 2010 at 9:05
see summary.
This would be a pretty sexy illustration of what caliper can do.
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 10:47
Sure would rather do
-Dlist2Size=1,10,100
than
-Dlist2Size=1 -Dlist2Size=10 -Dlist2Size=100
but it's unclear how exactly to do this so as not to conflict with
parameter values that might actually contain literal commas?
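One conceivable convention (purely an assumption, not something Caliper implements) is a backslash escape for literal commas:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split a "-Dname=a,b,c" value on commas, unless a comma is escaped
// with a backslash, so values containing literal commas stay representable.
public class ParamSplitter {
    public static List<String> split(String value) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\' && i + 1 < value.length() && value.charAt(i + 1) == ',') {
                current.append(','); // "\," is a literal comma, not a separator
                i++;
            } else if (c == ',') {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }
}
```

The cost is that users with comma-bearing values must escape them, but unescaped commas keep their natural list meaning.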
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 10:53
If I have a rewrite to a method of, say, ArrayList, I'd like it to be easy
to compare its performance with that of the installed JDK.
In fact, I might make a low-level tweak to the JDK and want to run all
kinds of existing benchmarks against that to look for any improvements or
degradations that result.
Roughly, the benchmark could point at the location of the alternate JDK
implementation class(es), and we'd have to repackage those classes in
jarjar-like fashion then prepend them to the bootclasspath.
I know this seems to be crossing a line in terms of the simplicity vs.
magic quotient of caliper, but it feels useful to me, not just for the JDK
itself but other third-party libraries (in which case it wouldn't be
bootclasspath, just classpath).
One complication is that the JDK's implementation classes will continue to
evolve while my local tweaked copy will just remain inert...
Original issue reported on code.google.com by [email protected]
on 11 Jan 2010 at 8:31
Patch attached. Thanks
Original issue reported on code.google.com by [email protected]
on 5 Jul 2010 at 4:49
Attachments:
Also feature this prominently on the web site.
A surgeon general's warning: this is only microbenchmarking, and
performance entails much bigger issues than just this. Microbenchmarks lie.
Here are common gotchas to avoid. Etc.
We get Josh to help us with it.
Original issue reported on code.google.com by [email protected]
on 11 Jan 2010 at 8:49
Initial idea: a --dirty command-line option which tells caliper to cut corners
in order to get a result faster.
I'm not clear on what all the exact differences will end up being, but with
this in mind, we are free to make sure that default "clean" behavior
prioritizes getting the highest-quality results possible even at the expense
of a little extra time. We expect this to be run by continuous builds and
large jobs that I kick off before going home for the night, as opposed to
dirty mode which might be used for some quick validation while a programmer is
trying various ideas.
Original issue reported on code.google.com by [email protected]
on 21 Jan 2010 at 11:02
(Are we allowed to say 'web service' without it meaning WSDL and that
garbage?)
Requests that come into the web service are of one of the following kinds:
1. A user at a web browser, viewing results read-only
2. A user at a web browser, doing any write operations or accessing user
preferences etc.
3. Caliper uploading results
4. Other applications retrieving XML results to do whatever with them
For mode 1, I think we need no authentication at all -- just a web page
public to the eyes of the world.
For mode 2, the user must sign into their Google account. They can only
remove (or otherwise modify?) data that is scoped to their own account.
For modes 3 and 4, the client request will have to pass along a token which
the user would have had to generate using mode 2 at some earlier time.
Probably it is enough for every user to be issued one and only one token
(though be allowed to change it with the click of a button -- this is
reminding me a lot of how codesite handles svn passwords).
We want to make sure that users have no incentive to share these tokens
around. It only needs to be accessible by the "main" caliper command, not
all subprocesses it invokes on other machines. We should tell users exactly
what can happen if the token is not kept secret (mainly, if abuse occurs
their account will be the one banned). I think most users would be
naturally inclined to keep it a total secret right up until the moment
where they realize something would be easier if they just spread it around
-- so that's why we want to make sure they have no reason to do that.
Original issue reported on code.google.com by [email protected]
on 8 Jan 2010 at 5:58
Patch attached. Thanks
Original issue reported on code.google.com by [email protected]
on 5 Jul 2010 at 3:39
Attachments:
Here's what I'm thinking:
First of all, any feature that we ever implement, that would ever upload
even small pieces of the user's source code or byte code, must never be on
by default. The user could only enable this feature intentionally and if
at all possible with a warning of what they're doing.
Next, I think we should disavow all notion of privacy or security of your
benchmark results. Trying to protect access to these results is senseless.
Users should assume that once they upload it, until they remove it, anyone
might be able to find it. Put more simply, all benchmark results on the
webapp are to be considered public.
I do think that we should protect the ability to connect to the site in any
way -- you have to either authenticate with your google account, or you
have to include an owner key that you could have only obtained through the
use of your google account. Obviously caliper should never, ever have
anything to do with your google password.
If we ever get to the point where one user is able to upload bytecode which
another user would have the ability to download, that's a line beyond which
*huge* security concerns will come into play. For now we intend nothing of
the sort.
So, I think that all we're left with is visibility rules to support a
positive user experience; most users won't want to be swamped with seeing
everybody else's stuff all the time. Moreover, I see no value in any kind
of features that let a user just browse random other benchmarks present on
the site. Retrieval of benchmark data must be scoped by certain
attributes. It can't just be scoped by "anything run on mac os". But it
can be scoped to an "owner id" (needs definition elsewhere), or it can be
scoped to an exact literal benchmark name
("com.google.common.collect.caliper.ImmutableSetBenchmark").
Decision: when I follow a link that essentially means "I want to see all
results for that specific benchmark class name ever", will I see only ones
under my own owner ID, or will I also see the runs that joe blow random
user ran on his TRS-80 last week? I like the latter, because (a) nothing
is private (see above) and (b) I can always narrow the scope to my own id
or ids if I want.
Original issue reported on code.google.com by [email protected]
on 8 Jan 2010 at 5:44
Configure the list of environments to run the benchmark on as (hostname,
user, jrePath) triplets. Run it on all environments, store results.
Original issue reported on code.google.com by [email protected]
on 13 Jan 2010 at 5:30
Users need to be able to configure JVM arguments, similar to other kinds of
variables -- from the command line or in the code.
The user should be able to divide the command line into arbitrary sections
that can vary independently. Naively, we could recognize any variable
named "jvmargs*", take cartesian product, and then for each scenario just
concatenate all the resulting values together.
I don't know what this would look like in the code. A single @JvmArguments
annotation with String[] value would not seem enough.
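The cartesian-product step for the naive "jvmargs*" proposal might look like this; the section values below are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: given the value lists of each "jvmargs*" section, take their
// cartesian product and concatenate one value from each section per scenario.
public class JvmArgsProduct {
    public static List<String> product(List<List<String>> sections) {
        List<String> result = new ArrayList<>();
        result.add(""); // identity: the empty argument string
        for (List<String> section : sections) {
            List<String> next = new ArrayList<>();
            for (String prefix : result) {
                for (String value : section) {
                    next.add(prefix.isEmpty() ? value : prefix + " " + value);
                }
            }
            result = next;
        }
        return result;
    }
}
```

Two sections with two values each yield four scenario argument strings, one per combination.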
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 10:14
In different situations, the user may either want the measured value to
include time spent in GC, or they might want to report only measurements
that were not interrupted by GC. For caliper to support these two cases
might take a bit of gymnastics...
For the amortize case, if the memory ceiling is too high and/or the timing
interval too small, we might get a very unstable measurement.
For the exclude case, if the memory ceiling is too low and/or the timing
interval too long, we might find it impossible to get any measurements that
were never interrupted by GC.
This suggests that the framework would adaptively tweak the timing interval
(and *maybe* even the Xmx??).
Needs more thought.
Original issue reported on code.google.com by [email protected]
on 11 Jan 2010 at 7:54
Otherwise I need to install an IntelliJ SDK in order to build regular Caliper.
Thumbs down. As with the webapp backend, I don't really predict many end-users
will be interested in the plugin's source code.
But I do think they'll be interested in the command line app, and so we should
keep that project simple and free from heavyweight dependencies.
Original issue reported on code.google.com by [email protected]
on 17 Jan 2010 at 9:40
Allow the user to define an additional method inside the benchmark that
functions as a baseline against which to compare a benchmarked method. For
example, suppose we have this benchmark:
{{{
void timeFoo(int reps) {
for (int i = 0; i < reps; i++) {
new Foo();
// do stuff
}
}
}}}
Then we could add a corresponding "baseline" method
{{{
void baselineFoo(int reps) {
for (int i = 0; i < reps; i++) {
new Foo();
}
}
}}}
since we only want to measure the time taken for "// do stuff", not the
construction time.
We could then measure each, and return the time as timeFor(timeFoo) -
timeFor(baselineFoo).
Thoughts?
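A sketch of the measurement arithmetic, with an illustrative harness standing in for Caliper's actual measurer:

```java
// Sketch: time both the benchmark and its baseline for the same rep count,
// then report the difference per rep, so the shared cost (new Foo() above)
// cancels out and only "// do stuff" remains. Illustrative, not Caliper code.
public class BaselineDemo {
    public interface Timed { void run(int reps); }

    public static long nanosFor(Timed t, int reps) {
        long start = System.nanoTime();
        t.run(reps);
        return System.nanoTime() - start;
    }

    // timeFor(timeFoo) - timeFor(baselineFoo), averaged per rep.
    public static double netNanosPerRep(Timed benchmark, Timed baseline, int reps) {
        return (double) (nanosFor(benchmark, reps) - nanosFor(baseline, reps)) / reps;
    }
}
```

One subtlety a real implementation would need to handle: the difference of two noisy measurements is noisier than either, so the baseline approach may demand more trials for a stable result.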
Original issue reported on code.google.com by [email protected]
on 29 Jun 2010 at 12:47
A simple dashboard to show all the benchmarks you've run so you don't have to
remember them would be nice.
Original issue reported on code.google.com by [email protected]
on 26 Jun 2010 at 12:02