toshsan / caliper
Automatically exported from code.google.com/p/caliper
License: Apache License 2.0
(a) Create a Google spreadsheet representing this view I've configured
(b) Create a Google spreadsheet containing a raw data dump
How to do this with respect to authentication and such, I don't know, but this
would be an awesome feature, possibly more useful than just the ASCII output.
Original issue reported on code.google.com by [email protected]
on 16 Jun 2010 at 5:04
if set, new benchmark results would automatically open in a new tab in the
browser.
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 10:57
One result processor might wish to post results to BRRD or to Sponge, and
others might want to do god knows what.
One approach: do nothing, except spit out XML results to a filename
specified on the command line. Other tools can subsequently read that.
Possibly-nicer approach: let result processor classes be set in caliperrc or
wherever; then they could be passed real live Java objects instead of
spending a lot of effort re-parsing XML.
I see us just doing the first approach for now, but I don't know about the
long term... just filing this for future reference, I guess.
Original issue reported on code.google.com by [email protected]
on 19 Jan 2010 at 5:32
When we fork the VM, we pass only the classpath of the original process. This
works for most benchmarks, but cannot support benchmarks that load classes in
the bootclasspath.
Original issue reported on code.google.com by [email protected]
on 29 Jun 2010 at 8:41
Suppose I'm testing 4 x 3 different parameter values against 5 different
benchmarks (different time* methods in the same class) on 2 vms.
Currently, to get one measurement each, we'll run 4x3x5x2=120 vm
invocations. I think 10 would be enough -- vms times benchmarks, and let
each run handle all 4 x 3 parameter combinations for that (vm,benchmark)
pair.
The problem with the way it is today is that hotspot can optimize away
whole swaths of implementation code that doesn't happen to get exercised by
the *one* scenario we run it with. By warming up all 12 of these benchmark
instances, it should have to compile to something more closely resembling
real life (maybe). And with luck, we can avoid the expense of repeating
the warmup period 12 times over.
After warming up all the different scenarios and then starting to do trials
of one of them, I'm not sure if we need to worry about hotspot deciding to
*re*compile based on the new favorite scenario. If that happens, maybe it
makes sense for us to round-robin through the scenarios as we go......
we'll see.
I'm also not sure how concerned we need to be that the order the scenarios
are timed in can unduly affect the results. It could be that for each
"redundant" measurement we take, we vary up the order (e.g. we rotate it?)
in order to wash that out. Or maybe there's no problem with this; I dunno.
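The rotation idea can be sketched with Collections.rotate; the scenario names and pass count below are hypothetical, not Caliper's actual scheduling code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch: on each "redundant" measurement pass, rotate the scenario order by
// one slot so no scenario is always timed first (or last).
public class ScenarioRotation {
    public static List<List<String>> rotatedPasses(List<String> scenarios, int passes) {
        List<List<String>> result = new ArrayList<>();
        List<String> order = new ArrayList<>(scenarios);
        for (int i = 0; i < passes; i++) {
            result.add(new ArrayList<>(order));
            Collections.rotate(order, 1); // each element shifts one slot to the right
        }
        return result;
    }
}
```

With three scenarios and three passes, each scenario appears once in each position, which is the "wash it out" property being hoped for.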
Original issue reported on code.google.com by [email protected]
on 22 Jan 2010 at 10:53
Whenever I have a view of results that I'd like other people to see too, I
could click something to create a gadget URL, which I can then easily embed in
all kinds of things.
http://code.google.com/apis/gadgets/
Original issue reported on code.google.com by [email protected]
on 9 Jul 2010 at 12:32
Once we are logging the full "play-by-play", it'd be helpful to turn on
-verbose:gc and -XX:+PrintInlining (or whatever it's called) and options like
that, and watch for these messages in the child processes' stdout so they can
be logged. The goal for the logging is to really give you a clear picture of
"what's happening", and this is a key part of that.
Original issue reported on code.google.com by [email protected]
on 22 Jan 2010 at 12:43
A few ideas from issue 3,
Jesse:
Think of a means to show multiple results, either by reporting the standard
deviation, or ASCII box plots:
foo [--|-]----|
bar [-|--]
baz |-[--|--]
quux |---[---|--]--|
Elliott:
1. i wouldn't bother with ASCII-art graphs.
2. i might show scale factors like "2.3x" instead (though this might lead to me
wanting a way to @Annotate the method that represents my baseline).
3. i would always use the same time units. switching between ms and us is all
well and good, but it's a pain for this kind of use because it makes the output
less directly comparable.
4. i'd include a timestamp showing when the run was performed, and i'd include
a duration showing how long the benchmark run took.
5. i might include a hash of the test source, so i could easily distinguish
results i shouldn't necessarily be comparing.
As for copy & paste, i'll always want to copy & paste some text form, because
it's the only thing i trust to last and because it's an extra level of
indirection (which is fine for "show more detail" but not for an overview in a
check-in comment). the web's a nice optional extra, but that's all.
Original issue reported on code.google.com by [email protected]
on 19 Jan 2010 at 7:27
Caliper should never throw a stack trace out to the user's console unless
it originated from the user's code. We must catch anything that goes wrong
in our code and communicate it properly.
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 10:19
Users need to understand if one variant of their benchmark allocates more
memory than another. I believe we cannot assume that simply letting GC happen
during the timing loop will properly account for this cost.
We would almost certainly use this:
http://code.google.com/p/java-allocation-instrumenter/
And we'd do it as a separate test, not while actual benchmarking is happening.
We may or may not want to try to conceive this as a "pluggable measurer" as
described in http://code.google.com/p/caliper/issues/detail?id=4 ... what
matters is that we offer the feature somehow.
Original issue reported on code.google.com by [email protected]
on 8 Jul 2010 at 4:02
From Robert Henry:
I run Caliper for the first time on a loop that runs for 12 seconds.
OK, so I'm ambitious, and this is hardly a "micro" benchmark. I get:
An exception was thrown from the benchmark code.
com.google.caliper.ConfigurationException: Runtime 1.2537863806E10
out of range
at com.google.caliper.Caliper.warmUp(Caliper.java:71)
at com.google.caliper.InProcessRunner.run(InProcessRunner.java:54)
at com.google.caliper.InProcessRunner.main(InProcessRunner.java:68)
Original issue reported on code.google.com by [email protected]
on 30 Jun 2010 at 4:14
Patch attached.
I do see an exception when running:
Exception in thread "main"
com.google.caliper.UserException$ExceptionFromUserCodeException
at com.google.caliper.Runner.runOutOfProcess(Runner.java:161)
at com.google.caliper.Runner.run(Runner.java:46)
at tutorial.Tutorial1.main(Tutorial1.java:18)
Caused by: com.google.caliper.ConfigurationException: size has no values
at com.google.caliper.ScenarioSelection.prepareParameters(ScenarioSelection.java:132)
at com.google.caliper.ScenarioSelection.select(ScenarioSelection.java:69)
at com.google.caliper.Runner.runOutOfProcess(Runner.java:148)
... 2 more
Original issue reported on code.google.com by [email protected]
on 5 Jul 2010 at 4:35
Attachments:
there should be a "copy ASCII art to clipboard" option in the web UI.
Original issue reported on code.google.com by [email protected]
on 7 Jun 2010 at 7:21
Here's how we handle multiple implementations today:
http://code.google.com/p/caliper/source/browse/trunk/src/examples/SetContainsBenchmark.java?spec=svn91&r=91#41
@Param private Impl impl;
public enum Impl {
Hash {
@Override Set<Element> create(Collection<Element> contents) {
return new HashSet<Element>(contents);
}
},
LinkedHash {
@Override Set<Element> create(Collection<Element> contents) {
return new LinkedHashSet<Element>(contents);
}
},
. . .
For starters, it's cumbersome. Also, according to the issue I'm about to
file next :), it may make sense for us to run all the various parameter
combinations for a single implementation in just one VM invocation, but
this would not make sense across implementations.
What if, instead, we allow for this:
public void setUpFoo() { ... }
public int timeFoo() { ... }
public void setUpBar() { ... }
public int timeBar() { ... }
In fact, sometimes timeFoo() and timeBar() would be identical, so we could
even support
public void setUpFoo() { ... }
public void setUpBar() { ... }
public int time() { ... }
This might get confusing, but I think the behavior could be fairly simply
defined in terms of the method names. The benchmarks to run are the union
of the names that appear after "setUp" and those that appear after "time";
for each, run "setUpName" if it exists, else "setUp"; then run "timeName"
if it exists, else "time".
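That resolution rule could be sketched with reflection; the Example class and its method names below are hypothetical, following the proposed convention:

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

// Sketch of the proposed naming rule: the benchmarks to run are the union of
// the suffixes found after "setUp" and after "time"; a bare "setUp" or "time"
// would serve as the fallback for any name lacking a specific method.
public class MethodNameResolution {

    // A hypothetical benchmark class using the proposed convention: two named
    // setUp methods sharing one generic time method.
    public static class Example {
        public void setUpFoo() {}
        public void setUpBar() {}
        public int time(int reps) { return reps; }
    }

    public static Set<String> benchmarkNames(Class<?> benchmarkClass) {
        Set<String> names = new TreeSet<>();
        for (Method m : benchmarkClass.getDeclaredMethods()) {
            String n = m.getName();
            if (n.startsWith("setUp") && n.length() > "setUp".length()) {
                names.add(n.substring("setUp".length()));
            } else if (n.startsWith("time") && n.length() > "time".length()) {
                names.add(n.substring("time".length()));
            }
        }
        return names;
    }
}
```

For Example above, the resolved benchmarks would be Foo and Bar, each pairing its own setUp method with the shared time method.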
Original issue reported on code.google.com by [email protected]
on 22 Jan 2010 at 10:45
run 1:
benchmark name us logarithmic runtime
_Charset_forName UTF-16 1.43 ||||||||||||||||
_Charset_forName UTF-8 1.44 ||||||||||||||||
_Charset_forName UTF8 220.73 XXXXXXXXXXXXXXXXX|||||||||||
_Charset_forName ISO-8859-1 1.59 |||||||||||||||||
_Charset_forName 8859_1 1.42 ||||||||||||||||
_Charset_forName ISO-8859-2 1.58 |||||||||||||||||
_Charset_forName US-ASCII 1.52 |||||||||||||||||
_Charset_forName ASCII 1.40 ||||||||||||||||
_String_getBytes UTF-16 123.59 XXXXXXXXX||||||||||||||||||
_String_getBytes UTF-8 89.68 XXXXXXX|||||||||||||||||||
_String_getBytes UTF8 379.60 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
_String_getBytes ISO-8859-1 98.77 XXXXXXX|||||||||||||||||||
_String_getBytes 8859_1 107.22 XXXXXXXX|||||||||||||||||||
_String_getBytes ISO-8859-2 104.41 XXXXXXXX||||||||||||||||||
_String_getBytes US-ASCII 99.28 XXXXXXX|||||||||||||||||||
_String_getBytes ASCII 97.75 XXXXXXX|||||||||||||||||||
_new_String UTF-16 92.62 XXXXXXX|||||||||||||||||||
_new_String UTF-8 7.60 ||||||||||||||||||||
_new_String UTF8 7.59 ||||||||||||||||||||
_new_String ISO-8859-1 5.12 |||||||||||||||||||
_new_String 8859_1 97.34 XXXXXXX|||||||||||||||||||
_new_String ISO-8859-2 98.80 XXXXXXX|||||||||||||||||||
_new_String US-ASCII 88.92 XXXXXXX|||||||||||||||||||
_new_String ASCII 94.12 XXXXXXX|||||||||||||||||||
run 2 (with a small change so Charset.forName caches non-alias, non-canonical
names like "UTF8" too):
benchmark name ns logarithmic runtime
_Charset_forName UTF-16 1409 ||||||||||||||||||
_Charset_forName UTF-8 1443 ||||||||||||||||||
_Charset_forName UTF8 894 |||||||||||||||||
_Charset_forName ISO-8859-1 1546 ||||||||||||||||||
_Charset_forName 8859_1 896 |||||||||||||||||
_Charset_forName ISO-8859-2 1541 ||||||||||||||||||
_Charset_forName US-ASCII 1529 ||||||||||||||||||
_Charset_forName ASCII 1368 ||||||||||||||||||
_String_getBytes UTF-16 123319 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
_String_getBytes UTF-8 91527 XXXXXXXXXXXXXXXXXXXXXX|||||||
_String_getBytes UTF8 93828 XXXXXXXXXXXXXXXXXXXXXX|||||||
_String_getBytes ISO-8859-1 96223 XXXXXXXXXXXXXXXXXXXXXXX||||||
_String_getBytes 8859_1 97383 XXXXXXXXXXXXXXXXXXXXXXX||||||
_String_getBytes ISO-8859-2 97833 XXXXXXXXXXXXXXXXXXXXXXX||||||
_String_getBytes US-ASCII 97084 XXXXXXXXXXXXXXXXXXXXXXX||||||
_String_getBytes ASCII 98335 XXXXXXXXXXXXXXXXXXXXXXX||||||
_new_String UTF-16 90155 XXXXXXXXXXXXXXXXXXXXX||||||||
_new_String UTF-8 7595 X|||||||||||||||||||||
_new_String UTF8 7532 X|||||||||||||||||||||
_new_String ISO-8859-1 5083 X||||||||||||||||||||
_new_String 8859_1 92862 XXXXXXXXXXXXXXXXXXXXXX|||||||
_new_String ISO-8859-2 97338 XXXXXXXXXXXXXXXXXXXXXXX||||||
_new_String US-ASCII 93932 XXXXXXXXXXXXXXXXXXXXXX|||||||
_new_String ASCII 95531 XXXXXXXXXXXXXXXXXXXXXXX||||||
and now i have to use my brain to compare things because one set's all
microseconds and the other's all nanoseconds.
Original issue reported on code.google.com by [email protected]
on 7 Jun 2010 at 9:38
Multiple measurements per scenario (same measurer). For purposes of console
display, they can be averaged (?), but the xml result sent to the server
should contain all measurements; it's the job of the visualization package
to decide how to summarize all the data.
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 10:32
Unless running in dirty mode we should never show the user the result from
just a single measurement in a single VM run. This is likely to be
misleading. I'd prefer we default to five runs. Over time we can take a
statistical look at how much variance is really happening in these runs.
There are two ways to handle those five independent runs in the web UI:
(a) average them together and show one result, with those funny error bar
things.
(b) show them as five different runs, but give the user some ability to
combine runs into one with those funny error bar things.
I don't think there's much value in showing those error bar things (box
plots?) just based on the variations within a single vm run, like we
currently do.
I'm not sure I want to give the user the ability to arbitrarily pick runs
to combine into an uber-run; it would just be too easy to cheat and cherry-
pick the favorable ones.
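For option (a), the natural summary is a mean with a sample standard deviation feeding the error bars; a minimal sketch, not Caliper's actual statistics code:

```java
// Sketch: combine the five independent VM runs into a mean plus a sample
// standard deviation, which is what an "error bar" display would summarize.
public class RunStats {
    public static double mean(double[] runs) {
        double sum = 0;
        for (double r : runs) sum += r;
        return sum / runs.length;
    }

    public static double stdDev(double[] runs) {
        double m = mean(runs);
        double sq = 0;
        for (double r : runs) sq += (r - m) * (r - m);
        return Math.sqrt(sq / (runs.length - 1)); // sample std dev; needs >= 2 runs
    }
}
```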
Original issue reported on code.google.com by [email protected]
on 7 Jun 2010 at 5:53
I am working behind a gateway, and when uploading test results during a
Caliper run, the following error is output:
java.lang.RuntimeException: java.net.UnknownHostException:
microbenchmarks.appspot.com at
com.google.caliper.Runner.postResults(Runner.java:83)
It seems I need to set up a network proxy for Caliper, but how do I set it up?
I cannot find an answer in the Caliper wiki and docs.
Can you tell me how to do it?
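For what it's worth, the JVM honors standard proxy system properties (equivalently passed as -D flags on the java command line). This is standard Java networking behavior, not a documented Caliper feature, and the host/port values are placeholders:

```java
// Not a Caliper feature: these are the JVM's standard proxy system properties.
// They must be set before the first HTTP connection is made; the host and
// port here are placeholders for your gateway.
public class ProxySetup {
    public static void configure(String host, String port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", port);
        System.setProperty("https.proxyHost", host);
        System.setProperty("https.proxyPort", port);
    }
}
```

The same effect on the command line would be flags like -Dhttp.proxyHost=... -Dhttp.proxyPort=... when launching the runner.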
Original issue reported on code.google.com by [email protected]
on 1 Jul 2010 at 8:38
I think we are setting stdout and stderr to a null output stream. We might
want to capture user output somewhere instead. Also, the user might have
specified JVM flags that produce stdout output, and that will have to go
somewhere somehow. This is vague, but I had to get it captured here somehow.
Original issue reported on code.google.com by [email protected]
on 26 Jun 2010 at 12:08
I'd like to make it quite easy for developers to benchmark code on Android
devices.
One option is to use DalvikRunner. Currently we Dalvik folks have Caliper
running by plugging it into this framework. This is an internal tool owned by
the Dalvik team that knows how to do the following:
1. compile .java files (for a JVM or Dalvik VM)
2. build classpaths (including copying code to a phone when necessary)
3. fork an executable on an arbitrary VM (desktop or device)
4. inspect the result, and compare it with an expectations file
5. do the above in aggregate for large suites of code
6. report the results for consumption by Hudson
The nice thing about DalvikRunner is that it's easy to drive. It makes Java
programming kinda like scripting because it merges the compile and run
steps:
dalvikrunner com/foo/Foo.java
The drawback of this approach is that DalvikRunner is a logically separate
project, which means folks interested in Caliper+Android would need both
tools.
Original issue reported on code.google.com by limpbizkit
on 14 Jan 2010 at 3:02
Our current elapsed time measurer should be the default measurer only,
which the user could override at the command line (or caliperrc?). It may
suffice to do just one measurer for a run, but then again, it also might be
simpler overall to just treat it like every other variable -- specify as
many as you want and caliper will try them all.
Alternate measurers could be used for
- taking other kinds of measurements
- taking the same kind of measurement but in a different way
Either way, it's essentially the same deal. Every measurer needs a unique
name that will be reported. The measurement made might be a simple
"quantity + unit" like the elapsed-time measurer makes, or it could be
something more...
This issue is resolved once we have *some* alternate measurer to use, say
CPU time.
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 10:43
I want to be able to ignore certain columns in the web UI. For example, I ran
the same benchmark on the Sun VM and on the Dalvik VM, and wanted to see them
side by side to compare the general shapes of the charts. This could be done by
ignoring the "vm" column.
Original issue reported on code.google.com by [email protected]
on 25 Jun 2010 at 10:33
According to Chuck Rasbold, Caliper should definitely be specifying -Xbatch to
the subprocess so that JIT compilation will happen in the user thread instead
of in some background thread in parallel with timing. This also means that
when we start logging compilation events (issue 30), we'll actually know when
compilation happened, not just when it started.
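The flag could be injected wherever the forked command line is assembled; a sketch in which the class and argument layout are illustrative, not Caliper's actual Runner code:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: when building the forked VM's command line, insert -Xbatch so JIT
// compilation runs synchronously in the benchmark thread rather than in a
// background compiler thread running in parallel with timing.
public class ForkedVmCommand {
    public static List<String> command(String vm, String classpath, String mainClass) {
        List<String> args = new ArrayList<>();
        args.add(vm);
        args.add("-Xbatch"); // compile in the user thread, not a background thread
        args.add("-cp");
        args.add(classpath);
        args.add(mainClass);
        return args;
    }
}
```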
Original issue reported on code.google.com by [email protected]
on 9 Jun 2010 at 9:11
... and report this in the result XML.
Quite possibly, this version can just be "dev" when working with your own
local build, but when any official release is made, a real version number
needs to be set somehow. Probably not too hard to make ant do this.
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 11:06
new benchmark template
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 10:56
jesse's convinced me that i shouldn't be doing what i was trying to do the
other day (use "reps" as a convenient source of not-exactly-randomness),
but i think the exception could be improved. two suggestions:
1. show the actual value.
2. differentiate between the two cases, and maybe explain. presumably
sub-linear means "optimized out" but supra-linear means "you've made the
cost of an individual rep proportional to the total number of reps".
i can easily make these changes if you like, but i thought i'd check you
agree they make sense, since i've already been caught doing the wrong thing
here (and wasn't immediately convinced my benchmark was unreasonable,
though i now agree it was a bad idea).
Original issue reported on code.google.com by [email protected]
on 12 Jan 2010 at 7:11
I should have the ability to watch the full "play-by-play" of everything
caliper is up to. My preferred approach would be to write this information to
a file and always keep console output uber-neat and clean. I'm thinking
JUnit-style dots -- seriously.
The easiest thing to do would be to read a logfile setting from .caliperrc.
And not worry about rotating it etc. for now.
I don't see any benefit to using java.util.logging for this; and we might
still want to use j.u.l for regular debug-style logging that isn't really
user-oriented.
Original issue reported on code.google.com by [email protected]
on 21 Jan 2010 at 10:59
worse than issue 2, if the user code throws an exception, caliper *really*
needs to show me that.
i had something like:
public void timeDns(int reps) throws Exception {
InetAddress.getByName("unknown.host.exception.mtv.corp.google.com");
}
and got this:
FAIL examples.DnsBenchmark (EXEC_FAILED)
Executing examples.DnsBenchmark
Jan 15, 2010 10:22:41 PM com.ibm.icu4jni.util.Resources
createTimeZoneNamesFor
INFO: Loaded time zone names for en_US in 229ms.
java.lang.Throwable: stack dump
at java.lang.Thread.dumpStack(Thread.java:612)
at com.ibm.icu4jni.util.Resources.createTimeZoneNamesFor(Resources.java:250)
at com.ibm.icu4jni.util.Resources.access$200(Resources.java:34)
at
com.ibm.icu4jni.util.Resources$DefaultTimeZones.<clinit>(Resources.java:200)
at com.ibm.icu4jni.util.Resources.getDisplayTimeZone(Resources.java:163)
at java.util.TimeZone.getDisplayName(TimeZone.java:277)
at java.util.TimeZone.getDisplayName(TimeZone.java:250)
at java.util.Date.toString(Date.java:722)
at java.util.Properties.store(Properties.java:549)
at com.google.caliper.Runner.getExecutedByUuid(Runner.java:63)
at com.google.caliper.Runner.runOutOfProcess(Runner.java:170)
at com.google.caliper.Runner.run(Runner.java:75)
at com.google.caliper.Runner.main(Runner.java:214)
at dalvik.runner.CaliperRunner.test(CaliperRunner.java:28)
at dalvik.runner.TestRunner.run(TestRunner.java:76)
at dalvik.runner.CaliperRunner.main(CaliperRunner.java:36)
at dalvik.system.NativeStart.main(Native Method)
0% Scenario{vm=dalvikvm, benchmark=Dns} Jan
15, 2010 10:22:41 PM java.io.BufferedReader <init>
INFO: Default buffer size used in BufferedReader constructor. It would be
better to be explicit if an 8k-char buffer is required.
An exception was thrown from the benchmark code.
java.lang.NullPointerException
at
org.apache.harmony.luni.util.FloatingPointParser.parseDouble(FloatingPointParser
.java:263)
at java.lang.Double.parseDouble(Double.java:285)
at java.lang.Double.valueOf(Double.java:324)
at com.google.caliper.Runner.executeForked(Runner.java:137)
at com.google.caliper.Runner.runOutOfProcess(Runner.java:179)
at com.google.caliper.Runner.run(Runner.java:75)
at com.google.caliper.Runner.main(Runner.java:214)
at dalvik.runner.CaliperRunner.test(CaliperRunner.java:28)
at dalvik.runner.TestRunner.run(TestRunner.java:76)
at dalvik.runner.CaliperRunner.main(CaliperRunner.java:36)
Original issue reported on code.google.com by [email protected]
on 15 Jan 2010 at 10:41
compute a hash -- somehow -- that will change whenever the code that's
under test changes. haven't thought about this much. it could perhaps
only hash each class that actually gets loaded (then sum all these hashes,
so that loading order doesn't affect things?).
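A sketch of the order-independent idea, assuming we can obtain each loaded class's bytes; the digest choice and byte-folding are arbitrary, not a settled design:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: hash each loaded class's bytes individually, then *sum* the hashes.
// Addition commutes, so the combined hash ignores class-loading order.
public class CodeHash {
    public static long hashOf(byte[] classBytes) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(classBytes);
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xffL); // fold first 8 digest bytes into a long
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("SHA-256 is required to be present", e);
        }
    }

    public static long combined(byte[][] loadedClasses) {
        long sum = 0;
        for (byte[] c : loadedClasses) {
            sum += hashOf(c); // overflow is fine; we only need a stable fingerprint
        }
        return sum;
    }
}
```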
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 11:04
When a benchmark method's loop gets optimized out, or the warmup time is
long enough relative to the time for each execution, the reps int can overflow
and then become 0. The loop will then continue for a very long time, adding
very little to elapsedNanos each time. Caliper appears to hang.
I've attached a patch with a test that should expose the issue, both for the
optimized out case and the long warmup time/fast execution case. I've also
attached a possible fix for it.
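The wraparound is easy to demonstrate, and a guard at the point where reps grows would avoid it; the doubling strategy and cap value below are illustrative, not the attached patch's actual logic:

```java
// Sketch: if warmup keeps scaling reps up (e.g. doubling), a plain int can
// wrap around to Integer.MIN_VALUE, producing the hang described above.
// Computing the next value in long arithmetic and capping it prevents that.
public class RepsOverflow {
    static final int MAX_REPS = 1_000_000_000; // illustrative ceiling

    public static int nextReps(int reps) {
        long doubled = (long) reps * 2; // widen *before* multiplying
        return doubled > MAX_REPS ? MAX_REPS : (int) doubled;
    }
}
```

Without the widening, (1 << 30) * 2 in int arithmetic is Integer.MIN_VALUE, which is exactly the kind of silent wrap that produces the apparent hang.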
Original issue reported on code.google.com by [email protected]
on 29 Jun 2010 at 4:46
Attachments:
bring over code from labs.
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 10:52
On ctrl-shift-f10 ("run"), detect that this is a caliper class, and run it
via caliper.
Should bring up a dialog much like the run-application dialog, and just
show raw console output... not much fancier than that for now; in the
future we could make it more user-friendly in lots of ways (that should all
be filed separately).
Should need just one configuration setting, caliper_home.
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 10:56
Perhaps it should be opt-out, with a very simple one-liner to enable online
posting. When a user without a .caliperrc runs a benchmark, perhaps we could
print an opt-in-please advertisement after the results:
Warning: results not saved. To save results online automatically, run this once:
"caliper --always-save-results-online"
We could also direct them to a URL like
http://microbenchmarks.appspot.com/signin which would allow the local
workstation to associate itself with the user's Google account.
Original issue reported on code.google.com by limpbizkit
on 8 Jan 2010 at 7:15
Should we be able to report that the machine seemed otherwise occupied
during the test run, so the test results could be viewed as more "dirty"
than others?
Original issue reported on code.google.com by [email protected]
on 11 Jan 2010 at 8:32
When making a logarithmic bar graph, choose minimum and maximum bar lengths
(based on screen size only), then normalize the data so that the min and max
values map to those lengths.
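A minimal sketch of that normalization, assuming natural-log scaling and inclusive min/max bar lengths:

```java
// Sketch: map each value's logarithm linearly onto [minBars, maxBars] so the
// smallest value gets the minimum bar length and the largest the maximum,
// regardless of the data's absolute scale or units.
public class LogBars {
    public static int barLength(double value, double min, double max,
                                int minBars, int maxBars) {
        if (min == max) return maxBars; // degenerate case: all values equal
        double t = (Math.log(value) - Math.log(min))
                 / (Math.log(max) - Math.log(min)); // 0.0 at min, 1.0 at max
        return (int) Math.round(minBars + t * (maxBars - minBars));
    }
}
```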
Original issue reported on code.google.com by [email protected]
on 11 Jan 2010 at 9:05
see summary.
This would be a pretty sexy illustration of what caliper can do.
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 10:47
Sure would rather do
-Dlist2Size=1,10,100
than
-Dlist2Size=1 -Dlist2Size=10 -Dlist2Size=100
but it's unclear how exactly to do this so as not to conflict with
parameter values that might actually contain literal commas?
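One conceivable convention (purely an assumption, not something Caliper implements) is a backslash escape for literal commas:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split a "-Dname=a,b,c" value on commas, unless a comma is escaped
// with a backslash, so values containing literal commas stay representable.
public class ParamSplitter {
    public static List<String> split(String value) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\' && i + 1 < value.length() && value.charAt(i + 1) == ',') {
                current.append(','); // "\," is a literal comma, not a separator
                i++;
            } else if (c == ',') {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }
}
```

The cost is that users with comma-bearing values must escape them, but unescaped commas keep their natural list meaning.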
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 10:53
If I have a rewrite to a method of, say, ArrayList, I'd like it to be easy
to compare its performance with that of the installed JDK.
In fact, I might make a low-level tweak to the JDK and want to run all
kinds of existing benchmarks against that to look for any improvements or
degradations that result.
Roughly, the benchmark could point at the location of the alternate JDK
implementation class(es), and we'd have to repackage those classes in
jarjar-like fashion then prepend them to the bootclasspath.
I know this seems to be crossing a line in terms of the simplicity vs.
magic quotient of caliper, but it feels useful to me, not just for the JDK
itself but other third-party libraries (in which case it wouldn't be
bootclasspath, just classpath).
One complication is that the JDK's implementation classes will continue to
evolve while my local tweaked copy will just remain inert...
Original issue reported on code.google.com by [email protected]
on 11 Jan 2010 at 8:31
Patch attached. Thanks
Original issue reported on code.google.com by [email protected]
on 5 Jul 2010 at 4:49
Attachments:
Also feature this prominently on the web site.
A surgeon general's warning: this is only microbenchmarking, and
performance entails much bigger issues than just this. Microbenchmarks lie.
Here are common gotchas to avoid. Etc.
We get Josh to help us with it.
Original issue reported on code.google.com by [email protected]
on 11 Jan 2010 at 8:49
Initial idea: a --dirty command-line option which tells caliper to cut corners
in order to get a result faster.
I'm not clear on what all the exact differences will end up being, but with
this in mind, we are free to make sure that default "clean" behavior
prioritizes getting the highest-quality results possible even at the expense
of a little extra time. We expect this to be run by continuous builds and
large jobs that I kick off before going home for the night, as opposed to
dirty mode which might be used for some quick validation while a programmer is
trying various ideas.
Original issue reported on code.google.com by [email protected]
on 21 Jan 2010 at 11:02
(Are we allowed to say 'web service' without it meaning WSDL and that
garbage?)
Requests that come into the web service are of one of the following kinds:
1. A user at a web browser, viewing results read-only
2. A user at a web browser, doing any write operations or accessing user
preferences etc.
3. Caliper uploading results
4. Other applications retrieving XML results to do whatever with them
For mode 1, I think we need no authentication at all -- just a web page
public to the eyes of the world.
For mode 2, the user must sign into their Google account. They can only
remove (or otherwise modify?) data that is scoped to their own account.
For modes 3 and 4, the client request will have to pass along a token which
the user would have had to generate using mode 2 at some earlier time.
Probably it is enough for every user to be issued one and only one token
(though be allowed to change it with the click of a button -- this is
reminding me a lot of how codesite handles svn passwords).
We want to make sure that users have no incentive to share these tokens
around. It only needs to be accessible by the "main" caliper command, not
all subprocesses it invokes on other machines. We should tell users exactly
what can happen if the token is not kept secret (mainly, if abuse occurs
their account will be the one banned). I think most users would be
naturally inclined to keep it a total secret right up until the moment
where they realize something would be easier if they just spread it around
-- so that's why we want to make sure they have no reason to do that.
Original issue reported on code.google.com by [email protected]
on 8 Jan 2010 at 5:58
Patch attached. Thanks
Original issue reported on code.google.com by [email protected]
on 5 Jul 2010 at 3:39
Attachments:
Here's what I'm thinking:
First of all, any feature that we ever implement, that would ever upload
even small pieces of the user's source code or byte code, must never be on
by default. The user could only enable this feature intentionally and if
at all possible with a warning of what they're doing.
Next, I think we should disavow all notion of privacy or security of your
benchmark results. Trying to protect access to these results is senseless.
Users should assume that once they upload it, until they remove it, anyone
might be able to find it. Put more simply, all benchmark results on the
webapp are to be considered public.
I do think that we should protect the ability to connect to the site in any
way -- you have to either authenticate with your google account, or you
have to include an owner key that you could have only obtained through the
use of your google account. Obviously caliper should never, ever have
anything to do with your google password.
If we ever get to the point where one user is able to upload bytecode which
another user would have the ability to download, that's a line beyond which
*huge* security concerns will come into play. For now we intend nothing of
the sort.
So, I think that all we're left with is visibility rules to support a
positive user experience; most users won't want to be swamped with seeing
everybody else's stuff all the time. Moreover, I see no value in any kind
of features that let a user just browse random other benchmarks present on
the site. Retrieval of benchmark data must be scoped by certain
attributes. It can't just be scoped by "anything run on mac os". But it
can be scoped to an "owner id" (needs definition elsewhere), or it can be
scoped to an exact literal benchmark name
("com.google.common.collect.caliper.ImmutableSetBenchmark").
Decision: when I follow a link that essentially means "I want to see all
results for that specific benchmark class name ever", will I see only ones
under my own owner ID, or will I also see the runs that joe blow random
user ran on his TRS-80 last week? I like the latter, because (a) nothing
is private (see above) and (b) I can always narrow the scope to my own id
or ids if I want.
Original issue reported on code.google.com by [email protected]
on 8 Jan 2010 at 5:44
Configure the list of environments to run the benchmark on as (hostname,
user, jrePath) triplets. Run it on all environments, store results.
Original issue reported on code.google.com by [email protected]
on 13 Jan 2010 at 5:30
Users need to be able to configure JVM arguments, similar to other kinds of
variables -- from the command line or in the code.
The user should be able to divide the command line into arbitrary sections
that can vary independently. Naively, we could recognize any variable
named "jvmargs*", take cartesian product, and then for each scenario just
concatenate all the resulting values together.
I don't know what this would look like in the code. A single @JvmArguments
annotation with String[] value would not seem enough.
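The cartesian-product step for the naive "jvmargs*" proposal might look like this; the section values below are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: given the value lists of each "jvmargs*" section, take their
// cartesian product and concatenate one value from each section per scenario.
public class JvmArgsProduct {
    public static List<String> product(List<List<String>> sections) {
        List<String> result = new ArrayList<>();
        result.add(""); // identity: the empty argument string
        for (List<String> section : sections) {
            List<String> next = new ArrayList<>();
            for (String prefix : result) {
                for (String value : section) {
                    next.add(prefix.isEmpty() ? value : prefix + " " + value);
                }
            }
            result = next;
        }
        return result;
    }
}
```

Two sections with two values each yield four scenario argument strings, one per combination.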
Original issue reported on code.google.com by [email protected]
on 7 Jan 2010 at 10:14
In different situations, the user may either want the measured value to
include time spent in GC, or they might want to report only measurements
that were not interrupted by GC. For caliper to support these two cases
might take a bit of gymnastics...
For the amortize case, if the memory ceiling is too high and/or the timing
interval too small, we might get a very unstable measurement.
For the exclude case, if the memory ceiling is too low and/or the timing
interval too long, we might find it impossible to get any measurements that
were never interrupted by GC.
This suggests that the framework would adaptively tweak the timing interval
(and *maybe* even the Xmx??).
Needs more thought.
Original issue reported on code.google.com by [email protected]
on 11 Jan 2010 at 7:54
Otherwise I need to install an IntelliJ SDK in order to build regular Caliper.
Thumbs down. As with the webapp backend, I don't really predict many end-users
will be interested in the plugin's source code.
But I do think they'll be interested in the command line app, and so we should
keep that project simple and free from heavyweight dependencies.
Original issue reported on code.google.com by [email protected]
on 17 Jan 2010 at 9:40
Allow the user to define an additional method inside the benchmark that
functions as a baseline against which to compare a benchmarked method. For
example, suppose we have this benchmark:
{{{
void timeFoo(int reps) {
for (int i = 0; i < reps; i++) {
new Foo();
// do stuff
}
}
}}}
Then we could add a corresponding "baseline" method
{{{
void baselineFoo(int reps) {
for (int i = 0; i < reps; i++) {
new Foo();
}
}
}}}
since we only want to measure the time taken for "// do stuff", not the
construction time.
We could then measure each, and return the time as timeFor(timeFoo) -
timeFor(baselineFoo).
Thoughts?
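A sketch of the measurement arithmetic, with an illustrative harness standing in for Caliper's actual measurer:

```java
// Sketch: time both the benchmark and its baseline for the same rep count,
// then report the difference per rep, so the shared cost (new Foo() above)
// cancels out and only "// do stuff" remains. Illustrative, not Caliper code.
public class BaselineDemo {
    public interface Timed { void run(int reps); }

    public static long nanosFor(Timed t, int reps) {
        long start = System.nanoTime();
        t.run(reps);
        return System.nanoTime() - start;
    }

    // timeFor(timeFoo) - timeFor(baselineFoo), averaged per rep.
    public static double netNanosPerRep(Timed benchmark, Timed baseline, int reps) {
        return (double) (nanosFor(benchmark, reps) - nanosFor(baseline, reps)) / reps;
    }
}
```

One subtlety a real implementation would need to handle: the difference of two noisy measurements is noisier than either, so the baseline approach may demand more trials for a stable result.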
Original issue reported on code.google.com by [email protected]
on 29 Jun 2010 at 12:47
A simple dashboard to show all the benchmarks you've run so you don't have to
remember them would be nice.
Original issue reported on code.google.com by [email protected]
on 26 Jun 2010 at 12:02