open-mpi / mtt

MPI Testing Tool

Home Page: https://open-mpi.github.io/mtt
License: Other
Copyright (c) 2004-2007 The Trustees of Indiana University and Indiana University Research and Technology Corporation. All rights reserved.
Copyright (c) 2004-2005 The University of Tennessee and The University of Tennessee Research Foundation. All rights reserved.
Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, University of Stuttgart. All rights reserved.
Copyright (c) 2004-2005 The Regents of the University of California. All rights reserved.
Copyright (c) 2006-2007 Cisco Systems, Inc. All rights reserved.
Copyright (c) 2006-2007 Sun Microsystems, Inc. All rights reserved.
Copyright (c) 2018 IBM Corporation. All rights reserved.
Copyright (c) 2018 Intel, Inc. All rights reserved.
$COPYRIGHT$

Additional copyrights may follow

This software includes code derived from software that is copyright (c) 1996 Randal L. Schwartz, distributed under the Artistic License. See the copyright and license notice in "mtt-relay" for details.

$HEADER$

What is this software?
----------------------

This is the MPI Testing Tool (MTT) software package. It is a standalone tool for testing the correctness and performance of arbitrary MPI implementations.

The MTT is an attempt to create a single tool to download and build a variety of different MPI implementations, and then compile and run any number of test suites against each of the MPI installations, storing the results in a back-end database that then becomes available for historical data mining. The test suites can be for both correctness and performance analysis (e.g., tests such as nightly snapshot compile results as well as the latency of MPI_SEND can be historically archived with this tool).

The MTT provides the glue to obtain and install MPI installations (e.g., download and compile/build source distributions such as nightly snapshots, or copy/install binary distributions, or utilize an already-existing MPI installation), and then obtain, compile, and run the tests. Results of each phase are submitted to a centralized PostgreSQL database via HTTP/HTTPS. Simply put, MTT is a common infrastructure that can be distributed to many different sites in order to run a common set of tests against a group of MPI implementations that all feed into a common PostgreSQL database of results.

The MTT client is written in Python; the MTT server side is written almost entirely in PHP and relies on a back-end PostgreSQL database.

The main (loose) requirements that we had for the MTT are:

- Use a back-end database / archival system.
- Ability to obtain arbitrary MPI implementations from a variety of sources (web/FTP download, filesystem copy, Subversion export, etc.).
- Ability to install the obtained MPI implementations, regardless of whether they are source or binary distributions. For source distributions, include the ability to compile each MPI implementation in a variety of different ways (e.g., with different compilers and/or compile flags).
- Ability to obtain arbitrary test suites from a variety of sources (web/FTP download, filesystem copy, Subversion export, etc.).
- Ability to build each of the obtained test suites against each of the MPI installations (e.g., for source MPI distributions, there may be more than one installation).
- Ability to run each of the built test suites in a variety of different ways (e.g., with a set of different run-time options).
- Ability to record the output from each of the steps above and securely submit them to a centralized database.
- Ability to run the entire test process in a completely automated fashion (e.g., via cron).
- Ability to run each of the steps above on physically different machines. For example, some sites may require running the obtain/download steps on machines that have general internet access, running the compile/install steps on dedicated compile servers, running the MPI tests on dedicated parallel resources, and then running the final submit steps on machines that have general internet access.
- Use a component-based system (i.e., plugins) for the above steps so that extending the system to download (for example) a new MPI implementation is simply a matter of writing a new module with a well-defined interface.

How to cite this software
-------------------------

Hursey J., Mallove E., Squyres J.M., Lumsdaine A. (2007) An Extensible Framework for Distributed Testing of MPI Implementations. In Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2007. Lecture Notes in Computer Science, vol 4757. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75416-9_15

Overview
--------

The MTT divides its execution into six phases:

1. MPI get: obtain MPI software package(s) (e.g., download, copy)
2. MPI install: install the MPI software package(s) obtained in phase 1. This may involve a binary installation or a build from source.
3. Test get: obtain MPI test(s)
4. Test build: build the test(s) against all MPI installations installed in phase 2.
5. Test run: run all the tests built in phase 4.
6. Report: report the results of phases 2, 4, and 5.

The phases are divided in order to allow a multiplicative effect. For example, each MPI package obtained in phase 1 may be installed in multiple different ways in phase 2. Tests that are built in phase 4 may be run multiple different ways in phase 5. And so on. This multiplicative effect allows testing many different code paths through MPI, even with a small number of actual tests.

For example, the Open MPI Project uses the MTT for nightly regression testing. Even with only several hundred MPI test source codes, Open MPI is tested against a variety of different compilers, networks, numbers of processes, and other run-time tunable options. A typical night of testing yields around 150,000 Open MPI tests.

Quick start
-----------

Testers run the MTT client on their systems to do all the work. A configuration file is used to specify which MPI implementations to use and which tests to run.

The Open MPI Project uses MTT for nightly regression testing. A sample Perl client configuration file is included in samples/perl/ompi-core-template.ini. This template will require customization for each site's specific requirements. It is also suitable as an example for organizations outside of the Open MPI Project. Open MPI members should visit the MTT wiki for instructions on how to set up for nightly regression testing:

    https://github.com/open-mpi/mtt/wiki/OMPITesting

Note that the INI file can be used to specify web proxies, if necessary. See the comments in the ompi-core-template.ini file for details.

Running the MTT Perl client
---------------------------

Having run the MTT client across several organizations within the Open MPI Project for quite a while, we have learned that even with common goals (such as Open MPI nightly regression testing), MTT tends to get used quite differently at each site where it is used. The command-line client was designed to allow a high degree of flexibility for site-specific requirements.
The MTT client has many command line options; see the following for a full list:

    $ client/mtt --help

Some sites add an upper layer of logic/scripting above the invocation of the MTT client. For example, some sites run the MTT on SLURM-maintained clusters. A variety of compilers are tested, yielding multiple unique (MPI get, MPI install, Test get, Test build) tuples. Each tuple is run in its own 1-node SLURM allocation, allowing the many installations/builds to run in parallel. When the install/build tuple has completed, more SLURM jobs are queued for each desired number of nodes/processes to test. These jobs all execute in parallel (pending resource availability) in order to achieve maximum utilization of the testing cluster.

Other scenarios are also possible; the above is simply one way to use the MTT.

Current status
--------------

This tool was initially developed by the Open MPI team for nightly and periodic compile and regression testing. However, enough other parties have expressed [significant] interest that we have open-sourced the tool and are eagerly accepting input from others. Indeed, having a common tool to help objectively evaluate MPI implementations may be an enormous help to the High Performance Computing (HPC) community at large.

We have no illusions of MTT becoming the be-all/end-all tool for testing software -- we do want to keep it somewhat focused on the needs and requirements of testing MPI implementations. As such, the usage flow is somewhat structured towards that bias. It should be noted that the software has been mostly developed internally to the Open MPI project and will likely experience some growing pains while adjusting to a larger community.

License
-------

Because we want MTT to be a valuable resource to the entire HPC community, the MTT uses the new BSD license -- see the LICENSE file in the MTT distribution for details.

Get involved
------------

We *want* your feedback. We *want* you to get involved.

The main web site for the MTT is:

    http://www.open-mpi.org/projects/mtt/

User-level questions and comments should generally be sent to the user's mailing list ([email protected]). Because of spam, only subscribers are allowed to post to this list (ensure that you subscribe with and post from *exactly* the same e-mail address -- [email protected] is considered different than [email protected]!). Visit this page to subscribe to the user's list:

    https://lists.open-mpi.org/mailman/listinfo/mtt-users

Developer-level bug reports, questions, and comments should generally be sent to the developer's mailing list ([email protected]). Please do not post the same question to both lists. As with the user's list, only subscribers are allowed to post to the developer's list. Visit the following web page to subscribe:

    https://lists.open-mpi.org/mailman/listinfo/mtt-devel

When submitting bug reports to either list, be sure to include as much extra information as possible.

Thanks for your time.
The killall in the after_each_exec of the MPI Details section only runs on the node where mpirun was invoked (duh). It does not spread to all the other nodes where MPI was running.
Need to figure out how to make that go across all nodes.
Title says all. HLRS needs this for their clusters.
As a point of cleanup, can you remove all entries for "iu-odin"? These were from a bunch of attempts at getting the environment set up correctly for MTT runs, most of which failed.
Keep the entries for "Odin at IU - Testing" for the moment, as that is what the current version of the script will now report.
This is nothing major, just a bit of cleanup I wanted to note.
The back-end SVN "get" functionality currently always thinks that it has found new sources, even when it has not, in fact, obtained anything new.
This is repeatable by specifying a Test Get with an SVN checkout -- it will get a new version every time even if the SVN repository with the test has not changed at all.
Send a mail around providing an executive summary of the previous day/night/24 hours/whatever failures (failed compiles, failed test runs, etc.). This mail should have some simple requirements:
This is a first cut at the requirements. Feel free to add/delete/edit.
Per Josh's comments on the MTT users list, if LD_LIBRARY_PATH is not initially set to ''something'' (even if it's blank), MTT runs of MPI tests will hang. Josh confirmed this by not having LD_LIBRARY_PATH set and seeing the hanging behavior. Then he set it to "" and the hanging behavior went away.
The relevant code in MTT is in lib/MTT/Test/Run.pm:
{{{
if ($mpi_install->{libdir}) {
    if (exists($ENV{LD_LIBRARY_PATH})) {
        $ENV{LD_LIBRARY_PATH} = "$mpi_install->{libdir}:" .
            $ENV{LD_LIBRARY_PATH};
    } else {
        $ENV{LD_LIBRARY_PATH} = $mpi_install->{libdir};
    }
}
}}}
So it ''looks'' like this should be handled correctly (but apparently is not). Will try to replicate this myself and dig into what is going on...
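If it turns out that the hang really is triggered by LD_LIBRARY_PATH being completely unset (rather than set to an empty string), one defensive workaround -- sketched here only as an idea, not a committed fix -- would be to guarantee the variable is always defined before the test command is launched:
{{{
# Sketch only: mirror Josh's manual workaround by making sure
# LD_LIBRARY_PATH is always present in the test's environment, even
# when there is no libdir to prepend.
if (!exists($ENV{LD_LIBRARY_PATH})) {
    $ENV{LD_LIBRARY_PATH} = "";
}
}}}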
Now that we're using proper HTTP/Basic authentication to protect submitting MTT results, the HTTP username (and IP address?) should be stored with an incoming set of data in the database.
Make a trivially easy way for developers to run MTT against their workspaces / local installations. The biggest usage of this will likely be having developers be able to run a small set of "sanity" tests before doing a putback.
The current MPI details scheme might not be flexible enough for all scenarios. Here's one scenario that it does not handle well. It's not an urgent problem, but it might be good to make MPI details flexible enough to handle this kind of scenario:
For example, the following MPI details definition, when spanning multiple nodes, will not work because multi-node jobs will be launched with "--mca btl self,sm":
{{{
[MPI Details: Open MPI]
exec = mpirun -np &test_np() --prefix &test_prefix() --mca btl self,@btl@ &test_executable() &test_argv()
btl = &enumerate("tcp", "sm")
}}}
Instead, it seems like we want to make the value of @btl@ be a bit more conditional -- in this case, we want it to be dependent upon how many nodes (''not'' the value of np!) the job will run across.
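One way to express that -- purely a hypothetical sketch; neither this funclet nor its wiring into the MPI Details section exists today -- would be a funclet that picks the @btl@ value from the number of nodes the job will span:
{{{
# Hypothetical funclet sketch (not existing MTT code): choose the btl
# from the node count rather than from np.
sub btl_for_nodes {
    my ($num_nodes) = @_;
    # Shared memory only makes sense within a single node; fall back
    # to tcp as soon as the job spans more than one node.
    return ($num_nodes > 1) ? "tcp" : "sm";
}
}}}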
When using .htaccess to protect the submit directory, the MTT client fails to submit properly, even though it seems to have the correct HTTP username/password in the ini file. The MTTDatabase reporter outputs messages similar to the following:
{{{
Failed to report to MTTDatabase: 401 Authorization Required
<title>401 Authorization Required</title>Authorization Required
This server could not verify that you are authorized to access the document requested. Either you supplied the wrong credentials (e.g., bad password), or your browser doesn't understand how to supply the credentials required.
Apache/2.0.52 (Red Hat) Server at www.open-mpi.org Port 443
}}}
By looking at the mtt output, I saw that the cleanup script could not be executed (see below). I assume that a "real" command is required and not a shell script.
{{{
Timeout: 1 - 1156432342 (vs. now: 1156432332)
OUT:Can't execute command:
OUT:# This scriptlet ensures that all remnants of the prior mpirun are
OUT:# gone. It kills all orteds running under this user and whacks any
OUT:# session directories that it finds. Hence, do not expect to be able
OUT:# to run on the same machine/user as a user who is running MTT tests.
OUT:
OUT:# This scriptlet is not fully tested yet. Needs testing on: Linux,
OUT:# OSX, Solaris.
OUT:
OUT:who=`whoami`
OUT:which killall > /dev/null 2> /dev/null
OUT:if test "$?" = "0"; then
OUT: # If we have killall, it's easy.
OUT: killall -9 orted
OUT:else
OUT: # We're on an OS without killall. Which variant of ps do we have?
OUT: ps auxw > /dev/null 2> /dev/null
OUT: if test "$?" = "0"; then
OUT: ps_args="auxww"
OUT: else
OUT: ps_args="-eadf"
OUT: fi
OUT:  pids=`ps $ps_args | grep $who | grep -v grep | grep orted | awk '{ print $2 }'`
OUT: if test "$pids" != ""; then
OUT: kill -9 $pids
OUT: fi
OUT:fi
OUT:
OUT:# Whack any remaining session directories. This is a workaround for
OUT:# current bugs in OMPI.
OUT:rm -rf /tmp/openmpi-sessions-${who}*
OUT:
Command complete, exit status: 512
}}}
It would be good for MTT to track which resource manager is used for test runs.
This is a little complicated, however, because it is possible for the MPI Details section to override which RM is used (e.g., to explicitly test, say, the native RM and rsh). For example:
{{{
[MPI Details: foo]
exec = mpirun --mca pls fork,&enumerate("rsh", "slurm") ....
}}}
So we'd somehow need to track which RM is used ''for each test run result''.
This was already done for the IBM test suite.
The idea is to have tests that require a specific number of processes to be tolerant of when they are not run with the right number. Hence, if the test needs 6 processes and it is run with 4 (or 8 or 3 or ...), it should shut down in an orderly fashion (MPI_FINALIZE), and exit with a status of 77 indicating that the test was skipped.
The value of 77 was taken from the GNU coding standards.
Title says all.
When using the Intel compiler, configure breaks because it cannot execute the compiled executable. It looks like something bad happens to LD_LIBRARY_PATH, because the Intel lib directory is not in the default path. Configure called "by hand" works fine.
{{{
configure:4154: $? = 0
configure:4177: checking for C compiler default output file name
configure:4180: icc conftest.c >&5
configure:4183: $? = 0
configure:4229: result: a.out
configure:4234: checking whether the C compiler works
configure:4240: ./a.out
./a.out: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
configure:4243: $? = 127
configure:4252: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'. See
`config.log' for more details.
}}}
The "trim" phase needs to be completed so that scratch directories do not grow out of control after running for a while.
After updating to r245, I have the problem that MTT doesn't (even try to) submit the results to the database. In the older version (r231), MTT at least tried to send the results but failed with an error.
{{{
*** Reporter initializing
Got hostname: noco084.nec
Found whatami: /home/HLRS/hlrs/hpcstork/mtt/client/whatami/whatami
Evaluating: MTTDatabase
Initializing reporter module: MTTDatabase
Evaluating: require MTT::Reporter::MTTDatabase
Evaluating: $ret = &MTT::Reporter::MTTDatabase::Init(@Args)
Evaluating: hlrs
Evaluating: hlrsompi
Evaluating: https://localhost:4323/mtt/submit/
Evaluating: OMPI
Evaluating: Cacau at HLRS
Evaluating: TextFile
Initializing reporter module: TextFile
Evaluating: require MTT::Reporter::TextFile
Evaluating: $ret = &MTT::Reporter::TextFile::Init(@Args)
Evaluating: cacau-$phase-$section-$mpi_name-$mpi_version.txt
Evaluating:----------------------------------------------------------<<<<
File reporter initialized
(/mscratch/ws/hpcstork-mtt-run-2006-08-28--10-40-07---hlrs-gcc-0/cacau-$phase-$section-$mpi_name-$mpi_version.txt)
*** Reporter initialized
...
Command complete, exit status: 0
Evaluating: require MTT::Reporter::TextFile
Evaluating: $ret = &MTT::Reporter::TextFile::Submit(@Args)
File reporter
Writing to text file:
/mscratch/ws/hpcstork-mtt-run-2006-08-28--10-40-07---hlrs-gcc-0/cacau-Test_Run-trivial-ompi-nightly-trunk-1.3a1r11451.txt
Reported to text file
/mscratch/ws/hpcstork-mtt-run-2006-08-28--10-40-07---hlrs-gcc-0/cacau-Test_Run-trivial-ompi-nightly-trunk-1.3a1r11451.txt
Writing to text file:
/mscratch/ws/hpcstork-mtt-run-2006-08-28--10-40-07---hlrs-gcc-0/cacau-Test_Run-trivial-ompi-nightly-trunk-1.3a1r11451.txt
Writing to text file:
/mscratch/ws/hpcstork-mtt-run-2006-08-28--10-40-07---hlrs-gcc-0/cacau-Test_Run-trivial-ompi-nightly-trunk-1.3a1r11451.txt
Writing to text file:
/mscratch/ws/hpcstork-mtt-run-2006-08-28--10-40-07---hlrs-gcc-0/cacau-Test_Run-trivial-ompi-nightly-trunk-1.3a1r11451.txt
Test run [test run: intel]
Evaluating: intel
Found a match! intel [intel]
Evaluating: Simple
}}}
MTT was designed to be able to be interrupted; if you re-start MTT with the same command line arguments and ini file and nothing has changed on the server side (e.g., no new version of MPI or version of tests), MTT should resume where it left off.
However, in some cases, results for all the tests won't be reported. For example, if you interrupt MTT in the middle of a long intel test run, although MTT has all the meta data for the tests that have already been run (and will properly resume where it left off if you restart MTT), it will only report the results of the tests that it executed during the current run. That is, the results of the tests of the previous run are not reported back to the database.
Allow MTT to run performance tests and save the results in a historical database. For example, run NetPIPE and save the data over time. Be able to report the NetPIPE data in graphical form where relevant (e.g., look at the NetPIPE data for a given BTL from a given cluster over arbitrary time periods).
Should have support for at least the following test suites:
Probably want to add support for more over time, such as:
The mtt database repeats many character strings thousands of times. For columns that contain such strings, a separate table should be created to index into from the main table. E.g., an entry that currently looks like:
||'''hostname''' ||'''test_name''' ||'''result''' ||
||somehost.com ||hello ||1 ||
Will instead look like:
||'''hostname''' ||'''test_name''' ||'''result''' ||
||index1 ||index2 ||1 ||
Where {{{hostname}}} and {{{test_name}}} tables exist that contain the following entries:
||'''index''' ||'''hostname''' ||
||index1 ||somehost.com ||
{{{}}}
||'''index''' ||'''test_name''' ||
||index2 ||hello ||
Will this significantly degrade performance?
The output reported by "whatami" on the cacau cluster is "linux-unknown_please_send_us_a_patch-x86_64".
We need to fix this (and send a patch to the whatami guys).
Add the ability to have a centralized INI file with a global set of configurations to test that apply to a set of users (e.g., the OMPI core testers). This allows standardization of the set of tests that are run, etc.
Need to provide "opt-out" capabilities from the centralized INI file -- for example, the centralized INI file may list the trunk and all the release branches for OMPI (e.g., trunk, 1.0, 1.1, 1.2). But Sun only cares about the trunk and 1.2, so they should be able to opt-out of the 1.1 and 1.0 tests.
Additionally, each MTT site will need to be able to customize some fields, such as which compilers to use, etc.
MTT needs support for the Torque/PBS scheduler.
HLRS is getting an extra "," in their mpirun command lines, preventing tests from being run. From a mail from Sven:
I configured ompi with TM. I'm using r229 and the tests are not executed. The output of MTT is shown below. Do you have an idea where the additional comma after the "-np 4" comes from?
{{{
String now: mpirun -np &test_np() --prefix &test_prefix()
&test_executable() &test_argv()
Got name: test_np
Got args:
_do: $ret = MTT::Values::Functions::test_np()
&test_np returning: 4,
String now: mpirun -np 4, --prefix &test_prefix() &test_executable()
&test_argv()
Got name: test_prefix
Got args:
_do: $ret = MTT::Values::Functions::test_prefix()
&test_prefix returning:
/mscratch/ws/hpcstork-mtt-run-2006-08-25--10-44-20---hlrs-gcc-0/installs/ompi-nightly-v1.2/cacau_gcc_warnings/1.2a1r11420/install
String now: mpirun -np 4, --prefix
/mscratch/ws/hpcstork-mtt-run-2006-08-25--10-44-20---hlrs-gcc-0/installs/ompi-nightly-v1.2/cacau_gcc_warnings/1.2a1r11420/install
&test_executable() &test_argv()
Got name: test_executable
Got args:
_do: $ret = MTT::Values::Functions::test_executable()
&test_executable returning: src/MPI_Allreduce_loc_f
String now: mpirun -np 4, --prefix
/mscratch/ws/hpcstork-mtt-run-2006-08-25--10-44-20---hlrs-gcc-0/installs/ompi-nightly-v1.2/cacau_gcc_warnings/1.2a1r11420/install
src/MPI_Allreduce_loc_f &test_argv()
Got name: test_argv
Got args:
_do: $ret = MTT::Values::Functions::test_argv()
&test_params returning
String now: mpirun -np 4, --prefix
/mscratch/ws/hpcstork-mtt-run-2006-08-25--10-44-20---hlrs-gcc-0/installs/ompi-nightly-v1.2/cacau_gcc_warnings/1.2a1r11420/install
src/MPI_Allreduce_loc_f
Evaluating: &max(30, &multiply(10, &test_np()))
Got name: test_np
Got args:
_do: $ret = MTT::Values::Functions::test_np()
&test_np returning: 4,
String now: &max(30, &multiply(10, 4,))
Got name: multiply
Got args: 10, 4,
_do: $ret = MTT::Values::Functions::multiply(10, 4,)
&multiply got: 10 4
&multiply returning: 40
String now: &max(30, 40)
Got name: max
Got args: 30, 40
_do: $ret = MTT::Values::Functions::max(30, 40)
&max got: 30 40
&max returning: 40
String now: 40
Evaluating:
Running command: mpirun -np 4, --prefix
/mscratch/ws/hpcstork-mtt-run-2006-08-25--10-44-20---hlrs-gcc-0/installs/ompi-nightly-v1.2/cacau_gcc_warnings/1.2a1r11420/install
src/MPI_Allreduce_loc_f
Timeout: 1 - 1156505332 (vs. now: 1156505292)
OUT:-----------------------------------------------------------------------
OUT:Could not execute the executable
"/mscratch/ws/hpcstork-mtt-run-2006-08-25--10-44-20---hlrs-gcc-0/installs/ompi-nightly-v1.2/cacau_gcc_warnings/1.2a1r11420/install/bin/":
Permission denied
OUT:
OUT:This could mean that your PATH or executable name is wrong, or that you do not
OUT:have the necessary permissions. Please ensure that the executable is able to be
OUT:found and executed.
OUT:-----------------------------------------------------------------------
}}}
Sven suggests:
Maybe we can introduce a new section (e.g., "error") which acts as an error handler. The user should be able to define an action (shell script, e.g., send mail) and a policy (e.g., stop, continue, ...).
Add support for MTT users who are behind firewalls or otherwise not directly connected to the internet. Specifically, allow scenarios like:
In the file lib/MTT/MPI/Install/OMPI.pm, before running the check, MTT deletes LD_LIBRARY_PATH to avoid any problems with other libraries. I ran into the problem mentioned in the comment, with a compiler that needs libs from the LD_LIBRARY_PATH.
I think it should be possible to avoid the deletion of LD_LIBRARY_PATH and still avoid the problems with other libraries. If we simply prepend the MTT paths to LD_LIBRARY_PATH, then it should work, because in this case the MTT libs are always in front of all the other libs in the LD_LIBRARY_PATH.
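A minimal sketch of that idea, assuming the install's lib directory is available as $mpi_install->{libdir} as in the Run.pm snippet quoted earlier:
{{{
# Sketch only: prepend the MTT-installed MPI's libdir instead of
# clearing LD_LIBRARY_PATH entirely, so compiler runtime libraries
# (e.g., Intel's libimf.so) stay findable.
if (exists($ENV{LD_LIBRARY_PATH}) && $ENV{LD_LIBRARY_PATH} ne "") {
    $ENV{LD_LIBRARY_PATH} = "$mpi_install->{libdir}:$ENV{LD_LIBRARY_PATH}";
} else {
    $ENV{LD_LIBRARY_PATH} = $mpi_install->{libdir};
}
}}}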
It would be useful to print some basic timing information at the end of an mtt client run (e.g., start/stop/elapsed time of each phase) upon demand (e.g., --print-times, or somesuch). This will be helpful in determining how long a particular ini file takes to run, and can help with planning purposes for how much to test, how frequently, etc.
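A rough sketch of what the client could collect (all names here are made up for illustration; --print-times is just the proposed option name):
{{{
# Sketch only: record wall-clock start/stop/elapsed times per phase so
# a --print-times style option could dump them at the end of the run.
use Time::HiRes qw(time);

my %phase_times;
sub run_timed_phase {
    my ($name, $code) = @_;
    my $start = time();
    $code->();
    my $stop = time();
    $phase_times{$name} = { start   => $start,
                            stop    => $stop,
                            elapsed => $stop - $start };
}
}}}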
Multiple users have been burned by running through all the tests but then failing to submit properly because of some kind of issue (e.g., not having SSL Perl support, typing the URL wrong, etc.).
We should have a test submit URL that the MTT client can try connecting to during its init phase. If it fails to connect properly, we can abort right away at the beginning and not waste potentially hours of compute time before realizing that there's an error.
This is simple to implement in the MTTDatabase reporter; we just need to be sure that submit.php can safely handle HTTP GET connections with no data (which I think it already can, but want to be sure).
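A minimal sketch of such a check (the function name and the exact hook point in the reporter's Init are assumptions; credential handling is omitted):
{{{
# Sketch only: probe the submit URL once at reporter init time and let
# the client abort early if it is unreachable, instead of failing
# after hours of compute time.
use LWP::UserAgent;

sub check_submit_url {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new(timeout => 30);
    my $response = $ua->get($url);
    return $response->is_success;
}
}}}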
So this ticket represents two things:
Although MTT allows the arbitrary definition of "pass" criteria, we have some large test suites where a small number of the tests are supposed to fail (e.g., IBM and Intel). I.e., most of them "pass" by having an exit status of 0, but some of them pass by having a non-zero exit status (e.g., testing MPI_ABORT).
Particularly when we find the test executables via &find_executables() (which finds ''all'' test executables -- both the ones that are supposed to pass and the ones that are supposed to fail), it's hard to have a global set of pass criteria for all of them. So a better scheme needs to be implemented to allow this kind of flexibility. Some ideas:
I have noticed that the "Test Runs" fields on:
{{{
http://www.open-mpi.org/mtt/summary.php
}}}
are always empty, even though MTT (apparently) submits the results of the runs to the DB.
It would be good for HTTP users to be able to delete some or all of their results from the database (not from the MTT client, but probably from a web page). For example, if a user screws up and submits a bad batch of results (e.g., a compiler license expired, so it falsely reported compile failures), it would be good if the user had a relatively simple method of being able to delete those results from the database rather than skew the results and reports in the database.
The Cluster Summary table in summary.php currently combines the hostname where the results were submitted with the platform ID from the ini file. This is misleading in cases where MTT users are running on clusters with schedulers, meaning that they don't always run (and therefore submit) from the same host.
Case in point is HLRS who runs on some flavor of a PBS cluster (cacau). Right now, summary.php is showing a different entry in the Cluster Summary table for every run that they've done, when, in fact, they're all really from the same cluster (cacau@HLRS).
Hence, the Cluster Summary table should roll up all results from the same cluster, regardless of what node they were run on.
Using {{{CREATE INDEX}}}, the database can be optimized for performance. Since indexes have some overhead of their own, care must be taken to create them for the appropriate columns. Analyzing HTTP logs for how reporter.php is being used should help our decision making here. (Or maybe a better way would be to somehow audit the queries done on the mtt database?)
(See: http://www.postgresql.org/docs/8.1/interactive/sql-createindex.html)
From the MTT developer's conference notes:
Implement 'test specify' phase - replaces current test run INI stuff:
{{{
[test specify: intel]
test_build = intel
module = intel
}}}
The Test Run phase then becomes an engine that simply takes the output of the Test Specify phase (which is kinda how the code is currently organized anyway, but the name "Test Run" implies that the modules for this phase have more control than they really do).
Rainer mentioned that we're requiring a bunch of Perl modules that aren't necessarily installed by default on some older machines (e.g., his). He installed them to make it work, but it might be nice if we can cut down on the number of requirements -- particularly when running MTT on parallel compute nodes, where Perl installs are likely to be minimal (i.e., all we need to do is run the tests and dump output to files there; no need for fancy downloading Perl modules, etc.). From a mail from Rainer:
It seems that quite a few packages are required to build ParallelUserAgent-2.56:
Some tests are deliberately skipped (e.g., not enough/not the right number of processes to run the test) and should not be counted as "passed" or "failed" -- instead, there should be a new category called "skipped".
For the moment, the ompi-core-template.ini file -- at least in the IBM test run section -- checks for status 77 from a test and marks that as a "pass" (tests return status 77 when they want to be skipped; a precedent established by the GNU coding standards).
The Test Get phase needs some kind of versioning, just like the MPI Get phase.
Without versioning, there is no way to know if there are new versions of tests that need to be downloaded/run (even if the MPI version has not changed).
Sun may have some tests that require checking stdout / stderr to see if a test passed. So we need to provide funclets that give access to the stdout / stderr of a test run, and probably some simple string checking funclets (e.g., &grep(), &regexp(), ...).
This ticket is conditional; talk to Sun to see if it's worthwhile before implementing.
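As a strawman -- purely hypothetical, none of these names exist in MTT today -- such a funclet might look like the following, with the open question being how the captured output gets handed to it:
{{{
# Hypothetical sketch, not existing MTT code: check a test's captured
# stdout against a pattern and return 1 (pass) or 0 (fail).
sub stdout_grep {
    my ($stdout, $pattern) = @_;
    return (defined($stdout) && $stdout =~ /$pattern/) ? 1 : 0;
}
}}}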
Need to make the appropriate extension to mtt to be able to use the N1GE RM to run tests.
This is something that Ethan reported last week and I thought I had fixed it. Blah!
Sometimes the MPI version number comes up either blank or has a bogus string in it. For example, in summary.php, I'm currently seeing some bad version numbers for the tests that I just ran on the 1.1.1rc2 tarball:
{{{
x86_64 linux-rhel4_AS-x86_64 Cisco MPI development cluster
svbu-mpi1.cisco.com ompi-rc-v1.1 2006-08-25 15:59:45 gnu 1 0 0 0 0 0
x86_64 linux-rhel4_AS-x86_64 Cisco MPI development cluster
svbu-mpi1.cisco.com ompi-rc-v1.1 2006-08-25 15:59:45 ibm 0 0 1 0 0 0
x86_64 linux-rhel4_AS-x86_64 Cisco MPI development cluster
svbu-mpi1.cisco.com ompi-rc-v1.1 2006-08-25 15:59:45 imb 0 0 1 0 0 0
x86_64 linux-rhel4_AS-x86_64 Cisco MPI development cluster
svbu-mpi1.cisco.com ompi-rc-v1.1 2006-08-25 15:59:45 intel 0 0 1 0 0 0
x86_64 linux-rhel4_AS-x86_64 Cisco MPI development cluster
svbu-mpi1.cisco.com ompi-rc-v1.1 2006-08-25 15:59:45 trivial 0 0 1 0 0 0
x86_64 linux-rhel4_AS-x86_64 Cisco MPI development cluster
svbu-mpi1.cisco.com ompi-rc-v1.1 2006-08-25 15:59:45 mtt_version_major: 0 intel 0 0 0 0 0 88
x86_64 linux-rhel4_AS-x86_64 Cisco MPI development cluster
svbu-mpi1.cisco.com ompi-rc-v1.1 2006-08-25 15:59:45 mtt_version_major: 0 trivial 0 0 0 0 0 4
}}}
I also see the following in Test Build output:
{{{
Test build [test build: trivial]
Already have a build for [ompi-rc-v1.1] / [] / [gnu] / [trivial]
}}}
So I think there's another place in the code that isn't doing the MPI version number properly.
summary.php is basically a one-size-fits-all version of reporter.php. reporter.php should be used as a backend for summary.php such that patches applied to reporter.php will effectively be applied to both scripts.
Josh Hursey noticed that there are ''no'' test run results being shown on summary.php.
We know that there are valid test run data in the db (e.g., he submitted some last night), but they aren't showing up on summary.php.
Could this be due to some mucking around that I did in summary.php?
I am seeing many PHP warnings in the web server logs, indicating problems with summary.php. Here's a snippet from the web logs (I am trying to get the IU admins to make these available to us in real-time; right now, you have to ask for them because the files are not readable by our logins) -- I'll attach the entire log that I have that shows all the problems:
{{{
[client 64.102.254.33] PHP Notice: Undefined index: debug in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 90, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined index: db in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 338, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined variable: argv in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 339, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined index: level in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 351, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined index: verbose in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 368, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined index: go in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 375, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined index: go in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 381, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined offset: 0 in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 522, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined offset: 0 in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 541, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined offset: 0 in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 562, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined offset: 0 in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 583, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined offset: 0 in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 583, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined offset: 0 in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 583, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined offset: 0 in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 658, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined offset: 1 in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 541, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined offset: 1 in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 583, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined index: verbose in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 585, referer: http://www.open-mpi.org/mtt/
}}}
The rollup values for how many test runs failed don't seem to match in the output from summary.php. I have attached an html snapshot of summary.php from right now. There are 5 rows in the executive summary table; they show 11 / 1 / 1 / 1 / 1 test run failures, respectively.
Similarly, the Cluster Summary table has 6 rows, showing 11 / 0 / 1 / 1 / 1 / 1 test run failures, respectively.
However, in the Test Suites summary, it shows numbers much larger than 11 and 1 (e.g., 14, 3, 468, etc.).
Am I reading these numbers wrong? Are some of these "normalized"? If so, it would be good to notate that on the column head, and describe what "normalized" means.
The MTTDatabase submit method can be made more efficient.
For example, a single run of the IBM test suite for 2 values of np (each with 1 variant), generates 362 results. This currently requires '''362 separate HTTP connections''', each of which averages around 2k of data transfer (combined send and receive). This is approximately 3/4 MB total transfer. It also takes '''several minutes''' to complete (submitting from a test cluster at Cisco).
I'm not so concerned about the total number of bytes transferred, but it could be significantly reduced. The MTT client currently sends a lot of repeated data for ''each result.'' The most obvious changes that I'm thinking of are:
Both the client and the server would need to be modified to make this happen. It would probably make the whole process significantly more efficient in the following ways:
The hostfile / hostlist functionality is currently half-implemented. It is read in the .ini file and put in the MTT meta-data, but it is not used anywhere.
If we're going to have users outside of developers using MTT, we need a way for them to report the version that they're running.
I added a new field in the Test Run report named "timed_out". This field is now sent to the MTT database via the MTTDatabase reporter. It's a logical value and will always be either 0 or 1.
This field indicates whether a test timed out or not (different than failing). The timeout in some OS's is somewhat fuzzy, so it's possible for a test to actually go [slightly] over its timeout value and still pass. Hence, this flag specifically indicates whether a test was killed because it had timed out.
More specifically:
The server side needs to now accept this flag and enter it into the data, and the reports need to be adjusted accordingly.