wanted: TEST-OP to inform caller about success or failure of test suite

Bug #479478 reported by Tobias C. Rittweiler
This bug affects 2 people
Affects: ASDF
Status: Won't Fix
Importance: Wishlist
Assigned to: Unassigned
Milestone: (none shown)

Bug Description

To automate running the test suites of lots of projects -- for example to test for
regression in CL implementations, or for endeavors like LibCL -- it is necessary
that TEST-OP somehow passes information about success/failure to its caller.

Discussion and proof-of-concept at

    http://thread.gmane.org/gmane.lisp.asdf.devel/309

Changed in asdf:
status: New → Confirmed
importance: Undecided → Wishlist
Faré (fahree)
Changed in asdf:
milestone: none → version3
Faré (fahree) wrote :

NB: You can look at how compile-op on systems does things in 2.26.121 to report that compilation did indeed happen without any non-cancelable deferred warning.

Faré (fahree) wrote :

So, we could split TEST-OP into TEST-REPORT-OP that makes a REPORT file, and TEST-OP that succeeds or fails depending on the report.

Or we could make TEST-OP the thing that creates the report, and REPORT-TEST-OP the thing that reads the report and displays it to the end-user.

In either case -- shouldn't this be moved out of ASDF itself into some contrib or extension?

I'd like to resolve this bug as "Won't Fix", because it's "Not MY problem".

Matt Niemeir (matt-niemeir) wrote :

I recently bumped up against half of this problem a few times: I needed to programmatically determine whether a test suite passed. My use cases happened to be a script for a git pre-commit hook, and an attempt to add the ability for a test library to test a system in a fresh remote lisp. I anticipate bumping into it again if I set up continuous integration.

(The other half of the problem, in my opinion, is the coordination issue. That is, given a large set of other people's systems how do I programmatically determine which ones pass their tests? There seemed to be considerable excitement on that front in the 2009 thread but I haven't run into this issue personally.)

I looked for a nice way to solve the first half of the issue in the ASDF manual, but found instead a reference that test-op has been discussed in the bug tracker and mailing list -- so here I am.

Proposal:

  1. Make ASDF:TEST-SYSTEM and the corresponding ASDF:OOS call return the value of the associated ASDF:PERFORM call. That would solve my issue nicely.

  2. Separately reconsider whether the second half is still desirable. If so, endorse a return format and ask users to follow it. CL-TEST-GRID seems relevant to this, both for being the largest example of programmatically testing arbitrary CL projects and for having its own test return format. Perhaps we could get Anton to comment.

In the previous discussion it was pointed out that some systems are tested in a way such that they would not be able to profitably take advantage of this {particularly systems which use :in-order-to ((test-op (test-op subsystem)))}. I think the answer is "okay, this isn't for those systems. It won't break their current practice."

Here's the previously mentioned pre-commit hook. Note the circuitous logic of sending the output to a string, which I then both read (to get at the value which I would like to make test-system return) and print (to recover the output).

#!/usr/bin/sbcl --script
;; If I want to commit while the tests are failing, the hook requires me to pass a force flag.
(let ((quicklisp-init (merge-pathnames "quicklisp/setup.lisp"
                                       (user-homedir-pathname))))
  (when (probe-file quicklisp-init)
    (load quicklisp-init)))

(sb-ext:quit :unix-status
             (handler-case (let ((string
                                  (with-output-to-string (*standard-output*)
                                    (asdf:test-system 'dishes))))
                             (if (eq (read-from-string (print string)) :ok)
                                 0
                                 1))
               (serious-condition () 1)))

Faré (fahree) wrote :

I'm sure the maintainer will gladly consider a patch that implements your first point if said patch doesn't completely violate the structure and symmetries of ASDF. On the other hand, I don't see how you can indeed have OPERATE meaningfully return the value of PERFORM: it should never be guaranteed that any given action be in the plan. Sure, TEST-OP operations will be performed, because they are magically operation-done-p nil, which is an ASDF1-inherited kluge, really.

The real ASDFy solution would be to have an operation TEST-REPORT-OP that creates a report file. Then you can standardize the format of that report file all you want, or reuse an existing file format from an existing library. TEST-OP would just look at the report and throw an error if it's not satisfactory.
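The report-file idea can be sketched in plain Common Lisp, outside ASDF. Everything here is an assumption made for illustration: the function names, the plist report format, and the idea that the writer would be the body of a PERFORM method on TEST-REPORT-OP while the checker would be the body of TEST-OP. It is not an existing ASDF API.

```lisp
;; Hypothetical report format: one plist per file, e.g. (:PASSED 12 :FAILED 0).

(defun write-test-report (path passed failed)
  "Write a one-form report file recording the PASSED and FAILED counts."
  (with-open-file (out path :direction :output
                            :if-exists :supersede
                            :if-does-not-exist :create)
    (with-standard-io-syntax
      (prin1 (list :passed passed :failed failed) out)))
  path)

(defun read-test-report (path)
  "Read the plist back, with *READ-EVAL* disabled for safety."
  (with-open-file (in path)
    (with-standard-io-syntax
      (let ((*read-eval* nil))
        (read in)))))

(defun check-test-report (path)
  "Signal an error unless the report at PATH shows zero failures.
Return the report plist otherwise."
  (let* ((report (read-test-report path))
         (failed (getf report :failed)))
    (unless (eql failed 0)
      (error "Test report ~A shows ~A failure(s)." path failed))
    report))
```

With this split, a caller that only wants a yes/no answer calls the checker, while a failing report stays on disk for later inspection.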

Now, if you are going for standardization, may I interest you in following the procedure correctly?
http://fare.livejournal.com/169346.html

Faré (fahree) wrote :

Note point 3 of the 10-point plan: work with library authors. They have the expertise.

Note that you can totally define a TEST-REPORT-OP within the current framework, as a regular extension to ASDF. You might also want to define a subclass of SYSTEM that automatically defines the correct methods for TEST-OP and TEST-REPORT-OP.

Robert P. Goldman (rpgoldman) wrote :

We use ASDF's TEST-OP integrated with Jenkins, based on FiveAM tests, and I have written a subclass -- the FiveAM-tester-system -- of SYSTEM that we use for the purpose. I could probably share this system definition, but it has a couple of drawbacks, the most notable being that we have never upgraded from using the ARNESI-based FiveAM to the ALEXANDRIA-based version.

As Fare has explained, ASDF does not return results from OPERATE and, for reasons including (a) the plan-build structure and (b) the nondeterminism, cannot easily be made to do so (where "cannot easily" may expand to "end-to-end rewrite"). Given that, we have a couple of expedients:

1) cause FIVEAM-TESTER-SYSTEM to signal an error condition when tests fail. If we do this, we can embed the call to ASDF:TEST-SYSTEM into a lisp script that exits with a non-zero error-code when a test fails. That script is then used in Jenkins.

2) we find that solution (1) is not always adequate. We have found error cases when we used this method where our tests "passed" because for some reason a test that would have failed was never run or, because of some bug, NONE of the FiveAM tests was run. Our solution to this problem is a general one: we have a testing script whose purpose is to grovel over one or more test transcript files. It's very generally configurable (it's a general transducer from transcript files to exit codes for the benefit of Jenkins) but in the case of a FiveAM-tested system what it does is (i) check for test failures and (ii) check that the expected number of tests is actually run. Typically it's a run of this script that we embed in our Jenkins builds.

So those are a couple of ways that one can build a continuous integration approach onto the existing ASDF framework.
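A rough sketch of the second expedient, a transducer from a test transcript to an exit code. This is not the script described above: the function names are hypothetical, and the transcript is assumed to contain summary lines of the form `Pass: N` and `Fail: N`, similar to a FiveAM run summary.

```lisp
(defun count-after (label transcript)
  "Return the integer following the first occurrence of LABEL in TRANSCRIPT,
or NIL if LABEL is absent or is not followed by an integer."
  (let ((pos (search label transcript)))
    (when pos
      (parse-integer transcript
                     :start (+ pos (length label))
                     :junk-allowed t))))

(defun transcript-exit-code (transcript expected-tests)
  "Return 0 when TRANSCRIPT reports zero failures and exactly EXPECTED-TESTS
passes; return 1 otherwise, including when no summary is found at all."
  (let ((passed (count-after "Pass:" transcript))
        (failed (count-after "Fail:" transcript)))
    (if (and passed failed
             (zerop failed)
             (= passed expected-tests))
        0
        1)))
```

Checking the expected count is what catches the case where the suite "passes" only because some or all of the tests were never run.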

Matt Niemeir (matt-niemeir) wrote :

Thank you both for the detailed answers.

Fare, TEST-REPORT-OP sounds even more complicated than my current solutions -- it keeps the trick of using serialization to cross the asdf-to-user-code-barrier that I am already using, but also enlists the filesystem. I'll reiterate that what drove me here is not the matter of consolidating libraries or the coordination problem (although of course here I am reporting my issue with ASDF -- I think if your consolidation process had an 11th step this would be it), but that I am looking for a direct way to acquire the return value of running a test suite as the product of testing a system with asdf.

I'm not suggesting that all operations return their perform value, just test-op -- so I think nondeterminism would not be an issue (but I am not familiar with the inner workings of asdf).

Robert, thanks for the expedients. They are similar to what I have been experimenting with. One I have toyed with is to have my test suite always signal a non-error condition which just wraps the test results. That way a script could error if the condition is never signaled, and otherwise behave according to the test results. I think in many common cases these solutions are less obvious and more complicated than what would be available if asdf:test-system returned the result of the user's testing.
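The non-error condition idea might look roughly like the following. The condition type, its reader, and the wrapper function are hypothetical names invented for this sketch; the test suite is assumed to call `signal` with its results once it finishes.

```lisp
(define-condition test-results (condition)
  ((results :initarg :results :reader test-results-value))
  (:documentation
   "Hypothetical non-error condition wrapping a test suite's results."))

(defun call-capturing-test-results (thunk)
  "Call THUNK and return the results carried by the last TEST-RESULTS
condition it signals. Signal an error if no such condition was signaled."
  (let ((seen nil)
        (results nil))
    (handler-bind ((test-results
                     (lambda (c)
                       ;; Record the results and decline, so that SIGNAL
                       ;; returns and the suite continues normally.
                       (setf seen t
                             results (test-results-value c)))))
      (funcall thunk))
    (unless seen
      (error "The test suite never reported its results."))
    results))
```

A script would pass a thunk that runs the tests, for example one that calls asdf:test-system on a system whose test code signals this condition, and then choose its exit code from the returned value.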

I'm not entirely sure if the remaining objection is just that it might be prohibitively difficult to implement, or if also my proposed behavior would be antithetical to asdf's design.

Robert P. Goldman (rpgoldman) wrote :

This really has been discussed to death on the mailing list. The problem is that "the perform value" of an operation is not well-defined. To define this value, we would have to define a way that the values of all the perform calls in the build plan are to be rolled up into a value for the plan as a whole. And a way to roll up the values of the operations that were performed to ENABLE performing this operation, etc.

If you look at the discussion in the mailing list, I proposed that one way to do this would be to have hierarchical plans, instead of flat plans. But this would be an extensive rewrite, really not adding a simple feature, but producing ASDF 4.0.

For now, I'm afraid that makes it a WONTFIX. Sorry.

Changed in asdf:
status: Confirmed → Won't Fix
Faré (fahree) wrote :

Also, I think you're dismissing the TEST-REPORT-OP solution too fast. Considering the space used by fasl files, a successful test report should be nice and small in comparison, whereas a big failure report is something that's much better saved to disk for further inspection, shipping to upstream maintainers, etc. It also solves the communication problem in the standard ASDF way: actions communicate their results via files on disk. Frankly, I don't see any non-negligible downside to the approach.

Trying to return the results of only one PERFORM would make the code very dissymmetric, and would suck. Trying to return all the results of all PERFORMs would eat a lot of memory and would suck. Moreover, because only outdated actions are replayed, you would basically have to remember these results in a hash-table mapping action to perform result. But then, wouldn't you want to have results from past actions even if not performed in the current session? That would mean saving results for all actions. Ugh.

It's so much better to save the test results in a report file. Just do that.

Matt Niemeir (matt-niemeir) wrote :

Whether I might also want a TEST-REPORT-OP seems beside the point. Easy generation of a results file could come in handy, but I would still want to get at the actual returned value. Attempting to avoid using serialization (or global state, or signalling) to get a lisp value from a lisp function is exactly what brought me here.

But I don't plan to rewrite plans:) So thanks for the explanations and the clear answer.

Faré (fahree) wrote :

If you don't need details on success, signalling is fine. If you do, you can side-effect a special variable to store your test results, and then the user can locally bind this variable if he wants to contain the side-effect. But the design of ASDF, inherited from ASDF 1, which got it from earlier defsystems, is such that no value can be returned without ugly kludges.
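A minimal sketch of the special-variable approach, with hypothetical names throughout. `run-suite` stands in for whatever a TEST-OP PERFORM method would call; its own return value is deliberately uninformative, as with asdf:test-system.

```lisp
(defvar *test-results* nil
  "Hypothetical special variable that a test suite sets as a side effect.")

(defun run-suite ()
  "Stand-in for the code a TEST-OP PERFORM method would run."
  (setf *test-results* (list :passed 5 :failed 0))
  t)  ; the caller cannot learn anything from this return value

(defun call-with-contained-results (thunk)
  "Call THUNK with *TEST-RESULTS* locally bound and return what it stored.
The global value of *TEST-RESULTS* is left untouched."
  (let ((*test-results* nil))
    (funcall thunk)
    *test-results*))
```

The local binding is what contains the side effect: results from one run cannot leak into the global value or be mistaken for those of a later run.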

A somewhat acceptable kluge might be for perform-plan to always keep the results (single or multiple) of the last action performed, and return that as one of its many return values, but only if that action matches the one requested by the user (which it would be for test-op, which has operation-done-p nil), which itself would be indicated by a boolean returned as an additional value. Operate would return that flag and multiple-value-list as third and fourth return values. That's somewhat ugly, but oh well. Matt, would that work for you? Note that values are thus never passed from one action to the other, only from the requested action (if performed) to the user.
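To make the shape of that kluge concrete, here is a toy model that does not touch ASDF at all. A plan is assumed to be a list of (NAME . THUNK) pairs, and both function names are invented for this sketch. In the proposal the flag and the value list would be the third and fourth return values of OPERATE; the toy simply returns them as its only two values.

```lisp
(defun toy-perform-plan (plan requested)
  "PLAN is a list of (NAME . THUNK) actions, performed in order.
Return (values MATCHED-P RESULTS). MATCHED-P is true when the last action
performed is REQUESTED; RESULTS is then the list of its return values."
  (let ((last-name nil)
        (last-values nil))
    (loop for (name . thunk) in plan
          do (setf last-name name
                   last-values (multiple-value-list (funcall thunk))))
    (if (and plan (eq last-name requested))
        (values t last-values)
        (values nil nil))))

(defun toy-test-system (plan requested)
  "Spread the requested action's results back out as multiple values,
or return no values when the requested action was not the last performed."
  (multiple-value-bind (matched-p results)
      (toy-perform-plan plan requested)
    (if matched-p
        (values-list results)  ; same effect as (apply #'values results)
        (values))))
```

Only the results of the final action are kept, so nothing is passed from one action to another and memory use does not grow with the size of the plan.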

Matt Niemeir (matt-niemeir) wrote :

It would. I think it would be additionally convenient if asdf:test-system returned (apply #'values multiple-values-list-value). Currently its docstring says it is a shorthand for (asdf:operate 'asdf:test-op system), but it always returns t.

Faré (fahree) wrote :

Matt: would you submit a patch that does that?

Robert: would you accept it?

Not my job anymore.
