copilot-language / copilot-verifier Goto Github PK

System for verifying the correctness of generated Copilot programs

Haskell 98.80% C 0.53% Dockerfile 0.67%

copilot-verifier's Introduction

Copilot: a stream DSL

Copilot-language contains the actual embedded domain specific language that Copilot provides to its users. It comes with a series of basic operators and functionality, typically enough for most applications. Extended functionality is provided by the copilot-libraries module.

Copilot is a runtime verification framework written in Haskell. It allows the user to write programs in a simple but powerful way using a stream-based approach.

Programs can be interpreted for testing, or translated C99 code to be incorporated in a project, or as a standalone application. The C99 backend ensures us that the output is constant in memory and time, making it suitable for systems with hard realtime requirements.

Installation

Copilot-language can be found on Hackage. It is typically only installed as part of the complete Copilot distribution. For installation instructions, please refer to the Copilot website.

Further information

For further information, install instructions and documentation, please visit the Copilot website: https://copilot-language.github.io

License

Copilot is distributed under the BSD-3-Clause license, which can be found here.

copilot-verifier's People

Contributors

Stargazers

Watchers

Forkers

libertasspz

copilot-verifier's Issues

Use examples in a test suite

As a matter of convenience, it would be handy to be able to run cabal test after making some changes and determine if all the examples continue to verify. I think this should be relatively straightforward to accomplish using tasty. One complication is that the verify function does use anything like exitFailure when there are unproven goals, so we should add that first.

Ask Copilot for the names of generated global variables

copilot-verifier/copilot-verifier/src/Copilot/Verifier.hs

Lines 548 to 551 in ea2c64e

    
           do -- TODO, should get these from somewhere inside copilot instead of building these names directly 
        
              let idxName = "s" ++ show nm ++ "_idx" 
        
              let bufName = "s" ++ show nm 
        
              let buflen  = genericLength vs :: Integer

As alluded to in the above comment, we are currently building the names of stream variables in the generated code from hidden, internal knowledge of the Copilot code generator. It would be better and more robust to have the Copilot library itself tell us this information instead. Currently no API exists for obtaining this information, so we will probably need to design something and open an upstream PR.

Log when Crucible simplifies trivial proof goals

Crucible simplifies away certain proof goals if they can trivially be reduced to True. It would be worth seeing if we can still log them anyway, as their omission can be confusing if you aren't aware that Crucible does this.

Prepare for eventual release to Hackage

We'd like to make it so that one can install copilot-verifier directly from Hackage. To do so, we'll need to undertake some preparations described here.

Put all dependencies on Hackage

Currently, not all of copilot-verifier's dependencies are on Hackage. We will need to make sure that the following libraries (all from the crucible repo) are on Hackage:

We should also make sure that the following dependencies are up to date on Hackage:

Other preparations

Update the description field of copilot-verifier.cabal here.
Update the maintainer field of copilot-verifier.cabal here.
Update the copyright years here and here.
Add a source-repository stanza to copilot-verifier.cabal, inspired by the one here.
See if there is a way to include the copilot-verifier README file as part of the Hackage distribution. The README is located a directory up from copilot-verifier.cabal, which might give cabal trouble.
Bump version to 0.1.

Add CI for `copilot-verifier`

We should set up a basic continuous integration setup for copilot-verifier. At the very least, this should do the following:

Ensure that copilot-verifier and the associated demo programs build successfully
Run the copilot-verifier test suite

Can't verify trigger function using array of structs

copilot-verifer fails to verify the following program:

{-# LANGUAGE DataKinds #-}
module Main (main) where

import Copilot.Compile.C99
import Copilot.Verifier
import Language.Copilot
import Prelude             hiding ((++))

data S = S { field :: Field "field" Int16 }

instance Struct S where
  typename _ = "s"
  toValues s = [Value Int16 (field s)]

instance Typed S where
  typeOf = Struct (S (Field 0))

spec :: Spec
spec = trigger "f" true [arg stream]
  where
    stream :: Stream (Array 2 S)
    stream = [array [S (Field 27), S (Field 42)]] ++ stream

main :: IO ()
main = do
  spec' <- reify spec
  verify mkDefaultCSettings [] "delay" spec'

$ runghc Delay.hs
[copilot-verifier] Generated "results/delay/delay.c"
[copilot-verifier] Compiled delay into results/delay/crux~delay.bc
[copilot-verifier] Translated bitcode into Crucible
[copilot-verifier] Generating proof state data
[copilot-verifier] Computing initial state verification conditions
[copilot-verifier] Proving initial state verification conditions
[copilot-verifier] Proved 1 of 1 total goals
[copilot-verifier] Computing step bisimulation verification conditions
Delay.hs: user error (Simulation aborted!
Abort due to assertion failure:
  results/delay/delay.c:34:6: error: in step
  Failed to load function handle
  Details:
    No implementation or override found for pointer)

I suspect that the culprit is this code in copilotTypeToLLVMType:

https://github.com/GaloisInc/copilot-verifier/blob/3912e8fdf420d088bce4799263d017e1dbffa744/copilot-verifier/src/Copilot/Verifier.hs#L1006-L1007

The CT.Struct case in loop is returning an anonymous L.Struct. This is almost certainly not right, as the LLVM bitcode for this program references %struct.s (i.e., an L.Alias) rather than an anonymous struct. We should update the code to more closely resemble this line in copilotTypeToLLVMTypeCompositePtr, which does this correctly:

https://github.com/GaloisInc/copilot-verifier/blob/3912e8fdf420d088bce4799263d017e1dbffa744/copilot-verifier/src/Copilot/Verifier.hs#L1028-L1029

Display Copilot-related line number information

There are a handful of places in the verifier where we produce line number information:

copilot-verifier/copilot-verifier/src/Copilot/Verifier.hs

Line 521 in d8dc402

loc <- getCurrentProgramLoc sym

copilot-verifier/copilot-verifier/src/Copilot/Verifier.hs

Lines 574 to 577 in d8dc402

    
           -- TODO, would be lovely to be able to do better than dummy positions for all these things 
        
           -- so we can correspond assumptions and asserts back to the parts of the original spec that 
        
           -- gave rise to them. 
        
           dummyLoc = mkProgramLoc "<>" InternalPos

copilot-verifier/copilot-verifier/src/Copilot/Verifier.hs

Line 794 in d8dc402

let loc = mkProgramLoc "<>" InternalPos

None of these currently display any information about the Copilot program being verified. In the best case, we simply print out line information about the C program (obtained from crucible-llvm's getCurrentProgramLoc function, but in the worst case, we print out nothing at all (which is what the silly "<>" strings accomplish).

We should modify Copilot to make it possible to use line number information about the Copilot program within the verifier. Specifically, I'm thinking that we will add CallStack information to some parts of a Copilot Spec:

-- | A Copilot specification is a list of streams, together with monitors on
-- these streams implemented as observers, triggers or properties.
data Spec = Spec
  { specStreams    :: [Stream]
  , specObservers  :: [Observer]
  , specTriggers   :: [Trigger]
  , specProperties :: [Property]
  }

We aren't making use of Observers in our work, so we can ignore those. We are making use of Triggers, Propertys, and Streams however.

Let's start with Triggers, as they are a bit simpler. We should add something like a triggerCallStack :: CallStack field here. The trigger function is the primary means by which Triggers are produced, and In order to produce precise CallStack information for Triggers, we should add a HasCallStack constraint to trigger (and any other functions that produce Triggers).

We can handle Propertys in a similar way. The primary means by which Propety values are produced is the prop function, and we could add a propertyCallStack :: CallStack field to Property here.

Streams are a more interesting example because there are many different ways to construct them. The Copilot.Language.Operators directory in copilot-language contains various Stream-producing functions, including (but not limited to):

All of these will need HasCallStack constraints. But where should we store these? Most likely, we will want to change copilot-language's Stream data type. Stream has data constructors corresponding to all of the possible ways to construct stream expressions, and I suspect that we will need to give each data constructor a CallStack field. Once that is done, we can plumb this information up into the verifier. (I haven't fully thought through the details of how that will work yet, but one step at a time.)

We should do this work in a branch of our fork of copilot for now.

Floating-point `signum` operation fails to verify

If you try to verify an example using signum, e.g.,

{-# LANGUAGE NoImplicitPrelude #-}
module Main where

import Copilot.Compile.C99 (mkDefaultCSettings)
import Copilot.Verifier (verify)
import Language.Copilot

spec :: Spec
spec = do
  let stream :: Stream Double
      stream = extern "stream" Nothing

  trigger "copysign" (signum stream >= signum stream) [arg stream]

main :: IO ()
main = reify spec >>= verify mkDefaultCSettings [] "copysign"

It will eventually fail with:

Proving step bisimulation verification conditions
*** Exception: user error (user error (evalGroundExpr: could not evaluate expression: let uninterpreted_float_le = ??
    uninterpreted_float_abs = ??
    -- :1:0
    v4410 = apply uninterpreted_float_abs cstream_r0@4408:bv
 in apply uninterpreted_float_le v4410 v4410)

`FPOps` example fails when verifying `Float`s instead of `Double`s

If you change Double to Float in the FPOps example like so:

diff --git a/copilot-verifier/examples/Copilot/Verifier/Examples/FPOps.hs b/copilot-verifier/examples/Copilot/Verifier/Examples/FPOps.hs
index e503eb9..e8ba844 100644
--- a/copilot-verifier/examples/Copilot/Verifier/Examples/FPOps.hs
+++ b/copilot-verifier/examples/Copilot/Verifier/Examples/FPOps.hs
@@ -9,7 +9,7 @@ import qualified Prelude as P
 
 spec :: Spec
 spec = do
-  let stream :: Stream Double
+  let stream :: Stream Float
       stream = extern "stream" Nothing
 
   triggerOp1 "abs" abs stream
@@ -43,26 +43,26 @@ spec = do
   triggerOp2 "atan2" Copilot.atan2 stream
 
 triggerOp1 :: String ->
-              (Stream Double -> Stream Double) ->
-              Stream Double ->
+              (Stream Float -> Stream Float) ->
+              Stream Float ->
               Spec
 triggerOp1 name op stream =
   trigger (name P.++ "Trigger") (testOp1 op stream) [arg stream]
 
 triggerOp2 :: String ->
-              (Stream Double -> Stream Double -> Stream Double) ->
-              Stream Double ->
+              (Stream Float -> Stream Float -> Stream Float) ->
+              Stream Float ->
               Spec
 triggerOp2 name op stream =
   trigger (name P.++ "Trigger") (testOp2 op stream) [arg stream]
 
-testOp1 :: (Stream Double -> Stream Double) -> Stream Double -> Stream Bool
+testOp1 :: (Stream Float -> Stream Float) -> Stream Float -> Stream Bool
 testOp1 op stream =
   -- NB: Use (>=) rather than (==) here, as floating-point equality gets weird
   -- due to NaNs.
   op stream >= op stream
 
-testOp2 :: (Stream Double -> Stream Double -> Stream Double) -> Stream Double -> Stream Bool
+testOp2 :: (Stream Float -> Stream Float -> Stream Float) -> Stream Float -> Stream Bool
 testOp2 op stream =
   op stream stream >= op stream stream

Then it will fail to verify:

$ cabal run verify-examples -- FPOps
Up to date
=====
== Running the FPOps example...
=====

Generated "results/fpOps/fpOps.c"
Compiled fpOps into results/fpOps/crux~fpOps.bc
Translated bitcode into Crucible
Generating proof state data
Computing initial state verification conditions
Proving initial state verification conditions
All obligations proved by concrete simplification
Computing step bisimulation verification conditions
Proving step bisimulation verification conditions
verify-examples: user error (user error (evalGroundExpr: could not evaluate expression: let uninterpreted_float_le = ??
    uninterpreted_float_abs = ??
    -- :1:0
    v61 = apply uninterpreted_float_abs cstream_r0@59:bv
 in apply uninterpreted_float_le v61 v61)
)

The error message complains about uninterpreted_float_abs, but the problem is more general than just abs, as each individual trigger in FPOps will fail even if the others are commented out.

Add some examples that use floating-point operations

We should demonstrate the copilot-verifier is capable of handling of specifications that make use of floating-point operations. A good first milestone is the Heater example from the copilot repo (not to be confused with copilot-verifier's own Heater example, which avoids the use of floating-point). At the moment, if you try running copilot-verifier on this Heater example, it will fail to prove these goals:

=====
== Running the HeaterCelsius example...
=====

Generated "results/heaterCelsius/heaterCelsius.c"
Compiled heaterCelsius into results/heaterCelsius/crux~heaterCelsius.bc
Translated bitcode into Crucible
Generating proof state data
Computing initial state verification conditions
Proving initial state verification conditions
All obligations proved by concrete simplification
Computing step bisimulation verification conditions
Proving step bisimulation verification conditions
Proved 0 of 4 total goals
[Crux] Found counterexample for verification goal
[Crux]   results/heaterCelsius/heaterCelsius.c:30:6: error: in step
[Crux]   Mismatch between Copilot value and crucible value
[Crux]   XFloat let -- :1:0
[Crux]       v15 = floatMul RNE (bvToFloat RNE ctemperature_r0@13:bv) 0x0.969697
[Crux]    in floatSub RNE v15 0x32
[Crux]   FloatRepr SingleFloatRepr
[Crux] 
[Crux] 
[Crux] Found counterexample for verification goal
[Crux]   results/heaterCelsius/heaterCelsius.c:33:6: error: in step
[Crux]   Mismatch between Copilot value and crucible value
[Crux]   XFloat let -- :1:0
[Crux]       v15 = floatMul RNE (bvToFloat RNE ctemperature_r0@13:bv) 0x0.969697
[Crux]    in floatSub RNE v15 0x32
[Crux]   FloatRepr SingleFloatRepr
[Crux] 
[Crux] 
[Crux] Found counterexample for verification goal
[Crux]   internal: error: in <>
[Crux]   Trigger guard equality condition: "heaton"
[Crux]   Details:
[Crux]     let -- :1:0
[Crux]         v15 = floatMul RNE (bvToFloat RNE ctemperature_r0@13:bv) 0x0.969697
[Crux]      in not (floatLt (floatSub RNE v15 0x32) 0x12)
[Crux] 
[Crux] Found counterexample for verification goal
[Crux]   internal: error: in <>
[Crux]   Trigger guard equality condition: "heatoff"
[Crux]   Details:
[Crux]     let -- :1:0
[Crux]         v15 = floatMul RNE (bvToFloat RNE ctemperature_r0@13:bv) 0x0.969697
[Crux]      in not (floatLt 0x15 (floatSub RNE v15 0x32))

If we get that working, we can become more ambitious and try to verify examples that use transcendental functions, such as this Trig example. That will first require incorporating GaloisInc/what4#160 into crucible-llvm and overriding the necessary functions in LLVM. I have a WIP branch for this here.

Allow verification of more programs with partial functions

Currently, copilot-verifier is very strict about specifications that contain partial functions. If simulation discovers a model that invokes a partial function on an input for which the function is undefined, it will produce a failing goal, causing verification to fail. In order to avoid this, one must figure out a suitable assumption (using the prop) function that prevents those inputs from being considered. For example, one can use prop "nonzero" (forall (stream /= 0)) to prevent division-by-zero errors for specifications involving the div function, as is outlined in the UnderConstrained example.

The UnderConstrained example currently fails to verify with copilot-verifier by design. There is, however, a way to interpret UnderConstrained as a valid program. Namely, the specification would fail if a stream contained 0:

*** Exception: *** Exception: Copilot error: divide by zero.

The generated C code, will also fail if it encounters a 0, as it will trigger undefined behavior. The behavior of the Copilot specification the C program therefore coincide: on defined inputs, the programs behave identically, and on undefined inputs, the programs misbehave (for some definition of "misbehave").

It would be convenient (perhaps as a user-configurable option) to allow verifying programs up to misbehavior in this way. Some open questions:

Should copilot-verifier check that two programs are "crash-equivalent"? That is, should copilot-verifier check that the Copilot spec and the C program produce the same kind of error, or would it suffice to check that they both fail in some way or another?
We have crucible-llvm to check if the C program misbehaves, but how should we check if the Copilot spec behaves? I obtained the Copilot error: divide by zero error above by interpreting a specification at a particular input, but we would need a way to do this at a Crucible level.

Fix hyperlinks in `copilot-verifier` Hackage page

There is a minor glitch in the way that copilot-verifier.cabal renders its hyperlinks, as seen in this screenshot:

The most likely culprit is that Haddock cannot handle <https://url.com URL caption>-style links that span over multiple lines, which is what copilot-verifier.cabal currently does:

copilot-verifier/copilot-verifier/copilot-verifier.cabal

Lines 12 to 22 in 82751b4

    
           Description: 
        
             @copilot-verifier@ is an add-on to the <https://copilot-language.github.io 
        
             Copilot Stream DSL> for verifying the correctness of C code generated by the 
        
             @copilot-c99@ package. 
        
             . 
        
             @copilot-verifier@ uses the <https://github.com/galoisinc/crucible Crucible 
        
             symbolic simulator> to interpret the semantics of the generated C program and 
        
             and to produce verification conditions sufficient to guarantee that the 
        
             meaning of the generated program corresponds in a precise way to the meaning 
        
             of the original stream specification. The generated verification conditions 
        
             are then dispatched to SMT solvers to be automatically solved.

A more robust alternative would be to use markdown-style links, e.g., [URL caption](https://url.com). These are properly supported even when URL caption spans multiple lines.

Support arrays

We currently have some FIXMEs involving arrays that we should address:

`LLVMParseError` on LLVM 14.0.0

I was trying out some Copilot-Verifier stuff and encountered the following exception when I tried to run the heater example:

Build profile: -w ghc-8.8.4 -O1
In order, the following will be built (use -v for more details):
 - Copilot-Tester-0.0 (exe:Heater) (file Heater.hs changed)
Preprocessing executable 'Heater' for Copilot-Tester-0.0..
Building executable 'Heater' for Copilot-Tester-0.0..
[1 of 1] Compiling Main             ( Heater.hs, [snip]/Copilot-Tester-0.0/x/Heater/build/Heater/Heater-tmp/Main.o )

Heater.hs:18:49: warning: [-Wtype-defaults]
    • Defaulting the following constraints to type ‘Integer’
        (Integral a0)
          arising from a use of ‘fromIntegral’ at Heater.hs:18:49-62
        (Num a0) arising from the literal ‘5’ at Heater.hs:18:62
    • In the second argument of ‘(/)’, namely ‘fromIntegral 5’
      In the expression:
        (sum 5 (replicate 5 19.5 ++ ctemp)) / fromIntegral 5
      In an equation for ‘avgTemp’:
          avgTemp = (sum 5 (replicate 5 19.5 ++ ctemp)) / fromIntegral 5
   |
18 | avgTemp = (sum 5 (replicate 5 19.5 ++ ctemp)) / fromIntegral 5
   |                                                 ^^^^^^^^^^^^^^
Linking [snip]/Copilot-Tester-0.0/x/Heater/build/Heater/Heater ...
[copilot-verifier] Generated "results/heater/heater.c"
[copilot-verifier] Compiled heater into results/heater/crux~heater.bc
Heater: LLVMParseError (Error {errContext = ["METADATA_GLOBAL_VAR","METADATA_BLOCK","METADATA_BLOCK_ID","value symbol table","MODULE_BLOCK","Bitstream"], errMessage = "Invalid record size: 13\nExpected one of: [11,12]\nAre you sure you're using a supported version of LLVM/Clang?\nCheck here: https://github.com/GaloisInc/llvm-pretty-bc-parser\n"})

Here's the software versions I'm running:

$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 8.8.4

$ cabal --version
cabal-install version 3.0.0.0
compiled using version 3.0.1.0 of the Cabal library 

$ llvm-link --version
Ubuntu LLVM version 14.0.0
  
  Optimized build.
  Default target: x86_64-pc-linux-gnu
  Host CPU: skylake

$ clang --version
Ubuntu clang version 14.0.0-1ubuntu1.1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

And here is the source code:

module Main where

{-# LANGUAGE RebindableSyntax #-}

import Language.Copilot
import Copilot.Compile.C99
import Copilot.Verifier

import qualified Prelude as P ()

temp :: Stream Word8
temp = extern "temperature" Nothing

ctemp :: Stream Float
ctemp = (unsafeCast temp) * constant (150.0 / 255.0) - constant 50.0

avgTemp :: Stream Float
avgTemp = (sum 5 (replicate 5 19.5 ++ ctemp)) / fromIntegral 5

spec :: Spec
spec = do
  trigger "heaton"  (avgTemp < constant 18.0) [arg avgTemp]
  trigger "heatoff" (avgTemp > constant 21.0) [arg avgTemp]

main :: IO ()
main = do
  spec' <- reify spec
  compile "heater" spec'
  verify mkDefaultCSettings [] "heater" spec'

Clang 14 appears to pass randomized tests, according to this repository's README. Downgrading to Clang/LLVM 11.1.0 solves the problem:

Resolving dependencies...
Build profile: -w ghc-8.8.4 -O1
In order, the following will be built (use -v for more details):
 - Copilot-Tester-0.0 (exe:Heater) (first run)
Configuring executable 'Heater' for Copilot-Tester-0.0..
Preprocessing executable 'Heater' for Copilot-Tester-0.0..
Building executable 'Heater' for Copilot-Tester-0.0..
[1 of 1] Compiling Main             ( Heater.hs, [snip]/Copilot-Tester-0.0/x/Heater/build/Heater/Heater-tmp/Main.o )

Heater.hs:18:49: warning: [-Wtype-defaults]
    • Defaulting the following constraints to type ‘Integer’
        (Integral a0)
          arising from a use of ‘fromIntegral’ at Heater.hs:18:49-62
        (Num a0) arising from the literal ‘5’ at Heater.hs:18:62
    • In the second argument of ‘(/)’, namely ‘fromIntegral 5’
      In the expression:
        (sum 5 (replicate 5 19.5 ++ ctemp)) / fromIntegral 5
      In an equation for ‘avgTemp’:
          avgTemp = (sum 5 (replicate 5 19.5 ++ ctemp)) / fromIntegral 5
   |
18 | avgTemp = (sum 5 (replicate 5 19.5 ++ ctemp)) / fromIntegral 5
   |                                                 ^^^^^^^^^^^^^^
Linking [snip]/Copilot-Tester-0.0/x/Heater/build/Heater/Heater ...
[copilot-verifier] Generated "results/heater/heater.c"
[copilot-verifier] Compiled heater into results/heater/crux~heater.bc
[copilot-verifier] Translated bitcode into Crucible
[copilot-verifier] Generating proof state data
[copilot-verifier] Computing initial state verification conditions
[copilot-verifier] Proving initial state verification conditions
[copilot-verifier] Proved 5 of 5 total goals
[copilot-verifier] Computing step bisimulation verification conditions
[copilot-verifier] Proving step bisimulation verification conditions
[copilot-verifier] Proved 44 of 44 total goals

I am running on Linux Mint 21.2, if that helps. Thank you!

Produce explanations of why a verified program is correct

Currently, programs that copilot-verifier marks as correct are indicated by a lack of error messages being printed. This, however, doesn't give a sense to the user as to why a program is correct. We should investigate what it would take to equip copilot-verifier with the ability to produce human-readable explanations of what the verifier did in the course of proving a program is correct.

In very broad strokes, I imagine this would require taking ahold of the ProvedGoals that Crux returns after simulation and analyzing them to determine what explanations to produce. There are likely some finer details to work around, however.

Improve error reporting

The error messages that copilot-verifier gives when unable to verify a program can be rather difficult to understand by non-experts. Here is one example observed in #5:

$ cabal run verify-examples -- WCV
Up to date
=====
== Running the WCV example...
=====

Generated "results/wcv/wcv.c"
Compiled wcv into results/wcv/crux~wcv.bc
Translated bitcode into Crucible
Generating proof state data
Computing initial state verification conditions
Proving initial state verification conditions
All obligations proved by concrete simplification
Computing step bisimulation verification conditions
Proving step bisimulation verification conditions
verify-examples: user error (user error (evalGroundExpr: could not evaluate expression: let uninterpreted_float_lt = ??
    uninterpreted_sbv_to_float = ??
    uninterpreted_float_to_sbv = ??
    -- results/wcv/wcv.c:21:208
    v366 = apply uninterpreted_float_to_sbv 0 crelative_velocity_x_r0@58:bv
    -- results/wcv/wcv.c:21:202
    v373 = ite (bvSlt v366 0x0:[32]) (bvSum (bvMul 0xffffffff:[32] v366)) v366
    -- results/wcv/wcv.c:21:201
    v374 = apply uninterpreted_sbv_to_float 0 v373
 in apply uninterpreted_float_lt v374 0x3f50624dd2f1a9fc:[64])
)

This was fixed by picking a different translation of abs, but it seems quite likely that there are other similarly perplexing error-message scenarios awaiting us. We should:

Identify situations in which copilot-verifier produces dense error messages, and
Improve the reporting of these errors so that they are more comprehensible by users who aren't experts in What4/Crucible.

More complete test coverage

Currently, the copilot-verifier test suite only tests a small handful of examples that are mostly cribbed from the copilot repo itself. There are various code paths that are not yet well tested, including:

Programs that should be expected to fail to verify
Programs which exercise the various operations that copilot-c99 can generate (e.g., floating-point operations)

We should add test cases for these.

Add CI status and Hackage badges to README

Similarly to how the Copilot repo does it here.

Minor soundness issue with trigger functions

While reviewing how the verifier works in preparation for writing the paper, I noticed an issue with how we verify trigger functions. Currently, we set up a crucible global variable for each trigger function; it starts set to false and is set to true when/if the trigger function is called. At the end, we assert that the value of this global variable is equal to the trigger condition.

However, this isn't quite right, as we actually want to ensure that the trigger function is called exactly once when the guard is true, whereas the current setup only verifies that the function is called at least once. We should change the global variables to be integers, increment them in the body of the trigger, and verify at the end that the count is equal to if guard_cond then 1 else 0.

Support structs

We currently have some FIXMEs involving structs that we should address:

Here is a minimal Copilot example involving structs to use as a test case:

{-# LANGUAGE DataKinds #-}
module Main where

import Language.Copilot
import Copilot.Compile.C99
import Copilot.Verifier (verify)

newtype Battery = Battery
  { temp :: Field "temp" Word16
  }

instance Struct Battery where
  typename _ = "battery"
  toValues battery = [Value typeOf (temp battery)]

instance Typed Battery where
  typeOf = Struct (Battery (Field 0))

spec :: Spec
spec = do
  let battery :: Stream Battery
      battery = extern "battery" Nothing

  trigger "testfun" true [arg battery]

main :: IO ()
main = reify spec >>= verify mkDefaultCSettings [] "structs"

Before we can even get to the FIXMEs above, however, there is a more fundamental issue: LLVM compiles structs in somewhat surprising ways. Consider this C program with three different structs:

struct s1 {
  int x1;
  double y1;
};

struct s2 {
  int x2;
  double y2;
  char z2;
};

struct s3 {
  int x3;
  int y3;
};

void f1(struct s1 ss) {}
void f2(struct s2 ss) {}
void f3(struct s3 ss) {}

Although f1, f2, and f3 all look quite similar, they are compiled very differently. Here is the bitcode that results from compiling this program:

%struct.s2 = type { i32, double, i8 }

; Function Attrs: norecurse nounwind readnone uwtable
define dso_local void @f1(i32 %ss.coerce0, double %ss.coerce1) local_unnamed_addr #0 {
entry:
  ret void
}

; Function Attrs: norecurse nounwind readnone uwtable
define dso_local void @f2(%struct.s2* nocapture byval(%struct.s2) align 8 %ss) local_unnamed_addr #0 {
entry:
  ret void
}

; Function Attrs: norecurse nounwind readnone uwtable
define dso_local void @f3(i64 %ss.coerce) local_unnamed_addr #0 {
entry:
  ret void
}

Three somewhat surprising things to note:

The bitcode for f1 takes two arguments rather than a single %struct.s1 argument. In fact, there is no %struct.s1 type at all in the bitcode! Instead, LLVM effectively unpacks the x1 and y2 fields into f1's arguments.
The bitcode for f2 has a %struct.s2* argument—a pointer type—rather than a %struct.s2 argument. That is, it passes its argument by reference, not by value.
The bitcode for f3 takes an i64 as an argument rather than a %struct.s3. Again, there is no %struct.s3 argument at all in the bitcode. Instead, LLVM combines the x3 and y3 fields into a single argument.

These oddities are all explained by the System V ABI (see section 3.2.3 of this document), which has very particular requirements for structs that are passed by value. There are similar oddities that arise when returning structs, such as in these examples:

struct s1 g1() {
  struct s1 ss = { .x1 = 0, .y1 = 0 };
  return ss;
}
struct s2 g2() {
  struct s2 ss = { .x2 = 0, .y2 = 0, .z2 = 0 };
  return ss;
}
struct s3 g3() {
  struct s3 ss = { .x3 = 0, .y3 = 0 };
  return ss;
}

Which give rise to this LLVM bitcode:

; Function Attrs: norecurse nounwind readnone uwtable
define dso_local { i32, double } @g1() local_unnamed_addr #0 {
entry:
  ret { i32, double } zeroinitializer
}

; Function Attrs: argmemonly nounwind willreturn
declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg) #1

; Function Attrs: nounwind uwtable
define dso_local void @g2(%struct.s2* noalias nocapture sret %agg.result) local_unnamed_addr #2 {
entry:
  %0 = bitcast %struct.s2* %agg.result to i8*
  tail call void @llvm.memset.p0i8.i64(i8* nonnull align 8 dereferenceable(24) %0, i8 0, i64 24, i1 false)
  ret void
}

; Function Attrs: norecurse nounwind readnone uwtable
define dso_local i64 @g3() local_unnamed_addr #0 {
entry:
  ret i64 0
}

g1 and g3's treatment closely mirrors that of f1 and f3. g2 is perhaps the strangest of all, as it takes the struct s2 return type and converts it to an argument!

All of these quirks will make copilot-verifier's job more challenging, as it has to map the Copilot types, which closely correspond to the C source language, to the types that they are compiled to in the LLVM target language. This means that we will either need to:

Implement all of the corner cases of the System V ABI involving structs in copilot-verifier, or
Change copilot-c99's codegen such that it passes and returns structs by reference, not by name, in the functions that copilot-verifier overrides (e.g., the trigger functions).

Option (2) certainly sounds less tricky, although it would require coordination with the upstream copilot repo.

Mention ICFP paper in the README

Mention this somewhere: https://dl.acm.org/doi/abs/10.1145/3607841

Release 3.19 to Hackage

Now that copilot-3.19 and friends have been released, we should sync their versions with what is in copilot-verifier.cabal and release it to Hackage.

Make it clearer which equality-related proof goals happen before and after calling `step()`

The bisimulation step of the verifier logs 2 * n goals, where n is the total amount of storage needed for all ring buffers. This is because the verifier must check that the values are equal before calling step() as well as after calling step(). The verifier's Noisy logging doesn't make this obvious, however, as all 2 * n proof goals are simply tagged with asserting the equality between two stream values. We should augment this description to make it more obvious which goals come before or after the step() function.

	do -- TODO, should get these from somewhere inside copilot instead of building these names directly
	let idxName = "s" ++ show nm ++ "_idx"
	let bufName = "s" ++ show nm
	let buflen = genericLength vs :: Integer

	-- TODO, would be lovely to be able to do better than dummy positions for all these things
	-- so we can correspond assumptions and asserts back to the parts of the original spec that
	-- gave rise to them.
	dummyLoc = mkProgramLoc "<>" InternalPos

	Description:
	@copilot-verifier@ is an add-on to the <https://copilot-language.github.io
	Copilot Stream DSL> for verifying the correctness of C code generated by the
	@copilot-c99@ package.
	.
	@copilot-verifier@ uses the <https://github.com/galoisinc/crucible Crucible
	symbolic simulator> to interpret the semantics of the generated C program and
	and to produce verification conditions sufficient to guarantee that the
	meaning of the generated program corresponds in a precise way to the meaning
	of the original stream specification. The generated verification conditions
	are then dispatched to SMT solvers to be automatically solved.