<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://juliahimmel.de/feed.xml" rel="self" type="application/atom+xml" /><link href="https://juliahimmel.de/" rel="alternate" type="text/html" /><updated>2026-04-03T19:30:25+00:00</updated><id>https://juliahimmel.de/feed.xml</id><title type="html">Julia Markus Himmel</title><subtitle>My name is Julia Markus Himmel and this is my blog where I will be talking mostly about formal verification, in particular in the programming language Lean.</subtitle><author><name>Julia Markus Himmel</name></author><entry><title type="html">My first verified (imperative) program</title><link href="https://juliahimmel.de/blog/my-first-verified-imperative-program/" rel="alternate" type="text/html" title="My first verified (imperative) program" /><published>2025-07-06T11:00:00+00:00</published><updated>2025-07-06T11:00:00+00:00</updated><id>https://juliahimmel.de/blog/my-first-verified-imperative-program</id><content type="html" xml:base="https://juliahimmel.de/blog/my-first-verified-imperative-program/"><![CDATA[<p><strong>Important note:</strong> This post is out of date! It describes an old, early
version of the <code class="language-plaintext highlighter-rouge">mvcgen</code> tactic. As of Lean 4.25.0, the syntax has changed a bit
(for the better) and the system has become much more convenient to use. To
learn more about <code class="language-plaintext highlighter-rouge">mvcgen</code> as it is released today, I recommend the
<a href="https://lean-lang.org/doc/reference/latest/The--mvcgen--tactic/#mvcgen-tactic">official introduction</a>
that is part of the Lean reference manual. My reasoning why I find all of this
very cool hasn’t changed, of course, so if you’re interested in that, read on.</p>

<p>One of the many exciting new features in the upcoming Lean 4.22 release is a
preview of the new verification infrastructure for proving properties of imperative
programs. In this post, I’ll take a first look at this feature, show a simple
example of what it can do, and compare it to similar tools.</p>

<h2 id="guiding-example">Guiding example</h2>

<p>We will use the following simple programming task as an example throughout the
post: given a list of integers, determine if there are two integers at distinct
positions in the list that sum to zero.</p>

<p>For example, given the list <code class="language-plaintext highlighter-rouge">[1, 0, 2, -1]</code>, the result should be <code class="language-plaintext highlighter-rouge">true</code>, because
\(1 + (-1) = 0\), and given the list <code class="language-plaintext highlighter-rouge">[0, 0]</code>, the result should also be <code class="language-plaintext highlighter-rouge">true</code>,
but given the list <code class="language-plaintext highlighter-rouge">[1, 0, -2]</code>, the result should be <code class="language-plaintext highlighter-rouge">false</code>.</p>

<p>The simplest way to solve this is to use two nested loops to iterate over all
pairs of distinct positions. This takes quadratic time, which is inefficient. There are
several ways to improve upon this. Here is the one we will use: iterate over the
list, and keep all elements we have seen so far in a set data structure. When
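encountering a number \(x\), efficiently check whether we have already seen \(-x\). If
so, the answer is <code class="language-plaintext highlighter-rouge">true</code>. If we reach the end of the list without finding such a pair, the answer is <code class="language-plaintext highlighter-rouge">false</code>.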
This takes expected time \(O(n)\) when using a hash set, or worst-case time \(O(n \log n)\)
when using a binary search tree. In Lean, both are available, under the names
<code class="language-plaintext highlighter-rouge">Std.HashSet</code> and <code class="language-plaintext highlighter-rouge">Std.TreeSet</code>, respectively.</p>
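
<p>For comparison, the quadratic approach can be sketched in a few lines of plain Lean. This is only an illustration: the name <code class="language-plaintext highlighter-rouge">pairsSumToZeroNaive</code> is made up here and does not appear in the rest of the post. Instead of two index-based loops, it pairs each element with every later element, which covers all pairs of distinct positions because addition is commutative.</p>

<div class="language-lean highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/-- Quadratic baseline (hypothetical helper): pair the head with every later
element, then recurse on the tail. -/
def pairsSumToZeroNaive : List Int → Bool
  | [] =&gt; false
  | x :: xs =&gt; xs.any (fun y =&gt; x + y == 0) || pairsSumToZeroNaive xs

-- The three examples from the text.
#eval pairsSumToZeroNaive [1, 0, 2, -1]  -- true
#eval pairsSumToZeroNaive [0, 0]         -- true
#eval pairsSumToZeroNaive [1, 0, -2]     -- false
</code></pre></div></div>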

<h2 id="local-imperativity">Local imperativity</h2>

<p>Lean is a functional programming language, but it has good support for imperative
(stateful) programming both locally within a single function (via <code class="language-plaintext highlighter-rouge">do</code> notation)
and across functions (via monad transformers). In this post, we will use local
imperativity only.</p>

<p>Using local imperativity, it is easy to write down the set-based algorithm described
above:</p>

<div class="language-lean highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="n">pairsSumToZero</span> (<span class="n">l</span> : <span class="n">List</span> <span class="n">Int</span>) : <span class="n">Id</span> <span class="n">Bool</span> := <span class="n">do</span>
  <span class="n">let</span> <span class="n">mut</span> <span class="n">seen</span> : <span class="n">HashSet</span> <span class="n">Int</span> := <span class="err">∅</span>

  <span class="n">for</span> <span class="n">x</span> <span class="n">in</span> <span class="n">l</span> <span class="n">do</span>
    <span class="n">if</span> <span class="o">-</span><span class="n">x</span> <span class="err">∈</span> <span class="n">seen</span> <span class="n">then</span>
      <span class="n">return</span> <span class="n">true</span>
    <span class="n">seen</span> := <span class="n">seen</span><span class="o">.</span><span class="n">insert</span> <span class="n">x</span>

  <span class="n">return</span> <span class="n">false</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">Id</code> and <code class="language-plaintext highlighter-rouge">do</code> in the first line of the code tell Lean that we would like to
work in “locally imperative” mode. Then we have access to a Python-like syntax
with the usual affordances of imperative programming, such as mutable state,
<code class="language-plaintext highlighter-rouge">for</code> loops and early returns<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
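
<p>To try the function on the examples from the beginning of the post, something like the following self-contained sketch should work. The <code class="language-plaintext highlighter-rouge">import</code> and <code class="language-plaintext highlighter-rouge">open</code> lines are my assumption about what the snippet above needs, since they are not shown there; <code class="language-plaintext highlighter-rouge">Id.run</code> extracts the plain <code class="language-plaintext highlighter-rouge">Bool</code> from the <code class="language-plaintext highlighter-rouge">Id</code> monad.</p>

<div class="language-lean highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Assumed prelude: makes `HashSet` available under its short name.
import Std.Data.HashSet
open Std

def pairsSumToZero (l : List Int) : Id Bool := do
  let mut seen : HashSet Int := ∅

  for x in l do
    if -x ∈ seen then
      return true
    seen := seen.insert x

  return false

-- Run the locally imperative program and print the resulting `Bool`.
#eval Id.run (pairsSumToZero [1, 0, 2, -1])  -- true
#eval Id.run (pairsSumToZero [0, 0])         -- true
#eval Id.run (pairsSumToZero [1, 0, -2])     -- false
</code></pre></div></div>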

<h2 id="proving-properties-of-locally-imperative-programs">Proving properties of locally imperative programs</h2>

<p>Local imperativity is very useful when writing programs, and indeed much of Lean
itself is implemented in Lean using this style. However, Lean is not just a
programming language, but also an interactive theorem prover, and one of the
core features of Lean is that you can prove that your programs are correct.</p>

<p>Traditionally, proving properties about locally imperative programs was difficult
except in very simple cases, so if you were interested in proofs, it was usually
easiest to write your programs in a functional style, similarly to how you would
do it in a language like Haskell.</p>

<p>Lean 4.22 previews a new framework, called <code class="language-plaintext highlighter-rouge">Std.Do</code> after the place where it
lives in the Lean standard library, that aims to make verified imperative programming
(both local and global) easy.</p>

<p>The main thing that is still missing is documentation (and this post will not
change that in any meaningful way), but with a bit of digging we can already do
some initial experiments.</p>

<p>The foundation of <code class="language-plaintext highlighter-rouge">Std.Do</code> is given by the classic idea of <em>Hoare triples</em>. This
means that assertions about imperative programs are always of the form
“if \(P\) holds before I run the command \(C\), then \(Q\) holds afterwards”. For example,
if an integer variable is at least \(1\), and I decrement it, then afterwards the variable
is at least \(0\).</p>
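
<p>In the traditional notation, the decrement example is the triple \(\{x \ge 1\}\; x := x - 1\; \{x \ge 0\}\). As a rough sketch that deliberately does not use the <code class="language-plaintext highlighter-rouge">Std.Do</code> API, the purely logical fact that this triple boils down to can be stated and checked in plain Lean, where <code class="language-plaintext highlighter-rouge">x</code> stands for the value of the variable before the decrement:</p>

<div class="language-lean highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Precondition: `x ≥ 1`. Command: replace `x` by `x - 1`.
-- Postcondition: the new value `x - 1` is at least `0`.
example (x : Int) (h : x ≥ 1) : x - 1 ≥ 0 := by
  omega
</code></pre></div></div>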

<p>The nice thing about Hoare triples is that they are composable. A large program
will be composed of many small functions that might operate on global state or
have other side effects, and Hoare triples allow stating properties that can
easily be reused when proving properties of larger programs using smaller programs.
Since our example only consists of a single function, this part isn’t important
for our example, but it hints at the generality of <code class="language-plaintext highlighter-rouge">Std.Do</code> which I might explore
in a future post.</p>

<p>As Lean is an interactive system, the walkthrough that follows is easiest to follow
by having Lean open. Click <a href="https://live.lean-lang.org/#url=https%3A%2F%2Fgist.githubusercontent.com%2FTwoFX%2Fbcdb6202fa8d8024b6a766a4d9df3f30%2Fraw%2Fe9263dddd43e868985614f689456a0adea50a3ee%2Fimperative.lean">here</a>
to open the online Lean playground pre-filled
with the proof. You can place your cursor inside the various places in the proof
to see what Lean has to say at that point.</p>

<p>The Lean syntax for Hoare triples is <code class="language-plaintext highlighter-rouge">⦃Precondition⦄ Command ⦃Postcondition⦄</code>. Using
this, let’s state the correctness property of our <code class="language-plaintext highlighter-rouge">pairsSumToZero</code> function:</p>

<div class="language-lean highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">theorem</span> <span class="n">pairsSumToZero_spec</span> (<span class="n">l</span> : <span class="n">List</span> <span class="n">Int</span>) :
    <span class="err">⦃⌜</span><span class="n">True</span><span class="err">⌝⦄</span> <span class="n">pairsSumToZero</span> <span class="n">l</span> <span class="err">⦃⇓</span><span class="n">r</span> <span class="o">=&gt;</span> <span class="n">r</span> <span class="o">=</span> <span class="n">true</span> <span class="o">↔</span> <span class="n">l</span><span class="o">.</span><span class="n">ExistsPair</span> (<span class="k">fun</span> <span class="n">a</span> <span class="n">b</span> <span class="o">=&gt;</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="o">=</span> <span class="mi">0</span>)<span class="err">⦄</span>
</code></pre></div></div>

<p>In our case, there are no preconditions, so we use the always-true proposition <code class="language-plaintext highlighter-rouge">True</code>
as the precondition. The postcondition reads as “the function returns <code class="language-plaintext highlighter-rouge">true</code> if
and only if there is a pair of distinct positions in <code class="language-plaintext highlighter-rouge">l</code> such that the corresponding
values sum to \(0\)”.</p>
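
<p>The definition of <code class="language-plaintext highlighter-rouge">List.ExistsPair</code> is not shown in this post. One plausible definition that matches the prose reading above is sketched below; it is an assumption made for illustration and may differ from the definition in the playground file.</p>

<div class="language-lean highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Assumed definition, not taken from the post: `P` holds for the values at two
-- distinct valid positions `i` and `j` of `l`.
def List.ExistsPair {α : Type} (P : α → α → Prop) (l : List α) : Prop :=
  ∃ (i j : Nat) (hi : i &lt; l.length) (hj : j &lt; l.length), i ≠ j ∧ P (l[i]'hi) (l[j]'hj)
</code></pre></div></div>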

<p>Now, Lean is an interactive theorem prover, so it expects us to tell it why this
Hoare triple is in fact true. To do this, <code class="language-plaintext highlighter-rouge">Std.Do</code> provides a piece of proof automation
called <code class="language-plaintext highlighter-rouge">mvcgen</code> (for “monadic verification condition generator”) which analyzes locally
imperative programs and tells us what we need to do to prove the triple. After
starting the proof of <code class="language-plaintext highlighter-rouge">pairsSumToZero_spec</code>, we can invoke</p>

<div class="language-lean highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mvcgen</span> [<span class="n">pairsSumToZero</span>] <span class="cd">--  generate verification conditions for the imperative code.</span>
</code></pre></div></div>

<p>Lean then tells us that as a next step it wants us to provide a <em>loop invariant</em> for
the <code class="language-plaintext highlighter-rouge">for</code> loop in our code. This is a property that is true at the beginning of the
loop and is preserved by each loop iteration. Loop invariants are how we can deduce that something
is true after we exit the loop.</p>

<p>In our case, the control flow is slightly more complicated than the trivial examples
that you usually see for loop invariants, because our loop has an early return which we have to consider.
Here is the correct loop invariant:</p>

<blockquote>
  <p><em>Either</em> we have not taken the early return yet, and <code class="language-plaintext highlighter-rouge">seen</code> contains exactly those
elements which are present in the prefix of the list we have traversed so far,
and the prefix of the list we have traversed so far does not contain two elements
that sum to zero,</p>

  <p><em>or</em> we have taken the early return, and the list contains two elements which sum to zero.</p>
</blockquote>

<p>Translating this to Lean in the form that <code class="language-plaintext highlighter-rouge">Std.Do</code> expects is a bit difficult without
documentation, but here is how it looks when done correctly:</p>

<div class="language-lean highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">case</span> <span class="n">inv</span> <span class="o">=&gt;</span>
  <span class="n">exact</span> (<span class="k">fun</span> (<span class="o">⟨</span><span class="n">earlyReturn</span><span class="err">?</span>, <span class="n">seen</span><span class="o">⟩</span>, <span class="n">traversalState</span>) <span class="o">=&gt;</span>
    (<span class="n">earlyReturn</span><span class="err">?</span> <span class="o">=</span> <span class="n">none</span> <span class="o">∧</span> (<span class="o">∀</span> <span class="n">x</span>, <span class="n">x</span> <span class="err">∈</span> <span class="n">seen</span> <span class="o">↔</span> <span class="n">x</span> <span class="err">∈</span> <span class="n">traversalState</span><span class="o">.</span><span class="n">rpref</span>) <span class="o">∧</span> <span class="o">¬</span><span class="n">traversalState</span><span class="o">.</span><span class="n">pref</span><span class="o">.</span><span class="n">ExistsPair</span> (<span class="k">fun</span> <span class="n">a</span> <span class="n">b</span> <span class="o">=&gt;</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="o">=</span> <span class="mi">0</span>)) <span class="o">∨</span>
    (<span class="n">earlyReturn</span><span class="err">?</span> <span class="o">=</span> <span class="n">some</span> <span class="n">true</span> <span class="o">∧</span> <span class="n">l</span><span class="o">.</span><span class="n">ExistsPair</span> (<span class="k">fun</span> <span class="n">a</span> <span class="n">b</span> <span class="o">=&gt;</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="o">=</span> <span class="mi">0</span>) <span class="o">∧</span> <span class="n">l</span> <span class="o">=</span> <span class="n">traversalState</span><span class="o">.</span><span class="n">pref</span>), ())
</code></pre></div></div>

<p>Here <code class="language-plaintext highlighter-rouge">earlyReturn?</code> is an optional value that is <code class="language-plaintext highlighter-rouge">none</code> as long as we have not returned early and otherwise holds the value we returned,
<code class="language-plaintext highlighter-rouge">seen</code> is the mutable variable <code class="language-plaintext highlighter-rouge">seen</code> from our program,
and <code class="language-plaintext highlighter-rouge">traversalState</code> contains information about where we are in the list. In
particular, <code class="language-plaintext highlighter-rouge">traversalState.pref</code> contains the prefix of the list that we have
already traversed, and <code class="language-plaintext highlighter-rouge">traversalState.rpref</code> is the reverse of that (which is
sometimes easier to reason about).</p>

<p>The third line is a fairly literal translation of the first case described in
prose above, and the fourth line is a translation of the second case, with the
slightly technical condition <code class="language-plaintext highlighter-rouge">l = traversalState.pref</code> thrown in, which asserts
that if we have taken the early return, we will not traverse the list any
further.</p>

<p>Now that we have provided the loop invariant, Lean tells us that we must prove
five things:</p>

<ul>
  <li>If the loop invariant holds and we take the early return, the loop invariant still holds;</li>
  <li>if the loop invariant holds and we do not take the early return, the loop invariant still holds;</li>
  <li>the loop invariant is satisfied before we enter the loop;</li>
  <li>if we took the early return, the loop invariant implies the claimed property; and finally</li>
  <li>if we did not take the early return, the loop invariant implies the claimed property.</li>
</ul>

<p>Now, to an experienced Lean user, proving these five things is not difficult, but
it is a bit tedious, because all of these are pretty obvious. Luckily, this is
where another big new feature from Lean 4.22 enters the picture: the <code class="language-plaintext highlighter-rouge">grind</code> tactic.
This is a new bit of proof automation which is able to make short work of many
“obvious” proofs like ours<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. This means that to dispatch the five proof obligations
above, it suffices to say</p>

<div class="language-lean highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">all_goals</span> <span class="n">simp_all</span> <span class="o">&lt;;&gt;</span> <span class="n">grind</span>
</code></pre></div></div>

<p>and Lean tells us <code class="language-plaintext highlighter-rouge">Goals accomplished!</code> to confirm that the proof is complete.
Behind the scenes, Lean has performed a detailed analysis of all cases, referring
to existing library results (for example that after inserting a new element into
a hash set, an element is contained if and only if it is equal to the new element
or was already contained in the original hash set) as appropriate.</p>

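<p>For reference, here is the program together with its proof (the required imports, <code class="language-plaintext highlighter-rouge">open</code> declarations and the definition of <code class="language-plaintext highlighter-rouge">List.ExistsPair</code> are not part of this listing):</p>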

<div class="language-lean highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">/-!</span>
<span class="o">#</span> <span class="n">Imperative</span> <span class="n">implementation</span>
<span class="o">-/</span>

<span class="k">def</span> <span class="n">pairsSumToZero</span> (<span class="n">l</span> : <span class="n">List</span> <span class="n">Int</span>) : <span class="n">Id</span> <span class="n">Bool</span> := <span class="n">do</span>
  <span class="n">let</span> <span class="n">mut</span> <span class="n">seen</span> : <span class="n">HashSet</span> <span class="n">Int</span> := <span class="err">∅</span>

  <span class="n">for</span> <span class="n">x</span> <span class="n">in</span> <span class="n">l</span> <span class="n">do</span>
    <span class="n">if</span> <span class="o">-</span><span class="n">x</span> <span class="err">∈</span> <span class="n">seen</span> <span class="n">then</span>
      <span class="n">return</span> <span class="n">true</span>
    <span class="n">seen</span> := <span class="n">seen</span><span class="o">.</span><span class="n">insert</span> <span class="n">x</span>

  <span class="n">return</span> <span class="n">false</span>

<span class="o">/-!</span>
<span class="o">#</span> <span class="n">Verification</span> <span class="n">of</span> <span class="n">imperative</span> <span class="n">implementation</span>
<span class="o">-/</span>

<span class="k">theorem</span> <span class="n">pairsSumToZero_spec</span> (<span class="n">l</span> : <span class="n">List</span> <span class="n">Int</span>) :
    <span class="err">⦃⌜</span><span class="n">True</span><span class="err">⌝⦄</span> <span class="n">pairsSumToZero</span> <span class="n">l</span> <span class="err">⦃⇓</span><span class="n">r</span> <span class="o">=&gt;</span> <span class="n">r</span> <span class="o">=</span> <span class="n">true</span> <span class="o">↔</span> <span class="n">l</span><span class="o">.</span><span class="n">ExistsPair</span> (<span class="k">fun</span> <span class="n">a</span> <span class="n">b</span> <span class="o">=&gt;</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="o">=</span> <span class="mi">0</span>)<span class="err">⦄</span> := <span class="k">by</span>
  <span class="n">mvcgen</span> [<span class="n">pairsSumToZero</span>]

  <span class="n">case</span> <span class="n">inv</span> <span class="o">=&gt;</span>
    <span class="n">exact</span> (<span class="k">fun</span> (<span class="o">⟨</span><span class="n">earlyReturn</span><span class="err">?</span>, <span class="n">seen</span><span class="o">⟩</span>, <span class="n">traversalState</span>) <span class="o">=&gt;</span>
      (<span class="n">earlyReturn</span><span class="err">?</span> <span class="o">=</span> <span class="n">none</span> <span class="o">∧</span> (<span class="o">∀</span> <span class="n">x</span>, <span class="n">x</span> <span class="err">∈</span> <span class="n">seen</span> <span class="o">↔</span> <span class="n">x</span> <span class="err">∈</span> <span class="n">traversalState</span><span class="o">.</span><span class="n">rpref</span>) <span class="o">∧</span> <span class="o">¬</span><span class="n">traversalState</span><span class="o">.</span><span class="n">pref</span><span class="o">.</span><span class="n">ExistsPair</span> (<span class="k">fun</span> <span class="n">a</span> <span class="n">b</span> <span class="o">=&gt;</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="o">=</span> <span class="mi">0</span>)) <span class="o">∨</span>
      (<span class="n">earlyReturn</span><span class="err">?</span> <span class="o">=</span> <span class="n">some</span> <span class="n">true</span> <span class="o">∧</span> <span class="n">l</span><span class="o">.</span><span class="n">ExistsPair</span> (<span class="k">fun</span> <span class="n">a</span> <span class="n">b</span> <span class="o">=&gt;</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="o">=</span> <span class="mi">0</span>) <span class="o">∧</span> <span class="n">l</span> <span class="o">=</span> <span class="n">traversalState</span><span class="o">.</span><span class="n">pref</span>), ())

  <span class="n">all_goals</span> <span class="n">simp_all</span> <span class="o">&lt;;&gt;</span> <span class="n">grind</span>
</code></pre></div></div>

<h2 id="why-this-excites-me">Why this excites me</h2>

<p>I will explain why I was very happy when I saw this working for the first time.</p>

<p>Verified imperative programming in this style is not new. Technologies like Dafny
and Verus have been doing this for a long time. However, there are some key differences
between Dafny-style systems and Lean.</p>

<p>Dafny and Verus are primarily automated systems. They allow you to state properties
at various places in your programs. To make sure that these properties hold, the
system then encodes the properties into so-called verification conditions, which it then
sends to an external <em>automated</em> prover called an SMT solver. The SMT solver is very
good at proving these properties fully autonomously. This means that if everything
works out, you never have to worry about proofs, which is great! There are, however,
some significant downsides which all center around what happens when you leave that
happy path where everything works.</p>

<p>SMT solvers are very impressive, but they have their limits. For complex problems,
they can take a long time, so compile/checking times can become an issue. If they
time out or fail, it can be difficult to recover. Should I tweak my invariants in
the hope that the solver can do it? Is the problem just too hard for the solver?
Dafny and Verus allow you to add “lemmas”, which are again proved by the SMT solver
and can then be fed back to the solver for reuse. However, it’s not always easy to tell
which lemma is missing, and the systems are not really designed for building up large
libraries of lemmas and proofs. In the worst case, you have built up
a medium-sized project when you run into a limitation that you just cannot overcome,
with no way to recover and no good way to introspect the solver to see what can
be done to make progress.</p>

<p>To make matters worse, the behaviour of SMT solvers can change subtly between versions, so
proofs can break for no real reason during version upgrades. In addition, SMT solvers
are large and complex software projects, and you’re trusting that they’re free of bugs
for your proofs to be correct.</p>

<p>Lean occupies a very different point in the design space: at its core, it is an
interactive system, where the user builds up the proof interactively. All of
Lean’s tooling is
built around making the manual proving process as ergonomic as possible. Lean
has excellent editor support for interactive proving. It ships with a large library
of reusable concepts and lemmas, and it comes with powerful automation to make proofs
easy to write.</p>

<p>In our case, <code class="language-plaintext highlighter-rouge">grind</code> takes a role that is similar to the SMT solver in automated
systems. The difference is that when <code class="language-plaintext highlighter-rouge">grind</code> fails at some point, you can just do
the proof manually, and Lean is <em>really good</em> at making this easy.</p>

<p>From a trust perspective, Lean also has a good story: Lean is built to have a small
kernel, which is the only part that is relevant to whether Lean accepts a proof.
All of the automation that makes proving in Lean easy (including <code class="language-plaintext highlighter-rouge">grind</code>) generates
so-called proof terms that are fed to the small kernel. This means that while a bug in an
SMT solver might lead to Dafny accepting an incorrect program, a bug in <code class="language-plaintext highlighter-rouge">grind</code>
will at worst lead to Lean rejecting a correct program, which is much less bad.</p>

<p>Finally, the fact that Lean is focused on building theories means that large
libraries of proofs like <a href="https://github.com/leanprover-community/mathlib4">mathlib</a>
are available for use in correctness proofs of programs, for example when verifying
cryptographic algorithms. This also means that Lean requires very little runtime
support for basic data types; unlike in Dafny, where the <code class="language-plaintext highlighter-rouge">set</code> type is built
into the system (implemented in C#, not Dafny) and its properties are essentially
taken as axioms, Lean’s <code class="language-plaintext highlighter-rouge">Std.HashSet</code> and <code class="language-plaintext highlighter-rouge">Std.TreeSet</code> are
<a href="https://github.com/leanprover/lean4/tree/master/src/Std/Data">fully implemented and verified in Lean</a>.</p>

<p>For all of these reasons, I believe that Lean is in a very good position
to be a system that developers can rely on and trust for real-world program verification
tasks.</p>

<h2 id="bonus-verified-functional-programming">Bonus: verified functional programming</h2>

<p>As a quick addendum, I will note that the functional implementation of <code class="language-plaintext highlighter-rouge">pairsSumToZero</code>
is also very easy to verify using <code class="language-plaintext highlighter-rouge">grind</code>. Here is the implementation:</p>

<div class="language-lean highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="n">pairsSumToZero</span> (<span class="n">l</span> : <span class="n">List</span> <span class="n">Int</span>) : <span class="n">Bool</span> :=
  <span class="n">go</span> <span class="n">l</span> <span class="err">∅</span>
<span class="n">where</span>
  <span class="n">go</span> (<span class="n">m</span> : <span class="n">List</span> <span class="n">Int</span>) (<span class="n">seen</span> : <span class="n">HashSet</span> <span class="n">Int</span>) : <span class="n">Bool</span> :=
    <span class="k">match</span> <span class="n">m</span> <span class="k">with</span>
    <span class="o">|</span> [] <span class="o">=&gt;</span> <span class="n">false</span>
    <span class="o">|</span> <span class="n">x</span>::<span class="n">xs</span> <span class="o">=&gt;</span> <span class="n">if</span> <span class="o">-</span><span class="n">x</span> <span class="err">∈</span> <span class="n">seen</span> <span class="n">then</span> <span class="n">true</span> <span class="n">else</span> <span class="n">go</span> <span class="n">xs</span> (<span class="n">seen</span><span class="o">.</span><span class="n">insert</span> <span class="n">x</span>)
</code></pre></div></div>

<p>Instead of the <code class="language-plaintext highlighter-rouge">for</code> loop, we have the tail-recursive helper function <code class="language-plaintext highlighter-rouge">go</code> which
takes the state as a parameter. Consequently, instead of writing down a loop
invariant, we give a correctness proof for the <code class="language-plaintext highlighter-rouge">go</code> function, which basically boils
down to the same thing:</p>

<div class="language-lean highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">theorem</span> <span class="n">pairsSumToZero_go_iff</span> (<span class="n">l</span> : <span class="n">List</span> <span class="n">Int</span>) (<span class="n">seen</span> : <span class="n">HashSet</span> <span class="n">Int</span>) :
    <span class="n">pairsSumToZero</span><span class="o">.</span><span class="n">go</span> <span class="n">l</span> <span class="n">seen</span> <span class="o">=</span> <span class="n">true</span> <span class="o">↔</span>
      <span class="n">l</span><span class="o">.</span><span class="n">ExistsPair</span> (<span class="k">fun</span> <span class="n">a</span> <span class="n">b</span> <span class="o">=&gt;</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="o">=</span> <span class="mi">0</span>) <span class="o">∨</span> <span class="o">∃</span> <span class="n">a</span> <span class="err">∈</span> <span class="n">seen</span>, <span class="o">∃</span> <span class="n">b</span> <span class="err">∈</span> <span class="n">l</span>, <span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="o">=</span> <span class="mi">0</span> := <span class="k">by</span>
  <span class="n">fun_induction</span> <span class="n">pairsSumToZero</span><span class="o">.</span><span class="n">go</span> <span class="o">&lt;;&gt;</span> <span class="n">simp_all</span> <span class="o">&lt;;&gt;</span> <span class="n">grind</span>
</code></pre></div></div>

<p>The correctness statement is that <code class="language-plaintext highlighter-rouge">go</code>, when called with the state <code class="language-plaintext highlighter-rouge">seen</code> and
the yet-to-traverse suffix, returns true if and only if the suffix contains a
pair that sums to zero, or there is one element in <code class="language-plaintext highlighter-rouge">seen</code> and one in the suffix
that together sum to zero.
In the proof, instead of <code class="language-plaintext highlighter-rouge">mvcgen</code> for locally imperative programs, we rely on
<code class="language-plaintext highlighter-rouge">fun_induction</code> for the case analysis, and as before, <code class="language-plaintext highlighter-rouge">grind</code> does all of the proving work.</p>

<p>The correctness of <code class="language-plaintext highlighter-rouge">pairsSumToZero</code> is then an easy consequence:</p>

<div class="language-lean highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">theorem</span> <span class="n">pairsSumToZero_iff</span> (<span class="n">l</span> : <span class="n">List</span> <span class="n">Int</span>) :
    <span class="n">pairsSumToZero</span> <span class="n">l</span> <span class="o">=</span> <span class="n">true</span> <span class="o">↔</span> <span class="n">l</span><span class="o">.</span><span class="n">ExistsPair</span> (<span class="k">fun</span> <span class="n">a</span> <span class="n">b</span> <span class="o">=&gt;</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="o">=</span> <span class="mi">0</span>) := <span class="k">by</span>
  <span class="n">simp</span> [<span class="n">pairsSumToZero</span>, <span class="n">pairsSumToZero_go_iff</span>]
</code></pre></div></div>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>If you would like to dig deep into how imperative programming inside a functional language works behind the scenes, there is <a href="https://dl.acm.org/doi/pdf/10.1145/3547640">a paper</a> that describes the main ideas. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>I won’t explain exactly what <code class="language-plaintext highlighter-rouge">grind</code> is or how it works here, but there is a comprehensive <a href="https://lean-lang.org/doc/reference/latest/The--grind--tactic/#grind">reference manual entry</a> that should answer most of your questions. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Julia Markus Himmel</name></author><category term="blog" /><category term="Lean" /><category term="human-eval-lean" /><summary type="html"><![CDATA[Important note: This post is out of date! It describes an old, early version of the mvcgen tactic. As of Lean 4.25.0, the syntax has changed a bit (for the better) and the system has become much more convenient to use. To learn more about mvcgen as it is released today, I recommend the official introduction that is part of the Lean reference manual. My reasoning why I find all of this very cool hasn’t changed, of course, so if you’re interested in that, read on.]]></summary></entry><entry><title type="html">Freyd-Mitchell and Gabriel-Popescu</title><link href="https://juliahimmel.de/blog/freyd-mitchell/" rel="alternate" type="text/html" title="Freyd-Mitchell and Gabriel-Popescu" /><published>2025-06-21T05:00:00+00:00</published><updated>2025-06-21T05:00:00+00:00</updated><id>https://juliahimmel.de/blog/freyd-mitchell</id><content type="html" xml:base="https://juliahimmel.de/blog/freyd-mitchell/"><![CDATA[<p>Earlier this year, Jakob von Raumer, Paul Reichert and I finished a long project
adding the Freyd-Mitchell and Gabriel-Popescu theorems to mathlib, Lean’s
mathematical library. This week, a blog post about the project went live on
the Lean community blog. <a href="https://leanprover-community.github.io/blog/posts/abelian-categories/">Click here to read it!</a></p>]]></content><author><name>Julia Markus Himmel</name></author><category term="blog" /><category term="Lean" /><category term="mathlib" /><summary type="html"><![CDATA[Earlier this year, Jakob von Raumer, Paul Reichert and I finished a long project adding the Freyd-Mitchell and Gabriel-Popescu theorems to mathlib, Lean’s mathematical library. This week, a blog post about the project went live on the Lean community blog. Click here to read it!]]></summary></entry><entry><title type="html">Lean has iterators now</title><link href="https://juliahimmel.de/blog/iterators/" rel="alternate" type="text/html" title="Lean has iterators now" /><published>2025-06-12T05:00:00+00:00</published><updated>2025-06-12T05:00:00+00:00</updated><id>https://juliahimmel.de/blog/iterators</id><content type="html" xml:base="https://juliahimmel.de/blog/iterators/"><![CDATA[<p>Lean 4.22 (to be released in early August) will ship with the first version of
the new iterators library, allowing for efficient streaming, combining and
collecting of data. This library was designed and implemented by
<a href="https://github.com/datokrat">Paul</a>, and he has put a lot of thought into making
sure that iterators are compiled into efficient code (like in Rust), and he has
worked even harder to ensure that it’s easy to prove things about iterators
(even when iterating monadically).</p>

<p>To celebrate this occasion, let’s do one of the <code class="language-plaintext highlighter-rouge">human-eval-lean</code> tasks using
the new iterators (see <a href="/blog/the-largest-divisor/">my previous post</a>
for context about <code class="language-plaintext highlighter-rouge">human-eval-lean</code>).</p>

<p>HumanEval problem 11 asks us to take two lists of booleans and combine them into
one list using the xor operation at each position. This is easily done using
<code class="language-plaintext highlighter-rouge">List.zip</code> followed by <code class="language-plaintext highlighter-rouge">List.map</code>, but that would allocate a list of pairs as
an intermediate result, which is inefficient. We could write a recursive function,
but that sounds like a lot of work. <code class="language-plaintext highlighter-rouge">Iter.zip</code> to the rescue!</p>

<p>What follows is the entire code, including proofs.</p>

<div class="language-lean highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">import</span> <span class="n">Std</span><span class="o">.</span><span class="n">Data</span><span class="o">.</span><span class="n">Iterators</span>

<span class="k">def</span> <span class="n">stringXor</span> (<span class="n">a</span> <span class="n">b</span> : <span class="n">List</span> <span class="n">Bool</span>) : <span class="n">List</span> <span class="n">Bool</span> :=
  ((<span class="n">a</span><span class="o">.</span><span class="n">iter</span>)<span class="o">.</span><span class="n">zip</span> <span class="n">b</span><span class="o">.</span><span class="n">iter</span>)
    <span class="o">|&gt;.</span><span class="n">map</span> (<span class="k">fun</span> <span class="n">p</span> <span class="o">=&gt;</span> <span class="n">Bool</span><span class="o">.</span><span class="n">xor</span> <span class="n">p</span><span class="o">.1</span> <span class="n">p</span><span class="o">.2</span>)
    <span class="o">|&gt;.</span><span class="n">toList</span>

<span class="o">@</span>[<span class="n">simp</span>, <span class="n">grind</span>]
<span class="k">theorem</span> <span class="n">length_stringXor</span> <span class="err">{</span><span class="n">a</span> <span class="n">b</span> : <span class="n">List</span> <span class="n">Bool</span><span class="err">}</span> : (<span class="n">stringXor</span> <span class="n">a</span> <span class="n">b</span>)<span class="o">.</span><span class="n">length</span> <span class="o">=</span> <span class="n">min</span> <span class="n">a</span><span class="o">.</span><span class="n">length</span> <span class="n">b</span><span class="o">.</span><span class="n">length</span> := <span class="k">by</span>
  <span class="n">simp</span> [<span class="n">stringXor</span>]

<span class="k">theorem</span> <span class="n">getElem_stringXor</span> <span class="err">{</span><span class="n">a</span> <span class="n">b</span> : <span class="n">List</span> <span class="n">Bool</span><span class="err">}</span> <span class="err">{</span><span class="n">i</span> : <span class="n">Nat</span><span class="err">}</span> <span class="err">{</span><span class="n">hia</span> : <span class="n">i</span> <span class="o">&lt;</span> <span class="n">a</span><span class="o">.</span><span class="n">length</span><span class="err">}</span> <span class="err">{</span><span class="n">hib</span> : <span class="n">i</span> <span class="o">&lt;</span> <span class="n">b</span><span class="o">.</span><span class="n">length</span><span class="err">}</span> :
    (<span class="n">stringXor</span> <span class="n">a</span> <span class="n">b</span>)[<span class="n">i</span>]<span class="err">'</span>(<span class="k">by</span> <span class="n">grind</span>) <span class="o">=</span> <span class="n">Bool</span><span class="o">.</span><span class="n">xor</span> <span class="n">a</span>[<span class="n">i</span>] <span class="n">b</span>[<span class="n">i</span>] := <span class="k">by</span>
  <span class="n">simp</span> [<span class="n">stringXor</span>]
</code></pre></div></div>
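
<p>For comparison with the iterator pipeline above, the list-based formulation
mentioned earlier can be sketched as follows. The name <code class="language-plaintext highlighter-rouge">stringXorNaive</code> is
made up for this illustration and is not part of the original solution. It should
compute the same result, but it materializes the intermediate list of pairs.</p>

<div class="language-lean highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Hypothetical name, for illustration only.
def stringXorNaive (a b : List Bool) : List Bool :=
  -- `List.zip` allocates a list of pairs, which `List.map` then consumes.
  (a.zip b).map (fun p =&gt; Bool.xor p.1 p.2)

#eval stringXorNaive [true, false, true] [true, true, false]  -- [false, true, true]
</code></pre></div></div>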

<p>Three simple lines of stream processing, and the proofs are trivial one-liners
using what’s available in the standard library. Superfast code and easy proofs -
this is how I want my Lean. I love it!</p>]]></content><author><name>Julia Markus Himmel</name></author><category term="blog" /><category term="Lean" /><category term="human-eval-lean" /><summary type="html"><![CDATA[Lean 4.22 (to be released in early August) will ship with the first version of the new iterators library, allowing for efficient streaming, combining and collecting of data. This library was designed and implemented by Paul, and he has put a lot of thought into making sure that iterators are compiled into efficient code (like in Rust), and he has worked even harder to ensure that it’s easy to prove things about iterators (even when iterating monadically).]]></summary></entry><entry><title type="html">The largest divisor</title><link href="https://juliahimmel.de/blog/the-largest-divisor/" rel="alternate" type="text/html" title="The largest divisor" /><published>2025-06-09T18:00:00+00:00</published><updated>2025-06-09T18:00:00+00:00</updated><id>https://juliahimmel.de/blog/the-largest-divisor</id><content type="html" xml:base="https://juliahimmel.de/blog/the-largest-divisor/"><![CDATA[<p>A few weeks ago I started the <a href="https://github.com/TwoFX/human-eval-lean"><code class="language-plaintext highlighter-rouge">human-eval-lean</code></a>
project, an effort to collect hand-written solutions to the famous HumanEval
(AI) programming benchmark, written in the programming language <a href="https://lean-lang.org/">Lean</a>.
The twist is that Lean is not just a programming language, but also a proof assistant,
and all solutions in <code class="language-plaintext highlighter-rouge">human-eval-lean</code> come with machine-checked formal proofs that the code is
correct.</p>

<p>The idea is shamelessly copied from <a href="https://github.com/secure-foundations/human-eval-verus"><code class="language-plaintext highlighter-rouge">human-eval-verus</code></a>,
which does the same thing in <a href="https://github.com/verus-lang/verus">Verus</a>, a verification
platform for Rust programs. The fact that both <code class="language-plaintext highlighter-rouge">human-eval-lean</code> and <code class="language-plaintext highlighter-rouge">human-eval-verus</code>
exist is great because it makes it possible to directly compare the two systems and more
generally the verification paradigms they represent (interactive theorem proving for Lean
and Dafny-style SMT-solver-backed verification for Verus).</p>

<p>The <code class="language-plaintext highlighter-rouge">human-eval-verus</code> contributors have already solved dozens of the 160-or-so HumanEval
problems, which is very impressive. The Lean effort has only just started and we have only
done a handful of the problems so far, but there have been some nice contributions from
Marcus Rossel and Johannes Tantow, including a solution for <a href="https://github.com/TwoFX/human-eval-lean/blob/master/HumanEvalLean/HumanEval109.lean">task 109</a>,
which is notable because it has not been completed in Verus yet.</p>

<p>For the rest of this post, I’d like to quickly discuss my solution to HumanEval task 24, which
asks to write a function that takes as input an integer \(n\) and returns the largest proper
divisor of \(n\).</p>

<p>The HumanEval problems come with model solutions. Often, these are not very good, as is
the case here. The model solution just loops downwards from \(n - 1\) and returns the
first divisor that it finds. This is wasteful: no number larger than \(n/2\)
can be a proper divisor of \(n\), so this
implementation will, on every input without exception, do useless work for about \(n/2\)
steps before it starts checking numbers that even have a chance of being a divisor.</p>

<p>Here is a better approach: loop upwards from \(2\), looking for the <em>smallest</em> divisor
\(d\), and return \(n/d\). If no divisor is found after looking for candidates until
\(\sqrt{n}\), return \(1\).</p>

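<p>This works because whenever \(d\) is a divisor of \(n\), so is \(n/d\), and this correspondence
reverses the order of the divisors: the smallest divisor greater than \(1\) corresponds to
the largest proper divisor. Moreover, if \(n\) has any divisor strictly between \(1\) and \(n\),
then the smallest such divisor \(d\) satisfies \(d \le n/d\), that is, \(d^2 \le n\), so the
search can stop at \(\sqrt{n}\). It shows that the largest proper divisor of \(n\) can be
found in time proportional to the square root of \(n\) rather than time proportional to
\(n\), which is much better even on fairly small inputs.</p>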
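
<p>To see the order reversal concretely, here is a small brute-force sketch. The helper
<code class="language-plaintext highlighter-rouge">divisors</code> is a hypothetical name introduced only for this illustration; it is
not part of the solution and is far too slow to be used in it.</p>

<div class="language-lean highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Hypothetical helper, for illustration only: all divisors of `n` by brute force.
def divisors (n : Nat) : List Nat :=
  (List.range (n + 1)).filter (fun d =&gt; d != 0 &amp;&amp; n % d == 0)

-- The divisors of 36 in increasing order.
#eval divisors 36                           -- [1, 2, 3, 4, 6, 9, 12, 18, 36]

-- Mapping each divisor d to 36 / d yields the same divisors in reverse order.
#eval (divisors 36).map (fun d =&gt; 36 / d)   -- [36, 18, 12, 9, 6, 4, 3, 2, 1]
</code></pre></div></div>

<p>For \(n = 36\), the smallest divisor greater than \(1\) is \(2\), so the largest proper
divisor is \(36 / 2 = 18\).</p>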

<p>Here is an implementation of this approach in Lean:</p>

<div class="language-lean highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="n">largestDivisor</span> (<span class="n">n</span> : <span class="n">Nat</span>) : <span class="n">Nat</span> :=
  <span class="n">go</span> <span class="mi">2</span>
<span class="n">where</span>
  <span class="n">go</span> (<span class="n">i</span> : <span class="n">Nat</span>) : <span class="n">Nat</span> :=
    <span class="n">if</span> <span class="n">h</span> : <span class="n">n</span> <span class="o">&lt;</span> <span class="n">i</span> <span class="o">*</span> <span class="n">i</span> <span class="n">then</span>
      <span class="mi">1</span>
    <span class="n">else</span> <span class="n">if</span> <span class="n">n</span> <span class="err">%</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span> <span class="n">then</span>
      <span class="n">n</span> <span class="o">/</span> <span class="n">i</span>
    <span class="n">else</span>
      <span class="n">go</span> (<span class="n">i</span> <span class="o">+</span> <span class="mi">1</span>)
</code></pre></div></div>

<p>It is written in a functional style, using tail recursion instead of a loop. Lean
also supports imperative constructs like <code class="language-plaintext highlighter-rouge">for</code> loops, but currently the support
for proving properties of functional programs is still better than for proving
properties of imperative programs.</p>

<p>One detail is that Lean will not accept the code as written above. In Lean, all
functions must terminate. In many cases, Lean is able to prove on its own
that a recursive function always reaches the base case, but in this case, we need
to help Lean a bit with the following code:</p>

<div class="language-lean highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="n">termination_by</span> <span class="n">n</span> <span class="o">-</span> <span class="n">i</span>
  <span class="n">decreasing_by</span>
    <span class="k">have</span> : <span class="n">i</span> <span class="o">&lt;</span> <span class="n">n</span> := <span class="k">by</span>
      <span class="k">match</span> <span class="n">i</span> <span class="k">with</span>
      <span class="o">|</span> <span class="mi">0</span> <span class="o">=&gt;</span> <span class="n">omega</span>
      <span class="o">|</span> <span class="mi">1</span> <span class="o">=&gt;</span> <span class="n">omega</span>
      <span class="o">|</span> <span class="n">i</span> <span class="o">+</span> <span class="mi">2</span> <span class="o">=&gt;</span> <span class="n">exact</span> <span class="n">Nat</span><span class="o">.</span><span class="n">lt_of_lt_of_le</span> (<span class="n">Nat</span><span class="o">.</span><span class="n">lt_mul_self_iff</span><span class="o">.2</span> (<span class="k">by</span> <span class="n">omega</span>)) (<span class="n">Nat</span><span class="o">.</span><span class="n">not_lt</span><span class="o">.1</span> <span class="n">h</span>)
    <span class="n">omega</span>
</code></pre></div></div>

<p>We’re saying: the recursion always terminates because the number \(n - i\) keeps getting
smaller with each recursive call, but never goes below zero. For this to be true, we
need \(i &lt; n\) whenever we do a recursive call. The slightly tricky part about this is that we
only check for \(i^2 \le n\), so we need to spell out to Lean that this implies \(i &lt; n\).</p>
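
<p>The key arithmetic step can be isolated as a standalone statement. The following is
a minimal sketch, assuming \(i \ge 2\); the hypothesis names <code class="language-plaintext highlighter-rouge">hi</code> and
<code class="language-plaintext highlighter-rouge">h</code> are chosen for this illustration, and <code class="language-plaintext highlighter-rouge">Nat.lt_mul_self_iff</code> is the
same lemma that the termination proof above uses.</p>

<div class="language-lean highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- If 2 ≤ i and i * i ≤ n, then i &lt; i * i ≤ n, hence i &lt; n.
example (i n : Nat) (hi : 2 ≤ i) (h : i * i ≤ n) : i &lt; n :=
  Nat.lt_of_lt_of_le (Nat.lt_mul_self_iff.2 (by omega)) h
</code></pre></div></div>

<p>In the actual termination proof, the cases \(i = 0\) and \(i = 1\) are not covered by this
argument and are dispatched separately by <code class="language-plaintext highlighter-rouge">omega</code>.</p>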

<p>Now that Lean has accepted our program, we can run it and notice that it returns the
right results, but we can also state the correctness property for our function:</p>

<div class="language-lean highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">theorem</span> <span class="n">largestDivisor_eq_iff</span> <span class="err">{</span><span class="n">n</span> <span class="n">i</span> : <span class="n">Nat</span><span class="err">}</span> (<span class="n">hn</span> : <span class="mi">1</span> <span class="o">&lt;</span> <span class="n">n</span>) :
    <span class="n">largestDivisor</span> <span class="n">n</span> <span class="o">=</span> <span class="n">i</span> <span class="o">↔</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">n</span> <span class="o">∧</span> <span class="n">n</span> <span class="err">%</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span> <span class="o">∧</span> <span class="o">∀</span> <span class="n">j</span>, <span class="n">j</span> <span class="o">&lt;</span> <span class="n">n</span> <span class="o">→</span> <span class="n">n</span> <span class="err">%</span> <span class="n">j</span> <span class="o">=</span> <span class="mi">0</span> <span class="o">→</span> <span class="n">j</span> <span class="o">≤</span> <span class="n">i</span> :=<span class="cd">
  -- Proof omitted</span>
</code></pre></div></div>

<p>This reads as follows: as long as \(n &gt; 1\), the <code class="language-plaintext highlighter-rouge">largestDivisor</code> function applied to a
number \(n\) returns \(i\) if and only if three things hold:</p>

<ul>
  <li>\(i\) is less than \(n\),</li>
  <li>\(n\) modulo \(i\) is zero (so \(i\) is a divisor of \(n\)), and</li>
  <li>if \(j\) is any other proper divisor of \(n\), then \(j \le i\).</li>
</ul>

<p>In total, this means that \(i\) is the greatest proper divisor of \(n\).</p>
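
<p>As a sanity check of how the three conditions fit together, they can be instantiated at
concrete numbers. The sketch below uses \(n = 36\) and \(i = 18\), which are arbitrary
example values; it does not mention <code class="language-plaintext highlighter-rouge">largestDivisor</code> itself, only the right-hand
side of the equivalence.</p>

<div class="language-lean highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- The three conditions of the specification, instantiated at n = 36 and i = 18.
-- All quantifiers are bounded, so `decide` should be able to check them by evaluation.
example : 18 &lt; 36 ∧ 36 % 18 = 0 ∧ ∀ j, j &lt; 36 → 36 % j = 0 → j ≤ 18 := by
  decide
</code></pre></div></div>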

<p>The proof of this claim is a few dozen lines of code. Click <a href="https://live.lean-lang.org/#project=mathlib-stable&amp;url=https%3A%2F%2Fraw.githubusercontent.com%2FTwoFX%2Fhuman-eval-lean%2Frefs%2Fheads%2Fmaster%2FHumanEvalLean%2FHumanEval24.lean">here</a> to be dropped into the Lean web editor with the proof loaded in. You
can have a look around and place your cursor in various places. The bar on the right
will show the current state of the proof at that point.</p>

<p>The high-level idea of the proof is to break it up into two parts: the first part
consists of establishing the “loop invariant” of the “loop” function <code class="language-plaintext highlighter-rouge">go</code> and the
second part consists of showing that this invariant implies the claim. This works
well because it means we can separate the “computer science” (analyzing the program)
from the mathematics (arguing that mapping \(d\) to \(n/d\) makes a correspondence between
small and large divisors).</p>

<p>This also shows the strength of Lean: as an interactive theorem prover, it is very
well-suited to express the mathematical reasoning that underpins the faster algorithm. Its
standard library also comes with many thousands of helper results about natural
numbers (and many other things like data structures) which are useful in the proof.
For example, in the proof it is very useful to have results characterizing exactly
how division, multiplication and the ordering on natural numbers interact.</p>

<p>For comparison, the <a href="https://github.com/secure-foundations/human-eval-verus/blob/main/tasks/human_eval_024.rs">corresponding entry</a>
in <code class="language-plaintext highlighter-rouge">human-eval-verus</code> uses the naive linear-time algorithm. It has no problem establishing
the loop invariant in that case, but it is less clear how well it would work to adapt and
extend the verification to include the additional reasoning required for the faster
algorithm. If any Verus user is reading this, I would be very interested to see a Verus
implementation!</p>

<p>If you’d like to join the fun, feel free to have a look at the <a href="https://github.com/TwoFX/human-eval-lean"><code class="language-plaintext highlighter-rouge">human-eval-lean</code></a>
repo and check out the open tasks!</p>]]></content><author><name>Julia Markus Himmel</name></author><category term="blog" /><category term="Lean" /><category term="human-eval-lean" /><summary type="html"><![CDATA[A few weeks ago I started the human-eval-lean project, an effort to collect hand-written solutions to the famous HumanEval (AI) programming benchmark, written in the programming language Lean. The twist is that Lean is not just a programming language, but also a proof assistant, and all solutions in human-eval-lean come with machine-checked formal proofs that the code is correct.]]></summary></entry></feed>