How does \expandafter work: A detailed macro case study

Part 1 Part 2 Part 3 Part 4 Part 5 Part 6

Case study: `\expandafter` example from The $\varepsilon\mathrm{\text{-}{\TeX}}$ Manual

The $\varepsilon\mathrm{\text{-}{\TeX}}$ typesetting engine was derived from Knuth’s TeX software and was originally intended as an “interim” step toward development of the New Typesetting System (NTS), written in the Java programming language. $\varepsilon\mathrm{\text{-}{\TeX}}$ was first developed in the late 1990s to add a suite of new primitive commands which provide functionality not available in Knuth’s original program. $\varepsilon\mathrm{\text{-}{\TeX}}$ has received periodic updates since its initial release; today it is not widely used as a standalone typesetting engine, but its innovations have been absorbed into later generations of TeX: pdfTeX, XeTeX and LuaTeX.

The $\varepsilon\mathrm{\text{-}{\TeX}}$ manual contains an enlightening example of a macro which makes clever use of \expandafter:

    \def\foo#1#2{\number#1
    \ifnum#1<#2,
    \expandafter\foo
    \expandafter{\number\numexpr#1+1\expandafter}%
    \expandafter{\number#2\expandafter}%
    \fi}

\foo implements a looping mechanism, such that \foo{7}{13} produces 7, 8, 9, 10, 11, 12, 13; however, \foo does not use any assignments to variables in order to control the looping process—which makes it an interesting macro to explore in some detail.
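As a quick check, the macro can be tried in a small plain TeX file. This is a minimal sketch, assuming an engine with the $\varepsilon\mathrm{\text{-}{\TeX}}$ extensions enabled (for example pdfTeX, XeTeX or LuaTeX rather than Knuth’s original tex). The two boundary cases at the end are illustrative additions, not examples from the manual.

```tex
% Plain TeX; needs an engine with the e-TeX extensions (for \numexpr).
\def\foo#1#2{\number#1
  \ifnum#1<#2,
    \expandafter\foo
    \expandafter{\number\numexpr#1+1\expandafter}%
    \expandafter{\number#2\expandafter}%
  \fi}

\foo{7}{13}\par % typesets 7, 8, 9, 10, 11, 12, 13
\foo{5}{5}\par  % the \ifnum test fails at once: typesets 5
\foo{9}{3}\par  % the \ifnum test fails at once: typesets 9
\bye
```

In the last two calls the first argument is still typeset because \number#1 comes before the \ifnum test.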

Some background: expressions and assignments

An important element of \foo’s code is its use of the command \numexpr, one of a set of four related primitives first introduced by $\varepsilon\mathrm{\text{-}{\TeX}}$: \numexpr, \dimexpr, \glueexpr and \muexpr. Their purpose is to construct so-called expressions, which allow calculation/manipulation of TeX values of type number, dimen, glue, or muglue (respectively). As discussed on pages 8–9 of The $\varepsilon\mathrm{\text{-}{\TeX}}$ Manual, an important characteristic of expressions is that their evaluation (calculation) does not require TeX to perform any assignments.
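A few expressions can be evaluated directly to see this in action. The sketch below is illustrative only: it uses \message to write results to the terminal and log, and the macro name \eight is a hypothetical name chosen for this example.

```tex
% Plain TeX with the e-TeX extensions.
% \relax ends each expression and is absorbed by it.
\message{[\number\numexpr 7+1\relax]}      % writes [8]
\message{[\number\numexpr (2+3)*4\relax]}  % writes [20]
\message{[\the\dimexpr 1pt*3\relax]}       % writes [3.0pt]

% No register is needed, so an expression also works inside \edef:
\edef\eight{\number\numexpr 7+1\relax}     % \eight now expands to 8
\bye
```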

In programming terms, assignment is the process of setting (assigning) a variable to have a particular value; for example, assigning \count register 99 to contain the value 12345 via \count99=12345. Many other types of assignment take place during TeX processing, such as assigning token registers to contain a series of tokens, assigning box registers to contain box content, and so forth.

To perform an assignment, such as \count99=12345, TeX needs to action (execute) the internal code which implements the behaviour of \count or any other primitive that performs some sort of assignment. However, there are times when TeX is performing pure expansion and, at those times, such assignments are not actioned at that point in TeX’s processing. Examples of this situation include the following commands:

\edef\command {token list}: the “expanded definition” macro-definition command, which expands tokens in token list and stores the results as the definition of \command.
\write number {token list}: expands tokens in token list and writes them out to a file represented by number.
\directlua {token list}: this LuaTeX primitive command is used to pass Lua code to the built-in Lua interpreter. All tokens in token list are fully expanded before being passed to the Lua interpreter for execution.

Quick example of `\edef`

If we write the following basic macros:

     \def\mycount{\count99=12345}
     \edef\mymacro{\mycount}

\edef will expand \mycount into its constituent tokens but it goes no further: none of the commands contained in the definition of \mymacro will be actioned; i.e., the assignment of 12345 to \count99 does not happen at this point. Only when we call \mymacro will that assignment take place, as TeX executes the code to process the \count primitive. When TeX is performing expansion-only activities, any assignments will be actioned later in TeX’s processing, not during the expansion process itself.
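The timing of the assignment can be observed by printing the register before and after \mymacro is called. This is a small sketch, assuming \count99 is otherwise unused; the initial value 0, the \message calls and the trailing space inside \mycount (which cleanly ends the number) are additions made for this illustration.

```tex
\count99=0
\def\mycount{\count99=12345 }
\edef\mymacro{\mycount}
\message{[after edef: \the\count99]}  % writes [after edef: 0]
\mymacro                              % the assignment happens here
\message{[after call: \the\count99]}  % writes [after call: 12345]
\bye
```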

Why are assignments of interest here?

When writing code to perform a loop, in any programming language, it is common practice to have a variable designated to act as a “loop counter”, used to control the number of times a loop is executed. Looping is typically controlled by testing whether that designated loop-counter variable has reached a particular value; the variable is incremented (or decremented) for each iteration of the loop. However, modifying a loop-counter variable means assigning it a new value which, for TeX, usually requires the primitive command \advance to increment (or decrement) a value stored in a \count register. As we’ve seen, during TeX’s pure expansion process such assignments (including incrementing variables) cannot take place: the macro \foo cleverly circumvents this restriction.
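For contrast, here is a sketch of the conventional, assignment-based approach, written with plain TeX’s \loop ... \repeat. The names \loopcounter and \countloop are hypothetical, chosen for this illustration. Every iteration performs assignments (\advance, plus the \def and \let hidden inside \loop), so this version cannot run in an expansion-only context such as \edef.

```tex
% Plain TeX: a loop driven by a \count register.
\newcount\loopcounter
\def\countloop#1#2{%
  \loopcounter=#1\relax            % assignment: initialise the counter
  \loop
    \number\loopcounter            % typeset the current value
    \ifnum\loopcounter<#2\relax
      , \advance\loopcounter by 1
  \repeat}

\countloop{7}{13}% typesets 7, 8, 9, 10, 11, 12, 13
\bye
```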

Back to explaining `\foo`

The macro \foo is able to control the looping process without needing to assign values to any variables: it controls how often the loop takes place using data arising from expansion: data values stored in temporary token lists. Using our knowledge of TeX’s usage (creation) of temporary token lists we can take a closer look to see exactly how \foo achieves its results.
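One way to confirm that \foo needs no assignments is to run it inside \edef, where only expansion takes place. The following is a minimal sketch; the macro name \result is a hypothetical name used only for this example.

```tex
% Plain TeX with the e-TeX extensions.
\def\foo#1#2{\number#1
  \ifnum#1<#2,
    \expandafter\foo
    \expandafter{\number\numexpr#1+1\expandafter}%
    \expandafter{\number#2\expandafter}%
  \fi}

% The whole loop runs during the expansion performed by \edef.
\edef\result{\foo{7}{13}}
\message{[\meaning\result]} % writes [macro:->7, 8, 9, 10, 11, 12, 13]
\bye
```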

Remember: We are working through the execution of a macro after the original text of its definition, contained in a physical .tex file, has been scanned (read-in by TeX) and converted to a token list representing the macro definition. In essence, we are following TeX as it reads and processes the tokens of the macro definition stored in TeX’s memory. Any space characters originally present in the TeX code of the macro’s definition (text within the .tex file) will either have been absorbed whilst TeX was scanning that text for commands (spaces as terminators) or have been converted to tokens; an example is the space token after the comma (,) in \ifnum#1<#2, which arose from conversion of the end-of-line character (\r) into a space.
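The stored token list can be inspected with \meaning, as in the sketch below. One caveat when reading the result: \meaning prints a space after every control word whether or not a space token follows it, so only some of the spaces shown correspond to real space tokens (here, the one after the first #1 and the one after the comma).

```tex
\def\foo#1#2{\number#1
  \ifnum#1<#2,
    \expandafter\foo
    \expandafter{\number\numexpr#1+1\expandafter}%
    \expandafter{\number#2\expandafter}%
  \fi}

% Write the stored definition of \foo to the terminal and log file.
\message{\meaning\foo}
\bye
```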

Because the TeX code in \foo uses multiple \expandafter commands, we’ll assist our explanation by adding subscripts to each \expandafter, indicating which one we are referring to. In addition, we’ll extend the notation for tokens processed by \expandafter to $\mathrm{T^i_1}$ and $\mathrm{T^i_2}$, representing tokens $\mathrm{T_1}$ and $\mathrm{T_2}$ for \expandafter_i: \expandafter_i $\mathrm{T^i_1T^i_2}$
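As a reminder of the basic pattern with a single \expandafter, consider the sketch below. The macros \first and \letters are hypothetical names used only for this illustration, and \detokenize (an $\varepsilon\mathrm{\text{-}{\TeX}}$ primitive) is used so that the argument is reported without being expanded further.

```tex
\def\first#1{\message{[argument: \detokenize{#1}]}}
\def\letters{xyz}

% Without \expandafter, \first takes the unexpanded token \letters as #1.
\first\letters

% With \expandafter: T1 = \first is saved, T2 = \letters is expanded to xyz,
% then \first is put back and takes only the first token, x, as #1.
\expandafter\first\letters % writes [argument: x]; the remaining yz is typeset
\bye
```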

Here is the annotated macro code:

    \def\foo#1#2{\number#1
    \ifnum#1<#2,
    \expandafter₁\foo
    \expandafter₂{\number\numexpr#1+1\expandafter₃}%
    \expandafter₄{\number#2\expandafter₅}%
    \fi}

\foo starts with \number#1 which uses the expandable command \number to convert the first argument value into its typeset representation. The \number command works by generating a temporary token list containing character tokens which represent the individual digits contained in the numeric value that \number is operating on. That token list becomes TeX’s next input source. Here, that token list is read and the tokens are output to typeset the value of #1.
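The behaviour of \number can be checked in isolation. In this sketch \message is used so the generated character tokens appear on the terminal and in the log; the particular input values are illustrative.

```tex
\message{[\number 007]}     % writes [7]: leading zeros are dropped
\message{[\number "1F]}     % writes [31]: hexadecimal input, decimal output
\message{[\number `A]}      % writes [65]: character code of A
\count99=42
\message{[\number\count99]} % writes [42]: value held in a register
\bye
```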

Next, the macro performs the test \ifnum#1<#2 to check if the argument for #1 is less than the argument passed in for #2. If so, a comma (,) token is output (typeset) followed by some space arising from the <space> token that was generated from the linebreak character after the comma (,). That space character was first generated when TeX read this line from the .tex file.

The macro continues by processing this next section of code, which is the core of its operation:

    \expandafter₁\foo
    \expandafter₂{\number\numexpr#1+1\expandafter₃}%
    \expandafter₄{\number#2\expandafter₅}%
    \fi}

In essence, this code generates a series of temporary token lists which result in multiple calls to the \foo macro, terminating when the if-test \ifnum#1<#2 is no longer true. But how is looping controlled when no assignments are taking place: where is the “loop counter”?

Let’s start by looking at the code \expandafter₁\foo\expandafter₂. Note that we will use the subscript notation _token (or _(token)) to remind ourselves that, here, TeX is reading/processing numeric (integer) token values.

Here, we have the following tokens as input for \expandafter₁:

$\mathrm{T^1_1} =\ $\foo_token which is read-in and stored for later re-insertion back into the input
$\mathrm{T^1_2} =\ $\expandafter_{2 (token)} which is expanded

For \expandafter₂ we have:

$\mathrm{T^2_1} =\ ${_token which is saved for later re-insertion back into the input
$\mathrm{T^2_2} =\ $\number_token which is expanded

Note: \number is an expandable command whose purpose is to “convert to tokens”: i.e., convert a numeric quantity into a series of character tokens which represent that quantity. When \number is expanded, the first thing that TeX does is to scan the input looking for integers: a process which triggers further expansion.
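The fact that scanning for an integer triggers expansion can be seen in a small sketch; \digits and \joined are hypothetical names used only for this illustration. While \number is collecting digits it expands \digits and keeps going until it meets a token that cannot be part of a number.

```tex
\def\digits{23}
% \number reads 1, expands \digits to 23, reads 4, then stops at the brace.
\edef\joined{\number 1\digits 4}
\message{[\meaning\joined]} % writes [macro:->1234]
\bye
```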

The key to the story: Here, \number is acting on the expression \numexpr#1+1 which calculates the value of #1+1. The result of that calculation is processed by \number to convert it into a temporary token list containing character tokens representing the value of #1 + 1. That temporary token list, generated by \number, will eventually be read-in as the first argument to another call of \foo. Rather than incrementing a loop counter (via \advance and assignment), the use of \numexpr creates a new value but without assignment being necessary. Through this mechanism, the variable controlling the loop (\foo’s parameter #1) is incremented and iteration through the loop is controlled and terminated: quite ingenious!
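A single step of this mechanism can be isolated as follows. This is a sketch under stated assumptions: \showargs is a hypothetical stand-in for the recursive call to \foo, and \relax stands in for the \fi that ends the chain in the real macro.

```tex
% \showargs reports the two arguments it actually receives.
\def\showargs#1#2{\message{[first=#1, second=#2]}}

\expandafter\showargs
\expandafter{\number\numexpr 7+1\expandafter}%
\expandafter{\number 13\expandafter}%
\relax % unexpandable: the chain of expansions stops here
% writes [first=8, second=13]
\bye
```

By the time \showargs reads its arguments, \number has already replaced the expression with the character token 8, so the macro receives {8} and {13} rather than unevaluated code.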

Next, \expandafter₃ is processed, yielding:

$\mathrm{T^3_1} =\ $}_token which is saved for later re-insertion back into the input
$\mathrm{T^3_2} =\ $\expandafter_{4 (token)} which is expanded

For \expandafter₄ we have:

$\mathrm{T^4_1} =\ ${_token which is saved for later re-insertion back into the input
$\mathrm{T^4_2} =\ $\number_token which is expanded and converts #2 into another temporary token list.

Finally, \expandafter₅ is expanded:

$\mathrm{T^5_1} =\ $}_token which is saved for later re-insertion back into the input
$\mathrm{T^5_2} =\ $\fi_token, which is an expandable command.

The expansion of \fi terminates the \ifnum conditional and, in effect, closes this iteration of the macro. TeX now completes re-insertion of all the tokens temporarily saved by the multiple \expandafter commands: this generates a series of single-token token lists arising from the tokens saved by each \expandafter. In addition, TeX has also created token lists through the action of \number.

Assembling the token lists

In essence, the \foo macro generates a sequence of token lists: you can think of \foo as a token-list “manufacturing facility”. Those token lists are read by TeX to become the next sources of input. The clever part is contained in one of the earlier actions of \foo:

    \expandafter₁\foo\expandafter₂

through which \foo arranges to call itself again but with different arguments that are stored in token lists constructed by \number. To make these token lists collectively behave as a macro call, the braces { and } have all been saved and re-inserted into the input (as single-token lists) by the actions of \expandafter commands.
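The successive calls can be observed with TeX’s macro tracing, as in the sketch below. The tracing settings are themselves assignments, but they are used only for diagnostics and are not part of the loop; the shorter range 7 to 10 is chosen here simply to keep the log brief.

```tex
\def\foo#1#2{\number#1
  \ifnum#1<#2,
    \expandafter\foo
    \expandafter{\number\numexpr#1+1\expandafter}%
    \expandafter{\number#2\expandafter}%
  \fi}

\tracingonline=1  % show tracing output on the terminal as well as in the log
\tracingmacros=1  % log every macro expansion together with its arguments
\foo{7}{10}
\tracingmacros=0
\bye
```

Each call is logged with its arguments (#1<-7, then #1<-8, and so on, with #2<-10 every time), which shows that \foo receives plain digits that were built by \number before the call takes place.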

[Figure: token lists generated by the \foo macro]
