Token Concatenation
2025-02-24, Mon
The other day I was reading the source of some random Emacs built-in function when the DEFUN macro showed up, so my next step was to check its definition and see what goes on behind the scenes. Here is what's in emacs/src/lisp.h:
#define DEFUN(lname, fnname, sname, minargs, maxargs, intspec, doc) \
  SUBR_SECTION_ATTRIBUTE \
  static union Aligned_Lisp_Subr sname = \
    {{{ PVEC_SUBR << PSEUDOVECTOR_AREA_BITS }, \
      { .a ## maxargs = fnname }, \
      minargs, maxargs, lname, {intspec}, lisp_h_Qnil}}; \
  Lisp_Object fnname
This macro defines a static union and declares a function without its parameter list. All seems natural, except what is this strange "##" thing doing there? I extracted a snippet from emacs/src/fileio.c and preprocessed it with gcc -E sample.c. So this:
DEFUN ("directory-name-p", Fdirectory_name_p, Sdirectory_name_p, 1, 1, 0,
       doc: /* Return non-nil if NAME ends with a directory separator character. */)
  (Lisp_Object name)
{
  /* Omitted */
}
was transformed to this:
SUBR_SECTION_ATTRIBUTE
static union Aligned_Lisp_Subr Sdirectory_name_p =
  { { { PVEC_SUBR << PSEUDOVECTOR_AREA_BITS },
      { .a1 = Fdirectory_name_p },
      1, 1, "directory-name-p", {0}, lisp_h_Qnil } };
Lisp_Object Fdirectory_name_p
  (Lisp_Object name)
{
}
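To look at the pattern in isolation, here is a much-reduced sketch of the same trick. Everything in it (MINI_DEFUN, struct subr, the value typedef, Fadd, Sadd) is a hypothetical stand-in I made up for illustration, not Emacs code. It only shows how pasting the arity onto ".a" picks a union member, and how the macro ends with a bare function name so that the parameter list and body can follow the invocation.

```c
/* Hypothetical stand-in for Lisp_Object; not the Emacs type.  */
typedef int value;

/* Hypothetical, reduced record: one function-pointer member per arity.  */
struct subr
{
  union
  {
    value (*a0) (void);
    value (*a1) (value);
    value (*a2) (value, value);
  } function;
  int min_args, max_args;
  const char *lisp_name;
};

/* ".a ## maxargs" pastes the arity onto "a", selecting .a0, .a1 or .a2.
   The macro ends with "value fnname", so whatever follows the invocation
   becomes the parameter list and body of the function.  */
#define MINI_DEFUN(lname, fnname, sname, minargs, maxargs)   \
  static struct subr sname =                                 \
    { { .a ## maxargs = fnname }, minargs, maxargs, lname }; \
  value fnname

/* A prototype must be visible before the macro takes the address.  */
value Fadd (value, value);

MINI_DEFUN ("add", Fadd, Sadd, 2, 2) (value x, value y)
{
  return x + y;
}

/* Call the function through the record that the macro filled in.  */
value call_through_record (value x, value y)
{
  return Sadd.function.a2 (x, y);
}
```

With this in place, call_through_record (2, 3) goes through Sadd.function.a2 and ends up in Fadd. Running gcc -E on the file shows the initializer as { .a2 = Fadd }.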
Apparently .a ## maxargs becomes .a1 after preprocessing. This result raises questions. Where is this behavior defined: is it standard behavior specified by the C standard, or implementation behavior specified by GCC? And are there any other exotic special characters used by C macros?
A quick check of the C99 draft n1256.pdf [1, 2] reveals that this ## thing is indeed well defined as "The ## operator" in Section 6.10.3.3, along with "The # operator" in Section 6.10.3.2. While the standard is not self-explanatory to me, GNU provides examples that are easier to understand in "The C Preprocessor" [3], Sections 3.4 and 3.5.
Back to the earlier questions: # and ## are two operators recognized in macro definitions. # is used for stringizing and is only available in function-like macros; ## is used for token pasting, also called token concatenation. With those questions answered, it's time for more questions, answered with examples:
#define SAMPLEFUNC(a, b, c) \
  "a received as: " a \
  "b received as: " b \
  "c received as: " c

// Q: What is the argument value that each parameter receives?
// A: The tokens between the commas, with comments stripped
SAMPLEFUNC(what, is: /*this macro, foo*/, bar);
// "a received as: " what "b received as: " is: "c received as: " bar

// Q: What if not enough arguments are provided?
// A: Compilation error
SAMPLEFUNC(what, is);
// src/sample-func.c:37:20: error: too few arguments provided to function-like macro invocation
//    37 | SAMPLEFUNC(What, is);
//       |                    ^
// src/sample-func.c:27:9: note: macro 'SAMPLEFUNC' defined here
//    27 | #define SAMPLEFUNC(a, b, c) \
//       |         ^
//
// Q: Other than the comma, is there any other separator of macro arguments?
// A: Don't think so
SAMPLEFUNC(`~!@#$%^&*()--=+{}[]|\';''"":;<>,.,doc/**/);
// "a received as: " `~!@#$%^&*()--=+{}[]|\';''"":;<> "b received as: " . "c received as: " doc
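One refinement to the last answer: the comma is the only separator, but it only counts at the top level. Commas between matching inner parentheses do not separate arguments, while braces and brackets offer no such protection. The sketch below uses a hypothetical SHOW3 helper (my own name, not from any library) that stringizes each argument, so the result can be inspected as an ordinary C string instead of reading gcc -E output.

```c
/* SHOW3 is a hypothetical helper: it stringizes each argument so we can
   see exactly which tokens each parameter received.  */
#define SHOW3(a, b, c) "a=" #a " b=" #b " c=" #c

/* Comments are replaced by a space before macros are processed, so the
   comma inside the comment never acts as a separator.  */
const char *with_comment (void)
{
  return SHOW3 (what, is: /* this macro, foo */, bar);
}

/* Commas inside matching parentheses do not separate arguments.  */
const char *with_parens (void)
{
  return SHOW3 ((x, y), f(1, 2), z);
}

/* Braces do not protect a comma: "{1, 2}" is split into "{1" and "2}".  */
const char *with_braces (void)
{
  return SHOW3 ({1, 2}, z);
}
```

Here with_comment () returns "a=what b=is: c=bar", with_parens () returns "a=(x, y) b=f(1, 2) c=z", and with_braces () returns "a={1 b=2} c=z". The last one is why a brace-enclosed initializer passed as a macro argument usually needs an extra pair of parentheses or a variadic parameter.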
And with that, my curiosity was satisfied. Let's conclude this post with an example featuring stringizing, concatenation, and variadic arguments all at once.
#define foo bar
#define greetings(x, y, z, ...) \
  x ## y : # z : __VA_ARGS__

greetings(f, oo, baz, hi, hello, howdy);
// bar : "baz" : hi, hello, howdy;
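One detail the example above hints at: an argument that sits next to # or ## is not macro-expanded before the operator is applied. The pasted result foo is only turned into bar afterwards, when the replacement is rescanned. The usual workaround, also described in the GNU manual's stringizing section, is one extra level of macro. The names STR, XSTR, CAT, XCAT, foo1 and bar1 below are my own picks for this sketch.

```c
#define foo bar

#define STR(x)     #x          /* stringize the argument as written */
#define XSTR(x)    STR(x)      /* expand the argument first, then stringize */
#define CAT(x, y)  x ## y      /* paste the arguments as written */
#define XCAT(x, y) CAT(x, y)   /* expand the arguments first, then paste */

/* Two distinct variables so the pasted identifier can be told apart.  */
static const int foo1 = 1;
static const int bar1 = 2;

const char *str_direct (void)   { return STR(foo); }     /* "foo" */
const char *str_indirect (void) { return XSTR(foo); }    /* "bar" */
int cat_direct (void)           { return CAT(foo, 1); }  /* foo1 */
int cat_indirect (void)         { return XCAT(foo, 1); } /* bar1 */
```

So str_direct () returns "foo" while str_indirect () returns "bar", and cat_direct () reads foo1 while cat_indirect () reads bar1.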
Footnotes:
[1] C - Project status and milestones on open-std.org: https://open-std.org/jtc1/sc22/wg14/www/projects.html
[2] Technically, n1256.pdf is not the standard itself; it is merely a draft.
[3] Token concatenation: https://gcc.gnu.org/onlinedocs/cpp/Concatenation.html