The Basics of Procedures

What is a “procedure”?

In UNIT, a procedure refers to the UNIT_Procedure type, which holds instructions that will execute at runtime. In other words, a procedure is a function that we’ll be generating code for.

Hint

We use the term “procedure” instead of “function” to distinguish code that is compiled by UNIT from code that isn’t. A procedure is created by you and contains UNIT’s instructions, whereas a function could be something in the C standard library.

Creating the context

Before we can emit any instructions, we need to initialize a procedure. This can be done through UNIT_Procedure_Init() or UNIT_Procedure_New(). But if you look at the signatures of those functions, they take a UNIT_Context *. So, before we can create a procedure, we have to create a new context.

Assuming everything has been installed correctly, UNIT’s primary header file should exist at $INCLUDE_PATH/unit/unit.h. If we #include that, then everything in UNIT’s public C API will become available in the namespace.

So, using our prior knowledge about how we initialize structures (Init and New functions), let’s create a new context:

main.c
#include <unit/unit.h>

int main(void)
{
    UNIT_Context context;
    if (UNIT_FAILED(UNIT_Context_Init(&context))) {
        return 1;
    }

    UNIT_Context_Clear(&context);
    return 0;
}

When to use Init instead of New?

The primary difference between these two functions is where the structure is stored. New explicitly allocates on the heap, so only use it when you actually need the heap; in other words, if you know that a structure will outlive the current function, use New. Otherwise, use Init.

In this case, we use Init because we don’t need the context to outlive the main function.
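To make the distinction concrete, here is a minimal sketch of the Init/New pattern using a made-up Counter type. None of these names come from UNIT, and this tutorial does not show how UNIT releases a structure created with New, so Counter_Free is purely an assumption for illustration.

```c
#include <stdlib.h>

/* Hypothetical type used only for illustration; not part of UNIT. */
typedef struct {
    int *slots;   /* heap buffer owned by the counter */
    size_t count; /* number of slots in the buffer */
} Counter;

/* Init: the caller provides the storage (often a local variable).
 * Returns 0 on success and -1 on failure. */
int Counter_Init(Counter *counter, size_t count)
{
    counter->slots = calloc(count, sizeof(int));
    if (counter->slots == NULL) {
        return -1;
    }
    counter->count = count;
    return 0;
}

/* Clear: releases what Init acquired, but not the structure itself. */
void Counter_Clear(Counter *counter)
{
    free(counter->slots);
    counter->slots = NULL;
    counter->count = 0;
}

/* New: the structure itself lives on the heap, so it can outlive the caller. */
Counter *Counter_New(size_t count)
{
    Counter *counter = malloc(sizeof(Counter));
    if (counter == NULL) {
        return NULL;
    }
    if (Counter_Init(counter, count) < 0) {
        free(counter);
        return NULL;
    }
    return counter;
}

/* Free: the (assumed) counterpart of New. */
void Counter_Free(Counter *counter)
{
    if (counter != NULL) {
        Counter_Clear(counter);
        free(counter);
    }
}
```

A Counter declared as a local variable pairs Counter_Init with Counter_Clear, while one handed back from a helper function would pair Counter_New with Counter_Free.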

Before proceeding, let’s make sure this works. For this tutorial, we’ll be using GCC, but you can use whatever compiler you want, as long as it can find UNIT’s header files and link against the UNIT library.

So, let’s save the above as main.c and run it:

bash
$ gcc main.c -o out -lunit
$ ./out

Creating a procedure

At this point, we have a context that we can use, so we can now create our first procedure!

Following UNIT’s naming convention, procedures can be created through UNIT_Procedure_Init() and UNIT_Procedure_New(). Since we’re running everything inside the main function, we’ll use Init with stack memory:

main.c
#include <unit/unit.h>

int main(void)
{
    UNIT_Context context;
    if (UNIT_FAILED(UNIT_Context_Init(&context))) {
        return 1;
    }

    UNIT_Procedure procedure;
    if (UNIT_FAILED(UNIT_Procedure_Init(&procedure, &context, "main"))) {
        return 1;
    }

    UNIT_Procedure_Clear(&procedure);
    UNIT_Context_Clear(&context);
    return 0;
}

But wait, there’s a bug here! Even when UNIT_Procedure_Init fails, the context is still alive. Remember, we always need to call UNIT_Context_Clear after UNIT_Context_Init was successful, otherwise our program will leak memory.

In addition, UNIT_Procedure_Init sets an error message. For our own sanity as developers, let’s put a UNIT_PrintError before returning so we have some idea of what went wrong if something were to fail:

main.c
#include <stdio.h> /* for stderr */
#include <unit/unit.h>

int main(void)
{
    UNIT_Context context;
    if (UNIT_FAILED(UNIT_Context_Init(&context))) {
        return 1;
    }

    UNIT_Procedure procedure;
    if (UNIT_FAILED(UNIT_Procedure_Init(&procedure, &context, "main"))) {
        UNIT_PrintError(&context, stderr);
        UNIT_Context_Clear(&context);
        return 1;
    }

    UNIT_Procedure_Clear(&procedure);
    UNIT_Context_Clear(&context);
    return 0;
}

Emitting instructions

Now that we have all the boilerplate code out of the way, we can start emitting instructions!

UNIT uses a stack-based IR, or in other words, you write code for a stack machine, and then UNIT will translate it to machine code.

Caution

Be careful not to confuse UNIT’s operand stack with the stack present in CPUs. The term “stack” in “stack-allocated variable” does not mean the same thing as “stack” in “operand stack”.
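One way to picture the difference: an operand stack is an ordinary data structure that a stack machine manipulates, whereas the CPU stack is where C keeps local variables. The sketch below is a toy, not UNIT’s implementation; OperandStack and its functions are invented for illustration.

```c
#include <stddef.h>

#define OPERAND_STACK_MAX 16

/* Hypothetical operand stack: just an array plus a count. */
typedef struct {
    int values[OPERAND_STACK_MAX];
    size_t size; /* number of values currently on the stack */
} OperandStack;

/* Pushes a value; returns 0 on success, -1 if the stack is full. */
int OperandStack_Push(OperandStack *stack, int value)
{
    if (stack->size == OPERAND_STACK_MAX) {
        return -1;
    }
    stack->values[stack->size++] = value;
    return 0;
}

/* Pops the most recently pushed value into *out; returns -1 if empty. */
int OperandStack_Pop(OperandStack *stack, int *out)
{
    if (stack->size == 0) {
        return -1;
    }
    *out = stack->values[--stack->size];
    return 0;
}
```

Declaring an OperandStack as a local variable would make it a stack-allocated variable that happens to hold an operand stack, which shows that the two meanings of “stack” are independent.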

All instructions in UNIT share the two components common to stack-based instruction sets:

  1. The operation ID, often shortened to “opcode”.

  2. The operation argument, often shortened to “oparg”.

Let’s start with the operation ID and ignore the argument for now. In UNIT, all instructions are available in an enum called UNIT_Instruction. The values of this enum are prefixed with UNIT_OP_. But, how do we actually add instructions to the procedure?

Instructions can be added using a few functions under the UNIT_Procedure namespace, but for now, let’s focus on UNIT_Procedure_AddOperation(), which takes an opcode and an integer oparg. If we wanted to make a program that simply did return 0, we would need two instructions:

  1. UNIT_OP_LOAD_INTEGER, which pushes a constant integer onto the operand stack.

  2. UNIT_OP_RETURN_VALUE, which pops a value off the operand stack and returns it to the caller.

For LOAD_INTEGER, we need an oparg. This is the value that will be pushed onto the stack. In our case, this will be 0. For RETURN_VALUE, we don’t need an oparg, so we can pass any value we want. We’ll keep it simple and pass 0.
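To see how these two instructions interact with the operand stack, here is a toy interpreter. It is only a mental model: UNIT translates instructions to machine code rather than interpreting them, and the TOY_ opcodes, the ToyInstruction struct, and Toy_Run are all invented for this sketch.

```c
#include <stddef.h>

/* Hypothetical opcodes mirroring the two instructions described above. */
typedef enum {
    TOY_OP_LOAD_INTEGER,
    TOY_OP_RETURN_VALUE
} ToyOpcode;

/* An instruction is an opcode paired with an integer oparg. */
typedef struct {
    ToyOpcode opcode;
    int oparg;
} ToyInstruction;

/* Runs the instructions in order and stores the returned value in *result.
 * Returns 0 on success, -1 on a malformed program. */
int Toy_Run(const ToyInstruction *code, size_t length, int *result)
{
    int stack[8];
    size_t size = 0;

    for (size_t i = 0; i < length; ++i) {
        switch (code[i].opcode) {
        case TOY_OP_LOAD_INTEGER:
            if (size == sizeof(stack) / sizeof(stack[0])) {
                return -1; /* operand stack overflow */
            }
            stack[size++] = code[i].oparg; /* push the oparg */
            break;
        case TOY_OP_RETURN_VALUE:
            if (size == 0) {
                return -1; /* nothing to return */
            }
            *result = stack[--size]; /* pop; the oparg is ignored */
            return 0;
        }
    }
    return -1; /* fell off the end without returning */
}
```

In this model, a LOAD_INTEGER with an oparg of 0 followed by a RETURN_VALUE yields 0, which is the behavior we want from our procedure.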

Now, if we apply this to our code:

main.c
#include <stdio.h> /* for stderr */
#include <unit/unit.h>

int main(void)
{
    UNIT_Context context;
    if (UNIT_FAILED(UNIT_Context_Init(&context))) {
        return 1;
    }

    UNIT_Procedure procedure;
    if (UNIT_FAILED(UNIT_Procedure_Init(&procedure, &context, "main"))) {
        UNIT_PrintError(&context, stderr);
        UNIT_Context_Clear(&context);
        return 1;
    }

    /* Push the constant 0 onto the operand stack. */
    if (UNIT_FAILED(UNIT_Procedure_AddOperation(&procedure, UNIT_OP_LOAD_INTEGER, 0))) {
        UNIT_PrintError(&context, stderr);
        UNIT_Procedure_Clear(&procedure);
        UNIT_Context_Clear(&context);
        return 1;
    }

    /* Pop that value and return it; the oparg is unused here. */
    if (UNIT_FAILED(UNIT_Procedure_AddOperation(&procedure, UNIT_OP_RETURN_VALUE, 0))) {
        UNIT_PrintError(&context, stderr);
        UNIT_Procedure_Clear(&procedure);
        UNIT_Context_Clear(&context);
        return 1;
    }

    UNIT_Procedure_Clear(&procedure);
    UNIT_Context_Clear(&context);
    return 0;
}

But, this is really ugly. The error handling gets out of control very quickly, and this will only get worse as we add more instructions. We can clean this up by adding some macros tailored to our function, like so:

main.c
#include <stdio.h> /* for stderr */
#include <unit/unit.h>

int main(void)
{
    UNIT_Context context;
    if (UNIT_FAILED(UNIT_Context_Init(&context))) {
        return 1;
    }

    UNIT_Procedure procedure;
    if (UNIT_FAILED(UNIT_Procedure_Init(&procedure, &context, "main"))) {
        UNIT_PrintError(&context, stderr);
        UNIT_Context_Clear(&context);
        return 1;
    }

/* Adds an instruction; on failure, reports the error, cleans up, and returns. */
#define ADDOP_INT(op, value)                                                \
    if (UNIT_FAILED(UNIT_Procedure_AddOperation(&procedure, op, value))) {  \
        UNIT_PrintError(&context, stderr);                                  \
        UNIT_Procedure_Clear(&procedure);                                   \
        UNIT_Context_Clear(&context);                                       \
        return 1;                                                           \
    }

/* Shorthand for instructions that ignore their oparg. */
#define ADDOP(op) ADDOP_INT(op, 0)

    ADDOP_INT(UNIT_OP_LOAD_INTEGER, 0);
    ADDOP(UNIT_OP_RETURN_VALUE);

#undef ADDOP_INT
#undef ADDOP

    UNIT_Procedure_Clear(&procedure);
    UNIT_Context_Clear(&context);
    return 0;
}

Warning

Control flow in macros is a common source of bugs. Handle this with care.
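One well-known pitfall is a macro that expands to a bare if statement. When such a macro is used as the body of an if/else, the program either fails to compile or the else attaches to the macro’s hidden if, depending on whether a semicolon follows the macro call. A common defense, sketched below with made-up names (FAILED, CHECK, and Validate are not part of UNIT), is to wrap the body in do { ... } while (0) so that the macro behaves like a single statement.

```c
/* Hypothetical stand-in for a status check; not UNIT's macro. */
#define FAILED(status) ((status) < 0)

/* Wrapping in do { ... } while (0) makes CHECK(...) followed by a
 * semicolon behave as exactly one statement, even before an else. */
#define CHECK(status)           \
    do {                        \
        if (FAILED(status)) {   \
            return -1;          \
        }                       \
    } while (0)

/* Returns 1 when strict is zero (status is not checked), -1 when strict
 * is set and status signals failure, and 0 when the check passes. */
int Validate(int strict, int status)
{
    if (strict)
        CHECK(status);
    else
        return 1;
    return 0;
}
```

The hidden return is still there, so anyone reading a call to CHECK needs to know that it can leave the function early.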

The macro gymnastics aren’t super pretty, but this will help us a lot as we add more instructions.

Tip

Another way to consolidate error handling code is to add a label above the error handling code, and then jump to it with goto upon failure. For example:

#include <unit/unit.h>

int main(void)
{
    /* ... */
    if (UNIT_FAILED(UNIT_Procedure_AddOperation(&procedure, UNIT_OP_LOAD_INTEGER, 42))) {
        goto error;
    }
    /* ... */

    UNIT_Procedure_Clear(&procedure);
    UNIT_Context_Clear(&context);
    return 0;
error:
    /* Every failure after both structures exist lands here. */
    UNIT_Procedure_Clear(&procedure);
    UNIT_Context_Clear(&context);
    return 1;
}
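When resources are acquired in stages, a single error label may end up clearing something that was never initialized. One common arrangement, sketched here with plain malloc buffers standing in for the context and the procedure (BuildPair and its parameters are invented for illustration), uses one label per stage and lets the labels fall through in reverse order of acquisition.

```c
#include <stdlib.h>

/* Allocates two buffers, does some work, and releases everything on every
 * path. fail_at_work simulates a failure after both allocations succeed.
 * Returns 0 on success, 1 on failure. */
int BuildPair(size_t first_size, size_t second_size, int fail_at_work)
{
    int status = 1; /* assume failure until the work completes */
    char *first = NULL;
    char *second = NULL;

    first = malloc(first_size);
    if (first == NULL) {
        goto done; /* nothing acquired yet */
    }

    second = malloc(second_size);
    if (second == NULL) {
        goto free_first; /* only the first buffer exists */
    }

    if (fail_at_work) {
        goto free_second; /* both buffers exist */
    }
    status = 0;

    /* The success path falls through the same cleanup labels. */
free_second:
    free(second);
free_first:
    free(first);
done:
    return status;
}
```

Applied to our program, a failure in UNIT_Procedure_Init would jump to a label that clears only the context, while a failure in a later call would jump to one that clears the procedure first.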

Okay, let’s now try to compile and run our program.

bash
$ gcc main.c -o out -lunit
$ ./out
$ echo $?
0

Return code 0 – that makes sense. Let’s now try to return 1, just to confirm our code is running:

main.c
int main(void) {
    /* ... */

#define ADDOP_INT(op, value)                                                \
    if (UNIT_FAILED(UNIT_Procedure_AddOperation(&procedure, op, value))) {  \
        UNIT_PrintError(&context, stderr);                                  \
        UNIT_Procedure_Clear(&procedure);                                   \
        UNIT_Context_Clear(&context);                                       \
        return 1;                                                           \
    }

#define ADDOP(op) ADDOP_INT(op, 0)

    ADDOP_INT(UNIT_OP_LOAD_INTEGER, 1);
    ADDOP(UNIT_OP_RETURN_VALUE);

#undef ADDOP_INT
#undef ADDOP

    /* ... */
}

bash
$ gcc main.c -o out -lunit
$ ./out
$ echo $?
0

Huh? 0 again? What’s going on?

In the above code, all we did was create the procedure – we never actually compiled or executed it. Forgetting to run the generated code is a common mistake when developing a code generator.

So, how do we actually execute our instructions? We’ll talk about that next.