January 7, 2015

Iteration vs. Recursion, from scratch

I’ve read in the interwebs than iteration or recursion is a matter of choice, … that they are somewhat “equivalent”…

I don’t think so.

Let’s compare “Iteration” vs “Recursion” by analyzing which core concepts are required to implement each one of this concepts:

Let’s start from scratch:

Question: What do yo need to compute something, to follow an algorithm and reach a result?

We will try to establish a hierarchy of concepts, starting from scratch and defining in first place the basic, core concepts and after that the concepts which are constructed from previously defined concepts.

State: Memory cells, storage: to do something you need places to store final and intermediate result values. Let’s assume we have an infinite array of “integer” cells, called Memory, M[0..Infinite].
Instructions: do something - transform a cell, change its value. alter state. Every interesting instruction performs a transformation. Basic instructions are:

a) Set & move memory cells (alter State)
- store a value into memory, e.g.: store 5 m[4]
- copy a value to another position: e.g.: store m[4] m[8]
b) Logic and arithmetic
- and, or, xor, not
- add, sub, mul, div
A Executing Agent: a core in a modern CPU. An “agent” is something that can execute instructions. An Agent can also be a person following the algorithm on paper.
Steps: a sequence of instructions: i.e.: do this first, do this after, etc. An imperative sequence of instructions. Even one line expressions are an imperative sequence of instructions. If you have a expression with a specific “order of evaluation” then you have steps. It means than even a single composed expression has implicit “steps” and also an implicit local variable (let’s call it “result”).
```
4 + 3 * 2 - 5
(- (+ (* 3 2) 4 ) 5)
(sub (add (mul 3 2) 4 ) 5)  
```
The expression above implies 3 steps with an implicit “result” variable.
```
// pseudocode

       1. result = (mul 3 2)
       2. result = (add 4 result)
       3. result = (sub result 5)
```
So even infix expressions, if you have a specific order of evaluation, are an imperative sequence of instructions. The expression implies a sequence of operations to be made in a specific order, and because there are steps, there is also an implicit “result” intermediate variable.
Instruction Pointer: If you have a sequence of steps, you have also an implicit “instruction pointer”. The instruction pointer marks the next instruction, and advances after the instruction is read but before the instruction is executed.

The Instruction Pointer is part of Memory. (Note: Normally the Instruction Pointer will be a “special register” in a CPU core, but here we will simplify the concepts and assume all data (registers included) are part of “Memory”. It will simplify the definition of “State”, which is all the referenced memory values in a point of time)
Jump - Once you have an ordered number of steps and an Instruction Pointer, you can apply the “store” instruction to alter the value of the Instruction Pointer itself. We will call this specific use of the store instruction with a new name: Jump. We use a new name because is easier to think about it as a new concept. By altering the instruction pointer we’re instructing the agent to “go to step x“.
Infinite Iteration: By jumping back, now you can make the agent “repeat” a certain number of steps. At this point we have infinite Iteration.
```
                   1. mov 1000 m[30]
                   2. sub m[30] 1
                   3. jmp-to 2  // infinite loop
```
Conditional - Conditional execution of instructions. With the “conditional” clause, you can conditionally execute one of several instructions based on the current state (which can be set with a previous instruction).

Proper Iteration: Now with the conditional clause, we can escape the infinite loop of the jump back instruction. We have now a conditional loop and then proper Iteration

1. mov 1000 m[30]
2. sub m[30] 1
3. (if not-zero) jump 2  // jump only if the previous 
                        // sub instruction did not result in 0

// this loop will be repeated 1000 times
// here we have proper ***iteration***, a conditional loop.

Naming: giving names to a specific memory location holding data or holding a step. This is just a “convenience” to have. We do not add any new instructions by having the capacity to define “names” for memory locations. “Naming” is not a instruction for the agent, it’s just a convenience to us. Naming makes code (at this point) easier to read and easier to change.
```
   #define counter m[30]   // name a memory location
   mov 1000 counter
loop:                      // name a instruction pointer location
    sub counter 1
    (if not-zero) jmp-to loop  
```
one-level subroutine: Suppose there’s a series of steps you need to execute frequently. You can store the steps in a named position in memory and then jump to that position when you need to execute them (call) but at the end of the sequence you’ll need to return to the point of calling to continue execution. With this mechanism, you’re creating new instructions (subroutines) by composing core instructions.

Implementation: (no new concepts required)
- Store the current Instruction Pointer in a predefined memory position
- jump to the subroutine
- at the end of the subroutine, you retrieve the Instruction Pointer from the predefined memory location, effectively jumping back to the following instruction of the original call
Problem with the one-level implementation: You cannot call another subroutine from a subroutine. If you do, you’ll overwrite the returning address (global variable), so you cannot nest calls.

To have a better Implementation for subroutines: You need a STACK
STACK: You define a memory space to work as a “stack”, you can “push” values on the stack, and also “pop” the last “pushed” value. To implement a stack you’ll need a Stack Pointer (similar to the Instruction Pointer) which points to the actual “head” of the stack. When you “push” a value, the stack pointer increments and you store the value. When you “pop”, you get the value at the actual Stack Pointer and then the Stack Pointer is decremented.
Subroutines Now that we have a stack we can implement proper subroutines allowing nested calls. The implementation is similar, but instead of storing the Instruction Pointer in a predefined memory position, we “push” the value of the IP in the stack. *At the end of the subroutine, we just “pop” the value from the stack, effectively jumping back to the instruction after the original *call. This implementation, having a “stack” allows calling a subroutine from another subroutine. With this implementation we can create several levels of abstraction when defining *new instructions as subroutines, by using core instructions or another subroutines as building blocks.
Recursion: What happens when a subroutine calls itself?. This is called “recursion”. The “new” problem with “recursion” (at this point) are the local intermediate results a subroutine can be storing in memory. Since you are calling/reusing the same steps, if the intermediate result are stored in predefined memory locations (global variables) they will be overwritten on a the nested calls.

Solution: To allow recursion, subroutines should store local intermediate results in the stack, so on each recursive call (direct or indirect) the intermediate results are stored in different memory locations.

We will call this “Local State”: The intermediate values a subroutine uses are stored in a stack, so each execution of the subroutine has it’s own separate memory space, It’s “Local State”. In contrast, values stored at fixed memory locations will be “Global State”.

Conclusion: #

In a Von Neumann Architecture, clearly “iteration” is a simpler/basic concept than “recursion". We have a form of Iteration at level 7, while recursion is at level 14 of the concepts hierarchy.

Which one is “better”? #

You should use “iteration” when you are processing simple, sequential data structures, and everywhere a “simple loop” will do.
You should use “recursion” when you need to process a recursive data structure (I like to call them “Fractal Data Structures”), or when the recursive solution is clearly more “elegant”.

Advice: use the best tool for the job, but understand the inner workings of each tool in order to choose wisely.

Finally, note that you have plenty of opportunities to use recursion. You have recursive data structures everywhere, you’re looking at one now: parts of the DOM supporting what you are reading are a RDS, a JSON expression is a RDS, the hierarchical file system in your computer is a RDS, i.e: you have a root directory, containing files and directories, every directory containing files and directories, every one of those directories containing files and directories…

Kudos

Iteration vs. Recursion, from scratch

Conclusion: #

Which one is “better”? #

Now read this

Adding a new dimension to source code. Literally.