Lucio Tato

golang Duff's devices

2015-02-03T07:11:49-08:00

I’m just starting to scratch the surface of golang, and so far it has been a pleasant experience.

golang feels solid and fast.

Let me share a few golang internals in these notes as I stumble upon them.

Today I tripped over some golang source code that looks like a copy-paste frenzy, but is actually a very clever assembler trick to make the processor go as fast as possible.

The code is in runtime/asm_amd64.s and looks like this:

    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    ...

… repeated ~~128~~ 2⁷ times

and later:

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    ...

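… repeated, again, 2⁷ times (512 LOC)

It turns out that this is a clever trick called a “Duff’s device”: the loop is fully unrolled, and the compiler jumps to a computed address inside the routine so that only the needed number of instructions run before the RET. Wow! Today we’re learning very low-level stuff.

If this is the level of trickery in golang, no wonder it is so fast!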
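
To get a feel for the idea without reading assembler, here is a minimal sketch in JavaScript. It is not the Go runtime's code: it assumes an unrolled body of only 8 stores instead of 128, and it uses a `switch` with fall-through to stand in for the jump to a computed address. The names `duffZero8`, `words` and `next` are made up for this illustration.

```javascript
// Sketch only: an 8-entry "duffzero" emulated with switch fall-through.
// The real routine has 128 STOSQ instructions and is entered by a jump
// to a computed address; here `case n` plays the role of that address.
function duffZero8(mem, di, n) {
  switch (n) { // enter at the case that leaves exactly n stores to run
    case 8: mem[di++] = 0; // falls through
    case 7: mem[di++] = 0; // falls through
    case 6: mem[di++] = 0; // falls through
    case 5: mem[di++] = 0; // falls through
    case 4: mem[di++] = 0; // falls through
    case 3: mem[di++] = 0; // falls through
    case 2: mem[di++] = 0; // falls through
    case 1: mem[di++] = 0;
  }
  return di; // like DI in the assembler, the pointer is advanced
}

const words = [9, 9, 9, 9, 9, 9, 9, 9, 9, 9];
const next = duffZero8(words, 2, 5); // zero 5 cells starting at index 2
console.log(words.join(" "), "| next:", next);
// 9 9 0 0 0 0 0 9 9 9 | next: 7
```

There is no loop counter and no branch per element: the only decision is where to enter the straight-line code.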

I’ll paste the code at the end of this post, and you can google Duff’s device yourself, but before that, let me share a warning snippet I found in the source comments: keep your hands off vprintf! :)

In runtime/print1.go:

    // Very simple printf. Only for debugging prints.
    // Do not add to this without checking with Rob.
    func vprintf(str string, arg unsafe.Pointer)…

And now for something completely different: Here is the full code section with the Duff’s devices from runtime/asm_amd64.s, for your viewing pleasure:

// A Duff's device for zeroing memory. 
// The compiler jumps to computed addresses within
// this routine to zero chunks of memory.  Do not
// change this code without also changing the code
// in ../../cmd/6g/ggen.c:clearfat.
// AX: zero
// DI: ptr to memory to be zeroed
// DI is updated as a side effect.
TEXT runtime·duffzero(SB), NOSPLIT, $0-0
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    STOSQ
    RET

// A Duff's device for copying memory.
// The compiler jumps to computed addresses within
// this routine to copy chunks of memory.  Source
// and destination must not overlap.  Do not
// change this code without also changing the code
// in ../../cmd/6g/cgen.c:sgen.
// SI: ptr to source memory
// DI: ptr to destination memory
// SI and DI are updated as a side effect.

// NOTE: this is equivalent to a sequence of MOVSQ but
// for some reason that is 3.5x slower than this code.
// The STOSQ above seem fine, though.
TEXT runtime·duffcopy(SB), NOSPLIT, $0-0
    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    MOVQ    (SI),CX
    ADDQ    $8,SI
    MOVQ    CX,(DI)
    ADDQ    $8,DI

    RET

golang first impressions

2015-02-03T07:06:30-08:00

What have I done with golang so far?

Install on Linux? Great: fast, seamless.
Workspace organization? Great: simple, clean (wks/src, wks/bin, wks/pkg, and also wks/src/github.com/user/project).
Compile on Linux (small test programs)? Great: code auto-formatted, then a multi-threaded 64-bit executable produced and installed with a single command in milliseconds.
Get the golang source on Linux? No problem.
Compile the golang source on Linux? No problem (I did this to cross-compile the multi-thread test as a 32-bit exe on a 64-bit machine, to test torn reads… but that is for another post).
Install on Windows? Fast, seamless.
Compile the same small tests on Windows? No problem.

Iteration vs. Recursion, from scratch

2015-01-07T12:20:22-08:00

I’ve read on the interwebs that choosing between iteration and recursion is a matter of choice, … that they are somewhat “equivalent”…

I don’t think so.

Let’s compare “Iteration” vs “Recursion” by analyzing which core concepts are required to implement each of them.

Let’s start from scratch:

Question: What do you need to compute something, to follow an algorithm and reach a result?

We will try to establish a hierarchy of concepts, starting from scratch: first we define the basic, core concepts, and then the concepts that are built from previously defined ones.

1. State: Memory cells, storage. To do something you need places to store final and intermediate result values. Let’s assume we have an infinite array of “integer” cells, called Memory, M[0..Infinite].
2. Instructions: do something, i.e. transform a cell, change its value, alter state. Every interesting instruction performs a transformation. Basic instructions are:

a) Set & move memory cells (alter State)
- store a value into memory, e.g.: store 5 m[4]
- copy a value to another position, e.g.: store m[4] m[8]
b) Logic and arithmetic
- and, or, xor, not
- add, sub, mul, div
3. An Executing Agent: e.g. a core in a modern CPU. An “agent” is something that can execute instructions. An Agent can also be a person following the algorithm on paper.
4. Steps: a sequence of instructions, i.e.: do this first, do this after, etc. An imperative sequence of instructions. Even one-line expressions are an imperative sequence of instructions. If you have an expression with a specific “order of evaluation” then you have steps. It means that even a single composed expression has implicit “steps” and also an implicit local variable (let’s call it “result”).
```
4 + 3 * 2 - 5
(- (+ (* 3 2) 4 ) 5)
(sub (add (mul 3 2) 4 ) 5)  
```
The expression above implies 3 steps with an implicit “result” variable.
```
// pseudocode

       1. result = (mul 3 2)
       2. result = (add 4 result)
       3. result = (sub result 5)
```
So even infix expressions, if you have a specific order of evaluation, are an imperative sequence of instructions. The expression implies a sequence of operations to be made in a specific order, and because there are steps, there is also an implicit “result” intermediate variable.
5. Instruction Pointer: If you have a sequence of steps, you also have an implicit “instruction pointer”. The instruction pointer marks the next instruction, and advances after the instruction is read but before the instruction is executed.

The Instruction Pointer is part of Memory. (Note: Normally the Instruction Pointer will be a “special register” in a CPU core, but here we will simplify the concepts and assume all data (registers included) are part of “Memory”. This simplifies the definition of “State”, which is all the referenced memory values at a point in time.)
6. Jump: Once you have an ordered number of steps and an Instruction Pointer, you can apply the “store” instruction to alter the value of the Instruction Pointer itself. We will call this specific use of the store instruction by a new name: Jump. We use a new name because it is easier to think about it as a new concept. By altering the instruction pointer we’re instructing the agent to “go to step x”.
7. Infinite Iteration: By jumping back, you can now make the agent “repeat” a certain number of steps. At this point we have infinite Iteration.
```
                   1. mov 1000 m[30]
                   2. sub m[30] 1
                   3. jmp-to 2  // infinite loop
```
8. Conditional: Conditional execution of instructions. With the “conditional” clause, you can conditionally execute one of several instructions based on the current state (which can be set by a previous instruction).

9. Proper Iteration: Now, with the conditional clause, we can escape the infinite loop of the jump-back instruction. We now have a conditional loop, and therefore proper Iteration.

    1. mov 1000 m[30]
    2. sub m[30] 1
    3. (if not-zero) jmp-to 2  // jump only if the previous
                               // sub instruction did not result in 0

    // this loop will be repeated 1000 times
    // here we have proper iteration, a conditional loop.

10. Naming: giving names to a specific memory location holding data or holding a step. This is just a “convenience” to have. We do not add any new instructions by having the capacity to define “names” for memory locations. “Naming” is not an instruction for the agent, it’s just a convenience for us. Naming makes code (at this point) easier to read and easier to change.
```
   #define counter m[30]   // name a memory location
   mov 1000 counter
loop:                      // name an instruction pointer location
    sub counter 1
    (if not-zero) jmp-to loop  
```
11. One-level subroutine: Suppose there’s a series of steps you need to execute frequently. You can store the steps at a named position in memory and then jump to that position when you need to execute them (call), but at the end of the sequence you’ll need to return to the point of the call to continue execution. With this mechanism, you’re creating new instructions (subroutines) by composing core instructions.

Implementation: (no new concepts required)
- Store the current Instruction Pointer in a predefined memory position
- Jump to the subroutine
- At the end of the subroutine, retrieve the Instruction Pointer from the predefined memory location, effectively jumping back to the instruction following the original call
Problem with the one-level implementation: You cannot call another subroutine from a subroutine. If you do, you’ll overwrite the return address (a global variable), so you cannot nest calls.

To have a better implementation of subroutines, you need a STACK.
12. Stack: You define a memory space to work as a “stack”; you can “push” values onto the stack, and also “pop” the last “pushed” value. To implement a stack you’ll need a Stack Pointer (similar to the Instruction Pointer) which points to the current “head” of the stack. When you “push” a value, the Stack Pointer is incremented and you store the value. When you “pop”, you get the value at the current Stack Pointer and then the Stack Pointer is decremented.
13. Subroutines: Now that we have a stack we can implement proper subroutines allowing nested calls. The implementation is similar, but instead of storing the Instruction Pointer in a predefined memory position, we “push” the value of the IP onto the stack. At the end of the subroutine, we just “pop” the value from the stack, effectively jumping back to the instruction after the original call. This implementation, having a “stack”, allows calling a subroutine from another subroutine. With it we can create several levels of abstraction when defining new instructions as subroutines, using core instructions or other subroutines as building blocks.
14. Recursion: What happens when a subroutine calls itself? This is called “recursion”. The “new” problem with “recursion” (at this point) is the local intermediate results a subroutine may be storing in memory. Since you are calling/reusing the same steps, if the intermediate results are stored in predefined memory locations (global variables) they will be overwritten by the nested calls.

Solution: To allow recursion, subroutines should store local intermediate results on the stack, so on each recursive call (direct or indirect) the intermediate results are stored in different memory locations.

We will call this “Local State”: The intermediate values a subroutine uses are stored on a stack, so each execution of the subroutine has its own separate memory space, its “Local State”. In contrast, values stored at fixed memory locations will be “Global State”.
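
The whole ladder of concepts can be tried out on a toy machine. What follows is only a sketch under stated assumptions: the instruction names (`store`, `add`, `sub`, `jmp`, `jnz`, `call`, `ret`, `halt`), the memory layout (`IP`, `SP` and `FLAG` in cells 0 to 2, the stack from cell 33 up) and the three sample programs are all invented here for illustration.

```javascript
// Toy machine for the concepts above. Instruction names and the memory
// layout are assumptions of this sketch, not part of the original list.
const IP = 0;   // M[0]: Instruction Pointer (kept in Memory, as in the text)
const SP = 1;   // M[1]: Stack Pointer
const FLAG = 2; // M[2]: result of the last add/sub, read by "jnz"

function run(program) {
  const M = new Array(64).fill(0);
  M[SP] = 32; // the stack occupies cells 33 and up
  const push = (v) => { M[SP] += 1; M[M[SP]] = v; };
  const pop = () => { const v = M[M[SP]]; M[SP] -= 1; return v; };

  while (M[IP] < program.length) {
    const [op, a, b] = program[M[IP]];
    M[IP] += 1; // advance after reading, before executing
    switch (op) {
      case "store": M[b] = a; break;                   // store value a into M[b]
      case "add": M[a] += b; M[FLAG] = M[a]; break;
      case "sub": M[a] -= b; M[FLAG] = M[a]; break;
      case "jmp": M[IP] = a; break;                    // Jump: a store into the IP
      case "jnz": if (M[FLAG] !== 0) M[IP] = a; break; // Conditional jump
      case "call": push(M[IP]); M[IP] = a; break;      // remember where to return
      case "ret": M[IP] = pop(); break;                // jump back to the caller
      case "halt": return M;
    }
  }
  return M;
}

// Proper iteration: a conditional loop that runs 1000 times.
const loop = [
  ["store", 1000, 30], // 0: counter = 1000
  ["add", 31, 1],      // 1: M[31] counts the passes
  ["sub", 30, 1],      // 2: counter -= 1
  ["jnz", 1],          // 3: repeat while the counter is not zero
  ["halt"],            // 4
];

// Nested subroutine calls: return addresses live on the stack.
const nested = [
  ["call", 2],         // 0: call "outer"
  ["halt"],            // 1
  ["add", 40, 1],      // 2: outer
  ["call", 5],         // 3: nested call to "inner"
  ["ret"],             // 4: back to instruction 1
  ["add", 41, 10],     // 5: inner
  ["ret"],             // 6: back to instruction 4
];

// Recursion: the subroutine at 3 calls itself until the counter hits zero.
const recursive = [
  ["store", 3, 30],    // 0: counter = 3
  ["call", 3],         // 1
  ["halt"],            // 2
  ["add", 42, 1],      // 3: count this activation in M[42]
  ["sub", 30, 1],      // 4: counter -= 1
  ["jnz", 7],          // 5: not zero yet, go recurse
  ["ret"],             // 6: base case
  ["call", 3],         // 7: the recursive call
  ["ret"],             // 8
];

console.log(run(loop)[31]);                    // 1000
console.log(run(nested)[40], run(nested)[41]); // 1 10
console.log(run(recursive)[42]);               // 3
```

The recursive program gets away with a global counter because it needs nothing after the nested call returns. A subroutine that does need its intermediate values afterwards would have to push them as well, which is the Local State described above.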

Conclusion:

In a Von Neumann architecture, “iteration” is clearly a simpler, more basic concept than “recursion”. We have a form of Iteration at level 7, while Recursion is at level 14 of the concept hierarchy.

Which one is “better”?

You should use “iteration” when you are processing simple, sequential data structures, and everywhere a “simple loop” will do.
You should use “recursion” when you need to process a recursive data structure (I like to call them “Fractal Data Structures”), or when the recursive solution is clearly more “elegant”.

Advice: use the best tool for the job, but understand the inner workings of each tool in order to choose wisely.

Finally, note that you have plenty of opportunities to use recursion. You have recursive data structures everywhere, and you’re looking at one now: parts of the DOM supporting what you are reading are an RDS, a JSON expression is an RDS, and the hierarchical file system in your computer is an RDS, i.e. you have a root directory containing files and directories, every directory containing files and directories, every one of those directories containing files and directories…
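
As a small illustration, here is a sketch that counts the files in a directory-like tree both ways. The tree `root` and the functions `countFiles` and `countFilesIterative` are hypothetical; a node with a `children` array is treated as a directory, anything else as a file.

```javascript
// A hypothetical in-memory directory tree: a directory is an object
// with a `children` array, a file is an object without one.
const root = {
  name: "/",
  children: [
    { name: "notes.txt" },
    { name: "src", children: [
      { name: "main.go" },
      { name: "lib", children: [{ name: "util.go" }] },
    ] },
  ],
};

// Recursive version: the shape of the code mirrors the shape of the data.
function countFiles(node) {
  if (!node.children) return 1; // a file
  let total = 0;
  for (const child of node.children) total += countFiles(child);
  return total;
}

// Iterative version: same result, but the stack is managed by hand.
function countFilesIterative(node) {
  const stack = [node];
  let total = 0;
  while (stack.length > 0) {
    const current = stack.pop();
    if (!current.children) total += 1;
    else stack.push(...current.children);
  }
  return total;
}

console.log(countFiles(root), countFilesIterative(root)); // 3 3
```

The iterative version still needs a stack; it has only moved from the language runtime into the code.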

Adding a new dimension to source code. Literally.

2014-08-15T11:17:31-07:00

Note: I could have titled this “Curly braces considered harmful”, but titling like that is now considered harmful

Unidimensional source code

Source code for the most popular programming languages is unidimensional: just a unidimensional stream of Unicode code points. This is an artifact, if you like, of the Turing machine model or, if you prefer, it is related to the fact that programming languages are analyzed with the same tools as natural languages, and all natural languages, being “spoken”, are constrained by their transmission medium to be a unidimensional stream of sounds and silences. Computer languages, on the other hand, are rarely “spoken”, so they’re not naturally constrained to a unidimensional stream.

Let’s talk about C. For the C compiler, source is a stream of chars. A C compiler can compile a 100 KB source written on one line without line breaks. (Javascript engines do this thousands of times per minute, whenever somebody visits a page with minified source.) But nobody writes C (or Javascript) programs on a single line without line breaks. And we don’t just add line breaks: we properly indent after “{” and de-indent before “}”.

For the compiler, the source code is a unidimensional stream of chars. Whitespace and line breaks are ignored. The programmer, on the other hand, sees the source code as a bidimensional construction with lines and columns, and whitespace and indentation are fundamental for code readability.

Proper indentation is so important for the programmer that looking at code without it is like staring at a picture frame hanging crooked, and trying to resist fixing it.

This different “vision” between the compiler and the programmer creates a hidden source of big problems. The Apple SSL bug this year is directly related to this difference in “parsing” between the compiler and the programmer’s mind. This kind of bug can only slip through because of the unidimensional/bidimensional mismatch. You can make the same mistake of “pasting twice” the “goto fail;” statement, but if you do it in Python or LiteScript it will be just an extra, ignored statement, instead of a catastrophic bug lurking hidden in the indentation. In Python or LiteScript the blocks you see are also what the compiler sees.
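
The same trap can be reproduced in any brace language. The sketch below is not Apple's code: JavaScript has no goto, so a labeled block with `break fail` stands in for it, and `verifyBuggy`, `verifyFixed`, `checkA` and `checkB` are hypothetical names. A return value of 0 is assumed to mean "verified".

```javascript
// Brace-less `if`: only the first statement belongs to it.
function verifyBuggy(checkA, checkB) {
  let err = 0;
  fail: {
    if ((err = checkA()) !== 0)
      break fail;
      break fail; // pasted twice: indented like the `if` body, but always runs
    if ((err = checkB()) !== 0) // never reached
      break fail;
  }
  return err; // 0 means "verified"
}

// With explicit blocks, what the reader sees is what the engine sees.
function verifyFixed(checkA, checkB) {
  let err = 0;
  fail: {
    if ((err = checkA()) !== 0) {
      break fail;
      break fail; // the same paste is now an unreachable, harmless duplicate
    }
    if ((err = checkB()) !== 0) {
      break fail;
    }
  }
  return err;
}

const passes = () => 0;
const rejects = () => 1;
console.log(verifyBuggy(passes, rejects)); // 0: the failing check was skipped
console.log(verifyFixed(passes, rejects)); // 1
```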

A new dimension in the compiler adds benefits

I believe that choosing to use indentation for blocks, a.k.a. the off-side rule, is not only a “style” choice for a modern language; it also helps to avoid a large source of subtle, potentially highly problematic bugs.

Javascript, untangled

2014-07-01T15:34:05-07:00

Background

I’m working on LiteScript, a compile-to-JS language based on Javascript’s design (the good parts). LiteScript is also a compile-to-C language. In order to compile to fast C, it is necessary to untangle some JS concepts and limit some of JS’s dynamic features.

Untangling Javascript

Javascript is extremely powerful, and this power allows a programmer to write very clever code, and also shoot herself in the foot in very imaginative ways.

Classes are Functions, and Functions are Classes

In JS a “class” is simply a Function and every Function is a class. This is the first concept to untangle.
In LiteScript, a “Class” is a type of Object, different from a “Function”.

JS Function: Object containing executable code.
Function objects have properties: name, prototype, length.
Function objects have several methods: apply, call, bind, toString, etc.

JS Class: see Function.

LiteScript Function: executable code. Function objects have no properties, and have two methods: apply and call.

LiteScript Class: model for instances. Class objects have two properties: name and prototype.

We’re simply untangling the two concepts, but keeping the JS design. Since we now have a differentiated type “Class”, we moved the properties “name” and “prototype” to the new type.

In JS, the Function-Class is used as the instance initialization (the code initializing the instance property values). In LS, we separate out the instance initialization as the class constructor.
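
The JS side of this is easy to observe. In the sketch below, `Point` and `norm` are hypothetical names; everything else is standard JavaScript behavior.

```javascript
// One function object that is both executable code and a "class".
function Point(x, y) {
  this.x = x; // instance initialization lives in the function body
  this.y = y;
}
Point.prototype.norm = function () {
  return Math.sqrt(this.x * this.x + this.y * this.y);
};

const p = new Point(3, 4);
console.log(typeof Point);                                 // function
console.log(Point.name, Point.length);                     // Point 2
console.log(p instanceof Point);                           // true
console.log(Object.getPrototypeOf(p) === Point.prototype); // true
console.log(p.norm());                                     // 5
```

`name`, `length` and `prototype` all hang off the same function object, which is the tangle the text describes.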

To new or not to new

In JS you may call a function with “new” or you may not. Sometimes, if the function is mainly a class, the function will take care of this and will create an instance for you (by calling itself with new); sometimes it will not. There are some core JS classes which do not respect this “initialize instance” model, most notably the “Error” Function-Class.

In LiteScript (if you want to compile-to-C later) the only way to create an instance is to use “new” with a Class. Also, all core classes respect the “initialize provided instance” model. Moreover, the instance initialization function does not return a value, so it cannot return a different “this” than the provided one.
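
The three JS behaviors mentioned above can be seen side by side in a short sketch. `Counter`, `Fragile` and `MyError` are hypothetical names; `Fragile` opts into strict mode so that a missing `new` fails loudly instead of writing to the global object.

```javascript
// A constructor that repairs a call made without `new`.
function Counter(start) {
  if (!(this instanceof Counter)) return new Counter(start);
  this.value = start;
}

// A constructor that does not protect itself (strict mode, so `this`
// is undefined when `new` is missing).
function Fragile(start) {
  "use strict";
  this.value = start;
}

// Error ignores the `this` it is given and builds a new object instead.
function MyError(message) {
  Error.call(this, message);
}

console.log(Counter(5).value, new Counter(5).value); // 5 5
let threw = false;
try { Fragile(5); } catch (e) { threw = e instanceof TypeError; }
console.log(threw);                                  // true
console.log(Error("boom") instanceof Error);         // true
console.log(new MyError("boom").message);            // undefined
```

The last line is the “Error” case: calling it on a provided instance initializes nothing.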

Untangling Objects and Dictionaries

This is one of the most powerful features of JS. JS is self-reflective by default. All Objects are really “bags of dynamic properties”. This is also one of the reasons why normal JS code is slow compared to compiled code, despite great advances in JS engines like V8.

Many JS programs use objects as “Dictionaries”, to have fast access to a name:value pair.

In the ECMAScript 6 specification, a new concept, “Map”, is incorporated into JS. From the Map’s viewpoint, an Object is just a Map string=>any.
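
A short sketch of the two styles in plain JS (the variable names `ages` and `agesMap` are hypothetical):

```javascript
// A plain object used as a dictionary: keys are property names.
const ages = {};
ages["alice"] = 30;
ages.bob = 25;
delete ages.bob;                      // properties can be removed at runtime
console.log(Object.keys(ages));       // [ 'alice' ]
console.log("toString" in ages);      // true: inherited properties leak in

// ES6 Map: a dedicated dictionary type.
const agesMap = new Map();
agesMap.set("alice", 30);
agesMap.set("bob", 25);
agesMap.delete("bob");
console.log([...agesMap.keys()]);     // [ 'alice' ]
console.log(agesMap.has("toString")); // false
```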

LiteScript: Objects and Maps

In LiteScript, since we want to compile-to-C, we need to untangle Map and Object.

JS Object: bag of dynamic properties. You can access properties with the Property Access Operator, a dot. You can add or delete properties at runtime.

Example of the PropertyAccess operator:

obj.someProp.anotherSubProp

JS Map (ES6): bag of dynamic properties.

LiteScript Object: instance of a Class. It has a predefined list of properties. You can access properties with the Property Access Operator, a dot. You cannot add new properties at runtime.

LiteScript Map: bag of dynamic properties, a Dictionary.

So in LiteScript, you use objects as you define them in Class declarations and, if you need a “Dictionary”, you use a Map. In order to rewrite JS code in LiteScript, you need to use defined classes instead of patched Objects, and you can’t rely on being able to add or remove Object properties at runtime. Normally you have to declare and extend classes instead of adding or removing properties from Objects.
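
Plain JS can approximate this discipline. The sketch below is an analogy only, and says nothing about how LiteScript implements its objects: it assumes a hypothetical `Account` class whose instances are sealed with `Object.seal` once initialized, and a `Map` named `extras` for the dynamic part.

```javascript
// Plain-JS approximation, an assumption of this sketch rather than
// LiteScript's implementation: instances are sealed once initialized.
class Account {
  constructor(owner) {
    this.owner = owner;
    this.balance = 0;
    Object.seal(this); // from here on, no properties can be added or removed
  }
}

const account = new Account("ana");
account.balance = 10;                                  // declared properties stay writable
console.log(Reflect.set(account, "nickname", "a"));    // false: cannot add
console.log(Reflect.deleteProperty(account, "owner")); // false: cannot remove
console.log(Object.keys(account));                     // [ 'owner', 'balance' ]

// Dynamic name:value pairs go into a Map instead.
const extras = new Map();
extras.set("nickname", "a");
console.log(extras.get("nickname"));                   // a
```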

Performance gains

The LiteScript compiler is written in LiteScript. Today, at v0.8.5, the LiteScript compiler can compile itself to JS code and also to C code.

The LiteScript compiler can be compiled to JS in order to run under node.js or in the browser, and it can be compiled to C in order to run as a standalone executable. When compiled to C, the LiteScript compiler can self-compile 5x-7x faster than the JS version of the same source.

Conclusion

By untangling concepts, incorporating “Class” and limiting “Object”, we’re able to have a language, LiteScript, which is heavily inspired by Javascript, but which can be compiled-to-C when better performance is required.

Status

LiteScript is in beta, and compile-to-C is at the alpha stage. Help from hackers is needed and will be very well received. http://github.com/luciotato/litescript

Keep your coder's mind at full speed: avoid mental branch mispredictions

2014-03-07T13:01:42-08:00

Brains and CPUs

One can draw an analogy between the human brain and a CPU, but
the analogy is never more valid than in the case of a programmer
reading source code.

The coder’s mind

When reading source code, trying to understand what the code does,
the mind of a trained programmer works much like a CPU.

We’re “following” (executing) the source code in our minds,
trying to determine what the program does.

..then calls this function, stores result here, if it’s less than 10, then it…

Branch Misprediction

Do you know what a “branch misprediction” is?

Branch misprediction is an interesting artifact of CPU execution,
since its effects can “bubble” up to high-level languages.

They say “A picture is worth a thousand words”. Well, this explanation is illustrated with a picture, and as of today it seems to be worth more than ten thousand upvotes.

Please go now and read this fantastic answer
at StackOverflow.

MBM: Mental Branch Misprediction

Now, merging both concepts, a Mental Branch Misprediction
is what occurs in the coder’s mind when the language of the source code we’re reading has some specific constructions that cause our minds to “mispredict” execution and lose our train of thought.

An example of this kind of syntax is the “if modifier”/“if postfix form”
of Ruby and CoffeeScript (very nice languages).

Here’s an example from the CoffeeScript main page:

number   = 42
opposite = true

# Conditions:
number = -42 if opposite

… ok, number=42, opposite=true, now number= - 42… WAIT!… only IF “opposite” then number = - 42…

The “WAIT” was a branch misprediction in your brain: you had to stop, pull back the train, and parse again.

It was a simple statement, but the more complex the code, the more there is to backtrack. Read the following code:

number   = 42
opposite = true
number = complexFunction.call(obj, oldValue, oldValue+100, baseValue*getFactor()) unless opposite

From this we can derive a simple rule to avoid mental branch misprediction: all conditional evaluations should precede the conditionally executed statements.

JavaScript (another wonderful language) does not suffer from this,
but it presents very complex branch-prediction cases when you use async callbacks and closures.

Example: (node.js)

fs.readFile('test', function(err,data){
    console.log(data.toString());
    if (data.length>1000){ 
        console.log('more than a thousand chars')
    };
});
console.log('data read started');

Again, a very simple program, but follow me while we try to use the track junction analogy for it. Suppose the train is the execution thread:

The train passes by the junction and goes straight on, saying “read this file please”. At some future time, the train magically reappears running at full speed, but now on the diverging track, just after the junction. It then follows this short track, and disappears into nowhere at the end of it.

That was just a funny example. Async is really great, and the perfect tool for a myriad of cases, but sometimes you do not want or need async, and a callback only adds a magic future branch to the source code, and trains appearing and disappearing to nowhere.
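
The order the train story describes can be observed directly. The sketch below assumes a hypothetical `fakeReadFile`, a stand-in for `fs.readFile` built on `setImmediate` so that no file is needed, and records what happens in an `events` array.

```javascript
// `fakeReadFile` is a hypothetical stand-in for fs.readFile: it calls
// back on a later turn of the event loop, never synchronously.
function fakeReadFile(name, callback) {
  setImmediate(() => callback(null, "contents of " + name));
}

const events = [];
fakeReadFile("test", (err, data) => {
  events.push("callback: " + data); // the train "reappears" here, later
});
events.push("data read started");   // runs first, although written last

setImmediate(() => console.log(events));
// [ 'data read started', 'callback: contents of test' ]
```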

It can get even worse: when enough async callbacks accumulate you get
the “pyramid of doom” and end up in “callback hell”.

So: “async is really good”, but not all the time.

I love to code, I’ve been programming for a lot of years, and I’ve reached a stage where I really appreciate simplicity, and know how hard it is to achieve. (This blog platform, svbtle, is a good example of “hard to achieve simplicity”.)

I am designing a language, LiteScript, while coding a PEG-based compiler/transpiler for it. One of the design objectives is the kind of simplicity that lets a coder’s mind flow.

The previous async example can be made much more readable,
without losing the async advantages, by using ES6 generators and LiteScript’s “yield until”.

For example, in the nice function below, the callback magic junction is gone and your train of thought can go at full speed.

nice function sequentialRead
    console.log('data read started');
    var data = yield until fs.readFile('test')
    print data.toString()
    if data.length>1000, print 'more than a thousand chars'
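
For comparison, plain JavaScript later gained a construct with the same reading order. The sketch below swaps in `async`/`await` and Node's `fs.promises.readFile` for LiteScript's “yield until”. It is not what LiteScript generates, and the temporary file `demoFile` exists only to make the example runnable. It returns the lines instead of printing them so the result can be inspected.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Plain-JS counterpart of the LiteScript function above, using async/await
// (swapped in here for "yield until").
async function sequentialRead(file) {
  const lines = ["data read started"];
  const data = await fs.promises.readFile(file); // execution pauses here
  lines.push(data.toString());
  if (data.length > 1000) lines.push("more than a thousand chars");
  return lines;
}

// Usage: write a small temporary file, then read it back.
const demoFile = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "read-")), "test");
fs.writeFileSync(demoFile, "hello");
sequentialRead(demoFile).then((lines) => console.log(lines));
// [ 'data read started', 'hello' ]
```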

Please check out LiteScript. It is open source, brand new, still in beta, and I’d appreciate contributions.

A teaser: write the following in pure JS (node)

Get google.com IPs, then reverse DNS (in parallel)

LiteScript code:

global import dns, nicegen

nice function resolveAndParallelReverse

    try

        var domain = "google.com"

        var addresses:array = yield until dns.resolve domain

        var results = yield parallel map addresses dns.reverse 

        for each index,addr in addresses
            print "#{addr} reverse: #{results[index]}"

    catch err
        print "caught:", err.stack

end nice function
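
One possible answer to the teaser with a later Node.js, offered as a sketch rather than as the intended solution: it relies on `async`/`await`, `Promise.all` and the promise-based `dns.promises` API. The `resolver` parameter is an assumption added here so the function can be exercised without network access, and the function returns the lines instead of printing them.

```javascript
const dns = require("dns");

// `resolver` defaults to Node's promise-based DNS API; it is a parameter
// only so the sketch can be tried with a fake resolver.
async function resolveAndParallelReverse(domain, resolver = dns.promises) {
  try {
    const addresses = await resolver.resolve(domain);
    // Start every reverse lookup at once, then wait for all of them.
    const results = await Promise.all(addresses.map((addr) => resolver.reverse(addr)));
    return addresses.map((addr, index) => `${addr} reverse: ${results[index]}`);
  } catch (err) {
    console.log("caught:", err.stack);
    return [];
  }
}

// Usage (needs network access):
// resolveAndParallelReverse("google.com").then((lines) => lines.forEach((l) => console.log(l)));
```

Without generators or `async`/`await`, the same program needs a counter of pending callbacks and an error flag, which is the point of the teaser.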