<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Byteworm</title>
	<atom:link href="http://byteworm.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://byteworm.com</link>
	<description>Honk if you&#039;re compiling</description>
	<lastBuildDate>Sun, 26 Feb 2012 02:58:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>The Fastest VM Bytecode Interpreter</title>
		<link>http://byteworm.com/2010/11/21/the-fastest-vm-bytecode-interpreter/</link>
		<comments>http://byteworm.com/2010/11/21/the-fastest-vm-bytecode-interpreter/#comments</comments>
		<pubDate>Sun, 21 Nov 2010 14:24:07 +0000</pubDate>
		<dc:creator>bysin</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://byteworm.com/?p=91</guid>
		<description><![CDATA[Mat and I were scanning through GitHub one day and a pretty lengthy, complex piece of code caught our eye (Caution: Do not read if you&#8217;re prone to seizures or have a heart condition). This code is one of the many intricacies involved in Mono&#8217;s bytecode interpreter, and it was beautiful, at least to us. [...]]]></description>
			<content:encoded><![CDATA[<p>Mat and I were scanning through <i>GitHub</i> one day when a pretty lengthy, complex piece of code <a href="https://github.com/mono/mono/blob/master/mono/interpreter/transform.c">caught our eye</a> (<b>Caution</b>: Do not read if you&#8217;re prone to seizures or have a heart condition). This code is one of the many intricacies of Mono&#8217;s bytecode interpreter, and it was beautiful, at least to us. Why must it be so complex? How hard can it be? After a lengthy discussion, we decided the best thing to do was to have a <b>competition</b> to see which of us could write the <b>fastest VM bytecode interpreter</b> in two hours. A few insults later (<i>&#8220;Your mother&#8217;s filesystem is so fat etc.&#8221;</i>), we set a few ground rules and agreed on a benchmark.</p>
<p>I came up with a pretty simple piece of code that contains addition, multiplication, and conditional branches. I then generated a generic bytecode equivalent of the benchmark for our interpreters to run.</p>
<table>
<tr>
<th>Benchmark in C</th>
<th colspan="2">Benchmark in Pseudo-bytecode</th>
</tr>
<tr>
<td width="50%">
<pre class="brush: cpp;">
int s, i, j;
for (s=0,i=0;i&lt;10000;i++) {
	for (j=0;j&lt;10000;j++)
		s+=i*j;
}
printf(&quot;%d\n&quot;,s);
</pre>
</td>
<td>
<b>0:</b> load 0<br />
<b>1:</b> store r0<br />
<b>2:</b> load 0<br />
<b>3:</b> store r1<br />
<b>4:</b> jmp 26<br />
<b>5:</b> load 0<br />
<b>6:</b> store r2<br />
<b>7:</b> jmp 18<br />
<b>8:</b> load r0<br />
<b>9:</b> load r1<br />
<b>10:</b> load r2<br />
<b>11:</b> mul<br />
<b>12:</b> add<br />
<b>13:</b> store r0<br />
<b>14:</b> load r2<br />
<b>15:</b> load 1
</td>
<td>
<b>16:</b> add<br />
<b>17:</b> store r2<br />
<b>18:</b> load r2<br />
<b>19:</b> load 10000<br />
<b>20:</b> cmp<br />
<b>21:</b> blt 8<br />
<b>22:</b> load r1<br />
<b>23:</b> load 1<br />
<b>24:</b> add<br />
<b>25:</b> store r1<br />
<b>26:</b> load r1<br />
<b>27:</b> load 10000<br />
<b>28:</b> cmp<br />
<b>29:</b> blt 5<br />
<b>30:</b> eof
</td>
</tr>
</table>
<p>As a control, the C benchmark runs on the native machine at an average time of <b>0.233 secs</b>. In my first attempt, I wrote a simple C program that reads each instruction and jumps to a corresponding block of code.</p>
<pre class="brush: cpp;">
...
int vm(inst *i) {
	static void *optable[]={
		[OP_NOP]=&amp;&amp;op_nop,	[OP_LDI]=&amp;&amp;op_ldi,	[OP_LDR]=&amp;&amp;op_ldr,
		[OP_STO]=&amp;&amp;op_sto,	[OP_ADD]=&amp;&amp;op_add,	[OP_SUB]=&amp;&amp;op_sub,
		[OP_MUL]=&amp;&amp;op_mul,	[OP_DIV]=&amp;&amp;op_div,	[OP_MOD]=&amp;&amp;op_mod,
		[OP_ORR]=&amp;&amp;op_orr,	[OP_XOR]=&amp;&amp;op_xor,	[OP_AND]=&amp;&amp;op_and,
		[OP_SHL]=&amp;&amp;op_shl,	[OP_SHR]=&amp;&amp;op_shr,	[OP_NOT]=&amp;&amp;op_not,
		[OP_NEG]=&amp;&amp;op_neg,	[OP_CMP]=&amp;&amp;op_cmp,	[OP_BEQ]=&amp;&amp;op_beq,
		[OP_BNE]=&amp;&amp;op_bne,	[OP_BGT]=&amp;&amp;op_bgt,	[OP_BLT]=&amp;&amp;op_blt,
		[OP_BGE]=&amp;&amp;op_bge,	[OP_BLE]=&amp;&amp;op_ble,	[OP_CAL]=&amp;&amp;op_cal,
		[OP_JMP]=&amp;&amp;op_jmp,	[OP_RET]=&amp;&amp;op_ret,	[OP_EOF]=&amp;&amp;op_eof
	};
	int r[4], s[32], *sp=s;	/* registers, operand stack, stack pointer */
	inst *ip=i;		/* instruction pointer */
	...
	op_nop:                         goto *(++ip)-&gt;jmp;
	op_ldi:     *sp++=ip-&gt;arg;      goto *(++ip)-&gt;jmp;
	op_ldr:     *sp++=r[ip-&gt;arg];   goto *(++ip)-&gt;jmp;
	op_sto:     r[ip-&gt;arg]=*--sp;   goto *(++ip)-&gt;jmp;
	op_add:     sp--, sp[-1]+=*sp;  goto *(++ip)-&gt;jmp;
	op_sub:     sp--, sp[-1]-=*sp;  goto *(++ip)-&gt;jmp;
	op_mul:     sp--, sp[-1]*=*sp;  goto *(++ip)-&gt;jmp;
	...
}
</pre>
<p>Click here to download the <a href="/dump/simvm-slow.c">Full Source Code</a>.</p>
<p>This works by creating an array of <b>goto pointers</b> (label addresses, a gcc extension known as labels as values or computed goto), a pseudo-stack, and a list of registers, then jumping from one instruction&#8217;s handler to the next while incrementing the instruction pointer. This simple virtual machine executed the bytecode in <b>6.421 secs</b>, which was way too slow for my taste, so I had to figure out another approach.</p>
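<p>For readers who want to try the dispatch trick on its own, here is a minimal sketch of the same technique. It is not the competition code: the four-opcode instruction set and the names <i>toy_inst</i> and <i>toy_run</i> are invented for illustration, there is no stack bounds checking, and the labels-as-values syntax assumes gcc or a compatible compiler.</p>
<pre class="brush: cpp;">
#include &lt;stddef.h&gt;

/* Hypothetical four-opcode instruction set, for illustration only */
enum toy_op { TOY_PUSH, TOY_ADD, TOY_MUL, TOY_HALT };

typedef struct {
	enum toy_op op;	/* opcode */
	int arg;	/* immediate operand, used by TOY_PUSH */
	void *jmp;	/* handler address, filled in by toy_run */
} toy_inst;

/* Runs n instructions ending in TOY_HALT and returns the top of the stack */
int toy_run(toy_inst *prog, size_t n) {
	static void *optable[]={
		[TOY_PUSH]=&amp;&amp;op_push,	[TOY_ADD]=&amp;&amp;op_add,
		[TOY_MUL]=&amp;&amp;op_mul,	[TOY_HALT]=&amp;&amp;op_halt
	};
	int s[32], *sp=s;
	toy_inst *ip;
	size_t k;

	/* Resolve every opcode to its handler address once, up front */
	for (k=0;k&lt;n;k++)
		prog[k].jmp=optable[prog[k].op];

	/* Dispatch: each handler jumps straight to the next one */
	ip=prog;
	goto *ip-&gt;jmp;

	op_push:	*sp++=ip-&gt;arg;		goto *(++ip)-&gt;jmp;
	op_add:		sp--; sp[-1]+=*sp;	goto *(++ip)-&gt;jmp;
	op_mul:		sp--; sp[-1]*=*sp;	goto *(++ip)-&gt;jmp;
	op_halt:	return sp[-1];
}
</pre>
<p>Running the program push 6, push 7, mul, halt through <i>toy_run</i> returns 42. Because each handler ends with its own indirect jump, there is no central switch statement to return to after every instruction.</p>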
<p>Why don&#8217;t I just compile the bytecode into x86 machine code, like modern JIT VMs? That could easily be my ticket to victory. I had about an hour left in the competition, so I made haste. I replaced the optable full of goto addresses with a table of <b>x86 instructions</b>, then allocated some executable memory, copied the instructions for each bytecode op into it, and jumped to it.</p>
<pre class="brush: cpp;">
...
int vm(inst *i) {
	struct {
		int32_t size, arg, jmp;
		char data[16];
	} *op, optable[]={
		INS(op_nop, 0, 0, 0, 0x90),
		INS(op_ldi, 4, 1, 0, 0x68),
		INS(op_ld0, 0, 0, 0, 0x53),
		...
		INS(op_add, 0, 0, 0, 0x58, LONG 0x01, 0x04, 0x24),
		INS(op_sub, 0, 0, 0, 0x58, LONG 0x29, 0x04, 0x24),
		INS(op_mul, 0, 0, 0, 0x5a, LONG 0x8b, 0x04, 0x24, LONG 0x0f,
			0xaf, 0xc2, LONG 0x89, 0x04, 0x24),
		...
		INS(op_ble, 4, 0, 2, 0x0f, 0x8e),
		INS(op_cal, 4, 0, 1, 0xe8),
		INS(op_jmp, 4, 0, 1, 0xe9),
		INS(op_ret, 0, 0, 0, 0xc3),
		...
	};
	...
	if ((pn=mmap(0,m,PROT_READ|PROT_WRITE|PROT_EXEC,MAP_PRIVATE|MAP_ANON,-1,0))==MAP_FAILED)
		return 0;
	...
	((void(*)())pn)();
	printf(&quot;%d\n&quot;,r0);
	return 0;
}
</pre>
<p>Click here to download the <a href="/dump/simvm.c">Full Source Code</a>.</p>
<p>At runtime this generated a small, 122-byte x86 program from the benchmark bytecode, which clocked in at an average time of <b>0.518 secs</b>. That was only a little over twice as slow as the control, so I was fairly confident at this point.</p>
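<p>The allocate, copy, and jump steps can be tried in isolation. The sketch below is a toy under stated assumptions rather than the code generator from the competition entry: the function name <i>run_generated_code</i> is made up, the six bytes are a hand-assembled <i>mov eax, 42; ret</i>, and it assumes an x86 CPU and an operating system that allows memory to be writable and executable at the same time.</p>
<pre class="brush: cpp;">
#include &lt;string.h&gt;
#include &lt;sys/mman.h&gt;

/* Copies a tiny hand-assembled x86 routine into executable memory and
   calls it. Stores the routine's return value in *out and returns 0,
   or returns -1 if the CPU is not x86 or the mapping is refused. */
int run_generated_code(int *out) {
#if defined(__i386__) || defined(__x86_64__)
	/* mov eax, 42 ; ret */
	static const unsigned char code[]={0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3};
	void *pn;

	/* mmap reports failure with MAP_FAILED, not with a null pointer */
	pn=mmap(0,sizeof(code),PROT_READ|PROT_WRITE|PROT_EXEC,
		MAP_PRIVATE|MAP_ANON,-1,0);
	if (pn==MAP_FAILED)
		return -1;
	memcpy(pn,code,sizeof(code));
	*out=((int(*)(void))pn)();
	munmap(pn,sizeof(code));
	return 0;
#else
	(void)out;
	return -1;
#endif
}
</pre>
<p>On hardened systems that refuse writable and executable mappings, the function simply returns -1 instead of running the generated bytes.</p>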
<p>I slickly inquired into what Mat was working on, and he informed me he was writing his bytecode interpreter in <b>Visual Basic.NET</b>. I was a bit skeptical at first, considering he did not know Visual Basic, but he reassured me he wasn&#8217;t joking. Evidently he taught himself Visual Basic in the span of two hours, in what amounts to the <i>ultimate coding troll</i>. He&#8217;s not one to lose these competitions, so I assumed he had some trick up his sleeve. He submitted his code for approval:</p>
<pre class="brush: vb;">
	...
	For ip = 0 to ops.Length - 1
		Dim i as VMI = ops(ip)
		il.MarkLabel(jmp(ip))
		Select Case i.Opcode
			Case &quot;ldi&quot;
				il.Emit(Opcodes.Ldc_i4, i.Operand)
			Case &quot;ldr&quot;
				il.Emit(Opcodes.Ldloc, i.Operand)
			Case &quot;sto&quot;
				il.Emit(Opcodes.Stloc, i.Operand)
			Case &quot;jmp&quot;
				il.Emit(Opcodes.Br_S, jmp(i.operand))
			Case &quot;mul&quot;
				il.Emit(Opcodes.Mul)
			Case &quot;add&quot;
				il.Emit(Opcodes.Add)
			Case &quot;eof&quot;
				il.Emit(Opcodes.Ldloc, 0)
				il.Emit(Opcodes.Ret)
			Case &quot;cmp&quot;
				Select Case (ops(ip+1).opcode)
					Case &quot;blt&quot;
						il.Emit(Opcodes.Blt, jmp(ops(ip+1).operand))
					Case Else
						Console.WriteLine(&quot;unsupported branch: {0}&quot;, ops(ip+1).opcode)
				End Select
				ip = ip + 1
			Case Else
				Console.WriteLine(&quot;unsupported opcode: {0}&quot;, i.Opcode)
		End Select
	Next

	Console.WriteLine(&quot;{0}&quot;, _
		CType(program.CreateDelegate(GetType(tmplP1(Of Long, Integer))), tmplP1(Of Long, Integer))(0))
	...
</pre>
<p>Click here to download the <a href="/dump/simvm.vb">Full Source Code</a>.</p>
<p>Once compiled, his code ran the benchmark in an average time of <b>0.127 secs</b>&#8230; Wait, what?</p>
<pre>
# vbnc simvm.vb &#038;&#038; time mono simvm.exe
857419840

real	0m0.127s
user	0m0.120s
sys	0m0.000s
</pre>
<p>I wouldn&#8217;t have believed it if I hadn&#8217;t seen it myself. My code generates native Assembly&#8230; Assembly! And his is written in Visual Basic. I&#8217;m sure there is some trickery going on, like Mono&#8217;s JIT optimizing the emitted IL instructions, but I haven&#8217;t as of yet ruled out witchcraft. I was forced to conclude that <b>Visual Basic is faster than Assembly</b>, that I&#8217;m a horrible coder, and that Mat wins.</p>
<table>
<tr>
<th>Program</th>
<th>Benchmark Time</th>
</tr>
<tr>
<td>Control</td>
<td>0.233 secs</td>
</tr>
<tr>
<td>Ben&#8217;s C/x86 ASM VM</td>
<td>0.518 secs</td>
</tr>
<tr>
<td>Mat&#8217;s Visual Basic VM</td>
<td>0.127 secs</td>
</tr>
</table>
<p>&nbsp;</p>
<p><b>UPDATE:</b> It&#8217;s been mentioned that I didn&#8217;t compile the control with optimization on. I turned optimization off because gcc is way too damn smart: it almost literally translated the code into &#8216;printf(&#8220;857419840\n&#8221;);&#8217;. I think a fairer comparison is one where gcc isn&#8217;t given the answer at compile time, since none of the VMs had that opportunity until they read the instructions at run time. The VMs did not, and could not, know ahead of time the loop count, or even the general flow of the bytecode for that matter. So by storing the loop count in a variable declared as volatile, you prevent gcc from precomputing the result:</p>
<pre class="brush: cpp;">
#include &lt;stdio.h&gt;

int main(int argc, char **argv) {
    volatile int k=10000;
    int s, i, j;
    for (s=0,i=0;i&lt;k;i++) {
        for (j=0;j&lt;k;j++)
            s+=i*j;
    }
    printf(&quot;%d\n&quot;,s);
    return 0;
}
</pre>
<p>That code compiled with -O3 runs in <b>0.085 secs</b> on my machine. Surprisingly, it&#8217;s only about 1.5 times as fast as the Visual Basic example (0.127 secs).</p>
]]></content:encoded>
			<wfw:commentRss>http://byteworm.com/2010/11/21/the-fastest-vm-bytecode-interpreter/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
		</item>
		<item>
		<title>Free Content Delivery Network using DNS cache</title>
		<link>http://byteworm.com/2010/10/27/free-content-delivery-network-using-dns-cache/</link>
		<comments>http://byteworm.com/2010/10/27/free-content-delivery-network-using-dns-cache/#comments</comments>
		<pubDate>Wed, 27 Oct 2010 00:56:11 +0000</pubDate>
		<dc:creator>bysin</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://byteworm.com/?p=60</guid>
		<description><![CDATA[Why spend money on expensive CDN hosting when there&#8217;s a perfectly good, free, global one available? That&#8217;s right, DNS cache. Most open recursive DNS servers will cache responses (A, CNAME, PTR, TXT, etc.) for the length of the specified TTL value, and there are millions of them worldwide. Once a public DNS server has the records [...]]]></description>
			<content:encoded><![CDATA[<p>Why spend money on expensive <b>CDN hosting</b> when there&#8217;s a perfectly good, free, global one available? That&#8217;s right, <b>DNS cache</b>. Most open recursive DNS servers will cache responses (<i>A, CNAME, PTR, TXT, etc.</i>) for the length of the specified TTL value, and there are millions of them worldwide. Once a public DNS server has the records in cache (usually after a single request), it requires <b>no further bandwidth from the originating server</b> until the TTL expires.</p>
<p>Unfortunately there&#8217;s a limit to the size of a record a DNS server will cache, and a limit to the length of the DNS packet itself. To store files using DNS cache we must encode the file and split it into multiple records. We&#8217;re going to use <b>TXT records</b> for this example, which are limited to 255 characters per string.</p>
<pre>
<b>file1.part1.cdn</b> 14400 IN TXT
"ICAgICAgQ2FuYWRhIEludmFzaW9uIFBsYW4KICAgIFRPUCBTRUNSRVQg
IENPTkZJREVOVElBTAotLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tL
QoKU3RlcCAxKSBBcm0gYmVhdmVycyB3aXRoIHJpZmxlcwpTdGVw"

<b>file1.part2.cdn</b> 14400 IN TXT
"IDIpIFRyYWluIG1vbmtleXMgdG8gam91c3QKU3RlcCAzKSBQcm9maXQ
KCldlIGhhdmUgYSBncm91cCB0aGF0IG1lZXRzIEZyaWRheXMgYXQgbWl
kbmlnaHQgdW5kZXIgdGhlCmJyb29rbHluIGJyaWRnZSBhbmQgdGhlIHBh"

<b>file1.part3.cdn</b> 14400 IN TXT
"c3N3b3JkIGlzIHNpYyBzZW1wZXIgdHlyYW5uaXMuCg=="
<br/>
</pre>
<p>The receiver simply has to request all parts of the file, reassemble, and decode it. I&#8217;ve included an example program that does just that (both CDN client and server).</p>
<pre>
<b># ./server --path example_data</b>
...

<b># ./client --domain virtserve.com --list</b>
Inode      Size         Path
------------------------------------
4068250    254          Epicfail.txt
4068229    283          Important_Plan.txt

<b># ./client --domain virtserve.com --get 4068250</b>
&lt;gh0st-&gt; epicfail.c?
&lt;matja&gt; bysin wrote that
&lt;bysin&gt; its 3000 lines of nothing but preprocessor macros that turns gcc into a tetris game
&lt;matja&gt; if you distcc it, can you play multplayer?
&lt;bysin&gt; hold on, i'll #include you on the next round
&lt;matja&gt; thx
<br/>
</pre>
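<p>The encode-and-split step on the server side can be sketched as follows. This is not taken from the downloadable program: the helper names <i>b64_encode</i>, <i>txt_parts</i>, and <i>txt_print</i> are hypothetical, the encoding is assumed to be Base64 (which is what the sample records appear to use), the record naming and TTL simply mimic the example records earlier in the post, and each part is assumed to fill one 255-character TXT string.</p>
<pre class="brush: cpp;">
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

#define TXT_MAX 255	/* longest single string a TXT record can carry */

/* Base64-encodes n bytes from in; out needs 4*((n+2)/3)+1 bytes.
   Returns the encoded length, not counting the terminating NUL. */
size_t b64_encode(const unsigned char *in, size_t n, char *out) {
	static const char tab[]=
		&quot;ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/&quot;;
	size_t i, o=0;

	for (i=0;i+2&lt;n;i+=3) {	/* whole three-byte groups */
		out[o++]=tab[in[i]&gt;&gt;2];
		out[o++]=tab[((in[i]&amp;3)&lt;&lt;4)|(in[i+1]&gt;&gt;4)];
		out[o++]=tab[((in[i+1]&amp;15)&lt;&lt;2)|(in[i+2]&gt;&gt;6)];
		out[o++]=tab[in[i+2]&amp;63];
	}
	if (i&lt;n) {	/* one or two bytes left over: pad with '=' */
		out[o++]=tab[in[i]&gt;&gt;2];
		if (i+1&lt;n) {
			out[o++]=tab[((in[i]&amp;3)&lt;&lt;4)|(in[i+1]&gt;&gt;4)];
			out[o++]=tab[(in[i+1]&amp;15)&lt;&lt;2];
		} else {
			out[o++]=tab[(in[i]&amp;3)&lt;&lt;4];
			out[o++]='=';
		}
		out[o++]='=';
	}
	out[o]=0;
	return o;
}

/* Number of TXT records needed to hold len encoded characters */
size_t txt_parts(size_t len) {
	return (len+TXT_MAX-1)/TXT_MAX;
}

/* Prints one zone-file style line per part */
void txt_print(const char *file, const char *b64, size_t len) {
	size_t part, off;

	for (part=1,off=0;off&lt;len;part++,off+=TXT_MAX) {
		size_t chunk=len-off&lt;TXT_MAX?len-off:TXT_MAX;
		printf(&quot;%s.part%lu.cdn 14400 IN TXT \&quot;%.*s\&quot;\n&quot;,
			file,(unsigned long)part,(int)chunk,b64+off);
	}
}
</pre>
<p>Encoding the five bytes of &#8220;hello&#8221; yields &#8220;aGVsbG8=&#8221;, and an encoded file of 256 characters needs two records. The client side is the mirror image: fetch the parts in order, concatenate them, and decode.</p>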
<p>In the program above, the first request for a file uses the CDN server and any subsequent requests do not, since the <b>public DNS server</b> has it in cache. I look forward to seeing streaming videos via DNS in the future.</p>
<p>Click here to download the <a href="/dump/dnscdn.tgz">DNS CDN Source Code</a></p>
]]></content:encoded>
			<wfw:commentRss>http://byteworm.com/2010/10/27/free-content-delivery-network-using-dns-cache/feed/</wfw:commentRss>
		<slash:comments>21</slash:comments>
		</item>
		<item>
		<title>SSE4.2 and the new CRC32 instruction</title>
		<link>http://byteworm.com/2010/10/13/crc32/</link>
		<comments>http://byteworm.com/2010/10/13/crc32/#comments</comments>
		<pubDate>Wed, 13 Oct 2010 09:13:18 +0000</pubDate>
		<dc:creator>bysin</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://byteworm.com/?p=40</guid>
		<description><![CDATA[For those of you who have the new Nehalem processor from Intel, there&#8217;s an interesting new instruction that is used to speed up calculating checksums called CRC32. This instruction is part of the SSE4.2 set, and just like most SSE instructions, it&#8217;s fairly useless. But I just spent my hard-earned money on a new [...]]]></description>
			<content:encoded><![CDATA[<p>For those of you who have the new <i>Nehalem</i> processor from Intel, there&#8217;s an interesting new instruction called <b>CRC32</b> that speeds up calculating checksums. This instruction is part of the <b>SSE4.2</b> set, and just like most SSE instructions, it&#8217;s fairly useless. But I just spent my hard-earned money on a new processor and I&#8217;ll be damned if I don&#8217;t get my money&#8217;s worth, so here&#8217;s my evaluation of CRC32.</p>
<p>We&#8217;ll start off with a standard 32-bit checksum function:</p>
<pre class="brush: cpp;">
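#include &lt;stdint.h&gt;

uint32_t slowcrc_table[1&lt;&lt;8];

/* Builds the bit-reflected (LSB-first) table that slowcrc() expects.
   0x82F63B78 is the CRC-32C polynomial 0x1EDC6F41 with its bits reversed. */
void slowcrc_init() {
	uint32_t i, j, a;

	for (i=0;i&lt;(1&lt;&lt;8);i++) {
		a=i;
		for (j=0;j&lt;8;j++) {
			if (a&amp;1)
				a=(a&gt;&gt;1)^0x82F63B78;
			else
				a=(a&gt;&gt;1);
		}
		slowcrc_table[i]=a;
	}
}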

uint32_t slowcrc(char *str, uint32_t len) {
	uint32_t lcrc=~0;
	char *p, *e;

	e=str+len;
	for (p=str;p &lt; e;++p)
		lcrc=(lcrc&gt;&gt;8)^slowcrc_table[(lcrc^(*p))&amp;0xff];
	return ~lcrc;
}
</pre>
<p>Not including the table setup, the standard checksum function took <b>0.30 seconds</b> to process a random 64 MB string. Unfortunately, the compiler I&#8217;m using currently doesn&#8217;t support SSE4.2 instructions, so I&#8217;m forced to write the hardware checksum function as raw opcode bytes in inline assembly.</p>
<pre class="brush: cpp;">
uint32_t fastcrc(char *str, uint32_t len) {
	uint32_t q=len/sizeof(uint32_t),
		r=len%sizeof(uint32_t),
		*p=(uint32_t*)str, crc;

	crc=0;
	while (q--) {
		/* crc32l %ecx,%esi: fold four bytes into the running CRC */
		__asm__ __volatile__(
			&quot;.byte 0xf2, 0xf, 0x38, 0xf1, 0xf1;&quot;
			:&quot;=S&quot;(crc)
			:&quot;0&quot;(crc), &quot;c&quot;(*p)
		);
		p++;
	}

	str=(char*)p;
	while (r--) {
		/* crc32b %cl,%esi: fold the remaining bytes one at a time */
		__asm__ __volatile__(
			&quot;.byte 0xf2, 0xf, 0x38, 0xf0, 0xf1&quot;
			:&quot;=S&quot;(crc)
			:&quot;0&quot;(crc), &quot;c&quot;(*str)
		);
		str++;
	}

	return crc;
}
</pre>
<p>The hardware-accelerated checksum instruction processed the same 64 MB of random data in <b>0.05 seconds</b>. That&#8217;s nearly 6 times faster than the standard checksum function.</p>
<table>
<tr>
<th>Function</th>
<th>Average Exec. Time</th>
</tr>
<tr>
<td>slowcrc</td>
<td>0.303066 seconds</td>
</tr>
<tr>
<td>fastcrc</td>
<td>0.052982 seconds</td>
</tr>
</table>
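<p>To sanity-check a hardware result, a bit-at-a-time software version is handy. The sketch below assumes, as Intel documents it, that the CRC32 instruction implements only the update step of CRC-32C (the Castagnoli polynomial 0x11EDC6F41, processed in bit-reflected form) and leaves the initial value and final inversion to the caller. The function names <i>crc32c_update</i> and <i>crc32c</i> are made up for this example.</p>
<pre class="brush: cpp;">
#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* Bit-at-a-time update step: folds len bytes into crc using the
   reflected CRC-32C polynomial, with no inversion applied */
uint32_t crc32c_update(uint32_t crc, const void *buf, size_t len) {
	const unsigned char *p=buf;
	int k;

	while (len--) {
		crc^=*p++;
		for (k=0;k&lt;8;k++)
			crc=(crc&amp;1)?(crc&gt;&gt;1)^0x82F63B78:(crc&gt;&gt;1);
	}
	return crc;
}

/* Conventional CRC-32C: start from all ones, invert the result */
uint32_t crc32c(const void *buf, size_t len) {
	return ~crc32c_update(~(uint32_t)0,buf,len);
}
</pre>
<p>With the conventional all-ones start value and final inversion, the nine ASCII characters &#8220;123456789&#8221; give 0xE3069283, the published check value for CRC-32C. Calling <i>crc32c_update</i> with a start value of 0 should mirror what fastcrc computes, since fastcrc starts from 0 and never inverts.</p>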
]]></content:encoded>
			<wfw:commentRss>http://byteworm.com/2010/10/13/crc32/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Containers, Templates, Lambda Expressions in C</title>
		<link>http://byteworm.com/2010/10/12/container/</link>
		<comments>http://byteworm.com/2010/10/12/container/#comments</comments>
		<pubDate>Tue, 12 Oct 2010 17:50:45 +0000</pubDate>
		<dc:creator>bysin</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://byteworm.com/?p=1</guid>
		<description><![CDATA[Not many people know this, but the C language (with the help of gcc extensions) can support templates and lambda expressions. I know I&#8217;m going to get emails / comments about how I butchered the C language, and how Dennis Ritchie is turning in his grave (he&#8217;s dead, right?). So let me start out with [...]]]></description>
			<content:encoded><![CDATA[<p>Not many people know this, but the C language (with the help of <i>gcc extensions</i>) can support templates and lambda expressions.</p>
<p>I know I&#8217;m going to get emails / comments about how I butchered the C language, and how <b>Dennis Ritchie</b> is turning in his grave (he&#8217;s dead, right?). So let me start out with a word of caution: this is for educational use only, and is not intended to be used in production code.</p>
<p>We start off with a little-known <i>gcc extension</i> that allows functions to be declared inside other functions (nested functions). These functions are <b>normally</b> only accessible from inside the scope in which they were declared.</p>
<pre class="brush: cpp;">
void parent() {
    int child() {
        return 42;
    }
    printf(&quot;%d\n&quot;,child());
}
</pre>
<p>There&#8217;s also a <i>gcc extension</i>, the statement expression, that lets you wrap statements in parentheses and braces; the statements are executed and the value of the last one becomes the value of the whole expression.</p>
<pre class="brush: cpp;">
void parent() {
    int a = ({ int n=17; n+=25; n; });
    printf(&quot;%d\n&quot;,a);
}
</pre>
<p>If we combine these two extensions, a statement expression can return a function pointer to a nested function. The result looks very similar to a <b>lambda expression</b>, but in C! Just don&#8217;t let the nested function touch local variables of the enclosing scope once that scope has exited. <small>FYI: gcc lets you use the dollar sign in a symbol name.</small></p>
<pre class="brush: cpp;">
#include &lt;stdio.h&gt;

void eval(int(*func)(char*)) {
	func(&quot;lambda&quot;);
	func(&quot;expression&quot;);
}

int main(void) {
	int (*func)(char*);
	func = ({ int $(char*str){ return printf(&quot;Test: %s\n&quot;,str); } $; });
	eval(func);
	return 0;
}
</pre>
<p>And here is the expected output:</p>
<pre>
Test: lambda
Test: expression
</pre>
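<p>The same trick works anywhere C expects a callback. As a small sketch (the helper name <i>sort_ints</i> is made up, and it needs gcc because of the nested function), here is a comparator written inline in a call to the standard qsort function:</p>
<pre class="brush: cpp;">
#include &lt;stdlib.h&gt;

/* Sorts n ints in ascending order using an inline comparator
   (gcc nested function inside a statement expression) */
void sort_ints(int *a, size_t n) {
	qsort(a,n,sizeof(*a),({
		int cmp(const void *x, const void *y) {
			int l=*(const int*)x, r=*(const int*)y;
			return (l&gt;r)-(l&lt;r);	/* -1, 0 or 1, without overflow */
		}
		cmp;
	}));
}
</pre>
<p>The comparator does not use any variables of the enclosing function, which keeps it on the safe side of gcc&#8217;s rules for calling a nested function through its address.</p>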
<p>So here&#8217;s where it gets tricky. You can create a <b>template class</b> in C if you combine lambda expressions with structures and an incredibly large macro (what <b>Mat Anger</b> likes to call <i>Uber-Macros &copy;2010</i>).</p>
<pre class="brush: cpp;">
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

typedef struct stack {
    int (*push)(struct stack*, ...);
    void *(*pop)(struct stack*);
    void *ptr;
    int length;
} *stack;

#define stack_new(T)                                                  \
    ({                                                                \
        stack __n;                                                    \
        if ((__n=calloc(sizeof(struct stack),1))) {                   \
                                                                      \
            __n-&gt;push=(void*)({int $(stack n, T p) {                  \
                T *np;                                                \
                if (!(np=realloc(n-&gt;ptr,sizeof(*np)*(n-&gt;length+1))))  \
                    return 0;                                         \
                np[n-&gt;length++]=p;                                    \
                n-&gt;ptr=np;                                            \
                return 1;                                             \
            }$;});                                                    \
                                                                      \
            __n-&gt;pop=(void*)({T *$(stack n){                          \
                T *np;                                                \
                if (!n-&gt;length)                                       \
                    return (T*)0;                                     \
                np=n-&gt;ptr;                                            \
                return &amp;np[--n-&gt;length];                              \
            }$;});                                                    \
                                                                      \
        }                                                             \
        __n;                                                          \
    })

int main(void) {
    stack p;
    int *num;
    char **str;

    p=stack_new(int);
    p-&gt;push(p,42);
    p-&gt;push(p,666);
    printf(&quot;%d\n&quot;,*(num=p-&gt;pop(p)));
    printf(&quot;%d\n&quot;,*(num=p-&gt;pop(p)));

    p=stack_new(char*);
    p-&gt;push(p,&quot;template&quot;);
    p-&gt;push(p,&quot;class&quot;);
    printf(&quot;%s\n&quot;,*(str=p-&gt;pop(p)));
    printf(&quot;%s\n&quot;,*(str=p-&gt;pop(p)));

    return 0;
}
</pre>
<p>As you can see, we created a pseudo-template that took a type, and it generated function pointers based upon the type. Here is the expected output:</p>
<pre>
666
42
class
template
</pre>
<p>Using those techniques I was able to create a small <b>container library</b> for C. It has map, list, and array objects that can be used with a syntax almost as easy as their C++ counterparts:</p>
<pre class="brush: cpp;">
c_object n;

n=c_array_new(int);
c_insert(n,5);
c_insert(n,512);
c_insert(n,-125);
c_foreach(n,t_int) {
	printf(&quot;%d\n&quot;,t_int);
}
c_free(n);
</pre>
<pre class="brush: cpp;">
c_object n;
c_pair(char*,int) t_map;

n=c_map_new(char*,int);
c_append(n,&quot;test&quot;,4);
c_append(n,&quot;foo&quot;,12345);
c_append(n,&quot;bar&quot;,-423);
printf(&quot;foo is most definitely %d\n&quot;,c_at(n,&quot;foo&quot;));
c_erase(n,&quot;foo&quot;);
c_foreach(n,t_map) {
	printf(&quot;%s = %d\n&quot;,t_map.left,t_map.right);
}
c_free(n);
</pre>
<p>Click here to download the <a href="/dump/container.tgz">Container Source Code</a></p>
]]></content:encoded>
			<wfw:commentRss>http://byteworm.com/2010/10/12/container/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

