May 14 2013

In Node.js, a single process handles many concurrent clients by taking advantage of the observation that most of the time spent responding to a server request is spent waiting for I/O, including waiting for other networked resources to become available. Node.js and programs written for it are essentially designed around cooperative multitasking, allowing a central loop to manage tasks and call them when their resources are available. However, as a consequence of JavaScript's implementation, true continuations are not available, so a task cannot be written as a single, uniform control flow that suspends on I/O and resumes in place when data is available (although libraries exist for this purpose).

Instead, an asynchronous approach is used: when an I/O request is made, a callback is also supplied, to be invoked when the data is available. This technique is frequently generalized to all or most library interfaces, because it is more consistent to assume that all information is delivered asynchronously than to burden the programmer with remembering which approach a specific function uses. This is sometimes referred to as continuation passing style: data is never passed back to the calling context via return; instead it is forwarded to the next function as an argument.
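
To see the difference in miniature, compare a synchronous function that hands its result back with return against an asynchronous version that forwards it to a callback, following the Node.js convention of passing any error as the callback's first argument:

// Synchronous: the result comes back to the caller via return
function add_sync(a, b) {
	return a + b;
}
console.log(add_sync(1, 2));

// Continuation passing style: the result is forwarded to a callback
function add_async(a, b, callback) {
	// Defer the work to the event loop, then forward the result
	setImmediate(function() {
		callback(null, a + b);
	});
}
add_async(1, 2, function(err, sum) {
	console.log(sum);
});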

Despite the advantages of an asynchronous structure for multitasking, code that makes frequent use of callbacks, including nesting callbacks, can quickly become spaghetti code and create a maintenance nightmare. Consider this snippet, which uses express and node-mysql to handle a web request by pulling data from a database and returning it in JSON format:

// Note that error handling is omitted for brevity and simplicity
server.post("/:id", function(req, res) {
	var id = parseInt(req.params.id) || 1;
	pool.getConnection(function(err, conn) {
		conn.query('SELECT * FROM `users` WHERE `id` = ?', id, function(err2, rows) {
			res.send(rows[0]);
		});
	});
});

All of the functions are defined locally and anonymously, growing inward as each callback is added. Of course, we can suspend our instinct for good style just this once and skip the extra indentation for callbacks, since there will be no code in the parent context, but this is only a small stopgap. The real problem is that we're defining callbacks exactly where they are used, which limits our ability to reuse them. An improvement, then, is to restructure the code as a series of named functions and pass them by name as callbacks. This permits better reuse and avoids the infinite tower of indentation, but it is not without its own problems:

var _ = require('underscore'); // underscore provides _.partial

function send_user_info(res, err, rows) {
	// ... error checking here
	res.send(rows[0]);
}

function get_user_info(id, res, err, conn) {
	// ... error checking here
	conn.query('SELECT * FROM `users` WHERE `id` = ?', id, _.partial(send_user_info, res));
}

function handle_id(req, res) {
	var id = parseInt(req.params.id) || 1;
	pool.getConnection(_.partial(get_user_info, id, res));
}

server.get('/:id', handle_id); 

One immediate problem with this style is that we now have to pass parameters into the other functions explicitly, as we can no longer rely on the closures created by anonymous functions. The solution I've chosen here uses the underscore library to perform "partial execution" (more commonly called partial application) and fix some of the arguments that will be passed to the function. Effectively this just wraps the function in an anonymous function that forwards the given arguments, but it is much cleaner (and shorter) to write out.
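
To make that concrete, here is a tiny, self-contained sketch of what _.partial is doing for us; the two wrappers below behave identically:

var _ = require('underscore');

function greet(greeting, name) {
	console.log(greeting + ', ' + name);
}

// Written by hand: an anonymous function that fixes the first argument
var hello_manual = function(name) {
	greet('Hello', name);
};

// The same wrapper, built for us by _.partial
var hello_partial = _.partial(greet, 'Hello');

hello_manual('world');  // Hello, world
hello_partial('world'); // Hello, world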

This solution is really not ideal for more complex code, even if it looks acceptable for the example given here. Many attempts have been made to deal with this problem. I could spend the next week going over existing solutions like promises, deferred, and Q, but I would instead like to introduce a different solution that I have been working on for a while: a control flow library called flux-link. The library aims to mimic an overall synchronous code flow while providing a shared local state object that can be used to avoid partial application and (very dangerous) global variables. Let's start by reworking the above example to use flux-link:

var fl = require('flux-link');

function get_db(env, after) {
	pool.getConnection(after);
}

function get_user_info(env, after, err, conn) {
	// ... error checking
	conn.query('SELECT * FROM `users` WHERE `id` = ?', env.id, after);
}

function send_user_info(env, after, err, rows) {
	// ... error checking
	env.res.send(rows[0]);
	after();
}

var handle_id = new fl.Chain(get_db, get_user_info, send_user_info);

server.get('/:id', function(req, res) {
	var init_env = {
		id : parseInt(req.params.id) || 1,
		req : req,
		res : res
	};
	var env = new fl.Environment(init_env, console.log);
	handle_id.call(null, env, null);
});

This code looks very similar to the second example above (intentionally so; it does not use all of the library's features), except that we no longer need partial application to specify any of our callbacks. Instead, data is passed through the environment object, which is made available to each callback in turn as private, local state. Each callback used in a chain must accept two specific arguments, env and after, which hold the environment and a reference to the next callback in the chain. Thanks to the magic of partial application, these are already bound inside each "after" parameter, so user code does not need to be aware of them. This means that chained callbacks can be passed to other library code (such as when after is passed to pool.getConnection()) and will behave as expected, receiving all of the parameters normally supplied by the library in addition to the environment and next callback.
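
As a rough sketch of that mechanism (an approximation for illustration, not flux-link's actual internals), each after could be produced by a helper that binds env and the next step, then forwards whatever arguments the calling library supplies:

// Hypothetical helper: bind env and the next callback into an "after"
function make_after(env, next_fn, next_after) {
	return function() {
		var args = Array.prototype.slice.call(arguments);
		// Library-supplied arguments (err, conn, ...) land after env and after
		next_fn.apply(null, [env, next_after].concat(args));
	};
}

// The after handed to get_db then behaves roughly like:
// make_after(env, get_user_info, <the after built for get_user_info>)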

But when you look closely, this code can be improved even further: the explicit argument passing introduces unnecessary coupling between function signatures, and we can factor out a generic function to set up routes in express and automatically invoke callback chains. To demonstrate this, let's add another route and another chain to retrieve information about another type of entity tracked on our website, groups, which will have a very similar implementation to users:

var fl = require('flux-link');

function init_db(env, after) {
	pool.getConnection(function(err, conn) {
		// ... Check for connection error
		env.conn = conn;
		after();
	});
}

function get_user_info(env, after) {
	var id = parseInt(env.req.params.id) || 1;
	env.conn.query('SELECT * FROM `users` WHERE `id` = ?', id, after);
}

function get_group_info(env, after) {
	var id = parseInt(env.req.params.id) || 1;
	env.conn.query('SELECT * FROM `groups` WHERE `id` = ?', id, after);
}

function print_result0(env, after, err, rows) {
	// ... error check
	env.res.send(rows[0]);
	after();
}

function create_route_handler(chain) {
	return function(req, res) {
		var init_env = {req : req, res : res};
		var env = new fl.Environment(init_env, console.log);
		chain.call(null, env, null);
	};
}

var handle_user = new fl.Chain(init_db, get_user_info, print_result0);
var handle_group = new fl.Chain(init_db, get_group_info, print_result0);

server.get('/users/:id', create_route_handler(handle_user));
server.get('/groups/:id', create_route_handler(handle_group));

Importantly, init_db, create_route_handler, and even print_result0 can all be implemented once as part of an internal library and then used to construct websites with only minimal project-specific code. Furthermore, chains can be constructed out of other chains. This makes it easy to create a "pre-routing" chain that performs mandatory tasks such as establishing a database connection, parsing session data, or whatever else you need, and a "post-routing" chain that fetches additional data required on every page (say, an online user count for a forum, or notification information). By storing data in the environment, you can designate an output object, write to it during your per-route chain, and then have the final step of the post-route handler pass the output object through a template renderer and send the result to the browser. Although it is much more sophisticated, this is the basic premise of a system I use internally to implement projects.
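
As a sketch of that structure, using only the API already shown (parse_session, render_page, and send_page are hypothetical steps you would supply):

// Shared chains, built once in an internal library
var pre_route = new fl.Chain(init_db, parse_session);
var post_route = new fl.Chain(render_page, send_page);

// Chains nest inside other chains, so each route supplies only its middle
var user_page = new fl.Chain(pre_route, get_user_info, post_route);
server.get('/users/:id', create_route_handler(user_page));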

The flux-link library itself has many additional features, such as proper exception handling, call trace generation (to aid in debugging), and utility functions that replace common tasks (such as checking for errors and throwing exceptions). The project is MIT licensed, hosted on GitHub, and also available in (hopefully) stable releases on npm. I have plans to expand its capabilities in the future, including a proper looping construct (loops inside chains can already be implemented, but it is a little tricky to do so), as well as solutions to any other control flow problems I come across that fit with the theme of creating callback chains. Comments, bug reports, suggestions, and any other feedback are welcome, here, on GitHub, or anywhere else that you may be able to reach me.

Apr 12 2013

Yesterday, I ran across the original eyes.js library for Node.js, which provides improved functionality to replace util.inspect(). It seemed nice and did a good job color-coding the output, but it was missing a few features that I wanted, such as the ability to hide certain kinds of properties (null or undefined ones, for instance) and the ability to specify stacked color attributes on the output. Initially, I opted to modify the original project, but as the work went on I realized I didn't really like the way it was structured and decided to rewrite the entire module, using colors and underscore to simplify the implementation.

Therefore, I forked the repository and rewrote the entire core library file. My branch is available at https://github.com/gmalysa/spyglass. It is also available to be installed through npm:

npm install spyglass

The README on GitHub has a very in-depth explanation of how to use spyglass, so I won't repeat all of it here, but the basic idea is that it allows for colored, pretty-printed object inspection, optionally writing the output to a stream (such as the console) or simply saving it as a string. It has a built-in type identification system (which allows for better resolution than the typeof operator) with user-customizable hooks that allow special display routines to be provided for any user-defined class. Reasonable defaults are provided for all of the built-in JavaScript classes; more might come for some built-in Node.js objects as well. Object properties can be hidden from the output based on exact name matches, regular expressions, or value types (i.e. null properties, function properties, etc.) to avoid unnecessary clutter.

If you’re currently using eyes.js or are just looking for a tool to add powerful inspection/var_dump()-like capabilities to your Node.js project, please give it a try and let me know about any bugs or changes that might be useful!

Dec 18 2012

For a project that I started working on with a friend back in November, we needed a project management system for bugs and progress tracking, and we decided to try Trac, an open-source project management system with svn integration, written in Python.

I use nginx for all of my web serving needs, so I followed the instructions given for nginx configuration with Trac, and I was a little disappointed to find that the FastCGI script provided didn't do any normal server things, like forking a child process to run in the background. I also wanted to integrate the FastCGI server with the service management tools available in CentOS, so I set to work.

First, I modified the nginx-specific FastCGI script to accept a number of command line arguments, allowing me to use the same script to run multiple trac servers for multiple different projects. In total, it accepts a project path, a user, a unix socket on which to listen for fcgi connections, and a location to store the server process pid. Then, I added the child fork so that I can run it as a service. Here is the version that I use now:

#!/usr/bin/env python
from optparse import OptionParser
import os
import pwd

parser = OptionParser()
parser.add_option("-s", "--sock", dest="sockaddr", help="Use a specific unix socket", default="/var/trac/run/instance.sock")
parser.add_option("-u", "--user", dest="user", help="Run FCGI process as a specific user", default="nginx")
parser.add_option("-t", "--trac", dest="trac_env", help="Trac environment path to use", default="/var/trac/project")
parser.add_option("-p", "--pid", dest="pidfile", help="Location to store pid file", default="/var/run/trac.fcgi.pid")
(options, args) = parser.parse_args()

os.environ['TRAC_ENV'] = options.trac_env

try:
    from trac.web.main import dispatch_request
    import trac.web._fcgi

    pid = os.fork()
    if pid == 0:
        os.setuid(pwd.getpwnam(options.user)[2])
        fcgiserv = trac.web._fcgi.WSGIServer(dispatch_request, bindAddress = options.sockaddr, umask = 7)
        fcgiserv.run()
    else:
        os.system("echo " + str(pid) + " > " + options.pidfile)

except OSError, oe:
    raise
except SystemExit:
    raise
except Exception, e:
    print 'Content-Type: text/plain\r\n\r\n',
    print 'Oops...'
    print
    print 'Trac detected an internal error:'
    print
    print e
    print
    import traceback
    import StringIO
    tb = StringIO.StringIO()
    traceback.print_exc(file=tb)
    print tb.getvalue()

With a service-capable FastCGI script, I next needed to create an /etc/init.d script to control the process. Because I am running CentOS, I also wanted the script to be compatible with chkconfig so that it can be used with the service utility. This was pretty simple: I copied an existing init script, ripped out most of its guts, and replaced them with a minimal set of controls: start, stop, and restart. Here's the full init script, which I saved as /etc/init.d/trac-fcgi:

#!/bin/sh
#
# chkconfig:   - 85 15
# description:  Runs trac's fcgi script as a daemon
# processname:  trac.fcgi
# pidfile:     /var/run/trac.fcgi.pid

# Source function library.
. /etc/rc.d/init.d/functions

# Source networking configuration.
. /etc/sysconfig/network

# Check that networking is up.
[ "$NETWORKING" = "no" ] && exit 0

tracfcgi="/var/trac/trac.fcgi"
prog=$(basename $tracfcgi)
pidfile="/var/run/${prog}.pid"

start() {
    echo -n $"Starting $prog: "
    daemon $tracfcgi
    retval=$?
    echo
    return $retval
}

stop() {
    echo -n $"Stopping $prog: "
    killproc -p $pidfile $prog
    retval=$?
    echo
    return $retval
}

restart() {
    stop
    start
}

rh_status() {
    status $prog
}

rh_status_q() {
    rh_status >/dev/null 2>&1
}

case "$1" in
    start)
        rh_status_q && exit 0
        $1
        ;;
    stop)
        rh_status_q || exit 0
        $1
        ;;
    restart)
        $1
        ;;
    status|status_q)
        rh_$1
        ;;
    condrestart|try-restart)
        rh_status_q || exit 7
        restart
        ;;
    *)
        echo $"Usage: $0 {start|stop|status|restart}"
        exit 2
esac

Finally, the new init script is added with a simple call to chkconfig:

# chkconfig --add trac-fcgi

If you want to run multiple trac servers with different configurations, create a copy of the init.d script for each configuration and pass the appropriate command line arguments after the $tracfcgi variable on the line that reads daemon $tracfcgi. Then, add each configuration's script to chkconfig using the same method as before.
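
For example (all paths and names here are illustrative), a second project's copy of the script might change the daemon line to pass its own options:

daemon $tracfcgi -t /var/trac/otherproject -s /var/trac/run/otherproject.sock -p /var/run/trac-other.fcgi.pid

The pidfile variable near the top of the copied script would need to match the -p argument so that stop and status find the right process.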

Nov 15 2012

While discussing game scripting languages with a friend, I asserted that a Lisp-based language such as Scheme or Racket would make a very effective in-game scripting language, due to its expressive power and flexibility as well as the efficiency with which modern, JIT-capable implementations execute code. One of the most widely used scripting languages for this purpose appears to be Lua, which is small and designed to be embedded in C programs. I glanced over the feature list on Wikipedia and saw that it is basically a subset of Scheme with some omissions and a few additions. In particular, I really liked the idea of a metatable, which can be used to implement automatic function memoization.

The example given on Wikipedia is a metatable-based function to calculate the nth Fibonacci number, a classic example of a simple recursive expression that has exponential time complexity in a naïve implementation but can be improved to linear time (or, with sufficient reuse, amortized constant time) with a memoization table. The Lua code (available from Wikipedia under the Creative Commons Attribution-ShareAlike license) is:

fibs = { 1, 1 }
setmetatable(fibs, {
  __index = function(name, n)
    name[n] = name[n - 1] + name[n - 2]
    return name[n]
  end
})

I didn't recall seeing a Scheme or Racket built-in that provides the same functionality, but I was confident it would be easy to implement such a language feature using standard Racket built-ins, such as the hash table. So, I started with a simple function that checks whether a value for arg exists in a given hash table and, if so, returns it; otherwise, it computes a value by applying the given function, stores the result, and returns it:

(define (memo-or-call ht fun arg)
  (if (hash-has-key? ht arg)
      (hash-ref ht arg)
      (let ((result (fun arg)))
        (hash-set! ht arg result)
        result)))

With this, I was able to define a hash table on my own, set up the default values, and then create an index function. This worked well and let me compute any Fibonacci number I wanted almost instantly:

(define fibtable (make-hash))
(hash-set! fibtable 0 1)
(hash-set! fibtable 1 1)

(define (fib n)
  (memo-or-call fibtable
                (lambda (k) (+ (fib (- k 1)) (fib (- k 2))))
                n))

This is great, but it still required too much boilerplate for each function I wanted to memoize through a table: I'd have to create a new hash table, add default values, and then use the "internal" function memo-or-call to create my memoization-capable function. To resolve this, I wrote another function that takes a lambda expression and wraps it with a hash table and a call to the memoization function:

(define (mt-store-defaults tbl defaults)
  (cond
    [(empty? defaults) tbl]
    [else (hash-set! tbl (caar defaults) (cadar defaults))
          (mt-store-defaults tbl (cdr defaults))]))

(define (make-metatable fun defaults)
  (let ((tbl (mt-store-defaults (make-hash) defaults)))
    (lambda (k)
      (memo-or-call tbl fun k))))

This reduced the complexity of creating a new memoized function considerably, bringing the functionality effectively to parity with what is possible in Lua. It's worth noting that I'm strictly dealing with the case where a table is used to memoize computation, not the other use cases such as implementing classes or objects, although it would be possible to extend the idea here to support Lua's __call magic function, and it is already possible to store functions in a hash table, as they are first class objects. The Scheme code implementing the now-table-based Fibonacci calculator is:

(define fib2
  (make-metatable
   (lambda (k) (+ (fib2 (- k 1)) (fib2 (- k 2))))
   '((0 1) (1 1))))

In summary, a reasonable implementation of automatic memoization with "metatables" can be achieved in just a few lines of Scheme/Racket, allowing for concise definitions of memoized functions in user code. Note that this is just example code and not particularly robust, but feel free to use it as a starting point.

#lang racket

(define (memo-or-call ht fun arg)
  (if (hash-has-key? ht arg)
      (hash-ref ht arg)
      (let ((result (fun arg)))
        (hash-set! ht arg result)
        result)))

(define (mt-store-defaults tbl defaults)
  (cond
    [(empty? defaults) tbl]
    [else (hash-set! tbl (caar defaults) (cadar defaults))
          (mt-store-defaults tbl (cdr defaults))]))

(define (make-metatable fun defaults)
  (let ((tbl (mt-store-defaults (make-hash) defaults)))
    (lambda (k)
      (memo-or-call tbl fun k))))

Aug 31 2012

Earlier this week, I stumbled across McCarthy’s original paper describing LISP, and I felt a motivation to write my own LISP interpreter. I know there are plenty of good ones out there already, but this isn’t really about making one that is better or different from the others. Mostly, I am interested in exploring my own understanding of what LISP programs really mean and, as a part of that, testing my ability as a C programmer to implement a LISP interpreter. I’ve talked with a friend that I will be collaborating with on this project, and we hope to get a basic interpreter up and running over the weekend.

We don’t have a definitive feature list, although we do plan on supporting everything in the original paper, as well as basic garbage collection and an external function interface to define symbols that provide additional functionality to the LISP environment. This is primarily an investigation into what we’re able to do, and, as a part of that, how hard it is to write a functional LISP interpreter, so we’re not planning on too many language features; the focus is mostly on things that are necessary to run the engine itself. If we wanted the kind of functionality found in PLT Scheme/Racket, then we’d just use that to run our programs, but we don’t want to stop with something that can’t be used to do anything useful, either.

We've thrown around the idea of making a standalone compiler as well, one that would take a script and produce a binary that does not need to re-parse the source every time it is executed, but for now we're simply going to focus on implementing a REPL and on executing LISP scripts as an interpreter (although, in some sense, based on the semantics of the paper, that technically counts as compilation too).

If you want to follow our progress, we've created a public GitHub repository at https://github.com/gmalysa/cs-lisp, where you can see all of the code.