Yesterday, I had a small issue where someone attempted to brute force the admin password to my blog. For about 10 minutes, availability dropped significantly as my server's resources were maxed out, leading to 45+ second page load times. Luckily, it happened just as I was coming home, so I was able to identify the problem and put a stop to it very quickly. Incidentally, this taught me that WordPress does not rate limit login attempts (allowing brute force attacks at the server's maximum request speed), so I would recommend using a plugin that adds rate limiting to logins, which mitigates the danger of a simple scripted attack.

Once the problem was identified as an attack of some kind, the next step was to disable php-fpm in order to drop the load average to an acceptable level, so that I could figure out who and what was involved and then do my best to mitigate it. Incidentally, this causes nginx to produce a 502 error, as the FastCGI handler is no longer responsive. From /var/log/nginx/blog.access.log:

{IP} - - [01/Apr/2013:16:59:57 +0000] "POST /wp-login.php HTTP/1.1" 200 6969 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
{IP} - - [01/Apr/2013:16:59:57 +0000] "GET /wp-admin/ HTTP/1.1" 302 5 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
{IP} - - [01/Apr/2013:16:59:58 +0000] "GET /wp-login.php?redirect_to=http%3A%2F%2Fthelonepole.com%2Fwp-admin%2F&reauth=1 HTTP/1.1" 200 6044 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
{IP} - - [01/Apr/2013:16:59:58 +0000] "POST /wp-login.php HTTP/1.1" 200 6969 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
{IP} - - [01/Apr/2013:16:59:59 +0000] "GET /wp-admin/ HTTP/1.1" 302 5 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
{IP} - - [01/Apr/2013:16:59:59 +0000] "GET /wp-login.php?redirect_to=http%3A%2F%2Fthelonepole.com%2Fwp-admin%2F&reauth=1 HTTP/1.1" 200 6044 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

becomes

{IP} - - [01/Apr/2013:22:45:53 +0000] "GET /wp-admin/ HTTP/1.1" 502 173 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
{IP} - - [01/Apr/2013:22:45:54 +0000] "POST /wp-login.php HTTP/1.1" 502 198 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
{IP} - - [01/Apr/2013:22:45:54 +0000] "GET /wp-admin/ HTTP/1.1" 502 173 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
{IP} - - [01/Apr/2013:22:45:54 +0000] "POST /wp-login.php HTTP/1.1" 502 198 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

Interestingly, we can see that whatever brute force tool is being used spoofs the Bing search crawler's user agent, presumably to prevent me from blocking it by user agent without deindexing myself. This isn't really a big deal, because I don't plan on solving the problem with policies in nginx, so anything in the HTTP request is largely irrelevant.

I've removed the user's IP address, but in this case it was assigned through a cloud services provider, meaning that someone likely set up an account to look for WordPress blogs on the internet and attempt to break into their admin directories, probably to install a plugin and then take over the machine in order to attach it to a botnet. I emailed their abuse department, but I never heard back and don't really expect them to do anything about it (most companies don't seem to care). I don't expect any of my traffic to come from cloud providers, so the simplest solution is to deny all traffic from the IP in question. There are a number of ways for me to do this, but the best is to drop requests as early as possible in my network stack, because it minimizes the resources that my server dedicates to processing an illegitimate request. To do this, I use the kernel firewall utility iptables, adding rules to drop all incoming packets from blacklisted sources (with no acknowledgement at all, which makes it appear as though my server is no longer online) as early as possible in the processing chain.

General scanning of the internet, probing for vulnerabilities, and attempting to exploit poorly configured servers is extremely common. I used to get very concerned when I saw logfile entries, almost daily, indicating that someone had attempted to request files that don't exist or to exploit a flaw in how older versions of Apache (which I don't use) look up files. After doing some reading, I relaxed: it's very common, and servers running up-to-date software will withstand generic attacks (though likely not actual, targeted attempts at breaching). That said, since I have no interest in letting these people continue to scan me for vulnerabilities in order to make me a part of their botnet, and they almost certainly do not form any part of my readership, I put together a small script that adds rules to log and drop all packets from a given IP address, making it quick and easy to deny people whose nonsense was becoming problematic.

#!/bin/bash

if [ -z "$1" ]; then
        echo "Usage: $0 <ip address> [reason]";
        exit;
fi

IP=$1
REASON="denied ip"

if [ $# -gt 1 ]; then
        shift;
        REASON="$*"
fi

iptables -A INPUT -s ${IP}/32 -j LOG --log-prefix "${REASON} "
iptables -A INPUT -s ${IP}/32 -j DROP
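
For example, assuming the script above is saved as something like blockip.sh (the name and the address here are placeholders), blocking an offender with a note about why would look like:

./blockip.sh 198.51.100.23 wordpress login brute force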

Log messages generated by iptables end up in /var/log/messages, where you will see an entry for each dropped packet. This could cause the log file to balloon in the case of any kind of flooding, so you may wish to omit the LOG rule, but I generally keep it so that I remember later why an IP address was blocked.

Feel free to use this whenever you find you need to deny access to a specific IP address, to save the trouble of learning iptables' options. Note that if you add lots of entries to your INPUT chain, all of your network processing will slow down, as each incoming packet is matched against the rules in the chain, in order, until one matches or the end is reached and the packet is passed up to the next layer. That said, the slowdown is relatively minor and does not become noticeable until you have added hundreds of rules. If you find yourself in that situation, investigate another approach, or consider using multiple chains and branching as early as possible, rather than listing all blacklisted IPs sequentially, to limit the maximum chain traversal length, as sketched below.
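
One possible shape for that chain-based layout (the chain names and addresses below are just examples drawn from the documentation ranges):

# send all incoming traffic through a dedicated blacklist chain first
iptables -N BLACKLIST
iptables -I INPUT 1 -j BLACKLIST

# branch on the first octet so a packet is only compared against the
# blacklisted addresses in its own /8, rather than the entire list
iptables -N BLACKLIST_198
iptables -A BLACKLIST -s 198.0.0.0/8 -j BLACKLIST_198
iptables -A BLACKLIST_198 -s 198.51.100.23/32 -j DROP

Packets that fall off the end of a user-defined chain simply return to the calling chain, so legitimate traffic continues through the rest of INPUT as usual.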

I generally try to avoid non-technical posts or updates, but I am very excited about an announcement I have: beginning this fall, I will be attending Stanford University to pursue a Ph.D. in Electrical Engineering. I hope this will give me more material for clarifying things that I see undergraduates struggle with once I begin TAing courses, as well as updates on research and the various projects that I work on. I haven't selected an area of research or primary focus yet, but it's likely that my work will center around the same types of things I try to write about here: hardware design, software design, and applied mathematics.

Online polls and user voting contests are ubiquitous on the modern internet. Almost every major forum provider supports online polls, they are common on Facebook, and entire third party websites exist to provide embeddable polls and related services (such as SodaHead, which provides polling features for the LA Times and ESPN SportsNet). And, since March Madness is upon us, almost everyone is running brackets—some for fun, but some carrying (substantial) rewards for the winners. How, then, does one ensure that a poll is not gamed by people who stand to profit from a specific result?

The fundamental challenge with securing polls and ensuring meaningful results lies in identifying anonymous users and restricting them to one vote per physical person. The nature of the internet complicates this problem: users can spoof their identities, multiple users can share an identity, and software can be written to automate any process that exploits a shortcoming in the poll code itself. At the same time, any anti-cheating strategy must not interfere with the voting process for the average user, and it must not involve a level of effort that a user considers unjustified given the significance of the poll in question. A user may be willing to spend 10 minutes validating his personal identity in order to cast a vote electronically for the Presidential election, but he probably is unwilling to spend more than 10 seconds casting a vote for his favorite brand of soda.

Passive Identification

The two most common methods of identifying a user without requiring him to take action are (1) IP address tracking and (2) tracking/persistent cookies. However, both have significant drawbacks that make them unsuitable as the primary method of preventing cheating in a polling application. The problem with IP address tracking is that it prevents users who are behind a NAT from differentiating themselves. In many situations, users who connect to the internet do so through either a NAT or a proxy: for instance, users of any wifi connection in a McDonald's, an airport, or even public municipal services will all appear as a single IP address per physical location. In many situations, it is also possible for users to acquire new IP addresses, and therefore new identities, by using cloud providers, open proxies, or other tactics. Therefore, anti-cheating strategies that limit each IP address to one vote are unsuitable: they will likely prevent a percentage of legitimate voters from taking part in the poll, but they will not eliminate illegitimate votes from users who wish to spam it (and although those illegitimate votes may not form a significant portion of the total, the loss of legitimate votes makes them weigh more heavily in the poll's outcome).

The other passive method worth discussing is the use of persistent or so-called "tracking" cookies, which would store an identifier to indicate that a user had already voted. However, this method is laughably easy to circumvent: cookies are not shared between browsers, cookies can easily be cleared by the end user, and most browsers can even be set to simply ignore cookies in the first place. Although some techniques exist for hardening cookies against removal (through questionable tactics), ultimately these can be defeated by the use of non-browser HTTP tools, such as curl. Against polls that use only cookies to prevent users from voting multiple times, it is trivial to write curl-based scripts capable of submitting thousands of votes per minute that appear to the server as unique users. Clearly, this is unacceptable from a security perspective.

Active Identification

Realizing that passive protection does not offer much against scripted attacks, some sites require active user identification in order to vote. This can take many forms, such as (1) requiring a user to register an account and log in to the website before he can vote, (2) requiring the user to authenticate through Facebook, Google, or an OpenID provider, or (3) for the sake of completeness, filling out a CAPTCHA in order to vote. All three of these methods are intrusive to the user's experience, which makes them unsuitable for some types of "quick" polls, even ones that carry prizes for the winners, but they do offer much more robust protection against polling spam.

The first method, which requires the creation of a user account, works for sites that host polls alongside forums; most users who would be interested in completing the poll have already registered an account, so the poll imposes no additional burden on them. However, this does prevent non-members from participating unless they create an account. For some sites this may function to bring in additional users, but for large aggregators it will often provoke an uncomfortable response from users—"why do they need me to create an account? I don't want an account here; I just want to vote in this poll." With appropriate protection on the account registration process (e.g. the use of CAPTCHAs, activation emails, unique email addresses, etc.), a site that restricts voting to local accounts can offer a polling experience largely free from attempted poll spam, because cheating becomes far too tedious on a per-vote basis and is difficult to automate.

In the same vein of account-based identification, the use of OpenID, Facebook, or Google accounts for ID is a huge improvement in the user experience. This is a lot less intrusive as it does not require users to create a new account along with associated usernames and passwords on each site, but it does introduce privacy concerns. Giving a third party access to Facebook account information can leak a user's real name and interests or other internet activity to a site that may not be trusted with this information. For this reason, although it was at one time very popular (and is still used or supported by many sites), the use of Facebook accounts for identification is unavailable on many large sites, even though it would effectively limit gaming the polls. General OpenIDs, however, can easily be abused, because it is possible to create a private OpenID endpoint that will authorize any number of unique accounts and script the entire process, which offers no more security than a cookie-based method (that is, none).

Finally, you can require users to complete a CAPTCHA before voting in the poll, which, although it does not actively identify users, will prevent automated programs from spamming the poll with votes. This may be sufficient to deter some malicious users, but not all. Although CAPTCHAs are not secure against all forms of automated attacks, high quality ones are available for free and are frequently updated to defeat the latest in OCR while maintaining human readability. The main downside to CAPTCHAs is the amount of effort for legitimate users, which is why they are generally not seen for something as trivial as an online poll. It takes about 2 seconds to select a polling option and click vote, but it will take 30-45 seconds for most users to fill out a CAPTCHA. For the same reasons that they are likely unwilling to register a throw-away account, most users would see a CAPTCHA attached to a simple poll and decide against voting rather than attempting to solve the puzzle.

Where does that leave us?

So far, it seems as though every method I've described either offers zero protection against automated attempts to poison the poll's outcome or presents too much inconvenience to the user for practical use, either because it requires too much effort or because it raises privacy concerns. The truth is that there is no silver bullet that magically solves the problem, but my recommendation is a tiered approach, along with a technique that I have not yet seen used. Users who wish to "cheat" at online polls come in roughly three flavors: the curious, the mildly knowledgeable, and the "hacker" or programmer type.

The first of these, the curious user, is basically an average user who knows how to use his web browser but doesn't really understand much about web technology. A number of these users will see a poll, vote, and then, upon seeing that their choice is losing, wonder "can I vote again somehow?" Most will try refreshing the page or something similar, and then, if they can, vote perhaps a half dozen more times before getting bored and moving on. They are a very low priority threat, but they are also easily defeated with a tracking cookie. So, despite its ineffectiveness against sophisticated spam, the use of a tracking cookie to prevent multiple votes is mandatory, both because of how easy it is and because of how many would-be illegitimate votes it can stop with zero impact on other users.

The second type of user, the mildly knowledgeable one, knows about the basics of how the internet works: he's aware of tracking cookies, and he knows that that is how a lot of polls prevent people from voting multiple times. If he wants to try to spam a poll, he will be able to clear his cookies and re-vote multiple times, circumventing the tracking cookie mechanism. He could be stopped by a form of active identification, but the volume of votes that can be submitted by a user like this is limited in scope because he does not understand how to script. A friend of mine recently fell into this category while trying to win an online contest: he was able to vote 3-4 times per minute by recording a screen macro where he would load the page, vote, clear cookies, and then repeat the entire process. Most polling sites rightfully choose to ignore these users because their ability to impact a poll that reaches many thousands of people is minimal.

Finally, we have the "hacker" or programmer type of would-be assailant. He understands how to debug javascript, he knows how HTTP works, and he has a familiarity with scripting and other utilities (such as cURL) that may be available to automate the process. I myself would fall into this category, as would many readers. By spending 5-10 minutes examining the source code for a website that features polling, it is easy to extract the target URL, even if it is submitted with AJAX or uses javascript to rewrite form targets in an attempt to obfuscate the poll's behavior—just as in cryptography, security through obscurity is ineffective here. Furthermore, tools such as the Web Developer Toolbar, Firebug, and jsbeautifier.org make it easy to undo any obfuscation, reducing it to a matter of reading and tracing (sometimes even with a built-in javascript debugger) the code until the poll's submission behavior is captured. Then, it is easy to write a shell script that simply invokes cURL with the correct arguments, which can include form data, referrer data, user agent data, and even fake cookie data if necessary. Using this approach it is possible to vote as fast as your requests are handled by the server which easily works out to thousands per minute. Poll poisoning from power users like this represents the most meaningful threat to the integrity of the result and should be dealt with, but so far I haven't found any online polls that offer meaningful protection against it.

The goal, then, is not to totally prevent sophisticated attacks (although that would be ideal), but instead to reduce them to the point where they cannot have a significant impact on the poll's outcome. Many small things can be used to reduce the effectiveness of scripted attacks without affecting normal users. The first is the use of hidden fields with random values that must match on form submission. Although such fields are traditionally used to mitigate cross-site request forgeries, they also help decrease the magnitude of a scripted attack in this case. The attacker can readily load the page containing the form, read off the correct value of the hidden field, and include it with his scripted submission, but this process takes time. Because vote requests are normally made without waiting for a response to each request (as the response is irrelevant), a spammer can submit them as fast as the server will accept them. However, if he must first load another page completely in order to read off a semi-secret value, the maximum rate at which he can submit votes is limited by the rate at which he can request new pages, which is significantly lower.
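
As a sketch of that idea (the helper names, storage, and lifetime below are placeholders, not any particular poll's implementation), the server side might look something like this in Python:

import os
import time

pending_tokens = {}     # token -> time it was issued; in memory for this sketch
TOKEN_LIFETIME = 600    # seconds a token remains usable (arbitrary choice)

def issue_token():
    """Called while rendering the poll page; the value goes in a hidden form field."""
    token = os.urandom(16).hex()
    pending_tokens[token] = time.time()
    return token

def redeem_token(token):
    """Called on vote submission; each token is good for exactly one vote."""
    issued = pending_tokens.pop(token, None)
    if issued is None or time.time() - issued > TOKEN_LIFETIME:
        return False    # unknown, already used, or expired: reject the vote
    return True

Because redeeming removes the token, replaying the same submission fails, and every additional vote costs the attacker a full page load to obtain a fresh token.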

For example, if they were to implement the token-passing strategy and I wanted to script an attack against Buzzfocus's TV show tournament, I would need to request the page for each vote I wanted to submit, which takes an average of 300 ms for my computer to download from the servers, limiting me to ~200 votes per minute, a significant reduction compared to the thousands (or tens of thousands? I've never tried, but I can't imagine this taking many resources on my end as a large number of votes can be submitted in parallel) of votes I could have submitted without any need for state transfer. This does not eliminate the vulnerability but it does reduce it significantly in scope, which is the best we can hope for. Note that it takes 5-6 seconds for the page to load, but all of the necessary data is actually contained in the original html document—the rest is third party javascript, advertisements, etc. which are all included in "loading time," but which would be irrelevant to a cURL-based attack.

Even 200 votes per minute is still a serious problem. Multiple machines could be involved, cloud providers could be used, and ultimately it would be well within reach to get back to thousands of votes per minute, although the resource cost is already much higher, limiting the number of people willing to put in the effort to cheat at the poll. The next, and most effective, step is to identify the behavior of cheaters and target them specifically, which is surprisingly easy to do. Anyone who wants to skew the results of the poll must do so within a specific time frame, which generally means that he will try to vote as fast as possible for a length of time to ensure that his candidate is ahead. The best strategy in this case is simply to apply a rate-limiting policy to vote submission, which can easily be implemented in software or even within the webserver itself (nginx, for instance, supports request rate limiting by IP address and target URL, which would form a rudimentary solution to this problem with only a few lines of configuration, along the lines of the sketch below).
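
As a very rough sketch (the zone name, rate, and vote URL here are placeholders, not taken from any real configuration), the nginx side of that might look like:

http {
    # track clients by IP and allow roughly one vote per minute each
    limit_req_zone $binary_remote_addr zone=pollvotes:10m rate=1r/m;

    server {
        # apply the limit only to the vote submission endpoint
        location = /poll/vote {
            limit_req zone=pollvotes burst=5;
            # ... hand the request to the poll application as usual ...
        }
    }
}

Requests beyond the allowed rate are delayed or rejected by nginx before the poll code ever runs, which is exactly the kind of cheap, early filtering we want.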

Because users may share IP addresses, rate limiting cannot be totally unintelligent, but there is almost no reason for a single IP address to vote more than once a minute. What are the odds of everyone in a single McDonald's coming across a poll at exactly the same time and attempting to submit their choices? Only a handful of people in a McDonald's are even using the free wireless at any given time of day, making this kind of collision extremely unlikely. Still, the scheme can be refined to handle it anyway: identify some amount of vote spam that is tolerable (for instance, five votes within a minute, initially) and set it as the threshold for activating the rate limiting procedure—this allows multiple users, or friends who are all together in one location, to vote at the same time. Because spammers operate in a fundamentally different way from normal users, simply by tracking the rate of requests and limiting it as it becomes less and less likely to represent a legitimate series of votes, it is possible to reduce the power of a sophisticated spammer to nearly nothing.

Personally, I would outline a basic approach to rate limiting polls that looks something like this: start by tracking, per IP address, the time at which each vote was submitted for a specific poll. If the number of votes submitted in the last minute (or ten minutes, or hour, etc.) exceeds the threshold, flag the IP address and assign it a timeout. When the timeout expires, set a grace period timer of equal length to the timeout and begin accepting votes from the address again. If the number of votes received during this grace period exceeds the threshold again, assign a timeout twice as long. After this timeout, set another grace period of equal length to the timeout and allow votes again. Repeat this process, doubling the timeout period and the grace period each time. If a grace period expires without the threshold being exceeded again, reset the timeout to its default. The timeout does not need to match the window used to decide whether spamming is occurring, but matching the two gives the best detection: a "slow" spammer might trigger a threshold of 100 votes in 15 minutes without ever triggering a per-minute detector, unless the two use equal values (which would make one of them redundant).
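
To make that concrete, here is a minimal sketch in Python (the thresholds, timings, and in-memory bookkeeping are all placeholder choices, not a production implementation):

import time

# placeholder policy values
THRESHOLD = 5        # votes tolerated per measurement window
WINDOW = 60          # seconds over which the vote rate is measured
BASE_TIMEOUT = 60    # initial timeout (and grace period) length, in seconds

state = {}           # per-IP bookkeeping, kept in memory for this sketch

def allow_vote(ip):
    """Return True if a vote from this IP should be accepted right now."""
    now = time.time()
    s = state.setdefault(ip, {"votes": [], "timeout_until": 0.0,
                              "grace_until": 0.0, "penalty": BASE_TIMEOUT})

    # still serving a timeout: reject outright
    if now < s["timeout_until"]:
        return False

    # keep only the votes that fall inside the measurement window
    s["votes"] = [t for t in s["votes"] if now - t < WINDOW]

    if len(s["votes"]) >= THRESHOLD:
        # threshold exceeded: double the penalty if this happened during a
        # grace period, otherwise start over from the base timeout
        s["penalty"] = s["penalty"] * 2 if now < s["grace_until"] else BASE_TIMEOUT
        s["timeout_until"] = now + s["penalty"]
        s["grace_until"] = s["timeout_until"] + s["penalty"]
        s["votes"] = []
        return False

    s["votes"].append(now)
    return True

Nothing about this is specific to polls; the same handful of lines could sit in front of any endpoint that should never see bursts of automated traffic from a single address.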

For a naïve approach that simply blasts the server with votes, this method will sharply limit the number of spam votes, providing significant defense against a power user's attack. If someone attempts to submit votes constantly, we can determine how many he can get in, assuming he starts as soon as the poll goes live and spams continuously for its entire duration of N minutes. Assuming an initial timeout and grace period of t_0 = g_0 and a doubling of both each time the threshold \theta is reached, we can find the total number of time periods (the initial pre-detection window plus all subsequent grace periods) during which the spammer may submit additional votes as the smallest integer x for which

t_0 + \sum\limits_{k=0}^{x} (t_k + g_k) \ge N

Because each timeout and grace period doubles, t_k = g_k = 2^k t_0, so the sum is a geometric series and the expression reduces significantly:

t_0 + \sum\limits_{k=0}^x (t_k +g_k) = t_0 (2^{x+2}-1)

The maximum number of time periods during which we will receive votes, then, is the smallest integer x which satisfies

x \ge \log_2 \left( \frac{N}{t_0} + 1 \right) - 2
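
For example, a poll that runs for five days lasts N = 5 \cdot 24 \cdot 60 = 7200 minutes, so with t_0 = 1 minute the bound works out to

x \ge \log_2(7201) - 2 \approx 10.8 \quad \Rightarrow \quad x = 11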

This gives us a logarithmic increase in votes as a function of the poll's duration, which, for a reasonable choice of \theta = 10 and t_0 = 1 minute, results in at most roughly 110 illegitimate votes from constant spam by a single IP address over a poll lasting five days. However, in the spirit of robustness, we should assume that the attacker knows how the exponentially growing timeouts work and analyze what happens then. That knowledge enables him to wait out each time period without triggering the threshold and achieve a maximum of \theta - 1 votes per t_0 minutes, totaling (\theta-1)\frac{N}{t_0} votes. In the quick-spam scenario presented before, with one-minute timeouts, an attacker could still submit 64,800 votes per IP address, more than enough to alter the outcome of most online polls. Even with a longer timeout (such as 60 minutes) and a similar threshold (say, 30 votes), it is still possible to submit a lot of illegitimate votes, but the overall impact is significantly lower, as the maximum number of spam votes per minute per IP address has been reduced from potentially thousands to less than one.

In retrospect, rate limiting is a rather obvious solution, yet it is not one that I have seen implemented by any poll provider. It places no additional burden on the normal end user, but it cleanly prevents the power user, and even the mid-tier spammer, from having a significant impact on the poll. I'm surprised that nobody does this yet, given the prevalence of rate limiting in other applications to prevent messaging spam. The method I've outlined is rather basic (and I already have a better one in mind, but this post has gone on far longer than I intended), yet even a basic method such as this can reduce spam votes by several orders of magnitude, making the poll more fun for users overall (people seem to become outraged if they think they were cheated out of something meaningless, for reasons that are unclear to me).

For a project that I started working on with a friend back in November, we needed a project management system for bugs and progress tracking, and we decided to try Trac, an open source project management system with svn integration, written in Python.

I run nginx for all of my web serving needs, so I followed the instructions given for configuring Trac behind nginx, and I was a little disappointed to find that the FastCGI script provided didn't do any of the normal server things, like forking a child process to run in the background. I also wanted to integrate the FastCGI server with the service management tools available in CentOS, so I set to work.

First, I modified the nginx-specific FastCGI script to accept a number of command line arguments, allowing me to use the same script to run multiple Trac servers for different projects. In total, it accepts a project path, a user, a unix socket on which to listen for FastCGI connections, and a location to store the server process PID. Then, I added the child fork to allow it to run as a service. Here is the version that I use now:

#!/usr/bin/env python
from optparse import OptionParser
import os
import pwd

parser = OptionParser()
parser.add_option("-s", "--sock", dest="sockaddr", help="Use a specific unix socket", default="/var/trac/run/instance.sock")
parser.add_option("-u", "--user", dest="user", help="Run FCGI process as a specific user", default="nginx")
parser.add_option("-t", "--trac", dest="trac_env", help="Trac environment path to use", default="/var/trac/project")
parser.add_option("-p", "--pid", dest="pidfile", help="Location to store pid file", default="/var/run/trac.fcgi.pid")
(options, args) = parser.parse_args()

os.environ['TRAC_ENV'] = options.trac_env

try:
     from trac.web.main import dispatch_request
     import trac.web._fcgi

     pid = os.fork()
     if pid == 0:
          os.setuid(pwd.getpwnam(options.user)[2])
          fcgiserv = trac.web._fcgi.WSGIServer(dispatch_request, bindAddress = options.sockaddr, umask = 7)
          fcgiserv.run()
     else:
          os.system("echo " + str(pid) + " > " + options.pidfile)

except OSError, oe:
    raise
except SystemExit:
    raise
except Exception, e:
    print 'Content-Type: text/plain\r\n\r\n',
    print 'Oops...'
    print
    print 'Trac detected an internal error:'
    print
    print e
    print
    import traceback
    import StringIO
    tb = StringIO.StringIO()
    traceback.print_exc(file=tb)
    print tb.getvalue()

With a service-capable FastCGI script in hand, I next needed to create an /etc/init.d script to control the process. Because I am running CentOS, I also wanted the script to be compatible with chkconfig so that it can be used with the service utility. This was pretty simple: I copied an existing init script and ripped out most of its guts, replacing them with a minimal set of controls: start, stop, and restart. Here's the full init script, which I saved as /etc/init.d/trac-fcgi:

#!/bin/sh
#
# chkconfig:   - 85 15
# description:  Runs trac's fcgi script as a daemon
# processname:  trac.fcgi
# pidfile:     /var/run/trac.fcgi.pid

# Source function library.
. /etc/rc.d/init.d/functions

# Source networking configuration.
. /etc/sysconfig/network

# Check that networking is up.
[ "$NETWORKING" = "no" ] && exit 0

tracfcgi="/var/trac/trac.fcgi"
prog=$(basename $tracfcgi)
pidfile="/var/run/${prog}.pid"

start() {
    echo -n $"Starting $prog: "
    daemon $tracfcgi
    retval=$?
    echo
    return $retval
}

stop() {
    echo -n $"Stopping $prog: "
    killproc -p $pidfile $prog
    retval=$?
    echo
    return $retval
}

restart() {
    stop
    start
}

rh_status() {
    status -p $pidfile $prog
}

rh_status_q() {
    rh_status >/dev/null 2>&1
}

case "$1" in
    start)
        rh_status_q && exit 0
        $1
        ;;
    stop)
        rh_status_q || exit 0
        $1
        ;;
    restart)
        $1
        ;;
    status|status_q)
        rh_$1
        ;;
    condrestart|try-restart)
        rh_status_q || exit 7
        restart
        ;;
    *)
        echo $"Usage: $0 {start|stop|status|restart}"
        exit 2
esac

Finally, the new init script is added with a simple call to chkconfig:

# chkconfig --add trac-fcgi

If you want to run multiple Trac servers with different configurations, create a copy of the init.d script for each configuration and pass the appropriate command line arguments on the line that reads daemon $tracfcgi, after the $tracfcgi variable. Then, add each configuration's script to chkconfig using the same method as before.
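
For instance, the copy for a hypothetical second project (all paths below are made up) might change that line to:

daemon $tracfcgi --trac /var/trac/otherproject --sock /var/trac/run/otherproject.sock --pid /var/run/trac-other.fcgi.pid

with the pidfile variable near the top of that copy updated to the same --pid location so that stop and status continue to work.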

While discussing game scripting languages with a friend, I asserted that a Lisp-based language such as Scheme or Racket would make a very effective in-game scripting language due to its expressive power and flexibility, as well as the efficiency with which modern, JIT-capable implementations execute code. One of the most widely used scripting languages for this purpose appears to be Lua, which is pretty small and designed to be embedded in C programs. I glanced over the feature list on Wikipedia and saw that it is basically a subset of Scheme with some omissions and a few additions. In particular, I really liked the idea of a metatable, which can be used to memoize a function's results automatically.

The example given on Wikipedia is a metatable-based function to calculate the nth Fibonacci number, a classic example of a simple recursive expression that has exponential time complexity in a naïve implementation but can be improved to linear time (or, with sufficient reuse, amortized constant time) with a memoization table. The Lua code for this is given (available from Wikipedia under the Creative Commons Attribution-ShareAlike license) as:

fibs = { 1, 1 }
setmetatable(fibs, {
  __index = function(name, n)
    name[n] = name[n - 1] + name[n - 2]
    return name[n]
  end
})

I didn't recall seeing a Scheme or Racket built-in that provides the same functionality, but I was confident that it would be easy to implement such a feature using standard Racket built-ins, such as the hash table. So, I started with a simple function that checks whether a value for 'arg' exists in a given hash table and, if so, returns it; otherwise it computes a value by applying the given function, stores the result, and returns it:

(define (memo-or-call ht fun arg)
  (if (hash-has-key? ht arg)
      (hash-ref ht arg)
      (let ((result (fun arg)))
        (hash-set! ht arg result)
        result)))

With this, I was able to define a hash table on my own, set up the default values, and then create an index function, which was pretty effective and enabled me to compute any Fibonacci number that I wanted almost instantly:

(define fibtable (make-hash))
(hash-set! fibtable 0 1)
(hash-set! fibtable 1 1)

(define (fib n)
  (memo-or-call fibtable
                (lambda (k) (+ (fib (- k 1)) (fib (- k 2))))
                n))

This is great, but it felt like it required too much template-like rewriting of code for each function that I wanted to provide table memoization for. I'd have to create a new hash table, add default values, and then use the "internal" function memo-or-call to create my memoization-capable function. To resolve this, I wrote another function that would take a lambda expression and wrap it with a hash table and call to the memoization function:

(define (mt-store-defaults tbl defaults)
  (cond
    [(empty? defaults) tbl]
    [else (hash-set! tbl (caar defaults) (cadar defaults))
          (mt-store-defaults tbl (cdr defaults))]))

(define (make-metatable fun defaults)
  (let ((tbl (mt-store-defaults (make-hash) defaults)))
    (lambda (k)
      (memo-or-call tbl fun k))))

This reduced the complexity of creating a new memoized function considerably, bringing me to functionality effectively equivalent to what is possible in Lua. It's worth noting that I'm strictly dealing with the case where a table is used to memoize computation, not the other use cases, such as implementing classes or objects. It would be possible to modify the idea here to include support for Lua's __call magic function, and it is already possible to store functions in a hash table, as they are first class objects. The Scheme code implementing the now-table-based Fibonacci calculator:

(define fib2
  (make-metatable
   (lambda (k) (+ (fib2 (- k 1)) (fib2 (- k 2))))
   '((0 1) (1 1))))

In summary, a reasonable implementation of automatic memoization with "metatables" can be achieved in just a few lines of Scheme/Racket, allowing for concise definitions of memoized functions in user code. Note that this is just example code and not particularly robust, but feel free to use it as a starting point.

#lang racket

(define (memo-or-call ht fun arg)
  (if (hash-has-key? ht arg)
      (hash-ref ht arg)
      (let ((result (fun arg)))
        (hash-set! ht arg result)
        result)))

(define (mt-store-defaults tbl defaults)
  (cond
    [(empty? defaults) tbl]
    [else (hash-set! tbl (caar defaults) (cadar defaults))
          (mt-store-defaults tbl (cdr defaults))]))

(define (make-metatable fun defaults)
  (let ((tbl (mt-store-defaults (make-hash) defaults)))
    (lambda (k)
      (memo-or-call tbl fun k))))
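
As a quick sanity check, appending the fib2 definition from earlier to this file and evaluating it in the REPL behaves as expected:

(define fib2
  (make-metatable
   (lambda (k) (+ (fib2 (- k 1)) (fib2 (- k 2))))
   '((0 1) (1 1))))

(fib2 40) ; => 165580141, returned immediately instead of via exponential recursion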