In Node.js, a single process handles many concurrent clients by exploiting the assumption that most of the time spent responding to a server request is spent waiting on I/O, including waiting for other networked resources to become available. Node.js and programs written for it are essentially designed around cooperative multitasking: a central loop manages tasks and calls them when the resources they are waiting on become available. However, as a consequence of JavaScript's implementation, true continuations are not available, so tasks cannot be written as a single, uniform control flow that suspends on I/O and resumes in place when data arrives (although libraries exist for this purpose).

Instead, an asynchronous approach is used: when an I/O request is made, a callback is supplied along with it, to be invoked when the data is available. This technique is frequently generalized to all or most library interfaces, because it is more consistent to simply assume that all results are delivered asynchronously than to burden the programmer with remembering which approach each specific function uses. This is sometimes referred to as continuation passing style, because data is never passed back to the calling context via return; instead it is forwarded to the next function as an argument.
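As a minimal illustration of the difference (a sketch only; the file name is arbitrary), compare a return-based read with its continuation passing equivalent using Node's fs module:

var fs = require('fs');

// Synchronous, return-based style: the result comes back to the caller
var data = fs.readFileSync('config.json', 'utf8');
console.log(data.length);

// Asynchronous, continuation passing style: the result is never returned;
// it is forwarded to the callback as an argument instead
fs.readFile('config.json', 'utf8', function(err, data) {
    if (err)
        return console.log(err);
    console.log(data.length);
});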

Despite the advantages of an asynchronous structure for multitasking, code that makes frequent use of callbacks, especially nested callbacks, can quickly become spaghetti code and a maintenance nightmare. Consider this snippet, which uses express and node-mysql to handle a web request by pulling data from a database and returning it in JSON format:

// Note that error handling is omitted for brevity and simplicity
server.get("/:id", function(req, res) {
    var id = parseInt(req.params.id) || 1;
    pool.getConnection(function(err, conn) {
        conn.query('SELECT * FROM `users` WHERE `id` = ?', id, function(err2, rows) {
            res.send(rows[0]);
        });
    });
});

All of the functions are defined locally and anonymously, growing inward as each callback is added. Of course, we can suspend our instinct for good style just this once and not add additional indentation for callbacks, because there will be no more code in the parent context, but this is only a small stopgap. The real problem is that we're defining callbacks exactly where they are used, which limits our ability to reuse them. An improvement, then, is to restructure the code as a series of complete, named functions and pass them as callbacks, which permits better reuse and avoids the infinite tower of indentation, but it is not without its own problems:

// Uses the underscore library for _.partial
var _ = require('underscore');

function send_user_info(res, err, rows) {
    // ... error checking here
    res.send(rows[0]);
}

function get_user_info(id, res, err, conn) {
    // ... error checking here
    conn.query('SELECT * FROM `users` WHERE `id` = ?', id, _.partial(send_user_info, res));
}

function handle_id(req, res) {
    var id = parseInt(req.params.id) || 1;
    pool.getConnection(_.partial(get_user_info, id, res));
}

server.get('/:id', handle_id); 

One immediate problem with this style is that we now have to pass parameters into the other functions explicitly, as we can no longer rely on closures over anonymous functions. The solution I've chosen here uses the underscore library's _.partial to perform partial application, fixing some of the arguments that will be passed to the function ahead of time. Effectively, this just wraps the function in an anonymous function that forwards the given arguments, but it is much cleaner (and shorter) to write out.
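To make the equivalence concrete (a small sketch reusing the functions above), the two callbacks below behave identically when node-mysql invokes them with (err, conn):

// Partial application with underscore...
var cb1 = _.partial(get_user_info, id, res);

// ...is effectively shorthand for writing the wrapper by hand:
var cb2 = function(err, conn) {
    get_user_info(id, res, err, conn);
};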

This solution is really not ideal for more complex code, even if it looks acceptable for the example given here. Many attempts have been made to deal with this problem. I could spend the next week going over existing solutions like promises, deferreds, and Q, but I would instead like to introduce a different solution that I have been working on for a while: a control flow library called flux-link. The library aims to mimic an overall synchronous code flow while providing a shared local state object that can be used to avoid partial application and (very dangerous) global variables. Let's start by reworking the above example to use flux-link:

var fl = require('flux-link');

function get_db(env, after) {
    pool.getConnection(after);
}

function get_user_info(env, after, err, conn) {
    // ... error checking
    conn.query('SELECT * FROM `users` WHERE `id` = ?', env.id, after);
}

function send_user_info(env, after, err, rows) {
    // ... error checking
    env.res.send(rows[0]);
    after();
}

var handle_id = new fl.Chain(get_db, get_user_info, send_user_info);

server.get('/:id', function(req, res) {
    var init_env = {
        id : parseInt(req.params.id) || 1,
        req : req,
        res : res
    };
    var env = new fl.Environment(init_env, console.log);
    handle_id.call(null, env, null);
});

This code looks very similar to the second example above (intentionally so; it does not use all of the library's features), except that we no longer need partial application to set up any of our callbacks. Instead, data is passed through the environment variable, which is made available to each callback in turn as private, local state. Each callback used in a chain must accept two specific arguments, env and after, which hold the environment and a reference to the next callback in the chain. Thanks to partial application inside the library, these are already bound into each "after" parameter, so user code does not need to supply them. This means that chained callbacks can be passed to other library code (such as when after is handed to pool.getConnection()) and will behave as expected, receiving all of the parameters normally supplied by that library in addition to the environment and the next callback.
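Conceptually, the after handed to get_db above behaves roughly like this (an illustrative sketch only, not the library's actual internals; next_after stands for the callback that will in turn be given to get_user_info):

// "after" already has env and the next step's own callback bound in,
// so any arguments supplied by library code (err, conn) are appended
var after = function() {
    var args = [env, next_after].concat(Array.prototype.slice.call(arguments));
    get_user_info.apply(null, args);
};

pool.getConnection(after); // the library calls after(err, conn)...
// ...which ends up invoking get_user_info(env, next_after, err, conn)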

But, when you look closely at it, this code can be improved even further, with better modularity: the argument passing still introduces unnecessary coupling between the function signatures, and we can factor out a generic function that sets up routes in express and automatically invokes a callback chain. To demonstrate this, let's add another route and another chain to retrieve information about another type of entity tracked on our website, groups, which will have a very similar implementation to users:

var fl = require('flux-link');

function init_db(env, after) {
    pool.getConnection(function(err, conn) {
        // ... Check for connection error
        env.conn = conn;
        after();
    });
}

function get_user_info(env, after) {
    var id = parseInt(env.req.params.id) || 1;
    env.conn.query('SELECT * FROM `users` WHERE `id` = ?', id, after);
}

function get_group_info(env, after) {
    var id = parseInt(env.req.params.id) || 1;
    env.conn.query('SELECT * FROM `groups` WHERE `id` = ?', id, after);
}

function print_result0(env, after, err, rows) {
    // ... error check
    env.res.send(rows[0]);
    after();
}

function create_route_handler(chain) {
    return function(req, res) {
        var init_env = {req : req, res : res};
        var env = new fl.Environment(init_env, console.log);
        chain.call(null, env, null);
    }
}

var handle_user = new fl.Chain(init_db, get_user_info, print_result0);
var handle_group = new fl.Chain(init_db, get_group_info, print_result0);

server.get('/users/:id', create_route_handler(handle_user));
server.get('/groups/:id', create_route_handler(handle_group));

Importantly, init_db, create_route_handler, and even print_result0 can all be implemented once as part of an internal library and then used to build websites with only minimal project-specific code. Furthermore, chains can be constructed out of other chains. So it is easy to create a "pre-routing" chain that performs mandatory tasks such as establishing a database connection, parsing session data, or whatever else you need, and a "post-routing" chain that fetches additional data required on every page (say, the number of users online for a forum, or notification information). By storing data in the environment, you can designate an output object, write to it during your per-route chain, and then have the final step of the post-route handler pass the output object through a template renderer and send the result to the browser. Although it is much more sophisticated, this is the basic premise of a system I use internally to implement projects.
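As a rough sketch of that idea (parse_session, fetch_notifications, and render_page are hypothetical steps written in the same env/after style; only fl.Chain from the examples above is assumed), route handlers can be composed from shared chains like this:

// Hypothetical shared chains built from (env, after, ...) style steps
var pre_route = new fl.Chain(init_db, parse_session);
var post_route = new fl.Chain(fetch_notifications, render_page);

// Chains can be members of other chains, so a full route handler is just
// the per-route logic sandwiched between the shared pieces
var handle_user = new fl.Chain(pre_route, get_user_info, print_result0, post_route);

server.get('/users/:id', create_route_handler(handle_user));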

The flux-link library itself has many additional features, such as proper exception handling, call trace generation (to aid in debugging), and utility functions that replace common tasks (such as checking for errors and throwing exceptions). The project is MIT licensed, hosted on GitHub, and also available in (hopefully) stable releases on npm. I have plans to expand its capabilities in the future, including a proper looping construct (loops inside chains can already be implemented, but it is a little tricky to do so), as well as solutions to any other control flow problems I come across that fit with the theme of creating callback chains. Comments, bug reports, suggestions, and any other feedback are welcome, here, on GitHub, or anywhere else you may be able to reach me.

Recently, I wrote a post about using iptables to block certain IPs from accessing my server, making it drop their traffic without acknowledgement so that I would appear to no longer exist on the internet. My intent was to mitigate a small brute force login problem I was seeing on my server from one IP address. However, since then, I've learned from a variety of sources (including Ars Technica and pretty much every other electronic news site) that this was likely just a small window into a much larger botnet attack that is attempting to compromise WordPress installs using many machines at once. Obviously, in this case, manually blacklisting each IP address is infeasible. I previously recommended installing Limit Login Attempts as a good plugin to prevent botnets from brute forcing one's admin page, but now it is time to go a step further. I would like to expand the blacklisting to simply drop traffic from anyone who exceeds the login limit, in order to save my server the processing time of handling illegitimate requests.

To do this, I'm going to use a program known as fail2ban, which monitors log files for pre-specified regular expressions that identify undesirable behavior and then blocks traffic from the IP addresses generating that behavior. Bans are timed and are added and removed automatically by the fail2ban daemon, with configuration options for the ban duration, the number of failures necessary to trigger a ban, and so on. By combining fail2ban with Limit Login Attempts, we can leverage both a WordPress-level solution to identify problem users easily and a kernel firewall-level solution to remove them from the server with minimal impact.

Fail2ban operates by parsing log files, which means it is incompatible with currently released versions of Limit Login Attempts. I will talk to the author and see if I can get him to incorporate writing to an actual log file in addition to logging to the MySQL database backing WordPress, but for now we'll simply hack in the functionality we need, because it only takes a few lines. In limit-login-attempts.php, in the function limit_login_notify_log (which starts around line 563), we append the following lines to the end of the function, making it write a simple message with the user's IP address to a pre-specified, hardcoded log file whenever a lockout is also written to the database:

$file = fopen(LIMIT_LOGIN_FILE_PATH, 'at');
if ($file) {
    // Note that the 't' flag will offer \n to \r\n translation on windows
    fwrite($file, date('M j h:i:s ') . sprintf(__('IP locked out by wp-limit-login: %s', 'limit-login-attempts'), $ip) . "\n");
    fclose($file);
}

Then we also need to define the constant LIMIT_LOGIN_FILE_PATH, so I've added define('LIMIT_LOGIN_FILE_PATH', '/var/log/nginx/wp-limit-login.log'); at the top of the file. Now, whenever logging is enabled (and why would you disable it?), the offending user's information will be written both to the MySQL database, for display in the Limit Login Attempts dashboard, and to the specified log file. Make sure the file path you choose is writable by whichever process runs your PHP instance (be it apache, php-fpm, etc.). This log file is very simple, which suits my purposes, but you may want to augment yours with other information, such as the name of the blog if you're running a multi-site installation, or the number of times an IP has been locked out. It is also entirely possible to put this file inside your WordPress install, if you don't have access to /var/log, for instance. If you put it inside the plugin directory itself, the standard WordPress rewrite rules should prevent clients from accessing it, but I haven't verified this, so I'd recommend keeping it outside of your webroot to avoid disclosing the IP addresses (we're responsible administrators, even if the people we're banning aren't).
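For reference (the date and address below are made up; the message format comes from the fwrite() call above), the define and the resulting log lines look like this:

// At the top of limit-login-attempts.php
define('LIMIT_LOGIN_FILE_PATH', '/var/log/nginx/wp-limit-login.log');

// Each lockout then appends a line of the form:
// Apr 15 10:23:45 IP locked out by wp-limit-login: 192.0.2.17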

Next, we add a filter to the fail2ban configuration. Filters are defined in /etc/fail2ban/filter.d, one per file; I suggest a simple name, like wplogin.conf. Inside this filter, we configure the regular expression that captures a failure, along with an optional regular expression to ignore, which trumps a failure line. Based on the log format above, our filter is very simple:

# Capture failed login attempts through wordpress limit logins module

[Definition]

# default message is "IP locked out by wp-limit-login: <ip address>"
failregex = [^:]*: <HOST>

ignoreregex =

With the filter in place, we must add a jail configuration to fail2ban. For that, we edit /etc/fail2ban/jail.conf and add a new jail that references the filter we just created. I would strongly recommend adding your own IP address to the ignoreip line, in CIDR notation, because otherwise, if you ever trip the filter yourself, all of your traffic to the server will be dropped (for the specified amount of time), making it impossible for you to access it.

[wplogin]
enabled = true
filter = wplogin
action = iptables-allports[name=wplogin]
logpath = /var/log/nginx/wp-limit-login.log
maxretry = 1
bantime = 600
ignoreip = 127.0.0.1/32

Because Limit Login Attempts already counts how many times someone has failed to log in, an entry in this log means they should be banned, so we set the retry count to just one, banning them on their first appearance. We also use the iptables-allports action to ban the user's IP address, which does almost exactly the same thing as the script I presented before, except that it uses a separate fail2ban chain and names its rules so that it can go back and remove them later with minimal fuss (i.e., it is more sophisticated). I would also recommend increasing the ban time, as the default is only 10 minutes. Unless you run into problems with actual readers being banned (unlikely, as the botnet appears to run on other servers, not on infected clients), you can easily set this to an hour, or a day, at your discretion.

After that, simply launch fail2ban by executing:

# fail2ban-client start

and fail2ban will begin monitoring the log file. You may notice that fail2ban gives a warning on startup saying that the log file doesn't exist. If this happens, create the file, change its owner to match the account that will be writing to it, and restart fail2ban:

# touch /var/log/nginx/wp-limit-login.log
# chown nginx /var/log/nginx/wp-limit-login.log
# fail2ban-client reload

This combines the power of Limit Login Attempts, to protect your installation from brute force password attacks, with fail2ban, to drop traffic from offenders at the firewall, improving its effectiveness. The combination reduces the amount of server resources spent dealing with illegitimate requests, which can turn an almost-DoS caused by brute force password hammering into a non-issue.

Yesterday, I ran across the original eyes.js library for Node.js, which provides improved functionality to replace util.inspect(). It seemed nice and did a good job of color-coding its output, but it was missing a few features I wanted, such as the ability to hide certain types of properties (null or undefined ones, for example) and the ability to specify stacked color attributes for the output. Initially, I opted to modify the original project, but as the work went on I realized I didn't really like the way it was structured, so I decided to rewrite the entire module, using colors and underscore to simplify the implementation.

Therefore, I forked the repository and rewrote the entire core library file. My branch is available at https://github.com/gmalysa/spyglass. It is also available to be installed through npm:

npm install spyglass

The README on GitHub has a very in-depth explanation of how to use spyglass, so I won't repeat all of it here, but the basic idea is that it allows for colored, pretty-printed object inspection, optionally writing the output to a stream (such as the console) or simply saving it as a string. It has a built-in type identification system (which offers better resolution than the typeof operator) with user-customizable hooks that allow special display routines to be provided for any user-defined class. Reasonable defaults are provided for all of the built-in JavaScript classes; more might come for some built-in Node.js objects as well. Object properties can be hidden from the output based on exact name matches, regular expressions, or value types (i.e., null properties, function properties, etc.) to avoid unnecessary clutter.

If you're currently using eyes.js or are just looking for a tool to add powerful inspection/var_dump()-like capabilities to your Node.js project, please give it a try and let me know about any bugs or changes that might be useful!

While doing some updates to the blog, such as adding author information and other details that will hopefully help legitimize things, I spent about two hours trying to figure out why a shortcode in the theme I use (Suffusion) would not parse arguments. The tag suffusion-the-author would display the author's name, but if I added an argument to make it display the description or a link instead of just the name, it would always fail and revert to the author name only. Some quick googling told me that this didn't seem to be a problem anyone else had, and after four major versions of the theme, I'm pretty sure someone would have noticed if a basic feature were entirely nonfunctional. After lots of debugging, I discovered that the problem was happening in WordPress's own code, inside the definition of shortcode_parse_atts(), which parses the attributes matched in the shortcode into an array for use inside the callback. For me, instead, it killed all of the tag arguments and returned an empty string. In the end, I found that the issue was a regular expression that appears designed to replace certain types of unicode spaces with normal spaces, in order to simplify the argument-parsing regular expression that comes after it:

$text = preg_replace("/[\x{00a0}\x{200b}]+/u", " ", $text);

On my system, this would simply set $text to an empty string, every time. Initially, I wanted to just comment it out, because I'm not expecting to see any of these characters inside tags, but that bothered me: this line has been part of the WordPress code for a very long time, and nobody else appears to have problems with it. Finally, I concluded that this must mean that unicode support was not enabled, so I began searching through yum. Bad news: mbstring and pcre were both already installed, and php -i told me that PHP was built with multibyte string support enabled, as well as PCRE regex support. I tested my PCRE libraries and they claimed to support unicode as well.
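As a quick check (a minimal sketch; the sample string is arbitrary), you can reproduce the behavior in isolation and see whether the /u modifier works at all on your install:

<?php
// On a working PHP/PCRE combination this prints string(3) "a b";
// on a broken unicode setup it produces an empty string or NULL instead
$text = "a\xc2\xa0b"; // "a", U+00A0 no-break space, "b"
var_dump(preg_replace("/[\x{00a0}\x{200b}]+/u", " ", $text));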

In the end, I solved the problem by updating php and pcre to the latest versions, which in my case meant going from php-5.3.13 to php-5.3.24, with pcre updated from 7.x to 8.21. It appears that there was some kind of incompatibility between php 5.3 and the 7.x branch of pcre (if you build php 5.3 from source, it will target pcre 8.x), which prevented me from using unicode despite support for it apparently being present everywhere. So, if you have trouble getting your PHP installation to handle unicode inside regular expressions, and you're running on Amazon Web Services, make sure you update to the latest versions of php and pcre, as there appear to be some issues with the older packages.

Yesterday, I had a small issue where someone attempted to brute force the admin password to my blog, resulting in significantly decreased availability for about 10 minutes as my server's resources were maxed out, leading to 45+ second page load times. Luckily, it happened just as I was coming home, so I was able to identify the problem and put a stop to it very quickly. Incidentally, this taught me that WordPress does not rate-limit logins (allowing brute force attacks at the server's maximum request speed), so I would recommend using a plugin to add rate limiting to logins, which mitigates the danger of a simple scripted attack.

Once the problem was identified as an attack of some kind, the next step was to disable php-fpm in order to drop the load average to an acceptable level so that I could figure out who and what was going on, and then do my best to mitigate it. Incidentally, this will cause nginx to produce a 502 error as the CGI handler is no longer responsive. From /var/log/nginx/blog.access.log:

{IP} - - [01/Apr/2013:16:59:57 +0000] "POST /wp-login.php HTTP/1.1" 200 6969 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
{IP} - - [01/Apr/2013:16:59:57 +0000] "GET /wp-admin/ HTTP/1.1" 302 5 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
{IP} - - [01/Apr/2013:16:59:58 +0000] "GET /wp-login.php?redirect_to=http%3A%2F%2Fthelonepole.com%2Fwp-admin%2F&reauth=1 HTTP/1.1" 200 6044 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
{IP} - - [01/Apr/2013:16:59:58 +0000] "POST /wp-login.php HTTP/1.1" 200 6969 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
{IP} - - [01/Apr/2013:16:59:59 +0000] "GET /wp-admin/ HTTP/1.1" 302 5 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
{IP} - - [01/Apr/2013:16:59:59 +0000] "GET /wp-login.php?redirect_to=http%3A%2F%2Fthelonepole.com%2Fwp-admin%2F&reauth=1 HTTP/1.1" 200 6044 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

becomes

{IP} - - [01/Apr/2013:22:45:53 +0000] "GET /wp-admin/ HTTP/1.1" 502 173 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
{IP} - - [01/Apr/2013:22:45:54 +0000] "POST /wp-login.php HTTP/1.1" 502 198 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
{IP} - - [01/Apr/2013:22:45:54 +0000] "GET /wp-admin/ HTTP/1.1" 502 173 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
{IP} - - [01/Apr/2013:22:45:54 +0000] "POST /wp-login.php HTTP/1.1" 502 198 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

Interestingly, we can see that whatever brute force tool is being used spoofs the Bing search crawler's user agent, to prevent me from applying a user-agent denial without deindexing myself. This isn't really a big deal, because I don't plan on solving the problem with policies in nginx, so anything in the HTTP request is largely irrelevant.

I've removed the user's IP address, but in this case it was assigned through a cloud services provider, meaning that someone likely set up an account to scan the internet for WordPress blogs and attempt to break into their admin directories, probably to install a plugin and then take over the machine in order to attach it to a botnet. I've emailed their abuse department, but I never heard back and don't really expect them to do anything about it (most companies don't seem to care). I don't expect any of my traffic to come from cloud providers, so the simplest solution is to deny all traffic from the IP in question. There are a number of ways to do this, but the best is to drop requests as early as possible in the network stack, because that minimizes the resources my server dedicates to processing an illegitimate request. To do this, I use the kernel firewall utility iptables, adding rules that drop all incoming packets (with no acknowledgement at all, which makes it appear as though my server is no longer online) from blacklisted sources as early as possible in the processing chain.

General scanning of the internet, probing for vulnerabilities, and attempting to exploit poorly configured servers is extremely common. I used to get very concerned when I saw logfile entries, almost daily, indicating that someone had attempted to request files that don't exist or to exploit a flaw in how older versions of apache (which I don't use) look up files. After doing some reading, I relaxed: it's very common, and servers running up-to-date software will withstand generic attacks (though likely not actual, targeted attempts at breaching). That said, since I have no interest in letting these people continue to scan me for vulnerabilities in order to make me part of their botnet, and since they almost certainly do not form any part of my readership, I put together a small script that adds a log message and drops the packets whenever an IP I dislike attempts to reach the server, making it quick and easy to deny people whose nonsense is becoming problematic.

#!/bin/bash

if [ -z "$1" ]; then
        echo "Usage: $0 <ip address> [reason]";
        exit;
fi

IP=$1
REASON="denied ip"

if [ $# -gt 1 ]; then
        shift;
        REASON="$*"
fi

iptables -A INPUT -s ${IP}/32 -j LOG --log-prefix "${REASON} "
iptables -A INPUT -s ${IP}/32 -j DROP

You can see log messages (including those generated by iptables) in /var/log/messages, where there will be one message for each dropped packet. This could cause your log file to balloon in the case of any kind of flooding, so you may wish to omit the log rule, but I generally keep it so that I can remember later why an IP address was blocked.

Feel free to use this whenever you need to deny access to a specific IP address, to save yourself the trouble of learning iptables' options. Note that if you add lots of entries to your INPUT chain, all of your network processing will slow down, because each incoming packet is matched against every rule in the chain, in order, until it reaches the end and is passed up to the next layer. That said, the slowdown is relatively minor and does not become noticeable until you have added hundreds of rules. If you find yourself in that situation, investigate another approach, or consider using multiple chains and branching as early as possible, rather than listing all blacklisted IPs sequentially, to limit the maximum chain traversal length.
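As a rough illustration of that last idea (a sketch only; the chain names and addresses are made up), blacklisted hosts can be grouped into per-range chains so that a packet only traverses the rules relevant to its source network:

# Create per-range chains (names and ranges are examples)
iptables -N BLACKLIST-192
iptables -N BLACKLIST-198

# Branch early: only packets from the matching /8 enter each sub-chain
iptables -A INPUT -s 192.0.0.0/8 -j BLACKLIST-192
iptables -A INPUT -s 198.0.0.0/8 -j BLACKLIST-198

# Individual blacklisted hosts live in the sub-chains; traffic from other
# networks only ever matches against the two branch rules above
iptables -A BLACKLIST-192 -s 192.0.2.10/32 -j DROP
iptables -A BLACKLIST-198 -s 198.51.100.23/32 -j DROP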