Jan 01 2014

For a client’s website, I needed to enumerate the 12 months preceding a given date to create links to archived content. The site uses a JavaScript templating engine to create HTML, offloading the work from the server, so generating the list of months on the client side in JavaScript seemed like a reasonable choice. For the past week everything looked great, but today I suddenly noticed that it was repeating the current month.

The code itself is pretty simple. It’s written as a dustjs helper function that uses the Date.setMonth function to handle wrapping from January of one year to December of the previous year.

dust.helpers.month_list = function(chunk, ctx, bodies, params) {
	var count = dust.helpers.tap(params.count, chunk, ctx) | 0;
	var curDate = new Date();

	for (var i = 0; i < count; ++i) {
		// getYear() returns the year minus 1900, so add it back to get a four-digit year
		var dateStr = (1900 + curDate.getYear()) + '-' + (curDate.getMonth()+1);
		chunk = chunk.render(bodies.block, ctx.push({date : dateStr}));
		// Step back one month; setMonth() handles wrapping past January on its own
		curDate.setMonth(curDate.getMonth()-1);
	}

	return chunk;
};

Some quick printf debugging, adding console.log(curDate) at the start of the loop, shows this surprising result:

Tue Dec 31 2013 20:47:14 GMT-0800 (Pacific Standard Time)
Sun Dec 01 2013 20:47:14 GMT-0800 (Pacific Standard Time)
Fri Nov 01 2013 20:47:14 GMT-0700 (Pacific Daylight Time)
Tue Oct 01 2013 20:47:14 GMT-0700 (Pacific Daylight Time)

Apparently, on the 31st of the month, subtracting one month from the current date does not have the desired effect, at least in Chrome (on Windows 8.1). I ran the test again in IE 11 and observed the same behavior, and I also tried manually setting the date to the 31st of October and subtracting a month, with the same result. What happens is that setMonth() keeps the day of the month, so when the previous month is shorter, the date rolls forward into the next month: December 31 minus one month becomes November 31, which is really December 1. I'm not sure whether that means this is somehow part of the specification, or a bug in a library used under the hood by both browsers, but the end result is the same. My "clean" code that used setMonth(getMonth()-1) instead of writing out an if statement to detect the start of the year and wrap correctly now contains a cryptic loop that checks whether the month actually changed and subtracts again if it didn't, all to deal with a problem that only shows up once a month.
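
For what it's worth, a less cryptic workaround is possible. Here is a minimal sketch of the same helper loop, valid only because the template uses just the year and month: pinning the date to the first of the month before stepping backwards means setMonth() can never overflow into the next month.

// Sketch of an alternative: pin the day-of-month to 1 so that subtracting a
// month can never roll forward into the next month. Safe here because only
// the year and month are ever formatted.
var curDate = new Date();
curDate.setDate(1);

for (var i = 0; i < count; ++i) {
	var dateStr = curDate.getFullYear() + '-' + (curDate.getMonth() + 1);
	chunk = chunk.render(bodies.block, ctx.push({date : dateStr}));
	curDate.setMonth(curDate.getMonth() - 1);
}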

Jul 20 2013

There aren’t very many good resources on the Internet for optimizing compute-bound JavaScript, aside from this article from Google Developers, which discusses string building and touches very briefly on some of the problems with functions. For the most part, I’d bet this is because a lot of developers say “we’ll leave it up to the JIT compiler to do optimizations” and then write code the idiomatic way, assuming that this will translate into high performance, while the rest focus on non-compute usage, where performance is generally limited by networked resources. However, as I’ve found, the v8 optimizer, as good as it is, still struggles with a number of situations that are easily handled in C. Obviously, I’m not the first to notice this; projects like asm.js exist to identify a subset of standard JavaScript that can achieve high performance by allowing better optimization and, especially, by avoiding garbage collection.

However, I think that there are a number of things that can be done to increase performance without resorting to asm.js. I’ve assembled four test cases/demonstrations to illustrate some problematic areas for the JIT optimizer and to show how I’ve hand-optimized them. All of the test cases are discussed in terms of simple benchmark results, using Node.js’s high-resolution timer (process.hrtime) for better precision. One thing that is important to note is that sometimes the relative speed increase appears huge (for instance, 10x faster), but really we are looking at a very simple, pared-down test case. In these instances it is much more meaningful to consider the absolute speedup (5 microseconds vs. 10 microseconds), because in real code the function bodies will generally have far more meat on them, while the same amount of overhead will be eliminated.

My attempts at optimizing a graph processing library I have been working on identified function calls as the biggest source of overhead, because the optimizer doesn’t appear to do much with them. I haven’t tried to look at tracing or disassembly, because I want to keep this simple for now. Since functions appear to be the problem, let’s start with a simple test: does v8 inline functions?

var iter = 10000;
var i, j, start, stop, time;

function inc(a) { return a+1; }

var x = 0;
start = process.hrtime();
for (i = 0; i < iter; ++i) {
	x = inc(x);
}
stop = process.hrtime(start);

time = (stop[0]*1e9 + stop[1])/1e6;
console.log('T1//with function: '+time+' ms');

var x = 0;
start = process.hrtime();
for (i= 0; i < iter; ++i) {
	x = x + 1;
}
stop = process.hrtime(start);

time = (stop[0]*1e9 + stop[1])/1e6;
console.log('T1//hand inlined: '+time+' ms');

For me, the version calling inc() generally takes about 0.467 milliseconds to complete, while the hand-inlined version takes only 0.0365 ms. The answer, then, is that either v8 doesn't inline, or if it does, it cannot match a hand-inlined version of the same code. I am not knowledgeable enough about the engine internals or about the specification requirements for JavaScript to speculate on why this happens, but for now, I am satisfied to notice the problem and try to work around it in performance-critical code.

However, this actually leads to a serious problem from a design perspective. You can either write well-partitioned code that is easy to read and follow, with good compartmentalization for function reuse, or you can stuff it all into a single function and eliminate a significant amount of call overhead. I think this could be (easily?) rolled into a tool like UglifyJS, to automatically inline functions, but at that point we're introducing a build step for an interpreted language. For scripts that will be sent to the browser, this is perfectly acceptable, as we generally want to pass these through UglifyJS anyway (so it needn't even be a separate build step), but for server-side scripts, this is extremely tedious. Yes, you can automate it with grunt or another task runner, but it just reintroduces the exact type of process we are trying to eliminate by using JavaScript instead of C.

For now, I'll just say that I think that some amount of function inlining should be possible for the JIT optimizer to do, but there are a lot of instances (member functions, for instance, due to the completely dynamic way in which methods are called) where this is not possible. Therefore, let's move on, and examine the performance of JavaScript's iteration constructs.

One of the things that really attracted me to Node.js was Underscore.js, a functional programming library that provides the types of things I would normally find in Scheme. The best thing about these tools is that they make it simple to write clean, easy to maintain code, and they help enforce good design principles by encouraging programmers to write small functions for one task, and then to build larger processes out of those functions. It might be obvious in hindsight, but considering that function optimization is not very effective, using functions to do array iteration is probably slower than using a for loop to do the same thing. Let's look at two variations, one testing Array.reduce and one testing Array.forEach in an O(n^2) situation to see how the function overhead scales with nested loops.

// Test 2: Iteration methods vs. for loops
iter = 1000;
var elems = 100, subelems = 10, sum;
var xa = [];

for (i = 0; i < elems; ++i) {
	xa[i] = Math.random();
}

// 2a: array sum with reduce
start = process.hrtime();
sum = xa.reduce(function(memo, v) {
	return memo + v;
}, 0);
stop = process.hrtime(start);

time = (stop[0]*1e9 + stop[1])/1e6;
console.log('T2A//using reduce: '+time+' ms');

start = process.hrtime();
sum = 0;
for (i = 0; i < xa.length; ++i) {
	sum += xa[i];
}
stop = process.hrtime(start);

time = (stop[0]*1e9 + stop[1])/1e6;
console.log('T2A//for loop: '+time+' ms');
console.log('');

// On my machine, times fluctuate a lot, but 0.10-0.16 ms vs 0.0054-0.0078 ms
// so the for loop is faster again

// 2b: nested iteration, calling a member function for each element of an array
function t2b_class() {
	this.val = Math.random();
}
var t2b_sum = 0;
t2b_class.prototype.work = function() {
	t2b_sum += this.val;
};

for (i = 0; i < elems; ++i) {
	xa[i] = [];
	for (j = 0; j < elems; ++j) {
		xa[i][j] = new t2b_class();
	}
}

start = process.hrtime();
xa.forEach(function(v) {
	v.forEach(function(v2) {
		v2.work();
	});
});
stop = process.hrtime(start);

time = (stop[0]*1e9 + stop[1])/1e6;
console.log('T2B//nested iterators: '+time+' ms');

function inner_iter(v2) {
	v2.work();
}
start = process.hrtime();
xa.forEach(function(v) {
	v.forEach(inner_iter);
});
stop = process.hrtime(start);
time = (stop[0]*1e9 + stop[1])/1e6;
console.log('T2B//nested iterators, avoids redeclaration: '+time+' ms');

start = process.hrtime();
for (i = 0; i < xa.length; ++i) {
	for (j = 0; j < xa[i].length; ++j) {
		xa[i][j].work();
	}
}
stop = process.hrtime(start);

time = (stop[0]*1e9 + stop[1])/1e6;
console.log('T2B//for loop: '+time+' ms');

For Test 2A, I find that using Array.reduce takes between 0.10 and 0.16 milliseconds, while using the for loop to implement the reduce operation "the long way" takes between 0.0045 and 0.0075 ms. These really aren't huge differences, but in principle they shouldn't exist at all, since from a semantic point of view the code does exactly the same thing in both implementations. For Test 2B, however, there is a much more significant difference. For the nested forEaches, I've tested two possible implementations: the "obvious" approach, which uses an anonymous inner function for the second loop, and the "observant" approach, which realizes that this function gets redeclared on every iteration of the outer loop and therefore moves the declaration outside of it. My results are that the obvious approach is the slowest, coming in at around 1.08-1.32 ms, then the observant approach at around 0.87-0.95 ms, and then the explicit nested for loop blowing both out of the water at 0.22-0.35 ms.

The results for 2A are underwhelming, but 2B illustrates a much more significant problem with the use of functional techniques. It's really easy to write code in a functional style that ends up being really slow, because v8 does not effectively optimize it. For me, this is a disaster. Functional-style code is faster to write, easier to maintain, and generally cleaner than its imperative counterpart. Plus, implementing a map, reduce, or even just a for-each loop using the for construct and explicit iteration requires repeating a lot of glue code that is handled internally. Even C++11 has language features to minimize this kind of repetition, now. As with function inlining, I maintain that for both 2A and 2B, these types of transformations can easily be done by a tool like UglifyJS at the source level, but really, I feel that the JIT optimizer should be capable of doing them as well.

Now, let's look at another type of optimization important for performance when using recursive functions: tail call optimization. I've written the canonical factorial function in three styles: first, a basic recursive implementation that works its way back up the stack to compute its result; second, a version that uses tail-calls and could therefore be optimized without any source transformation; and third, a complete iterative version, to determine whether v8 performs maximally effective tail optimization.

// Test 3: Tail calls vs. stack walking vs. iterative method
function fact_tail(n, accumulator) {
	if (n < 2)
		return accumulator;
	return fact_tail(n-1, accumulator*n);
}
function fact_stack(n) {
	if (n < 2)
		return 1;
	return n * fact_stack(n-1);
}
function fact_iter(n) {
	var accumulator = n;
	for (n = n-1; n > 1; --n) {
		accumulator = accumulator*n;
	}
	return accumulator;
}

iter = 1000;
var n = 30, result;

start = process.hrtime();
for (i = 0; i < iter; ++i) {
	result = fact_tail(n-1, n);
}
stop = process.hrtime(start);
time = (stop[0]*1e9 + stop[1])/1e6;
console.log('T3//Tail Factorial: '+time+' ms');

start = process.hrtime();
for (i = 0; i < iter; ++i) {
	result = fact_stack(n);
}
stop = process.hrtime(start);
time = (stop[0]*1e9 + stop[1])/1e6;
console.log('T3//Stack Factorial: '+time+' ms');

start = process.hrtime();
for (i = 0; i < iter; ++i) {
	result = fact_iter(n);
}
stop = process.hrtime(start);
time = (stop[0]*1e9 + stop[1])/1e6;
console.log('T3//Iterative Factorial: '+time+' ms');

The results are approximately what you'd expect. The tail call is not fully optimized, but it is faster than the version using the stack. I haven't done enough research to determine whether the speed boost is due to (some) actual optimization being performed or if it's just a by-product of how temporary variables are used implicitly in the local scope. The numbers for me come in at 0.515 ms for the basic stack version, 0.277 ms for the tail-call version, and 0.181 ms for the iterative version.

Again we see that a source transformation from "good, easy-to-follow code" to "awkward, but still makes sense if you stop to think about it" code results in improved performance. This is the same type of trade-off that we faced in the 90s, when C compilers did not optimize effectively. And, again, I would posit that this is the type of operation that should be done by the JIT optimizer. If only I had the necessary knowledge to actually try to modify v8 myself (I've looked at the code, and for all my complaints about shortcomings, it is both incredibly impressive and far too complicated for me to "pick up" over the weekend and change).

Finally, I have one last optimization case. This one, however, is not something that can be done by a JIT optimizer, because the result would no longer meet the language specification. For those of you familiar with C++, there are two basic types of methods in a class: "normal" methods, which have a fixed address known at compile time, and "virtual" methods, which are resolved at run time through the virtual function table referenced by the object on which the method is being called (I almost said "dynamically determined by the type," but that would be inaccurate, as virtual methods do not require RTTI). In JavaScript, all methods are implicitly virtual. When you call a method, that field on the object must be looked up every time, because it can be changed at any time. This is an incredibly useful feature of the language, because of the flexibility it provides: polymorphism is "easy," objects can be monkey-patched to modify behavior, and some of the more subtle aspects of C++ go away (for instance, do you know off the top of your head which method is called if you dereference a base class-type pointer to an instance of a derived class and then call a virtual method?). But this power comes at a cost. When a method isn't used in this manner, we still pay the price of doing lookups, and because the optimizer cannot know that we will not replace a method definition, it can't substitute in the address of the specific function we want.

There is actually a simple, clever way to obtain static references to functions. When the method is called through the object as a property lookup, the prototype chain must be traversed to find the first matching function and then it is used. If we skip that step and use the specific function, on the specific prototype that we want, it should be slightly faster. We still pay the price of looking up a property on the prototype object, though, so we can even take this a step further and save a variable that points precisely to the function we want to call. Because of JavaScript's ability to call any function with any "this" context, we can take the static function pointer and call it as though it were a member function without going through the dynamic lookup phase. Let's look at a test to compare this.

// Test 4: Function table overhead
function TestClass(a, b, c) {
	this.a = a;
	this.b = b;
	this.c = c;
}
TestClass.prototype.hypot = function() {
	return Math.sqrt(this.a*this.a + this.b*this.b + this.c*this.c);
}

iter = 10000;
var fn = TestClass.prototype.hypot;
xa = new TestClass(Math.random(), Math.random(), Math.random());

start = process.hrtime();
for (i = 0; i < iter; ++i) {
	result = xa.hypot();
}
stop = process.hrtime(start);
time = (stop[0]*1e9 + stop[1])/1e6;
console.log('T4//Object method lookup: '+time+' ms');

start = process.hrtime();
for (i = 0; i < iter; ++i) {
	result = TestClass.prototype.hypot.call(xa);
}
stop = process.hrtime(start);
time = (stop[0]*1e9 + stop[1])/1e6;
console.log('T4//Prototype method lookup: '+time+' ms');

start = process.hrtime();
for (i = 0; i < iter; ++i) {
	result = fn.call(xa);
}
stop = process.hrtime(start);
time = (stop[0]*1e9 + stop[1])/1e6;
console.log('T4//Pre-saved method pointer: '+time+' ms');

For me, the results are that full dynamic lookup takes 0.194 ms, retrieving the method through the prototype takes 0.172 ms, and using a pre-saved pointer to the function takes only 0.135 ms. These absolute differences really aren't that significant, but this can be applied to every single non-polymorphic function call in your project, which means this might be the test case where relative speedup (30% faster with static function pointers) matters more than absolute. If you still think I'm crazy for even mentioning this, consider the extremely common problem of converting an arguments variable into a proper array by calling Array.prototype.slice on it. Many people will save a reference to Array.prototype.slice at the top of their module and use it throughout their code; luckily it's both shorter to type AND faster, which is great for libraries that provide a lot of functions accepting variable numbers of arguments.
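
As a concrete illustration of that pattern, here is a minimal sketch (the names are just for illustration):

// Save a static reference once, at module scope
var slice = Array.prototype.slice;

function logAll() {
	// Call the saved function with an explicit `this` instead of looking up
	// Array.prototype.slice through the prototype chain on every invocation
	var args = slice.call(arguments);
	args.forEach(function(arg) { console.log(arg); });
}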

Member function lookup optimization requires far more work than the other three, and it cannot be applied transparently by a JIT optimizer or by UglifyJS as the language specification prevents it from having enough information to correctly do so. It also makes your code much, much more annoying to read. I could suggest writing an AltJS language that allows you to differentiate between the two types of member functions, so that when it compiles to JS it spits out static references where possible, but I'm pretty sure @horse_js would make fun of me for it (rightly so). But, if you're in a situation where performance for computation is just that important, I'd recommend at least considering static function references.

As for my graph library, by carefully rewriting code matching the four above cases, I was able to get the execution time for a single pass of the evaluator down from 25.5 usec to 7.4 usec, which increases the number of users that my application can support for a given amount of hardware by 3.3x, immediately translating into lower operating costs. I wish I could share the code (maybe one day, but for now it's under the "secret project" category), but hopefully an unsubstantiated anecdote like this will help convince you that these optimizations still benefit real code that performs real tasks, and not just overly-simple test cases. There is often an important tradeoff to consider between readability, maintainability, and performance, so I wouldn't recommend always writing code that looks like this, but if you're working on a compute-bound application (mostly, JavaScript-based games, which we will see many of as WebGL and HTML5 are improved), maybe think about writing your code in three steps: first, do it the easy/fast/right way, producing good, clean code. Then, write a very good set of test cases. Finally, optimize your code to death, and use your test cases to verify that you haven't broken anything.

The complete benchmark file used to generate my numbers is available as a gist in case you don't feel like cobbling together the snippets given above.

Jul 06 2013

The Wiener filter is well known as the optimal solution to the problem of estimating a random process when it is corrupted by another additive process, using only a linear combination of values of the measured process. Mathematically, this means that the Wiener filter constructs an estimator \hat{x}(t) of some original signal x(t), given z(t)=x(t)+n(t), with the property that the mean squared error E[\|\hat{x}(t)-x(t)\|^2] is minimized among all such linear estimators, assuming only that both x(t) and n(t) are stationary and have known statistics (mean, variance, power spectral density, etc.). When more information about the structure of x(t) is known, different estimators may be easier to implement (such as a Kalman filter for signals with a recursive structure).

Such a filter is very powerful—it is optimal, after all—when the necessary statistics are available and the input signals meet the requirements, but in practice, signals of interest are never stationary (rarely even wide sense stationary, although it is a useful approximation), and their statistics change frequently. Rather than going through the derivation of the filter, which is relatively straightforward and available on Wikipedia (linked above), I’d like to talk about how to adapt it to situations that do not meet the filter’s criteria and still obtain high quality results, and then provide a demonstration on one such signal.

The first problem to deal with is the assumption that a signal is stationary. True to form for engineering, the solution is to look at only a brief portion of the signal and approximate it as stationary. This has the unfortunate consequence of preventing us from defining the filter once and reusing it; instead, as the measured signal is sliced into approximately stationary segments, we must estimate the relevant statistics and construct an appropriate filter for each segment. If we do the filtering in the frequency domain, then for segments of length N we are able to do the entire operation with two length N FFTs (one forward and one reverse) and O(N) arithmetic operations (mostly multiplication and division). This is comparable to other frequency domain filters and much faster than the O(N^2) number of operations required for a time domain filter.

This approach creates a tradeoff. Because the signal is not stationary, we want to use short time slices to minimize changes. However, the filter operates by adjusting the amplitude of each bin after a transformation to the frequency domain, so we want as many bins as possible to afford the filter high resolution. Adjusting the sampling rate does not change the frequency resolution for a given amount of time, because the total time duration of a buffer of N samples is T=N/f_{s}. So, for a fixed time duration, the length of the buffer scales proportionally with the sampling rate, and the bin spacing in the FFT, f_{s}/N=1/T, remains constant. The tradeoff, then, is between how long each time slice will be and how much change in signal parameters we wish to tolerate. A longer time slice weakens the stationary approximation, but it also produces better frequency resolution. Both of these affect the quality of the resulting filtered signal.
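
As a concrete illustration (the numbers are hypothetical, not taken from the recording used below): at f_{s} = 8 kHz, a 512-sample slice spans

T = \frac{N}{f_s} = \frac{512}{8000\text{ Hz}} = 64\text{ ms}, \qquad \Delta f = \frac{f_s}{N} = \frac{1}{T} \approx 15.6\text{ Hz}

Doubling the sampling rate to 16 kHz while keeping T = 64 ms doubles the buffer to 1024 samples, but the bin spacing stays at about 15.6 Hz.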

The second problem is the assumption that the statistics are known beforehand. If we’re trying to do general signal identification, or simply “de-noising” of arbitrary incoming data (say, for example, cleaning up voice recorded from a cell phone’s microphone in a windy area, or reducing the effects of thermal noise in a data acquisition circuit), then we don’t know what the signal will look like beforehand. The solution here is a little bit more subtle. The normal formulation of the Wiener filter, in the Laplace domain, is

G(s)= \frac{S_{z,x}(s)}{S_{z}(s)}
\hat{X}(s)=G(s) Z(s)

In this case we assume that the cross-power spectral density, S_{z,x}(s), between the measured process z(t) and the true process x(t) is known, and we assume that the power spectral density, S_{z}(s), of the measured process z(t) is known. In practice, we will estimate S_{z}(s) from measured data, but as the statistics of x(t) are unknown, we don’t know what S_{z,x}(s) is (and can’t measure it directly). But, we do know the statistics of the noise. And, by (reasonable) assumption, the noise and the signal of interest are independent. Therefore, we can calculate several related spectra and make some substitutions into the definition of the original filter.

S_z(s)=S_x(s)+S_n(s)
S_{z,x}(s)=S_x(s)

If we substitute these into the filter definition to eliminate S_x(s), then we are able to construct an approximation of the filter based on the (known) noise PSD and an estimate of the signal PSD (if the signal PSD were known, the filter would be exact, but as our PSD estimate contains errors, the filter definition will also contain errors).

G(s)=\frac{S_z(s)-S_n(s)}{S_z(s)}

You may ask: if we don’t know the signal PSD, how can we know the noise PSD? Realistically, we can’t. But, because the noise is stationary, we can construct an experiment to measure it once and then use it later. Simply identify a time when it is known that there is no signal present (i.e. ask the user to be silent for a few seconds), measure the noise, and store it as the noise PSD for future use. Adaptive methods can be used to further refine this approach (but are a topic for another day). It is also worth noting that the noise does not need to be Gaussian, nor does it have any other restrictions on its PSD. It only needs to be stationary, additive, and independent of the signal being estimated. You can exploit this to remove other types of interference as well.

One last thing before the demo. Using the PSD to construct the filter like this is subject to a number of caveats. The first is that the variance of each bin in a single PSD estimate is not zero. This is an important result whose consequences merit more detailed study, but the short of it is that the variance of each bin is essentially the same as the variance of each sample from which the PSD was constructed. A remedy for this is to use a more sophisticated estimator that averages multiple more-or-less independent estimates, generally using a windowing function (Welch's method is the classic example). This reduces the variance and therefore improves the quality of the resulting filter. It does, however, interact with the trade-off between time slice length and the stationary approximation: because you must average PSDs computed from (at least some) different samples in order to reduce the variance, you are effectively using a longer time slice.
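
For reference, the windowed, averaged estimate described here (essentially Welch's method) has roughly this form, up to scaling conventions, where w[n] is the window, K is the number of (possibly overlapping) segments z_k, and U is a window power normalization:

\hat{S}_z(f_m) = \frac{1}{K}\sum_{k=1}^{K} \frac{1}{NU}\left|\sum_{n=0}^{N-1} w[n]\, z_k[n]\, e^{-j2\pi mn/N}\right|^2, \qquad U = \frac{1}{N}\sum_{n=0}^{N-1} w[n]^2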

Based on the assigned final project in ECE 4110 at Cornell University, which was to use a Wiener filter to de-noise a recording of Einstein explaining the mass-energy equivalence with added Gaussian white noise of unknown power, I’ve put together a short clip comparing the measured (corrupted) signal, the result after filtering with a single un-windowed PSD estimate to construct the filter, and the result after filtering using two PSD estimates with 50% overlap (and thus an effective length of 1.5x the no-overlap condition) combined with a Hann window to construct the filter. There is a clear improvement in noise rejection using the overlapping PSD estimates, but some of the short vocal transitions are also much more subdued, illustrating the tradeoff very well.

Be warned, the first segment (unfiltered) is quite loud as the noise adds a lot of output power.

Here is the complete MATLAB code used to implement the non-overlapping filter:

% Assumes einsteindistort.wav has been loaded with
[d,r] = wavread('EinsteinDistort.wav');

% Anything that can divide the total number of samples evenly
sampleSize = 512;

% Delete old variables
% clf;
clear input;
clear inputSpectrum;
clear inputPSD;
clear noisePSD;
clear sampleNoise;
clear output;
clear outputSpectrum;
clear weinerCoefficients;

% These regions indicate where I have decided there is a large amount of
% silence, so we can extract the noise parameters here.
noiseRegions = [1 10000;
                81000 94000;
                149000 160000;
                240000 257500;
                347500 360000;
                485000 499000;
                632000 645000;
                835000 855000;
                917500 937500;
                1010000 1025000;
                1150000 116500];

% Now iterate over the noise regions and create noise start offsets for
% each one to extract all the possible noise PSDs
noiseStarts = zeros(length(noiseRegions(1,:)), 1);
z = 1;
for k = 1:length(noiseRegions(:,1))
    for t = noiseRegions(k,1):sampleSize:noiseRegions(k,2)-sampleSize
        noiseStarts(z) = t;
        z = z + 1;
    end
end

% In an effort to improve the PSD estimate of the noise, average the FFT of
% silent noisy sections in multiple parts of the recording.
noisePSD = zeros(sampleSize, 1);
for n = 1:length(noiseStarts)
    sampleNoise = d(noiseStarts(n):noiseStarts(n)+sampleSize-1);
    noisePSD = noisePSD + (2/length(noiseStarts)) * abs(fft(sampleNoise)).^2 / sampleSize;
end

% Force the PSD to be flat like white noise, for comparison
% noisePSD = ones(size(noisePSD))*mean(noisePSD);

% Now, break the signal into segments and try to denoise it with a
% noncausal Wiener filter.
output = zeros(1, length(d));
for k = 1:length(d)/sampleSize
    input = d(1+sampleSize*(k-1):sampleSize*k);
    inputSpectrum = fft(input);
    inputPSD = abs(inputSpectrum).^2/length(input);
    weinerCoefficients = (inputPSD - noisePSD) ./ inputPSD;
    weinerCoefficients(weinerCoefficients < 0) = 0;
    outputSpectrum = inputSpectrum .* weinerCoefficients;

    % Sometimes for small outputs ifft includes an imaginary value
    output(1+sampleSize*(k-1):sampleSize*k) = real(ifft(outputSpectrum, 'symmetric'));
end

% Renormalize and write to a file
output = output/max(abs(output));
wavwrite(output, r, 'clean.wav');

To convert this implementation to use 50% overlapping filters, replace the filtering loop (below "Now, break the signal into segments…") with this snippet:

output = zeros(1, length(d));
windowFunc = hann(sampleSize);
k=1;
while sampleSize*(k-1)/2 + sampleSize < length(d)
    input = d(1+sampleSize*(k-1)/2:sampleSize*(k-1)/2 + sampleSize);
    inputSpectrum = fft(input .* windowFunc);
    inputPSD = abs(inputSpectrum).^2/length(input);
    weinerCoefficients = (inputPSD - noisePSD) ./ inputPSD;
    weinerCoefficients(weinerCoefficients < 0) = 0;
    outputSpectrum = inputSpectrum .* weinerCoefficients;

    % Sometimes for small outputs ifft includes an imaginary value
    output(1+sampleSize*(k-1)/2:sampleSize*(k-1)/2 + sampleSize) = output(1+sampleSize*(k-1)/2:sampleSize*(k-1)/2 + sampleSize) + ifft(outputSpectrum, 'symmetric')';
    k = k +1;
end

The corrupted source file used for the project can be downloaded here for educational use.

This can be adapted to work with pretty much any signal simply by modifying the noiseRegions matrix, which is used to denote the limits of "no signal" areas to use for constructing a noise estimate.

Jun 12 2013

I’ve been a huge fan of LISP and related programming languages (Scheme, Racket, etc.) for a very long time. I’ve been evangelizing the utility of learning how the language works and understanding functional concepts among my friends for almost as long, and recently a friend of mine finally caved and read McCarthy’s original paper. Once he realized how important it was, he decided to learn while solving the L-99, a list of 99 problems for LISP (based on Haskell’s list of 99 problems), except we use Scheme/Racket for our implementations, because of the excellent Dr. Racket environment on both Windows and Linux.

The purpose of the L-99 was to provide a series of challenges that would force someone to learn about the language, its syntax and structure, standard paradigms, etc., which is something I think it accomplishes very well. But I think it also provides a great backdrop for discussing algorithms regardless of the language they are written in. L-99 therefore provides two opportunities: the first is learning how to do something in LISP, and the second is learning how to do it “the best way”—the one with the lowest time and/or memory complexity. An example I’d like to mention is the list reversal problem, #5, because it is easy to understand and has two easy solutions with very different characteristics. My friend’s solution (in Racket), which works correctly, is given below:

(define (list-reverse x)
  (cond
    [(null? (cdr x)) x]
    [else (append (list-reverse (cdr x)) (list (car x)))]))

As I mentioned above, his solution works correctly (aside from a little bit of parameter validation that is missing), but a close reading reveals that its time complexity is O(n^2). This is because the append function is O(n), and it is called once per item in the list to reverse. In another language, with another library (say, C++ with STL containers), appending to a list is a constant time operation, which is likely why he didn’t think much of using append here. That said, an opaque list structure can be made (in LISP) that offers constant time append operations, but this isn’t how a chain of cons cells works. For comparison, here is my solution to the problem, which uses a helper function to reverse the list in O(n) time:

(define (list-reverse-linear x)
  (define (lr-inner xa xr)
    (cond
      [(empty? (cdr xa)) (cons (car xa) xr)]
      [else (lr-inner (cdr xa) (cons (car xa) xr))]))
  (cond
    [(not (pair? x)) x]
    [(empty? (cdr x)) x]
    [else (lr-inner (cdr x) (cons (car x) null))]))

I suppose it is obvious to most people that faster asymptotic complexity is always better, but just for the sake of argument, let’s look at the time required for reversing a list with 10,000 elements in it:

; Input
(define (gen-list n)
  (cond
    [(= n 0) null]
    [else (cons n (gen-list (- n 1)))]))
(define tl (gen-list 10000))
(time-apply list-reverse (cons tl null))
(time-apply list-reverse-linear (cons tl null))

; Result is, in milliseconds (cpu real gc)
(3432 3427 2293)
(0 1 0)

A stark contrast—based on a measurement with time-apply, it takes nearly 3.5 seconds to reverse the list using the O(n^2) algorithm, but less than one millisecond to do so with the O(n) algorithm. A big part of this, of course, also comes from the garbage collection time, which accounts for nearly 2/3 of the run time in the slower algorithm, due to how many intermediate lists are generated and discarded, while the linear algorithm does not allocate any unused items. Another important difference is that my solution is tail-recursive, which enables significant optimizations (ones that Dr. Racket is capable of doing, too), compared to my friend’s solution.

I think that the L-99 has a wonderful secondary purpose. Not only can it be used to introduce someone to the standard idioms, tools, and techniques of functional programming, but it can also be used to introduce and discuss analysis of algorithms, as the problems can generally be solved in multiple ways, with different time and space complexity requirements. It even provides a platform to discuss different implementation strategies (such as tail recursion vs. not) and to experiment with their effects on performance, all with problems and solutions that are very simple to state and understand. This is a great aspect of LISP as a tool for learning in general—it frees you as a programmer from worrying about necessary but conceptually irrelevant implementation details, letting you focus on understanding what you’re doing, why you’re doing it, and how you’re doing it, on an algorithmic level. Mastering the relevant skills translates readily into using other languages, where it helps separate the problems—algorithms, memory management, program structure, inheritance organization, the list goes on—out in your mind, allowing you to solve each individually.

Normally I’d write another 3+ pages, at minimum, trying to beat this to death, but I think this is a pretty self-defending position. Instead, I’d like to ask you a few questions to think about and maybe take with you as you do or learn about programming. Have you ever looked at the L-99? Can you solve all the problems, even if you don’t take the time to write out complete solutions? Are your solutions optimal in their asymptotic complexities? I’d also love to hear about any particularly clever solutions people have created.

May 23 2013

Ever since I wrote my first query for MySQL, nearly a decade ago, as far as web development goes, I have wanted nothing more than to not write database queries in SQL. I wouldn’t say that I dislike the idea of SQL, even; I just don’t like writing queries. SQL is pretty flexible and extremely powerful, but it’s tedious. Queries read almost like English sentences, and despite the repetition of writing nearly identical structure many times, SQL syntax errors (magically) happen at a rate far higher than they do in my PHP or JavaScript source code, likely due to the extensive amount of escaping and quoting that happens while writing queries by hand. Within a couple years, I grew tremendously as a PHP developer and learned about the concept of database abstraction, but sadly it didn’t solve my problem.

The Problem with Database Abstraction

Database abstraction seems like a great idea—it hides the details of the database API from the user application. And, for this purpose, it is excellent. Perhaps it was my naivety, but I expected more. I expected database abstraction to abstract the syntax of SQL, not just hide the API for talking to MySQL. During those first few years, I worked on several different projects, and in almost every case, we used ad hoc solutions for hiding some of the details of constructing queries, such as a select() method that accepts a WHERE clause and fills in the rest of the query automatically. In the time since, these types of solutions have been included in abstraction layers such as the one provided by the Zend Framework, but they suffer from many problems.

Chief among these problems is incomplete SQL abstraction. This was a common aspect of our ad hoc solutions in 2005, as we really just wanted to hide part of the query syntax, but even modern solutions require the user to fill in some portions of the SQL, although this is improving. The problem, however, is that without complete SQL abstraction, the results aren’t really portable. The entire objective of abstracting the API is to provide backend portability without requiring significant changes, yet any hand-written SQL or portions of SQL provides an opportunity for using non-portable features of the particular DBMS at hand. The other consequence of incomplete abstraction is that input escaping must be done by the consuming code. This is easy, but it requires extra code and provides extra opportunity to make mistakes.

The next problem is that a lot of solutions are verbose. Sometimes this is due to the language syntax—like PHP’s array syntax—but the library structure is almost always a contributing factor. Granted, because we are doing abstraction, some overhead is not unreasonable, but when you combine the source code overhead with incomplete abstraction, it raises a valid question about the utility: you are now writing more code than if you wrote SQL directly, and if you change backends, you will still need to inspect all of your code anyway, and possibly make changes despite using an adapter.

Finally, most abstraction layers do not provide a means for simplifying and standardizing the generation of frequently used queries or query fragments. They may provide abstraction and query generation, but it must be repeated wherever a query is generated. You may ease the pain by defining additional functions that modify query objects to perform the repeated tasks or add groups of conditions to where clauses, but this doesn’t hook into the rest of the abstraction in a clean way. Recently, I’ve come to learn of an approach known as Object-Relational Mapping (ORM), which provides some of this capability, as it provides a means of translating key/value maps into database objects and vice versa, but it appears to suffer from limitations in the complexity of the types of queries that can be created using the object mapping approach.

In 2006/2007, I began developing my own solution designed specifically to address these problems, in PHP. I’ve actually written about the approach before, on a blog long gone, where I posted some proto-code that formed the core of what I called “database filtering” and used for about 3 years on all of my PHP work. The premise is simple: the user constructs a map, in the form of a PHP array, which specifies a filter—that is, a set of matching conditions—for obtaining data from the database, and then passes it to a function that decodes the filter, escapes all of the input, formats it according to the database field types, and creates a complete query to obtain the data. This combined features of ORM with a more general query generation technique, because the filters did not actually reflect the structure of the table: they reflected the structure of an abstract data interface, which could be extended to include conversion rules for many different types of objects. An important aspect of my approach was automatically doing “the right thing” by converting array parameters to IN() conditions, but leaving scalars as simple equality. It greatly simplified my code because I no longer had to worry about formatting my data every time that I hit the database. Effectively, it formed an abstraction layer for data retrieval, freeing me from thinking about SQL entirely. In fact, aside from writing the primitives used by the filtering code, I didn’t write any SQL in my application logic at all.
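
To make that decoding rule concrete, here is a toy sketch in JavaScript; this is not the library's actual implementation, and it punts on escaping by taking a caller-supplied function, but it captures the idea of arrays becoming IN() lists and scalars becoming equality tests:

// Toy sketch of the filter-decoding idea, not db-filters' real code
function decodeFilter(filter, escape) {
	return Object.keys(filter).map(function(field) {
		var value = filter[field];
		if (Array.isArray(value)) {
			// Arrays are matched with IN (...)
			return '`' + field + '` IN (' + value.map(escape).join(', ') + ')';
		}
		// Scalars are matched with simple equality
		return '`' + field + '` = ' + escape(value);
	}).join(' AND ');
}

// decodeFilter({id : [1, 2, 3]}, escapeFn) would yield "`id` IN (1, 2, 3)"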

Now that I’ve made the switch from PHP to Node.js, the first thing I needed was a suitable database abstraction layer. I decided to use the excellent node-mysql package for handling the connection to MySQL, and I decided to port my old database filtering library from PHP to Node.js, with many improvements, and release it on github. If you hadn’t guessed, the last six paragraphs were just background and motivation for my actual purpose here: to introduce db-filters for Node.js, explain why it is a good solution to these problems, and ideally motivate you, the reader, to use it for all of your SQL needs, and then convince your friends to use it as well. I mentioned the Zend Framework earlier, but since I am developing a Node.js solution, it is more useful to compare against Node.js Database Drivers, a database abstraction layer that provides query-generating functionality for MySQL and Drizzle (which is really just MySQL as well, but the use of different libraries makes it a good example of how current solutions are effective for API abstraction, but not language abstraction).

How I’d Like to Solve the Problem

The primary goals of db-filters are specifically to address the three problems outlined above, with many eventual stretch-type goals that may or may not be feasible. I want to provide a solution that has a complete abstraction of SQL, is simple to use, has a succinct yet flexible API, provides appropriate parameter escaping without programmer action, and supports pseudo-fields (I call them special keys) in table definitions that can be used to implement frequently repeated object property decoding in a central location with proper hooks into the SQL abstraction and filter decoding process. Additionally, db-filters provides all of these capabilities to INSERT, UPDATE, and DELETE queries as well, which is uncommon in other query-building abstraction layers. Importantly, db-filters is not an API abstraction, although it is an effective consequence, as it wraps around an underlying driver for communications. You use it with a (presumably simpler) library that provides a means for sending queries and retrieving results. With that in mind, let’s take a look at some example SQL and see how to generate it using a filter. All of the examples assume that appropriate filter definitions exist for the tables being used. Normally, filter definitions are initially created using the generation script included with db-filters, and then any necessary special handlers are implemented manually, so I will not include the definitions themselves.

We’ll start with a pair of basic queries to illustrate the simplicity of the API, as well as the inferred formatting. We have a table of users, and we wish to retrieve all of the information for a specific user, and then we wish to retrieve information for a group of users, using two queries:

SELECT * FROM `users` WHERE `id` = 1
SELECT * FROM `users` WHERE `id` IN (1, 2, 3, 4, 5)

Side note: all of the “expected SQL” I am providing is the exact output of the filtering code given afterwards, except that I use the buildQuery() method to obtain the SQL without sending it to the server. I provide db-mysql comparison code in a few places, but the installation actually failed on my system, so I am not able to guarantee that my comparison code is 100% correct as it is based on the (limited) documentation available on their website, but it should provide a good approximation of how much code must be written, including how much SQL is hand-written. Assuming that we have a filter for the users table stored in the users variable, the code would look like this:

// Success and failure are callback functions invoked asynchronously with the results
// they forward the results of mysql.query exactly as given
users.select({id : 1}).exec(success, failure)
users.select({id : [1, 2, 3, 4, 5]}).exec(success, failure)

// Comparison code snippet using Node.js Database Drivers' db-mysql
conn.query().select('*')
	.from('users')
	.where('`id` = ?', [1])
	.execute(callback);

this.query().select('*')
	.from('users')
	.where('`id` IN ?', [[1, 2, 3, 4, 5]])
	.execute(callback)

At this point, you should probably not be impressed, but it should already be apparent that db-filters provides a more effective way of creating queries. db-mysql requires an extra method call, query(), requires specifying the * field manually, which is SQL-specific and could be inferred from the absence of other field definitions, and requires partial SQL inclusion in the call to where(). Although it doesn’t matter in this example, requiring you to write out the contents of the WHERE clause defeats the point of abstracting the backend, because you will be tempted to use MySQL extensions, and suddenly your code isn’t portable anyway. With the filtering approach, you write no SQL, you write less code, you write simpler code, and the result is completely portable to any database engine supported by the library. With that in mind, let’s look at a more complicated type of query, to see whether these advantages carry over. This time I have to write a query that retrieves all of the posts on a specific page in a thread, along with the user information for the authors of those posts, ideally using a join. First, the SQL we expect to see:

SELECT `p`.*, `u`.* FROM posts AS p LEFT JOIN users AS u ON `p`.`userId` = `u`.`id` WHERE `p`.`threadId` = 1 ORDER BY `p`.`posted` ASC LIMIT 60, 20

This is generated with another very straightforward Node.js snippet. It is important to note that there are no SQL literals or directly-embedded strings in this code, only function calls, table/field names, and parameters. Comparable solutions often require ordering clauses such as posted ASC or ON clauses such as `p`.`userId` = `u`.`id`, which seem fairly innocuous but still serve to break the abstraction by requiring hand-written pieces of SQL.

// Using db-filters with users and posts as predefined filters
var page = 3;
var tId = 1;
var postsPerPage = 20;
posts.select({threadId : tId}, 'p')
	.order(db.$asc('posted'))
	.limit(page*postsPerPage, postsPerPage)
	.left_join(users, 'u')
	.on(['userId', 'id'])
	.exec(success, failure);

// What db-mysql code might look like
this.query().select(['`p`.*', '`u`.*'])
	.where('`p`.`threadId` = ?', [tId])
	.join({type : 'LEFT', table : 'users', alias : 'u', conditions : '`p`.`userId` = `u`.`id`'})
	.order({'`p`.`posted`' : true})
	.limit(page*postsPerPage, postsPerPage)
	.execute(callback)

This is where the difference is really apparent. db-mysql does not know how to combine fields with tables, so all table names/aliases must be prepended onto field names manually whenever they are used in the query. The syntax is also very verbose, requiring 33% more typing despite a weaker abstraction. By contrast, the db-filters code has much more readability to it, contains exactly zero back ticks (`), and can in principle emit a valid query for any RDBMS, because there are again no keywords or operators. These differences only become more significant as we consider different situations. Let’s look at one last SELECT query, which will demonstrate how the function abstraction and special fields work in a filter-based approach:

SELECT * FROM users WHERE `name` = 'greg' AND `password` = MD5(CONCAT(`salt`, 'pass')) LIMIT 1

This might be part of a simple salted, hashed login system. This isn’t exactly a robust login solution (use bcrypt instead, if you’re building one at home), but I’m not presenting a debate, I’m demonstrating how to use the library, and this was the easiest demonstration I could come up with for calling SQL functions.

var name = 'greg';
var pass = 'pass';
users.select({name : name, password : db.$md5(db.$concat(db.$field('salt'), pass))})
	.limit(1)
	.exec(success, failure);

// And again in db-mysql
this.query().select('*')
	.from('users')
	.where('`name` = ? AND `password` = MD5(CONCAT(`salt`, ?))', [name, pass])
	.limit(1)
	.execute(callback);

Using db-mysql, we are forced to write out the entire body of the WHERE clause ourselves, including directly calling MySQL functions. If we had been targeting PostgreSQL, instead, string concatenation would be written using a ||, which I believe is not valid MySQL syntax. Or, if we were using SHA1 hashing, instead of MD5, our solution would not be portable to PostgreSQL, as it does not provide a SHA1() function, although it does provide the digest() function, which can be used for SHA1 hashing. But, by using the abstraction layer for db.$md5, db.$sha1, or db.$concat, the query building engine that implements these functions for each targeted database can produce the appropriate query syntax to provide the desired effect, emitting SHA1(‘text’) on MySQL, but digest(‘text’, ‘sha1′) on PostgreSQL.

You may find yourself saying “well, it’s nice that you’ve abstracted the entire function list, but typing out all of that db.$md5 and db.$concat stuff is quite tedious in its own right, so I’m not sure I am willing to do that everywhere that I want to include code like this.” That is where the final key feature of db-filters becomes relevant: special fields. We can update our filter definition to include a map of special handlers in addition to column names by supplying a third argument to the constructor:

// Earlier column definitions omitted
var columns = {  ...  };
var special = {
	'salt_password' : function(key, value, terms, options) {
		value = db.$eq(db.$md5(db.$concat(db.$field('salt'), value)));
		terms.push(value.get('password', this, options));
	}
};

var users = new db('users', columns, special);

// Now, use the salt_password special key to invoke the handler, which can automatically apply rules
users.select({name : name, salt_password : pass})
	.limit(1)
	.exec(success, failure);

This generates identical SQL to the “long method” but it simplifies the calling code. The special handler is defined once using somewhat uglier syntax, but it can then be reused many times, providing consistency in generating queries that use that condition, and it does not require sacrificing any abstraction to do so. This may seem contrived, as password logins generally happen in one place, but if you require a user to supply a password to change his settings as well as when logging in, then you can define your password transformation once and use it in both places. By pushing terms onto the array that is used to create WHERE clauses, special handlers can emit zero or more terms, which allows them to be used to perform ORM as well, or any other type of complicated decoding and parameter preparation.

My last two examples will be to demonstrate how db-filters handles INSERT and UPDATE queries, along with a comparison to db-mysql. In contrast to other libraries, db-filters uses the same filter decoding logic to specify parameters for insert and update queries (and delete), allowing you to leverage all of the function abstraction and the use of special key handlers in these queries as well, automatically.

-- Create a new post in our thread
INSERT INTO posts SET `threadId` = 1, `userId` = 1, `posted` = NOW(), `post` = 'Sample post'
-- Update the user's post count
UPDATE users SET `postCount` = `postCount` + 1 WHERE `id` = 1

The code will also introduce a useful feature of the helper functions: for all functions, if one of the arguments is missing, it is assumed to be the column that keys the value, which compresses the syntax for the most common situation of computing a field as a function of itself.

var user = {id : 1};
var post_text = 'Sample post';

// With db-filters
posts.insert({threadId : tId, userId : user.id, posted : db.$now(), post : post_text})
	.exec(success, failure);
users.update({postCount : db.$add(1)}, {id : user.id})
	.exec(success, failure);

// With db-mysql
this.query().insert('posts',
	['threadId', 'userId', 'posted', 'post'],
	[tId, user.id, {value : 'NOW', escape : false}, post_text])
	.execute(callback);
this.query().update('users')
	.set({postCount : {value : '`postCount` + 1', escape : false}})
	.where('id = ?', [user.id])
	.execute(callback);

By now I sound like a broken record, but again, with the approach used in db-filters, SQL is completely abstracted, whereas db-mysql includes writing raw SQL, in addition to the tedious syntax used to indicate literals that should not be escaped. The db-filter code could also be improved by adding a special key for ‘user’ that understands how to decode the id property from the user object, which would simplify the database “client” code that creates and updates posts.
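
Such a 'user' special key might look roughly like the following sketch, which simply reuses the handler signature and helpers shown earlier; the exact terms it should emit depend on the posts filter definition:

// Hypothetical 'user' special key for the posts filter: decode a user object
// into a condition on the userId column
var posts_special = {
	'user' : function(key, value, terms, options) {
		var cond = db.$eq(value.id);
		terms.push(cond.get('userId', this, options));
	}
};

// Client code could then pass the user object directly, e.g.
// posts.select({threadId : tId, user : user})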

Final Thoughts and the Future

One thing I want to make a note of is that it seems like I’m panning db-mysql pretty hard with all of my comparisons, but that isn’t really my goal. Realistically, the purpose of db-mysql is to provide an abstraction layer for the API to interact with the database, not to generate portable queries, and it seems that it handles that task well. But, it is always necessary to draw a comparison when advocating a different way to do things, and so it helps that db-mysql also attempts to provide a query abstraction interface, something that I don’t think it should be doing (and something that node-mysql doesn’t do). I think separating the two tasks makes a lot of sense: one library to perform low level communication, and another library to perform query generation, allowing either to be replaced if a better alternative is developed.

Next, I’d like to briefly address performance. It seems like a powerful abstraction layer would sacrifice a lot in terms of performance, but a quick benchmark running the example code in this post 10,000 times (70,000 queries in total) took only about 3330 ms on average on my Amazon EC2 micro instance. That works out to less than 50 us per query, which is significantly less time than you will spend waiting for and interacting with the database afterwards. The performance thus seems “acceptable” to me, but I do think it could be faster. This won’t really be improved until an undetermined time in the future, as there are more important things to work on.

Finally, db-filters still has a long way to go. At this point, it is possible to create a lot of queries with complex relationships, perform multi-way joins, and use most of the built-in MySQL functions, all without writing any actual SQL. But, there are gaps in the function list, mostly in the date/time area, as these functions can be used in the most diverse ways, and there are limitations to the types of structures that can be generated using where clauses. For instance, you cannot yet specify relationships between fields on two tables in a join using a WHERE clause—the only relationships can be in the ON parameter, which restricts the relationship to that of equality, based on the current implementation. Perhaps most importantly, db-filters currently only targets MySQL. In theory, the abstraction presented should allow it to be used with any RDBMS, but I am currently only developing the one translation engine, because I only use MySQL, so support for other engines is limited to those that are compatible with MySQL’s extensions to SQL. I’d really like to see it ported to support a NoSQL solution with the same interface, but that would be a tremendous undertaking and will definitely wait until the core API and features are frozen.

If you’ve made it this far, I hope I’ve convinced you that db-filters is the best solution for database interaction. It is a very specific solution that attacks only the problem of query generation, and it is specifically designed to address all of the problems that I’ve identified with existing solutions. The project is available on github as well as through npm. There is more complete API documentation, along with a function reference in the project README file, which I will (thankfully) omit here—I’m not trying to explain the details of using the library; I’m only trying to demonstrate its capability and advantages. Please post any bugs, feature requests, missing functions, or other suggestions on the github page.