June 20, 2008
Binary lower bound search

Blogging from my wife's BlackBerry here... while I'm watching the kids outside. Should be fun!

I recently ran into a work problem where I had to find, in a sorted array, the largest entry smaller than my input number. You can approach this as a binary search problem, and I did at first. When Google couldn't find a good implementation in C, I wrote one of my own, even to the point of coding up test scaffoling in the style of Programming Pearls.

But then I recognized that the problem was actually called "finding the lower bound", which gives better search results. There's even an implementation in STL that I could use in my target application. More on that later.

Anyway, about a week later I've spent so much time thinking about the damn algorithm I haven't even put it in the target code yet!

Here's the code, in case anyone finds it useful. Might save someone a half hour coding and testing, in any case.


/* binary search, find the bucket in which v should belong
a is a strictly increasing integer array of n elements
bucket i is defined by k : a[i] <= k < a[i+1]
there are n-1 buckets in the array

 valid returns are 0..n-2
*/
int bsearch_lb( int a[], int n, int v )
{
	int l = 0;
	int u = n-1;
	
	/* this exit is useful if all you need is a validity test */
	if( v < a[l] || v >= a[u] )
		return -1;
	
	/* this exit is more useful if you need to know whether you're out range above or below */
	/* if( v < a[l] ) return -1; */
	/* if( v >= a[u] ) return n; */
	
	while( u - l > 1 )
	{
		int t = (l+u)/2; /* risk of overflow if n > 2**31 */
		
		if( v > a[t] )
			l = t;
		else if( v < a[t] )
			u = t;
		else /* v == a[t] */
			return t; /* by def. of bucket */
	}
	
	return l;
}

#include <stdio.h>
#include <assert.h>

int main()
{
	/* scaffolding */
	int i;
	
	/* equally spaced */
	int a[11] = { 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 };
	
	/* variable spaced, including neg. numbers */
	int a2[6] = { -100, 0, 100, 300, 700, 1000 };
	int t[] = { -101, -100,   -99, -1, 
		0,   1,   99, 
					   100, 101, 299, 
					   300, 301, 699,
					   700, 701, 999,
					   1000, 1001, 5000 };
	int v[] = { -1, 0, 0, 0,
		1, 1, 1,
		2, 2, 2,
		3, 3, 3,
		4, 4, 4,
		-1, -1, -1 };
	
	for( i = 0; i < 100; ++i )
	{
		int b = bsearch_lb( a, 11, i );
		assert( b == i/10 );
	}
	
	for( i = 0; i < sizeof(t)/sizeof(int); ++i )
	{
		int b = bsearch_lb( a2, 6, t[i] );
		assert( b == v[i] );
	}	
}
Posted by Sam at 11:36 AM
June 18, 2008
In which my innocent child is corrupted

Background is necessary. In one episode of Dora the Explorer (that's a children
s TV show), Dora finds a dinosaur egg which hatches. They help the baby dino get back home, all the while singing the Baby Dino song. With verses like:

Who's got the biggest footsteps around?
BABY DINO!

Unfortunately this song is strangely catchy. So we find ourselves singing it. And we screw with the words.

Who's got the rottenest liver around?
BABY WINO!
Who's got the smallest tax-cuts around?
BABY RINO!

I was particularly pleased with that one. But Kaija was upset by it, and wanted to know what I was talking about. Explaining Republican In Name Only seemed beyond the scope of the situation, so I started on taxes.

"See, taxes are when the government steals your money. So tax cuts are when they steal less of your money, so the cuts are good."

She frowned thoughtfully.

"Government bad. Taxes bad."

And in that moment I knew I was going to Hell.

Posted by Sam at 03:19 AM
I just bludgeoned a mouse to death

Fucking mouse.

Lame cat who won't catch the mouse, even when the mouse is pointed out to said cat with a flashlight.

Fucking mouse running around my desk, on the floor and on top of the desk too.

This is a big flashlight.

Hmmmmmmm...

WAP WAP WAP WAP WAP

One dead mouse.

Posted by Sam at 03:15 AM
June 15, 2008
Threads are Evil

The Magician of the Ivory Tower brought his latest invention for the master programmer to examine.

I followed a link labeled "Threads are evil" to this paper (pdf, few hundred k) by the Edward Lee, who is no less than the chair of the EECS department at the University of California, Berkeley.

The thrust of the paper (which is more than two years old) is that threads are a bad model for parallel execution because they make it hard to prove things and reason about programs. I'm not trying to make him sound unreasonable, but there it is. Actual excerpt:

That is, given a pair of multithreaded programs (p1, p2) and another pair (p1', p2'), when are these two pairs equivalent? A reasonable extension of the basic theory defines them to be equivalent if all interleavings halt for the same initial state and yield the same final state. The enormous number of possible interleavings makes it extremely difficult to reason about such equivalence except in trivial cases (where, for example, the state B** is partitioned so that the two programs are unaffected by each others’ partition)

Gotta love the theo-comp notation -- B** having been previously defined, you understand, as the union of B* and Bω. Anyway, as the paper unfolds, it turns out that the good doctor Lee believes that the cure is moving towards a different set of abstractions for concurrency -- expressing the meeting points between concurrent computations using "coordination languages", orthogonal to programming languages. A few examples (and a notation) are presented in the text of the paper, where we learn that Dr. Lee's group has worked for at least a decade on a system that models concurrency and on a graphical notation for expressing coordination.

It would be easier to accept what the ivory-tower crowd want to say if there didn't seem to be a profound conflict of interest at work. I'm not saying that a (self-described) concurrency expert shouldn't make pronouncements about concurrency. But he shouldn't be surprised when ordinary grunt programmers ignore his advice. And lamenting the penetration of academic languages like E and Erlang is, well, just silly.

The middle of the paper is spent giving an example of the Observer pattern in Java, and showing some of the problems that can occur when moving to a multithreaded system. But the example is not convincing -- or at least, I don't see why I would necessarily design a system where the listener routines perform complex computations and acquire locks in the trigger's thread, which is where the concern about deadlocks arises from. If deadlocks were a problem, I'd switch to a message-based architecture where the only thing done inside mutexes is list manipulation.

What's worse, his proposed solution is not convincing either -- he draws a pretty colored diagram and explains that in the coordination language of Ptolemy II, the Observer pattern is handled by a Rendezvous directory. Well, OK, but since he has defined the two value producers, value consumer, and observer as four separate otherwise noninteracting threads, I don't see where the deadlock risk arises from.

It's unfortunate because the message he is trying to deliver -- threads are hard -- is a good one, and one that needs to be heard and handled before the concurrency problems he describes become more commonplace. (I estimate, when quad-processor chips are standard on desktops, which will be around 2015 by my guess.) But expecting a significant fraction of the world's programmers to transform their mental models of programming and computation to accept a new symbolism is ... unlikely to be the solution.

Posted by Sam at 07:59 PM
June 14, 2008
Rails Sucks

I am messing around with a new web project and decided to play around with rails again -- last time I checked it out was a few years ago, and although it looked cool I didn't have enough time then to spend on the project I was playing with.

So I updated the rails install on my machine and got started with the project.

$ gem update rails -y

I like that the new version defaults to using sqlite for the database, saves me screwing around with MySQL (that's another post...). Then I was browsing around reading about testing frameworks and saw that Test::Rails is supposed to be the great thing since it allows you to separate view and controller tests. So I gem installed that and made the manual changes suggested.

And suddenly my tests were half broken.

It turns out that Test::Rails doesn't work with rails 2.0 or greater. and this has been known since January. So I wasted a half hour trying to make it play before I threw in the towel and went on with rails 2.x, having to do without Test::Rails.

Overgeneralizing: this is the problem with dynamically typed languages, hype, and the open-source process in general. Eric Hodel, the author of Test::Rails, probably has better things to do (like give talks and work on lucrative projects) than make Test::Rails work with rails 2.0. The rails developers can't be bothered to maintain compatibility with every rails-related plugin out there, and even if they did provide a patch to Hodel, there's no guarantee it got into his official repository. It may be living out there in the wild in somebody's git repository, but that's not exactly helpful to me.

Of course I could fix it myself. But I'm not trying to make the ruby/rails tools better, I'm just working on my project.

I love open source, and I'm continuing with the project on rails. But it becomes clearer and clearer to me why so many people turn to safe, boring, slow, ugly -- but nonbroken -- Microsoft solutions.

Posted by Sam at 11:41 PM
June 02, 2008
Stupid Republican Strategy

The GOP has a new talking point against Barack Obama:

Obama Has Not Been To Iraq Since January 2006:

In January 2006, Obama Took His Second Trip As A Senator To Qatar, Kuwait, Jordan, Iraq, Israel And The Palestinian Territories. “Obama’s second trip abroad as a U.S. senator starts in Qatar and, in addition to Iraq, will include stops in Kuwait, Jordan, Israel and the Palestinian territories, according to a statement from his Washington office.” (“Obama To Visit Middle East, Including Iraq,” The Associated Press, 1/4/06)

What fresh idiocy is this? "Ooh, ooh, Obama hasn't been to Iraq for 800 days, and McCain was just there!" All Obama has to do is wait until the Republicans finish digging a nice big hole for themselves, and then... GO TO IRAQ.

If he goes for three days, surprise visit right before the DNC, he looks great. Unconcerned about Hillary, confident to have the nomination, great photo ops of him meeting troops, Iraqi leaders, and US military leaders, and then when he comes back and says "Iraq is going well and it's time to bring the troops home" -- only then will the GOP realize that they handed him the election.

Idiots.

Posted by Sam at 11:16 AM
Thought for Today

You can tell that the word "natural" has no legal meaning because it's used so frequently in advertising. Think about the old Velveeta jingle:

"... seven natural cheeses are better than one!"

There's no such thing as a "natural cheese" in the strict sense. Cheese is a man-made food product and does not result from any natural process.

Posted by Sam at 08:20 AM