Deleting From A Collection in STL

July 28th, 2010

Deleting from a collection is hard in STL. Or rather, it’s subtle. Which is roughly equivalent to hard.

Jim Beveridge talks about STL erase in his latest blog post and that inspired me to spend some time messing around with STL containers last night.

You can remove all even numbers from a set with Jim’s code:

set<int> coll;
/* ... */
for (set<int>::iterator itr=coll.begin(); itr!=coll.end(); /*EMPTY*/ )
   if (*itr%2 == 0)
      coll.erase(itr++);
   else
      ++itr;

But you can’t take this code, replace “set” with “vector”, and use it to remove all even numbers from a vector. For one thing, inserting or deleting an element in the middle of a vector invalidates all iterators that point to elements following the insertion or deletion point, so the trick of post-incrementing itr doesn’t help. You do indeed get a pointer to the next element which is valid until erase is called, which then invalidates the iterator. Fortunately erase doesn’t reallocate the vector’s storage, so the iterator is just off by one. Still, if you populate your vector with the values 1,2,2,4,5 and run it through the above loop, you’ll wind up with 1,2,5.

Here’s how to remove all the even numbers from a vector:

vector<int> coll;
/* ... */
for (vector<int>::iterator itr=coll.begin(); itr!=coll.end(); /*EMPTY*/ )
   if (*itr%2 == 0)
      itr = coll.erase(itr);
   else
      ++itr;

Note that vector::erase returns an iterator which points to the next value (and, incidentally, the same memory location). That’s because it’s an STL sequence, while set is an associative container. (There are five different STL containers: the three sequences vector, list and deque and two associative containers set and multiset.)

So, given that there are two different interfaces for erase corresponding to two different container types, how can I write a template function that deletes even elements from a templated container? I can’t just write two versions of delete_even with different implementations; they need to have different signatures. I can’t use iterator tags because container category isn’t a property of the iterator: set and list both have bidirectional iterators, but are in different container categories.

Here’s what I wound up with:

struct assoc_container_tag {};
struct sequence_tag {};

template <class T> sequence_tag container_category( const vector<T>& v ) { return sequence_tag(); }
template <class T> sequence_tag container_category( const list<T>& v ) { return sequence_tag(); }
template <class T> sequence_tag container_category( const deque<T>& v ) { return sequence_tag(); }

template <class T> assoc_container_tag container_category( const set<T>& v ) { return assoc_container_tag(); }
template <class T> assoc_container_tag container_category( const multiset<T>& v ) { return assoc_container_tag(); }

template <typename T, typename P> void delete_if( T& t, P pred, assoc_container_tag )
{
  for( typename T::iterator i = t.begin(); i != t.end(); )
    if( pred(*i) )
      t.erase( i++ );
    else
      ++i;
}

template <typename T, typename P> void delete_if( T& t, P pred, sequence_tag )
{
  for( typename T::iterator i = t.begin(); i != t.end(); )
    if( pred(*i) )
      i = t.erase( i );
    else
      ++i;
}

template <typename T, typename P> void delete_if( T& t, P pred )
{
  delete_if( t, pred, container_category(t) );
}

I defined two empty types to differentiate between the two container categories. I wrote a templated function container_category and partially specialized it for the five basic container types, returning the appropriate container category tag. Then come two container-category-specific versions of the deletion loop, using either Jim’s postincrementing iterator trick t.erase( i++ ); or the returned iterator i = t.erase(i). The entry point is the two-argument version of delete_if, which uses container_category to select (at compile-time) the appropriate three-argument version of delete_if to call.

Note that to make it look more like STL I’ve factored out the test for evenness, and now you can pass in an arbitrary predicate. And I’ve changed the routine name from delete_even to delete_if.

Why does that sound vaguely familiar?

[...]

Oh yes. Because there is an STL function remove_if which does exactly the same thing, only better, because you can give it an iterator range instead of an entire container.

So: was all this work wasted? No! Object categories are still a useful technique, and it’s good to play around with them so that they’re familiar when needed. It’s important to think about the properties of containers and their iterators, and to know which operations can invalidate iterators. I had an insight about vector::erase and iterator invalidation, but I’m not sure how to put it into words yet.

And anyway, remove_if has problems of its own, which I’ll get to in another post.

Cygwin setup.exe not a valid win32 application

July 23rd, 2010

I ran into this error - “setup.exe is not a valid win32 application” the other day while I was trying to upgrade cygwin (1.5 -> 1.7) on an old laptop.

Underlying problem was that I was upgrading under a different account than had been used to originally install cygwin. Switching to that account and running the upgrade was an effective workaround. (Ineffective attempted workarounds included re-downloading several times, explicitly running as administrator, swearing, and banging on the desk).

So: in case anyone else gets bitten by it, try that.

Domesticity

January 28th, 2010

Baked bread, did the dishes and laundry, and the kids are fed.

Weird.

(Actually this is a test of the Wordpress-for-Blackberry app.)

December 3rd, 2009

Arnold Kling on the Iraq War:

1. Robert Kagan writes

The foreign policy establishment and intellectual world are much the same. They fully supported intervention in Vietnam, mostly supported intervention in Iraq and fully supported the war in Afghanistan — until the wars got hard, or embarrassing and difficult to defend in polite company. Then they bailed, desperately trying to cover their tracks along the way, and offering reassuring images of what losing would look like.

I’ll plead guilty on Iraq. However, my perspective is the opposite of Kagan’s. The problem for intellectuals is not that they lack the persistence to stick with these sorts of wars. It is that they lack the wisdom to avoid them.

Even if one accepts the nationalistic case for war (which many libertarians do not), plain pragmatism would say that you stay out of wars that you can only win by remaking another society. The fact that some intellectuals only appreciate this after the war is underway is a sad commentary on our hubris.

What I’d like to see is an analysis of the Iraq war from a signaling perspective. In my view (aligned with Realist foreign policy), we did not go in there for nationalist reasons or with the primary aim of making Iraq a better place. We went to Iraq to show people like Saddam Hussein (i.e., tin-pot dictators) that behaving like Saddam Hussein would lead to a sad end (i.e., hanging). Which I think we did show.

At a substantial expense in human lives.

Disclosure

October 29th, 2009

Requiring disclosure of campaign donors is grabbing the problem by the wrong end.  I think it would be better to do this:

  • Members of Congress are forbidden from handling cash or accepting gifts during their terms.
  • Their salary is set at $500,000 per year (higher for Senators) and is expected to cover all their needs and expenses.  (If they have to fly to Hawaii to attend a committee meeting, they buy their own ticket.)
  • They can access their single bank account with a credit/debit card issued on assuming office (all other assets are placed in trust); all transactions posted in real time on a public website.
  • Their movements are tracked by GPS and logged, and posted 24-hours-delayed on a public website.
  • They cannot accept gifts with any cash value (no plaques, lunches, golf course visits).
  • If anybody catches them breaking any of these rules and obtains documentary evidence (photo or video), they earn $5,000 per incident which is directly deducted from the representative’s bank account.

C++ Runtime Types: Find All Classes Derived From a Base Class (3)

October 19th, 2009

Working on a way to find all the derived classes of a base class: part 1 is here, discussing the motivation and frame of the problem.  Part 2 is here, discussing one approach that works but not for me.

The problem is laziness: if I am in a hurry to put in a new error message and can build a version and send it to testing without updating the error class registry, then eventually I will do it.  So how can we use laziness to our advantage?  How can we make having a consistent error registry the easiest way to get the code to compile?

I found an article on codeproject that also provides a framework for runtime derived-class discovery.  It’s substantially uglier than the meatspace solution quoted in part 2, but it addresses all of the weaknesses: index reversal, multiple hierarchies, and most importantly (for me) compile-time error if a derived class is not registered.  What we’re going to wind up with is this:


class CError: {
DECLARE_ROOT_CLASS(CError);
public:
  // body as in part 1
};

// error logging function
void errorlog( const CError& );

class CDerivedError: {
DECLARE_LEAF_CLASS(CError);
public:
  CUnableToOpenFile() { Init( L"sample.txt", 2 ); }
  CUnableToOpenFile( const wchar_t * psPath,
       DWORD dwError = GetLastError() )
  { Init( psPath, dwError ); }
  format_type Format() const
  { return "Unable to open file {0}: {1}"; }
};

Later on, in the implementation file for the error system:


IMPLEMENT_ROOT_CLASS(CError)

IMPLEMENT_LEAF_CLASS(CError,CUnableToOpenFile)
IMPLEMENT_LEAF_CLASS(CError,CBadMagicNumber)
// etc.

Why does this help? Why does it make any difference at all? By designing the system this way, there is now a chain of compile-time or link-time — at any case, not runtime — dependencies that make maintaining a consistent error table the easiest way, the laziest way, to add an error.

So under the hood, what are the macros doing? DECLARE_ROOT_CLASS() is the heaviest. It declares static member functions on CError, and it declares a vector of class factories as a static member variable. (Notice that it does not use the elegant Singleton pattern that the meatspace solution did, and I might change that.) It also declares a pure virtual function on CError — GetClassID().

IMPLEMENT_ROOT_CLASS() is mechanical, it just provides the implementation of the factory-adding mechanism.

DECLARE_LEAF_CLASS() is an interesting beast. It declares a static class variable. It implements two functions that are necessary to make CDerived a concrete class: it provides an implementation for GetClassID(), and it provides an implementation for CreateObject(), required by the class factory template. If I remove DECLARE_LEAF_CLASS, I get these errors:

errorlog.cpp(50) : error C2039: ‘CreateObject’ : is not a member of ‘CUnableToOpenFile’

e\ unabletoopenfile.h(3) : see declaration of ‘CUnableToOpenFile’

errorlog.cpp(50) : error C2259: ‘CUnableToOpenFile’ : cannot instantiate abstract class due to following members:

e\unabletoopenfile.h(3) : see declaration of ‘CUnableToOpenFile’

errorlog.cpp(50) : warning C4259: ‘int __thiscall CError::ClassID(void) const’ : pure virtual function was not defined errorlog.h(31) : see declaration of ‘ClassID’

And  IMPLEMENT_LEAF_CLASS() defines the static class variable that was declared by DECLARE_LEAF_CLASS().  The initialization of that static variable (at runtime, before main() executes) causes the class to be registered in CError’s factory table.  So if I create and use an error class but forget to use the IMPLEMENT macro, I get the following errors:

File.obj : error LNK2001: unresolved external symbol “private: static class CBootStrapper<class CError> CUnableToOpenFile::s_oBootStrapperInfo” (?s_oBootStrapperInfo@CUnableToOpenFile@@0V?$CBootStrapper@VCError@@@@A)

.debug/program.exe : fatal error LNK1120: 1 unresolved externals

To recap: there is a chain of compile-time or link-time errors that ensure that when a new error class is created, it is available through the runtime error-class table maintained in CError.  The chain runs like this:
  1. Convert an old-style error message to a new one by creating  a call to errorlog( CErrorName( arg1, arg2 ) );
  2. Errorlog requires a class derived from CError, so create CErrorName as a new CError-derived class in a header file
  3. In order to derive from CError, we must supply bodies for pure virtual functions declared in CError; the simplest way to do that is with DECLARE_LEAF_CLASS
  4. External dependencies are created by DECLARE_LEAF_CLASS, and the simplest way to resolve them is to use IMPLEMENT_LEAF_CLASS
  5. IMPLEMENT_LEAF_CLASS causes the class to be registered in the error table.
There are other design issues (like, “Why did the derived error class suddenly get a default constructor?”) which I hope to address in a later article; also there are other design considerations about the error logging, the choice of fastformat, “fun” “quirks” of fastformat, getting the errors into a useful database in a useful way (in this case it means using sqlite and possibly using an ORM), which will come up later.
I also made some changes to the original version of BootStrap.h which I hope to publish soon.  I’m not yet completely happy with the templates and macros that I’m using, but I haven’t figured out what I want to change yet.

C++ Runtime Types: Find All Classes Derived From a Base Class (2)

October 18th, 2009

Working on a way to find all the derived classes of a base class: part 1 is here, discussing the motivation and frame of the problem.

StackOverflow gives a pointer to meatspace (a blog), where he’s got a solution involving factory methods and a single macro per class.


class Base;

typedef Base* (*base_creator)(void);

class BaseRegistry {
    public:
        std::vector m_bases;
};

class BaseRegistration {
    public:
        BaseRegistration(base_creator);
};

#define AUTO_REGISTER_BASE(base) \
    BaseRegistration _base_registration_ ## base(&base_factory);

It’s functional: you can walk through all the registered base classes and create an instance of each.   The example doesn’t provide a way to reverse the index (to go from a derived class its position in the factory method table) but that would be easy to add.

Another minor complaint is that this example hardcodes the identity of the base class. If you wanted to have two distinct hierarchies of derived classes, you’d need to change a few things, not just a single template argument.

Another minor complaint — so minor that I hesitate to mention it — is that each of the derived classes is assumed to have its own implementation file. That’s fine for a situation where the derived classes are doing some actual work, but my error classes are very lightweight and won’t have their own implementation files. This is a purely aesthetic objection.

The main drawback that I see is maintainability: it seems too easy to fall through the cracks. I can declare a class derived from type CBase and if I forget to put the AUTO_REGISTER_BASE(CDerived) macro in an implementation file, the class doesn’t get registered and therefore can’t be enumerated. In this framework, there’s no way to enforce the rule that each derived class must be registered.

C++ Runtime Types: Find All Classes Derived From a Base Class (1)

October 17th, 2009

I’m working on a program in C++, recently converted from C.  I’m redesigning the error logging subsystem to have type-safety, log-to-database, and also to make it possible to generate a list of all the errors in the system.  So what I’ve decided to do is have a base class CError, and require that all the errors derive from it:

class CError
{
public:
  typedef const char * format_type;
  typedef std::string message_type;

  virtual format_type Format() const = 0;
  virtual format_type Message() const { return Format(); }

  virtual ~CError() {}

private: // disable copying
  CError( const CError& );
  CError& operator=( const CError& );
};

And an actual error message might look like this:


class CUnableToOpenFile : public CError
{
public:
  CUnableToOpenFile( const wchar_t * psPath, 
       DWORD dwError = GetLastError() )
  { Init( psPath, dwError ); }
  format_type Format() const
  { return "Unable to open file {0}: {1}"; }
};

Init is a function that converts the error code to a string and uses fastformat to format the string.  (I’m omitting some of the mechanism here, because I want to talk about the runtime type discovery issue.)

OK, so now what?  I have a bunch of types that derive from CError, and no way to enumerate them at runtime.

Well maybe I could do what ruby does, enumerate all the classes in the type system and see if they can be cast down to CError.  I could do that if I had a list of all the classes in the system, which Ruby does have; the  C++ runtime has it too (to implement type_info and dynamic_cast), but it can’t give it to me.  I still need to separately maintain some a list of (at least) the types I’m interested in.

Library Annoyances

October 7th, 2009

Why aren’t libraries more commercial?  I find a book in the library.  I REALLY like it.  The library won’t sell it to me.  I mean, not over the table.  Of course I can “lose” it and the library takes the replacement cost, a processing fee, and a penalty (!) money and either replaces the book or doesn’t.  But why don’t they straight-up sell books out of their catalog?

Or how about this: you’re 12,000th in line for Dan Brown’s new magnum o’piss (sorry).  But if you pay $3 we will put you in this other line for “preferred” customers, and we will guarantee that you don’t get a dog-eared copy, to boot!  Or if you prefer we will sell you a copy for 40% off the cover price.

Another annoyance, not commercial: why don’t they enter author/title info for every book?  I get it that before electronic cataloguing and checkout it was labor-intensive, but these days it’s ludicrous to put JUVENILE BOARDBOOK on something which comes barcoded with ISBN and EAN, when the library is affixing not one but two identifiers (library barcode and library RFID), and full author/title/cover picture information is available online keyed by ISBN.  Can’t we do better than this?  I ask because somewhere in my house are two overdue JUVENILE BOARDBOOKs, one FICTION PAPERBACK, and one FICTION MYSTERY PAPERBACK, and it would be helpful when looking under couches &c to know what the silly things look like.

More Polanski

October 6th, 2009

Can’t get it out of my head, sorry.

I think that the people who are defending him must be operating out of some sort of cognitive dissonance.  They know him, or met him once — sat next to him at some banquet, or helped with the editing on The Pianist — and they didn’t recoil from him.  He was a normal enough guy.  He was nice.

Now that he’s in custody and the whole child-rapist thing is coming up again, they’re all “Whaa?  How can you do that?  He was so niiice!”  And it’s not so much that they know him and we don’t (though that’s true, of course, how could the little people know the true artist’s heart that beats inside Roman’s chest).  But it’s also that in condemning him for his crime and his flight, and in condemning those who have sheltered him over the years, those little people (how dare they!) profess to pass judgment not just on Polanski but on all of his enablers.

I’m glad that my job has never asked me to sit next to a child-rapist and pretend to be pleasant.