Thursday, July 14, 2005

Peeking at An Iterator
C'mon, You Know You Want To


I'm currently designing a set of classes that will be used for parsing and importing data.

Being a devotee of Design Patterns, I am quite familiar with the Iterator pattern, and felt that it was the appropriate tool to apply to parsing rows in text-based files. Using an iterator, rows could be parsed one character at a time, looking for field delimiters (in comma-delimited files, for example, the field delimiter is a comma) and moving the data into the appropriate field in a DataRow.

Strings in .NET already have an iterator associated with them. Strings implement the IEnumerable interface (from the System.Collections namespace), and therefore have a method called GetEnumerator. String's GetEnumerator method returns an instance of the CharEnumerator class (which lives in the System namespace). As the name suggests, it iterates through the characters in the string.

Iterators in .NET basically know three things:
  • how to view the current item in the aggregate (via the Current property)
  • how to move to the next item in the aggregate (via the MoveNext method)
  • how to determine whether there is anything left to iterate (via the return value of the MoveNext method)
Notice that Iterator does not, by default, provide a method for moving backwards. The pattern does suggest that this behavior can be added, but notes that some aggregates cannot be iterated in reverse: in ADO, for example, a forward-only cursor can only be iterated in one direction, forward.
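
To make those three abilities concrete, here is a minimal sketch of walking a string with its CharEnumerator. The class and method names (IteratorBasics, CountDelimiters) are made up for illustration; only GetEnumerator, MoveNext and Current come from the framework.

```csharp
using System;

public static class IteratorBasics
{
    // Hypothetical helper: counts how many times cDelimiter occurs in sRow,
    // using nothing but MoveNext and Current.
    public static int CountDelimiters(string sRow, char cDelimiter)
    {
        int nCount = 0;
        CharEnumerator oIterator = sRow.GetEnumerator();

        // A new enumerator sits before the first character, so MoveNext
        // must be called before Current is valid. MoveNext returns false
        // once there is nothing left to iterate.
        while (oIterator.MoveNext())
        {
            if (oIterator.Current == cDelimiter)
            {
                nCount++;
            }
        }
        return nCount;
    }
}
```

Calling IteratorBasics.CountDelimiters("a,b,c", ',') walks all five characters and counts the two commas. Note that the loop only ever moves forward.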

I point this out because of an issue that I ran into while parsing Comma-Separated Value (CSV) files. Certain CSV formats surround character field data with double quotes: this allows commas to be interpreted as part of the character field data. The side effect of this is that if the character data needs to contain a double quote, you must put two double quotes right next to each other. For example, the following string:

I said to him, "Walter get off my foot." But he didn't listen.

would be:

"I said to him, ""Walter get off my foot."" But he didn't listen."

If you are reading a string one character at a time, and run into a double quote, you need to be able to look at the following character to determine whether the field data is ending, or whether the field data contains a double quote. The problem lies in the fact that to peek at the next character you need to move the Iterator forward one character: once you have done that, you cannot move backward.
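
For reference, the writing side of that quoting rule is simple, which is part of why the reading side feels so awkward. A rough sketch, assuming the doubled-quote convention described above (CsvQuoting and QuoteField are hypothetical names, not part of any library):

```csharp
using System;

public static class CsvQuoting
{
    // Hypothetical helper: wraps a field in double quotes and doubles
    // every double quote that appears inside it.
    public static string QuoteField(string sField)
    {
        return "\"" + sField.Replace("\"", "\"\"") + "\"";
    }
}
```

Passing the Walter sentence through QuoteField produces the quoted form shown earlier. The reader has to undo this, and that is where the need to look one character ahead comes from.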

Another quality of Iterators is that they maintain their own state (where they are in the aggregate, i.e. which character position in the string the Iterator is pointing at). Because they maintain their own state, you can point multiple iterators at the same aggregate (e.g. a string). I can take advantage of this as follows.

If the Iterator I'm using to walk the string (e.g. oIterator) is pointing at a double quote character:

  • Create another instance of CharEnumerator (e.g. oPeekingIterator) that is pointing at the same character in the string.
  • Tell oPeekingIterator to move forward one character by calling its MoveNext method.
  • Read the character from oPeekingIterator and see if it is a quote.
  • Throw away oPeekingIterator.
Note that moving oPeekingIterator does not change which character oIterator is pointing at: that's why you can throw it away. But how do you get an Iterator that points to the same place in the string? Take out your genetics kit and Clone it:

private char Peek(CharEnumerator oIterator) {
    // Create a clone of the Iterator.
    // Since a clone is a copy, it will have the same state
    // (it will be pointing at the same character).
    // Clone returns an object, so cast it back to CharEnumerator.
    CharEnumerator oPeekingIterator = (CharEnumerator)oIterator.Clone();

    // Move the peeking iterator ahead one character
    oPeekingIterator.MoveNext();

    // Return the current character
    // (the character one past the character that oIterator is pointing to)
    return oPeekingIterator.Current;
}

I know that error checking needs to be included in the routine, but I'm demonstrating a technique.
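
For the curious, here is one way the error checking might look, together with a sketch of how the peek fits into reading a quoted field. This is only an illustration under my own assumptions: the names CsvPeeking, TryPeek and ReadQuotedField are invented, and ReadQuotedField assumes the iterator is already sitting on the field's opening quote.

```csharp
using System;
using System.Text;

public static class CsvPeeking
{
    // Looks one character ahead without moving oIterator.
    // Returns false (and '\0') when there is no next character.
    public static bool TryPeek(CharEnumerator oIterator, out char cNext)
    {
        // Clone returns object, so it has to be cast back.
        CharEnumerator oPeekingIterator = (CharEnumerator)oIterator.Clone();
        if (oPeekingIterator.MoveNext())
        {
            cNext = oPeekingIterator.Current;
            return true;
        }
        cNext = '\0';
        return false;
    }

    // Reads one quoted field. Assumes oIterator is positioned on the
    // opening quote; on return it is positioned on the closing quote.
    public static string ReadQuotedField(CharEnumerator oIterator)
    {
        StringBuilder oField = new StringBuilder();
        while (oIterator.MoveNext())
        {
            char cCurrent = oIterator.Current;
            if (cCurrent != '"')
            {
                oField.Append(cCurrent);
                continue;
            }

            char cNext;
            if (TryPeek(oIterator, out cNext) && cNext == '"')
            {
                // Doubled quote: keep one quote and skip the second.
                oField.Append('"');
                oIterator.MoveNext();
            }
            else
            {
                // Lone quote: the field ends here.
                return oField.ToString();
            }
        }
        throw new FormatException("Quoted field is missing its closing quote.");
    }
}
```

Because CharEnumerator is a class, the MoveNext calls inside ReadQuotedField advance the caller's iterator, while the clone made inside TryPeek is simply discarded when the method returns.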

If you are not at least aware of Design Patterns, I would invite you to take a look, google it, even. It provides many Eureka! slaps and helps you avoid "How stupid could I be" slaps. Check it out ...

or else I'll have to slap you. ;-)
