The XmlTextReader - A Beginner's Guide
Introduction

The XmlTextReader class is not the most intuitive class to work with, as the methods and properties are very low level. While the XmlTextReader class is rich in properties and methods, I've found that most of what it provides isn't necessary for the average day-to-day job. So, in this article, I'm going to present a moderately thin wrapper class for the XmlTextReader, which should be a helpful guide to using the XmlTextReader for programmers not familiar with this class.

This article is also an introduction to a variety of other disciplines that I feel a beginner should be aware of: code commenting, abstraction and architecture, and unit tests. So, hopefully, there's something here for everybody!

Why Use an XmlReader?

To summarize from this site:
Naturally, XmlTextReader is closer to the XML. It is a forward-only reader, meaning that you can't go backwards. Contrast this with the XmlDocument class, which pulls in the entire document and allows you to random-access the XML tree. The XmlTextReader supports streams, and therefore should reduce memory requirements for very large documents. Another advantage of the XmlTextReader is that it provides line and character position information, which can be very useful diagnostic information when there's a problem with the XML.

What Features Do I Want to Support?

There's a core set of features in XML that I want my reader to support:
There are other features of XML, but these are the most common ones, and the ones I want to start with.

The link cited above demonstrates a somewhat different approach from the one I take here, and it's useful to briefly discuss the difference. In the SoftArtisans link, many of the code snippets demonstrate looking for and evaluating specific elements and attributes of the XML, including optional ones. In other words, the application has expectations regarding the XML graph and content. The reader that I am presenting is tailored more to processing ad hoc XML, where there are no expectations regarding the graph and the content. Both approaches have their value depending on what you need to get accomplished.

The Unit Tests

The unit tests are written for my Advanced Unit Test application, downloadable here. The reason I'm using my unit test application instead of NUnit is that I want to take advantage of AUT's ability to execute tests in sequence, as I read through the XML. Yes, I could have instead written an XML fragment for each unit test, but I find this more convenient and more realistic, as I can work with the entire XML document. Here's the XML test document, which illustrates each of the features described above:

Something this simple doesn't need an architecture, does it? In fact, it does. Even with something this simple, it's a good idea to consider what abstraction you might want (planning for the future) and what helper objects will make understanding and working with the code easier. And of course, we need to consider what kind of exceptions the reader will throw. As a side comment, it always surprises me how a good architecture, even for the simplest of functionality, practically eliminates monolithic code and helps to create nice small methods that are easily unit tested.

I potentially want to read formats other than XML, while staying within the constraints of an XML-ish structure.
For example, a comma separated value (CSV) file is a good candidate for an alternative reader implementation. By abstracting the reader, I can support alternative formats without having to change the code that uses the reader. This is a design decision that is best made early on. The reader implements an IReader interface that provides the necessary abstraction layer:

Since this is a beginning article, I want to emphasize something here: there is no excuse for not putting at least basic comments in your code. None. It is a discipline that I myself have worked hard to achieve, but if you're writing a professional application that you or others may one day need to maintain, you simply have to force yourself to become disciplined about writing comments. The interface:
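A minimal sketch of such an interface, built only from the members this article goes on to discuss (ReadNode, the attribute readers, and the Element, CData, Text, and Depth properties), might look like the following. The exact signatures, the Comment and Instruction property names, the NodeType property, and the empty stand-in container classes are assumptions made for illustration, not the author's actual code.

```csharp
using System;
using System.Xml;

// Empty stand-ins so the sketch compiles on its own (assumption: the real
// container classes carry name, prefix, value, and position information).
public class ElementNodeInfo { }
public class AttributeNodeInfo { }
public class ProcessingInstructionNodeInfo { }

/// <summary>
/// Hypothetical reader abstraction. Any format-specific reader (XML, CSV, ...)
/// would implement this so that calling code never depends on a concrete reader.
/// </summary>
public interface IReader
{
    /// <summary>Type of the node the reader is currently positioned on.</summary>
    XmlNodeType NodeType { get; }

    /// <summary>Nesting depth of the current node.</summary>
    int Depth { get; }

    /// <summary>Name, prefix, and namespace of the current element.</summary>
    ElementNodeInfo Element { get; }

    /// <summary>Text of the current CDATA block.</summary>
    string CData { get; }

    /// <summary>Text of the current comment (property name is an assumption).</summary>
    string Comment { get; }

    /// <summary>Inner text of the current text node.</summary>
    string Text { get; }

    /// <summary>Current processing instruction (property name is an assumption).</summary>
    ProcessingInstructionNodeInfo Instruction { get; }

    /// <summary>Advances to the next node (return type is an assumption).</summary>
    XmlNodeType ReadNode();

    /// <summary>Reads the first attribute of the current node, or returns null.</summary>
    AttributeNodeInfo ReadFirstAttribute();

    /// <summary>Reads the next attribute of the current node, or returns null.</summary>
    AttributeNodeInfo ReadNextAttribute();

    /// <summary>Reads the first or next attribute, whichever applies, or returns null.</summary>
    AttributeNodeInfo ReadAttribute();
}
```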
Anyone interested in implementing a custom reader now knows what the custom reader needs to implement. An application needing a reader can now reference the reader via the IReader interface, and a factory pattern can be used to instantiate the appropriate reader.

The Container Classes

There are several container classes that help encapsulate information relevant to all nodes and relevant to specific nodes. Creating classes that encapsulate fields improves code readability and provides a layer of separation from the underlying implementation (the Reader class, in this case). And no, none of the container classes are unit tested. You have to draw the line somewhere, and these classes are much too simple to spend the time on unit testing.

LineInfo

All XML nodes have line and character position information, which is encapsulated in the LineInfo class:

Since this class is instantiated strictly by the reader, the properties are read-only.

NodeInfo

NodeInfo is an abstract class that encapsulates the two common elements of just about every XML node (there are a few exceptions): the node name and the node prefix. It's an abstract class because we want to make sure that the implementation utilizes an appropriate concrete class derived from NodeInfo. The concrete implementation improves readability (since it qualifies the type of node information), and usually provides additional fields specific to the node type.

ElementNodeInfo

This class is a concrete implementation of NodeInfo, and adds a local namespace field, as elements can have local namespaces:

AttributeNodeInfo is likewise a concrete implementation of NodeInfo, and adds a value field, as attributes have values:

The processing instruction class derives from AttributeNodeInfo. A processing instruction has a name and a value, like an attribute, but I've implemented a separate class to represent the concept of a processing instruction, even though it adds nothing to the AttributeNodeInfo class. This is merely a code readability decision.
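The class hierarchy described above can be sketched roughly as follows. This is an illustration under assumptions: the property names (Position, LineNumber, LinePosition, Namespace, Value), the constructor signatures, and the class name ProcessingInstructionNodeInfo are hypothetical, chosen to match the descriptions rather than copied from the author's code.

```csharp
using System;

/// <summary>Line and character position of a node. Read-only to callers,
/// since only the reader is expected to create instances.</summary>
public class LineInfo
{
    public int LineNumber { get; private set; }
    public int LinePosition { get; private set; }

    public LineInfo(int lineNumber, int linePosition)
    {
        LineNumber = lineNumber;
        LinePosition = linePosition;
    }
}

/// <summary>Common data for almost every node: position, name, and prefix.
/// Abstract, so callers always work with a concrete node type.</summary>
public abstract class NodeInfo
{
    public LineInfo Position { get; private set; }
    public string Name { get; private set; }
    public string Prefix { get; private set; }

    protected NodeInfo(LineInfo position, string name, string prefix)
    {
        Position = position;
        Name = name;
        Prefix = prefix;
    }
}

/// <summary>Element information: adds the element's local namespace.</summary>
public class ElementNodeInfo : NodeInfo
{
    public string Namespace { get; private set; }

    public ElementNodeInfo(LineInfo position, string name, string prefix, string ns)
        : base(position, name, prefix)
    {
        Namespace = ns;
    }
}

/// <summary>Attribute information: adds the attribute's value.</summary>
public class AttributeNodeInfo : NodeInfo
{
    public string Value { get; private set; }

    public AttributeNodeInfo(LineInfo position, string name, string prefix, string value)
        : base(position, name, prefix)
    {
        Value = value;
    }
}

/// <summary>Processing instruction: same data as an attribute (name and value),
/// kept as a separate type purely for readability.</summary>
public class ProcessingInstructionNodeInfo : AttributeNodeInfo
{
    public ProcessingInstructionNodeInfo(LineInfo position, string name, string value)
        : base(position, name, string.Empty, value)
    {
    }
}
```

Keeping the setters private means code outside these classes can read a node's information but cannot alter it after the reader has produced it.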
Instead of talking about the XmlTextReader as a class and its methods, which you can easily read about yourself, I'm going to show you the XmlTextReader within the context of my Reader wrapper. This way, instead of just looking at documentation, you'll see the XmlTextReader in actual code, and I'll explain what I'm doing in the code and why.

Creating an XmlTextReader

Quite literally, the first stumbling block is in creating an XmlTextReader. It sounds simple, but according to Microsoft:
Second, I want to control some aspects of the reading process. Specifically, I almost always want to ignore whitespace. The default XmlTextReader returns all whitespace. So, to properly construct an XmlTextReader using Microsoft's recommended method and to have the ability to set some options, we have to do something like this:

Before I go further, this constructor is the one I use for the unit tests, and it takes an XML string. You might instead want a constructor that takes a stream, and as you can see in the first line, I create a StringReader to wrap the string. The second line creates an XmlReaderSettings instance, and I explicitly (just to show you another useful property) choose not to ignore comments, but I do want to ignore whitespace. Next, I create the XmlTextReader from the StringReader. But that's not enough. I now have to create an XmlReader, passing in the XmlTextReader and the desired settings. Now, we have properly constructed a reader, complying with Microsoft's guidelines, and having the ability to configure the reader to ignore whitespace. If you're wondering about the last line, we'll get to that later.

The Constructor Unit Test

The constructor reveals the fact that the XmlTextReader does not position itself on a valid node immediately after construction, as the NodeType is "None".

Reading the XML Declaration

Reading the XML declaration, as with all other nodes, requires calling ReadNode:

My wrapper for the reader optionally skips end elements. If you don't do this, the reader will return EndElement node types, which, depending on what you are doing with the XML, may be superfluous. In the unit test constructor, this flag is set to true.

The XML Declaration Unit Test

An XML declaration contains attributes just like an element node. I'll demonstrate ReadFirstAttribute and ReadNextAttribute shortly.

Reading the Root Node and Other Elements

Immediately following the XML declaration should be the root node.
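The construction steps and the ReadNode behavior described so far can be sketched as follows. This is a minimal illustration, not the author's Reader class: the class name ReaderSketch, the skipEndElements parameter name, and the choice to return the node type from ReadNode are assumptions. The framework calls used (StringReader, XmlReaderSettings, XmlTextReader, and the XmlReader.Create overload that wraps an existing reader) are standard .NET APIs.

```csharp
using System;
using System.IO;
using System.Xml;

public class ReaderSketch
{
    private readonly XmlReader reader;
    private readonly bool skipEndElements;

    public ReaderSketch(string xml, bool skipEndElements)
    {
        // Wrap the XML string so it can be read as text.
        StringReader source = new StringReader(xml);

        // Keep comments, drop insignificant whitespace.
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreComments = false;
        settings.IgnoreWhitespace = true;

        // Create the XmlTextReader, then wrap it with XmlReader.Create so
        // that the settings above are applied.
        XmlTextReader textReader = new XmlTextReader(source);
        reader = XmlReader.Create(textReader, settings);

        this.skipEndElements = skipEndElements;
    }

    /// <summary>Type of the current node; None before the first read.</summary>
    public XmlNodeType NodeType { get { return reader.NodeType; } }

    /// <summary>Qualified name of the current node.</summary>
    public string Name { get { return reader.Name; } }

    /// <summary>
    /// Advances to the next node, optionally stepping over EndElement nodes.
    /// Returns XmlNodeType.None once the end of the input is reached.
    /// </summary>
    public XmlNodeType ReadNode()
    {
        do
        {
            if (!reader.Read())
            {
                return XmlNodeType.None;
            }
        } while (skipEndElements && reader.NodeType == XmlNodeType.EndElement);

        return reader.NodeType;
    }
}
```

With a small document consisting of a declaration and a root element, the first ReadNode call lands on the XmlDeclaration node and the second on the root element.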
My reader provides an Element property which returns an ElementNodeInfo instance that encapsulates the element name, prefix, and optional namespace. Looking at the implementation:

You'll see that the ElementNodeInfo also includes the reader's line and character position, and the element name is stripped of the prefix.

Reading the Root Node Unit Test

Reading an element node is straightforward, as the unit test demonstrates:

The ReadNode method is called to move past the XML declaration node and onto the root node. The unit test verifies that this happened correctly. There's another element test later on, which tests that a local namespace has been correctly read:

Most XML elements contain attributes, and the root node includes two attributes, one of which is an XML namespace declaration. The XmlTextReader provides two methods for reading an attribute, MoveToFirstAttribute and MoveToNextAttribute, which return a boolean true if successful, false otherwise. I've modified this implementation slightly in my ReadFirstAttribute and ReadNextAttribute methods. Both of these methods return an AttributeNodeInfo instance, encapsulating the reader's line and character position and the attribute name, prefix, and value. A null is returned if there are no further attributes to read.

You can use these methods, or you can use another method that avoids having to figure out whether to call ReadFirstAttribute or ReadNextAttribute. My reader figures this out automatically for you, and here's where the firstAttribute boolean comes into play:

The firstAttribute flag is set whenever ReadNode is called. It's cleared when the first attribute is read, either by calling ReadFirstAttribute or ReadAttribute.

The Attribute Unit Tests

The following sequence of unit tests tests the first, next, and "smart" attribute readers:

A CDATA block lets you include freeform text in the XML, such as code.
My reader provides a CData property which returns the CDATA text as a string:

As you can see, the CData property wraps the Value property and validates the node type first.

The CDATA Unit Test

Reading XML comments is just like getting the CDATA text. Once we know that the node is a comment node, we return the Value property which contains the comment text. The reader also trims any leading and trailing whitespace, which is often used to make the XML comments more readable.

As the XmlTextReader moves through the XML, any inner text is returned as its own node of type Text. The reader's Text property is a thin wrapper for the XmlTextReader's Value property:

Processing instructions are another kind of XML node. These may contain useful meta-instructions for the engine that is processing the XML. The reader provides a thin wrapper for getting the processing instruction:

Lastly, one of the important things about XML is that it is hierarchical. The reader provides a thin wrapper to the XmlTextReader's Depth property (a very thin wrapper):

The point, though, is that any class that implements IReader needs to provide this property.

The Depth Unit Test

This unit test reveals one of the side effects of ignoring the XML end element node type, which is that the depth can pop several levels between one node and the next. This should be taken into consideration when writing an application that actually does something with the XML.
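To tie the attribute, CDATA, comment, text, processing instruction, and depth handling together, here is a compact sketch of how such thin wrappers could sit on top of the underlying reader. It is an illustration under assumptions rather than the author's code: the class name NodeWalkSketch, the InstructionData property, the ValueOf helper, and the use of a KeyValuePair in place of AttributeNodeInfo are all hypothetical, and end elements are always skipped here for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public class NodeWalkSketch
{
    private readonly XmlReader reader;
    private bool firstAttribute;   // set by ReadNode, cleared by the first attribute read

    public NodeWalkSketch(string xml)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;
        reader = XmlReader.Create(new XmlTextReader(new StringReader(xml)), settings);
    }

    public XmlNodeType NodeType { get { return reader.NodeType; } }
    public string Name { get { return reader.Name; } }
    public int Depth { get { return reader.Depth; } }

    /// <summary>Moves to the next node, skipping end elements. False at end of input.</summary>
    public bool ReadNode()
    {
        firstAttribute = true;
        do
        {
            if (!reader.Read()) return false;
        } while (reader.NodeType == XmlNodeType.EndElement);
        return true;
    }

    /// <summary>"Smart" attribute read: picks first or next automatically.
    /// Returns null when there are no further attributes.</summary>
    public KeyValuePair<string, string>? ReadAttribute()
    {
        bool moved = firstAttribute
            ? reader.MoveToFirstAttribute()
            : reader.MoveToNextAttribute();
        firstAttribute = false;
        if (!moved) return null;
        return new KeyValuePair<string, string>(reader.Name, reader.Value);
    }

    public string CData { get { return ValueOf(XmlNodeType.CDATA); } }
    public string Comment { get { return ValueOf(XmlNodeType.Comment).Trim(); } }
    public string Text { get { return ValueOf(XmlNodeType.Text); } }
    public string InstructionData { get { return ValueOf(XmlNodeType.ProcessingInstruction); } }

    // Validates the node type before handing back the underlying Value.
    private string ValueOf(XmlNodeType expected)
    {
        if (reader.NodeType != expected)
        {
            throw new InvalidOperationException(
                "Current node is " + reader.NodeType + ", not " + expected + ".");
        }
        return reader.Value;
    }
}
```

Because end elements are skipped, a caller that tracks hierarchy has to compare Depth before and after each ReadNode call instead of counting closing tags.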