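SAX (Simple API for XML) is a sequential-access, event-driven parser API for XML (see Wikipedia). As stated there, SAX parsing requires less memory and no preprocessing, because the parser reports the document as a stream of events instead of building a tree. Keep in mind, though, that SAX is not a drop-in replacement for a DOM (Document Object Model) parser: it is literally “simple”, so there is no in-memory tree and no random access, and your own code has to keep track of any state it needs.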

Anyway, there are not many SAX parser implementations for C++ out there. The best known among them are the Xerces-C++ library (from the Apache project) and the Expat library. I tried both and found that Expat was easier to customize and, above all, more to my taste.

However, the Expat library is written in C rather than C++, so I wrote an easier-to-use wrapper class with the following definition.


class SaxParser
{
public:
  SaxParser(SaxParserHandler& handler);
  SaxParser(SaxParserHandler& handler, const std::wstring& encodingName, bool bSkipWhitespaces);
  ~SaxParser();
public:
  // Parse XML text in memory.
  bool ParseXml(const std::wstring& xmlText);
  // Parse XML text in file.
  bool ParseXmlFromFile(const std::string& xmlFilePath, bool bUTF8);
private:
  // ...
}; 

The SaxParserHandler class is defined as follows.


typedef std::pair<std::wstring, std::wstring> SaxParserAttributePair;

typedef std::vector<SaxParserAttributePair> SaxParserAttributes;

struct SaxParserHandler		
{
  virtual void OnElementStart(const std::wstring& element, const SaxParserAttributes& attributes);
  virtual void OnElementEnd(const std::wstring& element);
  virtual void OnCharacterData(const std::wstring& characterData);
  virtual void OnComment(const std::wstring& comment);
};
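
For readers curious how a C++ wrapper can sit on top of a C callback API such as Expat's, here is a minimal sketch of the usual approach: pass the handler object as the callback's user-data pointer, then have a plain function cast it back and forward the event. This is not the wrapper's actual source. The thunk names, RecordingHandler and RunThunkDemo are hypothetical, wchar_t stands in for Expat's XML_Char (which assumes a wide-character Expat build), and the empty default bodies and virtual destructor on SaxParserHandler are assumptions added so that the sketch compiles on its own. With the real library, registration would go through XML_SetUserData() and XML_SetElementHandler().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::wstring, std::wstring> SaxParserAttributePair;
typedef std::vector<SaxParserAttributePair> SaxParserAttributes;

// Handler interface from the article; the empty bodies and the virtual
// destructor are assumptions added so this sketch is self-contained.
struct SaxParserHandler
{
  virtual ~SaxParserHandler() {}
  virtual void OnElementStart(const std::wstring&, const SaxParserAttributes&) {}
  virtual void OnElementEnd(const std::wstring&) {}
  virtual void OnCharacterData(const std::wstring&) {}
  virtual void OnComment(const std::wstring&) {}
};

// Hypothetical thunk with the shape of Expat's start-element callback.
// atts is a flat array: name, value, name, value, ..., null pointer.
void StartElementThunk(void* userData, const wchar_t* name, const wchar_t** atts)
{
  assert(userData != nullptr);
  SaxParserHandler* handler = static_cast<SaxParserHandler*>(userData);
  SaxParserAttributes attributes;
  for (const wchar_t** p = atts; p != nullptr && p[0] != nullptr && p[1] != nullptr; p += 2)
    attributes.push_back(SaxParserAttributePair(p[0], p[1]));
  handler->OnElementStart(name, attributes);
}

// Hypothetical thunk with the shape of Expat's end-element callback.
void EndElementThunk(void* userData, const wchar_t* name)
{
  assert(userData != nullptr);
  static_cast<SaxParserHandler*>(userData)->OnElementEnd(name);
}

// Character data arrives as pointer plus length and is NOT null-terminated.
void CharacterDataThunk(void* userData, const wchar_t* s, int len)
{
  assert(userData != nullptr);
  static_cast<SaxParserHandler*>(userData)
      ->OnCharacterData(std::wstring(s, static_cast<std::size_t>(len)));
}

// Hypothetical handler that records each event as one line of text.
struct RecordingHandler : SaxParserHandler
{
  std::vector<std::wstring> events;

  void OnElementStart(const std::wstring& element, const SaxParserAttributes& attributes) override
  {
    std::wstring line = L"start:" + element;
    for (std::size_t i = 0; i < attributes.size(); ++i)
      line += L" " + attributes[i].first + L"=" + attributes[i].second;
    events.push_back(line);
  }
  void OnElementEnd(const std::wstring& element) override { events.push_back(L"end:" + element); }
  void OnCharacterData(const std::wstring& text) override { events.push_back(L"text:" + text); }
};

// Drives the thunks by hand, the way a C parser would call them.
std::vector<std::wstring> RunThunkDemo()
{
  RecordingHandler handler;
  // Convert to the base pointer first so the cast back from void* is exact.
  void* userData = static_cast<SaxParserHandler*>(&handler);
  const wchar_t* atts[] = { L"id", L"42", nullptr };
  StartElementThunk(userData, L"item", atts);
  CharacterDataThunk(userData, L"hello, world", 5);  // only the first 5 characters
  EndElementThunk(userData, L"item");
  return handler.events;
}
```

The wrapper exists to hide exactly this kind of plumbing, so that user code only ever sees the handler interface shown above.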

It’s straightforward and easy to use, at least IMHO.

  1. Define your own event handler class by deriving from the SaxParserHandler class.
  2. Create a SaxParser instance with the handler you defined.
  3. Call the ParseXml() or ParseXmlFromFile() method to start parsing.

And here is my example code.

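#include <iostream>
#include <string>
// Also include the header that declares SaxParser and SaxParserHandler.

// Event handler that prints every event it receives.
struct Handler : public SaxParserHandler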
{	
  virtual void OnElementStart(const std::wstring& element, const SaxParserAttributes& attributes)	
  { 	
    std::wcout << "OnElementStart() element=" << element << std::endl;
    SaxParserAttributes::const_iterator it;	
    for( it = attributes.begin() ; it != attributes.end() ; ++it )	
    {	
      std::wcout << "\tAttribute: " << it->first << "=" << it->second << std::endl;	
    }	
  }
  virtual void OnElementEnd(const std::wstring& element) 	
  { 	
    std::wcout << "OnElementEnd() element=" << element << std::endl;
  }
  virtual void OnCharacterData(const std::wstring& characterData) 	
  { 
    std::wcout << "OnCharacterData() characterData=" << characterData << std::endl;	
  }
  virtual void OnComment(const std::wstring& comment)	
  {	
    std::wcout << "OnComment() comment=" << comment << std::endl;	
  }
};

int main( int argc, char** argv )
{
  Handler handler;
  SaxParser parser( handler, L"UTF-8", true );
  // ParseXmlFromFile() is declared with two parameters; pass bUTF8 = true here.
  parser.ParseXmlFromFile( "test.xml", true );
  return 0;
}
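
Because a SAX parser keeps no tree, a handler that wants more than a printout has to carry its own state. The following is a rough sketch of that idea, not part of the downloadable source: TitleCollector, FindAttribute, TitleEntry and CollectTitlesDemo are hypothetical names, the element names are made up, and the empty default bodies and virtual destructor on SaxParserHandler are assumptions so that the block compiles on its own. The callbacks are driven by hand here in place of a real parse.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::wstring, std::wstring> SaxParserAttributePair;
typedef std::vector<SaxParserAttributePair> SaxParserAttributes;

// Handler interface from the article; empty bodies are an assumption.
struct SaxParserHandler
{
  virtual ~SaxParserHandler() {}
  virtual void OnElementStart(const std::wstring&, const SaxParserAttributes&) {}
  virtual void OnElementEnd(const std::wstring&) {}
  virtual void OnCharacterData(const std::wstring&) {}
  virtual void OnComment(const std::wstring&) {}
};

// Hypothetical helper: value of the named attribute, or fallback if absent.
std::wstring FindAttribute(const SaxParserAttributes& attributes,
                           const std::wstring& name,
                           const std::wstring& fallback = L"")
{
  for (std::size_t i = 0; i < attributes.size(); ++i)
    if (attributes[i].first == name)
      return attributes[i].second;
  return fallback;
}

typedef std::pair<std::wstring, std::wstring> TitleEntry;  // (book id, title text)

// Hypothetical handler: collects the text of every <title> element together
// with the id attribute of the most recently opened <book>.
struct TitleCollector : SaxParserHandler
{
  std::vector<std::wstring> path;   // currently open elements, outermost first
  std::wstring currentBookId;
  std::wstring currentText;
  std::vector<TitleEntry> titles;

  void OnElementStart(const std::wstring& element, const SaxParserAttributes& attributes) override
  {
    path.push_back(element);
    if (element == L"book")
      currentBookId = FindAttribute(attributes, L"id");
    if (element == L"title")
      currentText.clear();
  }
  void OnCharacterData(const std::wstring& characterData) override
  {
    // Text may arrive in several chunks, so append instead of assigning.
    if (!path.empty() && path.back() == L"title")
      currentText += characterData;
  }
  void OnElementEnd(const std::wstring& element) override
  {
    if (element == L"title")
      titles.push_back(TitleEntry(currentBookId, currentText));
    assert(!path.empty());
    path.pop_back();
  }
};

// Replays the events for
//   <library><book id="b1"><title>Dune</title></book></library>
// by hand, with the title text split into two chunks.
std::vector<TitleEntry> CollectTitlesDemo()
{
  TitleCollector collector;
  SaxParserAttributes none;
  SaxParserAttributes bookAttributes;
  bookAttributes.push_back(SaxParserAttributePair(L"id", L"b1"));

  collector.OnElementStart(L"library", none);
  collector.OnElementStart(L"book", bookAttributes);
  collector.OnCharacterData(L"\n  ");  // ignored: not inside <title>
  collector.OnElementStart(L"title", none);
  collector.OnCharacterData(L"Du");
  collector.OnCharacterData(L"ne");
  collector.OnElementEnd(L"title");
  collector.OnElementEnd(L"book");
  collector.OnElementEnd(L"library");
  return collector.titles;
}
```

With the wrapper, you would pass a handler like this to the SaxParser constructor and call ParseXml() or ParseXmlFromFile() instead of replaying the events yourself.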

 

You can download the source code here. Please note that all of the Expat library's files are included in the ‘Expat’ sub-folder, and that every *.c file includes the ‘ExpatDefs.h’ header, which holds some configuration settings. You should be able to modify the configuration and add more features to my wrapper class without much trouble.