source: code/branches/netp2/src/tinyxml/readme.txt @ 3008

Last change on this file since 3008 was 2710, checked in by rgrieder, 16 years ago

Merged buildsystem3 containing buildsystem2 containing Adi's buildsystem branch back to the trunk.
Please update the media directory if you were not using buildsystem3 before.

/** @mainpage

<h1> TinyXML </h1>

TinyXML is a simple, small, C++ XML parser that can be easily
integrated into other programs.

<h2> What it does. </h2>

In brief, TinyXML parses an XML document, and builds from that a
Document Object Model (DOM) that can be read, modified, and saved.

XML stands for "eXtensible Markup Language." It allows you to create
your own document markups. Where HTML does a very good job of marking
documents for browsers, XML allows you to define any kind of document
markup, for example a document that describes a "to do" list for an
organizer application. XML is a very structured and convenient format.
All those random file formats created to store application data can
be replaced with XML. One parser for everything.

The best place for the complete, correct, and quite frankly hard to
read spec is at <a href="http://www.w3.org/TR/2004/REC-xml-20040204/">
http://www.w3.org/TR/2004/REC-xml-20040204/</a>. An intro to XML
(that I really like) can be found at
<a href="http://skew.org/xml/tutorial/">http://skew.org/xml/tutorial</a>.

There are different ways to access and interact with XML data.
TinyXML uses a Document Object Model (DOM), meaning the XML data is parsed
into C++ objects that can be browsed and manipulated, and then
written to disk or another output stream. You can also construct an XML document
from scratch with C++ objects and write this to disk or another output
stream.

TinyXML is designed to be easy and fast to learn. It is two headers
and four cpp files. Simply add these to your project and off you go.
There is an example file - xmltest.cpp - to get you started.

TinyXML is released under the zlib license,
so you can use it in open source or commercial code. The details
of the license are at the top of every source file.

TinyXML attempts to be a flexible parser, but with truly correct and
compliant XML output. TinyXML should compile on any reasonably C++
compliant system. It does not rely on exceptions or RTTI. It can be
compiled with or without STL support. TinyXML fully supports
the UTF-8 encoding, and the first 64k character entities.


<h2> What it doesn't do. </h2>

TinyXML doesn't parse or use DTDs (Document Type Definitions) or XSL
(eXtensible Stylesheet Language). There are other parsers out there
(check out www.sourceforge.net, search for XML) that are much more fully
featured. But they are also much bigger, take longer to set up in
your project, have a steeper learning curve, and often have a more
restrictive license. If you are working with browsers or have more
complete XML needs, TinyXML is not the parser for you.

The following DTD syntax will not parse at this time in TinyXML:

@verbatim
        <!DOCTYPE Archiv [
         <!ELEMENT Comment (#PCDATA)>
        ]>
@endverbatim

because TinyXML sees this as a !DOCTYPE node with an illegally
embedded !ELEMENT node. This may be addressed in the future.

<h2> Tutorials. </h2>

For the impatient, here are some tutorials to get you going. They are a great way
to get started, but it is still worth your time to read this (very short) manual completely.

- @subpage ticppTutorial
- @subpage tutorial0

<h2> Code Status.  </h2>

TinyXML is mature, tested code. It is very stable. If you find
bugs, please file a bug report on the sourceforge web site
(www.sourceforge.net/projects/tinyxml). We'll get them straightened
out as soon as possible.

There are some areas of improvement; please check sourceforge if you are
interested in working on TinyXML.

<h2> Related Projects </h2>

TinyXML projects you may find useful! (Descriptions provided by the projects.)

<ul>
<li> <b>TinyXPath</b> (http://tinyxpath.sourceforge.net). TinyXPath is a small footprint
     XPath syntax decoder, written in C++.</li>
<li> <b>@subpage ticpp</b> (http://code.google.com/p/ticpp/). TinyXML++ is a completely new
     interface to TinyXML that uses MANY of the C++ strengths: templates,
     exceptions, and much better error handling.</li>
</ul>

<h2> Features </h2>

<h3> Using STL </h3>

TinyXML can be compiled to use or not use STL. When using STL, TinyXML
uses the std::string class, and fully supports std::istream, std::ostream,
operator<<, and operator>>. Many API methods have both 'const char*' and
'const std::string&' forms.

When STL support is compiled out, no STL files are included whatsoever. All
the string classes are implemented by TinyXML itself. API methods
all use the 'const char*' form for input.

Use the compile time #define:

        TIXML_USE_STL

to compile one version or the other. This can be passed by the compiler,
or set as the first line of "tinyxml.h".

Note: If compiling the test code in Linux, setting the environment
variable TINYXML_USE_STL=YES/NO will control STL compilation. In the
Windows project file, STL and non STL targets are provided. In your project,
it's probably easiest to add the line "#define TIXML_USE_STL" as the first
line of tinyxml.h.

<h3> UTF-8 </h3>

TinyXML supports UTF-8, allowing you to manipulate XML files in any language. TinyXML
also supports "legacy mode" - the encoding used before UTF-8 support and
probably best described as "extended ascii".

Normally, TinyXML will try to detect the correct encoding and use it. However,
by setting the value of TIXML_DEFAULT_ENCODING in the header file, TinyXML
can be forced to always use one encoding.

TinyXML will assume Legacy Mode until one of the following occurs:
<ol>
        <li> If the non-standard but common "UTF-8 lead bytes" (0xef 0xbb 0xbf)
                 begin the file or data stream, TinyXML will read it as UTF-8. </li>
        <li> If the declaration tag is read, and it has an encoding="UTF-8", then
                 TinyXML will read it as UTF-8. </li>
        <li> If the declaration tag is read, and it has no encoding specified, then TinyXML will
                 read it as UTF-8. </li>
        <li> If the declaration tag is read, and it has an encoding="something else", then TinyXML
                 will read it as Legacy Mode. In legacy mode, TinyXML will work as it did before. It's
                 not clear what that mode does exactly, but old content should keep working.</li>
        <li> Until one of the above criteria is met, TinyXML runs in Legacy Mode.</li>
</ol>
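
The rules above can be sketched as a small stand-alone function. This is only an
illustration, not TinyXML's implementation: the names Encoding and DetectEncoding are
hypothetical, and the sketch assumes a double-quoted, exactly spelled encoding="UTF-8"
attribute inside a leading declaration.

```cpp
#include <string>

// Hypothetical names for illustration; TinyXML's own constants are
// TIXML_ENCODING_UTF8 and TIXML_ENCODING_LEGACY.
enum Encoding { ENCODING_LEGACY, ENCODING_UTF8 };

// Returns the encoding implied by the start of an XML buffer, following
// the numbered rules above. Simplified: only looks at the lead bytes and
// at a double-quoted encoding="..." attribute in a <?xml ... ?> declaration.
Encoding DetectEncoding(const std::string& data)
{
    // Rule 1: the UTF-8 lead bytes 0xEF 0xBB 0xBF.
    if (data.size() >= 3 &&
        static_cast<unsigned char>(data[0]) == 0xEF &&
        static_cast<unsigned char>(data[1]) == 0xBB &&
        static_cast<unsigned char>(data[2]) == 0xBF)
        return ENCODING_UTF8;

    // Rule 5: no complete declaration seen, so stay in legacy mode.
    const std::string::size_type start = data.find("<?xml");
    if (start == std::string::npos)
        return ENCODING_LEGACY;
    const std::string::size_type end = data.find("?>", start);
    if (end == std::string::npos)
        return ENCODING_LEGACY;
    const std::string decl = data.substr(start, end - start);

    // Rule 3: a declaration without an encoding means UTF-8.
    const std::string key = "encoding=\"";
    const std::string::size_type pos = decl.find(key);
    if (pos == std::string::npos)
        return ENCODING_UTF8;

    // Rules 2 and 4: UTF-8 if named, legacy for anything else.
    const std::string::size_type valueStart = pos + key.size();
    const std::string::size_type valueEnd = decl.find('"', valueStart);
    const std::string value = decl.substr(valueStart, valueEnd - valueStart);
    return (value == "UTF-8") ? ENCODING_UTF8 : ENCODING_LEGACY;
}
```

A real parser would also accept single quotes and compare the encoding name without
regard to case; the sketch leaves that out to keep the rule order visible.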

What happens if the encoding is incorrectly set or detected? TinyXML will try
to read and pass through text seen as improperly encoded. You may get some strange results or
mangled characters. You may want to force TinyXML to the correct mode.

You may force TinyXML to Legacy Mode by using LoadFile( TIXML_ENCODING_LEGACY ) or
LoadFile( filename, TIXML_ENCODING_LEGACY ). You may force it to use legacy mode all
the time by setting TIXML_DEFAULT_ENCODING = TIXML_ENCODING_LEGACY. Likewise, you may
force it to TIXML_ENCODING_UTF8 with the same technique.

For English users, using English XML, UTF-8 is the same as low-ASCII. You
don't need to be aware of UTF-8 or change your code in any way. You can think
of UTF-8 as a "superset" of ASCII.

UTF-8 is not a double byte format - but it is a standard encoding of Unicode!
TinyXML does not use or directly support wchar, TCHAR, or Microsoft's _UNICODE at this time.
It is common to see the term "Unicode" improperly refer to UTF-16, a wide byte encoding
of Unicode. This is a source of confusion.

For "high-ascii" languages - everything not English, pretty much - TinyXML can
handle all languages, at the same time, as long as the XML is encoded
in UTF-8. That can be a little tricky: older programs and operating systems
tend to use the "default" or "traditional" code page. Many apps (and almost all
modern ones) can output UTF-8, but older or stubborn (or just broken) ones
still output text in the default code page.

For example, Japanese systems traditionally use SHIFT-JIS encoding.
Text encoded as SHIFT-JIS cannot be read by TinyXML.
A good text editor can import SHIFT-JIS and then save as UTF-8.

The <a href="http://skew.org/xml/tutorial/">Skew.org link</a> does a great
job covering the encoding issue.

The test file "utf8test.xml" is an XML file containing English, Spanish, Russian,
and Simplified Chinese. (Hopefully they are translated correctly.) The file
"utf8test.gif" is a screen capture of the XML file, rendered in IE. Note that
if you don't have the correct fonts (Simplified Chinese or Russian) on your
system, you won't see output that matches the GIF file even if you can parse
it correctly. Also note that (at least on my Windows machine) console output
is in a Western code page, so that Print() or printf() cannot correctly display
the file. This is not a bug in TinyXML - just an OS issue. No data is lost or
destroyed by TinyXML. The console just doesn't render UTF-8.


<h3> Entities </h3>
TinyXML recognizes the pre-defined "character entities", meaning special
characters. Namely:

@verbatim
        &amp;   &
        &lt;    <
        &gt;    >
        &quot;  "
        &apos;  '
@endverbatim

These are recognized when the XML document is read, and translated to their
UTF-8 equivalents. For instance, text with the XML of:

@verbatim
        Far &amp; Away
@endverbatim

will have the Value() of "Far & Away" when queried from the TiXmlText object,
and will be written back to the XML stream/file as "&amp;". Older versions
of TinyXML "preserved" character entities, but the newer versions will translate
them into characters.
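
As a rough illustration of the translation on read, here is a stand-alone sketch that
replaces the five pre-defined entities with their characters. DecodeEntities is a
hypothetical helper written for this page, not a TinyXML function, and it leaves any
other "&name;" sequence untouched.

```cpp
#include <cstddef>
#include <string>

// Replaces the five pre-defined entities with the characters they stand
// for. Anything else, including an unrecognized "&name;", is copied
// unchanged. Hypothetical helper; not part of the TinyXML API.
std::string DecodeEntities(const std::string& in)
{
    static const struct { const char* name; char ch; } table[] = {
        { "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
        { "&quot;", '"' }, { "&apos;", '\'' }
    };

    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        bool matched = false;
        if (in[i] == '&') {
            for (std::size_t k = 0; k < sizeof(table) / sizeof(table[0]); ++k) {
                const std::string name = table[k].name;
                if (in.compare(i, name.size(), name) == 0) {
                    out += table[k].ch;   // emit the decoded character
                    i += name.size();     // skip past the whole entity
                    matched = true;
                    break;
                }
            }
        }
        if (!matched)
            out += in[i++];               // ordinary character: copy it
    }
    return out;
}
```

Because the input is scanned once from left to right, text such as "&amp;lt;" decodes to
the literal characters "&lt;" rather than being decoded a second time.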

Additionally, any character can be specified by its Unicode code point:
the syntax "&#xA0;" or "&#160;" both refer to the non-breaking space character.
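
To make the mapping concrete, the sketch below parses a numeric character reference and
encodes the resulting code point as UTF-8. Both function names are hypothetical and the
code is independent of TinyXML; it assumes a well-formed reference and does no range
checking beyond what std::strtoul provides.

```cpp
#include <cstdlib>
#include <string>

// Parses "&#160;" or "&#xA0;" and returns the code point, or 0 if the
// text is not a numeric character reference. Hypothetical helper.
unsigned long ParseCharRef(const std::string& ref)
{
    if (ref.size() < 4 || ref.compare(0, 2, "&#") != 0 ||
        ref[ref.size() - 1] != ';')
        return 0;
    const bool hex = (ref[2] == 'x');
    const std::string digits =
        ref.substr(hex ? 3 : 2, ref.size() - (hex ? 4 : 3));
    if (digits.empty())
        return 0;
    char* end = 0;
    const unsigned long cp = std::strtoul(digits.c_str(), &end, hex ? 16 : 10);
    return (*end == '\0') ? cp : 0;
}

// Encodes one Unicode code point (up to U+10FFFF) as UTF-8.
std::string CodePointToUtf8(unsigned long cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Both spellings of the non-breaking space yield code point 0xA0, which UTF-8 stores as
the two bytes 0xC2 0xA0.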

<h3> Printing </h3>
TinyXML can print output in several different ways that all have strengths and limitations.

- Print( FILE* ). Output to a std-C stream, which includes all C files as well as stdout.
        - "Pretty prints", but you don't have control over printing options.
        - The output is streamed directly to the FILE object, so there is no memory overhead
          in the TinyXML code.
        - Used by Print() and SaveFile().

- operator<<. Output to a C++ stream.
        - Integrates with standard C++ iostreams.
        - Outputs in "network printing" mode without line breaks. Good for network transmission
          and moving XML between C++ objects, but hard for a human to read.

- TiXmlPrinter. Output to a std::string or memory buffer.
        - API is less concise.
        - Future printing options will be put here.
        - Printing may change slightly in future versions as it is refined and expanded.
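
The difference between "pretty" output and "network printing" output can be shown with a
tiny stand-alone serializer. MiniElement and Serialize are hypothetical names invented
for this sketch; the four-space indent is an assumption, and none of this is TinyXML's
printing code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical miniature element type; not TinyXML's TiXmlElement.
struct MiniElement {
    std::string name;
    std::vector<MiniElement> children;
};

// Appends the serialized element to 'out'. With pretty == true each tag
// goes on its own line, indented four spaces per level; otherwise no
// white space is added at all.
void Serialize(const MiniElement& e, bool pretty, int depth, std::string& out)
{
    const std::string indent = pretty ? std::string(depth * 4, ' ') : "";
    const std::string newline = pretty ? "\n" : "";

    if (e.children.empty()) {
        out += indent + "<" + e.name + " />" + newline;
        return;
    }
    out += indent + "<" + e.name + ">" + newline;
    for (std::size_t i = 0; i < e.children.size(); ++i)
        Serialize(e.children[i], pretty, depth + 1, out);
    out += indent + "</" + e.name + ">" + newline;
}
```

The compact form is a single line with no padding, which is cheaper to transmit; the
pretty form carries the same elements but spends extra bytes on layout for human readers.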

<h3> Streams </h3>
With TIXML_USE_STL on, TinyXML supports C++ streams (operator<< and operator>>) as well
as C (FILE*) streams. There are some differences that you may need to be aware of.

C style output:
        - based on FILE*
        - the Print() and SaveFile() methods

        Generates formatted output, with plenty of white space, intended to be as
        human-readable as possible. They are very fast, and tolerant of ill formed
        XML documents. For example, an XML document that contains 2 root elements
        and 2 declarations will still print.

C style input:
        - based on FILE*
        - the Parse() and LoadFile() methods

        A fast, tolerant read. Use whenever you don't need the C++ streams.

C++ style output:
        - based on std::ostream
        - operator<<

        Generates condensed output, intended for network transmission rather than
        readability. Depending on your system's implementation of the ostream class,
        these may be somewhat slower. (Or may not.) Not tolerant of ill formed XML:
        a document should contain exactly one root element. Additional root level
        elements will not be streamed out.

C++ style input:
        - based on std::istream
        - operator>>

        Reads XML from a stream, making it useful for network transmission. The tricky
        part is knowing when the XML document is complete, since there will almost
        certainly be other data in the stream. TinyXML will assume the XML data is
        complete after it reads the root element. Put another way, documents that
        are ill-constructed with more than one root element will not read correctly.
        Also note that operator>> is somewhat slower than Parse, due to both
        implementation of the STL and limitations of TinyXML.
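
The "complete after the root element" rule can be sketched as a depth counter over tags.
ReadOneDocument is a hypothetical function written for this page, not TinyXML's
operator>>; it assumes that no '>' appears inside attribute values, comments, or CDATA.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for trying the function out
#include <string>

// Reads characters from 'in' until the root element has been closed and
// returns everything consumed. Data after the root element stays in the
// stream. Hypothetical helper under the simplifying assumptions above.
std::string ReadOneDocument(std::istream& in)
{
    std::string doc;
    std::string tag;      // text of the tag being read, without '<' and '>'
    bool inTag = false;
    int depth = 0;
    char c;
    while (in.get(c)) {
        doc += c;
        if (!inTag) {
            if (c == '<') { inTag = true; tag.clear(); }
            continue;
        }
        if (c != '>') { tag += c; continue; }
        inTag = false;
        if (tag.empty() || tag[0] == '?' || tag[0] == '!')
            continue;                 // declaration, comment, or DOCTYPE
        if (tag[0] == '/')
            --depth;                  // closing tag
        else if (tag[tag.size() - 1] != '/')
            ++depth;                  // opening tag that is not self-closing
        if (depth == 0)
            return doc;               // root element is complete
    }
    return doc;                       // stream ended before the root closed
}
```

For instance, given a std::istringstream holding `<a><b/></a>trailing`, the function
returns `<a><b/></a>` and leaves `trailing` unread, which is why a second root element
in the same stream would not become part of the document.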

<h3> White space </h3>
The world simply does not agree on whether white space should be kept, or condensed.
For example, pretend the '_' is a space, and look at "Hello____world". HTML, and
at least some XML parsers, will interpret this as "Hello_world". They condense white
space. Some XML parsers do not, and will leave it as "Hello____world". (Remember
to keep pretending the _ is a space.) Others suggest that __Hello___world__ should become
Hello___world.

It's an issue that hasn't been resolved to my satisfaction. TinyXML supports the
first 2 approaches. Call TiXmlBase::SetCondenseWhiteSpace( bool ) to set the desired behavior.
The default is to condense white space.

If you change the default, you should call TiXmlBase::SetCondenseWhiteSpace( bool )
before making any calls to parse XML data, and I don't recommend changing it after
it has been set.
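
For readers who want to see what "condensing" means in code, here is a stand-alone
sketch of the first approach. CondenseWhiteSpace is a hypothetical free function, not
the TinyXML setting of a similar name, and dropping leading and trailing white space is
an assumption of this sketch rather than a statement about TinyXML's exact behavior.

```cpp
#include <string>

// Collapses each run of spaces, tabs, and line breaks into one space.
// Leading and trailing white space is dropped (an assumption made for
// this sketch). Hypothetical helper; not part of the TinyXML API.
std::string CondenseWhiteSpace(const std::string& in)
{
    std::string out;
    bool pendingSpace = false;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        const char c = in[i];
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            pendingSpace = true;          // remember the run, emit nothing yet
        } else {
            if (pendingSpace && !out.empty())
                out += ' ';               // one space stands in for the run
            pendingSpace = false;
            out += c;
        }
    }
    return out;
}
```

With this function "Hello    world" becomes "Hello world", matching the first approach
described above; the second approach simply returns the input unchanged.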


<h3> Handles </h3>

When browsing an XML document in a robust way, it is important to check
for null returns from method calls. An error safe implementation can
generate a lot of code like:

@verbatim
TiXmlElement* root = document.FirstChildElement( "Document" );
if ( root )
{
        TiXmlElement* element = root->FirstChildElement( "Element" );
        if ( element )
        {
                TiXmlElement* child = element->FirstChildElement( "Child" );
                if ( child )
                {
                        TiXmlElement* child2 = child->NextSiblingElement( "Child" );
                        if ( child2 )
                        {
                                // Finally do something useful.
                        }
                }
        }
}
@endverbatim

Handles have been introduced to clean this up. Using the TiXmlHandle class,
the previous code reduces to:

@verbatim
TiXmlHandle docHandle( &document );
TiXmlElement* child2 = docHandle.FirstChild( "Document" ).FirstChild( "Element" ).Child( "Child", 1 ).ToElement();
if ( child2 )
{
        // do something useful
}
@endverbatim

This is much easier to deal with. See TiXmlHandle for more information.
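
The idea behind a handle is small enough to sketch on its own: wrap a pointer that may
be null, and make every lookup on a null wrapper return another null wrapper. MiniNode
and MiniHandle below are hypothetical stand-ins written for this page; they are not the
TiXmlNode and TiXmlHandle classes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical miniature node type, standing in for TiXmlNode.
struct MiniNode {
    std::string name;
    std::vector<MiniNode*> children;   // not owned by the node

    // Returns the first child with the given name, or null if none.
    MiniNode* FirstChild(const std::string& n) const {
        for (std::size_t i = 0; i < children.size(); ++i)
            if (children[i]->name == n) return children[i];
        return 0;
    }
};

// Wraps a possibly-null node pointer. Every lookup on a null handle
// yields another null handle, so calls can be chained without checks.
class MiniHandle {
public:
    explicit MiniHandle(MiniNode* node) : node_(node) {}
    MiniHandle FirstChild(const std::string& n) const {
        return MiniHandle(node_ ? node_->FirstChild(n) : 0);
    }
    MiniNode* ToNode() const { return node_; }
private:
    MiniNode* node_;
};
```

Only the final ToNode() result needs a null check; a missing element anywhere in the
chain simply propagates as a null handle instead of a crash.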


<h3> Row and Column tracking </h3>
Being able to track nodes and attributes back to their origin location
in source files can be very important for some applications. Additionally,
knowing where parsing errors occurred in the original source can save
a lot of time.

TinyXML can track the row and column origin of all nodes and attributes
in a text file. The TiXmlBase::Row() and TiXmlBase::Column() methods return
the origin of the node in the source text. The tab size used when counting
columns can be configured with TiXmlDocument::SetTabSize().
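
Why the tab size matters can be seen in a small stand-alone sketch that converts a byte
offset into a row and column. Position and Locate are hypothetical names, the 1-based
numbering and tab-stop arithmetic are assumptions of the sketch, and it ignores "\r\n"
line endings and multi-byte UTF-8 characters.

```cpp
#include <string>

// Hypothetical result type for this sketch.
struct Position { int row; int column; };

// Returns the 1-based row and column of text[offset]. A tab moves the
// column to the next tab stop, assuming stops every tabSize columns and
// tabSize > 0. Hypothetical helper; not part of the TinyXML API.
Position Locate(const std::string& text, std::string::size_type offset,
                int tabSize)
{
    Position p = { 1, 1 };
    for (std::string::size_type i = 0; i < offset && i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\n') {
            ++p.row;                  // new line: back to the first column
            p.column = 1;
        } else if (c == '\t') {
            p.column = ((p.column - 1) / tabSize + 1) * tabSize + 1;
        } else {
            ++p.column;
        }
    }
    return p;
}
```

For the text "<a>\n\t<b/>" with a tab size of 4, the "<b/>" tag at offset 5 is reported
at row 2, column 5; with a tab size of 8 the same tag would be at column 9, so the tab
size has to match the editor used to view the file.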


<h2> Using and Installing </h2>

To Compile and Run xmltest:

A Linux Makefile and a Windows Visual C++ .dsw file are provided.
Simply compile and run. It will write the file demotest.xml to your
disk and generate output on the screen. It also tests walking the
DOM by printing out the number of nodes found using different
techniques.

The Linux makefile is very generic and runs on many systems - it
is currently tested on mingw and
MacOSX. You do not need to run 'make depend'. The dependencies have been
hard coded.

<h3>Windows project file for VC6</h3>
<ul>
<li>tinyxml:            tinyxml library, non-STL </li>
<li>tinyxmlSTL:         tinyxml library, STL </li>
<li>tinyXmlTest:        test app, non-STL </li>
<li>tinyXmlTestSTL:     test app, STL </li>
</ul>

<h3>Makefile</h3>
At the top of the makefile you can set:

PROFILE, DEBUG, and TINYXML_USE_STL. Details (such as they are) are in
the makefile.

In the tinyxml directory, type "make clean" then "make". The executable
file 'xmltest' will be created.



<h3>To Use in an Application:</h3>

Add tinyxml.cpp, tinyxml.h, tinyxmlerror.cpp, tinyxmlparser.cpp, tinystr.cpp, and tinystr.h to your
project or make file. That's it! It should compile on any reasonably
compliant C++ system. You do not need to enable exceptions or
RTTI for TinyXML.


<h2> How TinyXML works.  </h2>

An example is probably the best way to go. Take:
@verbatim
        <?xml version="1.0" standalone="no"?>
        <!-- Our to do list data -->
        <ToDo>
                <Item priority="1"> Go to the <bold>Toy store!</bold></Item>
                <Item priority="2"> Do bills</Item>
        </ToDo>
@endverbatim

It's not much of a To Do list, but it will do. To read this file
(say "demo.xml") you would create a document, and parse it in:
@verbatim
        TiXmlDocument doc( "demo.xml" );
        doc.LoadFile();
@endverbatim

And it's ready to go. Now let's look at some lines and how they
relate to the DOM.

@verbatim
<?xml version="1.0" standalone="no"?>
@endverbatim

        The first line is a declaration, and gets turned into the
        TiXmlDeclaration class. It will be the first child of the
        document node.

        This is the only directive/special tag parsed by TinyXML.
        Generally directive tags are stored in TiXmlUnknown so the
        commands won't be lost when it is saved back to disk.

@verbatim
<!-- Our to do list data -->
@endverbatim

        A comment. Will become a TiXmlComment object.

@verbatim
<ToDo>
@endverbatim

        The "ToDo" tag defines a TiXmlElement object. This one does not have
        any attributes, but does contain 2 other elements.

@verbatim
<Item priority="1">
@endverbatim

        Creates another TiXmlElement which is a child of the "ToDo" element.
        This element has 1 attribute, with the name "priority" and the value
        "1".

@verbatim
Go to the
@endverbatim

        A TiXmlText. This is a leaf node and cannot contain other nodes.
        It is a child of the "Item" TiXmlElement.

@verbatim
<bold>
@endverbatim


        Another TiXmlElement, this one a child of the "Item" element.

Etc.

Looking at the entire object tree, you end up with:
@verbatim
TiXmlDocument                                   "demo.xml"
        TiXmlDeclaration                        "version='1.0'" "standalone=no"
        TiXmlComment                            " Our to do list data"
        TiXmlElement                            "ToDo"
                TiXmlElement                    "Item" Attributes: priority=1
                        TiXmlText               "Go to the "
                        TiXmlElement            "bold"
                                TiXmlText       "Toy store!"
                TiXmlElement                    "Item" Attributes: priority=2
                        TiXmlText               "Do bills"
@endverbatim

<h2> Documentation </h2>

The documentation is built with Doxygen, using the 'dox'
configuration file.

<h2> License </h2>

TinyXML is released under the zlib license:

This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any
damages arising from the use of this software.

Permission is granted to anyone to use this software for any
purpose, including commercial applications, and to alter it and
redistribute it freely, subject to the following restrictions:

1. The origin of this software must not be misrepresented; you must
not claim that you wrote the original software. If you use this
software in a product, an acknowledgment in the product documentation
would be appreciated but is not required.

2. Altered source versions must be plainly marked as such, and
must not be misrepresented as being the original software.

3. This notice may not be removed or altered from any source
distribution.

<h2> References  </h2>

The World Wide Web Consortium is the definitive standards body for
XML, and their web pages contain huge amounts of information.

The definitive spec: <a href="http://www.w3.org/TR/2004/REC-xml-20040204/">
http://www.w3.org/TR/2004/REC-xml-20040204/</a>

I also recommend "XML Pocket Reference" by Robert Eckstein and published by
O'Reilly...the book that got the whole thing started.

<h2> Contributors, Contacts, and a Brief History </h2>

Thanks very much to everyone who sends suggestions, bugs, ideas, and
encouragement. It all helps, and makes this project fun. A special thanks
to the contributors on the web pages that keep it lively.

So many people have sent in bugs and ideas that, rather than list them here,
we try to give credit due in the "changes.txt" file.

TinyXML was originally written by Lee Thomason. (He is often the "I" still
in the documentation.) Lee reviews changes and releases new versions,
with the help of Yves Berquin, Andrew Ellerton, and the tinyXml community.

We appreciate your suggestions, and would love to know if you
use TinyXML. Hopefully you will enjoy it and find it useful.
Please post questions and comments, file bugs, or contact us at:

www.sourceforge.net/projects/tinyxml

Lee Thomason, Yves Berquin, Andrew Ellerton
*/