Serialization Tutorial
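An output archive is similar to an output data stream. Data can be saved to the archive with either the << or the & operator: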
ar << data;
ar & data;
An input archive is similar to an input data stream. Data can be loaded from the archive with either the >> or the & operator:
ar >> data;
ar & data;
When these operators are invoked for primitive data types, the data is simply saved to or loaded from the archive. When they are invoked for class data types, the class's serialize function is invoked. Each serialize function uses the above operators to save/load its data members. This process continues recursively until all the data contained in the class is saved/loaded.
These operators are used inside the serialize function to save and load class data members.
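The sketch below makes this dispatch concrete. It uses two hypothetical toy archives, toy_oarchive and toy_iarchive. These are not Boost classes, and they use their own whitespace-separated text format. Arithmetic values are written or read directly. For a class type, the archive calls the class's serialize member, which applies & to each member in turn. A class holding class-type members, like the bus_stop shown later, is therefore handled recursively. The sketch assumes C++17 (if constexpr).

```cpp
#include <cassert>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <type_traits>

// Hypothetical toy archives; NOT Boost's text_oarchive/text_iarchive.
class toy_oarchive {
public:
    explicit toy_oarchive(std::ostream& os) : os_(os) {
        // enough digits for floating-point values to survive a round trip
        os_.precision(std::numeric_limits<double>::max_digits10);
    }
    template <class T>
    toy_oarchive& operator&(const T& t) {
        if constexpr (std::is_arithmetic_v<T>) {
            os_ << t << ' ';  // primitive type: write the value itself
        } else {
            // class type: call its serialize(), which applies & to each member;
            // const_cast because one non-const serialize() serves save and load
            const_cast<T&>(t).serialize(*this, 0u);
        }
        return *this;
    }
    template <class T>
    toy_oarchive& operator<<(const T& t) { return *this & t; }

private:
    std::ostream& os_;
};

class toy_iarchive {
public:
    explicit toy_iarchive(std::istream& is) : is_(is) {}
    template <class T>
    toy_iarchive& operator&(T& t) {
        if constexpr (std::is_arithmetic_v<T>) {
            if (!(is_ >> t))  // primitive type: read the value itself
                throw std::runtime_error("malformed toy archive");
        } else {
            t.serialize(*this, 0u);  // class type: recurse into its members
        }
        return *this;
    }
    template <class T>
    toy_iarchive& operator>>(T& t) { return *this & t; }

private:
    std::istream& is_;
};

// The tutorial's gps_position, with public members for brevity.
struct gps_position {
    int degrees = 0;
    int minutes = 0;
    float seconds = 0.0f;
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & degrees;
        ar & minutes;
        ar & seconds;
    }
};

// Class-type members: serialize() recurses into each gps_position.
struct bus_stop {
    gps_position latitude;
    gps_position longitude;
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & latitude;
        ar & longitude;
    }
};
```

The same serialize member is instantiated once per archive type. This is why members are saved and loaded in the same order without any extra effort.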
Included in this library is a program called demo.cpp which illustrates how to use this system. Below we excerpt code from this program to illustrate with the simplest possible case how this library is intended to be used.
#include <fstream>

// include headers that implement an archive in simple text format
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
// declares boost::serialization::access, befriended below
#include <boost/serialization/access.hpp>

/////////////////////////////////////////////////////////////
// gps coordinate
//
// illustrates serialization for a simple type
//
class gps_position
{
private:
    friend class boost::serialization::access;
    // When the class Archive corresponds to an output archive, the
    // & operator is defined similar to <<. Likewise, when the class Archive
    // is a type of input archive the & operator is defined similar to >>.
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        ar & degrees;
        ar & minutes;
        ar & seconds;
    }
    int degrees;
    int minutes;
    float seconds;
public:
    gps_position(){}
    gps_position(int d, int m, float s) :
        degrees(d), minutes(m), seconds(s)
    {}
};

int main() {
    // create class instance
    const gps_position g(35, 59, 24.567f);

    // save data to archive
    {
        // create and open a character stream for output
        std::ofstream ofs("filename");
        boost::archive::text_oarchive oa(ofs);
        // write class instance to archive
        oa << g;
        // archive and stream closed when destructors are called
    }

    // ... some time later restore the class instance to its original state
    gps_position newg;
    {
        // create and open an archive for input
        std::ifstream ifs("filename");
        boost::archive::text_iarchive ia(ifs);
        // read class state from archive
        ia >> newg;
        // archive and stream closed when destructors are called
    }
    return 0;
}
For each class to be saved via serialization, there must exist a function to save all the class members which define the state of the class. For each class to be loaded via serialization, there must exist a function to load these class members in the same sequence as they were saved. In the above example, these functions are generated by the template member function serialize.
The above formulation is intrusive. That is, it requires that classes whose instances are to be serialized be altered. This can be inconvenient in some cases. An equivalent alternative formulation permitted by the system would be:
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>

class gps_position
{
public:
    int degrees;
    int minutes;
    float seconds;
    gps_position(){}
    gps_position(int d, int m, float s) :
        degrees(d), minutes(m), seconds(s)
    {}
};

namespace boost {
namespace serialization {

template<class Archive>
void serialize(Archive & ar, gps_position & g, const unsigned int version)
{
    ar & g.degrees;
    ar & g.minutes;
    ar & g.seconds;
}

} // namespace serialization
} // namespace boost
In this case the generated serialize functions are not members of the gps_position class. The two formulations function in exactly the same way.
The main application of non-intrusive serialization is to permit serialization to be implemented for classes without changing the class definition. For this to be possible, the class must expose enough information to reconstruct its state. In this example, we presumed that the class had public members, which is not a common occurrence. Only classes which expose enough information to save and restore the class state will be serializable without changing the class definition.
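The sketch below shows one way the non-intrusive form can coexist with the member form. It assumes a hypothetical output-only archive, toy::oarchive, which is not a Boost class. The archive always makes an unqualified call to serialize(ar, t, version). A generic default overload forwards to the member function. A more specialized overload written outside the class takes precedence for gps_position. The namespace toy stands in for boost::serialization. This illustrates the idea only and does not describe Boost's internals.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <type_traits>

namespace toy {

// Default: forward to the member function (the intrusive formulation).
template <class Archive, class T>
void serialize(Archive& ar, T& t, const unsigned int version) {
    t.serialize(ar, version);
}

// Hypothetical output-only toy archive (NOT a Boost class).
class oarchive {
public:
    explicit oarchive(std::ostream& os) : os_(os) {}
    template <class T>
    oarchive& operator&(const T& t) {
        if constexpr (std::is_arithmetic_v<T>) {
            os_ << t << ' ';
        } else {
            // Unqualified, dependent call: overloads declared later in
            // namespace toy are found via argument-dependent lookup, and the
            // most specialized one wins.
            serialize(*this, const_cast<T&>(t), 0u);
        }
        return *this;
    }

private:
    std::ostream& os_;
};

}  // namespace toy

// An existing class whose definition we do not change.
class gps_position {
public:
    int degrees;
    int minutes;
    float seconds;
    gps_position() : degrees(0), minutes(0), seconds(0.0f) {}
    gps_position(int d, int m, float s) : degrees(d), minutes(m), seconds(s) {}
};

namespace toy {
// Non-intrusive serialization: a free overload, more specialized than the default.
template <class Archive>
void serialize(Archive& ar, gps_position& g, const unsigned int /*version*/) {
    ar & g.degrees;
    ar & g.minutes;
    ar & g.seconds;
}
}  // namespace toy

// A class using the intrusive (member) form with the same archive.
struct point {
    int x = 0;
    int y = 0;
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & x;
        ar & y;
    }
};
```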
A serializable class with serializable members would look like this:
class bus_stop
{
    friend class boost::serialization::access;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        ar & latitude;
        ar & longitude;
    }
    gps_position latitude;
    gps_position longitude;
protected:
    bus_stop(const gps_position & lat_, const gps_position & long_) :
        latitude(lat_), longitude(long_)
    {}
public:
    bus_stop(){}
    // See item # 14 in Effective C++ by Scott Meyers.
    // re non-virtual destructors in base classes.
    virtual ~bus_stop(){}
};
That is, members of class type are serialized just as members of primitive types are.
Note that saving an instance of the class bus_stop with one of the archive operators will invoke the serialize function, which saves latitude and longitude. Each of these in turn will be saved by invoking serialize in the definition of gps_position. In this manner the whole data structure is saved by applying an archive operator to just its root item.
Derived classes should include serializations of their base classes.
#include <string>
#include <boost/serialization/base_object.hpp>
// needed to serialize the std::string members
#include <boost/serialization/string.hpp>

class bus_stop_corner : public bus_stop
{
    friend class boost::serialization::access;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        // serialize base class information
        ar & boost::serialization::base_object<bus_stop>(*this);
        ar & street1;
        ar & street2;
    }
    std::string street1;
    std::string street2;
    virtual std::string description() const
    {
        return street1 + " and " + street2;
    }
public:
    bus_stop_corner(){}
    bus_stop_corner(const gps_position & lat_, const gps_position & long_,
        const std::string & s1_, const std::string & s2_
    ) :
        bus_stop(lat_, long_), street1(s1_), street2(s2_)
    {}
};
Note the serialization of the base classes from the derived class. Do NOT directly call the base class serialize functions. Doing so might seem to work, but it will bypass the code that tracks instances written to storage to eliminate redundancies. It will also bypass the writing of class version information into the archive. For this reason, it is advisable to always make member serialize functions private. The declaration friend class boost::serialization::access will grant the serialization library access to private member variables and functions.
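There may be several kinds of bus stops (bus_stop is a base class), and a given stop may appear in more than one route. It is therefore convenient to represent a bus route as an array of pointers to bus_stop: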
class bus_route
{
    friend class boost::serialization::access;
    bus_stop * stops[10];
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        int i;
        for(i = 0; i < 10; ++i)
            ar & stops[i];
    }
public:
    bus_route(){}
};
Each member of the array stops
will be serialized.
But remember each member is a pointer - so what can this really
mean? The whole object of this serialization is to permit
reconstruction of the original data structures at another place
and time. In order to accomplish this with a pointer, it is
not sufficient to save the value of the pointer, rather the
object it points to must be saved. When the member is later
loaded, a new object has to be created and a new pointer has
to be loaded into the class member.
If the same pointer is serialized more than once, only one instance is added to the archive. When the archive is read back, no object data is read for the repeated pointer. The only operation that occurs is that the second pointer is set equal to the first.
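The bookkeeping described above can be sketched as follows. This is a hypothetical, simplified illustration, not Boost's implementation or archive format. The saver maps each object address to an id and writes the object only the first time it sees that address. The loader keeps the objects it created in order, so a back-reference simply returns an earlier pointer. The type point and the tags new/ref are made up for this example. Only non-null pointers to a single non-polymorphic type are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical object type used only for this sketch.
struct point {
    int x = 0;
    int y = 0;
};

// Saving side: remember every object already written, keyed by its address.
class tracking_oarchive {
public:
    explicit tracking_oarchive(std::ostream& os) : os_(os) {}
    void save_pointer(const point* p) {
        assert(p != nullptr);  // this sketch does not handle null pointers
        auto it = ids_.find(p);
        if (it != ids_.end()) {
            os_ << "ref " << it->second << ' ';  // seen before: write its id only
            return;
        }
        ids_.emplace(p, ids_.size());  // ids are assigned 0, 1, 2, ...
        os_ << "new " << p->x << ' ' << p->y << ' ';  // first time: write the object
    }

private:
    std::ostream& os_;
    std::unordered_map<const point*, std::size_t> ids_;
};

// Loading side: objects are recreated in the order they were written, so an
// id is simply an index into the objects loaded so far.
class tracking_iarchive {
public:
    explicit tracking_iarchive(std::istream& is) : is_(is) {}
    // Objects created here are owned by the caller (raw new, as in the tutorial).
    point* load_pointer() {
        std::string tag;
        is_ >> tag;
        if (tag == "ref") {
            std::size_t id = 0;
            is_ >> id;
            assert(id < loaded_.size());
            return loaded_[id];  // no object data read: reuse the earlier pointer
        }
        assert(tag == "new");
        point* p = new point;
        is_ >> p->x >> p->y;
        loaded_.push_back(p);
        return p;
    }

private:
    std::istream& is_;
    std::vector<point*> loaded_;
};
```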
Note that, in this example, the array consists of polymorphic pointers. That is, each array element points to one of several possible kinds of bus stops. So when the pointer is saved, some sort of class identifier must be saved. When the pointer is loaded, the class identifier must be read and an instance of the corresponding class must be constructed. Finally, the data can be loaded into the newly created instance of the correct type. As can be seen in demo.cpp, serialization of pointers to derived classes through a base class pointer may require explicit enumeration of the derived classes to be serialized. This is referred to as "registration" or "export" of derived classes. This requirement and the methods of satisfying it are explained in detail elsewhere in the manual.
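The idea behind registration can be sketched with a hypothetical registry that maps a class identifier to a factory function. This is not Boost's export mechanism. stop, corner_stop, destination_stop, register_class, save_polymorphic and load_polymorphic are all illustrative names. Saving writes the identifier before the data. Loading reads the identifier, constructs the matching class and then loads its data. For brevity the sketch uses std::unique_ptr rather than raw pointers and assumes the strings contain no whitespace.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical base class standing in for bus_stop.
struct stop {
    virtual ~stop() = default;
    virtual std::string class_id() const = 0;  // identifier written to the archive
    virtual void save(std::ostream& os) const = 0;
    virtual void load(std::istream& is) = 0;
};

// Hypothetical derived classes (strings assumed to contain no whitespace).
struct corner_stop : stop {
    std::string street1, street2;
    std::string class_id() const override { return "corner_stop"; }
    void save(std::ostream& os) const override { os << street1 << ' ' << street2 << ' '; }
    void load(std::istream& is) override { is >> street1 >> street2; }
};

struct destination_stop : stop {
    std::string name;
    std::string class_id() const override { return "destination_stop"; }
    void save(std::ostream& os) const override { os << name << ' '; }
    void load(std::istream& is) override { is >> name; }
};

// Registry from class identifier to factory. A derived class that is never
// registered cannot be recreated when loaded through a stop pointer.
using factory = std::function<std::unique_ptr<stop>()>;

std::map<std::string, factory>& registry() {
    static std::map<std::string, factory> r;
    return r;
}

template <class Derived>
void register_class() {
    registry()[Derived().class_id()] = [] { return std::make_unique<Derived>(); };
}

void save_polymorphic(std::ostream& os, const stop& s) {
    os << s.class_id() << ' ';  // 1. class identifier
    s.save(os);                 // 2. data of the most-derived type (virtual call)
}

std::unique_ptr<stop> load_polymorphic(std::istream& is) {
    std::string id;
    is >> id;  // 1. read the class identifier
    auto it = registry().find(id);
    if (it == registry().end())
        throw std::runtime_error("unregistered class: " + id);
    std::unique_ptr<stop> p = it->second();  // 2. construct the matching class
    p->load(is);                             // 3. load data into the new instance
    return p;
}
```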
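All this is accomplished automatically by the serialization library. The above code is all that is necessary to accomplish the saving and loading of objects accessed through pointers.
In fact, the explicit loop is more than is needed. The serialization library detects when the object being serialized is an array and saves or loads each of its elements, so the above can be shortened to: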
class bus_route
{
    friend class boost::serialization::access;
    bus_stop * stops[10];
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        ar & stops;
    }
public:
    bus_route(){}
};
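The above example uses an array of pointers. More likely, such an application would use a Standard Library collection for this purpose. The serialization library includes serialization code for the Standard Library containers (for std::list, in the header included below), so the following reformulation also works as one would expect:
#include <boost/serialization/list.hpp>

class bus_route
{
    friend class boost::serialization::access;
    std::list<bus_stop *> stops;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        ar & stops;
    }
public:
    bus_route(){}
};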
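A collection serializer can be built on top of element serialization. The common approach, sketched here, is to write the element count first so that the loader knows how many elements to read back. save_list and load_list are hypothetical free functions over plain streams, not the contents of Boost's list.hpp. In the library each element goes through the archive itself, which is what allows elements such as bus_stop * pointers. Here they are written with operator<< for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <list>
#include <ostream>
#include <sstream>
#include <stdexcept>

// Save: element count, then each element in order.
template <class T>
void save_list(std::ostream& os, const std::list<T>& items) {
    os << items.size() << ' ';
    for (const T& item : items)
        os << item << ' ';
}

// Load: read the count, then exactly that many elements, replacing any
// previous contents of the list.
template <class T>
void load_list(std::istream& is, std::list<T>& items) {
    std::size_t count = 0;
    if (!(is >> count))
        throw std::runtime_error("missing element count");
    items.clear();
    for (std::size_t i = 0; i < count; ++i) {
        T item{};
        if (!(is >> item))
            throw std::runtime_error("truncated list");
        items.push_back(item);
    }
}
```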
Suppose we're satisfied with our bus_route class, build a program that uses it, and ship the product. Some time later, it's decided that the program needs enhancement, and the bus_route class is altered to include the name of the driver of the route. So the new version looks like:
#include <boost/serialization/list.hpp>
#include <boost/serialization/string.hpp>

class bus_route
{
    friend class boost::serialization::access;
    std::list<bus_stop *> stops;
    std::string driver_name;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        ar & driver_name;
        ar & stops;
    }
public:
    bus_route(){}
};
Great, we're all done. Except... what about people using our application who now have a bunch of files created under the previous program? How can these be used with our new program version?
In general, the serialization library stores a version number in the archive for each class serialized. By default this version number is 0. When the archive is loaded, the version number under which it was saved is read. The above code can be altered to exploit this:
#include <boost/serialization/list.hpp>
#include <boost/serialization/string.hpp>
#include <boost/serialization/version.hpp>

class bus_route
{
    friend class boost::serialization::access;
    std::list<bus_stop *> stops;
    std::string driver_name;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        // only save/load driver_name for newer archives
        if(version > 0)
            ar & driver_name;
        ar & stops;
    }
public:
    bus_route(){}
};

BOOST_CLASS_VERSION(bus_route, 1)
By applying versioning to each class, there is no need to maintain a version number for files as a whole. That is, a file's version is the combination of the versions of all its constituent classes. This system permits programs to remain compatible with archives created by all previous versions of a program, with no more effort than this example requires.
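The mechanism can be sketched without the library. A hypothetical route class writes its current version number ahead of its data. When loading, it reads the stored number back and skips the field that the older format lacked. Two simplifications apply: the sketch repeats the number in front of every object, and an int stop_count stands in for the list of stops. Driver names are assumed to contain no whitespace.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

struct route {
    static constexpr unsigned int current_version = 1;  // was 0 before driver_name
    std::string driver_name;  // added in version 1
    int stop_count = 0;       // stands in for the list of stops

    void save(std::ostream& os) const {
        os << current_version << ' ';  // saving always uses the latest version
        os << driver_name << ' ' << stop_count << ' ';
    }

    void load(std::istream& is) {
        unsigned int version = 0;
        is >> version;  // version under which this data was saved
        if (version > 0)
            is >> driver_name;  // only present in newer archives
        is >> stop_count;
    }
};
```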
Splitting serialize into save/load
The serialize function is simple and concise, and it guarantees that class members are saved and loaded in the same sequence, which is the key to the serialization system. However, there are cases where the load and save operations are not as similar as the examples used here. For example, this could occur with a class that has evolved through multiple versions. The above class can be reformulated as:
#include <boost/serialization/list.hpp>
#include <boost/serialization/string.hpp>
#include <boost/serialization/version.hpp>
#include <boost/serialization/split_member.hpp>

class bus_route
{
    friend class boost::serialization::access;
    std::list<bus_stop *> stops;
    std::string driver_name;
    template<class Archive>
    void save(Archive & ar, const unsigned int version) const
    {
        // note, version is always the latest when saving
        ar & driver_name;
        ar & stops;
    }
    template<class Archive>
    void load(Archive & ar, const unsigned int version)
    {
        if(version > 0)
            ar & driver_name;
        ar & stops;
    }
    BOOST_SERIALIZATION_SPLIT_MEMBER()
public:
    bus_route(){}
};

BOOST_CLASS_VERSION(bus_route, 1)
The macro BOOST_SERIALIZATION_SPLIT_MEMBER() generates code which invokes save or load, depending on whether the archive is used for saving or loading.
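The dispatch performed by the macro can be sketched by hand. In this hypothetical example, toy_oarchive and toy_iarchive are made-up archives that advertise their direction through an is_saving type. This is loosely modeled on the is_saving/is_loading typedefs that Boost archives provide. split_member is an illustrative helper, not the macro's actual expansion. The branch is selected with if constexpr (C++17), so load is never instantiated for the output archive and save is never instantiated for the input archive.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical archives that advertise whether they save or load.
struct toy_oarchive {
    using is_saving = std::true_type;
    std::ostream& os;
    template <class T> void put(const T& t) { os << t << ' '; }
};

struct toy_iarchive {
    using is_saving = std::false_type;
    std::istream& is;
    template <class T> void get(T& t) { is >> t; }
};

// One serialize() entry point forwarding to save() or load() by archive type.
template <class Archive, class T>
void split_member(Archive& ar, T& t, const unsigned int version) {
    if constexpr (Archive::is_saving::value)
        t.save(ar, version);
    else
        t.load(ar, version);
}

struct route {
    std::string driver_name;  // assumed to contain no whitespace
    int stop_count = 0;

    template <class Archive>
    void save(Archive& ar, const unsigned int /*version*/) const {
        ar.put(driver_name);
        ar.put(stop_count);
    }
    template <class Archive>
    void load(Archive& ar, const unsigned int version) {
        if (version > 0)
            ar.get(driver_name);
        ar.get(stop_count);
    }
    // Hand-written stand-in for what a split macro provides.
    template <class Archive>
    void serialize(Archive& ar, const unsigned int version) {
        split_member(ar, *this, version);
    }
};
```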
In this tutorial, we have used a particular archive class: text_oarchive for saving and text_iarchive for loading. Text archives render data as text and are portable across platforms. In addition to text archives, the library includes archive classes for native binary data and XML-formatted data. The interfaces of all archive classes are identical. Once serialization has been defined for a class, that class can be serialized to any type of archive.
If the current set of archive classes doesn't provide the attributes, format, or behavior needed for a particular application, one can either make a new archive class or derive from an existing one. This is described later in the manual.
The astute reader might notice that these examples contain a subtle but important flaw: they leak memory. The bus stops are created in the main function of demo.cpp. The bus schedules may refer to these bus stops any number of times. At the end of the main function, after the bus schedules are destroyed, the bus stops are destroyed. This seems fine. But what about the structure new_schedule, the data item created by loading from an archive? It contains its own separate set of bus stops that are not referenced outside of the bus schedule. These won't be destroyed anywhere in the program, which is a memory leak.
There are a couple of ways of fixing this. One way is to explicitly manage the bus stops. However, a more robust and transparent approach is to use shared_ptr rather than raw pointers. Along with serialization implementations for the Standard Library, the serialization library includes an implementation of serialization for boost::shared_ptr. Given this, it should be easy to alter any of these examples to eliminate the memory leak. This is left as an exercise for the reader.
© Copyright Robert Ramey 2002-2004. Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)