March 2003
Feature Article



XML, Databases, DTD and Publishing:
A Workflow Example
By Bernard Aschwanden, Senior Member, Toronto, Canada




First, the good news. If you decide to convert XML content for publishing to the Web, to paper, or to Help, it can be done. It can be done with ease and with little cost.

Now for the bad news. For this to happen, you need to first spend time and money and allocate resources to setting up a publishing solution. The return on that investment is a document set that is faster to create (and to update or maintain), more accurate, simpler to work with, and can be delivered to numerous outputs in very little time.

This article is not about definitions of XML, databases, DTDs, or publishing. Instead, it is about using tools and technology to make the job of writing simpler. The goal is to give an overview of how you can take XML content (primarily content exported from a database) and publish it quickly and professionally. We explain the key benefits and detail a workflow. We even include a sample of how this can be done using a very simple Access-driven database. Finally, we point to a very brief case study of a company that has successfully implemented this type of system.

Brief descriptions and definitions

For those of you who are already familiar with XML, DTDs, and databases, feel free to skip ahead (which should save you about 1,000 words of reading). For those of you who want more background on some of the terms and concepts, please read XML, Database, DTD and Publishing: Terms and Definitions first. Finally, if you want a more detailed look at the ideas presented in this article, please contact the author.

Database publishing with XML

This section comprises the following topics:

  • Overview
  • Benefits
  • Workflow
  • Sample

Based on the descriptions in XML, Database, DTD and Publishing: Terms and Definitions, we can say that the phrase Database Publishing with XML means distributing information from a collection of data so that anyone can use it on any platform with any tool. Viewing alone does not make information useful, so "use" here also covers reuse, printing, sharing, distribution, and more.

Despite the best efforts of marketing departments, publishing is not a push-button environment. If it were, writers would no longer need to do a thing. Instead, the writer has the very important job of taking data and reviewing it to ensure that it meets corporate standards, that it is phrased to meet the needs of the audience, that it is formatted well, and so on. These tasks can't be completed by clicking a button. However, some content (especially catalogs, lists, and so on) is well suited to automation. Additionally, databases are used more and more to manage documents such as user guides, product manuals, and more. Enter the database publisher.

Overview

With more information stored in database format, the need to extract it easily for writing becomes more urgent. Imagine even a simple database for a product catalog. This catalog may be delivered in print, on a CD, and via the Web. In each case, data is organized and delivered differently.

Each of the following sample workflows incorporates a different toolset, but is based on the same source data.

Print delivery

For delivery in print, the catalog is converted from the database to an XML format, imported into FrameMaker, and marked up with index entries, cross-references, document numbering, and more. The file is then converted to PDF and delivered to a printing company that creates the paper copy of the catalog. The same source PDF file is also used for delivering content via the Web and on disk.

CD delivery

In addition to a PDF file, the catalog is delivered on CD and installed at a client site. This allows orders placed using a laptop computer to be downloaded via a network connection.

Web delivery

Finally, the same catalog needs to be delivered via the Internet, so clients can use Web-based features to work with the files to find specific parts, order parts, and review support files. Of course, with access to PDF copies of all products, clients can print copies on demand and retain a very professional copy for internal records.

Benefits

The primary benefits of publishing a database using XML are as follows:

  1. Speed
  2. Accuracy
  3. Simplicity
  4. Multiple outputs

1. Speed

Working faster means that the time to market is reduced. Alternatively, the time available for reviews, edits, and corrections is greater. In either case you win.

2. Accuracy

Because content is stored in a database, accuracy is improved. In a database, changing one item impacts numerous others immediately. Modifications to all data can be done quickly and last-second changes impact all components.

In other words, if menu choices should read Select rather than Click, changing the word Click to Select once in the source applies the change to the entire XML output. Similarly, changing the product name from Widget to Gadget can be done in one place and updates impact all content.
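
To make the idea concrete, here is a minimal sketch in Python. It assumes a hypothetical layout in which the menu verb is stored once in a terms table and every step refers to it by key. The table names, the column names, and the use of SQLite in place of the Access database discussed in this article are illustrative assumptions only.

```python
import sqlite3

# Hypothetical schema: each shared term is stored once and referenced by key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE terms (term_id TEXT PRIMARY KEY, wording TEXT);
    CREATE TABLE steps (id INTEGER PRIMARY KEY, term_id TEXT, target TEXT);
    INSERT INTO terms VALUES ('menu_verb', 'Click');
    INSERT INTO steps (term_id, target) VALUES
        ('menu_verb', 'File > Export'),
        ('menu_verb', 'Edit > Copy');
""")

def render_steps(conn):
    """Join each step to its shared term and return the finished sentences."""
    rows = conn.execute(
        "SELECT t.wording, s.target FROM steps s "
        "JOIN terms t ON t.term_id = s.term_id ORDER BY s.id"
    )
    return [f"{verb} {target}." for verb, target in rows]

before = render_steps(conn)

# One edit in the source table...
conn.execute("UPDATE terms SET wording = 'Select' WHERE term_id = 'menu_verb'")

# ...changes every sentence that uses the term.
after = render_steps(conn)
```

Because the sentences are assembled at export time, no one has to search the output for stray occurrences of the old wording.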

The nature of a database eliminates (or dramatically reduces) the need to review content to ensure that consistent phrasing is used.

3. Simplicity

By simplifying the writing process we reduce errors. Simplicity allows writers to focus on getting content correct the first time and reusing it. The simpler a source is to manage, the less frustration a writer has to face. The edit cycle is also simpler. Correcting phrasing in the database ensures that all the content reflects the edit.

4. Multiple outputs

Another benefit of XML is the ability to deliver the same source document to numerous devices and tools. An XML file can be used for a variety of purposes, including, but not limited to, creation of content that is:

  • specific to browsers (or even browser versions)
  • used by publishing tools
  • delivered to handheld devices
  • used by automated reading devices for people with visual impairments
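
As a rough illustration of one source feeding several outputs, the Python sketch below reads a small XML catalog and writes it out twice: once as plain text (the kind of output a simple handheld display might use) and once as comma-separated values that another tool could import. The element names are borrowed from the sample later in this article; the function names and the choice of output formats are assumptions made for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

SOURCE = """<Catalog>
  <Products>
    <ProductName>Adobe FrameMaker</ProductName>
    <ProductDescription>Document publishing tool</ProductDescription>
  </Products>
  <Products>
    <ProductName>Adobe Acrobat</ProductName>
    <ProductDescription>PDF creation and modification tool</ProductDescription>
  </Products>
</Catalog>"""

def read_products(xml_text):
    """Parse the single XML source into (name, description) pairs."""
    root = ET.fromstring(xml_text)
    return [
        (p.findtext("ProductName"), p.findtext("ProductDescription"))
        for p in root.findall("Products")
    ]

def to_plain_text(products):
    """Output 1: one line of text per product."""
    return "\n".join(f"{name}: {desc}" for name, desc in products)

def to_csv(products):
    """Output 2: comma-separated values with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ProductName", "ProductDescription"])
    writer.writerows(products)
    return buffer.getvalue()

products = read_products(SOURCE)
text_output = to_plain_text(products)
csv_output = to_csv(products)
```

Both outputs come from the same parsed data, so a correction to the XML source reaches each of them on the next run.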

Summary of benefits

We now have a fast way to create content that can be updated and corrected quickly. Content is simple to correct by modifying one source and delivering it to multiple media. We can deliver one set of materials with fewer errors (no more updating three sets of files for print, the Web, and PDF) and decrease the time to market.

Workflow

The general workflow for database publishing is as follows:

  1. Develop a database
  2. Create an automated conversion system
  3. Update and export content
  4. Import content into publishing tools
  5. Modify and edit content
  6. Publish and distribute

1. Develop a database

First we need to develop a database and populate it. For this article we assume a database exists and that content already exists (but we know that data constantly changes).

2. Create an automated conversion system

Next we create a system to convert the database to XML. This article does not examine this aspect of the project. Instead, we assume that part of the development cycle is the creation (and extensive testing) of an automated conversion system. Some tools allow you to "save as" XML, while others require a report generator or some other system of writing an XML document.
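
Although the details are outside the scope of this article, a conversion system of the "write your own XML document" kind can be quite small. The following Python sketch is one possible shape, not the system used for this article. It uses an in-memory SQLite database as a stand-in for Access, and the table and column names are assumptions; the element names match the sample export shown later.

```python
import sqlite3
import xml.etree.ElementTree as ET

def export_catalog(conn):
    """Build a <Catalog> element with one <Products> child per database row."""
    catalog = ET.Element("Catalog")
    # Only the columns meant for publication are selected; the serial
    # number stays in the database.
    query = "SELECT name, description, box_shot FROM products ORDER BY id"
    for name, description, box_shot in conn.execute(query):
        product = ET.SubElement(catalog, "Products")
        ET.SubElement(product, "ProductName").text = name
        ET.SubElement(product, "ProductDescription").text = description
        ET.SubElement(product, "BoxShot", src=box_shot)
    return catalog

# Hypothetical table holding the two sample products.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, "
    "description TEXT, serial TEXT, box_shot TEXT)"
)
conn.executemany(
    "INSERT INTO products (name, description, serial, box_shot) VALUES (?, ?, ?, ?)",
    [
        ("Adobe FrameMaker", "Document publishing tool", "fmk", "fm.jpg"),
        ("Adobe Acrobat", "PDF creation and modification tool", "acr", "Acr.jpg"),
    ],
)

catalog = export_catalog(conn)
xml_text = ET.tostring(catalog, encoding="unicode")
```

Whatever tool is used, the point is the same: the export is a repeatable, tested routine rather than a manual copy-and-paste step.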

3. Update and export content

The third step requires the update of the database and export to an XML file. This requires a database (step 1) and a way to create XML very quickly (step 2).

The content that is in our database is either stored as XML (in which case the third step is already addressed) or converted to XML. Of course, the data is constantly updated, so the later we can export it for publishing the better. This gives us very current information and a quick way to distribute it. (Exception: Some databases allow XML content to be imported as well. Therefore, a writer could make edits in a specific application and return the modifications to the database later.)
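
For the round trip mentioned in the exception, where edited XML is returned to the database, a minimal Python sketch might look like the following. It assumes a hypothetical products table keyed by product name and again uses SQLite in place of a real publishing database; an actual import feature would depend on the database product in use.

```python
import sqlite3
import xml.etree.ElementTree as ET

# XML as it might come back from a writer's editing tool.
EDITED_XML = """<Catalog>
  <Products>
    <ProductName>Adobe FrameMaker</ProductName>
    <ProductDescription>XML smart, enterprise ready</ProductDescription>
  </Products>
</Catalog>"""

def import_descriptions(conn, xml_text):
    """Write each edited description back to the row with the same product name."""
    updated = 0
    for product in ET.fromstring(xml_text).findall("Products"):
        cursor = conn.execute(
            "UPDATE products SET description = ? WHERE name = ?",
            (product.findtext("ProductDescription"), product.findtext("ProductName")),
        )
        updated += cursor.rowcount
    conn.commit()
    return updated

# Hypothetical table holding the original descriptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [
        ("Adobe FrameMaker", "Document publishing tool"),
        ("Adobe Acrobat", "PDF creation and modification tool"),
    ],
)

updated = import_descriptions(conn, EDITED_XML)
descriptions = dict(conn.execute("SELECT name, description FROM products"))
```

Products that do not appear in the edited file are left untouched, so a writer can return a partial set of changes.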

Using Microsoft Access XP, for example, we can open a database and directly export to XML. The default use of File > Export may not create the specific XML code we need and, therefore, we may have to manipulate the XML or develop our own custom export. However, for this example, let's assume a clean data export.

4. Import content into publishing tools

The publishing tools in our example are Adobe FrameMaker 7.0 (used to create the PDF document for online use on the Web or CD, and as the source for printing) and Internet Explorer. Of course, numerous other tools work with XML.

5. Modify and edit content

The development of the content for final publishing may involve adding a variety of markup such as index entries, additional cross-references, page break adjustments, and tables of contents.

6. Publish and distribute

The publishing and distribution phases require that the content is printed, mailed, burned to disk, uploaded, and so on.

Sample

Now that we have explored the benefits of an XML-based publishing system and reviewed a workflow, let's follow a sample publishing project from start to end.

Remembering our workflow from above, we need to explore the following in this sample:

  1. Develop a database
  2. Create an automated conversion system
  3. Update and export content
  4. Import content into publishing tools
  5. Modify and edit content
  6. Publish and distribute

For the purposes of this sample, let us assume that:

  1. A database has been developed in Microsoft Access.
  2. Code has been written to export the content to XML.

Now we'll follow a sample workflow starting from step 3.

3. Update and export content

Sample content

Our sample product list contains the following two items:

  • Adobe FrameMaker, initially defined as Document publishing tool, with a serial number of fmk and a screenshot of the product.
  • Adobe Acrobat, initially defined as PDF creation and modification tool, with a serial number of acr and a screenshot of the product.

Database content and display

In our database, the records may appear as:

In addition, the same records can be displayed as forms to add visual elements and display content in a format that may be easier for a user to work with.

Database export: Prior to corrections

Finally, the XML export of the content that is currently in the database may appear, in part, as:

<ProductDescription>Document publishing tool</ProductDescription>
<ProductDescription>PDF creation and modification tool</ProductDescription>

Once the content is exported it can be imported into publishing tools. In addition, the data may be updated at a moment's notice up to just before it is published.

Database export: After corrections

Imagine that both descriptions need to be updated just before publishing. An editor who works in the database decides to change the descriptions for FrameMaker and Acrobat. Of course, if the phrases are used in numerous areas of the database, the one change is reflected in all instances.

Therefore, the two lines in the XML sample above that relate to these products change as soon as a new export is run. The new export, in its entirety, may read as follows:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="catalog.xsl"?>
<?xml-stylesheet type="text/css" href="catalog.css"?>
<Catalog>
  <Products>
    <ProductName>Adobe FrameMaker</ProductName>
    <ProductDescription>XML smart, enterprise ready</ProductDescription>
    <BoxShot src="fm.jpg"/>
  </Products>
  <Products>
    <ProductName>Adobe Acrobat</ProductName>
    <ProductDescription>Secure, reliable distribution of business documents</ProductDescription>
    <BoxShot src="Acr.jpg"/>
  </Products>
</Catalog>

In this example, the XML content exports only the ProductName, the ProductDescription and the BoxShot. All other data is used in the database only.
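
Before handing an export to a publishing tool, it can be worth confirming that it contains only the elements we expect. The Python sketch below is a lightweight structural check written for this example. It is not DTD validation, and the function name and the SerialNumber element in the second sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# The three children each <Products> element should have, in order.
EXPECTED_CHILDREN = ["ProductName", "ProductDescription", "BoxShot"]

def check_export(xml_text):
    """Return a list of problems; an empty list means the export looks as expected."""
    problems = []
    catalog = ET.fromstring(xml_text)
    if catalog.tag != "Catalog":
        problems.append(f"unexpected root element: {catalog.tag}")
    for position, product in enumerate(catalog, start=1):
        children = [child.tag for child in product]
        if product.tag != "Products" or children != EXPECTED_CHILDREN:
            problems.append(f"entry {position}: <{product.tag}> contains {children}")
    return problems

GOOD_EXPORT = """<Catalog>
  <Products>
    <ProductName>Adobe FrameMaker</ProductName>
    <ProductDescription>XML smart, enterprise ready</ProductDescription>
    <BoxShot src="fm.jpg"/>
  </Products>
</Catalog>"""

# A database-only field (hypothetical element name) has leaked into this export.
BAD_EXPORT = GOOD_EXPORT.replace(
    '<BoxShot src="fm.jpg"/>',
    '<BoxShot src="fm.jpg"/><SerialNumber>fmk</SerialNumber>',
)

good_problems = check_export(GOOD_EXPORT)
bad_problems = check_export(BAD_EXPORT)
```

A check like this catches database-only fields that slip into the export before they reach print or the Web.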

4. Import content into publishing tools

The content that has been exported can now be opened in numerous tools. Again, in our example we open this in both Internet Explorer and Adobe FrameMaker 7.

Internet Explorer display

The XML file is transformed to an HTML representation by the XSL file and Internet Explorer may display content as:

The specific appearance of the content can be formatted using a variety of methods. The important thing to note is that the same source of content can be reused with other tools and that the update is very quick.
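
The catalog.xsl file itself is not reproduced in this article. To show the kind of XML-to-HTML mapping such a stylesheet performs, here is a Python stand-in that turns each product into a table row. This is a sketch of the idea, not the actual transformation, and the table layout is an assumption.

```python
import xml.etree.ElementTree as ET

CATALOG = """<Catalog>
  <Products>
    <ProductName>Adobe FrameMaker</ProductName>
    <ProductDescription>XML smart, enterprise ready</ProductDescription>
    <BoxShot src="fm.jpg"/>
  </Products>
  <Products>
    <ProductName>Adobe Acrobat</ProductName>
    <ProductDescription>Secure, reliable distribution of business documents</ProductDescription>
    <BoxShot src="Acr.jpg"/>
  </Products>
</Catalog>"""

def catalog_to_html(xml_text):
    """Turn each <Products> element into one row of an HTML table."""
    catalog = ET.fromstring(xml_text)
    table = ET.Element("table")
    for product in catalog.findall("Products"):
        row = ET.SubElement(table, "tr")
        # First cell: the box shot image, with the product name as alt text.
        image_cell = ET.SubElement(row, "td")
        ET.SubElement(
            image_cell,
            "img",
            src=product.find("BoxShot").get("src"),
            alt=product.findtext("ProductName"),
        )
        # Remaining cells: the name and the description.
        ET.SubElement(row, "td").text = product.findtext("ProductName")
        ET.SubElement(row, "td").text = product.findtext("ProductDescription")
    return ET.tostring(table, encoding="unicode", method="html")

html_table = catalog_to_html(CATALOG)
```

An XSL stylesheet expresses the same mapping declaratively, which lets the browser apply it without any separate program.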

FrameMaker display

When the XML file is imported into FrameMaker, FrameMaker is instructed to drop the references to both stylesheets (the CSS and the XSL transformation sheet). In addition, a template is developed with structural rules, and content is simply imported and displayed. In this example, a custom first page is applied to the catalog.
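
How FrameMaker is told to ignore the stylesheet references is not covered here. If a tool could not do this itself, one alternative would be to strip the references before import. The Python sketch below shows that preprocessing approach; it is an assumption for illustration, not a description of FrameMaker's own mechanism.

```python
import re

# Matches any <?xml-stylesheet ...?> processing instruction plus a trailing newline.
STYLESHEET_PI = re.compile(r"<\?xml-stylesheet\b.*?\?>\n?", re.DOTALL)

def drop_stylesheet_references(xml_text):
    """Return the XML text with all xml-stylesheet instructions removed."""
    return STYLESHEET_PI.sub("", xml_text)

EXPORT = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="catalog.xsl"?>
<?xml-stylesheet type="text/css" href="catalog.css"?>
<Catalog>
</Catalog>"""

cleaned = drop_stylesheet_references(EXPORT)
```

The XML declaration and the catalog content pass through unchanged; only the browser-specific instructions are removed.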

As with Internet Explorer, the specific appearance of the content can be formatted using a variety of methods. Again, the same data source is used for the display in FrameMaker for the creation of PDF or print.

5. Modify and edit content

Once the content is imported, and depending on the rules in place, additional markup can be done. Also, the import into a variety of tools is supplemented through the use of conversion features. This allows rules to be developed that manipulate content prior to the display by a utility (such as FrameMaker) to further facilitate single-source publishing.
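
The conversion rules themselves are specific to each tool, but the general pattern is a list of small manipulations applied to the content before it is displayed. The Python sketch below shows that pattern with two hypothetical rules (sorting products by name and removing images for a text-only output). Neither rule comes from the project described in this article.

```python
import xml.etree.ElementTree as ET

def sort_by_name(catalog):
    """Hypothetical rule: order the products alphabetically by name."""
    catalog[:] = sorted(catalog, key=lambda p: p.findtext("ProductName"))

def drop_box_shots(catalog):
    """Hypothetical rule: remove images for a text-only output."""
    for product in catalog.findall("Products"):
        for shot in product.findall("BoxShot"):
            product.remove(shot)

def apply_rules(xml_text, rules):
    """Parse the export and apply each rule in turn before display."""
    catalog = ET.fromstring(xml_text)
    for rule in rules:
        rule(catalog)
    return catalog

EXPORT = """<Catalog>
  <Products>
    <ProductName>Adobe FrameMaker</ProductName>
    <BoxShot src="fm.jpg"/>
  </Products>
  <Products>
    <ProductName>Adobe Acrobat</ProductName>
    <BoxShot src="Acr.jpg"/>
  </Products>
</Catalog>"""

prepared = apply_rules(EXPORT, [sort_by_name, drop_box_shots])
names = [p.findtext("ProductName") for p in prepared.findall("Products")]
```

Keeping each rule as a separate function makes it easy to use a different set of rules for each output.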

6. Publish and distribute

Once content has been successfully imported, modified as required and reviewed, it can be published to numerous formats for distribution very quickly. Of course, because the content resides in a database, all future edits are simple, fast, and correct.

Conclusion

While the goal of single-source publishing using XML and databases is an admirable one, it remains out of reach until the tools, the processes, and the people are in place and working together. When all of these factors are brought together and work correctly, the idea becomes a reality. The publication of content from one source is a goal worth working for, provided the risks and rewards are evaluated first.

Once implemented, the cost to maintain and update data becomes negligible. The finished result is a publishing solution where content can be updated and delivered in a fraction of the time that it takes at present. Single sourcing content becomes a reality and is only limited by edits required to prepare content for a specific market. No amount of planning or setup can replace the work of the writer, but a proper publishing system does make the job he or she has to do faster, more accurate, and simpler.

If you are interested in reading a case study based on these ideas, feel free to read about Rogers Media and their implementation of a markup-based publishing solution.
