Author Archive

Topic-based authoring

November 3, 2011

Topic-based Authoring is an approach to developing end-user documentation that explicitly identifies and uses topics as a fundamental organising principle.

It is a set of practices, processes, tools, and an organising conceptual framework.

Topics are present in all human communication, but they are often implicit rather than used as a formal organising principle.

Contrast “writing to make the sentences and paragraphs clearer”, on one hand, with “writing to make the thinking clearer”, on the other.

The first – writing to make the sentences and paragraphs clearer – involves interaction with words and sentences, which can be explicitly identified and whose relationships can be formally described.

The second – writing to make the thinking clearer – involves interaction with topics.

When topics are implicit, there are no terms to name them and no vocabulary to describe their interactions and relationships. Authors can intuitively structure information without an explicit vocabulary for topics, but capturing and communicating practices and standards, and building processes around them, is impossible.

Topics first become explicit when they become part of the authors’ universe of discourse. This can be thought of as weakly topic-based authoring. At this stage authors know what topics are, can see them in existing documentation, and may even use them as an organising principle while planning and writing. However, there is no specialised tooling support for topics, and the topic has no existence beyond an organising principle in the mind of the author.

Topics become completely explicit when they become first-class citizens: used as an organising principle from planning to implementation, with a formal existence in the documentation workflow and support from tooling. This is strongly topic-based authoring. At this stage topics are both an organising principle and a physical artefact in the workflow and toolchain.
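For illustration, here is the kind of physical existence a topic acquires in one strongly topic-based toolchain, DITA (which comes up later in this archive): a single, self-contained XML file that tooling can track, link, and reuse. The identifier and wording below are invented for this sketch.

```xml
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="topic-based-authoring">
  <title>Topic-based authoring</title>
  <conbody>
    <!-- One topic, one subject, one file: a first-class object
         in the workflow and toolchain -->
    <p>A topic is a self-contained unit of information with a single
       subject, managed as a first-class object in the toolchain.</p>
  </conbody>
</concept>
```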

Topic-based authoring (either weak or strong) produces the traditional output media expected by users, such as books, articles and help files. However, it can also produce enhanced output such as dynamic websites.

Topic-based authoring is used by companies to aid in information planning and project management, increase automation, decrease maintenance costs, and increase reuse (including multiple output structures from the same content).

The Tao of Topics Part 1

October 29, 2011

It’s the 21st century.

In contrast to the vast majority of history, today a significant percentage of humans can read and write. In fact, statistically, the odds of you being one of those humans are 99.9%.

Everyone who reads this blog can read. I’d wager that everyone who can read can also write, but not everyone who can write can necessarily write well.

What does it mean to write well? From a technical writing perspective, it means primarily to write content that is clear, unambiguous, comprehensive, and coherent.

It must be:

  • clear – necessary information is discoverable, not buried in an unlikely place;
  • unambiguous – it is worded in a way that reduces uncertainty to the greatest extent possible;
  • comprehensive – it contains all the information the user needs to generate a complete solution to their problem with the technology;
  • coherent – it follows consistent patterns, so users can apply the same strategies throughout the documentation.

Well-written content might also be engaging, inspiring, or entertaining, but for technical writing these are secondary. The goal of technical writing is to reduce uncertainty in users of a technology. If your documentation is entertaining, but ambiguous, or it is inspiring but has massive gaps in it that leave the user confused, then it’s not well-written from a technical writing perspective. Better flat, dry prose that gets the job done, than inspiring, entertaining prose that doesn’t!


Handwriting is so 19th century, but most of this blog’s readers still know how to do it (you are totally next-gen if you skipped it to go straight to texting). To have a conversation about “writing clearly” from the perspective of handwriting we need to talk about letters. If you don’t know what a letter is, then we have to start with understanding that before we can move on to how to write a letter clearly.

Incidentally, having people write their email address on a survey form or email sign-up list has been a bane of my existence for years. People don’t seem to realise that if I can’t read it, it’s useless. The scribbles that they put down look more like mnemonic aids than actual communication. They certainly don’t reduce my uncertainty!

When it comes to writing clear prose using a keyboard, you don’t need to focus on the formation of the letters, because the machine takes care of that for you. Then come words, where you have to spell them in such a way that others can understand them.

Text speak

Once you get the words right, we’re looking at sentences.

To write well-formed, clear, and unambiguous sentences, you need to understand what nouns, verbs, adjectives, adverbs, and prepositions are. You can write without knowing what these are, and many people do; some even manage a passable level of written communication without knowing them explicitly. But to have a conversation about “how to write clearly”, you need to be able to talk about what it is you are writing.

If I say “adverbs generally weaken a sentence”, and you know what an adverb is, you’ll see how “adverbs weaken a sentence” is both a test of the idea and a demonstration of it. You’ll also be able to take the idea and apply it to your own writing, eliminating unnecessary adverbs to tighten it up and increase clarity.

My argument is that you can write without explicitly understanding the elements of writing, but writing well, and participating in and benefiting from a conversation about improving writing, requires an explicit understanding of the elements of writing.

Beyond letters, with their shapes and spatial relationships, we have words, which aggregate these lower-order units into higher-order units of meaning. Beyond words we have sentences, with their parts of speech, syntax, and grammar. At all of these stages we want to make sure that everything that needs to be there is there, whatever doesn’t need to be there isn’t, and whatever is there is clear.


Beyond letters, beyond words, beyond sentences in technical communication are topics. Technical communication is the art and science of communicating useful information about systems. Words are the atoms of communication, with sentences as the molecules. The atoms of “useful information” are topics.

Topics are written representations of elements of the mental models that we use to interact with the real world. An expert user of a system has an internal model of the system that she uses to make predictions about how a system will act, and how a system will react. An accurate, complete mental model allows her to accurately make predictions and influence the system to achieve her desired outcomes (or know if this is not in fact possible).

An inaccurate or incomplete model is the cause of uncertainty. Technical communication aims to reduce uncertainty. As users consume technical communication they enhance and refine their internal model. The role of the technical communicator is in many cases to explicate the internal model of a subject matter expert, convert it into transmissible “chunks”, and deliver it to a user, who can then internalise the chunks, reconstruct the model, and rock out like an expert!

Topics are atomic chunks of mental models.

Just as all food can be analyzed in terms of its protein, fat, and carbohydrate content, all technical information can be divided into similar “macronutrient” groups. Both our digestive system and our brain are designed to interact with the world, so each has internal mechanisms that reflect broad categories in the environment.

Just as the food we digest can be divided into macronutrient groups, the information our brain digests can also be divided into macronutrient groups.

Topics are exactly that – ontological categories that exist in the environment, and that the brain, the organ that processes them, treats differently. Different topic types correspond to different physical mechanisms in the brain.

When we want to have a conversation about writing clearly we can then tackle it at three different levels: writing the glyphs clearly (taken care of by the machines now), writing sentences that are clear and unambiguous, and providing all necessary information in digestible chunks.

Just as some mixtures of food can give you indigestion, some mixtures of topics are indigestible. Think about this: it’s counter-productive to give someone elements of a mental model that rely on other underpinnings before they have those underpinnings. As an example, it’s pointless to talk about writing clear sentences to someone who doesn’t know how to write!

Let’s just skip back to those three levels of writing well:

  • writing the glyphs clearly
  • writing sentences that are clear and unambiguous
  • and providing all necessary information in digestible chunks

We have a progression here from letters / words, to sentences, to topics. Sentences are aggregations of words according to a defined structure that gives rise to sense. Topics are aggregations of sentences according to a defined structure that gives rise to sense.

Let’s look at the last point in a little more detail:

Having a conversation about topics allows us to talk about how we “provide all the necessary information in digestible chunks”.

What is the necessary information? If you’re familiar with technical writing then you know that this depends on the audience. Think about topics as pieces of Lego. The mental model in the mind of the expert user is a fully assembled construction. To communicate this you can deconstruct it into its constituent pieces, and then deliver those pieces to the user, along with the assembly instructions.

Lego kit

A well-executed structured approach like this enables a user with a partially constructed internal model to quickly identify, locate, and consume the missing pieces, which are available as atomic units.

Rather than having to wade through monolithic blocks of interleaved topics, a user can quickly identify and locate the piece that they are missing.

Organising the information in topics makes it possible to provide multiple methods of locating specific information, which I’ll discuss in more depth in a subsequent post dedicated to the topic. For now, suffice it to say that as an atomic unit each topic has a surface area, and surface area is discoverable. Think of the difference between sifting through a box of individual Lego pieces for the 3×2 green block, versus sifting through a collection of randomly joined pieces looking for it.
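To make the Lego analogy concrete: in a toolchain like DITA, a map serves as the assembly instructions, referencing the atomic topic files that are the pieces. The file names here are invented for illustration.

```xml
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map>
  <title>User guide</title>
  <!-- Each topicref points at one atomic topic file: one "piece".
       The nesting supplies the top-down structure and narrative order. -->
  <topicref href="overview.dita">
    <topicref href="installing.dita"/>
    <topicref href="configuring.dita"/>
  </topicref>
</map>
```

The same pieces can be referenced from a different map to produce a different output structure, which is where the reuse gains come from.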

That’s not to say that approaching information as topics is a reductionist approach that somehow does away with top-down views, progressive disclosure, or overarching narratives – any more than considering sentences in terms of the parts of speech does away with prose.

What an understanding of topics gives us, among other things, is the ability to have a conversation about improving the information content: its coverage, clarity, and coherency for users. And that’s always a good thing.

Stay tuned for part two, where we’ll look at the “macronutrient groups” of topics – Topic Types.

Docbook 5.1 and topic-based authoring

June 4, 2011

Docbook 5.1 adds support for topic-based authoring. It’s studiedly neutral in its approach to topic-based authoring, in contrast with DITA’s almost religious zeal for the historical inevitability of the Topicalypse.

Without naming any names, the Docbook Definitive Guide V1.3 states:

One modern school of thought on technical documentation stresses the development of independent units of documentation, often called topics, rather than a single narrative.

It’s clear from this that the Docbook Technical Committee did not arrive at the topic epiphany by contemplating their navel in isolation, or by a blinding flash of light on the road to Damascus. And while no names are mentioned in this introduction to a pragmatic concession to topic-based authoring, Docbook 5.1 incorporates most, if not all, of DITA’s features, while maintaining Docbook’s core strength of static linear narrative targeting a book-like output medium.

One thing that really screams “Come to the mountain!” is the ability to build Docbook 5.1 topic-based output incorporating DITA topics. Just-in-time XSL processing (<transform>) is used to incorporate existing DITA topics into Docbook 5.1 topic assemblies, allowing you to come back to the fold without an expensive conversion of existing content.
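As a rough sketch of what such an assembly might look like, mixing native Docbook topics with a legacy DITA topic run through a transform. The element names follow the Docbook 5.1 assembly model, but treat the details as illustrative rather than authoritative, and the file names are invented; check the Definitive Guide before relying on them.

```xml
<assembly version="5.1" xmlns="http://docbook.org/ns/docbook">
  <resources>
    <resource xml:id="intro"  href="intro.xml"/>
    <resource xml:id="legacy" href="legacy-concept.dita"/>
  </resources>
  <!-- A named transform converts the DITA topic at assembly time -->
  <transforms>
    <transform name="dita2db" href="dita2docbook.xsl"/>
  </transforms>
  <structure>
    <output renderas="book"/>
    <module resourceref="intro"/>
    <module resourceref="legacy" transform="dita2db"/>
  </structure>
</assembly>
```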

Docbook 5.1 lacks the topic-based purity of DITA, implementing a dual (but isolated) static linear narrative / modular topic model, in contrast to DITA’s focus on topics uber alles, and adopting a pragmatic, descriptive approach (“hey, looks like people want to do this”), rather than a religious, prescriptive approach (“this is the one true way!”).

However, it looks to have learned from DITA, including DITA’s shortcomings, and brings its rich semantic model to the table, as well as its installed base of expertise and tooling.

A pragmatic approach is the approach that organisations need to take when moving from a pure static linear narrative approach to a modular topic-based one, so Docbook’s pragmatic compromise, along with its tooling maturity, may sit well with many organisations moving in that direction.

Syntext Serna 4.2 and DITA support

May 26, 2010

Oh, one more thing while I’m at it:

There seems to be a problem with the DITA 1.1 templates included with Syntext Serna 4.2 (the Open Source version, of course). The XML generated by the templates contains both XSD and DTD declarations.

For example:

<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd" []>
<concept xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="concept-1" xsi:noNamespaceSchemaLocation="urn:oasis:names:tc:dita:xsd:concept.xsd:1.1">

You can see there a DOCTYPE declaration, which invokes the DTD for validation, followed by an XSD declaration in the attributes of the concept element.

The problem with this is that either the XSD or the DTD should be used for validation, not both. If both are present, Xerces will attempt both types of validation. The DTD validation will fail, because the XSD declaration is not part of the DTD.

Putting both types of validation into the template looks like a bug. If you create a new DITA 1.1 Concept from the template, and then Publish > HTML, it produces the error:

[pipeline] Using XERCES.
[pipeline] [Error] :4:155: Attribute "xmlns:xsi" must be declared for element type "concept".
[pipeline] [Error] :4:155: Attribute "xsi:noNamespaceSchemaLocation" must be declared for element type "concept".

My workaround has been to edit the template files in


and remove the xsd declaration. DITA topics created with the templates then validate fine using DTD validation.
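For reference, after this edit the template output keeps only the DOCTYPE declaration, so Xerces performs plain DTD validation. The empty body elements below are illustrative of what a fresh Concept template produces.

```xml
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd" []>
<concept id="concept-1">
  <title></title>
  <conbody></conbody>
</concept>
```

Keeping the XSD declaration and removing the DOCTYPE instead would also resolve the conflict, at the cost of switching to schema validation.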

DITA-OT 1.5 and Java version

May 26, 2010

I’ve been doing some work with the DITA Open Toolkit 1.5 lately, and got no love from Google on this issue, so here’s my contribution to anyone else who finds it:

If you are trying to build and get an error like:

DITA-OT1.5/build_preprocess.xml:80: java.lang.StringIndexOutOfBoundsException

Then check the version of your Java. I got this error with Java 1.5, and resolved it by switching to Java 1.6.

I’ll have something more to post about DITA soon, but I just thought I’d get that out there for anyone else who is scouring Google in vain.

Wanted: Schedule Monkey

February 24, 2010

If you have any open source writing or writing-related opportunities, let us know and we’ll pimp them here.

Love technology? Love a challenge? We have the job for you.

Red Hat, the defining technology company of the 21st century, continues to expand its team to create compelling open source products.

We need someone to monkey around with our schedules. We have a bunch of schedules for our writers, and we need to merge them into one uber-schedule. We’re working with open source tools, so we need someone who isn’t scared of TaskJuggler.

You don’t need millions of years in the industry – you just need to be able to learn new tricks.

This part-time job is based in Brisbane, Australia. Check it out on

Red Hat Enterprise Linux 5.4 Technical Notes – “Every Change to Every Package”

September 2, 2009

We’re getting ready to push the new “Technical Notes” document for Red Hat Enterprise Linux 5.4.

The Technical Notes document, new for this release, contains errata documentation for every single change to every single package between Red Hat Enterprise Linux 5.3 and 5.4. In the six months between releases there have been more than 2000 changes to more than 250 packages; and every one of them has been documented for this release in a document that, at 500 pages, is the length of a short fantasy novel.

This document has been the work of a number of authors, led by Ryan Lerch, the Technical Notes author, and Brian Forte, the Red Hat errata queue maintainer. It has also involved collaboration with engineers throughout Red Hat, along with supporting processes and automation.

Update: The Technical Notes are now live. You can view them here.

Dejargonize your documentation

July 1, 2009

I recently attended a product demo given by an Apple representative. It was held in the local music store and covered the latest versions of GarageBand and Logic Pro.

During the demonstration the rep showed how GarageBand can adjust the timing of recorded audio tracks, such as a live drum take recorded using a microphone. Adjusting the timing of instrument tracks recorded using MIDI (Musical Instrument Digital Interface) is old hat; it’s known as “quantization”. However, the ability to adjust the timing of an audio track is a novel development. He stressed that it could only be done to audio tracks recorded with GarageBand, and not to audio tracks recorded elsewhere and imported into GarageBand.

I asked him: “Does GarageBand store some kind of metadata for audio files that it records?” He replied: “No, it stores additional information along with the sound file.” Then he paused, and said: “…which is pretty much the definition of metadata”.

It was interesting to see the contrast in communication style. GarageBand is designed, as he explained, “for people who know nothing about making music”. As such it avoids the jargon regularly employed by those familiar with music-making technology. Quantization becomes “adjust timing”. Metadata becomes “additional information”.

Sometimes a concept benefits from a precise technical term; sometimes the term just makes the material harder to understand for someone unfamiliar with it.

As someone who knows what quantization and metadata are, I had no problem understanding what he was talking about when he spoke of adjusting timing and storing additional information. The reverse is not true: someone who can grok* adjusting timing and storing additional information may be left completely in the dark when the terms quantization and metadata are used. The subject matter hasn’t changed or become any harder to understand, but the use of unfamiliar terms reduces comprehensibility by raising the bar for the audience.

Glossaries can help, and so can really thinking about the choice of words: “Can I say this in a more direct, simple way, without jargon?”

Something to keep in mind.

* to grok = to understand

Neologisms and Localization

June 25, 2009

One of my fellow writers tweets the gems she uncovers while editing docs, marking them with the hashtag #docfail. (I leave it as an exercise for readers to track her down and stalk her if they are so inclined).

A recent tweet read:

#docfail “Parameterized”. 😦 Sadly this is an official term.

“Parameterized” is actually not a neologism, one of the subjects of this post. According to the Merriam-Webster dictionary entry for parameterize, it’s been in the authoritative (according to Merriam-Webster) English lexicon since 1940.

A “neologism”, a term that entered the English language in 1803, again according to Merriam-Webster, is “a new word, usage, or expression”. New technologies give rise to new terms, obviously, so information technology is a major source of contemporary neologisms.

Since developers develop new and innovative technologies and ways of doing things, they routinely coin a new word to describe a novel method or application. An excessive proliferation of neologisms by developers can make them begin to resemble the second definition that Merriam-Webster gives for “neologism”: “a meaningless word coined by a psychotic” (at least to people tasked with translating them).

Neologisms pose particular challenges for technical documentation, especially when a document is translated (localized) into languages other than the language in which it was originally written (mostly, and for the purpose of illustration in this post, English).

Often a reader of a technical document can infer the meaning of a neologism from its context, because it is a compound of previously existing words, or because it is a novel transformation of a previously existing technical term.

“Parameterize” is a classic example of the “turn a noun into a verb” method of neologism generation that is favored by another goldmine of contemporary neologisms – business-speak. “Aspectize” and “Annotationed” are two examples of taking a specific technical definition of a common English noun, turning it into a verb, and then going postal with it.

While English readers can infer or deduce the meaning of these words, translating them into another language is problematic. To do it properly a technical translator will have to accomplish the following:

  1. Find out if this neologism already exists in the target language. This involves researching the subject area by reading related existing documentation in the target language (if there is any), or trawling through message boards and mailing lists to see if people are talking about this, and if so, what terms they are using.
  2. If a term does not exist, the technical translator must coin one in the target language. To do this they have to understand both the intended meaning of the term and the terms that already exist in the target language. Will the translated term be generated through a similar process of grammatical Frankensteinization in the target language, or will it be a modification of an already existing native term?

This process is repeated for every target language. When a technical document is localized into 26 different languages, as Red Hat Enterprise Linux documentation is, that adds up to a whole lot of friction – costing time and money.

A recent example I observed: last night I watched the opening of the 2007 movie “Transformers: The Beginning” subtitled in Spanish. The translators of the movie opted to use the term “La Matriz”, a term which carries the sense of “The Original (Source | Form)” (or literal: “The Matrix”), as their translation for “The All Spark”. The “All Spark” is an esoteric item at the center of the battle between the Decepticons and the Autobots. Interestingly, while the “All Spark” is a neologism in English, its equivalent term in Spanish “La Matriz” is not. If the translators were to translate it literally as “La Chispa de Todo” (“The Spark of Everything”) it would be an unfamiliar term in Spanish, when it doesn’t have to be. Sure, in English the name conveys that it’s an esoteric item, but to convey the sense of what it is in Spanish does not require the invention of a new term. Sometimes a neologism doesn’t have a need to exist beyond satisfying a developer’s desire to underscore that they are doing something COMPLETELY NEW!!!!!111

Neologisms also come into use as a form of shorthand. As new technologies are constructed by aggregating previous technologies, the complex aggregate then becomes one of the building blocks for something else. To reduce complexity, new terms are coined to refer to these complex structures. A Central Processing Unit becomes a CPU. The whole CPU, hard disk, monitor, plus input devices becomes a computer. A bunch of computers becomes a cluster, and so on. Especially in the software world, which is all about the rapid aggregation of complex elements, these ever-more-encompassing terms appear frequently and regularly. In helping us deal with increasing complexity by encapsulating it in linguistic terms, neologisms serve an important purpose.

When editing, we try to reduce the vocabulary of technical documentation as far as possible, running it through the lexical equivalent of a mastering audio compressor. Whenever and wherever possible we replace unnecessary neologisms with “plain English” to clarify the meaning and assist translation.

Technical writing is not about creativity – it’s about communicating information as efficiently as possible.

We need to be wary of the human tendency to create a new priesthood of the elite that distinguishes itself by an incomprehensible dialect. Sure it’s always cool to belong to a group that converses in a form of “leet-speak”, but if the goal is to be understood, then in documentation it’s important to relate the unknown to the known. When neologisms do appear in a document they benefit from explanation, or from the inclusion of a glossary. Always think of the audience.

And developers – please think twice before coining yet another new word to go with your technological innovation. Is it really needed? Can you explain it in plain English? Does a new term reduce complexity more than it increases it?

In the beginning…

June 23, 2009

In the beginning was the word… and it was so wrong. It was in the passive voice, so I had to rewrite it.

It had been written by a developer^H^H^H^H^H^H^H I mean, a developer wrote it. A lovely chap, and a brilliant software engineer – but more suited to writing code than documentation. His documentation was more notes than finished product, and that’s fine and to be expected. It’s my job to take these notes from developers and turn them into something easily digested by users. I’m a technical writer.

Passive is weak. Active is powerful. Empower your writing, and your readers, by using the active voice.

In a passive voice construction something is done to something. If an actor does make an appearance, it does so attached to the construction through a clause. For example: “It had been written” is a passive construction. The actor, in this case a developer, is indicated by the clause “by a developer”. However, the sentence is grammatically correct and complete without the presence of the actor: “It had been written”. Sounds very epic, doesn’t it? Which may be why it is a writing style favored by academia, and drilled into students in universities around the world.

A friend of mine, currently completing his doctorate in Psychology, explained to me that in the rarefied academic atmosphere he moves in the passive voice “is seen as being more objective” – a passive construction if ever I saw one. After some thought, I realized that the relative objectivity of the passive construction is illusory in nature. The passive voice is not objective, it merely obscures its subjectivity by omitting the subject.

This is a problem when you’re writing user documentation and the subject of your writing is the user.

Documentation needs to be served fresh, hot, and ready to eat, steaming on the plate. Nobody reads the manual until they need to, right? Frequently, when a user picks up the manual they are already facing a situation of overwhelming complexity. If they have to then chew the documentation until it’s digestible, they are going to get indigestion before they can satisfy their intellectual hunger, or maybe they’ll starve to death first (am I taking this metaphor too far?). The likely result is that in the future they will eschew the manual.

The point is that without anchoring the user in the material by using the “strong language” of the active indicative voice (“after you do this” vs “after this is done”) readers can be lost at sea: “I was lost and confused before I picked up the manual, now I definitely have no idea where I am”. The manual is a map; it is going to lead the user from their lost predicament to the other side of the woods – use the active voice to give them the reassuring message: “You are here.”

Remember: Active voice rocks.