Introduction to DITA (Darwin Information Typing Architecture)

What is DITA?

The Darwin Information Typing Architecture (DITA) is an open standard for structured information. It provides a neutral, modular content source for almost any type of content.

Why is the standard named DITA?

The Darwin in DITA refers to the naturalist Charles Darwin: like evolution, the architecture provides a mechanism for extension, so it can evolve to meet your content needs.

Information Typing indicates that the architecture identifies the type of information in each module or topic. This means that typed topics define the structure of the content based upon its purpose. For example, an assessment topic is designed to test a learner’s knowledge, and therefore contains a question. Another example is a task that leads the reader through the actions required to complete a procedure.
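
To make the idea of a typed topic concrete, here is a minimal sketch in Python. The sample task topic is hand-written for illustration (the id and the wording are invented), and the helper name numbered_steps is my own. It only shows that, because the topic type names its own structure, a program can find the steps of a procedure without guessing from formatting.

```python
import xml.etree.ElementTree as ET

# A hand-written sample task topic; the id and wording are invented for illustration.
TASK = """
<task id="replace-filter">
  <title>Replace the water filter</title>
  <taskbody>
    <steps>
      <step><cmd>Turn off the water supply.</cmd></step>
      <step><cmd>Unscrew the old filter.</cmd></step>
      <step><cmd>Insert the new filter and turn the supply back on.</cmd></step>
    </steps>
  </taskbody>
</task>
"""

def numbered_steps(task_xml):
    """Return the commands of a task topic as a numbered list of strings."""
    root = ET.fromstring(task_xml.strip())
    if root.tag != "task":
        raise ValueError("expected a task topic, got <%s>" % root.tag)
    # The structure is defined by the topic type, so the path is predictable.
    commands = root.findall("taskbody/steps/step/cmd")
    return ["%d. %s" % (number, cmd.text) for number, cmd in enumerate(commands, start=1)]

for line in numbered_steps(TASK):
    print(line)
```

A concept or reference topic would be rejected by this helper, which is the point: the type tells the processor what kind of content to expect.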

DITA is an Architecture. It’s an XML framework that provides mechanisms for organizing content. I want to be very clear: DITA is not a tool, nor is it an application. It is an architecture, which means that there are many different ways to use it and many different tools that support it. I have helped implement DITA for a diverse range of organizations, including educational companies, publishers, technology companies, manufacturing companies, financial services companies, medical device companies, and many others. Enterprises worldwide implement XML as a content source because they share the need to adapt to the ever-changing types of deliverables that are required in today’s information climate.

Why use XML for content?

Rather than applying styles to text or phrases to indicate the content’s function within the document, XML lets you identify content semantically at the element level. Style and formatting are then applied separately for each output deliverable. For example, if you have a reference to another document that should appear in italics in English-language print, the transform that generates the print output can format the citation in italics. You could process that same topic with a different transform, perhaps for online content, that presents the same text in quotation marks instead. The source does not change: instead of applying a format that you would then have to change for each output type, you identify what the content is.
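
The citation example can be sketched in a few lines of Python. This is an illustration only, not how any particular DITA processor works: the sample paragraph is invented, the function names are my own, and the <i> tags simply stand in for whatever mechanism a print transform would use for italics.

```python
import xml.etree.ElementTree as ET

# Invented sample source: the citation is marked up by what it is, not how it looks.
SOURCE = "<p>For details, see <cite>Installation Guide</cite> before you begin.</p>"

def render(paragraph_xml, cite_open, cite_close):
    """Flatten a paragraph to text, wrapping each <cite> in the given delimiters."""
    root = ET.fromstring(paragraph_xml)
    parts = [root.text or ""]
    for child in root:
        text = "".join(child.itertext())
        if child.tag == "cite":
            text = cite_open + text + cite_close
        parts.append(text)
        parts.append(child.tail or "")
    return "".join(parts)

def render_print(paragraph_xml):
    # Stand-in for a print transform: citations become italic.
    return render(paragraph_xml, "<i>", "</i>")

def render_online(paragraph_xml):
    # Stand-in for an online transform: citations go in quotation marks.
    return render(paragraph_xml, '"', '"')

print(render_print(SOURCE))
print(render_online(SOURCE))
```

Both functions read the same SOURCE string. Only the transform differs, which is the separation of content and formatting described above.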

Another good example of the agility of XML is a glossary term. The source identifies it as a term; a printed glossary might present it in bold, followed by a colon and only the definition. The same term could be presented on a printed flashcard or in a mobile app with different fonts and styling, creating a totally different experience for the learner that is optimized for the deliverable, without any changes to the source. That is what makes DITA a very powerful content source. It lets you apply formatting when you generate the deliverable, rather than requiring you to maintain separately managed and formatted instances of the same content.

Is DITA only for professional or technical writers?

No. Because all the formatting is applied based upon the element names, you can customize the authoring experience as much as you can control the output generation. For example, an organization may have different types of authors with varying levels of technical sophistication. Because DITA is an architecture, you could have one authoring environment for content contributors and another for more technical authors. Let’s say you are producing support materials for a graduate-level standardized test and you hire a chemist to write your chemistry information. The chemist may not be familiar with DITA and doesn’t have to be. You can provide a form-based authoring experience that helps the chemist supply the source content, and then open that same content in another DITA tool with a different authoring experience. That’s one of the benefits of working in an architecture rather than relying on tool-specific support.

Are there other structured architectures?

DITA is one of many XML standards. Others include QTI, DocBook, and S1000D, all of which have unique strengths and uses. DocBook, for example, is designed for books and as such is very book-oriented. For enterprises that are going to produce more than just books, I would most likely recommend DITA over DocBook for its modular structure. QTI is the interoperability format used for exchanging assessment content with learning management systems, so it’s a great output format, but not a great source format for non-assessment content.

What information types are in DITA?

In DITA 1.3, the main topic types are Concept, Task, Reference, Troubleshooting, Glossary, and Topic. There is also a specialized set of topic types for learning and training content: Learning Overview, Learning Summary, Learning Assessment, Learning Content, and Learning Plan. Each new version of DITA adds support as users request additional types of topics.

What if DITA doesn’t have content structures for the types of content you need to create?

Remember, the “Darwin” in DITA refers to Charles Darwin and his theory of evolution, because the architecture can evolve to meet your needs. The mechanism for this is called specialization, which lets you extend the base architecture to fit your content. You can specialize to create a single element; for instance, I’ve had publishing teams that needed a call-out or sidebar that was not part of the original content model, so we simply extended the model to let those items appear in the margin or in the body of the text with different formatting. I’ve worked with teams who needed a domain-specific framework because they wanted to adapt their content not just for its main use but also for Learning & Training, and there is a set of specializations for this. I also see people create industry-specific extensions, such as for insurance. The insurance industry has a whole nomenclature that, of course, is not part of the out-of-the-box standard. We simply extended the existing vocabulary so that there are semantic identifiers for items like policy numbers and all of the information involved in supporting HIPAA privacy requirements. The team needed advanced processing against that information, so we identified it semantically through an extension of the standard.
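
One way to picture how specialization stays compatible with the base standard is the class attribute that DITA elements carry, which lists an element’s ancestry from the base type to the specialized type. The Python sketch below is an assumption-laden illustration: the policyNumber element, the insurance-d domain name, the sample value, and the formatting rules are all hypothetical, and the class attributes are written out by hand here, whereas in practice the grammar files normally supply them as defaults.

```python
import xml.etree.ElementTree as ET

# Hypothetical specialized element: policyNumber is derived from the base <ph> element.
# The domain name "insurance-d" and the sample value are invented for illustration.
SOURCE = (
    '<p class="- topic/p ">Policy '
    '<policyNumber class="+ topic/ph insurance-d/policyNumber ">HX-20417</policyNumber>'
    ' is active.</p>'
)

def ancestry(element):
    """Return the element's type names, most specialized first."""
    tokens = element.get("class", "").split()[1:]  # drop the leading "-" or "+"
    return [token.split("/")[1] for token in reversed(tokens)]

def pick_rule(element, rules):
    """Choose the most specific formatting rule that this processor knows about."""
    for name in ancestry(element):
        if name in rules:
            return rules[name]
    return lambda text: text  # no rule at all: pass the text through

# A generic processor only knows the base element.
generic_rules = {"ph": lambda text: text}
# A customized processor adds special handling for the specialized element.
insurance_rules = {
    "ph": lambda text: text,
    "policyNumber": lambda text: "[policy " + text + "]",
}

root = ET.fromstring(SOURCE)
number = root.find("policyNumber")
print(ancestry(number))
print(pick_rule(number, generic_rules)(number.text))
print(pick_rule(number, insurance_rules)(number.text))
```

The generic rule set still produces usable output because it falls back to the base type, while the insurance rule set can apply the advanced processing that the specialized element makes possible.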

How big are topics?

Because topics are discrete units of information that cover a specific subject or idea, the amount of content in a topic depends upon your content and its structure. I see some folks rigorously segregate their content into smaller units because they want a lot of flexibility, and other folks who keep their content in larger units until they have a business need to split it into smaller ones. For books, topics usually map to heading levels. For flashcards, each card would be a separate topic. For a glossary, each term and all of its associated information would be defined as a single topic. Because you are defining the source and not the deliverable, your glossary topic would contain all the information about the glossary term, and the transform would determine which subset of its elements to present in any given deliverable. So, you could generate a book with an alphabetized list of abbreviations and terms from the same glossary topics from which you generate your flashcards. The front of each card would include the term, pronunciation, and part of speech, and the back would have the definition, the usage, and an image. It’s all in the architecture.
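
Here is a minimal Python sketch of the glossary scenario, assuming a simplified pair of glossary entries that I wrote for illustration (pronunciation and images are left out to keep it short). The function names are my own. It shows one set of glossary topics feeding both an alphabetized book glossary and a set of flashcards.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written glossary source; the terms and wording are invented.
GLOSSARY = """
<glossgroup id="chem-terms">
  <title>Chemistry terms</title>
  <glossentry id="solvent">
    <glossterm>solvent</glossterm>
    <glossdef>A substance that dissolves another substance.</glossdef>
    <glossBody>
      <glossPartOfSpeech value="noun"/>
      <glossUsage>Water is a common solvent.</glossUsage>
    </glossBody>
  </glossentry>
  <glossentry id="catalyst">
    <glossterm>catalyst</glossterm>
    <glossdef>A substance that speeds up a reaction without being consumed.</glossdef>
    <glossBody>
      <glossPartOfSpeech value="noun"/>
      <glossUsage>Enzymes act as catalysts.</glossUsage>
    </glossBody>
  </glossentry>
</glossgroup>
"""

def load_entries(xml_text):
    """Read every glossary entry into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    entries = []
    for entry in root.iter("glossentry"):
        pos = entry.find("glossBody/glossPartOfSpeech")
        entries.append({
            "term": entry.findtext("glossterm"),
            "definition": entry.findtext("glossdef"),
            "part_of_speech": pos.get("value") if pos is not None else "",
            "usage": entry.findtext("glossBody/glossUsage", default=""),
        })
    return entries

def book_glossary(entries):
    """Book deliverable: alphabetized 'term: definition' lines only."""
    ordered = sorted(entries, key=lambda e: e["term"].lower())
    return ["%s: %s" % (e["term"], e["definition"]) for e in ordered]

def flashcards(entries):
    """Flashcard deliverable: (front, back) pairs built from the same entries."""
    return [
        ("%s (%s)" % (e["term"], e["part_of_speech"]),
         "%s Usage: %s" % (e["definition"], e["usage"]))
        for e in entries
    ]

entries = load_entries(GLOSSARY)
print(book_glossary(entries))
print(flashcards(entries))
```

The book transform ignores the usage and part of speech, and the flashcard transform uses them. Neither needs its own copy of the content.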

How do you make deliverables from the separate topics?

Basically, you organize the topics into collections using a file called a map. Maps provide structure and a hierarchy for each of the deliverables. In some cases, I see folks reuse the same map for multiple deliverables. For instance, if you have a book, you may have a map that organizes the entire book and sub-maps that each organize a part or chapter. If you want to generate the exact same content as a printed book and as a corresponding online deliverable, such as a PDF or an EPUB, you could use the exact same map (and content) and simply run a different transform for each deliverable.
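
As a rough sketch of how a map drives a deliverable, the Python below walks a small hand-written map and produces two different views from it. The file names, the map content, and the function names are invented for illustration, and a real publishing pipeline does far more than this.

```python
import xml.etree.ElementTree as ET

# Hand-written sample map; the file names are invented for illustration.
BOOK_MAP = """
<map>
  <title>Field Guide</title>
  <topicref href="intro.dita"/>
  <topicref href="chapter1.dita">
    <topicref href="chapter1-setup.dita"/>
    <topicref href="chapter1-usage.dita"/>
  </topicref>
</map>
"""

def outline(map_xml):
    """Walk the map in document order and return (depth, href) pairs."""
    root = ET.fromstring(map_xml.strip())
    result = []

    def walk(node, depth):
        for child in node.findall("topicref"):
            result.append((depth, child.get("href")))
            walk(child, depth + 1)

    walk(root, 1)
    return result

def print_contents(map_xml):
    """Print-style view: an indented table of contents that keeps the hierarchy."""
    return ["  " * (depth - 1) + href for depth, href in outline(map_xml)]

def online_pages(map_xml):
    """Online-style view: a flat list of page names in reading order."""
    return [href.replace(".dita", ".html") for _, href in outline(map_xml)]

print(print_contents(BOOK_MAP))
print(online_pages(BOOK_MAP))
```

The map supplies the order and the hierarchy once, and each function decides how much of that structure its deliverable needs.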

Are there different types of maps?

Yes, but the map types correspond to the structure of the collection or deliverable and not to a specific deliverable or document type. The core map types are map, which organizes content into collections for reuse or deliverables, and bookmap, which organizes content into collections for producing a book or book-like deliverable. The bookmap includes book metadata, such as ISBN and copyright, and it has structural elements such as parts, notices, appendices, and chapters. All of these are semantically identified so that your transforms can process them appropriately for each of your deliverable outputs.

There are other maps for specific purposes, including learning maps, which organize learning content into collections for units, lessons, or courses, and subject scheme and classification maps, which define and manage metadata. As with topics, if you need a map for a specific purpose, you can specialize the base map to meet your needs.

Is it easy to reuse content with DITA?

Yes! DITA supports reusing content in multiple ways. The most obvious way is to reuse a map, or to reuse a topic in multiple maps. If you want to reuse content from one topic in other topics, DITA provides several methods, including replaceable (variable) text through key references and conditional processing, also called profiling. Most teams use a combination of methods and decide which to use when they develop their information architecture.
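
Two of these reuse methods, key references and profiling, can be sketched together in Python. This is a simplified illustration under stated assumptions: the product name, the audience values, and the function names are invented, each audience attribute holds a single value, and a real DITA processor resolves keys and applies filtering with far more rules than shown here.

```python
import xml.etree.ElementTree as ET

# The map defines a key; the product name is invented for illustration.
KEY_MAP = """
<map>
  <keydef keys="product">
    <topicmeta><keywords><keyword>Acme Filter 300</keyword></keywords></topicmeta>
  </keydef>
</map>
"""

# The topic refers to the key and marks two paragraphs for specific audiences.
TOPIC = """
<topic id="overview">
  <title>Overview</title>
  <body>
    <p>The <ph keyref="product"/> ships with one filter.</p>
    <p audience="administrator">Administrators can order filters in bulk.</p>
    <p audience="novice">Ask your administrator for a replacement.</p>
  </body>
</topic>
"""

def key_definitions(map_xml):
    """Collect key names and their replacement text from the map."""
    root = ET.fromstring(map_xml.strip())
    keys = {}
    for keydef in root.iter("keydef"):
        text = keydef.findtext("topicmeta/keywords/keyword", default="")
        for name in keydef.get("keys", "").split():
            keys.setdefault(name, text)  # the first definition of a key wins
    return keys

def paragraphs(topic_xml, keys, exclude_audience):
    """Return paragraph text with keys resolved and excluded audiences dropped."""
    root = ET.fromstring(topic_xml.strip())
    result = []
    for p in root.iter("p"):
        if p.get("audience") in exclude_audience:
            continue  # profiling: skip content meant for a filtered-out audience
        for ph in p.iter("ph"):
            if ph.get("keyref") in keys:
                ph.text = keys[ph.get("keyref")]  # variable text from the map
        result.append("".join(p.itertext()))
    return result

keys = key_definitions(KEY_MAP)
print(paragraphs(TOPIC, keys, {"novice"}))
print(paragraphs(TOPIC, keys, {"administrator"}))
```

Swapping in a different map changes the product name everywhere the key is used, and changing the excluded audience produces a different deliverable from the same topic.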

Summary

If your content is not created and stored in XML, you are locked into the format in which you initially created it, and that format is usually tied to a specific delivery platform. This means that to produce an integrated set of deliverables for distribution across a range of media, you must convert your content multiple times using a process that, I can tell you from experience, is fraught with technical and logistical challenges. It also becomes cost-prohibitive. However, if you make the upfront financial investment to convert to XML and design your transforms, you can run your content through this system and deliver what your users need, when they need it, much more efficiently. In addition, you will have future-proofed your content for deliverables that we can’t even foresee at this point.

Interested in learning more? Book an information architecture coaching package with DITA expert, Amber Swope.
