My Corner of the Web - Are Elements and Attributes Interchangeable?

- Michael C. Daconta

In designing markup languages, one of the first questions customers ask is a religious one - “do you prefer elements or attributes?” In fact, if you examine many of the current markup languages on the internet, you often see a strict schism between those that use mostly (or even exclusively) attributes and those that use mostly or exclusively elements. I believe this should not be an either/or proposition because elements and attributes are not interchangeable even though they have similarities. This article will examine why this problem exists, the nature of elements and attributes and a logical set of rules to determine when each is appropriate to use. Table 1 provides a comparison of an element-centric approach to an attribute-centric approach on a canonical ADDRESS BOOK example.

Element-Centric:

<ADDRESS-BOOK>
  <ADDRESS>
    <STREET> 424 Any St. </STREET>
    <CITY> Bealeton </CITY>
    <STATE> VA </STATE>
    <ZIP> 22712 </ZIP>
  </ADDRESS>
</ADDRESS-BOOK>

Attribute-Centric:

<ADDRESS-BOOK>
  <ADDRESS street="424 Any St." city="Bealeton" state="VA" zip="22712" />
</ADDRESS-BOOK>
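Both versions in Table 1 carry the same four facts, which is exactly why they look interchangeable. The following is a minimal sketch, using Python's standard `xml.etree.ElementTree` purely for illustration, that reads the fields back from each form; the helper names are hypothetical. The element-centric reader has to strip the spaces around each value because the element text in Table 1 includes them.

```python
import xml.etree.ElementTree as ET

# Table 1, element-centric form.
ELEMENT_CENTRIC = """<ADDRESS-BOOK>
  <ADDRESS>
    <STREET> 424 Any St. </STREET>
    <CITY> Bealeton </CITY>
    <STATE> VA </STATE>
    <ZIP> 22712 </ZIP>
  </ADDRESS>
</ADDRESS-BOOK>"""

# Table 1, attribute-centric form.
ATTRIBUTE_CENTRIC = """<ADDRESS-BOOK>
  <ADDRESS street="424 Any St." city="Bealeton" state="VA" zip="22712" />
</ADDRESS-BOOK>"""

FIELDS = ("street", "city", "state", "zip")


def from_elements(xml_text):
    """Read each field from an upper-case child element of ADDRESS."""
    address = ET.fromstring(xml_text).find("ADDRESS")
    # Element text keeps the surrounding spaces, so strip them here.
    return {f: address.findtext(f.upper()).strip() for f in FIELDS}


def from_attributes(xml_text):
    """Read each field from a lower-case attribute of ADDRESS."""
    address = ET.fromstring(xml_text).find("ADDRESS")
    return {f: address.get(f) for f in FIELDS}


print(from_elements(ELEMENT_CENTRIC) == from_attributes(ATTRIBUTE_CENTRIC))  # True
```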

The roots of this problem lie in a collision of two opposing biases. On one hand, simplicity, consistency and coverage bias designers toward elements; on the other hand, the historical limitations of DTDs favor attributes for validation.

The bias toward elements begins with the learning sequence. You must learn elements before you learn attributes because attributes are part of an element. You cannot have attributes without elements, but you can have elements without attributes. This is an important axiom since it is the basis of the element-centric bias: elements don’t need attributes, but attributes need elements. In other words, while an element-only approach is possible, an attribute-only approach is not. This brings us to the second reason for the element-centric bias: consistency. Since a markup language must have elements, it is more consistent to use child elements for characteristics as well. The final bias, tutorial book coverage, is a direct result of the first two. Authors teaching XML devote more space to elements than to attributes and therefore tend to use an element-centric approach in most of their XML examples. As an illustration, Elliotte Rusty Harold states on page 101 of the XML Bible, “when in doubt put the info in the elements.”

Now let’s turn to the other side of the coin: the historical bias toward attributes for limited validation. An attribute can have a primitive type (although the XML 1.0 types are all text-based) and can be constrained to a set of enumerated values. An element with a single text node is untyped and unconstrained (the #PCDATA content model). Thus, historically, attributes were the best choice for a markup language that needed to validate instance values. XML Schema erases this historical bias, because its strong data typing and constraint facilities apply equally to elements and attributes.

Let us begin with the formal definitions from the XML 1.0 specification (second edition), dated 6 October 2000: “Each element has a type, identified by name, sometimes called its "generic identifier" (GI), and may have a set of attribute specifications. Each attribute specification has a name and value.” The specification goes on to say that “An element type declaration constrains the element's content,” where an element’s content is “the text between the start-tag and end-tag”. In other words, an element’s type has only a single dimension: membership, as defined by its content model. Attribute types, on the other hand, are one “of three kinds: a string type, a set of tokenized types, and enumerated types”. The formal definitions thus give us two distinct notions of type: an element type, which is a content model, and an attribute type, which is one of a set of primitive types. The confusion arises from the different nature of an element at the root and branches of a tree versus its nature as a leaf node. As a leaf node, an element can be one of two varieties: an empty element or an element containing a single text node. Thus we actually have four distinct categories of element content: sub-element content, mixed content (sub-elements and text), empty, and text only. The last category overlaps (though not exactly) the functionality of an attribute value of type CDATA. Therein lies the problem: that overlap breeds confusion and the assumption that elements and attributes are interchangeable.
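The four categories can be told apart mechanically. This sketch uses Python's standard library, and `content_category` is a hypothetical helper. It classifies each child of a small document, assuming that whitespace-only text is insignificant, which a parser without the DTD cannot know for certain.

```python
import xml.etree.ElementTree as ET


def content_category(elem):
    """Classify an element's content as sub-element, mixed, empty or text-only.

    Assumption: whitespace-only text (before the first child or after any
    child) is treated as insignificant formatting, not data.
    """
    has_children = len(elem) > 0
    # Character data lives in elem.text and in the tail of each child.
    texts = [elem.text] + [child.tail for child in elem]
    has_text = any(t and t.strip() for t in texts)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "sub-element"
    if has_text:
        return "text-only"
    return "empty"


doc = ET.fromstring(
    "<root>"
    "<a><b/></a>"        # sub-element content
    "<c>x <d/> y</c>"    # mixed content
    "<e/>"               # empty
    "<f>text</f>"        # text only: the case that overlaps a CDATA attribute
    "</root>"
)
for child in doc:
    print(child.tag, content_category(child))
```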

This schism should not exist. Attributes and elements are different enough that we can develop criteria for deciding which to use in a given circumstance. I will cover eight design rules for using attributes and elements: first a brief listing of all eight, then a detailed explanation of each with an example.

1. For containment you must use elements.

2. For characteristics where whitespace may be significant and the value could be a multi-line string or paragraph, use elements.

3. For DTDs, to constrain an instance value to an enumerated or primitive type, you use an attribute.

4. For repeating identical parts (homogeneous aggregation) you must use child elements.

5. For DTDs, to set a default or fixed value you must use an attribute.

6. For common composite characteristics use global attributes.

7. For performance-sensitive applications, you can reduce the size of large XML files by using attributes for an object’s characteristics and parts.

8. To reference an object part or characteristic using an IDREF you must use an element.

Now let’s examine each rule in detail.

Rule #1: For containment you must use elements. How do you determine when you need containment? When you examine the pieces of information your document will contain, you can separate them into three categories: objects, object characteristics, and object parts. The top-level objects are normally obvious, usually nouns, and form the root and top-level branches of your document tree. Examples of objects are “window”, “body” and “borrower”. As in object-oriented programming, these are often modeled after real-world objects. The distinction between part and characteristic is key. A characteristic is commonly represented via an attribute, as it describes a facet of the object and has little or no semantic meaning in isolation from the object. For example, an XML User Interface Language (XUL) window element has height and width characteristics.

On the other hand, an object part can be semantically separated from its parent object. Therefore, except under stringent, validated performance constraints (see Rule #7), use a child element to express object parts. For example, a button can be placed inside a window as a part yet remain semantically independent. Here is a simple XUL window that contains a button:

</window>

In modeling terms, a characteristic relates to its parent object via composition, while a part is related via aggregation. This division of characteristic versus part is a good rule of thumb, but the next rule covers an exception.
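The characteristic/part distinction can be made concrete with a small sketch using Python's standard `xml.etree.ElementTree`. The window below is hypothetical XUL-like markup: the attribute values and the button label are invented for illustration, and the XUL namespace declaration is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical XUL-like markup; values are illustrative only.
WINDOW = '<window width="300" height="200"><button label="OK"/></window>'

window = ET.fromstring(WINDOW)

# Characteristics: flat strings that only make sense attached to the window.
print(window.get("width"), window.get("height"))  # 300 200

# Part: a node of its own, with its own characteristics, which can be
# detached from the window and still be meaningful.
button = window.find("button")
window.remove(button)
print(ET.tostring(button, encoding="unicode"))    # <button label="OK" />
print(len(window))                                # 0
```

The detached button keeps its own label, while width and height are bare strings with no meaning outside the window: composition versus aggregation in practice.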

Rule #2: For characteristics where whitespace may be significant and the value is a multi-line string or several paragraphs, a child element should be used. The reason for this exception is that attribute values are normalized. One of the normalization rules is to replace each line break and tab with a space. For example, the value of the description attribute in the following element would be normalized as follows:

<product name="Teddy Bear" description=
" Soft cuddly fur. Button nose. 3 feet tall. Double stitching on all seams. Guaranteed to last more than 5 years.

A great gift for children and a favorite among hospitals nationwide.">

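It would be normalized to: “ Soft cuddly fur. Button nose. 3 feet tall. Double stitching on all seams. Guaranteed to last more than 5 years.  A great gift for children and a favorite among hospitals nationwide.” Each line break becomes a space (so the blank line between the paragraphs leaves two spaces), and the paragraph division is lost.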

Therefore, if the paragraph divisions are useful to the receiving application, use a sub-element like this:

<product name="Teddy Bear">
<description>
Soft cuddly fur. Button nose. 3 feet tall. Double stitching on all seams. Guaranteed to last more than 5 years.

A great gift for children and a favorite among hospitals nationwide.
</description>
</product>
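The normalization difference can be observed directly with Python's `xml.etree.ElementTree`, which sits on the expat parser. This sketch parses the Teddy Bear description once as an attribute and once as element content; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The two paragraphs of the Teddy Bear description.
PARA1 = ("Soft cuddly fur. Button nose. 3 feet tall. Double stitching on all "
         "seams. Guaranteed to last more than 5 years.")
PARA2 = "A great gift for children and a favorite among hospitals nationwide."
TEXT = PARA1 + "\n\n" + PARA2  # paragraphs separated by a blank line

as_attribute = ET.fromstring(f'<product name="Teddy Bear" description="{TEXT}"/>')
as_element = ET.fromstring(
    f'<product name="Teddy Bear"><description>{TEXT}</description></product>')

# Attribute-value normalization turned each line break into a space.
attr_value = as_attribute.get("description")
print("\n" in attr_value)                  # False
print(attr_value == PARA1 + "  " + PARA2)  # True

# Element content keeps the line breaks, so the paragraphs can be recovered.
elem_value = as_element.findtext("description")
print(elem_value.split("\n\n") == [PARA1, PARA2])  # True
```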

Rule #3 [DTD-Specific]: To constrain an instance value to an enumerated or primitive type, you use an attribute. For example, to constrain the value of a color characteristic to “Red”, “Green”, or “Blue”, you would declare an attribute like this:

<!ATTLIST canvas
    color (Red | Green | Blue) #REQUIRED >

With XML Schema, you would instead declare a named simple type, which can then be attached to either an element or an attribute:

<xsd:simpleType name="RGB">
  <xsd:restriction base="xsd:NMTOKEN">
    <xsd:enumeration value="Red"/>
    <xsd:enumeration value="Green"/>
    <xsd:enumeration value="Blue"/>
  </xsd:restriction>
</xsd:simpleType>

Since XML Schema lets you declare a type for either an attribute or an element, this rule applies only to DTDs. Here is how to declare an element with the RGB type:

<xsd:element name="Color" type="RGB" />
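One caveat: the enumeration is only enforced by a validating parser. As a quick check, Python's `xml.etree.ElementTree`, which is built on the non-validating expat parser, accepts a value outside the declared set. The out-of-range value "Purple" is invented for the example.

```python
import xml.etree.ElementTree as ET

# The Rule #3 declaration in an internal DTD subset, with a value
# that is not in the enumeration.
DOC = """<!DOCTYPE canvas [
  <!ATTLIST canvas color (Red | Green | Blue) #REQUIRED>
]>
<canvas color="Purple"/>"""

# expat does not validate, so the enumeration is not enforced here.
canvas = ET.fromstring(DOC)
print(canvas.get("color"))  # Purple
```

A parser run in validating mode against the same DTD would report the value as an error; without one, the constraint is documentation only.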

Rule #4: For repeating identical parts (homogeneous aggregation) you must use child elements. The XML 1.0 specification states, “No attribute name may appear more than once in the same start-tag or empty-element tag.” Therefore, the only way to have multiple attributes with the same semantic meaning would be to differentiate them with a sequential number (button, button1, button2, etc.). Attaching a count to an attribute name is extremely poor design, on the order of using a wrench to pound nails. Instead, let’s examine an example of using child elements for repeating parts. A XUL menupopup may contain any number of menuitems like this:

</menupopup>
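The duplicate-attribute prohibition is a well-formedness constraint, so even a non-validating parser enforces it. This sketch uses Python's `xml.etree.ElementTree`; the `parses` helper and the menu labels are hypothetical. It shows a repeated attribute being rejected while repeated menuitem children are accepted.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Repeating an attribute name is rejected outright.
print(parses('<menupopup menuitem="New" menuitem="Open"/>'))  # False

# Repeating a child element is fine; labels are illustrative.
popup = ET.fromstring(
    "<menupopup>"
    '<menuitem label="New"/>'
    '<menuitem label="Open"/>'
    '<menuitem label="Save"/>'
    "</menupopup>"
)
print([item.get("label") for item in popup.findall("menuitem")])  # ['New', 'Open', 'Save']
```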

Rule #5 [DTD-Specific]: To set a default value or a fixed value, you must use an attribute. The attribute declaration allows you to specify a default or fixed value for the attribute. For example, let’s add a default value to our previous color example (Rule #3):

<!ATTLIST canvas
    color ( Red | Green | Blue ) "Red" >

The above example sets Red as the default value. In XML Schema, you specify a default value with the default attribute, which works in both element and attribute declarations:

<xsd:element name="Color" type="RGB" default="Red" />
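What a DTD-aware processor does with a declared default can be emulated by hand. The sketch below is not a real DTD processor: `DEFAULTS` and `apply_attribute_defaults` are hypothetical, and the `drawing` wrapper element is invented for the example. It fills in the default only where the instance omits the attribute and leaves explicit values alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical table mirroring the Rule #5 declaration:
#   <!ATTLIST canvas color ( Red | Green | Blue ) "Red" >
DEFAULTS = {"canvas": {"color": "Red"}}


def apply_attribute_defaults(root, defaults):
    """Set declared default values on elements that omit the attribute.

    Explicitly specified values are left untouched.
    """
    for elem in root.iter():
        for name, value in defaults.get(elem.tag, {}).items():
            if name not in elem.attrib:
                elem.set(name, value)
    return root


doc = ET.fromstring('<drawing><canvas/><canvas color="Blue"/></drawing>')
apply_attribute_defaults(doc, DEFAULTS)
print([c.get("color") for c in doc.iter("canvas")])  # ['Red', 'Blue']
```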

Rule #6: Common composite characteristics should be expressed with global attributes. In a DTD, you express a global attribute by declaring a parameter entity and then referencing it wherever the attribute is needed:

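<!ENTITY % ename "name CDATA #REQUIRED">

Each ATTLIST declaration that needs the attribute then includes the reference %ename; in place of the attribute definition. (Parameter-entity references inside a declaration are allowed only in the external DTD subset, not in the internal subset.)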

In XML Schema, you create a base type and then extend it. Here is an example of type extension from XML Schema Part 0: Primer:

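<complexType name="Address">
  <sequence>
    <element name="name"   type="string"/>
    <element name="street" type="string"/>
    <element name="city"   type="string"/>
  </sequence>
</complexType>

<complexType name="USAddress">
  <complexContent>
    <extension base="ipo:Address">
      <sequence>
        <element name="state" type="ipo:USState"/>
        <element name="zip"   type="positiveInteger"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>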

Once again, since the XML Schema type system applies to both elements and attributes, this rule applies only to DTDs.

Rule #7 [Performance]: To reduce the size of large XML files, use attributes for both characteristics and parts. Attributes are more space-efficient than elements because they do not require end tags. A note of caution here: do not assume your application needs high performance. An attribute-centric approach trades flexibility for speed.
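The size difference is easy to measure. This sketch uses Python's `xml.etree.ElementTree` to build the ADDRESS record from Table 1 in both styles, repeats it an assumed 1,000 times, and compares the serialized byte counts. The record count and helper names are illustrative, and real savings depend on the length of your tag names.

```python
import xml.etree.ElementTree as ET


def element_centric(street, city, state, zip_code):
    """Build one ADDRESS with each characteristic as a child element."""
    address = ET.Element("ADDRESS")
    for tag, value in (("STREET", street), ("CITY", city),
                       ("STATE", state), ("ZIP", zip_code)):
        ET.SubElement(address, tag).text = value
    return address


def attribute_centric(street, city, state, zip_code):
    """Build one ADDRESS with each characteristic as an attribute."""
    return ET.Element("ADDRESS", street=street, city=city,
                      state=state, zip=zip_code)


def serialized_size(addresses):
    """Byte length of an ADDRESS-BOOK holding the given ADDRESS elements."""
    book = ET.Element("ADDRESS-BOOK")
    book.extend(addresses)
    return len(ET.tostring(book, encoding="utf-8"))


N = 1000  # assumed number of entries, to mimic a "large" file
row = ("424 Any St.", "Bealeton", "VA", "22712")
elem_size = serialized_size([element_centric(*row) for _ in range(N)])
attr_size = serialized_size([attribute_centric(*row) for _ in range(N)])
print(elem_size, attr_size, round(attr_size / elem_size, 2))
```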

Rule #8: To reference a part via an IDREF, you must use an element. An IDREF refers to an element with an attribute of type ID. When to use IDREFs versus containment will be the focus of a later article. Besides IDREFs, there are many XPath expressions for selecting elements. Considering Moore’s law, it is often better to choose the flexibility of elements over the performance of attributes.

Here is a simple example of an IDREFS attribute referring to elements:

</Student>

</Students>

</Grades>

</Semester>
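The mechanics of an IDREFS reference can be sketched in Python. Only the element names Semester, Students, Student and Grades come from the surviving example; the Grade element, the id values, the students attribute and the student names are hypothetical. Because `xml.etree.ElementTree` does not read DTD attribute types, the helper assumes the attribute named id plays the role of the ID-typed attribute, something a DTD would normally declare explicitly.

```python
import xml.etree.ElementTree as ET

# Hypothetical document shaped like the example above.
SEMESTER = """<Semester>
  <Students>
    <Student id="s1">Ann</Student>
    <Student id="s2">Bob</Student>
  </Students>
  <Grades>
    <Grade value="A" students="s1 s2"/>
  </Grades>
</Semester>"""


def resolve_idrefs(root, idrefs):
    """Map a whitespace-separated IDREFS value to the elements it names.

    Assumption: the attribute called "id" is the ID-typed attribute.
    """
    by_id = {elem.get("id"): elem for elem in root.iter() if elem.get("id")}
    return [by_id[ref] for ref in idrefs.split()]


root = ET.fromstring(SEMESTER)
grade = root.find("Grades/Grade")
print([s.text for s in resolve_idrefs(root, grade.get("students"))])  # ['Ann', 'Bob']
```

Since only elements can carry an ID attribute, only elements can be the target of such a reference, which is the point of Rule #8.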

In conclusion, elements and attributes are not interchangeable. Ignore the bias of some and the pressure to choose one or the other; instead, use the guidelines above to determine which is suitable for your markup language.