Let me ask a rhetorical question: What is metadata?
Here's a really simple answer: Metadata is data about data.
In practice, the term metadata seems to be taking on a meaning of its own. Consider the Dublin Core Metadata Initiative. The elements of the Dublin Core include Creator, Subject, Date, Format, and so on. Are these elements metadata? Or are they attributes?
From a certain perspective, I suppose you could say the Dublin Core elements are data about data. If I were not a programmer -- if I were a journalist, for instance -- then I would probably think of the content of an article as the data and the other information -- when it was published, a brief description, copyright information, and such -- as data about the data, and hence metadata.
However, as a programmer, I know that data also has to be processed, and to me metadata is the "data about the data" that tells a program how to process that data. By that definition, the Dublin Core metadata elements are just additional attributes. In the case of a published article, the article has a main body, which is its primary attribute, and the Dublin Core "metadata," which are secondary attributes, but attributes nonetheless. The real metadata, on the other hand, includes information such as the MIME type and the text character encoding -- data elements related to the processing of the data.
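To make the distinction concrete, here is a minimal sketch in Python. The `StoredArticle` class and its field names are made up for illustration; they are not part of Dublin Core or any library. The point is only that the program needs the charset to turn the stored bytes into text at all, while the creator and date are simply more facts about the article.

```python
from dataclasses import dataclass


@dataclass
class StoredArticle:
    """Hypothetical record, for illustration only."""
    raw: bytes       # the data as stored
    mime_type: str   # metadata: what kind of data the bytes hold
    charset: str     # metadata: how to decode the bytes
    creator: str     # attribute of the article
    date: str        # attribute of the article

    def body(self) -> str:
        # Processing the data depends on the metadata, not on the attributes.
        return self.raw.decode(self.charset)


article = StoredArticle(
    raw="What is metadata?".encode("utf-16"),
    mime_type="text/plain",
    charset="utf-16",
    creator="Doug Sauder",
    date="2003-11-13",
)

print(article.body())
```

Change the creator or the date and the body still decodes. Get the charset wrong and the body comes out as garbage.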
In short, the term "metadata" is overloaded. One man's data is another man's metadata.
Here are some more examples:
In .NET programming, a compiled assembly contains metadata about the types the assembly defines. I agree with this use of the term metadata.
In the new WinFS file system announced by Microsoft as part of the next version of Windows, stored items contain metadata. The metadata includes attributes similar to the DC metadata: the creation date of the content, the author, the subject, links to related items, and so on. For much of this data, I disagree with the use of the term "metadata." The date that the content was created is not metadata: it is an attribute of the content. If the content is a photograph, the time the picture was taken is not metadata, but the format of the data -- JPEG, BMP, or TIFF -- is metadata.
In an email message, the subject, the list of recipients, the sender, and the date are not metadata. The information in the MIME header fields is metadata: the content type, the transfer encoding, and the text character encoding.
XML Schema provides metadata about XML content. XML tags themselves provide metadata about the information in an XML document.
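The email example can be sketched with Python's standard `email` package. The message itself is invented for illustration, and the split into `attributes` and `processing` reflects my own classification, not anything the library defines. Notice that only the second group is needed to recover the body text.

```python
from email import message_from_string

# An invented message, for illustration only.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Lunch\n"
    "Date: Thu, 13 Nov 2003 23:20:00 -0000\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="iso-8859-1"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Caf=E9 at noon?\n"
)

msg = message_from_string(raw)

# Attributes of the message: facts about the content itself.
attributes = {name: msg[name] for name in ("From", "To", "Subject", "Date")}

# Metadata in the sense argued here: what a program needs to process the body.
processing = {
    "content_type": msg.get_content_type(),
    "charset": msg.get_content_charset(),
    "transfer_encoding": msg["Content-Transfer-Encoding"],
}

# Undo the transfer encoding, then decode the bytes with the declared charset.
body = msg.get_payload(decode=True).decode(processing["charset"])

print(attributes)
print(processing)
print(body)
```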
Let me rephrase my original question: What is "data about the data"?
Posted by Doug Sauder at November 13, 2003 11:20 PM