Metadata

From Wikipedia, the free encyclopedia

Metadata (Greek: meta- + Latin: data "information"), literally "data about data", are information about another set of data. A common example is a library catalog card, which contains data about the contents and location of a book: it is data about the data in the book to which the card refers. Metadata also commonly record the source or author of the described dataset, how it should be accessed, and its limitations. Another important type of data about data is the link or relationship between data. Some metadata schemes attempt to embrace this concept, such as the Dublin Core element link.

Since metadata are also data, it is possible to have metadata of metadata, or "meta-metadata". Machine-generated meta-metadata, such as the inverted index created by a free-text search engine, is generally not considered metadata, though.

Metadata that are embedded with content are called embedded metadata. A data repository typically stores the metadata detached from the data.

Uses

Metadata have become important on the World Wide Web because of the need to find useful information in the mass of information available. Manually created metadata add value because they ensure consistency: if one webpage about a topic contains a word or phrase, then all webpages about that topic should contain that same word or phrase. They also ensure variety, so that if one topic has two names, each of these names will be used. For example, an article about sport utility vehicles would also be given the metadata keywords ‘4 wheel drives’, ‘4WDs’ and ‘four wheel drives’, as this is how they are known in some countries.

Sources of metadata for audio CDs include the MusicBrainz project and AMG's All Music Guide. Similarly, MP3 files carry metadata tags in a format called ID3.
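As a minimal sketch of how embedded tags of this kind can be read, the following assumes the older ID3v1 layout, in which a fixed 128-byte block starting with the marker "TAG" sits at the very end of the file. The function name read_id3v1, the helper _field, and the sample bytes are hypothetical and exist only for illustration; later ID3 versions use a different, more flexible layout that this sketch does not handle.

```python
from typing import Optional


def read_id3v1(data: bytes) -> Optional[dict]:
    """Parse an ID3v1 tag from the last 128 bytes of an MP3 file's contents."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None  # no ID3v1 tag present

    def text(raw: bytes) -> str:
        # Fields are fixed-width and padded with NUL bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
        "genre": tag[127],  # numeric genre code
    }


def _field(value: str, width: int) -> bytes:
    """Hypothetical helper: pad a string to a fixed-width tag field."""
    return value.encode("latin-1").ljust(width, b"\x00")


# Synthetic example: placeholder "audio" bytes followed by a hand-built tag.
sample = (
    b"\xff\xfb" + b"\x00" * 64
    + b"TAG"
    + _field("Example Title", 30)
    + _field("Example Artist", 30)
    + _field("Example Album", 30)
    + _field("2006", 4)
    + _field("", 30)
    + bytes([255])
)
print(read_id3v1(sample))
```

For a real file, the same function could be applied to the bytes returned by reading the file in binary mode.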

Metadata are more properly called an ontology or schema when they are structured into a hierarchical arrangement. Both terms describe “what exists” for some purpose or to enable some action. For instance, the arrangement of subject headings in a library catalog serves not only as a guide to finding books on a particular subject in the stacks, but also as a guide to what subjects “exist” in the library’s own ontology and how more specialized topics are related to or derived from the more general subject headings.

Metadata are frequently stored in a central location and used to help organizations standardize their data. This information is typically stored in a metadata registry.

Types

Relational database metadata

Each relational database system has its own mechanisms for storing metadata. Examples of relational-database metadata include:

  • Tables listing all tables in a database, with their names, sizes, and number of rows.
  • Tables listing the columns in each database, the tables they are used in, and the type of data stored in each column.

In database terminology, this set of metadata is referred to as the catalog. The SQL standard specifies a uniform means to access the catalog, called the INFORMATION_SCHEMA, but not all databases implement it, even if they implement other aspects of the SQL standard. For an example of database-specific metadata access methods, see Oracle metadata.
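The following sketch illustrates catalog access on a system that does not provide INFORMATION_SCHEMA. It uses SQLite through Python's standard sqlite3 module, where the catalog is exposed through the sqlite_master table and the PRAGMA table_info command. The tables book and author are hypothetical and created only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")

# Catalog query 1: the names of all tables in the database.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# Catalog query 2: the columns of each table and their declared types.
# PRAGMA table_info returns rows of (cid, name, type, notnull, dflt_value, pk).
columns = {}
for table in tables:
    rows = conn.execute(f'PRAGMA table_info("{table}")')
    columns[table] = [(row[1], row[2]) for row in rows]

for table in tables:
    print(table, columns[table])

conn.close()
```

On a database that does implement the standard catalog, the first query would instead typically read from information_schema.tables, and the second from information_schema.columns.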

Data warehouse metadata

Data warehouse metadata systems are sometimes separated into two sections:

  1. back room metadata, which are used for extract, transform, load (ETL) functions to get OLTP data into a data warehouse
  2. front room metadata, which are used to label screens and create reports

Kimball1 lists the following types of metadata in a data warehouse (See also [1]):

File system metadata

Nearly all file systems keep metadata about files out-of-band. Some systems keep metadata in directory entries; others keep it in specialized structures such as inodes, or even in the name of a file. Metadata can range from simple timestamps, mode bits, and other special-purpose information used by the implementation itself, to icons and free-text comments, to arbitrary attribute-value pairs.
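As a small illustration of the simpler kinds of file system metadata, the sketch below reads the size, mode bits, modification time, and inode number of a temporary file using Python's standard os.stat call. The function name describe and the temporary file are hypothetical; the exact values reported, and the meaning of the inode field, depend on the operating system and file system.

```python
import os
import stat
import tempfile
from datetime import datetime, timezone


def describe(path):
    """Return a few metadata fields the file system keeps about a file."""
    st = os.stat(path)
    return {
        "size_bytes": st.st_size,
        "mode": stat.filemode(st.st_mode),  # e.g. '-rw-------'
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        "inode": st.st_ino,
    }


# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"hello")
    path = handle.name

info = describe(path)
os.remove(path)
print(info)
```

None of these values are stored in the file's contents: the five bytes written above are the data, and everything returned by describe is metadata held by the file system.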

With more complex and open-ended metadata, it becomes useful to search for files based on the metadata contents. The Unix find utility was an early example, although it is inefficient when scanning hundreds of thousands of files on a modern computer system. Apple's Mac OS X 10.4 (Tiger) supports cataloging and searching file metadata through a feature known as Spotlight. Microsoft planned similar functionality for Windows Vista through WinFS, but WinFS did not ship with that release. Linux supports arbitrary file metadata through extended file attributes.
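A search in the style of find can be sketched as a directory walk that inspects each file's metadata in turn. The function find_files and its parameters are hypothetical, chosen only to illustrate the idea. Because every file must be visited on every search, the cost grows with the number of files, which is the inefficiency that indexed approaches are meant to avoid.

```python
import os
import tempfile
import time


def find_files(root, min_size=0, modified_within_seconds=None):
    """Walk a directory tree and return files whose metadata match the filters."""
    now = time.time()
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)  # one metadata lookup per file
            if st.st_size < min_size:
                continue
            if (
                modified_within_seconds is not None
                and now - st.st_mtime > modified_within_seconds
            ):
                continue
            matches.append(path)
    return sorted(matches)


# Self-contained demonstration on a temporary directory.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "small.txt"), "wb") as f:
        f.write(b"x")
    with open(os.path.join(root, "large.txt"), "wb") as f:
        f.write(b"x" * 2048)
    found = [os.path.basename(p) for p in find_files(root, min_size=1024)]

print(found)
```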

Image metadata

Image file formats that can carry metadata include Exchangeable Image File Format (EXIF) and Tagged Image File Format (TIFF).

Having metadata about images embedded in TIFF or EXIF files is one way of acquiring additional data about an image. Image metadata are stored as tags. Tagging pictures with subjects, related emotions, and other descriptive phrases helps Internet users find pictures easily rather than having to search through entire image collections. A prime example of an image tagging service is Flickr, where users upload images and then describe the contents. Other patrons of the site can then search for those tags. Flickr uses a folksonomy: a free-text keyword system in which the community defines the vocabulary through use rather than through a controlled vocabulary.
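The idea of a folksonomy can be sketched as an index from free-text tags to images, in which the vocabulary is simply whatever tags users have applied so far. The class TagIndex and the image names below are hypothetical and do not reflect how any particular service is implemented.

```python
from collections import defaultdict


class TagIndex:
    """A minimal free-text tag index: no controlled vocabulary, only usage."""

    def __init__(self):
        self._images_by_tag = defaultdict(set)

    def tag(self, image, *tags):
        for t in tags:
            # Normalize lightly so 'Sunset' and 'sunset' are the same tag.
            self._images_by_tag[t.strip().lower()].add(image)

    def search(self, tag):
        return sorted(self._images_by_tag.get(tag.strip().lower(), set()))

    def vocabulary(self):
        # The vocabulary emerges from use: each tag with its number of images.
        return {t: len(images) for t, images in self._images_by_tag.items()}


index = TagIndex()
index.tag("beach.jpg", "sunset", "Holiday")
index.tag("city.jpg", "sunset", "skyline")

print(index.search("Sunset"))
print(index.vocabulary())
```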

Program metadata

Most executable file formats include metadata describing issues that need to be considered by the runtime or operating system when executing the program.

In Java, the class file format contains metadata used by the Java compiler and the Java virtual machine to dynamically link classes and to support reflection. The J2SE 5.0 version of Java included a metadata facility to allow additional annotations that are used by development tools.

In MS-DOS, the COM file format does not include metadata, but the EXE file format does, as does the Windows PE format. These metadata can include the company that published the program, the date the program was created, the version number, and more.
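The difference between these formats can be seen in their first bytes. The sketch below assumes the usual layout: a DOS EXE file begins with the signature "MZ", and a PE file additionally stores, at offset 0x3C of that header, the position of a "PE\0\0" signature. A COM file has no header at all, so it is recognized here only by the absence of the "MZ" signature. The function name classify_executable and the byte strings are hypothetical and synthetic. The sketch detects the header only; it does not read fields such as company name or version number.

```python
import struct


def classify_executable(data: bytes) -> str:
    """Classify an executable image by its header signatures."""
    if data[:2] != b"MZ":
        return "no-header"  # e.g. a COM image, which is raw machine code
    if len(data) >= 0x40:
        # Offset 0x3C holds a little-endian 32-bit pointer to the PE signature.
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "pe"
    return "mz"


# Synthetic images built by hand for illustration.
com_image = b"\xb4\x09\xcd\x21"
mz_image = b"MZ" + b"\x00" * 62
pe_image = b"MZ" + b"\x00" * 58 + struct.pack("<I", 0x40) + b"PE\x00\x00"

for label, image in [("com", com_image), ("mz", mz_image), ("pe", pe_image)]:
    print(label, classify_executable(image))
```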

In the Microsoft .NET executable format, extra metadata is included to allow reflection at runtime.

Document metadata: Most programs that create documents, including Microsoft Word and other Microsoft Office products, save metadata with the document files. These metadata can contain the name of the person who created the file (obtained from the operating system), the name of the person who last edited the file, how many times the file has been printed, and even how many revisions have been made to the file. Other saved material, such as deleted text (saved in case of an undelete command) and document comments, is also commonly referred to as "metadata", and the inadvertent inclusion of this material in distributed files has sometimes led to undesirable disclosures.
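As an illustration of how such fields can be inspected before a file is distributed, the sketch below assumes the Office Open XML packaging used by .docx files, in which a ZIP archive contains a part named docProps/core.xml holding the creator, last editor, and revision count. This is an assumption about one specific format; older binary Word files store these properties differently. The function read_core_properties, the names in the sample, and the synthetic archive are hypothetical, and the archive is not a complete Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(docx_bytes: bytes) -> dict:
    """Read author-related metadata from the core properties part of a package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))

    def text(tag):
        node = root.find(tag, NS)
        return node.text if node is not None else None

    return {
        "creator": text("dc:creator"),
        "last_modified_by": text("cp:lastModifiedBy"),
        "revision": text("cp:revision"),
    }


# Synthetic package containing only the metadata part, for illustration.
CORE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
    "<dc:creator>Alice Example</dc:creator>"
    "<cp:lastModifiedBy>Bob Example</cp:lastModifiedBy>"
    "<cp:revision>7</cp:revision>"
    "</cp:coreProperties>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docProps/core.xml", CORE_XML)

properties = read_core_properties(buffer.getvalue())
print(properties)
```

The creator element in this part is drawn from the Dublin Core namespace, so the same metadata vocabulary used for describing web resources appears inside the document package.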

For a list of executable formats, see object file.

References

1 Ralph Kimball, The Data Warehouse Lifecycle Toolkit, Wiley, 1998

2 Guy V Tozer, Metadata Management for Information Control and Business Success, Artech House, 1999
