Knowledge Organization Systems: UDC and Subject Headings

Systematic Knowledge Organization Systems (KOS)

Concept of Digital Knowledge Organization Systems

Knowledge Organization (KO) is one of the oldest human concerns: the classification of knowledge arose from the need to systematize what is known about the surrounding world, and about knowledge itself. There is no universal or permanent classification of knowledge, since every KOS reflects the worldview of its creators and their way of ordering and structuring knowledge. Organizing is such a common activity that we do it without thinking, and it is part of our daily lives: we organize physical objects, digital objects, and information about them.

KO Definition: Barité (2015)

Area of knowledge […] that studies the laws, principles, and procedures by which specialized knowledge in any discipline is structured, with the aim of thematically representing and retrieving the information contained in documents of any nature, by efficient means that provide a rapid response to the needs of users.

KO Definition: Hjørland (2008)

“KO is about describing, representing, filing and organizing documents and document representations as well as subjects and concepts both by humans and by computer programs.”

Two main aspects are:

  • Processes (KOP): cataloging, subject analysis, indexing, tagging, and classification.
  • Systems (KOS): selections of concepts with an indication of selected semantic relations, as in classification systems, lists of subject headings, thesauri, ontologies, and other systems of metadata.

There are terminology problems with the concept of knowledge organization: several semantically close terms (quasi-synonyms) are used alongside it, including Knowledge and Information Management, Information Architecture, Classification, and Categorization.

History of Knowledge Organization (KO)

KO is a field of research, teaching, and practice, which is mostly affiliated with Library and Information Sciences (LIS). Different aspects include:

  • History of library classification systems (e.g., Melvil Dewey, Paul Otlet).
  • Classification of the sciences (e.g., Aristotle, Francis Bacon).
  • Scientific taxonomies (e.g., Carl Linnaeus, Dmitri Mendeleev).
  • Knowledge organization as a discipline: Charles A. Cutter and W. C. Berwick Sayers established KO as an academic field around 1900.
  • Key intellectual contributions: Henry Bliss’s book The Organization of Knowledge and the System of the Sciences (1929).
  • Ingetraut Dahlberg created the journal International Classification (1974) (renamed Knowledge Organization in 1993) and established the International Society for Knowledge Organization (ISKO) in 1989.

KOS Definition: Barité (2015)

A system of concepts whose main purpose is to provide unambiguous designations for the thematic representation of the content of documents, data, and other information resources, in any medium or structure, by means of codified symbols or linguistic expressions, in order to facilitate thematic search and retrieval, in an efficient, relevant, and pertinent way.

KOS Definition: Hodge (2000)

The term knowledge organization systems is intended to encompass all types of schemes for organizing information and promoting knowledge management. KOS are used to organize materials for the purpose of retrieval and to manage a collection (acting as a bridge between the user’s information need and the material in the collection).

A KOS should:

  • Let users identify an object of interest without prior knowledge of its existence.
  • Guide users through a discovery process, by browsing, direct searching, or filtering.
  • Answer questions about the collection and its context.
  • Support efficient retrieval.
  • Be applicable by automatic or human catalogers.
  • Be meaningful to its users.

Common Characteristics of KOS

  • Category structure.
  • Language reflected.
  • No single knowledge classification is universal (multiple, variant ways to organize knowledge exist).
  • Imposes a particular view of the world on a collection.
  • The same entity can be characterized in different ways depending on the KOS used.
  • There must be sufficient commonality between the concept expressed in a KOS and the real world so the system can be used with reasonable reliability.
  • A person seeking relevant material must be able to connect their concept with its representation in the system.

Main Digital KOS Types

Terms Lists

These enumerate expressions, often with definitions:

  • Authority Files: Lists of terms used to control the variant names for an entity or the domain value for a particular field (e.g., names for countries, individuals, organizations). Non-preferred terms may be linked to the preferred versions. This type of KOS generally does not include deep organization or complex structure.
  • Glossaries: A list of terms, usually with definitions. The terms may be from a specific subject field or from a particular work. The terms are defined within a specific environment and rarely include variant meanings.
  • Dictionaries: Alphabetical lists of words and their definitions (with a more general scope than glossaries). They may also provide information about the origin of a word, variants (by spelling and morphology), and multiple meanings across disciplines. They may also provide synonyms and related words.
  • Gazetteers: A list of place names where each entry may be identified by feature type, such as river, city, etc.
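
The way an authority file links non-preferred terms to a preferred version can be sketched as a simple lookup table. This is a minimal sketch: the entries and the names AUTHORITY_FILE and preferred_name are hypothetical and not taken from any real authority file.

```python
from typing import Optional

# Hypothetical authority file: each variant name maps to one preferred form.
# The entries are illustrative only.
AUTHORITY_FILE = {
    "united kingdom": "United Kingdom",
    "uk": "United Kingdom",
    "u.k.": "United Kingdom",
    "netherlands": "Netherlands",
    "the netherlands": "Netherlands",
    "holland": "Netherlands",
}


def preferred_name(name: str) -> Optional[str]:
    """Return the preferred form for a variant name, or None if it is unknown."""
    key = " ".join(name.lower().split())  # normalize case and whitespace
    return AUTHORITY_FILE.get(key)


print(preferred_name("  UK "))
print(preferred_name("Holland"))
```

Keeping the structure flat (variant to preferred form, nothing else) mirrors the point that this type of KOS does not carry deep organization.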

Classification and Categories

These systems create subject sets:

Subject Headings

A scheme type providing a set of controlled terms to represent the subjects of items in a collection. They can be extensive but often have a shallow and limited hierarchical structure. They tend to be coordinated, with rules explaining how they can be joined to provide concepts that are more specific. Examples: Medical Subject Headings (MeSH) and the Library of Congress Subject Headings (LCSH).

Classification Schemes, Taxonomies, and Categorization

An organized structure of terms corresponding to one or all areas of knowledge, represented by numeric or alphabetic notations, which aims to assign symbols to documents, according to their subject matter, in order to group, separate, organize, or reference them in a logical way. They are increasingly used in object-oriented design and knowledge management systems to indicate grouping of objects based on a particular characteristic. Examples: Dewey Decimal Classification, Bliss Bibliographic Classification, etc.

Relationship Lists: Thesauri

Thesauri are based on concepts and the relationships among terms. Relationships commonly expressed include hierarchy, equivalence (synonymy), and association or relatedness. These relationships are generally represented by the notation BT (broader term), NT (narrower term), SY (synonym), and RT (associative or related term). Standards exist for monolingual thesauri (NISO 1998; ISO 1986) and multilingual thesauri (ISO 1985). Examples: UNESCO Thesaurus, EuroVoc, etc.
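
The BT, NT, SY, and RT relationships can be illustrated with a small data structure. The sketch below assumes a hypothetical record layout and illustrative terms (not copied from the UNESCO Thesaurus or EuroVoc). It stores only BT links and derives the NT links by inverting them, since the two are reciprocal.

```python
from collections import defaultdict

# Illustrative records only. Each term lists its broader terms (BT),
# synonyms (SY) and related terms (RT).
THESAURUS = {
    "Libraries": {"BT": ["Information services"], "SY": [], "RT": ["Archives"]},
    "Academic libraries": {"BT": ["Libraries"], "SY": ["University libraries"], "RT": []},
    "Public libraries": {"BT": ["Libraries"], "SY": [], "RT": []},
}


def narrower_terms(thesaurus):
    """Derive the NT list of every term by inverting the BT links."""
    nt = defaultdict(list)
    for term, record in thesaurus.items():
        for broader in record["BT"]:
            nt[broader].append(term)
    return {term: sorted(children) for term, children in nt.items()}


print(narrower_terms(THESAURUS)["Libraries"])
```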

Semantic Networks

They structure concepts and terms not as hierarchies but as a network or a web. Concepts are thought of as nodes, and relationships branch out from them. The relationships may include specific whole-part, cause-effect, or parent-child relationships.
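
One way to picture such a network is as a list of typed edges between nodes. The following is a minimal sketch with invented concepts and relationship names; the function neighbors is a hypothetical helper, not part of any standard.

```python
# Edges are (source, relationship, target) triples; the concepts are illustrative.
EDGES = [
    ("Library", "has-part", "Reading room"),
    ("Library", "has-part", "Catalog"),
    ("Catalog", "describes", "Book"),
    ("Book", "has-part", "Chapter"),
]


def neighbors(edges, node, relationship=None):
    """Return the nodes reachable from `node` in one step.

    If `relationship` is given, only edges of that type are followed.
    """
    return [
        target
        for source, rel, target in edges
        if source == node and (relationship is None or rel == relationship)
    ]


print(neighbors(EDGES, "Library", "has-part"))
```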

Ontologies

A data system that defines the relationships between concepts in a domain or area of knowledge. They can represent complex relationships among objects, including rules, axioms, or restrictions missing from semantic networks.

The Universal Decimal Classification (UDC)

Definition and Scope

The UDC is a document indexing language in the form of a classification scheme covering the whole universe of knowledge. It is designed for subject description and indexing of content of information resources irrespective of the medium, form, format, or language. It is the world’s foremost multilingual classification scheme for all fields of knowledge and a sophisticated indexing and retrieval tool. It is a highly flexible classification system for all kinds of information in any medium. Because of its logical hierarchical arrangement and analytico-synthetic nature, it is suitable for physical organization of collections as well as document browsing and searching. The UDC is structured in such a way that new developments and new fields of knowledge can be readily incorporated. The classification code (or notation) is independent of any particular language or script (consisting of Arabic numerals and common punctuation marks), and the accompanying class descriptions have appeared in many translated versions.

Origins and Evolution of UDC

The Belgian lawyers and bibliographers Paul Otlet (1868–1944) and Henri La Fontaine (1854–1943) established the foundations of modern documentation and launched three main projects:

  1. The Universal Bibliographic Repertory (UBR).
  2. The International Institute of Bibliography (IIB).
  3. The Universal Decimal Classification (UDC).

Otlet’s Treatise on Documentation

Otlet authored the Traité de Documentation. Le livre sur le livre. Théorie et pratique (1934) [= Treatise on Documentation. The book about the book. Theory and Practice]. It is considered the first document dealing with library science and documentation.

The Treatise presents fundamental ideas and concepts related to the book and the document. The term documentation is approached as an integrating term encompassing any activity or discipline related to the document, such as archival science, bibliography, library science, or museology. The Treatise was written from a universal point of view and includes a systematic table of subjects which, as an index, allows any idea or aspect related to documents or materials to be classified. Otlet defines classification as a key element of thinking and of the document, as it becomes an essential point of access to the document. He felt that information retrieval would be determined to a large extent by classification. All these issues are considered his great contribution to the development of new technical advances in communication, information, and documentation during the 20th century.

UDC’s Debt to Dewey Decimal Classification

Otlet’s Universal Decimal Classification (UDC) has its origin in another classification system developed by Melvil Dewey, an American librarian who designed the Dewey Decimal Classification (DDC) to organize documents and give books a suitable location in libraries.

Dewey had analyzed other proposals for the organization of knowledge, such as:

  • The categories approach used by Francis Bacon (1561–1626).
  • The knowledge organization ideas/principles proposed by Georg W. F. Hegel (1770–1831).
  • The classification designed by William T. Harris (1835–1909) at the St. Louis Public School Library.

Dewey aimed to answer the following question: how to place the books in a library in such a simple and comprehensible way that they are immediately accessible in a general classification covering every book of the collection, so that they do not need to be renumbered, even if the shelves are overcrowded. Dewey’s Decimal Classification project intended to reduce the time, cost, and effort of organizing and classifying books in libraries. Its structure is based on a decimal hierarchical model, ranging from the broadest to the most specific topics. Each of the ten main classes is further divided into ten divisions and each of these into ten sections. Thus, each lower level is subordinate to the higher level.
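
The three-level decimal model (ten main classes, ten divisions each, ten sections each) means that the ancestors of a three-digit number can be read from its digits. This is a minimal sketch: the function name ddc_hierarchy is hypothetical, and numbers extended beyond three digits are not handled.

```python
def ddc_hierarchy(number):
    """Return [main class, division, section] for a three-digit DDC number."""
    if len(number) != 3 or not number.isdigit():
        raise ValueError("expected exactly three digits, e.g. '516'")
    main_class = number[0] + "00"  # first digit selects one of ten main classes
    division = number[:2] + "0"    # second digit selects one of ten divisions
    return [main_class, division, number]


# 516 (Geometry) sits under 510 (Mathematics), which sits under
# 500 (Natural sciences and mathematics).
print(ddc_hierarchy("516"))
```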

Dewey’s classification system allowed for a complete grouping of all documents according to their subject, avoiding the need to reassign call numbers with each new addition or expansion of the collection. The DDC was conceived in 1873 and publicly introduced in 1876, at the Amherst College Library, as A Classification and Subject Index for Cataloguing and Arranging the Books and Pamphlets of a Library. It was applied at the Columbia College Library in 1883. Subsequently, the DDC was used systematically in almost all American libraries and has gone through more than 20 editions.

Characteristics of the UDC

Main characteristics of the UDC:

  • Bibliographic and library classification.
  • Systematic representation of all branches of human knowledge.
  • It is organized as a coherent system in which the fields of knowledge are related and inter-linked.
  • It includes detailed vocabulary and syntax for content indexing and information retrieval in large collections.
  • Since 1991, it has been owned and managed by the UDC Consortium, a non-profit publisher association in The Hague, Netherlands.
  • Conceived and maintained as an international scheme. The online UDC summary is in more than 50 languages.
  • It can comprise not only textual documents, but also media (video, sound recordings, illustrations, etc.) and museum objects.

Structure and Use of the UDC

Principle of Organization

The organization of knowledge in UDC is discipline-based. This means that concepts are placed in the field under which they are studied, so the same concept can appear in more than one field of knowledge. This feature is usually implemented in UDC by re-using the same concept in various combinations with the main subject, e.g.:

  • A code for language in common auxiliaries of language is used to derive numbers for ethnic grouping, individual languages in linguistics, and individual literatures.
  • A code from the auxiliaries of place, e.g., (410) United Kingdom, uniquely representing the concept of United Kingdom can be used to express 911(410) Regional geography of United Kingdom and 94(410) History of United Kingdom.
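
The re-use of one place code across disciplines can be sketched as a simple string composition. This is only an illustration of the two examples above: the helper with_place is hypothetical, and it does not validate codes against the UDC tables.

```python
def with_place(main_number, place_code):
    """Append a common auxiliary of place, written in parentheses, to a main number."""
    return f"{main_number}({place_code})"


UNITED_KINGDOM = "410"  # place auxiliary used in the examples above

# The same place concept combined with two different main subjects.
for main_number in ("911", "94"):
    print(with_place(main_number, UNITED_KINGDOM))
```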

Hierarchical Structure

UDC can express not just simple subjects but also relations between subjects. Its structure is hierarchical: knowledge is divided into ten classes, each class is subdivided into its logical parts, each subdivision is further subdivided, and so on. The more detailed the subdivision, the longer the number that represents it, and the more specific the class. This is made possible by the decimal notation.

Notation

The notation is the code that represents a class and its place in the hierarchy. The symbols chosen for a UDC notation are language independent and universally recognizable: the Arabic numerals, supplemented by other signs familiar from mathematics and ordinary punctuation. Classifications with hierarchically expressive notations are much friendlier to navigate and use. The decimal notational system is extensible, so it allows the introduction of new subdivisions. Notation mirrors the hierarchy: each digit of the notation represents one level of division, so the deeper a concept is in the hierarchy, the longer its notation.
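
Because the notation mirrors the hierarchy, the broader classes of a simple main number can be listed by dropping digits from the right. The sketch below is a simplification under stated assumptions: it handles only plain decimal main numbers (digits with the conventional point after every third digit), ignores auxiliaries, and uses hypothetical function names.

```python
def format_udc(digits):
    """Re-insert the point after every third digit, e.g. '53912' -> '539.12'."""
    groups = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    return ".".join(groups)


def broader_classes(notation):
    """List the classes above a simple UDC main number, broadest first."""
    digits = notation.replace(".", "")
    if not digits.isdigit():
        raise ValueError("expected digits and points only")
    # Each shorter prefix is one level higher in the hierarchy.
    return [format_udc(digits[:n]) for n in range(1, len(digits))]


print(broader_classes("539.12"))
```

The point itself carries no hierarchical meaning, which is why the function strips it before truncating and re-inserts it afterwards.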

UDC Syntax

Codes from different tables combine to present various aspects of document content and form. For example:

  • 94(410)"19"(075) represents: History (main subject) of United Kingdom (place) in 20th century (time), a textbook (document form).
  • Another example: 37:2 represents the Relationship between Education and Religion.

Complex UDC expressions can be accurately parsed into constituent elements.
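
As a rough illustration of such parsing, the sketch below splits a simple expression into its main number and its place, time, and form auxiliaries. It rests on assumptions that should be checked against the UDC tables before real use: time is written in straight double quotes, form auxiliaries are parenthesized codes starting with 0, and place auxiliaries are parenthesized codes starting with 1 to 9. Relation signs such as the colon are not interpreted, and the names TOKEN and parse_udc are hypothetical.

```python
import re

# One token is a quoted time, a parenthesized auxiliary, or a main number.
TOKEN = re.compile(r'"(?P<time>[^"]*)"|\((?P<paren>[^)]*)\)|(?P<main>\d[\d.]*)')


def parse_udc(expression):
    """Split a simple UDC expression into its constituent elements."""
    parts = {"main": [], "place": [], "time": [], "form": [], "other": []}
    for match in TOKEN.finditer(expression):
        if match.group("time") is not None:
            parts["time"].append(match.group("time"))
        elif match.group("paren") is not None:
            code = match.group("paren")
            if code.startswith("0"):
                parts["form"].append(code)   # assumed: (0...) is a form auxiliary
            elif code[:1].isdigit():
                parts["place"].append(code)  # assumed: (1...) to (9...) is a place
            else:
                parts["other"].append(code)  # anything else is left unclassified
        else:
            parts["main"].append(match.group("main"))
    return parts


print(parse_udc('94(410)"19"(075)'))
```

For the relation example 37:2 the function simply returns both main numbers, since the colon is skipped rather than interpreted.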

Main Tables

These tables contain the disciplines and branches of knowledge, arranged in 10 hierarchically subdivided classes. The main classes are numbered from 0 to 9 (class 4 is vacant).

Auxiliary Tables

The auxiliary tables contain common auxiliary signs and common auxiliary numbers, which can be used throughout the scheme. In addition, each main UDC class may contain tables called special auxiliaries (or special auxiliary numbers), which express aspects that are recurrent, but in a limited subject range. These are usually facets of concepts related to techniques, processes, materials, agents, etc. They are listed only in particular sections of the main tables. Special auxiliary numbers can be recognized because they all begin with one of three indicators: .0 (point nought), - (hyphen), or ' (apostrophe). Any special auxiliary number beginning with one of these symbols can be combined with any other UDC number within its designated area of application.
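
Recognizing a special auxiliary by its leading indicator can be sketched as a prefix check. The codes used below illustrate the form of the notation only and are not claimed to be valid UDC numbers; special_indicator is a hypothetical helper.

```python
# The three indicators named above: point nought, hyphen, apostrophe.
SPECIAL_INDICATORS = (".0", "-", "'")


def special_indicator(notation):
    """Return the special-auxiliary indicator that `notation` begins with, or None."""
    for indicator in SPECIAL_INDICATORS:
        if notation.startswith(indicator):
            return indicator
    return None


for code in (".02", "-3", "'7", "(410)"):
    print(code, special_indicator(code))
```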

Alphabetic Knowledge Organization Systems: Subject Headings

Defining a Subject Heading

A subject heading can be defined as a word or phrase expressing a concept or combination of concepts. It is the standard entry in a list of subject headings. It describes each of the topics identified in a document, and is used as a thematic access point to the document.

There are two ways to establish subject headings consisting of more than one word:

  1. In their natural order (e.g., Library architecture).
  2. By subordinating the secondary term (subheading) to the main term (heading), for instance: Libraries—architecture.

The Subject Heading List

A subject heading list is a standard list of terms to be used as controlled vocabulary, either for the whole field of knowledge or for a limited subject area, and including references made to and from each term, notes explaining the scope and usage of certain headings, and occasionally corresponding class numbers. Such a list is normally arranged alphabetically. Both preferred and rejected terms are listed in the same sequence. The terms are linked by “SEE” and “SEE ALSO” references.

IMPORTANT: At present, most of the subject headings lists have adopted a thesaural structure. Both thesauri and subject headings lists control the use and form of index terms and summarize the relationships between terms in an indexing language. Most of the main subject headings lists are oriented toward an alphabetical subject approach.

Main Differences between Thesauri and Subject Headings Lists

  • A thesaurus is likely to contain terms that are more specific than those found in a subject headings list.
  • A thesaurus tends to avoid inverted terms, such as: Art, French.
  • The relationship visualization in a thesaurus is often more extensive than the relationship visualization in subject headings lists.
  • Different types of relationships are shown in a thesaurus by the use of BT (Broader Term), NT (Narrower Term), and RT (Related Term), instead of SEE or SEE ALSO, which are frequently used to indicate all relationships in a subject headings list. (Lately, however, many subject headings lists also use BT, NT, and RT to show relationships.)

Forms of Headings

Single-Term and Compound Headings

  • Single-term headings: Consist of a single word or phrase used to represent a specific subject or topic. These headings are direct and concise, providing a clear indication of the content of the resource. Examples from MeSH:
    • Demography
    • Support of research
  • Compound headings: Consist of multiple terms or phrases joined together to represent a complex subject or topic. These headings are formed by combining two or more terms, typically using conjunctions like “and,” “or,” or “with.” Examples from MeSH:
    • Analytical, Diagnostic and Therapeutic Techniques, and Equipment
    • Congenital, Hereditary, and Neonatal Diseases and Abnormalities

Subheadings

Subheadings make the subject matter more specific or delimit its meaning, and they pre-coordinate concepts, thus facilitating the retrieval process. A dash or a slash (/) separates the subheading from the heading. Examples: English literature—20th century—History and criticism (from BNE) or Aspirin/therapeutic use (from MeSH).
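
Splitting a pre-coordinated heading into its heading and subheadings can be sketched with a regular expression over the separators. The sketch accepts the long dash and the slash used in the examples, plus a double hyphen (an added assumption, since many catalogs display subdivisions that way). The name split_heading is hypothetical, and a heading that legitimately contains a slash would be split incorrectly.

```python
import re

# Separators: long dash (written as the escape \u2014), double hyphen, or slash.
SEPARATOR = re.compile("\\s*(?:\u2014|--|/)\\s*")


def split_heading(heading):
    """Return (main heading, [subheadings]) for a pre-coordinated heading string."""
    main, *subheadings = SEPARATOR.split(heading.strip())
    return main, subheadings


print(split_heading("Aspirin/therapeutic use"))
print(split_heading("English literature\u201420th century\u2014History and criticism"))
```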

Types of subheadings and related heading types include:

  • Subject/Topical subheadings: They narrow the main heading and can indicate the point of view from which a subject is studied.
  • Form/Genre subheadings: Describe the physical or intellectual characteristics of the resource, such as its format, genre, or type. Examples include subheadings like “Biographies,” “Case studies,” “Dictionaries,” or “Theses.”
  • Topographic/Geographic subheadings: Specify the geographical area or location to which the document refers or the region associated with the content of the resource.
  • Name headings: Include personal (authors) or corporate names (organizations) as the primary focus.
  • Chronological headings: Indicate the time period or historical context of the content.
  • Faceted headings: Faceted headings break down subjects into distinct facets or components, allowing for more granular and precise subject representation.

Characteristics of Subject Headings

  • Consistency: Subject headings follow established rules and guidelines to ensure consistency in indexing and retrieval across different library catalogs and databases.
  • Hierarchical Structure: Subject headings often have a hierarchical structure, with broader terms encompassing narrower ones. This structure allows for more precise searching and browsing of information.
  • Controlled Vocabulary: Subject headings are typically chosen from a controlled vocabulary, a predetermined list of terms approved for use in indexing. This helps to avoid ambiguity and ensures accurate representation of content.
  • Metadata Enhancement: Subject headings enhance the metadata associated with a resource, providing additional information about its content beyond basic bibliographic details.
  • Subject Access: Subject headings allow users to access resources by subject, providing an alternative to searching by author, title, or keyword.

Use of Subject Headings

  • Browsing: Subject headings enable users to browse through related materials within the same subject area. This capability helps users discover new resources and extend understanding of a particular topic.
  • Retrieving: Subject headings facilitate the information retrieval process by allowing users to search for materials based on their subject matter. Thus, users can quickly locate resources on specific topics of interest.
  • Organizing: Subject headings support librarians and information professionals in organizing library collections by grouping materials on similar subjects, thus enhancing accessibility to library holdings.
  • Indexing: Subject headings are used in database indexing to assign descriptive terms to documents, making them more searchable and retrievable, improving the accuracy and relevance of search results.
  • Cataloging: Subject headings play a crucial role in cataloging and metadata management, ensuring that resources are accurately described and classified according to their subject content.

Principles for Using Subject Headings

  • Economy: Avoid giving a document too many headings; three are often sufficient. If the document touches on many specific topics, choose a more generic one.
  • Specificity: The term chosen must represent the subject matter of the document precisely. Two headings, one general and the other specific, should not be assigned to the same work at the same time.
  • Linguistic: The terms must belong to everyday language and respect the natural order of expressions.
  • Uniformity: Each subject must always be referred to by the same name. Where polysemy occurs, the ambiguity must be clarified or eliminated by means of a modifier of the meaning of the heading.
  • Usage: Rules should be established according to the organization and user needs.
  • Summarization: The aim should be to represent a document by reducing its content.

Structure, Scope Notes, and References

Structure of Subject Heading Lists

Subject heading lists typically follow a hierarchical structure, organized into broader, narrower, and related terms. This helps users navigate the lists and locate the most appropriate subject headings for their information needs. Each subject heading may be accompanied by an identifier, code, or notation that identifies it uniquely. Subject headings are often grouped by subject area or category, with related terms grouped for easy comparison and exploration. Some subject heading lists include cross-references, which point users from one term to another related term, helping them to refine and improve search results.

Scope Notes

Scope notes provide additional context and clarification for individual subject headings, explaining their intended meaning and usage. Scope notes may define the scope or boundaries of a subject heading, specifying what topics are included or excluded from its coverage. Scope notes may also offer guidance on when to use a particular subject heading in preference to others, helping users select the most appropriate terms for their searches. Scope notes may highlight specific nuances or distinctions between related subject headings, aiding users in understanding the subtle differences in meaning.

References

References in subject heading lists serve several purposes, including directing users from synonymous or related terms to the preferred or authorized term. Cross-references guide users from non-preferred or variant terms to the authorized form of the subject heading, promoting consistency and standardization in indexing and retrieval. References may also point users to broader, narrower, or related terms, facilitating navigation within the hierarchical structure of the subject heading list. References may include SEE references (e.g., “SEE: [Preferred Term]”) and SEE ALSO references (e.g., “SEE ALSO: [Related Term]”), depending on the nature of the relationship between terms.
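
The two kinds of reference can be sketched as two lookup tables: SEE redirects a non-preferred term to the authorized heading, and SEE ALSO lists related headings. The entries below are illustrative and not copied from a published list; the names SEE, SEE_ALSO, and look_up are hypothetical.

```python
# Illustrative entries; a real list would hold many more headings and references.
SEE = {"Heart attack": "Myocardial infarction"}  # non-preferred -> preferred
SEE_ALSO = {"Myocardial infarction": ["Coronary disease"]}  # preferred -> related


def look_up(term):
    """Follow a SEE reference if there is one, then collect SEE ALSO references."""
    preferred = SEE.get(term, term)
    return {"heading": preferred, "see_also": SEE_ALSO.get(preferred, [])}


print(look_up("Heart attack"))
```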

Online Subject Heading Lists

Examples of subject heading lists for the whole field of knowledge:

  • Library of Congress Subject Headings (LCSH).

Example of a subject heading list for a limited subject area:

  • Medical Subject Headings (MeSH).

MeSH can be searched directly or browsed through its tree view. See also PubMed searching by MeSH terms. Examples of search terms: Aspirin, Heart attack, Headache, Clinical trial.
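
A PubMed search restricted to MeSH terms can be written by tagging each term with the [MeSH Terms] search field. The sketch below only builds the query string; it does not contact PubMed and does not check whether a term is an authorized MeSH heading. The function name mesh_query is hypothetical.

```python
def mesh_query(terms, operator="AND"):
    """Join terms into a PubMed query string restricted to the MeSH Terms field."""
    if operator not in {"AND", "OR"}:
        raise ValueError("operator must be 'AND' or 'OR'")
    tagged = [f'"{term}"[MeSH Terms]' for term in terms]
    return f" {operator} ".join(tagged)


print(mesh_query(["Aspirin", "Headache"]))
```

The resulting string can be pasted into the PubMed search box.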