MEDIA IMPACT PROJECT

Taxonomies for Education News


Introduction

This document contextualizes the outcomes of the Harmony Institute’s commission to develop a taxonomy for coding education news and resources published online. The project’s mandate was to create a descriptive model that could serve the immediate needs of the Education Writers Association (EWA) and exist as a publicly available framework for labeling education journalism and related media for use throughout the field.
 
The summary that follows describes the rationale for developing two separate, though closely related, models: a taxonomy for the Education Writers Association (EWAT) and a publicly accessible ontology designed to conform to Semantic Web standards (TagED). We briefly compare the two frameworks and touch on aspects of implementation and expansion that, while beyond the scope of this project, would add future value to either schema.

Both models, along with supporting materials and guidelines for use, are submitted with this report.

Background and Purpose

In late fall 2013, the Harmony Institute conducted a needs assessment and audit of content coding practices at a diverse group of education and media organizations. The research plan was designed to inform development of a subject-specific taxonomy for the Education Writers Association (EWA), then in final stages of website redesign.

Toward this end, the audit report mapped the contours of knowledge relevant to journalism on the subject of education in the U.S.; assessed whether, how, and to what extent EWA and peer organizations tag and manage their digital resources; and presented best practices for applying classification techniques to aid in the search, retrieval, and tracking of content published online.
 
As detailed in the preceding Interim Report, our research findings revealed a wide range of approaches to, and protocols for, describing, labeling, and managing digital resources. The Report concluded by outlining two distinct strategies for taxonomy design and proposed next steps in the process. In the end, both solutions were realized, in response to stakeholder feedback and EWA preferences. The remainder of this document provides a high-level overview of the rationale for, and nature of, the two taxonomies that accompany this summary.

AUDIENCE
The frameworks presented in this and the supporting documents are directed at digital publishers, editors, journalists, managers, IT staff, and stakeholders of organizations who produce or disseminate news media and information content on topics related to education in the U.S. 

RATIONALE
Of the two options presented in the Interim Report (initially labeled “A” and “B”), the former represented a refinement of EWA’s existing framework, in which topics in education were nested under one of two categories: P-12 or Higher Education. We also proposed adding facets to the framework, for use in conjunction with the topic tree. (Facets allow objects or entities to be classified by any number of their properties or attributes, and can facilitate content navigation by opening up multiple paths of access to a given resource.)
 
The second strategy, “B,” proposed replacing EWA’s P-12/Higher Education dichotomy with new categories and subcategories, to better accommodate topics that could be applied to education at all levels. As in approach “A,” this faceted classification would permit flexible description of content.
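
The faceted approach described above can be sketched in a few lines of Python. This is only an illustration: the facet names (`level`, `format`, `topic`), the sample resources, and the `filter_by_facets` helper are hypothetical and are not drawn from EWAT or TagED.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    title: str
    # Maps a facet name to the set of values assigned to this resource.
    # Facet names and values here are hypothetical examples.
    facets: dict = field(default_factory=dict)


def filter_by_facets(resources, **wanted):
    """Return the resources that carry every requested facet value."""
    return [
        r for r in resources
        if all(value in r.facets.get(name, set()) for name, value in wanted.items())
    ]


RESOURCES = [
    Resource("Charter growth in cities",
             {"level": {"P-12"}, "format": {"news story"}, "topic": {"charter schools"}}),
    Resource("Student debt explained",
             {"level": {"Higher Education"}, "format": {"explainer"}, "topic": {"student loans"}}),
    Resource("Teacher pay across sectors",
             {"level": {"P-12", "Higher Education"}, "format": {"news story"}, "topic": {"teacher pay"}}),
]

# One resource can be reached through several facets at once.
p12_titles = [r.title for r in filter_by_facets(RESOURCES, level="P-12")]
higher_ed_news = [r.title for r in filter_by_facets(RESOURCES, level="Higher Education", format="news story")]
```

Because a resource may hold several values per facet, the third item is reachable from either education level, which is the "multiple paths of access" that facets are meant to provide.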
 
Responses to the two strategies were mixed. Ultimately, the assignment called for a taxonomy tailored to, and usable by, the EWA that would also embody a comprehensive, formative vocabulary for use by others in the field. We therefore proposed to develop Plan A as well as provide a draft of Plan B.

Though challenging to execute within a limited timeframe, we felt that developing two taxonomies would enable us to address the operational needs and preferences of the EWA, as well as make the most of an opportunity to reimagine the framework. The decision to model Plan B as an ontology in the free, online ontology editor Protégé was informed by a desire to explore and illustrate leading technologies in information management on the Web. In addition, storing TagED’s term list in Protégé offers many of the advantages of using a software program to manage a term base, without additional cost. These include shared access for collaborative use, as well as features enabling users to visualize and explore the vocabulary structure in multiple graphic formats. Moreover, the terms included in the ontology can easily be applied in the same way as those in Plan A, that is, as a term base for a faceted taxonomy.
 
Additional circumstances and considerations that informed this decision included:
  • the advanced stage of EWA’s site redevelopment and concern regarding implementation of a new framework;
  • evaluation of the performance of EWA’s existing schema;
  • perceived merits of reimagining and building an alternative taxonomy;
  • lessons learned from tactics adopted by others in the field;
  • methods advocated by service providers and authorities in information management;
  • stakeholder assessments of domain-specific taxonomy needs;
  • thorough research into current standards and best practices for metadata annotation and use. 

Comparing Education Taxonomies

The proposed “A” and “B” designs have been developed and codified as two distinct vocabularies: the Education Writers Association Taxonomy (EWAT), and TagED, a slightly more advanced linked data model for describing media coverage of any number of aspects related to education in the U.S.
As proposed, both models are grounded in the theory that resources on education can be grouped and classified on the basis of multiple co-existing fields, or facets. Because these fields exist on parallel, hierarchical levels of specificity (or tiers), each structure resembles a faceted polyhierarchy. Multi-faceted description facilitates filtering and non-hierarchical exploration of content, and helps define the contours of the information resources being described.

A key difference between EWAT and TagED is in the degree to which each can accommodate and describe its domain of knowledge. As an ontology written in OWL, TagED has the potential to capture and express relationships between any two entities described within a resource in a way that is understandable to humans and machines alike.

To put it plainly, metadata tags can be deployed to indicate that a resource “has something to do with” a given concept or entity. Unfortunately, simple keyword tagging fails to convey what the nature of that relationship is (see Stijn Debrouwere’s excellent blog series, in particular “Tags Don’t Cut It” for more on why they fall short).

An ontology, on the other hand, permits the organization and description of entities by virtue of their relationships to other entities. On the Web, ontologies express rules for logical relationships between resources.
 
Again, tagging a resource URL with a topic label (on a site like Delicious, for example) doesn’t express how the topic relates to the resource. RDF, a data model discussed in the TagED guidelines, and OWL, an ontology language built on RDF, model relations among entities through “triples.” Quite simply, these are three-part statements of the form subject, predicate, object. (Thinking back to the tag, we can understand it as a statement lacking a verb: a subject and an object, with nothing to connect them.)
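
The contrast between a tag and a triple can be made concrete with a minimal Python sketch. Everything named here is an assumption made for illustration: the identifiers (`article-123`, `Example School District`, `Jane Doe`), the predicates (`reportsOn`, `quotes`, `isEmployedBy`), and the `objects_of` helper do not come from TagED, and plain tuples stand in for real RDF.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# A tag pairs a resource with a label and says nothing about how they relate.
tag = ("article-123", "Example School District")

# Triples add the missing verb. The predicates below are hypothetical.
triples = [
    Triple("article-123", "reportsOn", "Example School District"),
    Triple("article-123", "quotes", "Jane Doe"),
    Triple("Jane Doe", "isEmployedBy", "Example School District"),
]


def objects_of(subject, predicate, statements):
    """Return every object linked to `subject` by `predicate`."""
    return [t.obj for t in statements
            if t.subject == subject and t.predicate == predicate]


reported = objects_of("article-123", "reportsOn", triples)
quoted = objects_of("article-123", "quotes", triples)
```

With the predicate in place, a query can distinguish what the article reports on from whom it quotes, a distinction the bare tag cannot express.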

While TagED is still only a prototype, when further expanded and correctly applied, it can provide sufficient syntactical and semantic detail about a particular piece of content to allow a machine to draw logical inferences about the data contained within it and, moreover, relate it to similarly encoded data.


EWAT VS TagED

EWAT
As described earlier, EWAT expands on the EWA’s legacy taxonomy, and comprises:
  • a hierarchical topic tree of 276 terms, nested under EWA’s two overarching P-12 and Higher Education categories;
  • a faceted taxonomy of additional elements, to inform the site’s filtered search capacity;
  • a sample thesaurus to promote lexical uniformity across the site as well as improve search functionality.
  • Output formats delivered with this report: Word/Excel.
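
As a rough sketch of how a thesaurus promotes lexical uniformity, the snippet below maps variant labels to a single preferred term before tags are stored or searched. The variant entries and the `normalize` and `normalize_tags` helpers are hypothetical; only the preferred terms "P-12" and "Higher Education" come from the EWA categories named above.

```python
# Variant label (lowercased) -> preferred term. The variants are hypothetical.
THESAURUS = {
    "k-12": "P-12",
    "k12": "P-12",
    "college": "Higher Education",
    "postsecondary": "Higher Education",
}


def normalize(term):
    """Return the preferred term for a label, or the trimmed label if unknown."""
    cleaned = term.strip()
    return THESAURUS.get(cleaned.lower(), cleaned)


def normalize_tags(tags):
    """Normalize a list of labels, dropping duplicates while keeping order."""
    seen = []
    for tag in tags:
        preferred = normalize(tag)
        if preferred not in seen:
            seen.append(preferred)
    return seen


clean_tags = normalize_tags([" K-12 ", "k12", "College", "Charter Schools"])
```

The same lookup can be applied to search queries, so that a reader who types a variant still reaches content filed under the preferred term.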
 
TagED
TagED identifies and orders key concepts and entities within the domain of education journalism, treating the individuals, institutions, and concepts that play a role therein. TagED is:
  • built on Semantic Web technologies;
  • modeled in OWL 2, a computational logic-based language;
  • stored online, in WebProtégé, a free ontology-editing platform;
  • a vocabulary of 349 terms grouped into classes arranged in a hierarchical tree structure representing universals and instances of concepts;
  • accompanied by 15 associated properties that can describe relations between concepts or entities;
  • a classification that can be visualized, browsed, and collaboratively edited through Protégé’s interface;
  • a model that can be exported from Protégé in multiple formats;
  • a model that can be aligned with existing data models and vocabularies (e.g., Schema.org, Dublin Core) for maximum interoperability.
  • Output formats delivered with this report: OWL 2/Word/Excel/online via WebProtégé
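
To illustrate what a class hierarchy makes possible, the sketch below walks a small subclass tree to infer broader categories. It is a plain-Python stand-in for the kind of subclass reasoning an OWL reasoner performs, not an excerpt from TagED: the class names (`CharterSchool`, `School`, `University`, `Institution`) and the helpers are hypothetical.

```python
# Child class -> parent class. Class names are hypothetical, not TagED terms.
SUBCLASS_OF = {
    "CharterSchool": "School",
    "School": "Institution",
    "University": "Institution",
}


def ancestors(cls):
    """Return the chain of broader classes, nearest first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_a(cls, target):
    """True if `cls` is `target` or falls under it in the hierarchy."""
    return cls == target or target in ancestors(cls)


charter_ancestors = ancestors("CharterSchool")
```

A resource described as being about a `CharterSchool` can therefore be retrieved by a search for any `Institution`, even though that broader label was never applied by hand.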

Limitations and Recommendations

Limitations
The models described in the accompanying documentation were designed in response to the needs articulated during the first phase of the project. Their elaboration was subject to the following circumstances and constraints:
 
  • The short time frame for project development precluded end-user testing. Typically, once a controlled vocabulary has been drafted, it is tested against content, evaluated by site users, and adjusted accordingly. All modifications made to the taxonomy during this process must be carefully recorded, so as to chronicle the taxonomy’s development and provide a basis for future changes.

  • The scope of the project did not include implementation; technical and practical aspects of taxonomy deployment and management are treated only in passing.

Recommendations and future considerations
(Note: More detailed recommendations may be found in either set of guidelines.)
  • Consider adding facets/classes to accommodate and track behavioral data/user interactions.
  • Weigh facets for comparative analysis across multiple variables.
  • Tag your most important content in alignment with an existing standard (e.g., Schema.org, Dublin Core) to make it easier for people and machines to find your content on the Web.
  • Work closely with IT staff to add the vocabulary directly into your CMS, or configure the CMS as needed to facilitate use and application. To the extent possible, when using multiple databases or web publishing platforms, use the same term base across all platforms.
  • Conduct user testing to calibrate vocabulary and determine efficient workflow.
  • Train staff as needed (more on this in the individual guidelines).
  • Describe resources as richly as possible.
  • Make content shareable, findable, trackable, and re-usable!
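
As one possible reading of the recommendation to align tags with an existing standard, the sketch below builds Schema.org markup for an article as JSON-LD. The `to_json_ld` helper, the sample headline, date, and topics are hypothetical; `NewsArticle`, `headline`, `datePublished`, `about`, and `keywords` are Schema.org terms, and which of them best suits a given site is a decision for its own editors and IT staff.

```python
import json


def to_json_ld(headline, date_published, topics):
    """Build a Schema.org NewsArticle description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "datePublished": date_published,
        # Each controlled-vocabulary term becomes a named subject of the article.
        "about": [{"@type": "Thing", "name": topic} for topic in topics],
        "keywords": ", ".join(topics),
    }


# Sample values are invented for illustration.
markup = to_json_ld(
    "District weighs new charter applications",
    "2014-03-01",
    ["charter schools", "school funding"],
)
serialized = json.dumps(markup, indent=2)
```

The serialized string is what would typically be embedded in a page inside a `script` element of type `application/ld+json`.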
There is no one way to design a taxonomy. The two models presented above were developed for specific use cases. We hope the suggestions provided will help the EWA, its stakeholders, and others in the field take advantage of the benefits of subscribing to a controlled vocabulary. Although it may require some initial investment of time and resources, developing and implementing a thoughtful, organization-wide meta-tagging protocol offers many long-term benefits.

Ideally, it will enable administrators to better track content and identify trends in topic coverage, as well as nuances in page visits and browsing habits; inform editorial coverage and planning to advance an organization’s objectives more broadly; promote knowledge exchange throughout the field; and ultimately, provide value to and enrich the end user’s experience online. 

Acknowledgments

Authors
Emily Toder and Joanna Raczkiewicz, with assistance from Nick Forster.
 
Acknowledgments
This work was supported through a grant from the New Venture Foundation. The authors gratefully acknowledge the feedback and contributions of Caroline Hendrie, Executive Director, and Glen Baity, Multimedia Manager at the Education Writers Association; Elizabeth Green, Executive Editor, and Anika Annand, Director of Engagement and Growth, Chalkbeat; Virginia Edwards, Executive Director, Rachel Delgado, Director, Knowledge Services, Kay Dorko, Library Director, and Stacey Decker, Online News Editor, Editorial Projects in Education; Kathleen Kennedy Manzo, Managing Editor, Education Week; and Carol Rava Treat, Director of Operations and Strategy, Get Schooled Foundation.

Copyright and Contact
© 2014 Harmony Institute, unless otherwise noted.
Distributed under a Creative Commons SA-4.0 License.

Harmony Institute
54 W. 21st Street, Suite 310
New York, NY 10010
212.966.7606
harmony-institute.org 

The Norman Lear Center's Media Impact Project researches how entertainment and news influence our thoughts, attitudes, beliefs, knowledge and actions. We work with researchers, film and TV pros, nonprofits, and news organizations, and share our research with all.  We are part of USC's Annenberg School for Communication and Journalism.