INTERNATIONAL ORGANIZATION FOR STANDARDIZATION


ISO/IEC JTC1/SC29/WG12

Multimedia and Hypermedia information coding
Expert Group (MHEG)

MHEG 98/N1179


Date: March 1999
Source: Chris Dobbyn, Oxford Brookes University, David Shrimpton, University of Kent at Canterbury, Tom Casey, Visiting Research Fellow, Oxford Brookes University
Title: Models of Convergence between the World Wide Web and Interactive Television using MHEG-5:
Status: Paper for presentation at: Third IASTED/ISMM Conference on Internet and Multimedia Systems and Applications (IMSA '99)
Requested Action: Review for discussion at Seoul meeting on MHEG-XML requirements
Distribution: MHEG Members,  Mail list reflectors and
http://www.mheg.org

Models of Convergence between the World Wide Web and Interactive Television using MHEG-5:

Chris Dobbyn, Oxford Brookes University, Oxford, UK, Tel: ++44 (0)1865 483673, E-mail: c.dobbyn@brookes.ac.uk
David Shrimpton, University of Kent at Canterbury, Canterbury, UK, Tel: ++44 (0)1227 823532 E-mail: d.h.shrimpton@ukc.ac.uk
Tom Casey, Visiting Research Fellow, Oxford Brookes University, UK, Tel: +44 (0)171 383 5257  E-mail: tcasey@tcasey.demon.co.uk

Abstract:
There is a general consensus that the World Wide Web and Digital Interactive TV, together with other computer features such as database access, are converging technologies; the question of the merger between these technologies is less a matter of if it will happen than of when and exactly how. This paper discusses possible models for convergence and integration of the WWW and Digital ITV. The MHEG-5 ISO Standard for final-form representation and interchange of interactive multimedia objects is introduced and the current pattern of DTV broadcasting in the UK, which incorporates MHEG-5, is outlined. A number of models for extending the current DTV service to enable user interactivity, principally based on MHEG-5 and the Internet, are considered; these extensions all entail a convergence between ITV and the WWW. MHEG-5 is proposed as potentially pivotal to this convergence; and current work on formulating a notation for MHEG-5 standard objects in the Extensible Mark-up Language (XML) is described, together with a discussion of the technical issues involved in this work.

Keywords: Interactive Television (ITV), Multimedia/Hypermedia Experts Group (MHEG), Extensible Mark-up Language (XML)

1 Introduction
In this paper, we set out to discuss various models for the convergence and integration of two rapidly expanding technologies: that of the World Wide Web and that of Digital Television, focussing in particular on the role that the ISO MHEG-5 Standard [1] may be able to play in this unification.

There has been a rapid development of digital transmission technologies over the past few years. In Europe, the DVB [2] project has enabled specifications to be rapidly produced and evaluated for ratification by ETSI. One result of this activity has been the introduction of commercial Digital TV in the UK during 1998. Digital TV (DTV) not only provides higher quality picture and sound dissemination than conventional analogue TV, but also makes user interaction with the service possible: digital transmission technology allows truly interactive TV (ITV). Eventually, it seems probable that applications such as teleshopping, distance learning, multi-player games and video on demand will be integrated into the services available from a domestic television.

The World Wide Web (WWW) is, of course, based on the concept of a user’s ability to interact with multimedia documents and to navigate between documents across embedded links. We believe there is a general consensus that the WWW and ITV, together with other computer features such as database access, are converging technologies. It is expected, for instance, that manufacturers will include some form of HTML in set-top decoders, and there are already set-top boxes (STBs) devoted to Web browsing: Microsoft’s WebTV, which processes HTML received via a modem connection, is an example of this style of convergence of the WWW with television [3]. Moreover, TV cards can easily be fitted into PCs equipped with Internet software and connections; and, although the technology of the Internet is not suitable for broadcasting ITV at present, much work is currently being done to develop the architectures and protocols necessary for Internet-transmitted ITV. Thus, it seems that the question of convergence between these technologies is no longer if it will happen but when and exactly how.

In Section 2 of this paper, we outline the MHEG-5 Standard, which we believe may have an important role in the confluence of ITV and the WWW. Section 3 discusses the current model of DTV in the UK in 1999, while Section 4 goes on to advance three possible paradigms for the delivery of D/ITV and considers a number of possible means by which the model of Section 3 might be amended or expanded in the light of these. In Section 5, we summarise the current discussions about the representations of MHEG-5 objects in the Extensible Mark-up Language (XML) and propose this as an important potential mechanism of WWW/ITV integration. Section 5 closes with a number of concluding remarks.

2 MHEG
The Digital Audio-Visual Council (DAVIC) has developed a set of specifications to enable interoperability of digital audio-visual services such as ITV [4]. For one part of these specifications, the interchange of multimedia/hypermedia information, MHEG-5 was chosen as a standard. MHEG-5 is an ISO Standard developed by ISO/IEC JTC1/SC29/WG12 [5], one of a range of multimedia/hypermedia standards originating with the Multimedia Hypermedia Experts Group. MHEG-5 specifies a group of formats and functionality for final-form representation, interactivity, synchronisation, and real-time presentation and interchange of multimedia objects. The standard allows the grouping of multiple media object types into a single interchange unit, and supports other standardised formats, such as MPEG and JPEG, by external reference. MHEG-5 allows the specification of small, precise subsets of these objects for simple interactive multimedia domains, such as Video on Demand (VoD), teleshopping, etc. There are many client/server multimedia applications, ranging from home shopping and medical patient records to education, where the MHEG-5 Standard has potential applicability; the Standard refers to such fields as Application Domains (ADs). A distinguishing characteristic of MHEG-5 is that it can be adopted for many ADs.

At source, multimedia types are grouped into an MHEG object and encoded into a special format; at present, two such formats are available, ASN.1 and EBNF textual notation. The encoded object is then transmitted to the client, where it must be decoded by dedicated software, an MHEG Engine, which is also responsible for the presentation of the object and its subsequent modifications, as well as for handling synchronisation and user interaction. Facilities for such interaction are specified within the MHEG Standard in special Event, Link and Action classes. This model is summarised in Fig 1.

Figure 1
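
The Event, Link and Action pattern mentioned above can be illustrated with a minimal sketch. It is not taken from the Standard or from any MHEG engine: the class names `Link` and `Engine`, the helper `show`, and the object names are all hypothetical, and the event name is used only as an illustrative string. The sketch assumes that a link pairs a triggering event with a list of actions, and that the engine runs those actions when the event is raised.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Link:
    """Fires its actions when a matching event is raised (hypothetical model)."""
    source: str          # name of the object that raises the event
    event_type: str      # illustrative event name, e.g. "IsSelected"
    actions: List[Callable[["Engine"], None]]


@dataclass
class Engine:
    """Toy stand-in for an MHEG engine: holds decoded objects and active links."""
    visible: Dict[str, bool] = field(default_factory=dict)
    links: List[Link] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def raise_event(self, source: str, event_type: str) -> None:
        # An event triggers every link whose source and event type match.
        for link in self.links:
            if (link.source, link.event_type) == (source, event_type):
                for action in link.actions:
                    action(self)


def show(name: str) -> Callable[[Engine], None]:
    """Build an action that makes the named object visible."""
    def action(engine: Engine) -> None:
        engine.visible[name] = True
        engine.log.append(f"show {name}")
    return action


engine = Engine(visible={"buy_button": True, "order_form": False})
engine.links.append(Link("buy_button", "IsSelected", [show("order_form")]))
engine.raise_event("buy_button", "IsSelected")   # the user presses the button
print(engine.visible["order_form"])
```

Keeping the behaviour in declarative links rather than in general-purpose program code is one reason such an interpreter can remain small.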

MHEG was designed so that its interpreter could have as small a footprint as possible. An MHEG engine would generally be required to fit into a "set top box" (STB), although it could, of course, be implemented on any suitable computational device, and to consume as few resources in terms of memory and processing power as possible, so that the cost of the box can be kept down. Typically, applications would reside on a server, and parts of these applications would be downloaded to the MHEG engine on the client, as required.

MHEG-5 has already been adopted as a standard for DTV broadcasting in the UK and is being considered for industry adoption in Japan and in other European countries, including Spain and Norway.

3 The Current Digital TV Model
Fig 2 illustrates the current model of DTV broadcasting [6], as it was introduced into the UK in 1998 and as laid down in the UK DTG profile, which comprises an MHEG-5 profile and other features. The broadcaster transmits a digital stream, either terrestrially, to a satellite, or across a cable to a recipient, who will be equipped with an STB incorporating software capable of decoding the stream and rendering it into pictures or sound on a television. The broadcaster also provides a carousel, on the same principle as a teletext carousel, of MHEG-5 encoded objects, which are integrated with the digital television stream. The recipient’s STB contains an MHEG engine to which these objects are passed, after they have been separated from the stream; the engine then handles their rendering on the TV stream, together with any elementary forms of interaction that are possible within this model.

Figure 2

Fig 3 illustrates an expanded model, not yet available, but which could quite easily be realised within existing standards, giving a limited form of interactivity. Here the STB also contains some form of Internet capability, in which case web pages may be selected and retrieved from a third-party ISP by conventional means, such as a phone line, or possibly along the same cable that supplied the TV broadcast, if the recipient is so equipped. These pages may then be rendered by the STB in a number of ways: superimposed on the TV picture, presented in a separate frame, etc. The browser and the MHEG software may, or may not, be linked internally.

On this model, therefore, DTV is currently a "push" medium, without genuine interactivity, except of the most basic sort. Furthermore, the use of the carousel and the lack of any direct connection between the third party and the broadcaster severely limit the range of multimedia objects that are available to the recipient. Third parties would be expected to provide multimedia objects to the broadcaster for inclusion in the carousel. They may prepare the material directly as MHEG objects and supply these to the broadcaster for distribution. Alternatively, an information provider that is already preparing items for WWW distribution could send these to the broadcaster instead; a translation into MHEG format then needs to occur at some point, undertaken either by the third party or by the broadcaster.

Figure 3

There is current work on mechanisms for new objects to be inserted into a broadcaster’s carousel dynamically by outside parties. These objects might be designed to be included as part of some existing MHEG-5 application or may be an entire application. If we assume a capacity for such dynamic insertion, the question arises of how quickly the objects could be created and supplied: a programme prepared weeks or months in advance might give plenty of time for supplementary information to be incorporated; but at the other extreme, live broadcasts might require information to be merged with the broadcast at very short notice. An example of such a requirement would be subtitles for the deaf. Even if we assume a system in which such dynamic access is adopted, then, the model is evidently a limited one.
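
To make the timing concern concrete, the following is a minimal sketch of a cyclic carousel with dynamic insertion. The class `Carousel`, its methods and the insertion policy (a new object joins the end of the current cycle) are assumptions made for illustration; real carousels are specified by the broadcast profile and are not modelled here.

```python
from collections import deque


class Carousel:
    """Toy broadcast carousel: one object is transmitted per slot, cyclically."""

    def __init__(self, objects):
        self._queue = deque(objects)

    def insert(self, obj):
        # Dynamic insertion by an outside party. Assumed policy: the new
        # object simply joins the end of the current cycle.
        self._queue.append(obj)

    def transmit(self):
        # Send the object at the head of the cycle, then move it to the back.
        obj = self._queue[0]
        self._queue.rotate(-1)
        return obj

    def slots_until(self, wanted):
        # Number of slots a receiver must wait before `wanted` is transmitted.
        return list(self._queue).index(wanted)


carousel = Carousel(["news", "weather", "sport"])
carousel.transmit()                 # "news" goes out; "weather" is next
carousel.insert("live_subtitle")    # arrives while the cycle is under way
wait = carousel.slots_until("live_subtitle")
print(wait)
```

Under this policy the delay before a newly inserted object reaches the recipient grows with the size of the carousel, which is why short-notice material such as live subtitles is the hard case.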

4 Models for Integration of ITV with the World Wide Web

4.1 Back Channels

For DTV to become truly interactive, there must be some means of two-way communication between the broadcaster and the recipient, as portrayed in Fig 4. In this model, the broadcaster transmits a digital stream comprising pictures, sound and MHEG-5 encoded objects, as in the DTV pattern of Figs 2 and 3. As in those figures, the STB is responsible for decoding and rendering the TV pictures and sound, the MHEG objects being separated from the stream and passed to the MHEG Engine, which handles the synchronisation, interaction and display appropriate to these objects. However, there is now a back channel between recipient and broadcaster, across which the recipient may pass responses to items displayed on her screen back to the broadcaster, which will be expected to respond in an appropriate fashion. Providing the back channel on the same network as that over which the MHEG objects are received is, of course, currently practical only with cable. To give an oft-used example, the recipient may be viewing a teleshopping programme and see displayed an item she wishes to purchase; the MHEG Engine may be displaying a panel, either overlying part of the TV picture or in a separate frame, or the recipient may be able to call up this panel by some action on the hand-held controller. It is then possible, for example, to click a button on this panel to purchase the item, again through some facility provided on the controller, such as a pointing device. The purchase message is sent across the back channel; the broadcaster may then transmit another MHEG object in response, perhaps a form to be displayed by the set top box, filled in by the recipient on-screen and then returned via the back channel, either to the broadcaster for forwarding or directly to a third party, as it is unlikely that the broadcaster would handle all responses for every advertiser.

Figure 4
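
The teleshopping exchange described above can be sketched as a simple message flow. The message kinds, field names and the classes `Message` and `Broadcaster` are hypothetical and chosen only for illustration; the sketch assumes the variant in which the broadcaster forwards completed orders to a third-party supplier.

```python
from dataclasses import dataclass


@dataclass
class Message:
    kind: str        # "purchase", "form" or "order" (illustrative kinds)
    payload: dict


class Broadcaster:
    """Receives back-channel messages and answers or forwards them."""

    def __init__(self):
        self.forwarded = []   # orders passed on to third-party suppliers

    def handle(self, msg):
        if msg.kind == "purchase":
            # Respond with a form object for the STB to display.
            return Message("form", {"item": msg.payload["item"],
                                    "fields": ["name", "address"]})
        if msg.kind == "order":
            self.forwarded.append(msg.payload)
            return None
        raise ValueError(f"unexpected message kind: {msg.kind}")


def fill_in(form, answers):
    """Recipient completes the form on-screen; returns the order message."""
    missing = [f for f in form.payload["fields"] if f not in answers]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return Message("order", {"item": form.payload["item"], **answers})


broadcaster = Broadcaster()
form = broadcaster.handle(Message("purchase", {"item": "kettle"}))
order = fill_in(form, {"name": "A. Viewer", "address": "1 High St"})
broadcaster.handle(order)
print(broadcaster.forwarded)
```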

4.2 Physical models of integration
Three hardware setups could achieve the model of integration described in the section above. In the first, the recipient is equipped with an enhanced STB, with the interfaces and software required to handle the back channel and to integrate user interaction and messages. Here the main issues are ones of cost: STB manufacturers generally strive to limit the cost of their hardware by restricting the amount of memory and software that the box carries; most STBs currently have 4 MB of memory, small-footprint decoding software and small-footprint MHEG engines. To equip boxes with additional interfaces, an Internet browser with its associated plugins, and other enhanced software functionality, such as a Java Virtual Machine [7] (the MHEG-6 standard provides for the existence of a JVM), as well as additional memory and, in an extreme case, permanent disc storage, would obviously add heavily to costs. MHEG-5 itself can obviate the need for some of this additional software: simple back-channel message passing, such as sending forms to the broadcaster, could be handled by an MHEG application. However, exchanging HTML or other non-MHEG encoded information would necessitate additional software.

So the notion of an STB functioning substantially as a dedicated computer leads naturally to a second possible setup, wherein a standard PC is equipped with a TV card and additional software is supplied by the broadcaster. Here the PC simply functions as TV and STB combined. In both the above configurations, the back-channel may be realised physically by means of a telephone line (a method which is already employed within analogue TV broadcasts where viewers are asked to phone in). In this case, the recipient’s equipment would have a telephone link built in and could dial the broadcaster (or a third party) when a particular selection was made. Information could then be passed back to the broadcaster across the public telephone service, enabling the user (at a cost) to send orders for goods etc. directly to the third party supplier. Another method for returning information would be via the Internet, for end users with appropriate connectivity. In this mode, the recipient would have an Internet interface as well as an interface to the broadcaster. The user might be able to move between the Internet based material and the broadcaster’s material by either selecting directly from a hand held device or transparently via a software link within the recipient’s receiving and decoding device, be it computer or STB. Information could be passed between the two software domains, requiring MHEG-5 to pass parameters out; this facility is not currently included in the standard.

The potential of the Internet to implement the back channel gives rise to a third possible physical setup, one in which the closest possible integration of WWW and ITV is manifested. In this, the DTV signal is simply broadcast over the Internet under IP to a PC, on which the requisite browser and MHEG software resides; the back channel is simply provided via the same Internet link. As we stated above, the Internet was not originally conceived as a transmission medium for TV and other continuous media applications in real-time streams, due to bandwidth constraints and the lack of support for Quality of Service Guarantees within the Internet Protocol suite. More recently, however, improvements in compression techniques, coupled with the greater processing power generally available on modern personal computers, and the introduction of additional protocols, have allowed the development of continuous media applications for the Internet. However, such improvements may not be enough, and may never be enough, to satisfy broadcasters’ needs for synchronisation with minimal delay.

In all three patterns described, there is further potential for elaboration of the hardware. The hand-held controller shown in Figs 2, 3 and 4 need not be seen as simply a device for handling an on-screen pointer and for selection, but could be a palm-top computing device, along the lines of a Palm-Pilot or similar electronic organiser/computer. With such a device in place, the MHEG Engine incorporated into the STB or computer could manage a stream of objects between the controller and the STB: pictures and forms could be displayed on the palm-top, with selections being made, and data entered, by means of a stylus on a touch-sensitive screen; users could compose e-mail messages on their controllers and forward them to the STB for onward transmission, etc.

Which of the above three physical models, if any, will come to predominate must be decided on the merits of the existing and emerging technologies, as well as on economic and sociological grounds. As we argued above, a barrier to the first model may be cost to the consumer; to the second, conservative attitudes to the role of computers, reflecting technological differences between TV and PC hardware [8], may present difficulties in certain countries. As for the third model, the technology is not yet proven, although we believe it is only a matter of a short time before Internet TV is possible.

4.3 MHEG and the Web
Whichever configuration for ITV predominates, however, it seems almost certain that the Internet will play a major part in it. Furthermore, although many DTV authorities world-wide have adopted MHEG-5 as their standard, there is not as yet a full consensus; there are alternative systems to MHEG, and once the quality of service issues are resolved it may be that IP/Internet solutions to DTV transmission are implemented. It has often been stated that the MHEG-5 Standard would become much more widely known, adopted and used if it were more readily accessible to content authors and application developers. As we stated in Section 2, the normative components of the standard are at present unambiguously expressed in ASN.1 and EBNF textual notation. The textual notation was provided as an alternative encoding because it is human readable; but, although the notation is well known to most computer scientists, it is still thought to be an esoteric format, unfamiliar to many potential multimedia application developers. We believe that the standard would attract a wider user community if a more familiar "tag type" mark-up language were used to express the elements of the Standard, raising again the question of MHEG-5’s relationship to existing Web tools. For all these reasons, therefore, ISO WG12 is now investigating a convergence between MHEG and WWW technologies; and it is to this convergence we turn in the next section.

5 MHEG and XML

5.1 The Task
WG12 has made a proposal [9] to create a new work item in the form of a Technical Report that would define an XML notation for MHEG-5, to accompany the two existing notations, as well as MHEG tools for the use of XML application developers. The Extensible Mark-up Language (XML), as is well known, is a widely used hypertext mark-up language [10] based on the earlier, more broadly defined, ISO Standard Generalised Mark-up Language (SGML) [11]. We argued above that many MHEG applications or application components would be retrieved from the Internet and would often be written by authors familiar with HTML document formats; hence, the existence of an XML-capable MHEG-5 engine would be likely to appeal to a very wide audience of potential users. Furthermore, since much content material would presumably be stored on Intranet servers, this material could then be easily retrieved and quickly inserted into broadcast applications. For example, broadcast reporters could use suitable editing tools in the field to write multimedia material in XML and upload this material to a web server, from which it could be retrieved by TV editors and incorporated into broadcasts. The overall outcomes of the MHEG-XML work will be as follows:

The introduction of an MHEG-XML representation should enable MHEG objects to be resolvable by both a suitable MHEG engine and an XML-enabled browser. For display purposes, an XML document is accompanied by two further components: a Document Type Definition (DTD), in which the format of the document is expressed; and a Style Sheet (SS) set out in the Extensible Stylesheet Language (XSL) [12]. An XML-capable browser, such as Microsoft Internet Explorer 5.0, will incorporate a DTD parser and an XSL processor, and thus will be able to combine these components and render them into a final displayed document. The situation is summarised in Fig 5.

Figure 5

An MHEG-5 XML-capable engine would need facilities for retrieving, parsing and rendering conformant objects. An overview of such a system’s features is shown in Fig 6. The combined and appropriately coupled nodes (XSL, DTD, XML Parser and Browser) would handle and present retrieved MHEG-5 conforming data and applications.

Figure 6
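
As a rough illustration of the retrieving and parsing stage, the sketch below parses a small XML fragment and indexes its contents. No MHEG-XML DTD exists at the time of writing, so every element and attribute name in the fragment is invented for this example, as is the function `build_object_table`; only Python's standard `xml.etree.ElementTree` parser is assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical MHEG-XML fragment: element and attribute names are invented
# for illustration and are not taken from any published DTD.
DOCUMENT = """
<scene id="shop">
  <bitmap id="kettle_picture" content="kettle.jpg"/>
  <pushbutton id="buy_button" label="Buy"/>
  <link source="buy_button" event="IsSelected">
    <action type="TransitionTo" target="order_form"/>
  </link>
</scene>
"""


def build_object_table(xml_text):
    """Parse the document and index its identified objects and its links."""
    root = ET.fromstring(xml_text)
    # Direct children that carry an id are treated as presentable objects.
    objects = {el.get("id"): el.tag for el in root if el.get("id")}
    links = [
        (link.get("source"), link.get("event"),
         [(a.get("type"), a.get("target")) for a in link.findall("action")])
        for link in root.findall("link")
    ]
    return root.get("id"), objects, links


scene_id, objects, links = build_object_table(DOCUMENT)
print(scene_id, objects, links)
```

A table of this kind is only the starting point: presenting the objects and executing the links would still fall to the engine.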

On the surface it would appear that producing an XML version of MHEG-5 is not a particularly difficult task; but a simple set of rules for translation from ASN.1 or EBNF textual notation to XML is by no means the only issue. For example, the manner in which the MHEG-5 engine is implemented and interprets the behaviour of XML-expressed applications is a matter of crucial importance. Browsers are good at presentation but do not yet support synchronisation, moderately complex interactivity, event handling, and unambiguous interpretation of action features, which are an important aspect of MHEG-5 and the engines that interpret it. These and other issues are discussed in the next section.

5.2 Discussion
The key technical issues associated with the production of an XML-based encoding of MHEG-5 standard objects that have so far been identified by the discussion group set up by ISO/IEC JTC1/SC29/WG12 are as follows:

5.2.1 Scope
The first point to be clarified is the precise overall objective of this work. There are two possibilities: (1) simply the creation of an alternative external notation for expressing MHEG-5 objects, on a par with that of ASN.1 and EBNF text; or (2) the redeployment of the concepts that underlie MHEG-5 into a Web environment. In the case of (1), the advantages are that existing tools could probably be drawn upon, or that extensions to these could easily be implemented. However, (2) suggests more wide-ranging changes, starting perhaps with an add-on XML module to NG (New Generation) HTML [13]. It has been pointed out [14] that discussions on new developments in Web technology directly relevant to the present and future scope of MHEG-5 are taking place in numerous other forums: the W3C SYMM group [15] (synchronisation); Web-3D [16] (graphic primitives); ATSC/DASE [17] (Broadcast HTML); as well as countless Java and JavaScript interest groups. Any alterations or additions to the scope of MHEG-5 arising from the XML work should take account of all these other simultaneous discussions. At such a time of flux, with so many parallel developments taking place, the danger of (2) is of losing touch with the original conceptions of MHEG-5, a mature and successful standard [14].

5.2.2 Isomorphism
Given that pattern (1), described in Section 5.2.1 above, seems initially to be the most profitable and least risky path, the question arises whether the XML notation specified should be fully isomorphic with both other notations, i.e. the relation

    \[ \forall X :\ X \text{ is MHEG-XML} \iff X \text{ is MHEG-5} \]

obtains, or whether the more modest subset relation

    \[ \forall X :\ X \text{ is MHEG-XML} \implies X \text{ is MHEG-5} \]

holds, at least in the initial stages of work. Price [18] points out that the subset relation may facilitate integration with the new dialects of HTML emerging from W3C, and new browsers.
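
The difference between the two relations can be illustrated by reading them, as a simplification, at the level of construct vocabularies rather than individual objects. In the sketch below the MHEG-5 side lists a handful of class names from the standard, while the tag vocabulary, the mapping `TAG_TO_CLASS` and both checking functions are hypothetical.

```python
# Illustrative vocabularies. The MHEG-5 side lists a few class names from the
# standard; the XML side is a hypothetical tag vocabulary.
MHEG5_CLASSES = {"Application", "Scene", "Link", "Bitmap", "Text", "Video"}

# Hypothetical mapping from MHEG-XML tags to the MHEG-5 class each denotes.
TAG_TO_CLASS = {
    "application": "Application",
    "scene": "Scene",
    "link": "Link",
    "bitmap": "Bitmap",
    "text": "Text",
}


def is_subset_notation(tag_to_class, classes):
    """One-way implication: every XML construct denotes an MHEG-5 construct."""
    return set(tag_to_class.values()) <= classes


def is_isomorphic_notation(tag_to_class, classes):
    """Two-way implication: the mapping is one-to-one and covers every class."""
    images = list(tag_to_class.values())
    return len(images) == len(set(images)) and set(images) == classes


print(is_subset_notation(TAG_TO_CLASS, MHEG5_CLASSES))
print(is_isomorphic_notation(TAG_TO_CLASS, MHEG5_CLASSES))
```

Here the tag vocabulary satisfies the subset relation but not full isomorphism, because one class has no XML counterpart; adding a tag for it would satisfy both.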

5.2.3 Progressive specification
As the spectrum of possible devices in which MHEG-5 objects could be used is very wide, ranging from high-end presentation devices used in professional applications such as broadcasting and virtual reality, through PCs and TVs, to handheld devices such as pagers and mobile phones, a single "all or nothing" standard is inappropriate. Thus, a range of progressively less inclusive definitions will be put forward [18], starting from a "studio maximum" (an SGML/XML specification without any limitations or omissions) and descending to small-scale subsets of the standard for domestic or mobile use.

5.2.4 Synchronisation, MHEG Engines and Browsers
As we pointed out above, many aspects of the MHEG-5 standard, principal among them the facility for synchronisation, do not feature in the design of mark-up languages and their browsers. Early discussions of how time-dependent synchronisation might be incorporated into the XML definition of MHEG centred on the idea that the Synchronised Multimedia Integration Language (SMIL) [19] plus XML might provide an appropriate solution. For other (non time-dependent) synchronisation, a suitable Engine/Browser can be modified to support such features directly from the XML of the MHEG-5 application. Fig 7 shows an overview of the components of an MHEG-5 interpreter thus modified. XML code is input to the object manager, where it is parsed, an object table is created and XML type data structures are identified. Within the interpreter there resides an XML-capable browser encapsulating an MHEG-5 XML processor and a SMIL engine as a plugin. It is within this inner component that MHEG-5 Action, Event, and Link processing takes place.

Figure 7
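
To show the kind of time-dependent semantics at issue, the sketch below computes a presentation timeline from SMIL-style timing containers, in which the children of `seq` play one after another and the children of `par` start together. It is an illustration only and not a SMIL engine: the function `schedule` and the media file names are hypothetical, durations are assumed to be whole seconds written as `Ns`, and a `par` is assumed to end when its longest child ends.

```python
import xml.etree.ElementTree as ET

# SMIL-style timing containers. Media file names are illustrative, and
# durations are restricted to whole seconds for simplicity.
PRESENTATION = """
<seq>
  <img src="title.jpg" dur="3s"/>
  <par>
    <video src="report.mpg" dur="20s"/>
    <audio src="commentary.mp2" dur="15s"/>
  </par>
  <text src="credits.txt" dur="5s"/>
</seq>
"""


def schedule(node, start=0, out=None):
    """Return (end_time, [(src, start, end), ...]) for the subtree at `node`."""
    if out is None:
        out = []
    if node.tag == "seq":
        t = start
        for child in node:
            t, _ = schedule(child, t, out)    # next child starts when this ends
        return t, out
    if node.tag == "par":
        ends = [schedule(child, start, out)[0] for child in node]
        return max(ends, default=start), out  # ends with its longest child
    duration = int(node.get("dur").rstrip("s"))
    out.append((node.get("src"), start, start + duration))
    return start + duration, out


total, timeline = schedule(ET.fromstring(PRESENTATION))
print(total, timeline)
```

In this sketch the timing rules live in the element names themselves rather than in a separate style sheet, which is the property of SMIL at the centre of the disagreement discussed next.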

However, there is no general agreement on this model. Some participants [20] believe that the idea that existing XML tools can be coupled with SMIL engines and used as MHEG interpreters with minimum adaptation is simplistic. Price [18] points out that the semantics of mark-up languages such as XML are expressed through style sheets, a pattern that the MHEG-XML definition would probably follow, whereas the synchronisation semantics of SMIL are hard-coded into the language’s grammar; thus SMIL is not a suitable basis for the MHEG-XML studio maximum.

5.3 Conclusion
We have presented a number of models for the possible future integration of ITV and Web technologies. These may be arranged in a rough hierarchy in increasing order of the degree of convergence they embody, as follows:

  1. Ad hoc convergence with a supplementary Internet connection
  2. Convergence based on an Internet-based back-channel
  3. Internet delivered TV with an Internet-based back-channel

In all these models, we concluded, the MHEG-5 standard for multimedia/hypermedia encoding could play a pivotal role. This led to a discussion of current work on MHEG-XML, in which we argued that a successful evolution of an XML notation for MHEG-5 could lead to wider acceptance of the MHEG standard and be a motor for convergence between Web and ITV technologies.

6 References

[1] ISO/IEC IS 13522-5 (MHEG), Information Technology - coding of Multimedia and Hypermedia Information, Part 5: Support for Base-level Interactive Applications, November 1996.
[2] DVB, DVB Digital Video Broadcasting 1997, www.dtg.org.uk/tech/dvb.htm
[3] WebTV System Guide: What is WebTV? developer.webtv.net/docs/sysgde/sysgde.html
[4] Digital Audio Visual Council, DAVIC 1.x specifications, 1995, 1996, 1997 www.davic.org/WORKPLAN.htm
[5] The URL of SC29/WG12 is www.mheg.org
[6] DMUX group, British Digital Broadcasting, Digital Terrestrial Television MHEG-5 Specification, 1998 www.dtg.org.uk/dtgstuff/MHEG-5_Profile_Issue_1.pdf
[7] Geyer L., et al. MHEG in Java -- Integrating a Multimedia Standard into the Web, Sixth International World Wide Web Conference, poster session, ID 723, Santa Clara, California, April 1997.
[8] Mornington-West A., MHEG5 and the WWW in Television Broadcasting: Confusion or Convergence, W3C Workshop "Television and the Web", Sophia-Antipolis, France, 1998
[9] MHEG-XML is based on ISO/IEC IS 13522-5 (MHEG) Part 5: Support for Base-level Interactive Applications (see documents mh_n1172 and mh_n1175, January 1999)
[10] W3C REC-xml-19980210 Extensible Mark-up Language (XML) 1.0 W3C Recommendation, February 1998 www.w3.org/TR/1998/REC-xml.html
[11] ISO/IEC IS 8879 Information Processing -- Text and Office Systems -- Standard Generalised Mark-up Language (SGML), 1986
[12] Goldfarb C.F. & Prescod P., The XML Handbook, Prentice Hall PTR, NY 1998
[13] HTTP-NG Working Group W3C HTTP Next Generation (HTTP-NG), W3C Architecture Document, www.w3.org/Protocols/HTTP-NG/
[14] Hofrichter K., Personal Communication to SC2MHEG-8.25 Discussion Group, February 1999.
[15] W3C Synchronized Multimedia Activity www.w3.org/AudioVideo/Activity.html
[16] For example, see www.asymetrix.com/products/web3d/index.html
[17] ATSC Specialist Group T3/S17 for DTV Application Software Environment (DASE) toocan.philabs.research.philips.com/misc/atsc/dase/
[18] Price R., Personal Communication to SC2MHEG-8.25 Discussion Group, February 1999.
[19] W3C WD-smil-0202, Synchronised Multimedia Integration Language, W3C Working Draft, February 1998 www.w3.org/TR/1998/WD-smil-0202.html
[20] Ares A.G., Personal Communication to SC2MHEG-8.25 Discussion Group, February 1999.