Maximizing Information Throughput for Multimedia Browsing on Small Displays
Xing Xie, Wei-Ying Ma, Hong-Jiang Zhang
Microsoft Research Asia
5F Sigma Center, No. 49, Zhichun Road, Beijing, 100080, P.R. China
{xingx, wyma, hjzhang}@microsoft.com
Abstract

As a great many new devices with diverse capabilities come to market, their limited display sizes have become the major obstacle undermining their usefulness for information access. In this paper, we introduce our recent research on adapting multimedia content, including images, videos and web pages, for browsing on small-form-factor devices. A theoretical framework as well as a set of novel methods for presenting and rendering multimedia under limited screen sizes is proposed to improve the user experience. The content modeling and processing are provided as subscription-based web services on the Internet. Experiments show that our approach is extensible and able to achieve satisfactory results with high efficiency.

1. Introduction

In the PC+ era, a variety of new computing devices, such as the SPOT watch, Smartphone, Pocket PC and Tablet PC, are undergoing a population boom. These devices are becoming more and more powerful in both numerical computing and data storage. However, low-bandwidth connections and small displays are still the two serious obstacles that undermine the usefulness of these devices in people's everyday life. With the rapid and successful development of 2.5G and 3G wireless networks, bandwidth is expected to become less of a constraint in the near future. At the same time, however, the limitation on display size is likely to remain unchanged for a certain period of time.
Since most of the information on the Internet is presented as multimedia (a web page with embedded images and audio can be considered a composite multimedia document), improving the experience of information access and browsing on small displays is critical to unleashing the power of these mobile devices. Existing research directions that address this problem can be classified into the following four categories:
· Trivial methods. For example, direct down-sampling of image or video in the spatial domain. This approach often degrades the user experience since the results may be unreadable or unacceptable.
· Authoring multiple versions. For example, building separate, dedicated mobile web sites for small devices. This approach places a burden on content management. Also, it is hard to predict what devices will emerge in the market, so the solution could be transient.
· Re-authoring the content offline or on-the-fly. This approach depends on the extraction of the original semantic structure of the content. Some success has been achieved in some areas, but generally it is a hard problem because of the nature of reverse engineering.
· New formats which are scalable by themselves. This is the most promising direction and has been adopted in many areas, such as scalable image and video coding. However, current research effort is less focused on the problem of diverse and small displays, and there is much room for improvement in multimedia browsing techniques.
In this paper, we focus on the latter two approaches since they are preferred by content authors and consumers. In fact, these two schemes are related to each other. The intermediate representation used in content re-authoring should be adaptive and flexible with respect to the display size. Therefore, it can serve as a reference when standardizing a new scalable format.

2. Related Work

So far only a few efforts have addressed the problem of browsing large web pages on small terminals, and little has been done for images or videos. In the following, we give a brief introduction to the prior art, organized by the media type that the content adaptation technique is designed for.
Typical web pages are designed for desktops with large displays. When they are browsed on small devices, the user experience is unacceptable. Current approaches for
adapting web pages can be divided into two categories: the first transforms existing web pages, such as [4], while the other attempts to introduce new formats and mechanisms [1]. As we have noticed, few current approaches consider the priorities of different parts of a page. What is more, none of them lets authors control the final layout conveniently. That is to say, the final presentation is usually unpredictable during the design phase.
Current digital cameras can usually take photos with more than 2M pixels. These photos must be down-sampled in order to be viewed on small devices like Smartphones. However, people can hardly catch the information, e.g., the human faces and text, in these down-sampled versions. Quite a few efforts have been put into image adaptation, including the JPEG [5] and MPEG [9] standards. Proxy-based image transcoding has also been studied for many years [8][13]. Most of these efforts focused on compressing and caching content in order to reduce the data transmission time. Hence, the results are often not consistent with human perception because of excessive resolution reduction.
Though more and more mobile devices are capable of playing videos, limited bandwidth and small window sizes remain two critical obstacles. Currently, most video adaptation efforts focus only on bandwidth constraints. None of them has studied the impact of display resolution on the video browsing experience.

3. The Theoretical Framework

The following two observations are important to the development of our framework for optimizing the viewer's browsing experience on small displays:
Information Asymmetry: Different parts of content have different importance values. Thus, there exists an optimal set of content blocks when a screen constraint is given. This observation has its roots in the psychology community. It has become clear that not all but only a small part of incoming visual information can reach short-term human memory for further processing, i.e., the Attention as Filter Metaphor [6]. Attentional selection allows only the attention-getting parts to be presented to the user without much affecting the user experience. For example, human faces in a home photo are usually more important than the other parts. Generally, most perceptible information is located inside a handful of objects, and at the same time these objects catch most of a user's attention. As a result, the rendering of content can be treated as manipulating objects to provide as much information as possible under resource constraints.
Flexible Rendering: The content layout should not be fixed to a specific display size. In other words, the layout should be optimized for each specified screen size. Therefore, we need not design multiple versions of the same content. In addition, we should not restrict ourselves to exactly the same browsing experience as on desktop PCs. More advanced user interface technologies can be employed to improve usability.
In summary, a scalable content model and a flexible rendering algorithm are the two essential issues that we would like to address in our framework.

3.1 A Content Model for Small Screen

A piece of media content P usually consists of several information objects Bi. An information object is an information carrier that delivers the author's intention and catches part of the user's attention as a whole. For example, it may be a human face, a flower or a text sentence.
Since information objects differ in importance, we introduce the property IMP as a quantified value of the author's subjective evaluation of an information object. It is also an indicator of the weight of each object in its contribution to the whole information. This value is used when choosing less important objects for summarization under small displays. The importance values within the same content should be normalized so that their sum is 1.
As mentioned before, the information delivery of an object relies significantly on its area of presentation. If an information object is scaled down too much, it may not be perceptible enough for users to catch the information that the author intends to deliver. Therefore, we introduce the minimal perceptible size (MPS) to denote the minimal allowable spatial area of an information object. It is used as a threshold to determine whether an information object should be shrunk or summarized when rendering the adapted view.
For information objects of less importance, it is desirable to summarize them in order to save display space for more important objects. Instead of deleting content or showing an imperceptible adapted version, we introduce the alternative (ALT) as a substitute for the original content. It should occupy less space than the original information object.
Our proposed content model for small screen presentation is defined as below.
Definition 1: The basic content representation model for a piece of media content P is defined as an unordered set of information objects:
$$P = \{B_i\}, \quad 1 \le i \le N \tag{1}$$
and
$$B_i = (IMP_i, MPS_i, ALT_i) \tag{2}$$
where
Bi, the ith information object in P
IMPi, the importance value of Bi
MPSi, the minimal perceptible size of Bi
ALTi, the alternative of Bi

[Figure 1: media content P containing three information objects B1, B2 and B3; B1 is annotated with IMP = 0.3, MPS = 5000 and ALT = "face".]
Fig 1. An example of the content representation model.

The representation can take the form of XML descriptions and be saved as metadata within the original content. An example of the content representation model is shown in Figure 1, where the media content P contains three information objects.

3.2 Presentation Optimization

We introduce Information Fidelity (IF) as an objective comparison of a modified version of media content with the original version. The value of information fidelity lies between 0 (lowest, all information lost) and 1 (highest, all information kept). It is defined as the sum of the importance values of the objects that remain in the adapted version. If an object is replaced by its alternative, its importance value is not included. Suppose P' is the set of information objects that remain in the adapted version, $P' \subseteq P = \{B_1, B_2, \ldots, B_N\}$. Thus, the mission of the rendering phase is to find the set P' that carries the largest information fidelity while meeting the display constraints.
In order to ensure that all the information objects can be included in the final presentation, either at no less than their minimal perceptible size or as their alternatives, the following space constraint should be satisfied:
$$\sum_{B_i \notin P'} \mathrm{size}(ALT_i) + \sum_{B_i \in P'} MPS_i \le Area \tag{3}$$
where Area is the size of the target area and size(x) is a function which returns the size of the display area needed by ALTi. It says that the space occupied by the information objects or their alternatives should not exceed the target display area.
If the constraint (3) is transformed to
$$\sum_{B_i \in P'} \bigl(MPS_i - \mathrm{size}(ALT_i)\bigr) \le Area - \sum_{B_i \in P} \mathrm{size}(ALT_i) \tag{4}$$
the rendering problem becomes:
$$\max_{P'} \sum_{B_i \in P'} IMP_i \quad \text{subject to} \quad \sum_{B_i \in P'} \bigl(MPS_i - \mathrm{size}(ALT_i)\bigr) \le Area - \sum_{B_i \in P} \mathrm{size}(ALT_i) \tag{5}$$
We can see that the problem (5) is equivalent to a traditional NP-complete problem, 0-1 knapsack: each object Bi is an item with value IMPi and weight MPSi - size(ALTi), and the right-hand side of the constraint is the knapsack capacity. It can be efficiently solved by a branch and bound algorithm.

4. Adapting Multimedia for Small Displays

In this section, we show how the content model can be applied to adapt different types of multimedia content for small displays. The details of each piece of work can be found in [2][3][7][10].
For web pages, we have introduced an approach [3] similar to the fisheye view. The extensions to the original representation model are mainly twofold:
· In order to give authors control over the final page layout, we leverage binary slicing trees, a data structure widely used in the computer aided design community, instead of an unordered set to organize the information blocks.
· We add three additional properties to each information object in order to characterize its special display constraints.
For images, an attention model based adaptation and browsing scheme is developed in [2][10]. The notion of attention object there is equivalent to that of information object; the other differences are:
· The image attention model adds an ROI property to each information object. It is borrowed from JPEG 2000 and refers to a spatial region or segment that corresponds to an information object.
· We suppose the alternative of an object in images to be null, since the information object will be cropped if it cannot be put on the display.
Video adaptation is another natural application of our content model. In [7], we proposed a solution for browsing amateur video clips such as home videos or surveillance videos. For this kind of video, it is possible to optimize the content for different resolution conditions. Previous results on image adaptation can be easily extended to video adaptation if we simply consider each video frame as an image. However, this naïve approach will cause jitter in the video sequences since the frames will be discontinuous after cropping. To solve this problem, virtual camera control is applied to improve the quality of the output stream.

5. Content Services Networks

We provide the above content modeling and adaptation functions as subscription-based web services on the Internet. Previously, we proposed a subscription-based system framework named content services networks (CSN) [11][12], which aims to make content delivery networks (CDN) capable of delivering content adaptation services. In this paper, we apply this system framework to deploy the content adaptation services.
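As a concrete illustration of the content model of Section 3.1 and the rendering problem (5) of Section 3.2, the following minimal Python sketch selects the set P' by a depth-first branch and bound search. It is not the authors' implementation: the names InfoObject, alt_size and select_objects are hypothetical, size(ALTi) is assumed to be given as a number of pixels, MPSi is assumed to be no smaller than size(ALTi), and the pruning bound (current fidelity plus all remaining importance) is a deliberately simple choice. Only the IMP = 0.3 and MPS = 5000 values of the first object come from Figure 1; all other numbers are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class InfoObject:
    """One information object B_i = (IMP_i, MPS_i, ALT_i); names are hypothetical."""
    name: str
    imp: float      # IMP_i, importance value (normalized to sum to 1 over P)
    mps: int        # MPS_i, minimal perceptible size in pixels
    alt_size: int   # size(ALT_i) in pixels; 0 models a null alternative

    def __post_init__(self) -> None:
        # Assumption of this sketch: the alternative never needs more room than MPS_i.
        if self.alt_size > self.mps:
            raise ValueError("alt_size must not exceed mps")


def select_objects(objects: List[InfoObject], area: int) -> Tuple[float, List[str]]:
    """Solve problem (5): return (information fidelity, names of objects in P')."""
    # Right-hand side of (4)/(5): space left after reserving every alternative.
    capacity = area - sum(o.alt_size for o in objects)
    if capacity < 0:
        raise ValueError("even the alternatives do not fit in the target area")

    # Try objects with high importance per extra pixel first.
    items = sorted(objects,
                   key=lambda o: o.imp / max(o.mps - o.alt_size, 1),
                   reverse=True)

    # remaining[k] = total importance of items[k:], an optimistic bound.
    remaining = [0.0] * (len(items) + 1)
    for k in range(len(items) - 1, -1, -1):
        remaining[k] = remaining[k + 1] + items[k].imp

    best_value, best_set = -1.0, []

    def branch(k: int, used: int, value: float, chosen: List[str]) -> None:
        nonlocal best_value, best_set
        if value > best_value:
            best_value, best_set = value, list(chosen)
        # Prune when even keeping every remaining object cannot beat the best.
        if k == len(items) or value + remaining[k] <= best_value:
            return
        extra = items[k].mps - items[k].alt_size   # knapsack weight of B_k
        if used + extra <= capacity:               # keep B_k at perceptible size
            chosen.append(items[k].name)
            branch(k + 1, used + extra, value + items[k].imp, chosen)
            chosen.pop()
        branch(k + 1, used, value, chosen)         # replace B_k by ALT_k

    branch(0, 0, 0.0, [])
    return best_value, sorted(best_set)


objects = [
    InfoObject("face", imp=0.3, mps=5000, alt_size=400),
    InfoObject("caption", imp=0.5, mps=3000, alt_size=300),
    InfoObject("background", imp=0.2, mps=8000, alt_size=0),
]
fidelity, kept = select_objects(objects, area=9000)
print(kept, round(fidelity, 2))   # ['caption', 'face'] 0.8
```

In this example the face and the caption are kept at their minimal perceptible sizes and the background is replaced by its (null) alternative, which satisfies constraint (3): 0 + 5000 + 3000 does not exceed 9000. A tighter bound, such as the fractional knapsack relaxation, would prune more aggressively on larger object sets.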
The overall system consists of two layers of network infrastructure: a content delivery overlay (i.e., CDNs) and a service delivery overlay. The content delivery overlay is made up of a network of service-enabled web caches, which extend the functionality of traditional web caches to perform value-added processing. The service delivery overlay consists of a large number of application servers which act as remote call-out servers for the service-enabled web caches. These two overlays work together to provide content-oriented web services.
Before the content modeling and adaptation service becomes available, it needs to be registered in the UDDI (Universal Description, Discovery and Integration) registry. The components received from service providers, such as service specifications and binaries, are stored in the service database. In order to use the service, a mobile client first needs to find and subscribe to the service via UDDI registries. Then the service instructions are generated and transferred from the management servers to the service-enabled web caches that the subscriber is associated with. The service-enabled web cache determines whether a message needs services according to the service instructions. In our case, the instructions may simply be a type comparison, i.e., whether the content is an image or a video.

6. Experimental Results

We have developed a service-enabled web cache based on Microsoft ISA Server 2000. A special web filter is implemented using ISAPI, which enables adaptation of HTTP messages containing HTML pages, images or videos. The processing is executed locally on the proxy, which is a Windows XP system with a P4 1.3 GHz CPU and 256 MB of memory.
For web page adaptation, 16 web pages were collected from several popular websites such as MSN, Yahoo! and Google. The content model for each web page was created manually, and the number of information objects in a web page varies from 5 to 20. In the experiment, the average time cost for adapting a page is 18 microseconds, varying from 2 to 56 microseconds.
Currently, the time cost of the automatic image modeling process is somewhat large. On average, it is 0.9 seconds for 1200x800 images on our test bed. However, as mentioned before, the automatic modeling results can be saved with the image files for reuse. Therefore, the content model is computed only once, when the image is first acquired.
The performance of video adaptation can be improved if we do not process every frame in the MPEG stream. In our current implementation based on Microsoft DirectShow, we process one frame every three seconds. It can run smoothly on the test bed without causing any jitter.

7. Conclusions

In this paper, we introduce our work on adapting multimedia to small-form-factor devices. A novel framework as well as a set of approaches for presenting different types of multimedia under limited display size has been proposed. We are currently developing a set of authoring tools to assist the generation of different content models. More user studies should be carried out to test the usability of our approaches, and more advanced user interface technologies should be studied to best utilize our content model.

8. References

[1] Borning, A., Lin, R.K., and Marriott, K. Constraint-Based Document Layout for the Web. ACM Multimedia Systems Journal 8(3), 2000, 177-189.
[2] Chen, L.Q., Xie, X., Fan, X., et al. A Visual Attention Model for Adapting Images on Small Displays. ACM Multimedia Systems Journal 9(4), 2003, 353-364.
[3] Chen, L.Q., Xie, X., Ma, W.Y., et al. DRESS: A Slicing Tree Based Web Page Representation for Various Display Sizes. Poster Proc. WWW'03, Budapest, Hungary, May 2003.
[4] Chen, Y., Ma, W.Y., and Zhang, H.J. Detecting Web Page Structure for Adaptive Viewing on Small Form Factor Devices. Proc. WWW'03, Budapest, Hungary, May 2003.
[5] Christopoulos, C., Skodras, A., and Ebrahimi, T. The JPEG2000 Still Image Coding System: An Overview. IEEE Trans. on Consumer Electronics 46(4), 2000, 1103-1127.
[6] Desimone, R. and Duncan, J. Neural Mechanisms of Selective Visual Attention. Annual Review of Neuroscience, vol. 18, 1995, 193-222.
[7] Fan, X., Xie, X., Zhou, H.Q., and Ma, W.Y. Looking Into Video Frames on Small Displays. Poster Proc. ACM Multimedia'03, Berkeley, CA, USA, Nov. 2003.
[8] Han, R., Bhagwat, P., Lamaire, R., et al. Dynamic Adaptation in an Image Transcoding Proxy for Mobile Web Access. IEEE Personal Communications 5(6), 1998, 8-17.
[9] ISO/IEC JTC1/SC29/WG11/N4242. ISO/IEC 15938-5 FDIS Information Technology Multimedia Content Description Interface Part 5: Multimedia Description Schemes. Sydney, Australia, July 2001.
[10] Liu, H., Xie, X., Ma, W.Y., and Zhang, H.J. Automatic Browsing of Large Pictures on Mobile Devices. Proc. ACM Multimedia'03, Berkeley, CA, USA, Nov. 2003.
[11] Ma, W.Y., Shen, B., and Brassil, J. Content Services Network: The Architecture and Protocols. Proc. WCW'01, Boston, USA, Jun. 2001.
[12] Ma, W.Y., Xie, X., Yuan, C., et al. Enabling Multimedia Adaptation Services in Content Delivery Networks. IMMCN'03, Cary, North Carolina, USA, Sep. 2003.
[13] Mohan, R., Smith, J.R., and Li, C.S. Adapting Multimedia Internet Content for Universal Access. IEEE Trans. on Multimedia 1(1), 1999, 104-114.