Maximizing Information Throughput for Multimedia Browsing on Small Displays

                              Xing Xie, Wei-Ying Ma, Hong-Jiang Zhang
                                        Microsoft Research Asia
                   5F Sigma Center, No. 49, Zhichun Road, Beijing, 100080, P.R. China
                                {xingx, wyma, hjzhang}@microsoft.com


Abstract

As a great many new devices with diverse capabilities experience a
population boom, their limited display sizes have become the major obstacle
undermining the usefulness of these devices for information access. In this
paper, we introduce our recent research on adapting multimedia content,
including images, videos and web pages, for browsing on small-form-factor
devices. A theoretical framework as well as a set of novel methods for
presenting and rendering multimedia under limited screen sizes is proposed
to improve the user experience. The content modeling and processing are
provided as subscription-based web services on the Internet. Experiments
show that our approach is extensible and able to achieve satisfactory
results with high efficiency.

1. Introduction

In the PC+ era, a variety of new computing devices, such as the SPOT watch,
Smartphone, Pocket PC and Tablet PC, are experiencing a population boom.
These devices are becoming more and more powerful in both numerical
computing and data storage. However, low-bandwidth connections and small
displays are still two serious obstacles that undermine the usefulness of
these devices in people's everyday lives. With the rapid and successful
development of 2.5G and 3G wireless networks, bandwidth is expected to
become less of a constraint in the near future. At the same time, however,
the limitation on display size is likely to remain unchanged for a certain
period of time.
   Since most of the information on the Internet is presented as multimedia
(a web page with embedded images and audio can be considered a composite
multimedia document), improving the experience of information access and
browsing on small displays is critical to unleashing the power of these
mobile devices. Existing research directions that address this problem can
be classified into the following four categories:
· Trivial methods. For example, direct down-sampling of an image or video
  in the spatial domain. This approach often degrades the user experience
  since the results may be unreadable or unacceptable.
· Authoring multiple versions. For example, building separate, dedicated
  mobile web sites for small devices. This approach places a burden on
  content management. Also, it is hard to predict what devices will emerge
  in the market, so the solution could be transient.
· Re-authoring the content offline or on-the-fly. This approach depends on
  extracting the original semantic structure of the content. Some success
  has been achieved in some areas, but generally it is a hard problem
  because of the nature of reverse engineering.
· New formats which are scalable by themselves. This is the most promising
  direction and has been adopted in many areas, such as scalable image and
  video coding. However, current research effort is less focused on the
  problem of diverse and small displays, and there is much room for
  improvement in multimedia browsing techniques.
   In this paper, we focus on the latter two approaches since they are
preferred by content authors or consumers. In fact, these two schemes are
related to each other. The intermediate representation used in content
re-authoring should be adaptive and flexible with respect to the display
size. Therefore, it can serve as a reference when standardizing a new
scalable format.

2. Related Work

So far only a few efforts have addressed the problem of browsing large web
pages on small terminals, and little has been done for images or videos. In
the following, we give a brief introduction to the prior art, organized by
the media type that the content adaptation technique is designed for.
   Typical web pages are designed for desktops with large displays. When
they are browsed on small devices, the user experience is unacceptable.
Current approaches for
adapting web pages can be divided into two categories: the first transforms
existing web pages, such as [4], while the other attempts to introduce new
formats and mechanisms [1]. We notice that few current approaches consider
the priorities of different parts of a page. What's more, none of them lets
authors control the final layout conveniently. That is to say, the final
presentation is usually unpredictable during the design phase.
   Current digital cameras can usually take photos with more than 2M
pixels. These photos must be down-sampled in order to be viewed on small
devices like Smartphones. However, people can hardly make out the
information, e.g., human faces and text, in these down-sampled versions.
Quite a few efforts have been devoted to image adaptation, including the
JPEG [5] and MPEG [9] standards. Proxy-based image transcoding has also
been studied for many years [8][13]. Most of these efforts focused on
compressing and caching content in order to reduce data transmission time.
Hence, the results are often not consistent with human perception because
of excessive resolution reduction.
   Though more and more mobile devices are capable of playing videos,
limited bandwidth and small window sizes remain two critical obstacles.
Currently, most video adaptation efforts focus only on bandwidth
constraints. None of them has studied the impact of display resolution on
the video browsing experience.

3. The Theoretical Framework

The following two observations are important to the development of our
framework for optimizing the viewer's browsing experience on small
displays:
   Information Asymmetry: Different parts of content have different
importance values. Thus, there exists an optimal set of content blocks when
a screen constraint is given. This observation has its roots in psychology.
It has become clear that not all but only a small part of incoming visual
information can reach short-term human memory for further processing, i.e.,
the Attention as Filter Metaphor [6]. Attentional selection allows only the
attention-getting parts to be presented to the user without much affecting
the user experience. For example, human faces in a home photo are usually
more important than the other parts. Generally, most perceptible
information is located inside a handful of objects, and at the same time
these objects catch most of a user's attention. As a result, the rendering
of content can be treated as manipulating objects to provide as much
information as possible under resource constraints.
   Flexible Rendering: The content layout should not be fixed to a specific
display size. In other words, the layout should be optimized for each
specified screen size. Therefore, we need not design multiple versions of
the same content. In addition, we should not restrict ourselves to exactly
the same browsing experience as on desktop PCs. More advanced user
interface technologies can be employed to improve usability.
   In summary, a scalable content model and a flexible rendering algorithm
are the two essential issues that we would like to address in our
framework.

3.1 A Content Model for Small Screen
A piece of media content P usually consists of several information objects
Bi. An information object is an information carrier that delivers the
author's intention and catches part of the user's attention as a whole. For
example, it may be a human face, a flower or a text sentence.
   Since information objects differ in importance, we introduce the
property IMP as a quantified value of the author's subjective evaluation of
an information object. It also indicates the weight of each object's
contribution to the whole information. This value is used when choosing
less important objects for summarization under small displays. The
importance values within the same content should be normalized so that
their sum is 1.
   As mentioned before, the information delivery of an object relies
significantly on its area of presentation. If an information object is
scaled down too much, it may not be perceptible enough to let users catch
the information that the author intends to deliver. Therefore, we introduce
the minimal perceptible size (MPS) to denote the minimal allowable spatial
area of an information object. It is used as a threshold to determine
whether an information object should be shrunk or summarized when rendering
the adapted view.
   For information objects of less importance, it is desirable to summarize
them in order to save display space for more important objects. Instead of
deleting content or showing an imperceptible adapted version, we introduce
the alternative (ALT) as a substitute for the original content. It should
occupy less space than the original information object.
   Our proposed content model for small screen presentation is defined as
below.
Definition 1: The basic content representation model for a piece of media
content P is defined as an unordered set of information objects:

               P = \{ B_i \}, \quad 1 \le i \le N                      (1)

and

               B_i = (IMP_i, MPS_i, ALT_i)                             (2)

where
               Bi,            the ith information object in P
               IMPi,          importance value of Bi
               MPSi,          minimal perceptible size of Bi
               ALTi,          alternative of Bi

   [Figure: media content P containing three information objects B1, B2
   and B3; B1 is annotated with IMP = 0.3, MPS = 5000, ALT = "face".]
         Fig 1. An example of the content representation model.

   The representation can take the form of XML descriptions saved as
metadata within the original content. An example of the content
representation model is shown in Figure 1, where the media content P
contains three information objects.

3.2 Presentation Optimization
We introduce Information Fidelity (IF) as an objective comparison of a
modified version of media content with the original version. The value of
information fidelity lies between 0 (lowest, all information lost) and 1
(highest, all information kept). It is defined as the sum of the importance
values of the objects that exist in the adapted version. If an object is
replaced by its alternative, its importance value is not included. Suppose
P' is the set of information objects existing in the adapted version,
P' \subseteq P = \{B_1, B_2, \ldots, B_N\}. Thus, the mission of the
rendering phase is to find the set P' that carries the largest information
fidelity while meeting the display constraints.
   In order to ensure that all the information objects can be included in
the final presentation, the following space constraint should be satisfied:

     \sum_{B_i \notin P'} size(ALT_i) + \sum_{B_i \in P'} MPS_i \le Area   (3)

   where Area is the size of the target area and size(ALT_i) is a function
which returns the size of the display area needed by ALT_i. It says that
the space occupied by the information objects or their alternatives should
not exceed the target display area.
   If the constraint (3) is transformed to

     \sum_{B_i \in P'} (MPS_i - size(ALT_i))
          \le Area - \sum_{B_i \in P} size(ALT_i)                          (4)

   the rendering problem becomes:

     \max_{P'} \sum_{B_i \in P'} IMP_i   subject to
     \sum_{B_i \in P'} (MPS_i - size(ALT_i))
          \le Area - \sum_{B_i \in P} size(ALT_i)                          (5)

   We can see that problem (5) is equivalent to a traditional NP-complete
problem, the 0-1 knapsack. It can be efficiently solved by a branch and
bound algorithm.

4. Adapting Multimedia for Small Displays

In this section, we show how the content model can be applied to adapt
different types of multimedia content for small displays. The details of
each piece of work can be found in [2][3][7][10].
   For web pages, we have introduced an approach [3] similar to the fisheye
view. The extensions to the original representation model are mainly
twofold:
· In order to give authors control over the final page layout, we use
  binary slicing trees, a data structure widely used in the computer-aided
  design community, instead of an unordered set to organize the information
  blocks.
· We add three additional properties to each information object in order to
  characterize its special display constraints.
   For images, an attention-model-based adaptation and browsing scheme is
developed in [2][10]. Besides the fact that the notion of an attention
object is equivalent to that of an information object, the other
differences are:
· The image attention model adds an ROI property to each information
  object. The term is borrowed from JPEG 2000 and refers to a spatial
  region or segment that corresponds to an information object.
· We assume the alternative of an object in images to be null, since the
  information object is cropped if it cannot be put on the display.
   Video adaptation is another natural application of our content model. In
[7], we proposed a solution for browsing amateur video clips such as home
videos or surveillance videos. For this kind of video, it is possible to
optimize the content for different resolution conditions. Previous results
on image adaptation can easily be extended to video adaptation if we simply
consider each video frame as an image. However, this naïve approach causes
jitter in the video sequences since the frames become discontinuous after
cropping. To solve this problem, virtual camera control is applied to
improve the quality of the output stream.

5. Content Services Networks

We provide the above content modeling and adaptation functions as
subscription-based web services on the Internet. Previously, we proposed a
subscription-based system framework named content services networks (CSN)
[11][12], which aims to make content delivery networks (CDN) capable of
delivering content adaptation services. In this paper, we apply this system
framework to deploy the content adaptation services.
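
   As an illustration of the rendering problem (5) in Section 3.2, the
following Python sketch selects the set P' by a simple branch and bound
search. It is only a minimal sketch under stated assumptions, not the
implementation used in our system: the names InfoObject, alt_size,
select_objects and EXAMPLE are hypothetical, the bound is the usual
fractional knapsack relaxation, and every example value except IMP = 0.3
and MPS = 5000 for B1 (taken from Figure 1) is invented for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class InfoObject:
    """One information object B_i = (IMP_i, MPS_i, ALT_i)."""
    name: str
    imp: float     # IMP_i, importance value (sums to 1 over P)
    mps: int       # MPS_i, minimal perceptible size (area units)
    alt_size: int  # size(ALT_i), area needed by the alternative (assumed field)


def select_objects(objects: Sequence[InfoObject], area: int) -> Tuple[float, List[str]]:
    """Solve problem (5): maximize the summed IMP of the kept set P'.

    Knapsack weight of B_i is MPS_i - size(ALT_i); the capacity is
    Area minus the total size of all alternatives, as in constraint (4).
    Returns (information fidelity, sorted names of the objects in P').
    """
    capacity = area - sum(o.alt_size for o in objects)
    if capacity < 0:
        raise ValueError("even the alternatives alone do not fit in the area")

    def density(o: InfoObject) -> float:
        w = o.mps - o.alt_size
        return float("inf") if w <= 0 else o.imp / w

    # Sort by importance per unit of extra area so the bound is tight.
    items = sorted(objects, key=density, reverse=True)
    weights = [max(o.mps - o.alt_size, 0) for o in items]

    best_value = 0.0
    best_set: List[str] = []

    def bound(k: int, room: int, value: float) -> float:
        # Upper bound: fill greedily, allowing one fractional object.
        for i in range(k, len(items)):
            if weights[i] <= room:
                room -= weights[i]
                value += items[i].imp
            else:
                return value + items[i].imp * room / weights[i]
        return value

    def branch(k: int, room: int, value: float, chosen: List[str]) -> None:
        nonlocal best_value, best_set
        if value > best_value:
            best_value, best_set = value, list(chosen)
        if k == len(items) or bound(k, room, value) <= best_value:
            return  # leaf reached, or this subtree cannot beat the best
        if weights[k] <= room:  # branch 1: keep B_k at its MPS
            chosen.append(items[k].name)
            branch(k + 1, room - weights[k], value + items[k].imp, chosen)
            chosen.pop()
        branch(k + 1, room, value, chosen)  # branch 2: use ALT_k instead

    branch(0, capacity, 0.0, [])
    return best_value, sorted(best_set)


# Hypothetical content with three objects, loosely following Figure 1.
EXAMPLE = [
    InfoObject("B1", imp=0.3, mps=5000, alt_size=400),
    InfoObject("B2", imp=0.5, mps=12000, alt_size=600),
    InfoObject("B3", imp=0.2, mps=3000, alt_size=200),
]
```

   With these assumed values and a target area of 160 x 120 = 19200,
select_objects(EXAMPLE, 19200) keeps B1 and B2 and replaces B3 by its
alternative, which satisfies constraint (3): 200 + 5000 + 12000 = 17200,
which does not exceed 19200.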
   The overall system comprises two layers of network infrastructure: a
content delivery overlay (i.e., CDNs) and a service delivery overlay. The
content delivery overlay consists of a network of service-enabled web
caches, which extend the functionality of traditional web caches to perform
value-added processing. The service delivery overlay consists of a large
number of application servers which act as remote call-out servers for
service-enabled web caches. These two overlays work together to provide
content-oriented web services.
   Before the content modeling and adaptation service becomes available, it
needs to be registered in the UDDI (Universal Description, Discovery and
Integration) registry. The components received from service providers, such
as service specifications and binaries, are stored in the service database.
In order to use the service, a mobile client first needs to find and
subscribe to the service via UDDI registries. Then the service instructions
are generated and transferred from the management servers to the
service-enabled web caches that the subscriber is associated with. The
service-enabled web cache determines whether a message needs services
according to the service instructions. In our case, the instructions may
simply be a type comparison, i.e., whether the content is an image or a
video.

6. Experimental Results

We have developed a service-enabled web cache based on Microsoft ISA Server
2000. A special web filter is implemented using ISAPI, which enables
adaptation of HTTP messages containing HTML pages, images or videos. The
processing is executed locally on the proxy, which is a Windows XP system
with a P4 1.3 GHz CPU and 256 MB of memory.
   For web page adaptation, 16 web pages were collected from several
popular websites such as MSN, Yahoo! and Google. The content model for each
web page was created manually, and the number of information objects in a
web page varies from 5 to 20. In the experiment, the average time cost for
adapting a page is 18 microseconds, varying from 2 to 56 microseconds.
   Currently, the time cost of the automatic image modeling process is
somewhat large: on average, 0.9 second for 1200x800 images on our test bed.
However, as mentioned before, the automatic modeling results can be saved
with the image files for reuse. Therefore, the content model is computed
only once, when the image is first acquired.
   The performance of video adaptation can be improved if we do not process
every frame in the MPEG stream. In our current implementation based on
Microsoft DirectShow, we process one frame every three seconds. It runs
smoothly on the test bed without causing any jitter.

7. Conclusions

In this paper, we introduce our work on adapting multimedia to
small-form-factor devices. A novel framework as well as a set of approaches
for presenting different types of multimedia under limited display size has
been proposed. We are currently developing a set of authoring tools to
assist the generation of different content models. More user study
experiments should be carried out to test the usability of our approaches,
and more advanced user interface technologies should be studied to make the
best use of our content model.

8. References

[1] Borning, A., Lin, R.K., and Marriott, K. Constraint-Based Document
Layout for the Web. ACM Multimedia Systems Journal 8(3), 2000, 177-189.
[2] Chen, L.Q., Xie, X., Fan, X., et al. A Visual Attention Model for
Adapting Images on Small Displays. ACM Multimedia Systems Journal 9(4),
2003, 353-364.
[3] Chen, L.Q., Xie, X., Ma, W.Y., et al. DRESS: A Slicing Tree Based Web
Page Representation for Various Display Sizes. Poster Proc. WWW'03,
Budapest, Hungary, May 2003.
[4] Chen, Y., Ma, W.Y., and Zhang, H.J. Detecting Web Page Structure for
Adaptive Viewing on Small Form Factor Devices. Proc. WWW'03, Budapest,
Hungary, May 2003.
[5] Christopoulos, C., Skodras, A., and Ebrahimi, T. The JPEG2000 Still
Image Coding System: An Overview. IEEE Trans. on Consumer Electronics
46(4), 2000, 1103-1127.
[6] Desimone, R. and Duncan, J. Neural Mechanisms of Selective Visual
Attention. Annual Review of Neuroscience, vol. 18, 1995, 193-222.
[7] Fan, X., Xie, X., Zhou, H.Q., and Ma, W.Y. Looking Into Video Frames on
Small Displays. Poster Proc. ACM Multimedia'03, Berkeley, CA, USA, Nov.
2003.
[8] Han, R., Bhagwat, P., Lamaire, R., et al. Dynamic Adaptation in an
Image Transcoding Proxy for Mobile Web Access. IEEE Personal Communications
5(6), 1998, 8-17.
[9] ISO/IEC JTC1/SC29/WG11/N4242. ISO/IEC 15938-5 FDIS Information
Technology - Multimedia Content Description Interface - Part 5: Multimedia
Description Schemes. Sydney, Australia, July 2001.
[10] Liu, H., Xie, X., Ma, W.Y., and Zhang, H.J. Automatic Browsing of
Large Pictures on Mobile Devices. Proc. ACM Multimedia'03, Berkeley, CA,
USA, Nov. 2003.
[11] Ma, W.Y., Shen, B., and Brassil, J. Content Services Network: The
Architecture and Protocols. Proc. WCW'01, Boston, USA, Jun. 2001.
[12] Ma, W.Y., Xie, X., Yuan, C., et al. Enabling Multimedia Adaptation
Services in Content Delivery Networks. IMMCN'03, Cary, North Carolina, USA,
Sep. 2003.
[13] Mohan, R., Smith, J.R., and Li, C.S. Adapting Multimedia Internet
Content for Universal Access. IEEE Trans. on Multimedia 1(1), 1999,
104-114.