Information about http://project-iris.net/talks/msdtalk.pdf

Rateless Codes and Big Downloads Petar Maymounkov and David Mazi` res…

Tags: a3, a4, bandwidth utilization, composite message, department of computer science, digital fountain, fountain design, kademlia, lt, mazi, message blocks, motivation, nyu department, original message, protocols, reconciliation, source nodes, time block, waste bandwidth, x1,
Pages: 13
Language: english
Display cached document
Page 1
image
Page 2
image
Page 3
image
Page 4
image
Page 5
image
Page 6
image
Page 7
image
Page 8
image
Page 9
image
Page 10
image
Page 11
image
Page 12
image
Page 13
image
Rateless Codes and Big Downloads
 Petar Maymounkov and David Mazi` res
                                e
     http://kademlia.scs.cs.nyu.edu/
                    ­
    NYU Department of Computer Science
                       Motivation
· Downloading big files in p2p systems (e.g. movies)
· Problem truncated downloads
  - Transfer time of file > average uptime
                           >
  - Many more nodes with partial downloads than with complete
    file
  - Partial downloads tend to have overlapping information
  - Suboptimal reconciliation protocols waste bandwidth

· Objectives
  - Better bandwidth utilization = low overhead when reconciling
  - High file availability (when source nodes leave network)

· Key idea: Rateless codes
           Rateless codes

            File n blocks
Encoding                    Infinite encoding



                Lossy channel


                      Received (1 + )n blocks

Decoding
            Recovered n blocks w.h.p.
              Efficient rateless codes
· Public:
   - LT codes [Luby]
   - Online codes [Maymounkov]

                                 Online         LT
         Encoding time/block      O(1)       O(log n)
                                                  
            Blocks to decode     (1 + )n   n + O( n)
               Decoding           O(n)      O(n log n)



· Proprietary
   - Raptor codes [Shokrollahi, Digital Fountain]
           Design of on-line codes

            Original message blocks (n)
  OUTER
             Auxiliary blocks (0.01n)

                   Composite Message
  INNER

                                            Check blocks

             Lossy channel
                           Received check blocks (1 + )n
INNER -1
                Partially-recovered composite
                message blocks
OUTER -1

             Received check blocks
                 Auxiliary blocks

   Auxiliary blocks

   a1 a2 a3 a4        a1   = x1  x 2  x 3  x 4
                      a2   = x1  x 3  x 4  x 5
                      a3   = x2  x 5
                      a4   = x1  x 2  x 3  x 4  x 5
   x1 x2 x3 x4 x5

  Original message
       blocks

· Each message block is reflected in 3 random
  auxiliary blocks
                     Check blocks
   c1          c4          c7

                                      c1 = x 1  x 4
  x1 x2 x3 x4 x5 a 1 a 2              c4 = x 2  x 3  a 1
                                      c7 = x 5
        Composite Message

· Each check block generated independently
· To generate check block with ID i:
   - Seed pseudo-random generator with i
   - Choose deg(ci ) from a special distribution
   - Set ci to XOR of deg(ci ) random composite message blocks
                    Decoding algorithm
     c1 c2 c3                  c2 c3                      c3


                    a1                  a1                     a1
   1.                     2.                    3.

     x1 x2 x3              x1 x2 x3              x1 x2 x3

                    a1
   4.                    5.

     x1 x2 x3              x1 x2 x3
· Repeat until entire original message recovered:
  1. Find a check (auxiliary) block, s.t. all incorporated blocks are
     known, except for one
  2. Solve for it
                              Main idea
· Every transmitted block ID from source nodes is unique
   - Sufficient information to recover the file accumulates quickly in the
     network
   - High file availability

· Exploit large check block ID space
   - Observation: Nodes download many blocks from each other before
     aborting connections
   - Transmit data only in the form of check block streams
   - Each stream concisely described by its ID s:

                      hs (0) hs (1) hs (2) hs (3) hs (4) hs (5)
        Download state information
· Table of {stream, last pos} pairs
                                 last pos(s1)=5
          s1
                           last pos(s2)=3
          s2
                                            last pos(s3)=8
          s3
     Downloading from a source node
           Node v                  Source node s
       v              {v, last pos(v)=3}
       x
       y
                                           ···


· Source nodes can generate blocks from any stream
Downloading from a partial-knowledge
               node
    Node v                   Node w
v                        v
x                        x
y                        y
                         w
    {v, last pos(v)=5}
    {x, last pos(x)=3}
    {y, last pos(y)=6}
                            Conclusion
· Higher availability
   - Only way for a knowledge overlap is, if blocks with same IDs
     earlier came from the same non-source node
   - Unavoidable! Optimal?

· Simple reconciliation
   - Message cost = state table size
      # of pairs in table    # of streams within life-cycle of a download
                             # of truncated downloads within life-cycle

   - Number can be bound