Kalleitner, Franz | 3 Apr 2006 12:27

Re: Layered codecs: use of PT to distinguish layers

hi, stefan, all. 

The mapping of SVC layers to different RTP sessions opens the door for simple scaling of SVC streams.

Please note: The term "RTP translator" is used here instead of "MANE".
In principle, it should not matter which term is used for an initial discussion. I hope so :)

The proposal might be of special interest for:

Reducing the system performance demanded of RTP translators.
It is a fairly lightweight solution for environments whose computing performance is too
restricted to run a full RTP translator unit.

SVC streams that traverse low bit-rate links, with or without varying
throughput conditions. Moreover, it would make an RTP translator function
much easier to implement. When congestion is sensed, packets could simply be
discarded or truncated in order to adjust the packet stream to the current network
conditions (that is, the available throughput), more or less without awareness of the encoded media.
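
A minimal sketch of such a media-unaware discard decision, assuming that higher
PT values carry higher layers. The function names and the cutoff parameter are
hypothetical and only serve as illustration; only the fixed RTP header is read.

```python
import struct

def rtp_payload_type(packet: bytes) -> int:
    """Return the 7-bit payload type from a raw RTP packet."""
    if len(packet) < 12:
        raise ValueError("shorter than the fixed 12-byte RTP header")
    if packet[0] >> 6 != 2:
        raise ValueError("not RTP version 2")
    # The top bit of the second byte is the marker bit, the low 7 bits are the PT.
    return packet[1] & 0x7F

def should_forward(packet: bytes, highest_pt_allowed: int) -> bool:
    """Assumed policy: forward only packets whose PT is at or below the cutoff."""
    return rtp_payload_type(packet) <= highest_pt_allowed

def make_packet(pt: int, marker: bool = False) -> bytes:
    """Build a dummy RTP packet (illustration only): V=2, no padding, no CSRC."""
    second_byte = (0x80 if marker else 0) | pt
    return struct.pack("!BBHII", 0x80, second_byte, 1, 0, 0x1234) + b"payload"
```

Under congestion, the translator would lower highest_pt_allowed; the payload
itself is never parsed.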

However, according to my understanding, the following restrictions would appear:

Due to the proposed layer assignment, support for interleaving might be restricted.
That is, only a single SVC packet (NALU) can be transported in one RTP packet per RTP session,
unless we think about combinations of spatio-temporal, temporal-SNR, spatio-SNR, or any
other combination (access units).
However, this would require increased negotiation effort between sender and receiver
(signaling effort: out-of-band (SDP), SEI messages, others, ...).

Reduced SVC coding efficiency, because inter-layer prediction needs to be switched off.
In general, the extended NALU proposal does not directly indicate whether an SVC packet
is used for inter-layer prediction. According to the definition of DID, it must be assumed
that at any temporal location, a picture with a smaller dependency_id value may be used for
inter-layer prediction.
For example, if one particular RTP stream is discarded, all higher RTP streams that belong
to the same GoP need to be discarded as well, because inter-layer prediction "MAY" occur,
even if no prediction is done at all. In fact, if reference frames were discarded arbitrarily,
it would cause annoying artifacts at the decoder output.

Please note, for the latter it was assumed that the hierarchical layered representation of the
SVC stream (that is, its NALUs) is mapped to RTP streams with increasing PT value.
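
The conservative rule above (dropping one stream forces dropping every higher
one) can be sketched as follows. This assumes the increasing-PT mapping and
ignores GoP boundaries; the function name is hypothetical.

```python
def layers_to_drop(active_pts, dropped_pt):
    """Return all PTs that must be dropped once dropped_pt is discarded.

    Assumption: any stream with a higher PT may depend on the dropped one
    through inter-layer prediction, so it is discarded as well.
    """
    return sorted(pt for pt in active_pts if pt >= dropped_pt)
```

For example, with active PTs 96 to 99, discarding PT 97 also removes 98 and 99,
even if those layers did not actually use PT 97 for prediction.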

Multicast and even broadcast configurations are slightly easier to handle, since the receiving client
does not need to register with a separate multicast group for each layer it requests to receive.
Signaling overhead is also reduced, due to the missing IGMP messages (multicast only).

Another question to answer would be the maximum number of supported layers for a single SVC stream.
Bear in mind that the number of dynamic payload type assignments is limited to the range 96 to 127.
Hence, at most 32 layers could be assigned to an SVC stream,
e.g., 4 spatial layers x 4 temporal layers x 2 SNR layers (= 32 layers).
Regarding the example above: it must be assumed that four temporal layers might be too few for
high-quality SVC video.
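
The arithmetic can be checked with a short sketch that enumerates every
(spatial, temporal, SNR) combination and assigns each one a dynamic PT. The
ordering used here (SNR varies fastest, spatial slowest) is an arbitrary
assumption, not a rule from the proposal.

```python
from itertools import product

DYNAMIC_PTS = range(96, 128)  # 32 dynamic payload type values, 96..127

def assign_payload_types(n_spatial, n_temporal, n_snr):
    """Map each (spatial, temporal, snr) layer combination to one dynamic PT."""
    combos = list(product(range(n_spatial), range(n_temporal), range(n_snr)))
    if len(combos) > len(DYNAMIC_PTS):
        raise ValueError(
            f"{len(combos)} layer combinations do not fit into "
            f"{len(DYNAMIC_PTS)} dynamic payload types"
        )
    return dict(zip(combos, DYNAMIC_PTS))
```

With 4 x 4 x 2 layers the range is used up completely; a fifth temporal layer
(4 x 5 x 2 = 40 combinations) no longer fits.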

The proposal requires negotiating PT values between sender and receiver.
Does it allow layers to be mapped arbitrarily to any PT value, or according to a rule?
E.g., a rule that forces all present spatial layers, or the base layer, to be mapped to the PT values first, followed
by a number of temporal layers. Where would the SNR layers be placed?

Furthermore, FGS packets could be truncated anywhere.
Truncatability needs to be visible in the RTP session that transmits FGS packets.

By the way: in case of packet truncation, it would be necessary to update the parameters below RTP,
i.e., the length information at the network layer and the checksum at the transport layer.
Furthermore, SRTP will not be able to handle truncated SVC packets.
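
To illustrate the transport-layer part, here is a rough sketch that truncates
the payload of a UDP datagram and recomputes the UDP length and checksum. It
assumes UDP over IPv4; the function names are hypothetical, and the IPv4 total
length and header checksum, which would also need updating, are left out.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum as used by IP and UDP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF

def pseudo_header(src_ip: bytes, dst_ip: bytes, udp_len: int) -> bytes:
    """IPv4 pseudo-header covered by the UDP checksum (protocol 17 = UDP)."""
    return src_ip + dst_ip + struct.pack("!BBH", 0, 17, udp_len)

def truncate_udp(src_ip: bytes, dst_ip: bytes, datagram: bytes,
                 new_payload_len: int) -> bytes:
    """Cut the UDP payload to new_payload_len bytes and fix length and checksum."""
    src_port, dst_port, _, _ = struct.unpack("!HHHH", datagram[:8])
    payload = datagram[8:8 + new_payload_len]
    udp_len = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, udp_len, 0)
    csum = internet_checksum(pseudo_header(src_ip, dst_ip, udp_len) + header + payload)
    if csum == 0:
        csum = 0xFFFF  # in UDP over IPv4, a zero field means "no checksum"
    return struct.pack("!HHHH", src_port, dst_port, udp_len, csum) + payload
```

The SRTP authentication tag cannot be repaired this way, because the translator
does not hold the session keys.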

Well, FGS packets are allowed to be discarded too, so truncation could be addressed just as an
additional option with a narrowed functional scope, i.e., without supporting Secure RTP.

Well, this e-mail just captures some thoughts that came up while writing. I look forward
to a more detailed discussion.

cheers, franz 

-----Original Message-----
From: Stephan Wenger [mailto:stewe <at> stewe.org] 
Sent: Monday, 03 April 2006 09:55
To: avt <at> ietf.org
Cc: Magnus Westerlund; Colin Perkins
Subject: [AVT] Layered codecs: use of PT to distinguish layers

Folks,
I want to feel the temperature with you about the use of the RTP  
payload type field to distinguish layers of a layered codec.  That  
is, in the same RTP session (IP/Port/SSRC), several layers would be  
sent as independent RTP streams, distinguished by the PT.  The main  
rationale is minimization of firewall pinholes.  The advantage of  
this solution, over putting the layer id into the payload header, is  
that the RTP header is not encrypted when using SRTP, which allows  
meaningful layer discarding by middleboxes.
It is understood that this verges on the border of "RTP payload  
multiplexing", which is not p.c. in AVT; however, I got encouraged by  
Colin's recent draft in DCCP, which proposes something similar for  
RTP and corresponding RTCP traffic.  So let me boldly enter this  
minefield again.
I would appreciate quick comments (over the next two days or so), as  
an endorsement of the idea by this WG could perhaps influence the  
design choices being made here in the committee which is  
standardizing the SVC layered video codec.  Thanks very much for your  
reaction.
Regards,
Stephan

_______________________________________________
Audio/Video Transport Working Group
avt <at> ietf.org
https://www1.ietf.org/mailman/listinfo/avt


