3 Apr 2006 12:27
RE: Layered codecs: use of PT to distinguish layers
Kalleitner, Franz <franz.kalleitner <at> siemens.com>
2006-04-03 10:27:30 GMT
Hi Stephan, all,

The mapping of SVC layers to different RTP sessions opens the door for simple scaling of SVC streams.

Please note: the term "RTP translator" is used here instead of "MANE". In principle it should not matter which term is used for an initial discussion. I hope so :)

The proposal might be of special interest for:

- reducing the system performance demanded of RTP translators: a fairly lightweight solution for environments whose computing performance is too restricted to run a full RTP translator unit.
- SVC streams that traverse low bit-rate links, with or without varying throughput conditions.

Moreover, this would make an RTP translator function much easier to implement. When congestion is sensed, packets could simply be discarded or truncated in order to adjust the packet stream to the current network conditions (that is, to the available throughput), more or less unaware of the encoded media.

However, according to my understanding, the following restrictions would appear:

Due to the proposed layer assignment, the support of interleaving might be restricted. That is, only a single SVC packet (NALU) can be transported with one RTP packet per RTP session, unless we think about combinations such as spatio-temporal, temporal-SNR, spatio-SNR or any other combination (access units). However, this would require an enhanced negotiation effort between sender and receiver (signaling effort: out-of-band (SDP), SEI messages, others, ...).

SVC coding efficiency is reduced because inter-layer prediction needs to be switched off. In general, the extended NALU proposal does not provide direct information on whether an SVC packet is used for inter-layer prediction. According to the definition of DID, it must be assumed that at any temporal location, a picture with a smaller dependency_id value may be used for inter-layer prediction.
For example, if one particular RTP stream is discarded, all higher RTP streams that belong to the same GoP need to be discarded as well, because inter-layer prediction "MAY" occur, even if no prediction is done at all. In fact, if reference frames were discarded arbitrarily, it would cause annoying artifacts at the decoder output. Please note: for the latter it was assumed that the hierarchical layered representation of the SVC stream (that is, of its NALUs) is mapped to RTP streams with increasing PT value.

Multicast and even broadcast configurations are slightly better to handle, since the receiving client does not need to register with a separate multicast group for each layer it requests to receive. Signaling overhead is also somewhat reduced, because the IGMP exchanges are not needed (multicast only).

Another question to answer would be the maximum number of supported layers for a single SVC stream. Bear in mind that dynamic payload type assignments are limited to the range 96 to 127. Hence, at most 32 layers could be assigned to an SVC stream, e.g., 4 spatial layers x 4 temporal layers x 2 SNR layers (= 32 layer combinations). Pointing to the example above: it must be assumed that four temporal layers might be too few for high-quality SVC video.

The proposal requires negotiating PT values between sender and receiver. Does it allow layers to be mapped arbitrarily to any PT value? Or according to a rule, e.g., one that forces all present spatial layers (or the base layer) to be mapped to PT values first, followed by a number of temporal layers? Where would the SNR layers be placed?

Furthermore, FGS packets could be truncated anywhere. Truncatability needs to be visible to the RTP session that transmits FGS packets.

By the way: in case of packet truncation it would be necessary to update the parameters below RTP, i.e., the length information at the network layer and the checksum at the transport layer. Furthermore, SRTP will not be able to handle truncated SVC packets.
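The bookkeeping below RTP mentioned above might look roughly like the following sketch. It assumes UDP over IPv4 as the transport (the message does not name one), and all function names, addresses and ports are hypothetical. It only covers the UDP length and checksum fields; the IPv4 total length and header checksum would need the same kind of update, which is omitted here.

```python
import socket
import struct

UDP_PROTOCOL = 17  # IPv4 protocol number for UDP


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as used by IPv4 and UDP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """IPv4 pseudo-header that is included in the UDP checksum."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, UDP_PROTOCOL, udp_length))


def truncate_udp_datagram(src_ip: str, dst_ip: str,
                          datagram: bytes, keep_payload: int) -> bytes:
    """Cut the UDP payload (the RTP packet) to keep_payload bytes and
    rewrite the UDP length and checksum fields accordingly."""
    src_port, dst_port = struct.unpack("!HH", datagram[:4])
    payload = datagram[8:8 + keep_payload]
    length = 8 + len(payload)  # UDP header is 8 bytes
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = internet_checksum(
        pseudo_header(src_ip, dst_ip, length) + header + payload)
    checksum = checksum or 0xFFFF  # 0 would mean "no checksum" in UDP/IPv4
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload
```

A receiver can check the result by running `internet_checksum` over the pseudo-header plus the whole rewritten datagram; a valid datagram yields 0.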
Well, FGS packets are allowed to be discarded too, so truncation could be addressed just as an additional option with a narrowed functional scope, i.e., without supporting secure RTP.

Well, this e-mail just collects some thoughts that came up while writing it. However, I look forward to a more detailed discussion.

cheers,
franz

-----Original Message-----
From: Stephan Wenger [mailto:stewe <at> stewe.org]
Sent: Monday, 03 April 2006 09:55
To: avt <at> ietf.org
Cc: Magnus Westerlund; Colin Perkins
Subject: [AVT] Layered codecs: use of PT to distinguish layers

Folks,

I want to feel the temperature with you about the use of the RTP payload type field to distinguish layers of a layered codec. That is, in the same RTP session (IP/Port/SSRC), several layers would be sent as independent RTP streams, distinguished by the PT. The main rationale is minimization of firewall pinholes. The advantage of this solution, over putting the layer id into the payload header, is that the RTP header is not encrypted when using SRTP, which allows meaningful layer discarding by middleboxes.

It is understood that this verges on the border of "RTP payload multiplexing", which is not p.c. in AVT; however, I got encouraged by Colin's recent draft in DCCP, which proposes something similar for RTP and corresponding RTCP traffic. So let me boldly enter this minefield again.

I would appreciate quick comments (over the next two days or so), as an endorsement of the idea by this WG could perhaps influence the design choices being made here in the committee which is standardizing the SVC layered video codec.

Thanks very much for your reaction.

Regards,
Stephan

_______________________________________________
Audio/Video Transport Working Group
avt <at> ietf.org
https://www1.ietf.org/mailman/listinfo/avt
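As a rough illustration of the idea discussed in this thread, the sketch below assigns one dynamic payload type per layer combination and lets a middlebox decide on forwarding from the RTP header alone. The function names, the spatial-major ordering and the single "highest PT" threshold are assumptions made for the example; the thread itself leaves the mapping rule open.

```python
from itertools import product

DYNAMIC_PT_FIRST = 96   # start of the dynamic payload type range
DYNAMIC_PT_LAST = 127   # end of the dynamic payload type range


def assign_payload_types(n_spatial: int, n_temporal: int, n_snr: int) -> dict:
    """Map every (spatial, temporal, snr) layer combination to one dynamic PT.

    Assumed ordering: spatial index varies slowest, SNR index fastest.
    """
    combos = list(product(range(n_spatial), range(n_temporal), range(n_snr)))
    available = DYNAMIC_PT_LAST - DYNAMIC_PT_FIRST + 1
    if len(combos) > available:
        raise ValueError(
            f"{len(combos)} layer combinations, only {available} dynamic PTs")
    return {combo: DYNAMIC_PT_FIRST + i for i, combo in enumerate(combos)}


def payload_type(rtp_packet: bytes) -> int:
    """Read the PT: the low 7 bits of the second byte of the RTP header."""
    if len(rtp_packet) < 12:
        raise ValueError("shorter than the fixed RTP header")
    return rtp_packet[1] & 0x7F


def should_forward(rtp_packet: bytes, highest_pt: int) -> bool:
    """Forward only layers up to highest_pt; all higher PTs are dropped,
    assuming layers were mapped to increasing PT values."""
    return payload_type(rtp_packet) <= highest_pt
```

With `assign_payload_types(4, 4, 2)` the 32 combinations fill the dynamic range exactly, so adding a fifth temporal layer would already raise the error.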