Specifications for a simple RTSP client

Abstract

This document describes a simple implementation of the client side of an RTSP session, including the minimal requirements to complete a successful dialogue with the IFI extension of the Darwin Streaming Server RTSP implementation. Here, emphasis is made on making parsing and error control as simple as possible, not necessarily following all the aspects of the RTSP standard (RFC 2326) requirements.

Connecting

The client is initially supplied with an RTSP URL, on the form:
rtsp://server.address:port/object.sdp
The URL is either supplied within a simple text file, the file name being supplied as a command line parameter when the client is started, or hopefully, the client is able to accept user input in a text box in the client's graphical user interface.

There are a number of things the client should check while parsing the URL:

What comes after the last slash ('/'), we can leave to the server parser (in our case, however, it would be nice to check whether the object ends with ".sdp", since this is required to activate our reflector capabilities in the RTSP server).

After parsing the URL, the client should have a valid server address (at least lexically), and possibly a port number. If no port number is supplied, the default "well known port" for RTSP is assumed, which is 554.

As soon as the URL is parsed, a TCP connection should be made to the obtained server address/port. The client should then transmit the DESCRIBE request as follows:

Client:

DESCRIBE rtsp://server.address:port/object.sdp RTSP/1.0\r\n
CSeq: 1\r\n
\r\n
Here, of course, '\r\n' is not to be taken literally, but as return and newline characters (as in printf in the C programming language). The describe request should include the RTSP URL exactly as supplied by the user/caller. The "CSeq" header field is used in sequence numbering, and not really necessary to operate the RTSP server (it is mandatory according to the standard, however). It is mainly practical when not operating with an open TCP connection during the entire session, for the client to make sure no messages are lost. The server will blindly echo the sequence number, including a "CSeq" header in the response messages.

The TCP connection can remain open for the duration of the session, but there is no reason why the connection should be maintained after the first SETUP request. It is necessary, however, to keep the connection alive while doing the DESCRIBE and the following SETUP requests, because of the the way in which the server is implemented.

Receiving SDP information

The server may respond to the DESCRIBE request in any number of ways (or it might not respond at all), but it is safe to assume that any other response than "RTSP/1.0 200 OK" in the first line might indicate an error situation (and a good excuse to give up on this particular session). A successful response to the DESCRIBE might look as follows:

Server:

RTSP/1.0 200 OK\n
Server: QTSS(IFI)/v88\n
Cseq: 1\n
Content-Type: application/sdp\n
Content-Base: rtsp://bildeus.ifi.uio.no:8000/12.sdp/\n
Content-length: 785\n
\n
n=2236805513 2236805513 932036356 224.2.127.254 9875 127 trusted\n
v=0\n
o=yozo 3138827440 3138828177 IN IP4 aohakobe.ipc.chiba-u.ac.jp\n
s=Places all over the world\n
i=Low bandwidth video (10kb/s) with views from all over the world. Audio is primarily for feedback for the senders of video. (looks like there's some problem for the session announcement from sweden distributes to the whole MBone, I'm repeating the announcement from japan. -- yozo.)\n
e=Yozo TODA at IPC, Chiba University, JAPAN <yozo@ipc.chiba-u.ac.jp.>\n
p=+81-43-290-3539\n
c=IN IP4 224.2.213.113/127\n
t=3138827400 3141246600\n
a=tool:sdr v2.6.1\n
a=type:test\n
m=audio 20154 RTP/AVP 0\n
c=IN IP4 224.2.213.113/127\n
a=ptime:40\n
a=control:trackID=1\n
m=video 51482 RTP/AVP 31\n
c=IN IP4 224.2.172.238/127\n
a=control:trackID=2\n
It's not necessary to parse the SDP information entirely. It will suffice to extract the lines containing "a=control:trackID=x" and keep hold of these control strings. It will also be necessary to extract the associated "m=..." lines kbdceding the control attributes. These indicate the nature of the media stream, and hence which tool is required to render it ("video" and "audio" indicating vic and rat, respectively).

What to do if there are other media descriptors than "video" or "audio" is not decided, but the best thing to do might be to ignore them altogether, telling the user that "unknown media types were encountered". If no known media type is found, the client must terminate gracefully.

Setting up

To set up, the "SETUP" request must be issued. It might look like this:

Client:


SETUP rtsp://server.address:port/object.sdp/trackID=1 RTSP/1.0\r\n
CSeq: 2\r\n
Transport: RTP/AVP;unicast;client_port=9000-9001\r\n
\r\n

This request must be issued for each of the control strings obtained above (leaving out the ones rekbdsenting media types we can't handle, of course). Note that the original URL is used, augmented by a slash ('/') and the control string. For the first one of these requests, the server (if all goes well) will respond with:

Server:


RTSP/1.0 200 OK\n
Server: QTSS(IFI)/v88\n
Cseq: 2\n
Session: 1234567890;timeout=60
Transport: rtp/avp;source=129.240.65.208;server_port=9000-9001;client_port=9000-9001\n
\n

All consecutive SETUP requests must include the session header field. It looks like this:

Client:


SETUP rtsp://server.address:port/object.sdp/trackID=2 RTSP/1.0\r\n
CSeq: 3\r\n
Session: 1234567890\r\n
Transport: RTP/AVP;unicast;client_port=9000-9001\r\n
\r\n

Even though in this example, the session identifier contains only numbers, the standard allows alpha characters as well. So making the client able to accept 2Ak432zDQr just as well as 1234567890 would be a good idea (case-sensitive, of course). The standard requires the identifier to be at least 8 octets long.

Again, anything other than "RTSP/1.0 200 OK" might indicate error, and should result in termination. In the responses, the information worth making a note of is the source attribute in the transport header field, and the client_port attribute. These will be needed to start the media tools later on.

Setting up a reflector session is only required if the client host is not capable of multicast on the MBone. However, using the returned SDP information to start the Meccano tools will require a much more elaborate parsing. For the sake of simplicity, we might assume that the client is not MBone capable.

Otherwise we might issue the SETUP request with "multicast" instead of "unicast" in the transport header field. The server can then respond with the appropriate multicast address/port numbers instead of setting up the reflector. This, however, remains to be implemented in the server. Other problems might arise as well, since the multicast sessions visible to the reflector/listener system might not be visible to the client. A more complete client would parse the SDP information, determine whether the multicast session is reachable, and initiate rendering on its own.

Start playing

To start getting the media packets, all we have to do is to send a "PLAY" request to the server. It looks like this:

Client:

PLAY rtsp://server.address:port/object.sdp RTSP/1.0\r\n
CSeq: 4\r\n
Session: 1234567890\r\n
\r\n
Note that the control string is omitted, hence the request applies to all streams currently set up. Again, other responses than "RTSP/1.0 200 OK" is handled as a fatal error.

Now is the time for starting up the media tools, using the source address and port number obtained above. Note that the media tools only require one of the port numbers acquired, for an example, if the transport header from the server includes "client_port=9000-9001", the media tool only needs port number 9000 in its command line parameters.

Pausing

This is handled the same way as playing. These two actions might be implemented using a single button in the client's graphical user interface, toggling between play and pause.

Client:

PAUSE rtsp://server.address:port/object.sdp RTSP/1.0\r\n
CSeq: 5\r\n
Session: 1234567890\r\n
\r\n

Tearing down

The request "TEARDOWN" is only issued at the very end of the session. This makes the server release all its resources pertaining to the session, and new SETUP requests will have to be sent in order to restart the media streams. It looks like this:

Client:

TEARDOWN rtsp://server.address:port/object.sdp RTSP/1.0\r\n
CSeq: 6\r\n
Session: 1234567890\r\n
\r\n
The server will reply:

Server:

RTSP/1.0 200 OK\n
Server: QTSS(IFI)/v88\n
Cseq: 6\n
Session: 537103309\n
Connection: Close\n
\n
When the reply to this message is received, the client can safely disconnect. The sequence number is really not required. However, the Session: header field is mandatory.

Important: The TEARDOWN must be issued before the client application is allowed to terminate, otherwise the reflector is liable to swamp the client host with unwanted udp packets. Another thing which might pose a problem, is that of client bandwidth. If reflector output surpasses available bandwitdh between itself and the client, TCP traffic to the RTSP server might be blocked. This situation is extremely undisirable, because the client internet connection in effect becomes unusable. The solution to this problem (at the moment), has been to implement a refresh timeout in the server. The client is required to send "dummy" requests within certain intervals. The most suited request for this, is OPTIONS. The timeout period (in seconds) is indicated by the SETUP reply message, in the session header field. To keep the session alive, just send along an OPTIONS request every now and then. It's done like this:

Client:

OPTIONS rtsp://server.address:port/object.sdp RTSP/1.0\r\n
CSeq: 6\r\n
Session: 1234567890\r\n
\r\n
Failure to pass these refresh "requests" would in effect mean the same as a TEARDOWN request.

Summary

The client may operate in three distinct states, disconnected, playing and paused. When the application is started, the state is of course "disconnected". After starting up, the client issues the requests DESCRIBE, SETUP and PLAY, and enters the "playing" state. The play button changes into a pause button, and the media rendering tools are started. During this state, the client sends the OPTIONS request regularly.

When the user kbdsses the recently appeared pause button, the PAUSE request is sent, and state is transferred into "paused". The pause button changes into a play button. When the user kbdsses the exit button (or exits the client application), TEARDOWN is sent, the TCP channel is broken, and the media tools are terminated.

An example of how the client dialog might look like, is included below.

Written by Guy Thomas Kværnberg, guyk@ifi.uio.no.