AVB Network Latency

All devices in an AVB network share the same time. This allows the sending device (talker) to specify the precise point in time when its audio samples should be played out at the receiver side (listener). This is achieved by adding an offset to the current time and sending the resulting timestamp with each frame transmitted. The timestamp is called "presentation time" and has nanosecond precision. For comparison, a single sample at 48 kHz has a duration of about 20833 ns.
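As a rough illustration of the timestamping described above, the following sketch adds an offset to a shared clock reading and computes the duration of one sample. The function names and the clock value are hypothetical and chosen for illustration only; they are not part of any AVB API.

```python
NS_PER_SECOND = 1_000_000_000


def presentation_time_ns(now_ns: int, offset_ns: int) -> int:
    """Timestamp a talker would attach: shared network time plus the offset."""
    return now_ns + offset_ns


def sample_duration_ns(sample_rate_hz: int) -> float:
    """Duration of a single audio sample in nanoseconds."""
    return NS_PER_SECOND / sample_rate_hz


now_ns = 5_000_000_000   # assumed reading of the shared network clock
offset_ns = 2_000_000    # 2 ms offset

print(presentation_time_ns(now_ns, offset_ns))  # 5002000000
print(round(sample_duration_ns(48_000)))        # 20833
```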

The receiver compares the incoming presentation time of each sample to the current time and buffers the sample until its presentation time has come.
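The buffering on the listener side could be sketched roughly as follows. This is a simplified model, not how any particular device implements it: the class `ListenerBuffer` and its methods are hypothetical, and a real listener works on hardware clocks and sample streams rather than Python objects.

```python
import heapq
import itertools


class ListenerBuffer:
    """Holds samples until their presentation time is reached (illustrative only)."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker for equal timestamps

    def receive(self, presentation_time_ns, sample, now_ns):
        """Queue a sample. Returns False if it arrived after its presentation time."""
        if presentation_time_ns < now_ns:
            return False  # too late: the chosen offset was too small
        heapq.heappush(self._heap, (presentation_time_ns, next(self._order), sample))
        return True

    def due(self, now_ns):
        """Remove and return all samples whose presentation time has come."""
        ready = []
        while self._heap and self._heap[0][0] <= now_ns:
            ready.append(heapq.heappop(self._heap)[2])
        return ready


buf = ListenerBuffer()
buf.receive(presentation_time_ns=1000, sample=0.5, now_ns=400)
print(buf.due(now_ns=999))   # []
print(buf.due(now_ns=1000))  # [0.5]
```

A sample that arrives after its presentation time cannot be played out on schedule, which corresponds to the case of an offset that is too low for the network.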

The offset (maximum transit time) is specified by the AVB standard as 2 ms for class A traffic, which is enough time for the signal to pass through a very large network under full load with up to seven 100 MBit/s switches along the way. By default, most AVB products will use this offset. In smaller networks with fewer hops or 1 GBit/s link speed, the offset can be adjusted to lower values, such as 0.3 ms, 0.6 ms or 1 ms. If the chosen offset is too low, the audio stream may experience dropouts or distortion.

If the digital outputs of networked devices need to be phase aligned, it is necessary to choose a value that is a multiple of the sample length (1 second divided by the sampling rate). For example, given a requirement of 1 ms latency at a sampling rate of 44100 Hz: calculate the number of samples for the given latency (in this example, 44.1 samples), then multiply the rounded value (44) by the sample length (1 s / 44100). Rounded to the nearest 100 ns, the result is 997700 ns.
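The calculation above can be sketched in a few lines. The function name `aligned_offset_ns` and its parameters are made up for this example; the 100 ns step follows the rounding used in the text.

```python
NS_PER_SECOND = 1_000_000_000


def aligned_offset_ns(target_ns, sample_rate_hz, step_ns=100):
    """Offset closest to target_ns that is a whole number of samples,
    rounded to the nearest step_ns."""
    # Number of whole samples closest to the requested latency
    samples = round(target_ns * sample_rate_hz / NS_PER_SECOND)
    # Exact duration of that many samples in nanoseconds
    exact_ns = samples * NS_PER_SECOND / sample_rate_hz
    # Round to the configured step (100 ns by default)
    return round(exact_ns / step_ns) * step_ns


print(aligned_offset_ns(1_000_000, 44_100))  # 997700
print(aligned_offset_ns(2_000_000, 48_000))  # 2000000
```

At 48000 Hz a 1 ms or 2 ms target already is a whole number of samples, so the result equals the target.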

Table 1. Recommended presentation time offsets ≤ 2ms, in nanoseconds

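| Rate (Hz) | 2 ms    | 1 ms    | 0.6 ms | 0.3 ms |
|-----------|---------|---------|--------|--------|
| 44100     | 1995500 | 997700  | 634900 | 317500 |
| 48000     | 2000000 | 1000000 | 625000 | 312500 |
| 88200     | 1995500 | 997700  | 623600 | 317500 |
| 96000     | 2000000 | 1000000 | 625000 | 312500 |
| 176400    | 2001100 | 997700  | 623600 | 311800 |
| 192000    | 2000000 | 1000000 | 625000 | 312500 |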

The RME Digiface AVB shows the remaining offset ("input delay") for the first incoming stream, which helps to verify that a shorter offset can be used on the existing network without risking dropouts.

The 12Mic offers a freely adjustable presentation time offset for each output stream. Using the web interface, this can be adjusted in samples to ensure phase alignment at different listeners across the network.

In AVB networks, the latency is always specified by the talker and guaranteed by the listener. This behavior is plug and play and does not require any user interaction or monitoring.