Introduction to WebRTC

Web Real-Time Communication (WebRTC) is an open-source project that enables real-time audio, video, and data exchange directly between devices, typically over peer-to-peer connections. It works natively in all major web browsers through simple JavaScript APIs and is also available to iOS and Android apps through native libraries.

What makes WebRTC stand out

WebRTC handles many of the hard problems of real-time communication for you. It includes built-in support for adaptive bitrate, congestion control, echo cancellation, automatic gain control, noise suppression, and network traversal using ICE, STUN, and TURN. This means that we can build high-quality video, voice, and even file-sharing features without reinventing the wheel.

Because it's backed by companies like Google, Apple, Microsoft, and Mozilla, WebRTC is widely supported and actively maintained. Applications such as Google Meet, Discord, and Facebook Messenger use WebRTC, though often in a heavily modified form tailored to their specific needs.

WebRTC's Core Concepts

WebRTC provides easy-to-use APIs, but under the hood, it handles a complex set of protocols and mechanisms. Understanding its core concepts is key to building reliable applications, debugging issues effectively, and making the most of the code examples that follow.

Session Description Protocol (SDP)

SDP is a standard text format for describing multimedia sessions. WebRTC uses it to describe the details of a session, such as its media and connectivity options. Essentially, it's just a string containing your connectivity information.

Contents of SDP

SDP contains information about how two peers should communicate. This includes things like supported audio and video codecs, encryption details, IP addresses, ports, media types, and ICE candidates. In simple terms, it tells the other peer what your device is capable of and how it can be reached.
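To make the structure of an SDP more concrete, here is a minimal sketch of a reader that splits a description into its session-level lines and one section per media ("m=") line. The `parseSdp` function, the `MediaSection` shape, and the shortened sample SDP are all made up for illustration; a real SDP produced by a browser is much longer, and production code would normally rely on the browser or a dedicated library instead.

```typescript
// Minimal SDP reader (illustrative only, not a complete parser).
interface MediaSection {
  kind: string; // "audio", "video", or "application"
  port: number;
  protocol: string;
  attributes: string[]; // raw "a=" values, e.g. "rtpmap:111 opus/48000/2"
}

function parseSdp(sdp: string): { session: string[]; media: MediaSection[] } {
  const session: string[] = [];
  const media: MediaSection[] = [];
  let current: MediaSection | null = null;

  for (const raw of sdp.split(/\r?\n/)) {
    const line = raw.trim();
    // Every SDP line has the form "<letter>=<value>".
    if (line.length < 2 || line.charAt(1) !== "=") continue;
    const key = line.charAt(0);
    const value = line.slice(2);

    if (key === "m") {
      // m=<media> <port> <proto> <fmt> ...
      const [kind = "", port = "0", protocol = ""] = value.split(" ");
      current = { kind, port: Number(port), protocol, attributes: [] };
      media.push(current);
    } else if (current === null) {
      // Lines before the first "m=" line describe the whole session.
      session.push(line);
    } else if (key === "a") {
      current.attributes.push(value);
    }
    // Other media-level lines (c=, b=, ...) are ignored in this sketch.
  }
  return { session, media };
}

// A shortened, made-up SDP for demonstration.
const exampleSdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "c=IN IP4 0.0.0.0",
  "a=mid:0",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=mid:1",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

const parsedSdp = parseSdp(exampleSdp);
```

Here `parsedSdp.media` holds one entry for audio and one for video, each with the codec advertised in its `rtpmap` attribute.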

SDP in WebRTC

Establishing a WebRTC connection requires an SDP to be generated at each end (by the respective peers) and then exchanged between the two parties, as it is the piece of information that allows them to discover each other. However, WebRTC does not handle passing the SDP between the parties, so you will need an external signaling server.

ICE (Interactive Connectivity Establishment)

ICE is a protocol used to find all possible ways (known as ICE candidates) of establishing connectivity between peers, especially when one or both of them are behind a NAT.

Key Functions of ICE

ICE is responsible for discovering and testing all possible network paths between two peers. It tries local network addresses, public addresses discovered through STUN, and relay addresses provided by TURN servers. ICE will continuously test these candidates and pick the best possible route for the connection.

Gathering ICE Candidates

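ICE collects different types of candidates representing the possible ways a device can be reached. These include host candidates (local network addresses), server reflexive candidates (public addresses discovered through STUN), and relay candidates (addresses allocated on a TURN server).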
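Each candidate travels as a single line of text whose `typ` field names its type: `host` for a local address, `srflx` for a public address learned through STUN, and `relay` for a TURN address. The sketch below reads the leading fields of such a line. The `parseCandidate` helper and the sample lines are hypothetical, and extension fields after the type (such as `raddr` and `rport`) are ignored.

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";

interface ParsedCandidate {
  foundation: string;
  component: number;
  transport: string;
  priority: number;
  address: string;
  port: number;
  type: CandidateType;
}

// Reads "candidate:<foundation> <component> <transport> <priority>
// <address> <port> typ <type> ..." and ignores anything after the type.
function parseCandidate(line: string): ParsedCandidate {
  const text = line.trim().replace(/^a=/, "").replace(/^candidate:/, "");
  const [
    foundation = "",
    component = "",
    transport = "",
    priority = "",
    address = "",
    port = "",
    typ = "",
    type = "",
  ] = text.split(/\s+/);

  const knownTypes = ["host", "srflx", "prflx", "relay"];
  if (typ !== "typ" || knownTypes.indexOf(type) === -1) {
    throw new Error("not a recognizable ICE candidate line: " + line);
  }
  return {
    foundation,
    component: Number(component),
    transport: transport.toLowerCase(),
    priority: Number(priority),
    address,
    port: Number(port),
    type: type as CandidateType,
  };
}

// Made-up sample lines using documentation addresses.
const hostCandidate = parseCandidate(
  "candidate:1 1 udp 2130706431 192.168.1.10 56234 typ host"
);
const reflexiveCandidate = parseCandidate(
  "candidate:2 1 udp 1694498815 203.0.113.45 44322 typ srflx raddr 192.168.1.10 rport 56234"
);
```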

Candidate Trickle

Gathering ICE candidates can take time, as it involves discovering and compiling a list of all possible local, reflexive, and relayed addresses for the device. With *"trickling,"* each candidate is sent to the remote peer as soon as it is discovered instead of waiting for the full list. This both speeds up connection setup and allows new candidates to be gathered and sent after the connection is established.
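One practical consequence of trickling is that candidates can reach the remote peer before it has applied the remote description, at which point they cannot be added yet. A common application-level pattern is to queue them and flush the queue once the description is set. The sketch below shows that pattern in isolation. `TrickleReceiver` is a hypothetical helper, and its `apply` callback stands in for whatever actually adds the candidate to the connection (in a browser, typically a call to `addIceCandidate`).

```typescript
// Queues trickled candidates until the remote description has been applied,
// then delivers them in arrival order.
class TrickleReceiver {
  private pending: string[] = [];
  private ready = false;
  private apply: (candidate: string) => void;

  constructor(apply: (candidate: string) => void) {
    this.apply = apply;
  }

  // Call this for every candidate received over the signaling channel.
  onRemoteCandidate(candidate: string): void {
    if (this.ready) {
      this.apply(candidate);
    } else {
      this.pending.push(candidate);
    }
  }

  // Call this once the remote description has been set.
  onRemoteDescriptionSet(): void {
    this.ready = true;
    for (const candidate of this.pending) {
      this.apply(candidate);
    }
    this.pending = [];
  }
}

// Usage with placeholder candidate strings.
const appliedCandidates: string[] = [];
const receiver = new TrickleReceiver((candidate) => {
  appliedCandidates.push(candidate);
});
receiver.onRemoteCandidate("candidate-1"); // queued
receiver.onRemoteCandidate("candidate-2"); // queued
receiver.onRemoteDescriptionSet();         // flushes the queue
receiver.onRemoteCandidate("candidate-3"); // applied immediately
```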

Sharing via SDP

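Without trickling, the collected ICE candidates are included in the SDP and sent to the remote peer together with it. With trickling, candidates discovered later are sent individually over the same signaling channel.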

STUN (Session Traversal Utilities for NAT)

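STUN is a protocol that lets a client discover the public IP address and port assigned to it by a NAT device. Each client sends a request to a STUN server, which replies with the address it saw the request come from. It plays a vital role in facilitating communication between devices behind a NAT and the wider Internet.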

Key Features of STUN

Without STUN, devices behind a NAT would only know their local IP address, which is useless outside the private network. A STUN server helps them discover the public-facing IP and port assigned by the router. This allows peers on different networks to attempt direct communication with each other.
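On the wire, this exchange is small. The sketch below builds a STUN Binding Request (a fixed 20-byte header with no attributes) and decodes the IPv4 value of an XOR-MAPPED-ADDRESS attribute, which is how the server reports the public address back. It follows the message layout from the STUN specification, but the function names and sample bytes are made up, and it leaves out everything a real client needs around it (the UDP socket, attribute scanning, retries, IPv6).

```typescript
const STUN_MAGIC_COOKIE = 0x2112a442;

// Builds a 20-byte Binding Request: type, length, magic cookie, transaction ID.
function buildBindingRequest(transactionId: Uint8Array): Uint8Array {
  if (transactionId.length !== 12) {
    throw new Error("transaction ID must be 12 bytes");
  }
  const message = new Uint8Array(20);
  const view = new DataView(message.buffer);
  view.setUint16(0, 0x0001);            // message type: Binding Request
  view.setUint16(2, 0);                 // message length: no attributes
  view.setUint32(4, STUN_MAGIC_COOKIE); // fixed magic cookie
  message.set(transactionId, 8);        // 96-bit transaction ID
  return message;
}

// Decodes the 8-byte IPv4 value of an XOR-MAPPED-ADDRESS attribute:
// reserved byte, family byte, XORed port, XORed address.
function decodeXorMappedAddress(value: Uint8Array): { address: string; port: number } {
  if (value.length !== 8 || value[1] !== 0x01) {
    throw new Error("expected an 8-byte IPv4 attribute value");
  }
  const view = new DataView(value.buffer, value.byteOffset, value.byteLength);
  // The port is XORed with the top 16 bits of the magic cookie,
  // the address with the whole cookie.
  const port = view.getUint16(2) ^ (STUN_MAGIC_COOKIE >>> 16);
  const raw = (view.getUint32(4) ^ STUN_MAGIC_COOKIE) >>> 0;
  const address = [
    raw >>> 24,
    (raw >>> 16) & 0xff,
    (raw >>> 8) & 0xff,
    raw & 0xff,
  ].join(".");
  return { address, port };
}

// Made-up attribute value that encodes 203.0.113.45:44322.
const sampleValue = new Uint8Array([0x00, 0x01, 0x8c, 0x30, 0xea, 0x12, 0xd5, 0x6f]);
const publicEndpoint = decodeXorMappedAddress(sampleValue);
```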

TURN (Traversal Using Relays around NAT)

TURN is a protocol that facilitates communication in scenarios where direct peer-to-peer connectivity is not possible, particularly in the case of symmetric NAT.

How TURN Works

If a direct peer-to-peer connection fails, the TURN server acts as a middleman and relays all traffic between both users. Instead of sending data directly to each other, both peers send their packets to the TURN server, which then forwards them to the other side. This makes a connection possible in almost every network environment, but it increases bandwidth usage and latency.

Considerations for TURN

TURN servers can become expensive because all traffic passes through them. Video calls especially can consume a large amount of bandwidth. Because of this, ICE gives relayed candidates the lowest priority: WebRTC tries direct peer-to-peer paths first and falls back to TURN only when none of them works.
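The preference for direct paths is visible in how ICE ranks candidates. The sketch below applies the priority formula recommended in RFC 8445, using the type preference values that RFC suggests (host 126, peer reflexive 110, server reflexive 100, relay 0). The function name and default arguments are assumptions made for this example, and real browsers may choose different preference values.

```typescript
// Type preferences recommended by RFC 8445; higher means preferred.
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

type PrioritizedType = keyof typeof TYPE_PREFERENCE;

// priority = 2^24 * typePreference + 2^8 * localPreference + (256 - componentId)
function candidatePriority(
  type: PrioritizedType,
  localPreference: number = 65535,
  componentId: number = 1
): number {
  return (
    Math.pow(2, 24) * TYPE_PREFERENCE[type] +
    Math.pow(2, 8) * localPreference +
    (256 - componentId)
  );
}

const hostPriority = candidatePriority("host");       // 2130706431
const reflexivePriority = candidatePriority("srflx"); // 1694498815
const relayPriority = candidatePriority("relay");     // 16777215
```

With these values a relay candidate always ranks below host and server reflexive candidates, so a TURN path is only selected when the higher-priority pairs fail their connectivity checks.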

Network Address Translation (NAT)

NAT is used in networking to let the devices on a private network communicate with the internet through a single public IP address. It's a common scenario when you are connected via Wi-Fi or a mobile network. Only the router or modem you are connected to has a public IP address. Each device on the private network, however, is assigned a local IP address (like 192.168.1.10). NAT plays a crucial role in managing and translating these IP addresses so your router or modem can communicate with the outside world on your behalf.

How NAT Works

When a device on your private network makes a request to the internet, the router adds an entry to its NAT table. This table is used to keep track of each device's connections and manage the translation between private and public IP addresses. Here is an example of what a NAT table might look like.

| Internal IP | Internal Port | External IP | Ext. Port | Dest IP | Dest Port |
| --- | --- | --- | --- | --- | --- |
| 192.168.1.10 | 56234 | 203.0.113.45 | 44322 | 172.217.16.78 | 443 |

NAT Translation Methods

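Some NAT types are more permissive than others. Full cone and restricted NAT usually work fine with WebRTC, because the public mapping a device discovers through STUN is the same one its peer can use to reach it. Symmetric NAT is much harder to work with because it creates a different mapping for every destination, so the address learned from the STUN server is not the one the peer would need. This makes direct peer-to-peer communication unreliable and is the main reason TURN servers are sometimes required.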
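The difference between the two mapping behaviors can be shown with a toy model. The `SimulatedNat` class below is purely illustrative: it only models how external ports are assigned, not filtering rules or timeouts, and the addresses are documentation placeholders. In full cone mode one internal socket always gets the same external port, while in symmetric mode each new destination gets its own.

```typescript
type NatMode = "full-cone" | "symmetric";

// Toy model of NAT port mapping (no filtering, no expiry).
class SimulatedNat {
  private mappings: Record<string, number> = {};
  private nextPort: number;
  private mode: NatMode;

  constructor(mode: NatMode, firstPort: number = 40000) {
    this.mode = mode;
    this.nextPort = firstPort;
  }

  // Returns the external port used for a packet sent from the internal
  // socket to the given destination, creating a mapping if needed.
  translate(internalIp: string, internalPort: number, destIp: string, destPort: number): number {
    const source = `${internalIp}:${internalPort}`;
    // Full cone: one mapping per internal socket.
    // Symmetric: one mapping per (internal socket, destination) pair.
    const key = this.mode === "full-cone" ? source : `${source}->${destIp}:${destPort}`;
    const existing = this.mappings[key];
    if (existing !== undefined) {
      return existing;
    }
    const port = this.nextPort;
    this.nextPort += 1;
    this.mappings[key] = port;
    return port;
  }
}

// The same internal socket first contacts a STUN server, then a peer.
const fullCone = new SimulatedNat("full-cone");
const fullConeViaStun = fullCone.translate("192.168.1.10", 56234, "198.51.100.7", 3478);
const fullConeViaPeer = fullCone.translate("192.168.1.10", 56234, "198.51.100.99", 50000);

const symmetric = new SimulatedNat("symmetric");
const symmetricViaStun = symmetric.translate("192.168.1.10", 56234, "198.51.100.7", 3478);
const symmetricViaPeer = symmetric.translate("192.168.1.10", 56234, "198.51.100.99", 50000);
```

In the full cone case the port learned through STUN matches the one used toward the peer. In the symmetric case the two ports differ, so the address advertised to the peer points at a mapping the peer's packets do not match.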

Signaling in Networking

Signaling is a crucial part of setting up a communication session: it is how the peers exchange their SDP before a connection exists.

What is Signaling?

Signaling is simply the process of exchanging connection information between peers. WebRTC itself does not define how signaling should work. Most applications use WebSockets, but you could also use HTTP requests, Socket.IO, or any other messaging system capable of exchanging SDP offers, answers, and ICE candidates.
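Because the message format is left to the application, most projects define a small set of message shapes and serialize them as JSON. The sketch below is one possible format, not a standard: the `SignalMessage` type, its `kind` field, and the helper names are invented for this example. The resulting strings could be sent over a WebSocket or any other channel.

```typescript
// Hypothetical signaling messages: one shape per thing the peers exchange.
type SignalMessage =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string }
  | { kind: "candidate"; candidate: string };

function encodeSignal(message: SignalMessage): string {
  return JSON.stringify(message);
}

// Parses and validates an incoming message instead of trusting its shape.
function decodeSignal(text: string): SignalMessage {
  const data: unknown = JSON.parse(text);
  if (typeof data !== "object" || data === null) {
    throw new Error("signaling message must be a JSON object");
  }
  const record = data as Record<string, unknown>;
  const kind = record["kind"];
  const sdp = record["sdp"];
  const candidate = record["candidate"];

  if (kind === "offer" && typeof sdp === "string") {
    return { kind: "offer", sdp };
  }
  if (kind === "answer" && typeof sdp === "string") {
    return { kind: "answer", sdp };
  }
  if (kind === "candidate" && typeof candidate === "string") {
    return { kind: "candidate", candidate };
  }
  throw new Error("unknown or malformed signaling message");
}

// Round trip with a placeholder SDP string.
const wireText = encodeSignal({ kind: "offer", sdp: "v=0" });
const receivedMessage = decodeSignal(wireText);
```

Validating on receipt keeps a malformed or unexpected message from reaching the connection logic.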

WebRTC Demystified

  1. A wants to connect to B.
  2. A creates an "offer": it collects its ICE candidates, security options, and audio/video options and generates the SDP. The offer is basically the SDP. A sets it as its local description.
  3. A signals the offer to B through the signaling channel (for example, a WebSocket).
  4. B sets A's offer as its remote description, creates the "answer", and sets the answer as its own local description.
  5. B signals the answer to A, and A sets it as its remote description.
  6. Each peer now has both a local and a remote SDP: the local one describes itself and the remote one describes the other party.
  7. The connection is established.
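The order of these steps matters, because each peer tracks where it is in the exchange. The sketch below models that as a small state machine. The state names match the signaling states a browser reports, but this is a simplified model written for illustration: the `nextState` function is hypothetical, and provisional answers, rollback, and the closed state are left out.

```typescript
type NegotiationState = "stable" | "have-local-offer" | "have-remote-offer";
type DescriptionSide = "local" | "remote";
type DescriptionType = "offer" | "answer";

// Returns the state after setting a description, or throws if the
// description is not valid in the current state.
function nextState(
  state: NegotiationState,
  side: DescriptionSide,
  type: DescriptionType
): NegotiationState {
  if (state === "stable" && type === "offer") {
    return side === "local" ? "have-local-offer" : "have-remote-offer";
  }
  if (state === "have-local-offer" && side === "remote" && type === "answer") {
    return "stable";
  }
  if (state === "have-remote-offer" && side === "local" && type === "answer") {
    return "stable";
  }
  throw new Error(`cannot set a ${side} ${type} while in state ${state}`);
}

// Walk through the flow for caller A and callee B.
let callerState: NegotiationState = "stable";
let calleeState: NegotiationState = "stable";

callerState = nextState(callerState, "local", "offer");   // A sets its offer as local
calleeState = nextState(calleeState, "remote", "offer");  // B sets A's offer as remote
calleeState = nextState(calleeState, "local", "answer");  // B sets its answer as local
callerState = nextState(callerState, "remote", "answer"); // A sets B's answer as remote
```

Both peers end in the `stable` state, which is the point at which each holds a local and a remote description.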

Code examples

https://github.com/devjam1n/WebRTC