
In January 2019, a critical flaw was reported in Apple’s FaceTime group chat feature, which allowed users to initiate a FaceTime video call and listen to targets by adding their own number as a third person in a group chat even before the person on the other end accepted the incoming call.
The vulnerability was considered so severe that the iPhone maker took the Group FaceTime feature offline entirely until the issue was resolved in a later iOS update.
Since then, a number of similar shortcomings have been discovered in several video chat applications, such as Signal, JioChat, Mocha, Google Duo and Facebook Messenger – all thanks to the work of Google Project Zero researcher Natalie Silvanovich.
“While [the Group FaceTime] bug was soon fixed, the fact that such a serious and easy-to-reach vulnerability had occurred due to a logic bug in a calling state machine – an attack scenario I had never seen considered on any platform – made me wonder whether other state machines had similar vulnerabilities as well,” Silvanovich wrote in a Tuesday deep dive into her year-long investigation.
How does WebRTC signaling work?
Although most messaging applications today rely on WebRTC for communication, the connections themselves are created by exchanging call setup information using the Session Description Protocol (SDP) between peers in what is called signaling, which typically works by sending an SDP offer from the caller’s end, to which the callee responds with an SDP answer.
In other words, when a user initiates a WebRTC call to another user, a session description called an “offer” is created that contains all the information needed to set up a connection – the type of media being sent, its format, the transfer protocol used, and the endpoint’s IP address and port, among others. The recipient then responds with an “answer”, including a description of its own endpoint.
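To make the contents of an offer more concrete, here is a minimal sketch in Python that pulls the media type, port, transport protocol, address and codec out of a small SDP body. The sample offer, its documentation-range address and the `summarize_sdp` helper are illustrative assumptions, not taken from any of the applications discussed; real offers carry many more attributes.

```python
from __future__ import annotations

# Illustrative SDP offer; the address and port are made-up documentation values.
SAMPLE_OFFER = "\r\n".join([
    "v=0",
    "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
    "s=-",
    "t=0 0",
    "m=audio 49170 UDP/TLS/RTP/SAVPF 111",
    "c=IN IP4 203.0.113.5",
    "a=rtpmap:111 opus/48000/2",
    "a=sendrecv",
]) + "\r\n"


def summarize_sdp(sdp: str) -> list[dict]:
    """Return one summary dict per media section (m= line) in the SDP."""
    sections: list[dict] = []
    current = None
    for line in sdp.splitlines():
        if len(line) < 2 or line[1] != "=":
            continue  # skip blank or malformed lines
        key, value = line[0], line[2:]
        if key == "m":
            # m=<media> <port> <proto> <fmt> ...
            media, port, proto, *formats = value.split()
            current = {"media": media, "port": int(port), "protocol": proto,
                       "formats": formats, "address": None, "codecs": {}}
            sections.append(current)
        elif current is None:
            continue  # session-level line before the first media section
        elif key == "c":
            # c=<nettype> <addrtype> <connection-address>
            current["address"] = value.split()[2]
        elif key == "a" and value.startswith("rtpmap:"):
            # a=rtpmap:<payload type> <encoding name>/<clock rate>[/<params>]
            payload_type, encoding = value[len("rtpmap:"):].split(" ", 1)
            current["codecs"][payload_type] = encoding
    return sections
```

Calling `summarize_sdp(SAMPLE_OFFER)` yields a single audio section; an answer from the callee would be summarized the same way.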
The whole process is a state machine, which indicates “where in the process of signaling the exchange of offer and answer the connection currently is”.
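A calling state machine of this kind can be sketched as a table of permitted transitions. The state and event names below are hypothetical, chosen for illustration only; they are neither WebRTC's own signaling states nor those of any application examined in the research.

```python
from enum import Enum, auto


class CallState(Enum):
    IDLE = auto()
    OFFER_SENT = auto()   # caller side: offer sent, waiting for an answer
    RINGING = auto()      # callee side: offer received, user not yet asked
    CONNECTED = auto()    # offer and answer have both been exchanged
    ENDED = auto()


class InvalidTransition(Exception):
    """Raised when an event is not allowed in the current state."""


# (current state, event) -> next state; anything not listed is rejected.
TRANSITIONS = {
    (CallState.IDLE, "send_offer"): CallState.OFFER_SENT,
    (CallState.IDLE, "receive_offer"): CallState.RINGING,
    (CallState.OFFER_SENT, "receive_answer"): CallState.CONNECTED,
    (CallState.RINGING, "user_accepts"): CallState.CONNECTED,
}


class SignalingStateMachine:
    def __init__(self) -> None:
        self.state = CallState.IDLE

    def handle(self, event: str) -> CallState:
        """Apply an event, refusing anything the table does not permit."""
        if event == "hangup" and self.state is not CallState.ENDED:
            self.state = CallState.ENDED
            return self.state
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise InvalidTransition(f"{event!r} not allowed in {self.state.name}")
        self.state = TRANSITIONS[key]
        return self.state
```

The point of the explicit table is that a ringing callee can only reach `CONNECTED` through `user_accepts`; a remote message arriving in the wrong state is rejected rather than silently advancing the call.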
Optionally, as part of the offer/answer exchange, the two peers can also trade SDP candidates with each other to negotiate the actual connection between them. Each candidate details a method that can be used to communicate, regardless of the network topology, under a WebRTC framework called Interactive Connectivity Establishment (ICE).
Once the two peers agree on a mutually compatible candidate, that candidate’s SDP is used by each peer to build and open a connection, through which the media then begins to flow.
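As a rough illustration of candidate negotiation, the sketch below parses ICE candidate lines and ranks local/remote pairs with the pair-priority formula from the ICE specification (RFC 8445). It is a simplification under stated assumptions: a real ICE agent also runs connectivity checks and selects a pair that actually works, not merely the highest-ranked one. The helper names and sample addresses are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Candidate:
    foundation: str
    component: int
    transport: str
    priority: int
    address: str
    port: int
    type: str


def parse_candidate(line: str) -> Candidate:
    """Parse 'candidate:<foundation> <component> <transport> <priority>
    <address> <port> typ <type> ...'; trailing extensions are ignored."""
    if line.startswith("a="):
        line = line[2:]
    fields = line.split()
    if len(fields) < 8 or not fields[0].startswith("candidate:") or fields[6] != "typ":
        raise ValueError(f"not an ICE candidate line: {line!r}")
    return Candidate(fields[0][len("candidate:"):], int(fields[1]),
                     fields[2].lower(), int(fields[3]), fields[4],
                     int(fields[5]), fields[7])


def pair_priority(controlling: int, controlled: int) -> int:
    """RFC 8445 pair priority: 2^32*min(G,D) + 2*max(G,D) + (1 if G > D else 0)."""
    g, d = controlling, controlled
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def best_pair(local: list[Candidate], remote: list[Candidate]):
    """Highest-priority compatible pair, treating `local` as the controlling side."""
    pairs = [(l, r) for l, r in product(local, remote)
             if l.component == r.component and l.transport == r.transport]
    return max(pairs, key=lambda p: pair_priority(p[0].priority, p[1].priority),
               default=None)
```

Here "compatible" is reduced to matching component and transport; `best_pair` returns `None` when no pair qualifies.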
In this way, both devices share the information needed to exchange sound or video via the peer-to-peer connection. But before this relay can take place, the captured media data must be attached to the connection using a feature called tracks.

Although the callee’s consent is expected to be secured before any audio or video is transmitted, and no data should be shared until the receiver has interacted with the application to answer the call (i.e., before any tracks are added to the connection), Silvanovich observed behavior to the contrary.
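The expected behavior can be sketched as a consent gate on the callee side: signaling messages from the remote peer are processed, but media tracks are attached only after a local user action. The `CalleeSession` class and its method names are hypothetical and stand in for whatever an application builds on top of WebRTC.

```python
class CalleeSession:
    """Callee-side session that attaches media only after the user answers."""

    def __init__(self, media_kinds=("audio", "video")):
        self._media_kinds = tuple(media_kinds)
        self.user_accepted = False
        self.tracks = []           # media kinds currently attached
        self.remote_messages = []  # signaling received from the caller

    def on_remote_signal(self, message_type: str) -> None:
        # Remote input (offers, candidates, and so on) is recorded for
        # negotiation but never changes the consent flag or attaches media.
        self.remote_messages.append(message_type)

    def on_user_accept(self) -> None:
        # Only a local user action grants consent and attaches tracks.
        self.user_accepted = True
        for kind in self._media_kinds:
            self.add_track(kind)

    def add_track(self, kind: str) -> None:
        if not self.user_accepted:
            raise PermissionError("refusing to attach media before the user answers")
        self.tracks.append(kind)
```

Keeping the check inside `add_track` itself, rather than only in the code paths that call it, means an unexpected path through the state machine still cannot start sending media.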
Several affected messaging applications
Not only did the flaws in the applications allow calls to connect without interaction from the callee, but they also allowed the caller to force the callee’s device to transmit audio or video data.
The common root cause? Logic bugs in the signaling state machines, which Silvanovich said “are a concerning and under-investigated attack surface of video conferencing applications.”
- Signal (fixed in September 2019) – An audio call flaw in Signal’s Android app made it possible for the caller to hear the callee’s surroundings because the app did not check whether the device receiving the connect message from the callee was the caller device.
- JioChat (fixed in July 2020) and Mocha (fixed in August 2020) – Adding candidates to the offers created by the Reliance JioChat and Viettel Mocha Android apps allowed a caller to force the target device to send audio (and video) without the user’s consent. The flaws stemmed from the fact that the peer-to-peer connection was set up before the callee answered the call, thus increasing the “remote attack surface of WebRTC”.
- Facebook Messenger (fixed in November 2020) – A vulnerability that could have allowed an attacker who is logged into the app to simultaneously initiate a call and send a specially crafted message to a target who is signed in to both the app and another Messenger client, such as a web browser, and begin receiving audio from the callee’s device.
- Google Duo (fixed in December 2020) – A race condition between disabling the video and setting up the connection, which in some cases could cause the callee to leak video packets from unanswered calls.
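The kind of check described as missing in the Signal item above can be sketched as a role test on an incoming connect message. This is not Signal’s actual code; the `Role` and `CallDevice` names are hypothetical and only illustrate the class of logic bug.

```python
from enum import Enum


class Role(Enum):
    CALLER = "caller"
    CALLEE = "callee"


class CallDevice:
    """One endpoint of a call, tracking whether it is sending audio."""

    def __init__(self, role: Role) -> None:
        self.role = role
        self.sending_audio = False

    def on_connect_message(self) -> bool:
        """Handle an incoming 'connect' message from the other peer.

        The callee sends 'connect' when its user answers, so only a device
        in the caller role should ever act on it. Returns True if handled.
        """
        if self.role is not Role.CALLER:
            return False  # a callee receiving 'connect' must ignore it
        self.sending_audio = True
        return True
```

Without the role test, a modified caller could send a connect message to the callee’s device and have it start transmitting audio before anyone answered.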
Other messaging applications such as Telegram and Viber were found not to have any of the above flaws, although Silvanovich noted that the significant reverse engineering challenges when analyzing Viber made the investigation “less rigorous” than the others.
“The majority of calling state machines I investigated had logic vulnerabilities that allowed audio or video content to be transmitted from the callee to the caller without the callee’s consent,” Silvanovich concluded. “This is clearly an area that is often overlooked when securing WebRTC applications.”
“Most of the bugs did not appear to be caused by developers misunderstanding WebRTC features. Instead, they were due to errors in the way the state machines are implemented. That said, a lack of awareness of these types of issues was probably a factor,” she added.
“It is also concerning to note that I did not look at any group calling features of these applications, and all the reported vulnerabilities were found in peer-to-peer calls. This is an area for future work that could reveal additional issues.”