Failover Capability

In any mission-critical communications system, built-in redundancy is imperative. Telex RDC provides this through a Failover capability. Failover is the ability to switch automatically from a primary working server to a secondary backup server if the primary server or its associated network suffers a catastrophic failure.

Implementation

The IP address of the secondary server is configured in the primary server. When the primary server starts, it immediately establishes a connection to the secondary server. Through this connection, the primary server shares its licensing information with the secondary server. The primary server also conveys any operational changes to the system configuration to the secondary server in real time, so that the two system configurations remain synchronized. This connection remains active as long as both servers are running. If the connection is lost, the secondary server immediately assumes the role of active server and allows Telex RDC clients to connect. In some cases, the complexities of a network failure may warrant the secondary server becoming the active server even though the connection is not lost.

When a Telex RDC client logs into the primary server, the secondary server IP address is automatically provided to it. If communications with the primary server are lost, the client automatically attempts to connect to the secondary server. If the secondary server is available and active, the Telex RDC client logs into the secondary server.
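The client-side behavior described above can be sketched as follows. This is only an illustration of the selection logic, not the Telex RDC client implementation: the names ServerPair, choose_server, and can_log_in are hypothetical, the addresses are placeholders, and the order in which a real client retries servers is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ServerPair:
    primary: str                     # address the client was configured with
    secondary: Optional[str] = None  # provided by the primary server at login


def choose_server(pair: ServerPair,
                  can_log_in: Callable[[str], bool]) -> Optional[str]:
    """Return the address to log into, or None if no server accepts logins.

    can_log_in is a hypothetical probe that returns True only when the
    server at that address is both reachable and the active server.
    """
    if can_log_in(pair.primary):
        return pair.primary
    # Primary lost or inactive: fall back to the secondary, if one is known.
    if pair.secondary is not None and can_log_in(pair.secondary):
        return pair.secondary
    return None


# Example: the primary is unreachable and the secondary has become active.
pair = ServerPair(primary="10.0.0.1", secondary="10.0.0.2")
active_servers = {"10.0.0.2"}
print(choose_server(pair, lambda addr: addr in active_servers))  # prints 10.0.0.2
```

Because an inactive server does not allow logins, a single probe that combines "reachable" and "active" is enough for this sketch.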

Once the secondary server becomes the active server, switching back to the primary server generally requires manual authorization. The condition that caused the failover must first be properly evaluated to ensure that it cannot recur and unnecessarily disrupt active communications. The manual switchover can be controlled through the System Administration application.

For failover to work seamlessly, all Telex RDC clients must use a security certificate whose issuing Certificate Authority (CA) is trusted. If the certificate is not signed by a public Certificate Authority, as is the case with the default Bosch self-signed certificates, the Bosch CA certificate must be installed on every Telex RDC client so that the client trusts Bosch self-signed certificates. The process differs slightly for each OS and browser. For all clients, the Bosch CA certificate can be downloaded from the menu available on the Login dialog. For Android and iOS, on-screen instructions detail the steps necessary to install the certificate. For the Windows OS, the certificate must be installed specifically into the “Trusted Root Certification Authorities” store. For the Firefox browser on the Windows OS, one additional step is required to enable use of the Windows certificate store: enter “about:config” in the Firefox address bar, search for the “security.enterprise_roots.enabled” option, and set its value to ‘true’.
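The trust requirement is a general property of TLS clients rather than something specific to Telex RDC. As a generic illustration only (this is not the Telex RDC client, and build_client_context and the file name in the comment are hypothetical), the sketch below shows a client that trusts the system certificate store and, optionally, one additional private CA certificate. A server certificate issued by a CA in neither place would be rejected during the handshake.

```python
import ssl
from typing import Optional


def build_client_context(private_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Create a TLS client context that verifies server certificates.

    private_ca_file is an optional path to a CA certificate in PEM format,
    for example a downloaded private CA certificate (hypothetical name:
    "private-ca.pem").
    """
    # Loads the system's trusted CA certificates and turns on both
    # certificate verification and host name checking.
    context = ssl.create_default_context()
    if private_ca_file is not None:
        # Trust the private CA in addition to the system store.
        context.load_verify_locations(cafile=private_ca_file)
    return context


context = build_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # prints True
```

Installing the CA certificate into the operating system or browser store, as described above, has the same effect for browser-based clients as the load_verify_locations call has here.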

Notice for Firefox Users: It has been observed that after a failover/failback the user may intermittently be unable to receive audio as a result of a Firefox-specific implementation restriction. Logging out and logging back in resolves the issue.

Notice for Android Users: It has been observed that after a failover/failback, the user interface, if open, will intermittently fail to reconnect to the active server and instead display a page showing “Webpage not found”. Returning to the Host Address dialog and reconnecting to the server resolves the issue.

Failover Criteria

In normal operation, the primary server is always the active server. In general, as long as the communications link between the primary and secondary servers is connected, the primary server remains the active server. A server that is not the active server does not allow logins. Many different scenarios can result in a failover event. The most common are as follows:

Communication link between primary and secondary servers lost due to primary server failure:
In this simplest scenario, the secondary server would recognize the loss of the primary server and immediately become the active server. All clients would also recognize the loss of the primary server and would immediately reconnect to the secondary server.

Communication link between primary and secondary servers lost due to failure of network infrastructure:
In this scenario, the secondary server would recognize the loss of the primary server and immediately become the active server. Since the primary server is still running, it would also continue to consider itself the active server. However, if the network failure also caused the simultaneous loss of the majority of connected Telex RDC clients, the primary server will deactivate itself, forcing all remaining clients to connect to the secondary server.

Communication link between primary and secondary servers not lost, but partial failure of network infrastructure:
In this scenario, the partial network failure may result in the loss of a large portion of the Telex RDC clients. In this case, the primary server will instruct the secondary server to activate so that connections can be made to it. If the secondary server reports that client connections were established, the primary server will deactivate itself, forcing all remaining clients to connect to the secondary server.
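The three scenarios above can be summarized as a small decision sketch. It is a simplified model under stated assumptions, not the Telex RDC server logic: all names are hypothetical, and "majority" and "large portion" are both read here as more than half of the previously connected clients, a threshold the text does not specify.

```python
from dataclasses import dataclass

# Assumption: "majority" / "large portion" means more than half of the clients.
LOSS_THRESHOLD = 0.5


@dataclass
class PrimaryView:
    link_to_secondary_up: bool               # server-to-server connection state
    clients_before: int                      # clients connected before the event
    clients_now: int                         # clients still connected
    secondary_reports_clients: bool = False  # secondary confirmed client logins


def secondary_should_activate(link_to_primary_up: bool,
                              primary_requested_activation: bool) -> bool:
    """Secondary activates on link loss (scenarios 1 and 2) or on request (3)."""
    return (not link_to_primary_up) or primary_requested_activation


def lost_most_clients(view: PrimaryView) -> bool:
    if view.clients_before == 0:
        return False
    lost = view.clients_before - view.clients_now
    return lost / view.clients_before > LOSS_THRESHOLD


def primary_action(view: PrimaryView) -> str:
    """Return 'stay_active', 'request_secondary_activation', or 'deactivate'."""
    mass_loss = lost_most_clients(view)
    if not view.link_to_secondary_up:
        # Scenario 2: link lost while the primary is still running.
        return "deactivate" if mass_loss else "stay_active"
    if mass_loss:
        # Scenario 3: link intact, but many clients were lost.
        if view.secondary_reports_clients:
            return "deactivate"
        return "request_secondary_activation"
    return "stay_active"


# Scenario 3, step by step: 8 of 10 clients lost while the link stays up.
print(primary_action(PrimaryView(True, 10, 2)))        # prints request_secondary_activation
print(primary_action(PrimaryView(True, 10, 2, True)))  # prints deactivate
```

Scenario 1 needs no primary-side branch because the primary server has failed; only secondary_should_activate applies.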

Summary

Telex RDC Failover support is an integral part of any mission-critical communications solution. While its operation should never be visible to the end user, its availability will quickly restore communications capability after an unforeseen catastrophic failure of the primary server.