CommuniGate Pro
Version 6.3
 

Dynamic Clusters

While Static Clusters can be used to handle very large sites, they do not meet carrier-grade uptime requirements.

Managing a set of loosely-coupled Servers also becomes a problem as the number of Servers grows.

The CommuniGate Pro Dynamic Clusters address these challenges. They exceed the "five-nine" (99.999%) uptime requirements, and their Single Service Image infrastructure allows System and Domain administrators to manage a large Cluster System in the same way as a smaller single-server CommuniGate Pro system.

The main difference between Static and Dynamic Clusters is the Account hosting. While each Account in a Static Cluster has its Host Server, and only that Server can access the Account data directly, all Backend Servers in a Dynamic Cluster can access the Account data directly.

The most common way to implement shared Account Storage for a Dynamic Cluster is to use File Servers or Cluster File Systems. See the Storage section for more information about Shared File Systems.

Traditional File-Locking Approach

Many legacy Communication servers can employ file servers for account data storage. Since those servers are usually implemented as multi-process systems (under Unix), they use the same synchronization methods in both single-server and multi-server environments, such as file locks implemented on the Operating System/File System level.

This method has the following problems:
  • Every operation with Account/Mailbox data must be surrounded by file locking/unlocking operations, and additional File System operations are needed to ensure data consistency. As a result, the number of File System operations increases 3-5 times, and since the speed of file operations usually defines the speed of the site, site performance suffers significantly.
  • Modern File Servers either do not support file locking mechanisms at all, or provide severely limited versions of those mechanisms, making the most important site component - account storage - unreliable and not fault-tolerant.
  • A malfunction of one of the servers can bring the entire site down (because of deadlocks) and makes fault recovery extremely painful.
  • Simultaneous access to the same Account/Mailbox by several clients is either prohibited or unreliable.

In an attempt to reduce the negative effect of file locking, some legacy Messaging servers support only the MailDir Mailbox format (one file per message) and rely on the "atomic" nature of file directory operations rather than on file-level locks. This approach can theoretically solve some of the outlined problems (in real-life implementations it hardly solves any), but it wastes most of the file server storage and overloads the file server internal filesystem tables.
The performance of File Servers severely declines when an application uses many smaller files instead of few larger files.

While simple clustering based on Operating System/File System multi-access capabilities works fine for Web servers (where the data is not modified too often), it does not work well for Messaging servers where the data modification traffic is almost the same as the data retrieval traffic.

Simple Clustering does not provide any additional value (like Single Service Image), so administering a 30-Server cluster is even more difficult than administering 30 independent Servers.

The CommuniGate Pro software supports the Legacy INBOX feature, so file-based clustering can be implemented with CommuniGate Pro, too. But because of the problems outlined above, it is highly recommended to avoid this type of solution and use a real CommuniGate Pro Dynamic Cluster instead.


Cluster Controller

CommuniGate Pro Servers in a Dynamic Cluster do not use Operating System/File System locks to synchronize Account access operations. As in a Static Cluster, only one Server in a Dynamic Cluster has direct access to any given Account at any given moment. All other Servers work through that Server if they want to access the same Account. But this assignment is not static: any Server can open any Account directly if that Account is not opened by some other Server.

This architecture provides the maximum uptime: if a Backend Server fails, all Accounts can be accessed via other Backend Servers - without any manual operator intervention, and without any downtime. The site continues to operate and provide access to all its Accounts as long as at least one Backend Server is running.
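The direct-access rule can be illustrated with a small ownership table. The following Python sketch is a hypothetical model, not the CommuniGate Pro implementation: the class `AccountOwnership` and its methods are invented for this example, and the Server and Account names are placeholders.

```python
class AccountOwnership:
    """Hypothetical model: at most one Backend Server has an Account open directly."""

    def __init__(self):
        self._owner = {}  # Account name -> Backend Server that has it open

    def access(self, server, account):
        """Return the Server that opens `account` directly for a request on `server`.

        If no Server has the Account open, `server` opens it and becomes the owner.
        Otherwise `server` must work through the current owner.
        """
        return self._owner.setdefault(account, server)

    def close(self, server, account):
        # Only the Server that opened the Account can release it.
        if self._owner.get(account) == server:
            del self._owner[account]

    def server_failed(self, server):
        # Release all Accounts the failed Server had open, so that any
        # remaining Backend Server can open them directly.
        for account in [a for a, s in self._owner.items() if s == server]:
            del self._owner[account]


owners = AccountOwnership()
print(owners.access("backend1", "alice@example.com"))  # backend1: opened directly
print(owners.access("backend2", "alice@example.com"))  # backend1: backend2 works through it
owners.server_failed("backend1")
print(owners.access("backend2", "alice@example.com"))  # backend2: opened directly after failover
```

The sketch releases a failed Server's Accounts immediately; in a real Cluster they become available again only after the Cluster Controller detects the failure.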

One of the Backend Servers in a Dynamic Cluster acts as the Cluster Controller. It synchronizes all other Servers in the Cluster and executes operations such as creating Shared Domains, creating and removing accounts in the shared domains, etc. The Cluster Controller also provides the Single Service Image functionality: not only a site user, but also a site administrator can connect to any Server in the Dynamic Cluster and perform any Account operation (even if the Account is currently opened on a different Server), as well as any Domain-level operations (like Domain Settings modification), and all modifications will be automatically propagated to all Cluster Servers.

Note: most of the Domain-level update operations, such as updating Domain Settings, Default Account Settings, WebUser Interface Settings, and Domain-Level Alerts may take up to 30 seconds to propagate to all Servers in the Cluster. Account-Level modifications come into effect on all Servers immediately.

The Cluster Controller collects the load level information from the Backend Servers. When a Frontend Server receives a session request for an Account not currently opened on any Backend Server, the Controller directs the Frontend Server to the least loaded Backend Server. This second-level load balancing for Backend Servers is based on actual load levels, and it supplements the basic first-level Frontend load balancing (DNS round-robin or traffic-based).
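The Controller's second-level balancing decision can be sketched as follows. This is a hypothetical Python illustration: the function `choose_backend`, the numeric load values, and the tie-breaking by Server name are assumptions; the text above only states that an Account not open anywhere is sent to the least loaded Backend Server. The branch for an already-open Account assumes, in line with the direct-access rule, that the request goes to the Server holding that Account.

```python
def choose_backend(account, open_accounts, load_levels):
    """Pick the Backend Server a Frontend Server should use for `account`.

    open_accounts: Account name -> Backend Server that has it open
    load_levels:   Backend Server -> load level reported to the Controller
    """
    if account in open_accounts:
        # Assumption: an already-open Account is served via its current Server.
        return open_accounts[account]
    if not load_levels:
        raise RuntimeError("no Backend Servers available")
    # Pick the least loaded Backend Server (ties broken by Server name).
    return min(load_levels, key=lambda server: (load_levels[server], server))


loads = {"backend1": 0.72, "backend2": 0.35, "backend3": 0.50}
print(choose_backend("bob@example.com", {}, loads))  # backend2
print(choose_backend("bob@example.com", {"bob@example.com": "backend3"}, loads))  # backend3
```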

When a Dynamic Cluster has at least 2 backend Servers, the Cluster Controller assigns the Controller Backup duties to one of the other backend Servers. All other Cluster members maintain connections with the Backup Controller. If the Backup Controller fails, some other backend Server is selected as a Backup Controller.

If the main Controller fails, the Backup Controller becomes the Cluster Controller. All Servers send the resynchronization information to the Backup Controller and the Cluster continues to operate without interruption.
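The Controller and Backup Controller role changes described above can be modeled as a small state machine. In this hypothetical Python sketch the class `ControllerRoles` is invented, and choosing the first remaining Backend Server as the new Backup Controller is an arbitrary assumption; the resynchronization traffic sent to the new Controller is not modeled.

```python
class ControllerRoles:
    """Hypothetical model of the Cluster Controller and Backup Controller roles."""

    def __init__(self, controller, backends):
        self.controller = controller
        # Backend Servers other than the active Controller.
        self.others = [b for b in backends if b != controller]
        self.backup = None
        self._ensure_backup()

    def _ensure_backup(self):
        # A Backup Controller is assigned only if another Backend Server exists.
        # Picking the first candidate is an arbitrary choice for this sketch.
        if self.backup is None and self.others:
            self.backup = self.others[0]

    def server_failed(self, server):
        if server == self.controller:
            if self.backup is None:
                raise RuntimeError("no Backend Server left to act as Controller")
            # The Backup Controller becomes the Cluster Controller.
            self.controller = self.backup
            self.others.remove(self.backup)
            self.backup = None
        elif server in self.others:
            self.others.remove(server)
            if server == self.backup:
                self.backup = None
        # Select a new Backup Controller if the role is now vacant.
        self._ensure_backup()


roles = ControllerRoles("backend1", ["backend1", "backend2", "backend3"])
roles.server_failed("backend1")
print(roles.controller, roles.backup)  # backend2 backend3
```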

While the Dynamic Cluster can maintain a Directory with Account records, the Dynamic Cluster functionality does not rely on the Directory. If the Directory is used, it should be implemented as a Shared Directory.

A complete Frontend-Backend Dynamic Cluster configuration uses Load Balancers and several separate networks:

[Figure: Dynamic Cluster]

Since all Backend Servers in a Dynamic Cluster have direct access to Account data, they should run operating systems that use the same EOL (end-of-line) conventions. This means that either all Backend Servers run Unix (the same or different flavors), or they all run Microsoft Windows (the same or different versions). Frontend Servers do not have direct access to Account data, so you can use any OS for your Frontend Servers (for example, a site can use some Unix OS for Backend Servers and Microsoft Windows for Frontend Servers).
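The EOL rule for Backend Servers amounts to a simple consistency check. The OS labels and the `EOL_CONVENTION` table in this Python sketch are illustrative assumptions; the only facts relied on are that Unix systems use LF line endings and Windows uses CRLF. An OS name missing from the table raises `KeyError` rather than being guessed.

```python
# Illustrative mapping from OS family to its text-file end-of-line convention.
EOL_CONVENTION = {
    "linux": "LF",
    "freebsd": "LF",
    "solaris": "LF",
    "windows": "CRLF",
}


def backends_share_eol(backend_oses):
    """Return True if all Backend Servers use the same EOL convention.

    Frontend Servers are deliberately not checked: they never access
    Account data directly, so they may run any OS.
    """
    conventions = {EOL_CONVENTION[os_name.lower()] for os_name in backend_oses}
    return len(conventions) <= 1


print(backends_share_eol(["Linux", "FreeBSD"]))  # True: both use LF
print(backends_share_eol(["Linux", "Windows"]))  # False: LF vs CRLF
```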


Cluster File Systems and Cluster OSes

Some modern Operating Systems provide advanced Clustering capabilities themselves. Most of those Cluster features are designed to help port "regular", non-clustered applications to these Cluster platforms. But some features provided with those Cluster OSes are very useful for CommuniGate Pro Dynamic Cluster implementations.

These features include:
  • Cluster File System
  • IP Aliasing

A Cluster File System allows all Servers in an OS Cluster to mount and use the same file system(s) on shared devices. Unlike Network File Systems (NFS), Cluster File Systems do not require a dedicated server on the network. Cluster File Systems can utilize multiple SCSI connections provided with some high-end SCSI storage devices, and they can allow each Server to exchange the data directly with storage devices via a SAN (Storage Area Network). To ensure file system integrity, Cluster File Systems use high-speed server interconnects.

The SAN protocols are very effective for file transfers, and Cluster File Systems can provide better performance than Network File Systems.
The Cluster File Systems can also provide better reliability than single-server NFS solutions (where the NFS server is a single point of failure).
See the Storage section for more details.

The IP Aliasing feature allows the Cluster OS to distribute the network load between Cluster Servers without an additional Load Balancer unit.

A "backend-only" CommuniGate Pro Dynamic Cluster can utilize both features of a Cluster OS: IP Aliasing is used to distribute the load between CommuniGate Pro Servers, and the CommuniGate Pro Servers use the Cluster File System to store all account data in Shared Domains:

[Figure: Dynamic - Symmetric]

A Cluster OS can be used in a frontend/backend CommuniGate Pro Cluster configuration, too. In this case, one OS Cluster is used for CommuniGate Pro frontend Servers, utilizing the IP Aliasing load balancing, and the second OS Cluster is used for CommuniGate Pro backend Servers, where the Cluster File System is employed:

[Figure: Dynamic - 2 OS Clusters]

The configuration of a CommuniGate Pro Dynamic Cluster does not depend on the type of load balancing used (separate Load Balancers or IP Aliases) or on the type of shared file system used (Network File System or Cluster File System).


Configuring Backend Servers

To install a Dynamic Cluster, follow these steps:
  • Install and configure CommuniGate Pro Software on all Servers that will take part in a Dynamic Cluster.
  • Open the WebAdmin Settings->Services page and modify the PWD service settings. Each Cluster member (Backend and Frontend) opens 2 PWD connections to the Cluster Controller, so the maximum number of channels should be increased by at least
    2*(number of Backend servers + number of Frontend servers)
    Since additional PWD connections can be opened by Frontend and Backend servers to serve administrator and user requests, it is better to increase the number of channels by:
    5*(number of Backend servers) + 3*(number of Frontend servers)
  • Open the WebAdmin Settings->General->Clusters page and enter the IP addresses of all backend and frontend Servers in the Cluster.
  • Stop all Servers.
  • Create a file directory that will contain Shared Domains. You should create that file directory on a storage unit that will be available to all Cluster Backend Servers (on a file server, for example). Place a link to that directory into the CommuniGate Pro base directory, and name that link SharedDomains. Make sure that all Backend Servers have all file access rights to create, remove, read, and modify files and directories inside the SharedDomains directory.

    Note: if creating symbolic links is problematic (as it is on MS Windows platforms), you should specify the location of the "mounted" file directory as the --SharedBase Command Line Option:

    --SharedBase H:\Base
  • If you are upgrading from a single-server configuration, you may want to make some of your existing Domains shared, so they will be served by the entire Cluster. In this case you should move the Domain file directory from the {base}/Domains file directory into the {base}/SharedDomains file directory (located on a shared storage unit).
  • Modify the startup options for all your Backend Servers so that they include the --ClusterBackend Command Line Option.
  • Start one of the Backend Servers.
Use the WebAdmin Interface of this first Backend Server to verify that the Cluster Controller is running. Open the Domains page to check that:
  • all domains you have placed into the SharedDomains directory are visible;
  • the Create Domain button is now accompanied by the Create Shared Domain button.
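The PWD channel increase from the installation steps above is plain arithmetic. This Python helper (a hypothetical name, not part of CommuniGate Pro) returns both the minimum and the recommended increase; for example, a Cluster with 4 Backend and 6 Frontend Servers needs at least 2*(4+6) = 20 additional channels, and 5*4 + 3*6 = 38 is recommended.

```python
def pwd_channel_increase(backends, frontends):
    """Return (minimum, recommended) increase of the PWD channel limit."""
    # Each Cluster member opens 2 PWD connections to the Cluster Controller.
    minimum = 2 * (backends + frontends)
    # Extra headroom for PWD connections opened to serve administrator and user requests.
    recommended = 5 * backends + 3 * frontends
    return minimum, recommended


print(pwd_channel_increase(backends=4, frontends=6))  # (20, 38)
```

Both values are increments to add to the PWD channel limit already configured on the Server.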

Use the Create Shared Domain button to create additional Shared Domains to be served by the Dynamic Cluster.

When the Cluster Controller is running, the site can start serving clients (if you do not use Frontend Servers). If your configuration employs Frontend servers, at least one Frontend Server should be started.
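Before the Servers are started, you may want to confirm the SharedDomains access requirement from each Backend Server. The Python sketch below is an assumption-based helper, not a CommuniGate Pro tool: the function name is hypothetical, and it simply creates a temporary directory and file inside the given path, modifies and reads the file, and removes both. Run it while the CommuniGate Pro Servers are stopped, so that no running Server ever sees the temporary probe directory.

```python
import os
import shutil
import tempfile


def check_shared_domains_access(shared_dir):
    """Return True if this host can create, modify, read, and remove
    files and directories inside `shared_dir` (hypothetical helper)."""
    try:
        probe_dir = tempfile.mkdtemp(prefix="access-check-", dir=shared_dir)  # create a directory
    except OSError:
        return False
    probe_file = os.path.join(probe_dir, "probe.txt")
    try:
        with open(probe_file, "w") as f:  # create a file
            f.write("first")
        with open(probe_file, "a") as f:  # modify it
            f.write(" second")
        with open(probe_file) as f:       # read it back
            content = f.read()
        os.remove(probe_file)             # remove the file
        os.rmdir(probe_dir)               # remove the directory
    except OSError:
        # Clean up whatever is left, then report failure.
        shutil.rmtree(probe_dir, ignore_errors=True)
        return False
    return content == "first second"
```

Call it on every Backend Server with the location of the shared directory as mounted or mapped on that Server; a False result means the file access rights need to be fixed.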


Adding a Backend Server to a Dynamic Cluster

Additional Backend Servers can be added to the Cluster at any moment. They should be pre-configured in exactly the same way as the first Backend Server.

To add a Backend Server to your Dynamic Cluster, start it with the --ClusterBackend Command Line option (it can be added to the CommuniGate Pro startup script). The Server will poll all specified Backend Server IP Addresses until it finds the active Cluster Controller.
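The polling behavior can be sketched as a retry loop. In this Python example the `probe` callable, the retry delay, and the optional round limit are hypothetical: the text does not describe how a Server recognizes the active Controller, so that check is left to the caller.

```python
import time


def find_controller(addresses, probe, retry_delay=1.0, max_rounds=None):
    """Poll Backend Server addresses until the active Cluster Controller is found.

    probe(address) is a hypothetical caller-supplied check that returns True
    if the Server at `address` is the active Controller.
    Returns the Controller address, or None once max_rounds rounds have failed.
    """
    rounds = 0
    while True:
        for address in addresses:
            if probe(address):
                return address
        rounds += 1
        if max_rounds is not None and rounds >= max_rounds:
            return None
        time.sleep(retry_delay)  # wait before polling the whole list again


# Example with a stand-in probe: the Controller is at 192.0.2.12.
backends = ["192.0.2.11", "192.0.2.12", "192.0.2.13"]
print(find_controller(backends, lambda addr: addr == "192.0.2.12"))
```

The same loop applies to Frontend Servers started with --ClusterFrontend, which also poll the Backend Server addresses.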

Use the WebAdmin interface to verify that the Backend Server is running. Use the Domains page to check that all Shared Domains are visible and that you can administer Accounts in the Shared Domains.

When the Cluster Controller and at least one Backend Server are running, they both can serve all accounts in the Shared Domains. If you do not use Frontend Servers, load-balancing should be implemented using a regular load-balancer switch, DNS round-robin, or similar technique that distributes incoming requests between all Backend Servers.


Adding a Frontend Server to a Dynamic Cluster

You can add additional Frontend servers to the Cluster at any moment.

Install and configure the CommuniGate Pro software on a Frontend Server computer. Since Frontend Servers do not access Account data directly, there is no need to make the SharedDomains file directory available ("mounted" or "mapped") to any Frontend Server.

Specify the addresses of all Backend Servers on the Settings->General->Cluster WebAdmin page of the Frontend Server.

To add a Frontend Server to your Dynamic Cluster, stop it, and restart it with the --ClusterFrontend Command Line option (it can be added to the CommuniGate Pro startup script). The Server will poll all specified Backend Server IP Addresses until it finds the active Cluster Controller.

Use the WebAdmin interface to verify that the Frontend Server is running. Use the Domains page to check that all Shared Domains are visible.

When Frontend Servers try to open one of the Shared Domain accounts, the Controller directs them to one of the running Backend Servers, distributing the load between all available Backend Servers.


Shared Settings

The Dynamic Cluster maintains a separate set of "Default settings" for Shared Domains.

These settings include:

  • Default Domain Settings for all Shared Domains
  • Default Account Settings and Default WebUser Preferences for all Accounts in Shared Domains
  • Cluster-Wide Alerts - these alerts are sent to all Accounts in Shared Domains

When the Server Administrator uses the WebAdmin Interface to modify these settings, the WebAdmin pages display the links that allow the Administrator to switch between the Server-wide settings (that work for all non-Shared Domains), and Cluster-wide settings. The Cluster-wide settings are automatically updated on all Cluster Members, and they work for all Shared Domains.
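The choice between the two sets of defaults can be sketched as a lookup keyed on whether a Domain is shared. In this Python sketch the function name and setting keys are hypothetical, and the final layering step (Domain-level values overriding the inherited defaults) is an assumption added for illustration.

```python
def effective_defaults(domain, shared_domains, server_wide, cluster_wide, domain_level=None):
    """Return the default Account settings that apply to Accounts in `domain`.

    Shared Domains inherit the Cluster-wide defaults; all other Domains inherit
    the Server-wide defaults. Domain-level values (if any) override either set.
    """
    inherited = cluster_wide if domain in shared_domains else server_wide
    merged = dict(inherited)  # copy, so the inherited set is not modified
    merged.update(domain_level or {})
    return merged


server_wide = {"quota_mb": 100, "language": "en"}
cluster_wide = {"quota_mb": 500, "language": "en"}
shared = {"corp.example.com"}
print(effective_defaults("corp.example.com", shared, server_wide, cluster_wide))
# {'quota_mb': 500, 'language': 'en'}
print(effective_defaults("local.example.com", shared, server_wide, cluster_wide, {"language": "de"}))
# {'quota_mb': 100, 'language': 'de'}
```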

The Cluster-wide settings also include:

Shared Processing

The Dynamic Cluster Single Service Image component provides server synchronization beyond Account, Domain, and other Settings.

Additional "shared processing" functionality includes:

Withdrawing Servers from a Dynamic Cluster

Use the WebAdmin Interface to withdraw a Server from a Dynamic Cluster. Open the Cluster page in the Monitors realm, and click the Make Non-Ready button.

When a Frontend Server is in the Non-Ready state, all its UDP ports and all its TCP ports (except the HTTP Admin ports) are closed.
The load balancer delivering incoming connections to the Cluster Frontends should detect this and stop sending new connections and packets to this Frontend.

When a Backend Server is in the Non-Ready state, the Controller does not send any new sessions to this Server. Wait until all existing sessions end, and then shut down the Backend Server.

If this Backend Server is the currently active Controller, then making it Non-Ready causes the Controller to send all new sessions to the other Backend Servers. If there are no other Backend Servers in the Cluster, the Controller continues to serve all new sessions itself.

You can click the Make Ready button on the same page to re-enable the Server. If the Server is a Backend, the Controller starts to send new sessions to it.
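The session-assignment rules for Ready and Non-Ready Backend Servers can be expressed as a small filter. This Python function is a hypothetical sketch; the text does not say what happens when other Backend Servers exist but all of them are Non-Ready, and this sketch simply returns an empty list in that case.

```python
def eligible_backends(backends, ready, controller):
    """Return the Backend Servers that may receive new sessions.

    backends:   running Backend Servers, including the active Controller
    ready:      set of Servers currently in the Ready state
    controller: the active Cluster Controller
    """
    candidates = [b for b in backends if b in ready]
    if not candidates and list(backends) == [controller]:
        # A Non-Ready Controller with no other Backend Server keeps
        # serving all new sessions itself.
        return [controller]
    return candidates


print(eligible_backends(["backend1", "backend2"], {"backend2"}, "backend1"))  # ['backend2']
print(eligible_backends(["backend1"], set(), "backend1"))                     # ['backend1']
```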

You need to have the Can Control Cluster access right to make Cluster members Ready or Non-Ready.

If a Backend Server fails, all Shared Domain Accounts that were open on that Server at the time of failure become unavailable. They become available again within 5-10 seconds, when the Cluster Controller detects the failure. A Backend Server failure does not cause any data loss.


Upgrading Servers in a Dynamic Cluster

The Dynamic Cluster is designed to support "rolling upgrades". To upgrade to a newer version of the CommuniGate Pro software, you should upgrade the Servers one-by-one: withdraw a server from the Cluster, upgrade the software, and add the server back to the Cluster. This procedure allows your site to operate non-stop during the upgrade.
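The rolling upgrade can be sketched as a loop over Cluster members that reuses the withdrawal procedure described in the previous section. Every callable in this Python sketch is a hypothetical hook supplied by the site (for example, a wrapper around the WebAdmin Make Non-Ready and Make Ready buttons, or the site's own installation scripts). For Frontend Servers, `active_sessions` can simply return 0, since the load balancer stops sending them new connections.

```python
import time


def rolling_upgrade(servers, make_non_ready, active_sessions, upgrade, make_ready,
                    poll_interval=5.0):
    """Upgrade Cluster members one at a time so the site keeps running.

    All callables are hypothetical hooks taking a Server name.
    """
    for server in servers:
        make_non_ready(server)              # stop new sessions to this Server
        while active_sessions(server) > 0:  # let existing sessions end
            time.sleep(poll_interval)
        upgrade(server)                     # shut down, upgrade software, restart
        make_ready(server)                  # return the Server to the Cluster
```

Upgrading one Server at a time keeps the remaining members serving all Shared Domain Accounts throughout the process.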

Certain changes in CommuniGate Pro software can impose some restrictions on the "rolling upgrade" process. Always check the History section before you upgrade your Cluster, and see if any Cluster Upgrade restrictions are specified there.


CommuniGate Pro Guide. Copyright © 2024, AO SBK