Is an NT cluster in your future?

by David Strom

(appeared in the May 15, 1997 issue of Network Computing magazine as part of an NT advertising supplement)

When it comes to network servers, having a cluster means having more than one. Anything more specific than this is subject to lots of disagreement among the major server hardware vendors. Server clustering has become a competitive arena full of claims, counter-claims, positions, and even products. NT-based products have lots of subtle differences, and while products from Microsoft have slipped in their delivery dates, there is a wide range of offerings from other vendors to satisfy many mission-critical applications.

The essence of server clustering is a simple idea: take several servers (called nodes) and tie them together with a variety of hardware and software tricks to ensure either high availability or scaleable performance. Ideally, the goal is to have both: a collection of hardware that doesn't fail and doesn't run out of gas as applications grow and consume more processing horsepower.

Clustering makes a great deal of sense for database servers: typically, these are the applications that both require high availability and are the most troublesome to scale as demand increases. Most of the vendors with shipping NT clustering products offer support for at least one database server. However, each clustering implementation is different, and these differences could have drastic consequences for corporate network architects and application builders.

Clustering isn't the same as symmetric multiprocessing (SMP), which means putting more than one central processing unit (CPU) inside a single node. However, you can combine clustering with SMP hardware to deliver the ultimate in high-reliability and scaleable machines. With SMP, the operating system is aware of the multiple processors and divides its own tasks among the various CPUs. The multiple CPUs share memory, disk, and other machine resources. However, when one of these shared resources fails, the entire machine will stop working. Clustering gets around this problem by offering more redundant components, typically separate disk drives and processors.
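The difference between one machine with shared parts and a cluster of redundant nodes can be put into rough numbers. The sketch below is only an illustration: the 99 percent figures are made-up assumptions, not vendor data, and the arithmetic assumes that failures are independent and that failover always works.

```python
# Back-of-the-envelope availability arithmetic.
# All availability figures are illustrative assumptions, not measurements.

def series_availability(parts):
    """One machine: every shared part must work, so availabilities multiply."""
    result = 1.0
    for availability in parts:
        result *= availability
    return result


def redundant_availability(nodes):
    """A cluster: the service is down only if every node is down at once."""
    all_down = 1.0
    for availability in nodes:
        all_down *= (1.0 - availability)
    return 1.0 - all_down


# A single SMP machine whose memory, disk, and power are each up 99% of the time.
single_machine = series_availability([0.99, 0.99, 0.99])

# Two such machines clustered, assuming independent failures and perfect failover.
two_node_cluster = redundant_availability([single_machine, single_machine])

print(f"single machine:   {single_machine:.4f}")
print(f"two-node cluster: {two_node_cluster:.4f}")
```

Real clusters fall short of the second figure, since shared disks, software faults, and failover delays all count against it.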

While NT-based clustering is relatively new, it draws upon concepts and products that have been available on mainframes for over a decade from IBM, NCR, and Tandem. Microsoft has taken these ideas and incorporated them into its Wolfpack initiative. Announced last year, Wolfpack was designed as a way to bring a standard series of programming interfaces to NT clustering. However, Wolfpack is still far from a shipping product, despite delivery dates promised for earlier this year.

Initially, Wolfpack was intended to support two-node clusters on all of NT's processor types. However, both MIPS and PowerPC support for NT are all but history, making the notion of having a standard interface less compelling. Tandem has committed to porting its ServerNet to the Wolfpack APIs, and a number of other vendors such as HP, NCR, and Intel are also behind the effort.

How does each clustering implementation differ? The most obvious differences are the number of nodes that can be connected together and the nature of the actual connection itself. Most vendors, like Microsoft, have opted to support two-node clusters initially. However, Stratus has come out with support for 24-node clusters in its RADIO product. This multiple-node support isn't cheap: prices begin at more than $65,000 for the hardware. Compaq and others have plans to support up to four nodes in future versions, and NCR and others also plan to go beyond two nodes in their future NT clustering solutions.

There are three types of connections to tie a cluster together. First, the clustered servers can be connected through the ordinary network. This is simple but isn't the best for reliability purposes. Second, the cluster can share a common SCSI bus that connects disk drives to multiple servers. This is the method used by both Digital's and Compaq's products. Third, the cluster can have its own dedicated link that is shared among the servers and is used in addition to the network connection.

Clusters also differ in their support for particular protocols and the types of client machines and applications. It pays to read the fine print to determine whether what is supported will match up with your application needs. You'll also need to do some research to find out what happens after a failure: whether every client/protocol/application combination will be reconnected automatically, or whether the connection to the server has to be re-established manually. Obviously, automatic reconnection is much more desirable for users since it requires less effort.
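To make the difference between automatic and manual reconnection concrete, here is a minimal simulation. It does not model any vendor's product: the Node and AutoReconnectClient classes and the node names are hypothetical, and no real network traffic is involved.

```python
# Simulation of automatic client reconnection after a node failure.
# Node, AutoReconnectClient, and the node names are hypothetical.

class Node:
    """A cluster node that can be marked as failed."""

    def __init__(self, name):
        self.name = name
        self.up = True

    def handle(self, request):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class AutoReconnectClient:
    """Client that moves to a surviving node without any user action."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.current = 0  # index of the node we are connected to

    def request(self, payload):
        # Try each node at most once, starting with the current one.
        for attempt in range(len(self.nodes)):
            index = (self.current + attempt) % len(self.nodes)
            try:
                reply = self.nodes[index].handle(payload)
            except ConnectionError:
                continue  # this node is down, try the next one
            self.current = index  # stay on the node that answered
            return reply
        raise ConnectionError("no cluster node is reachable")


node_a, node_b = Node("node-a"), Node("node-b")
client = AutoReconnectClient([node_a, node_b])

print(client.request("query 1"))  # answered by node-a
node_a.up = False                 # simulate a failure of the first node
print(client.request("query 2"))  # client moves to node-b on its own
```

A client without this retry logic would simply see the ConnectionError, and the user would have to reconnect by hand.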

For example, with Digital's first release of NT clustering, only applications that make use of the Named Pipes and Server Message Block (SMB) protocols will work. This means that all IP-related applications such as Web servers aren't supported on Digital's NT clusters. Compaq's Online Recovery Server supports two applications, Oracle 7.3 and SQL Server 6.5, and only on Windows clients: it won't support Macintosh or OS/2 clients at all, and non-NT clients will have to manually reconnect. Stratus' RADIO Cluster supports only Microsoft's Internet Information Server and SQL Server applications, while IBM's products support Notes and Database/2 applications. NCR's LifeKeeper seems to have the widest support for all kinds of clients and applications: the product supports automatic reconnection of any NT client, for example.

Finally, there is the whole issue of how the cluster itself is managed. There are two basic options: either manage the entire cluster as a single entity or be able to view the individual nodes that make up the cluster. Depending on the application, one type of environment can work better or worse than the other. Digital's cluster appears as a single entity, which is one of the reasons special client software is required, but some applications may not work well in this fashion. Other vendors switch the server name or IP address from the failed node to a working node, which can also be a problem for other kinds of applications.
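The address-switching approach can be sketched in a few lines. This is a toy model, not any vendor's implementation: the TwoNodeCluster class, the node names, and the 10.0.0.10 address are all assumptions chosen for illustration.

```python
# Toy model of a service address moving from a failed node to a survivor.
# TwoNodeCluster, the node names, and the address are hypothetical.

class TwoNodeCluster:
    def __init__(self, primary, standby, service_address):
        self.health = {primary: True, standby: True}  # node name -> healthy?
        self.owner = primary                          # node that holds the address
        self.service_address = service_address

    def mark_failed(self, name):
        """Record a node failure and move the address if its owner failed."""
        self.health[name] = False
        if name == self.owner:
            survivors = [n for n, ok in self.health.items() if ok]
            self.owner = survivors[0] if survivors else None

    def resolve(self, address):
        """Return the node currently answering for the service address."""
        if address != self.service_address or self.owner is None:
            raise LookupError(f"nothing is answering for {address}")
        return self.owner


cluster = TwoNodeCluster("node-a", "node-b", "10.0.0.10")
print(cluster.resolve("10.0.0.10"))  # node-a owns the address
cluster.mark_failed("node-a")
print(cluster.resolve("10.0.0.10"))  # node-b has taken it over
```

Clients keep using the same address throughout, but an application that cached the identity of the original node, or held state on it, may still be confused by the switch.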

Resources:

For more information on clustering, see the following web pages: