Standby Servers Deliver (5/6/96)
by David Strom
Few things can be more troublesome to a network administrator than a server crash: servers lie at
the core of our network universes, and small changes in their operations can have large
effects on great numbers of end users. So improvements in reliable operations and "up time" can
make a big difference in terms of how a network -- and its administrative staff -- are perceived by
users and upper management.
Two products have been introduced in the past year that can help improve reliability: Standby
Server 32 from Vinca Corp. and Lantegrity from Network Integrity. Vinca has versions for both
OS/2 and NetWare at present with a version for NT promised by the end of this month. Lantegrity
is just for NetWare right now. To put them through their paces we tested both products on the
G.Neil Companies' network in Sunrise, Florida. G.Neil is a human resources direct marketing
vendor, selling a wide range of products from office forms to motivational posters to plaques and
video tapes. The company has seen a fairly steady growth in both its employees and revenues, and
the IS shop is stretched thin dealing with a number of issues, including the ability to supply reliable
server performance.
The company has an Ethernet network with a wide variety of servers on it, including a small FT250
Netframe and several other servers running NetWare, a Unisys mainframe and an IBM AS/400.
The company runs the usual suite of office applications including Lotus cc:Mail and PowerBuilder,
and has a mixed network of 200 Windows 3.1 PCs and Macintoshes.
G.Neil bought their Netframe on the promise of having better reliability, but found that it still wasn't
running perfectly. "A server abend would knock us out for an entire day sometimes," said Chip
DiComo, microsystems manager at the company. Abends can happen from a variety of factors:
badly behaved NetWare loadable modules (NLMs), power problems, and heavy network traffic
are all possible causes. They are the bane of most NetWare managers' existence: while the path to
server recovery is well understood, it can be time-consuming, especially if you have to restore or
repair large volumes of data. G.Neil's Netframe has 8 gigabytes of data, and at best can take a few
hours to bring back on line.
This is potential downtime that DiComo was looking to avoid. The two products approach the
problem of improving reliability from very different perspectives: Vinca uses the concept of Novell's
disk mirroring and extends it to a complete redundant or mirrored server. This server is connected
both to the enterprise network and to its twin server via a separate communications link, and only
comes on-line if the primary server fails. "All that Vinca is doing is just mirroring disks, it just so
happens that they are in another box," said DiComo.
Network Integrity has another method which uses a spare NetWare 4.1 server to protect multiple
servers. This server should not be used for anything else, since the Lantegrity software takes over
complete control and runs it at close to 100% utilization. NLM agents are loaded on each server to
be protected: when these servers go down, the spare 4.1 server sends out broadcasts mimicking the
failed servers so that users still think their server is on-line. For both products, once the protected
server comes back on line, you switch back over manually.
Before you dive into the swamp of redundant file servers, realize what problem you are
trying to solve. Are you dissatisfied with the level of reliability of your NetWare server itself? Is the
problem the hardware or software configuration of the server? Do you have mission- and
time-critical applications running that will require constant availability of your file servers? Does it
take you too long to recover from file server abends because of the amount of data storage? Then
one of these products might be of help. But you'll need to spend a great deal of time in testing them
out, and then training your staff on the proper procedure to implement their advanced features.
DiComo wanted to see if both products delivered on their promises, and we set them up in a test
lab with several test servers, connected alternately to their own network and to the enterprise
backbone. For StandbyServer, we used two HP machines: a VE Pentium 75 and an XU Pentium 120,
both with 1 gigabyte hard disks. For Lantegrity, we used an HP Netserver LC with two gigabytes
of disk running NetWare 4.1, along with an HP tape autoloader 12000 SureStore. You need the
tape autoloader since you store the protected files on tapes. We also had a second NetWare server,
the one we were protecting under Lantegrity: this was another HP Pentium with a small
168 megabyte disk. And we had several Windows 3.1 and 95 workstations -- Lantegrity requires
one to run various administrative tasks, while StandbyServer is operated completely from the
server's own console.
The Lantegrity server needs 1 gigabyte more hard disk space and 16 megabytes more RAM than
the largest server you are protecting -- this is to handle the caching requirements. If you do the
math, protecting G.Neil's Netframe would require a machine with 9 gigabytes of disk
and 144 megabytes of RAM -- that comes out to about $15,000 worth of hardware, according to
DiComo. "It still is a lot cheaper than buying an identical Netframe -- although they don't sell that
model anymore and we would have to shop around for a used one for something like $40,000.
Indeed, just buying Novell's own System Fault Tolerance Level 3 software would be $18,000 for
the software alone."
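The sizing rule above can be sketched as a quick calculation. This is an illustration only, not anything supplied by Network Integrity: the function name is made up, the 128 megabytes of RAM for the Netframe is an assumption inferred from the 144 megabyte total quoted above, and taking the largest disk and largest RAM separately is one reading of "the largest server."

```python
# Hypothetical helper illustrating the sizing rule described in the article:
# the spare server needs 1 GB more disk and 16 MB more RAM than the largest
# server it protects. Not part of any Lantegrity tool.
EXTRA_DISK_GB = 1
EXTRA_RAM_MB = 16


def spare_server_requirements(protected):
    """protected: list of (disk_gb, ram_mb) tuples, one per protected server.

    Returns the (disk_gb, ram_mb) the spare server would need, taking the
    largest disk and the largest RAM figure independently (an assumption).
    """
    largest_disk = max(disk for disk, _ in protected)
    largest_ram = max(ram for _, ram in protected)
    return largest_disk + EXTRA_DISK_GB, largest_ram + EXTRA_RAM_MB


# G.Neil's Netframe: 8 GB of disk (from the article); 128 MB of RAM is assumed.
disk_gb, ram_mb = spare_server_requirements([(8, 128)])
print(disk_gb, "GB disk,", ram_mb, "MB RAM")
```

Adding smaller servers to the list doesn't change the result, which is the appeal of the one-spare-for-many approach: only the largest protected machine drives the hardware bill.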
Our tests were relatively straightforward: we alternately pulled the power and network connections
from the servers under protection while we were copying files from a Windows workstation and
observed what happened. G.Neil wanted more than just having their server up and running: they
wanted their end users to keep their network connection and continue working. That turned out to
be a more challenging situation.
Both products are a bear to install, and will require some calls to the vendor's support lines: part of
the problem is that they are complex products that require a deep understanding of
portions of NetWare that aren't commonly known. With Vinca's StandbyServer, you'll
need to understand disk and server mirroring concepts and be able to use the Novell commands to
reconstruct the mirrored volumes in case of a problem. Think of the Vinca product as assembling
two mirrored and duplexed disk drives -- the only difference from this tried and tested situation is
that the drives happen to be housed in separate servers and are connected via two wires: their
ordinary network cabling and a special high-speed (160 Mbps) cable provided by Vinca. This
means that for every server you wish to protect, you'll need to buy an additional standby server.
Included in the package is a copy of runtime NetWare 3.12 that is installed on the standby machine.
Unlike Novell's duplexing and mirroring requirements, you don't need the same exact equipment for
both servers: for example, a 3.12 server could stand in for a 4.1 server, and you don't need the
same network adapter and disk controllers in both machines. You do need to ensure that the
standby's hard disk is set up with the same volume structure as the protected server, however.
With Lantegrity, you'll need to have a solid understanding of Novell's Directory Services and trees,
and be able to manipulate bindery objects inside the directory trees. Here the idea is to build a huge
data repository, using a combination of more disk, a tape autoloader, and more RAM, to shadow
several different servers across a single network connection. You'll also need to understand how the
AUTOEXEC.NCF and STARTUP.NCF files work and where they are located: NetWare can
load these either from the DOS startup partition or the NetWare \SYSTEM partition, and that
needs to be sorted out before installing Lantegrity.
Lantegrity has some other caveats as well: for example, when the protected server is down, you
can't rename directories -- the folks at G.Neil liked this restriction, since a rename could prevent the servers
from synchronizing their file systems. And while it will protect the actual name spaces for OS/2 and
Mac clients, it doesn't transparently provide the files themselves: meaning that G.Neil's Mac users
will have to do another login to the Lantegrity server itself when the primary server fails.
Both NDS and mirroring skills were in short supply at G.Neil: they are just getting started with
NetWare 4.1, and had only begun to get training on directory trees. They had never put together a
mirrored server before, and needed to spend time learning how that was accomplished while we
were setting up the Vinca software. This could be typical of many NetWare shops.
Luckily, technical support from both vendors was very forthcoming and helpful: Ron Keindl was
able to get StandbyServer configured and Kelly Connor and I got Lantegrity up and running. In
both cases, vendor representatives knew that Infoworld was calling them, so you might not get the
same level of service. "However, I got lots of help from Vinca -- they were teaching me disk
mirroring," said Keindl, a consultant in the microsystems department. Both manuals are fairly dense
and will require careful reading to understand the various subtleties involved in setup. For example,
Lantegrity requires a NetWare 4.1 server to be set up with some non-standard parameters, such as
specifying NetWare file format when the server's volume is formatted. Connor, an analyst in the
microsystems department, didn't see this caveat and had to format her volume a second time.
We had other slight hiccups along the way: we needed to download an update of StandbyServer
from Vinca's BBS to fix some syntax errors in the installation script (since corrected), and
we needed server drivers for the Kingston/Atlantic NE2000-style adapters. During the Network
Integrity installation, we had to make a run to the local computer superstore for some parts that
should have been supplied in our evaluation (but normal customers would have purchased
separately). Also vexing was a power outage right in the middle of the installation of NetWare 4.1
-- driving home how critical these products really are.
The test server we were running for the Lantegrity scenario was a real baby -- it only had a 168
megabyte hard disk and eight megabytes of RAM. Nevertheless, we found that even this was
inadequate -- we had to bring up the memory to 12 megabytes before Lantegrity would work
properly. This is because the NLM-based agents need lots of room to do their work.
Both products did work as intended: StandbyServer took about 20 seconds to switch
automatically from a failed primary to the standby machine. Lantegrity took about 50 seconds to do
the switch -- bearing in mind that we were using an undersized server and didn't need to restore any files
from tape, which would take longer.
With the Vinca product, we weren't able to maintain a connection under Windows 3.1, even after
upgrading to the latest series of Virtual Loadable Module drivers (1.20). But under Windows 95,
running the Microsoft network client, we were able to keep connected while the StandbyServer
switched over -- that was impressive.
We could not maintain a connection with either our Windows 3.1 or 95 client with Lantegrity during
the standby operations. According to their technical support, we should have been able to do this if
we were using Novell's 32-bit client on Windows 95 and had configured our VLMs correctly. After poring
over the Novell manuals, we still couldn't get it working for Windows 3.1 -- we think Network
Integrity should do more to document how this works for those customers like G.Neil that want to
maintain their connections. One thing we found annoying was that the administrator's screen doesn't
automatically refresh itself -- several times we started out to do something, only to realize that if we had
pressed F5 we would have seen the current status of the server. This will be added in a future
upgrade, according to company representatives.
However, we found plenty of caveats on both products. For example: Vinca's product wouldn't
work to protect G.Neil's Netframe because this server has its own proprietary hardware bus and
can't make use of standard ISA or EISA adapters. Vinca sells its own MCA, ISA or EISA adapter
to connect the mirrored servers: unfortunately, none of these cards will fit inside a Netframe. (On its
NT and OS/2 products, Vinca uses a standard 100 megabit network adapter, making these
products more flexible.) Network Integrity's product isn't all that useful for Macintosh users, as
another example.
Vinca's product documentation doesn't mention anything about protecting print queues, but in theory
it should work since you are just duplicating the queue on the standby machine. We didn't have time
to test our theory, however. Network Integrity has all sorts of information about how to replicate
the queues on its server.
One problem we had with StandbyServer is that there is no information about the condition of one
critical link: the network connection of the standby server itself. The software monitors the network
link of the primary server, as well as the proprietary link between the two servers -- if either of these
goes down, the software will automatically switch operations over from the primary to the standby
machine. Vinca representatives said that several third-party products were available to monitor this
link, although DiComo and his crew clearly weren't happy with the notion of having to look around
and test yet another add-on.
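To make the gap concrete, here is a minimal model of the switchover rule as described above. It is a sketch of our understanding only, not Vinca's code; the class, field and function names are all invented for illustration.

```python
# Illustrative model only; not Vinca's implementation. All names are
# hypothetical and exist just to show which links feed the failover decision.
from dataclasses import dataclass


@dataclass
class LinkStatus:
    primary_network_up: bool    # primary server's LAN connection (monitored)
    inter_server_link_up: bool  # dedicated link between the two servers (monitored)
    standby_network_up: bool    # standby's own LAN connection (not monitored)


def should_fail_over(status: LinkStatus) -> bool:
    """Switch to the standby if either monitored link is down."""
    return not (status.primary_network_up and status.inter_server_link_up)


def failover_is_blind(status: LinkStatus) -> bool:
    """True when a switch would fire although users can't reach the standby."""
    return should_fail_over(status) and not status.standby_network_up


# The primary loses its LAN link while the standby's LAN link is also dead:
status = LinkStatus(primary_network_up=False,
                    inter_server_link_up=True,
                    standby_network_up=False)
print(should_fail_over(status), failover_is_blind(status))
```

Note that the standby's own network link never enters the failover decision, which is why a separate monitoring tool would be needed to catch the second case.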
Another issue for StandbyServer is that you need to keep a cool head when it comes time to restore
operations. To bring the protected server back online, you need to type in a few commands at both
the standby and protected server's consoles: the commands are slightly different. If you type the
wrong one, then the servers won't synch up properly and you have to delete the mirrored partition.
"For our environment, the hardware dependency of Vinca is a show stopper, since we can't use it
with our Netframe. Lantegrity is not hardware dependent, and something we definitely want to
pursue," said DiComo.
Data box
Standby Server 32 v 1.60
Vinca Corp.
Orem UT 84058
801 223 310
801 223 3107 fax
price: $2599 to $2999, depending on type of network adapter required
Lantegrity 3.22c
Network Integrity
Marlboro, MA 01752
508 460 6670
508 460 6771 fax
price: $4950 for 100 users, $1600 for an additional 100 users
© Infoworld Publishing Co.