Saturday, September 1, 2012

Server Configuration: How to Reach 99.999% Uptime

Related Terms

Advanced ECC Memory Protection
Chipkill
Hardware RAID / DRBD
Multichannel SCSI Controllers
External Drive Array
Hot-Swappable PCI card
Hot-Swappable Power Supply
Monitoring (OS Process, Network)
Disk Mirroring

General Infrastructure

When designing a high availability solution, remember that installing all key servers at a single location creates a potential single point of failure (SPOF): a disaster or power failure at that location takes every server down at once. The environmental conditions of the servers should also be taken into account; redundant air conditioning systems are essential.

Hardware

Even the most sophisticated software cannot produce a high availability system unless the hardware beneath it is protected against failure as far as possible. The key hardware components that should be laid out with the greatest possible redundancy are:

Power Supply : If possible, protect your servers with a UPS (uninterruptible power supply) so that a brief power failure can be bridged and the systems can be shut down correctly in the event of a longer one. The power supply itself should also be configured for redundancy. Hot-swappable power supplies are standard on any server that claims to offer increased levels of reliability or availability.

Network Interfaces : Make sure each of your systems has several network interfaces. If one interface fails, another must automatically take over the address and task of the failed component. Redundancy applies to both interface directions: plan an active and a backup interface for the internal network as well as for the external one.

Hard Disks : Assign several hard disks to your system and arrange data redundancy (e.g., using RAID or DRBD) in such a way that if one of these disks is lost, the others still contain the intact data. It must be possible to replace a faulty disk with a new one without stopping the system.
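The effect of redundancy on availability can be estimated with simple arithmetic. The following is a minimal sketch, assuming that redundant components fail independently of each other and that one working copy is enough to keep the service running (real failures are often correlated, so treat the results as an upper bound). The function names are made up for this illustration.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Downtime budget per year for a given availability (0.99999 = 99.999%)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components when one working copy suffices.

    Assumes independent failures: the group is down only if every copy is down.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - component_availability) ** copies


def copies_needed(component_availability: float, target: float) -> int:
    """Smallest number of redundant copies that reaches the target availability."""
    if not 0.0 < component_availability <= 1.0:
        raise ValueError("component_availability must be in (0, 1]")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    copies = 1
    while parallel_availability(component_availability, copies) < target:
        copies += 1
    return copies


if __name__ == "__main__":
    print(f"99.999% allows about {downtime_minutes_per_year(0.99999):.2f} minutes of downtime per year")
    print(f"Copies of a 99% component needed for 99.999%: {copies_needed(0.99, 0.99999)}")
```

Under these assumptions, a component that is available 99% of the time has to be tripled to reach five nines, which is why every item in the list above is planned with redundancy.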

Applications

All important data and applications that your systems expose to the outside must be set up so that they never prevent a restart. If an application does not release its lock files after a crash, the relevant process cannot restart, which makes the application unsuitable for a high availability environment. Ideally, the "health" of important applications, operating system processes, and network connections should be monitored with a suitable monitoring tool.
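The lock file problem can be illustrated with a small check that a start script might run before launching the application. This is a sketch under stated assumptions, not a feature of any particular application: it assumes a POSIX system and a lock file that contains only the PID of the process that created it. The path and the function names are hypothetical.

```python
import os


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def clear_stale_lock(lock_path: str) -> bool:
    """Remove the lock file if its owner has died.

    Returns True if the lock is free afterwards, False if it is still held
    or its content cannot be interpreted.
    """
    try:
        with open(lock_path) as lock_file:
            pid = int(lock_file.read().strip())
    except FileNotFoundError:
        return True  # no lock present, nothing to clean up
    except ValueError:
        return False  # unexpected content: leave it for an operator
    if pid <= 0:
        return False  # not a valid PID, do not guess
    if pid_is_running(pid):
        return False  # lock is legitimately held
    os.remove(lock_path)
    return True


if __name__ == "__main__":
    # Hypothetical path, used only for illustration.
    print(clear_stale_lock("/tmp/example-app.lock"))
```

A check like this has a race window between reading the PID and removing the file, and PIDs can be reused, so a production setup would usually rely on the cluster manager or on kernel-level file locks instead.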


Data

After a system fails, all key data must be available to the failover system, complete and intact. This type of high availability is achieved by distributing stored data over several systems or hard disks. For this, the contents of a disk are regularly mirrored to another disk (or several disks), which can take over with the intact data if a failure occurs. Use a journaling file system to ensure that the file system restarts in a consistent state after a system crash.
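Whether a mirrored copy really is "complete and intact" can be spot-checked at file level by comparing checksums. The sketch below assumes both copies are reachable as ordinary files and that nothing writes to them during the check; block-level tools such as RAID or DRBD perform their own verification, so this is only an illustration of the idea. The function names are invented for this example.

```python
import hashlib


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mirror_is_intact(primary_path: str, mirror_path: str) -> bool:
    """Return True if the mirror has exactly the same content as the primary."""
    return file_digest(primary_path) == file_digest(mirror_path)
```

A call such as mirror_is_intact("/data/db.dump", "/mirror/db.dump") (both paths hypothetical) returns False as soon as a single byte differs.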

Network

All network infrastructure should be configured for redundancy, from the router and switch infrastructure down to the simple network cable.
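Redundant network paths only help if clients actually fall back to them. As a rough client-side illustration, the sketch below tries a list of redundant endpoints in order and returns the first connection that succeeds. The endpoint list, host names, and function name are assumptions made for this example; in practice, failover is more often handled below the application, for example by interface bonding or a floating IP address.

```python
import socket


def connect_first_available(endpoints, timeout=2.0):
    """Try each (host, port) pair in order and return the first open socket.

    Raises ConnectionError if none of the endpoints can be reached.
    """
    last_error = None
    for host, port in endpoints:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:
            last_error = exc  # remember the failure and try the next path
    raise ConnectionError(f"no endpoint reachable, last error: {last_error}")


if __name__ == "__main__":
    # Hypothetical primary and backup addresses of the same service.
    redundant_paths = [("primary.example.net", 443), ("backup.example.net", 443)]
    try:
        connection = connect_first_available(redundant_paths)
        print("connected to", connection.getpeername())
        connection.close()
    except ConnectionError as error:
        print(error)
```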
