How To Burn In A Server Under Linux

From time to time we all need to install new hardware. The problem is: how can you be sure that all of the components of your server are working well before you put it into production? For other operating systems there are diagnostic tools that let you run burn-in tests to exercise your hardware before going to production. This article shows you how to do something similar on Linux.



There are four main things that you need to test on a new system to ensure reliable operation. The processor can be tested by reading from the "/dev/urandom" device, because it uses lots of very complicated mathematics to perform its function. The disks can be tested by creating, moving and deleting large files repeatedly. Additionally, you will want to test the network devices to ensure reliability. Finally, the memory can be tested by creating, using and destroying RAM disks. If you put them all together, you may already have an idea where this article is going.


For my purposes, I have two identical servers which I am planning to put into a load-balanced, active-active configuration. Since they are identical, I figured that I would test them against each other to verify that the network cards are also working. Here's my process:



  1. Create a RAM disk and mount it

  2. Use "dd" to create a 1GB file from the /dev/urandom device on the RAM disk

  3. Move the 1GB file from the RAM disk to the physical disk

  4. Copy the physical file via SCP to the other server

  5. Rinse and repeat every 10 minutes for 48 hours
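
As a rough sanity check on the schedule above, the arithmetic can be sketched in shell. The variable names are illustrative only, and the total assumes every run writes the full 1GB file from step 2.

```shell
#!/bin/bash
# Back-of-the-envelope totals for the schedule described above.
interval_min=10   # one run every 10 minutes
duration_hr=48    # total length of the burn-in
file_gib=1        # size of the test file written per run (assumed)

# Number of runs each server performs, and how much data that generates
runs=$(( duration_hr * 60 / interval_min ))
total_gib=$(( runs * file_gib ))

echo "runs per server: ${runs}"
echo "data generated per server: ${total_gib} GiB"
```

Each server also receives the same amount again from its peer, so make sure /tmp on both machines has room for at least two test files at a time.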


The first step in making this happen is to write a script which will accomplish the tasks desired. Listed below is the script I used for this purpose. Be aware that I have connected the two servers to a switch and they have no additional connectivity except to one another. The first device is 192.168.1.1 and the second device is 192.168.1.2.



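#!/bin/bash
# One burn-in pass: exercise RAM (tmpfs), CPU (/dev/urandom), disk and network.

LOG=/home/deven/test.log

# Remove the file left behind by the other server's last copy
rm -f /tmp/copiedFile

# Make sure the mount point exists, then mount a 1536 MB RAM disk
mkdir -p /media/ramdisk
/bin/mount -t tmpfs none /media/ramdisk -o size=1536m

# Write 1 GiB of random data (1048576 blocks of 1024 bytes) to the RAM disk.
# dd prints its statistics on stderr, so send that to the log as well.
/bin/dd if=/dev/urandom of=/media/ramdisk/testFile bs=1024 count=1048576 >> "$LOG" 2>&1

# Move the file to the physical disk, then release the RAM disk
mv /media/ramdisk/testFile /tmp/
/bin/umount /media/ramdisk

# Work out which server this is and copy the file to the other one
CURRIP=$(ifconfig | grep "inet addr" | grep 192 | awk -F":" '{print $2}' | awk -F" " '{print $1}')
if [ "${CURRIP}" == "192.168.1.1" ] ; then
    echo "Copying file to 192.168.1.2"
    scp /tmp/testFile deven@192.168.1.2:/tmp/copiedFile >> "$LOG" 2>&1
else
    echo "Copying file to 192.168.1.1"
    scp /tmp/testFile deven@192.168.1.1:/tmp/copiedFile >> "$LOG" 2>&1
fi

# Clean up the local copy
rm -f /tmp/testFile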
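
One caveat about the address lookup in the script above: it greps for the string "inet addr", which newer versions of ifconfig no longer print, and some distributions do not install ifconfig at all. The following is a minimal sketch of an alternative lookup, assuming the "ip" command from iproute2 is available. The function names peer_for and local_test_ip are hypothetical, not part of any tool.

```shell
#!/bin/bash
# Hypothetical helper: map this server's test address to its peer's address.
peer_for() {
    case "$1" in
        192.168.1.1) echo "192.168.1.2" ;;
        192.168.1.2) echo "192.168.1.1" ;;
        *) return 1 ;;   # not one of the two burn-in servers
    esac
}

# Hypothetical helper: print the first 192.168.1.x address on this host.
# "ip -4 -o addr show" prints one line per address; field 4 is "address/prefix".
local_test_ip() {
    ip -4 -o addr show 2>/dev/null |
        awk '$4 ~ /^192\.168\.1\./ { split($4, a, "/"); print a[1]; exit }'
}

if peer=$(peer_for "$(local_test_ip)"); then
    echo "Copying file to ${peer}"
else
    echo "No 192.168.1.x address found on this host" >&2
fi
```

A side benefit of this shape is that a host with neither address is reported instead of silently falling through to the "else" branch and copying to 192.168.1.1.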

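After saving the burn-in script as "test.sh" and making it executable, I created crontab entries to execute it every 10 minutes. The entries include a user field (root), so they belong in the system crontab (/etc/crontab) rather than a per-user crontab. Because cron runs the script unattended, scp also needs key-based SSH authentication between the two servers so that it never prompts for a password. I did the same thing on the second server, but offset the times by 5 minutes as shown below: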



First server:

0,10,20,30,40,50 * * * * root /home/deven/test.sh

Second server:

5,15,25,35,45,55 * * * * root /home/deven/test.sh

That's it. You're done. Let that run for 48 hours and then check the system logs to see if there are any disk/memory/CPU/network errors. If there are, then you may have a hardware issue you will need to address before going to production with that server.
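
For the log check, something along these lines may help narrow things down. It is only a sketch: scan_log is a hypothetical helper, the keyword list is an assumption and far from exhaustive, and log locations vary by distribution.

```shell
#!/bin/bash
# Hypothetical helper: print lines from a log file that mention common
# hardware-trouble keywords (case-insensitive). The list is an assumption.
scan_log() {
    grep -iE 'machine check|hardware error|i/o error|edac|link is down' "$1"
}

# Example usage (adjust paths for your distribution):
#   dmesg > /tmp/dmesg.txt && scan_log /tmp/dmesg.txt
#   scan_log /var/log/syslog
```

No matches does not prove the hardware is healthy, so it is still worth reading through the kernel log and /home/deven/test.log by hand for failed copies.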


I hope you found this article useful. If you have other ideas or comments, feel free to add them below!

