Recent problems with a server at my job, plus a few issues I’ve seen lately, gave me an idea for a new Common Issues post. This one covers a common problem: the main drive being full, along with various mounting issues. As always, this email comes with all the disclaimers previously mentioned, as well as the big one below:
Issue (Note: This only applies to our shared and reseller servers so we can assume they have a separate backup drive):
Client contacts us saying that they cannot receive email, their site is down, and FTP is throwing errors. A good first check is the server’s disk usage, to make sure no drive is full. Use the command “df -h”, which returns output similar to the following:
[email protected] [~]# df -h
Filesystem                     Size  Used  Avail  Use%  Mounted on
/dev/hda5                      104G  11G   88G    11%   /
/dev/hda1                      99M   32M   63M    34%   /boot
/dev/hda2                      2.9G  80M   2.7G   3%    /tmp
/dev/mapper/nvidia_dbechcfep1  917G  27G   844G   4%    /home
/dev/mapper/nvidia_dbechcfep2  917G  72G   799G   9%    /backup
This output can be interpreted as follows:
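- Filesystem – The block device that holds the partition. A traditional device name can be read as follows:
- – /dev/ (Device directory) h (Drive interface; h = IDE/PATA, s = SCSI/SATA) d (Disk) a (Drive letter; a, b, c, etc., depending on the number of drives) 1 (Partition number; 1, 2, 3, etc., depending on the number of partitions). Names under /dev/mapper/, like the /home and /backup entries above, are device-mapper volumes and do not follow this scheme.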
- Size – Total size of the partition
- Used – Total amount of disk space used on the partition
- Avail – Total amount of disk space available on the partition
- Use% – Percentage of the partition’s space that is in use
- Mounted on – Location in the filesystem where the partition is mounted
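The Use% column can also be read from a script, for example to raise a warning before the drive fills completely. The following is only a minimal sketch: it assumes the POSIX “df -P” output format (one line per filesystem), and the function name usage_pct and the threshold of 90 are hypothetical choices, not anything set up on our servers.

```shell
#!/bin/sh
# usage_pct MOUNTPOINT: read `df -P` output on stdin and print the Use%
# figure (without the % sign) for the given mount point.
# Hypothetical helper; assumes mount points contain no spaces.
usage_pct() {
    awk -v mp="$1" 'NR > 1 && $6 == mp { sub(/%/, "", $5); print $5 }'
}

# Hypothetical warning threshold, in percent.
threshold=90

pct=$(df -P / | usage_pct /)
if [ -n "$pct" ] && [ "$pct" -ge "$threshold" ]; then
    echo "WARNING: / is ${pct}% full"
fi
```

Passing the df output in on stdin keeps the parsing separate from the df call, so the same function can be pointed at any mount point in the listing.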
If you see a 0 in Avail or 100% in Use% next to the drive mounted on /, then you have found your issue. Now to solve it. The first thing to check is backups: a common cause is that the backup drive was not mounted when backups ran, so they were written to the main drive instead. First, check whether there is a backup drive on the server using the following command:
grep backup /etc/fstab
If this command prints anything at all, as in the example below, then there is a backup drive:
[email protected] [~]# grep backup /etc/fstab
/dev/mapper/nvidia_dbechcfep2 /backup ext3 defaults,noauto 0 0
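Both questions, whether /backup is listed in /etc/fstab and whether it is mounted right now, can be answered with a short script. This is a rough sketch under a few assumptions: the mount point is /backup as in the example above, the kernel mount table is readable at /proc/mounts, and the helper names in_fstab and is_mounted are made up for illustration.

```shell
#!/bin/sh
# in_fstab MOUNTPOINT [FILE]: succeed if FILE (default /etc/fstab) has a
# non-comment entry whose second field is exactly MOUNTPOINT.
in_fstab() {
    awk -v mp="$1" '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "${2:-/etc/fstab}"
}

# is_mounted MOUNTPOINT [FILE]: the same check against the kernel's mount
# table (default /proc/mounts).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

if in_fstab /backup; then
    if is_mounted /backup; then
        echo "/backup is configured and mounted"
    else
        echo "/backup is configured but not mounted"
    fi
else
    echo "no /backup entry in /etc/fstab"
fi
```

Matching on the mount-point field, rather than on any line containing the word backup, avoids false hits from comments or from paths such as /home/backup.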
Also, check whether the backup drive is currently mounted (if it is, it will appear in the “df -h” output). If it is, go ahead and unmount it so you can get a better look at the / filesystem. You can do this with the ‘umount’ command, as below:
umount /backup
If this reports that the device is busy (for example, “umount: /backup: device is busy”), then you get to the more complex part: finding out why it’s busy. To do this, we will use the ‘lsof’ command, run against a block device as below:
[email protected] [~]# lsof /dev/hda2
COMMAND PID   USER    FD   TYPE  DEVICE  SIZE   NODE    NAME
mysqld  5294  mysql   4u   REG   3,2     0      13      /tmp/ibb2mUbP (deleted)
mysqld  5294  mysql   5u   REG   3,2     69     14      /tmp/ibWiEoN3 (deleted)
mysqld  5294  mysql   6u   REG   3,2     0      15      /tmp/ib0O7Soi (deleted)
mysqld  5294  mysql   7u   REG   3,2     0      16      /tmp/ibuN6K0w (deleted)
mysqld  5294  mysql   11u  REG   3,2     0      17      /tmp/ibxm1GDL (deleted)
jsvc    6106  tomcat  mem  REG   3,2     32768  351650  /tmp/hsperfdata_tomcat/6106
cpdavd  8729  root    0r   REG   3,2     16671  72      /tmp/sh-thd-1238113224 (deleted)
As you can see, in my example I used my /tmp/ partition, and lsof told me that three separate processes are using it: MySQL (mysqld), jsvc (the Java service launcher, here running Tomcat), and cpdavd (a cPanel daemon). In order to unmount this partition successfully, I will need to end these processes. Where stop scripts are available, use them, such as the following in my case:
/etc/init.d/mysql stop
/etc/init.d/jsvc stop
/etc/init.d/cpanel stop
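When the lsof listing is long, it helps to boil it down to one line per process before deciding what to stop. A minimal sketch, assuming the column layout shown in the lsof example above (COMMAND first, PID second); the function name busy_processes is hypothetical.

```shell
#!/bin/sh
# busy_processes: read lsof output on stdin and print each unique
# "COMMAND PID" pair once, skipping the header line.
busy_processes() {
    awk 'NR > 1 { print $1, $2 }' | sort -u
}

# Typical use (needs lsof installed and usually root):
#   lsof /dev/hda2 | busy_processes
```

Fed the example listing above, this would reduce the seven lines to three: one each for cpdavd, jsvc, and mysqld.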
Once these processes are stopped, you should be able to unmount. If a process doesn’t have a stop script, or if it is cpbackup or a similar script that is running, you will need to kill it with the ‘kill’ command. Find the PID of the process in the lsof output and use it as follows:
kill -9 5294
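Because kill -9 gives a process no chance to clean up after itself, a gentler variant is to send the default SIGTERM first and only escalate if the process is still there after a short wait. This is just a sketch of that idea, not a procedure we have standardized on; the function name stop_pid and the 10-second default are assumptions.

```shell
#!/bin/sh
# stop_pid PID [SECONDS]: ask a process to exit with SIGTERM, wait up to
# SECONDS (default 10) for it to disappear, then fall back to SIGKILL.
# Hypothetical helper for illustration.
stop_pid() {
    target=$1
    tries=${2:-10}

    # If the signal cannot be delivered, the process is already gone
    # (or is not ours to signal); nothing more to do.
    kill -TERM "$target" 2>/dev/null || return 0

    # kill -0 sends no signal; it only tests whether the PID still exists.
    while [ "$tries" -gt 0 ] && kill -0 "$target" 2>/dev/null; do
        sleep 1
        tries=$((tries - 1))
    done

    # Still present after the grace period: force it.
    if kill -0 "$target" 2>/dev/null; then
        kill -KILL "$target" 2>/dev/null
    fi
    return 0
}

# Typical use, with the PID taken from the lsof output:
#   stop_pid 5294 15
```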
In my example I killed the MySQL process running on the server with PID 5294. Obviously, this number will differ depending on the server, time, etc. Once lsof shows no output, you should be able to unmount and proceed with cleaning the main drive. This can be done by removing the backups on the main drive with rm -rf; double-check the path, and that the backup drive really is unmounted, before running it. Once the rm has been running for a few minutes, open a second SSH session and start restarting the services that had been failing due to lack of disk space, such as cPanel, MySQL, Exim, and HTTPD. Once the rm is complete, remount the /backup/ drive.
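Before the rm -rf step described above, it can be worth confirming how much space the leftover backups actually occupy on the main drive. A small sketch, assuming the leftovers sit under /backup once the backup drive is unmounted; the helper name dir_usage_kb is hypothetical.

```shell
#!/bin/sh
# dir_usage_kb DIR: print the disk usage of DIR in kilobytes. The -x flag
# keeps du on DIR's own filesystem, so anything mounted below it is not
# counted by mistake.
dir_usage_kb() {
    du -skx "$1" 2>/dev/null | awk '{ print $1 }'
}

# Typical use, with the backup drive unmounted:
#   dir_usage_kb /backup
```

If the figure is small, the backups are probably not what filled the drive, and the other causes discussed below are worth checking first.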
Please note: The drive mounting and unmounting portion of this email applies to more cases than just a full drive. Sometimes /tmp/ has an issue that unmounting and remounting can solve, and a process similar to the one above would be required.
Also, note that this is just one solution for a full main drive. Though misplaced backups are the most common cause, they are not the only one. Core dumps from PHP can quickly fill a drive, as can just plain usage. For core dumps, you can use a command such as the one below to locate them for removal:
find /home/ -iname "core.*"
This will locate all core dumps in user directories so you can delete them and inform the client of their broken script. Also, if the drive is, as far as you can see, legitimately full (backups are on the backup drive and there are no core dumps), please do not hesitate to email [email protected] and CC the office and the NOC about the issue, so that Abuse can check for abusive users to remove or the NOC can look into restoring the server as soon as possible.
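For the core-dump search above, note that a pattern of core.* also matches ordinary files such as core.php. A slightly stricter sketch is shown here; it assumes core dumps are named core.<PID>, which depends on the server’s kernel settings, and the function name find_core_dumps is hypothetical.

```shell
#!/bin/sh
# find_core_dumps DIR: list regular files under DIR whose name is "core."
# followed by a digit (e.g. core.12345), without crossing onto other
# mounted filesystems (-xdev).
find_core_dumps() {
    find "$1" -xdev -type f -name 'core.[0-9]*' 2>/dev/null
}

# Typical use:
#   find_core_dumps /home
```

Review the list before deleting anything, since a user could still have a legitimate file that happens to match.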
As with all of my Common Issues posts, please do not hesitate to ask any questions about this email by replying all to it, so that everyone can benefit from the question and answer. If you are unsure about anything in this email, do not hesitate to ask me or any other senior staff about the issue and possible resolutions.
Thanks for reading and have a great day!