Common Issues – Main Drive Full (Mounting Issues)

Recent problems with a server at my job, along with a few similar issues I’ve seen lately, gave me an idea for a new Common Issues post. This one covers the main drive being full and the mounting issues that tend to come with it. As always, this email comes with all the disclaimers previously mentioned, as well as the big one below:

Issue (Note: This only applies to our shared and reseller servers so we can assume they have a separate backup drive):

A client contacts us saying that they cannot receive email, their site is down, and FTP is throwing errors. A good first check is the disk usage of the server, to make sure that no partition is full. This can be done with the command “df -h”, which returns output similar to the following:

root@server [~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda5             104G   11G   88G  11% /
/dev/hda1              99M   32M   63M  34% /boot
/dev/hda2             2.9G   80M  2.7G   3% /tmp
/dev/mapper/nvidia_dbechcfep1     917G   27G  844G   4% /home
/dev/mapper/nvidia_dbechcfep2     917G   72G  799G   9% /backup

This output can be interpreted as follows:

  • Filesystem – The block device of the partition in question. For a name such as /dev/hda1, this can be read as follows:
    • – /dev/ (device directory) h (drive interface; hd = IDE/ATA, sd = SCSI or SATA) d (disk) a (drive letter; can be a, b, c, etc., depending on the number of drives) 1 (partition number; can be 1, 2, 3, etc., depending on the number of partitions)
    • – Names under /dev/mapper/ are device-mapper volumes (RAID or LVM, for example) and do not follow this pattern
  • Size – Total size of the partition
  • Used – Total amount of disk space used on the partition
  • Avail – Total amount of disk space available on the partition
  • Use% – Disk space used divided by total disk space, as a percentage
  • Mounted On – Location where the partition is mounted on the filesystem
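
To make the reading of these columns concrete, here is a minimal sketch that picks out the partitions at or above a usage threshold. The function name full_filesystems and the default threshold of 90 are my own assumptions for illustration, not part of any existing tool. It expects df output with one line per filesystem, which is why the usage example passes -P (without it, long device names such as the /dev/mapper ones can wrap onto a second line).

```shell
# Print every filesystem whose Use% is at or above a threshold.
# Reads df output on stdin; the threshold argument defaults to 90 (an assumption).
full_filesystems() {
    awk -v limit="${1:-90}" '
        NR == 1 { next }          # skip the header row
        {
            pct = $5              # Use% column, e.g. "100%"
            sub(/%/, "", pct)     # strip the percent sign
            if (pct + 0 >= limit)
                printf "%s is %s%% full (%s available)\n", $6, pct, $4
        }'
}

# Typical use on a live server:
#   df -hP | full_filesystems 95
```

A line for / in the result is the situation this email is about.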

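If you see 0 in Avail or 100% in Use% for the partition mounted on /, then you have found your issue. Now to solve it. The first thing to check is backups: if the backup drive was not mounted when backups ran, they will usually have been written to the /backup directory on the main drive instead. First, check if there is a backup drive on the server using the following command: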

grep backup /etc/fstab

If this command outputs anything at all, as in the example below, then there is a backup drive:

root@server [~]# grep backup /etc/fstab
/dev/mapper/nvidia_dbechcfep2        /backup                  ext3    defaults,noauto        0 0
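
A plain grep for “backup” will also match commented-out lines or unrelated paths that happen to contain the word. If that ever causes confusion, a stricter check could look something like the sketch below. The function name fstab_entry is hypothetical, and the optional second argument exists only so the function can be pointed at a copy of the file.

```shell
# Print the fstab line whose mount point (second field) matches exactly,
# ignoring commented-out lines.
# Arguments: mount point, then the fstab path (defaults to /etc/fstab).
fstab_entry() {
    awk -v mp="$1" '$1 !~ /^#/ && $2 == mp' "${2:-/etc/fstab}"
}

# Typical use:
#   if [ -n "$(fstab_entry /backup)" ]; then
#       echo "backup drive is configured"
#   fi
```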

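Also, check whether the backup drive is currently mounted (it will show up in the df output, as /backup does in my example). If it is, go ahead and unmount it so you can get a better look at the / filesystem: anything stored in the /backup directory on the main drive is hidden while the backup drive is mounted on top of it. You can do this by using the ‘umount’ command as below: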

umount /backup

If this reports that the device is busy (for example “umount: /backup: device is busy”), then you get to the more complex part: finding out why it’s busy. To do this, we will use the ‘lsof’ command. This command can be run on a block device, as below:

root@server [~]# lsof /dev/hda2
mysqld  5294  mysql    4u   REG    3,2     0     13 /tmp/ibb2mUbP (deleted)
mysqld  5294  mysql    5u   REG    3,2    69     14 /tmp/ibWiEoN3 (deleted)
mysqld  5294  mysql    6u   REG    3,2     0     15 /tmp/ib0O7Soi (deleted)
mysqld  5294  mysql    7u   REG    3,2     0     16 /tmp/ibuN6K0w (deleted)
mysqld  5294  mysql   11u   REG    3,2     0     17 /tmp/ibxm1GDL (deleted)
jsvc    6106 tomcat  mem    REG    3,2 32768 351650 /tmp/hsperfdata_tomcat/6106
cpdavd  8729   root    0r   REG    3,2 16671     72 /tmp/sh-thd-1238113224 (deleted)

As you can see, in my example I used my /tmp partition, and the lsof command told me that three distinct processes are using it: mysqld (MySQL), jsvc (the Apache Commons Daemon launcher, running Tomcat here), and cpdavd (cPanel’s WebDAV daemon). In order to unmount this partition successfully, I will need to end these processes. When a service has a stop script, use it, such as the following in my case:

/etc/init.d/mysql stop
/etc/init.d/jsvc stop
/etc/init.d/cpanel stop
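
When lsof returns a long list, it can help to boil it down to one line per process before deciding what to stop. The following is only a sketch; the function name open_file_holders is my own, and it simply counts lines per command/PID pair in whatever lsof output it is given.

```shell
# Reduce lsof output (read from stdin) to one line per command/PID pair,
# followed by the number of open files that process holds.
# A header row, if present, is skipped because its second field is not numeric.
open_file_holders() {
    awk '$2 ~ /^[0-9]+$/ { count[$1 " " $2]++ }
         END { for (key in count) print key, count[key] }' | sort
}

# Typical use:
#   lsof /dev/hda2 | open_file_holders
```

Run against the lsof output from my example, this would list cpdavd, jsvc, and mysqld once each, with mysqld holding five files.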

Once these processes are stopped, you should be able to unmount. If the process doesn’t have a stop script, or if it is cpbackup or a similar script, you will need to end it with the ‘kill’ command. Keep in mind that -9 ends the process immediately with no chance to clean up, so try a plain ‘kill’ first where you can. Find the PID of the process in the output of lsof (the second column) and use it as follows:

kill -9 5294

In my example I killed the MySQL process that was running on the server with PID 5294. Obviously, this number will differ depending on the server, time, etc. Once lsof shows no output, you should be able to unmount and proceed with cleaning the main drive. With the backup drive unmounted, anything still under /backup is sitting on the main drive, so those stray backups can simply be removed with rm -rf. Once the rm has been running for a few minutes, open a second SSH session and begin restarting the services that had been failing for lack of disk space, such as cPanel, MySQL, Exim, and HTTPD. Once the rm is complete, go ahead and remount the /backup drive.
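
Since kill -9 gives a process no chance to shut down cleanly, one gentler approach is to send the default signal first and escalate only if the process is still there after a short wait. This is a sketch rather than an established procedure; the function name stop_pid and the 10-second default are assumptions.

```shell
# Ask a process to exit with SIGTERM, wait up to a timeout (in seconds),
# then fall back to SIGKILL. The 10-second default is an assumption.
stop_pid() {
    sp_pid="$1"
    sp_wait="${2:-10}"
    # Fails if there is no such process or we lack permission to signal it.
    kill -TERM "$sp_pid" 2>/dev/null || return 1
    while [ "$sp_wait" -gt 0 ]; do
        # kill -0 sends no signal; it only tests whether the PID still exists.
        kill -0 "$sp_pid" 2>/dev/null || return 0
        sleep 1
        sp_wait=$((sp_wait - 1))
    done
    # Last resort, equivalent to kill -9.
    kill -KILL "$sp_pid" 2>/dev/null || true
    return 0
}

# Typical use, with the PID taken from the lsof output:
#   stop_pid 5294 30
```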

Please note: The drive mounting and unmounting portion of this email applies to more cases than just the drive being full. Sometimes /tmp has an issue that unmounting and remounting can solve, and a process similar to the one above would be required.
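
Because removing files under /backup is only safe while the backup drive is unmounted, it may be worth confirming the mount state right before running rm. A minimal sketch follows, assuming a Linux server where /proc/mounts is available; the function name is_mounted is hypothetical, and the optional second argument exists only so the check can be run against a saved copy of the mount table.

```shell
# Succeed (exit 0) if the given path is currently listed as a mount point.
# Arguments: mount point without a trailing slash, then the mount table
# to read (defaults to /proc/mounts).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Typical use before cleaning up:
#   if is_mounted /backup; then
#       echo "/backup is still mounted; not deleting anything" >&2
#   else
#       du -sh /backup    # size of the stray backups sitting on /
#   fi
```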

Also note that this is just one solution for a main drive being full. Though this is the most common cause, it is not the only one. Other things, such as core dumps from PHP or just plain usage, can also fill a drive quickly. For core dumps, you can use a command such as the one below to locate them for removal:

find /home/ -type f -name "core.[0-9]*"

This will locate all core dumps in user directories so you can delete them and inform the client of their broken script. Also, if the drive is, as far as you can see, legitimately full (backups are on the backup drive and there are no core dumps), please do not hesitate to email and CC the office and the NOC about the issue so Abuse can check for abusive users to remove or the NOC can look into restoring the server as soon as possible.
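
To see which core dumps are actually worth chasing, the search can be combined with du so the largest files come first. This is a sketch under one assumption worth stating loudly: it treats files named core.<digits> as core dumps, which depends on how the kernel on the server is configured to name them. The function name list_core_dumps is my own.

```shell
# List likely core dumps under a directory with their sizes in KiB,
# largest first. Matching "core.<digits>" is an assumption about how
# dumps are named on the server.
list_core_dumps() {
    find "${1:-/home}" -type f -name 'core.[0-9]*' -exec du -k {} + | sort -rn
}

# Typical use:
#   list_core_dumps /home | head -20
```

Restricting the match to a numeric suffix keeps ordinary files such as core.php out of the list, which matters if the results are going to be deleted.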

As with all of my Common Issues posts, please do not hesitate to ask any questions about this email by replying all to it, so that everyone can benefit from the question and answer. If you are unsure about anything in this email, do not hesitate to ask me or any other senior staff about the issue and possible resolutions.

Thanks for reading and have a great day!