Discussion:
nfs error: Device or resource busy
Rahul
2008-12-06 00:19:06 UTC
On our cluster I found several huge logs full of messages of this sort:

rm: cannot remove `backnc-Iter54/.nfs0000000005e7001100000381': Device or
resource busy

I can see that this is some sort of NFS malfunction, but I can't seem to see
where or what's going wrong. Furthermore, this doesn't happen all the time.
Are there any guidelines about where I should be sniffing to get to the
root of these troubles? They happen only occasionally, and so it's hard to
find what exactly causes them.

Have other people noticed similar errors?
--
Rahul
Gnack Nol
2008-12-06 01:55:31 UTC
Post by Rahul
rm: cannot remove `backnc-Iter54/.nfs0000000005e7001100000381': Device or
resource busy
I can see that this is some sort of NFS malfunction, but I can't seem to
see where or what's going wrong. Furthermore, this doesn't happen all the
time. Are there any guidelines about where I should be sniffing to get to
the root of these troubles? They happen only occasionally, and so it's hard
to find what exactly causes them.
Have other people noticed similar errors?
I use USB flash drives and hard drives regularly, and trying to remove a
drive you have written to without first unmounting it and waiting for the
system to finish writing to it usually results in this kind of error.
So I would guess you are trying to remove an external drive / flash unit that
you have recently written to without unmounting it and waiting for the
write-behind cache to finish writing to it.

Linux always uses write-behind to record the changes to drives and must be
allowed to flush the changes to disc on exit, or you risk disc damage due
to incorrect directory FAT information.

Gnack
The Natural Philosopher
2008-12-06 11:19:41 UTC
Post by Gnack Nol
Post by Rahul
rm: cannot remove `backnc-Iter54/.nfs0000000005e7001100000381': Device or
resource busy
I can see that this is some sort of NFS malfunction, but I can't seem to
see where or what's going wrong. Furthermore, this doesn't happen all the
time. Are there any guidelines about where I should be sniffing to get to
the root of these troubles? They happen only occasionally, and so it's hard
to find what exactly causes them.
Have other people noticed similar errors?
I use USB flash drives and hard drives regularly, and trying to remove a
drive you have written to without first unmounting it and waiting for the
system to finish writing to it usually results in this kind of error.
So I would guess you are trying to remove an external drive / flash unit that
you have recently written to without unmounting it and waiting for the
write-behind cache to finish writing to it.
Linux always
by default, but can be configured otherwise
Post by Gnack Nol
uses write-behind to record the changes to drives and must be
allowed to flush the changes to disc on exit, or you risk disc damage due
to incorrect directory FAT information.
sync; umount <disk>
works well.
Post by Gnack Nol
Gnack
Sam
2008-12-06 02:14:48 UTC
Post by Rahul
rm: cannot remove `backnc-Iter54/.nfs0000000005e7001100000381': Device or
resource busy
I can see that this is some sort of NFS malfunction, but I can't seem to see
where or what's going wrong.
This is not a malfunction. It occurs when a deleted file is still held open
by some process, and it's an artifact of how NFS works behind the scenes. The
Linux kernel handles this easily with local disk files: the inode still
remains even after it's unlinked from all directories, and the inode gets
freed when the last process that has the file open closes it or terminates.
However, this does not work with NFS. With NFSv2/v3 the server has no idea
which files a client process still has open, so actually removing the file
on the server would pull it out from under that process. Instead, the NFS
client renames the file to a hidden .nfsXXXX directory entry, which is
automatically removed when whatever process has this file open closes it or
terminates.
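The local-disk half of that explanation can be seen directly with a short sketch (the function name `unlinked_but_open` is just an illustration). It opens a file, removes it, and then still reads it through the open descriptor. Assuming a typical Linux NFS client, running it with `TMPDIR` pointing at an NFS mount should instead leave a `.nfsXXXX` entry in the directory until descriptor 3 is closed, so the "no directory entry left" line would not be printed.

```sh
#!/bin/sh
# Local-disk behaviour: an unlinked file stays readable through a
# descriptor that was opened before the rm.
unlinked_but_open() {
    dir=$(mktemp -d)                 # honours $TMPDIR if it is set
    printf 'still here\n' > "$dir/data"
    exec 3< "$dir/data"              # hold the file open on fd 3
    rm "$dir/data"                   # remove its only directory entry
    if [ -z "$(ls -A "$dir")" ]; then
        echo "no directory entry left"
    fi
    cat <&3                          # the inode is still alive, so this works
    exec 3<&-                        # last reference closed; space freed now
    rmdir "$dir"
}

unlinked_but_open
```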
Post by Rahul
Furthermore, this doesn't happen all the time.
Are there any guidelines about where I should be sniffing to get to the
root of these troubles? They happen only occasionally, and so it's hard to
find what exactly causes them.
You can use the lsof command to find out which process has this file open,
and kill this process. But this works only if the process that has this file
open is on the same machine. If the file is held open by a process on another
machine, you have to figure out which machine it is and run lsof on that
machine.
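`lsof /path/to/.nfsXXXX` (or `fuser -v` on the same path) is the direct way to do this. Where neither tool is installed, a rough Linux-only stand-in is to walk `/proc/PID/fd` and look for links pointing at `.nfs*` files. The sketch below assumes the directory is given as an absolute, symlink-free path, because that is the form the `/proc` links contain. The function name `nfs_holders` is hypothetical, and like lsof it only sees processes on the machine where it runs.

```sh
#!/bin/sh
# Rough stand-in for lsof on Linux: print "PID path" for every process on
# THIS machine that holds open a file named .nfs* somewhere under DIR.
# DIR must be absolute and symlink-free (compare: cd DIR && pwd -P).
nfs_holders() {
    dir=$1
    for fd in /proc/[0-9]*/fd/*; do
        # Skip descriptors that vanished or that we may not inspect.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            "$dir"/.nfs*|"$dir"/*/.nfs*)
                pid=${fd#/proc/}       # strip the leading /proc/
                pid=${pid%%/*}         # keep only the PID component
                printf '%s %s\n' "$pid" "$target"
                ;;
        esac
    done
}
```

For example, `nfs_holders "$(cd backnc-Iter54 && pwd -P)"` followed by `ps -p <PID> -o pid,user,args` on each reported PID shows what is holding the file. On a cluster the same function would have to be run on each node, for instance over ssh.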
Rahul
2008-12-08 17:43:21 UTC
Post by Sam
You can use the lsof command to find out which process has this file
open, and kill this process. But this works only if the process that
has this file open is on the same machine. If the file is held open by a
process on another machine, you have to figure out which machine it is
and run lsof on that machine.
Thanks! That makes sense.

But my situation is trickier. This occurs with a batch job run on my system.
There are several machines running this job in parallel, all accessing their
files mounted via NFS from a central server. I'm not really sure how to handle
it, since the file could have remained open on any one machine. Eventually I
discover it when the logs grow so large that the annoying NFS error messages
fill the partition to 100%.

Going in a different direction, is there a way to disable this logging from
NFS? NFS seems to echo a line every second or so when this happens, and that
is really bloating my logs!
--
Rahul
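The thread does not answer whether NFS itself can be told to stop logging. If, as the `rm: cannot remove ...` line at the top of the thread suggests, the repeated messages actually come from the batch job's own cleanup step retrying `rm`, one option is to retry quietly with a growing delay and log only the first failure and the final outcome. The sketch below assumes that setup; the function name `remove_when_free` and its default of 5 attempts are hypothetical. It does not fix the underlying open file: if the holding process never closes it, the function simply gives up.

```sh
#!/bin/sh
# Hypothetical cleanup helper: retry removing a work directory that may
# still contain .nfsXXXX placeholders, logging once instead of per attempt.
remove_when_free() {
    target=$1
    tries=${2:-5}      # number of attempts (assumed default)
    delay=1            # seconds; doubled after each failed attempt
    logged=
    while [ "$tries" -gt 0 ]; do
        rm -rf -- "$target" 2>/dev/null
        [ ! -e "$target" ] && return 0
        if [ -z "$logged" ]; then
            echo "remove_when_free: $target still busy, retrying" >&2
            logged=1
        fi
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep "$delay"
        delay=$((delay * 2))
    done
    echo "remove_when_free: giving up on $target" >&2
    return 1
}
```

With the default of 5 attempts, the function waits at most 1 + 2 + 4 + 8 = 15 seconds and writes at most two lines to the log per directory.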