NFS server flakiness

18 March 2023, 19:25:09

I've got Artix, running s6, on two laptops. I'm trying to mount an NFS share from one on the other. When i start the nfs server:

Quote

sudo s6-rc -u change nfs-server

everything looks good except there are no 'nfsd' processes running and /proc/fs/nfsd isn't mounted. So, obviously, on the client side, i'm just getting "mount.nfs: Connection refused". Looking at what's running:

Quote

lap76:[craig]:~$ show nfs
root    261    1 0 Mar17 ?    00:00:00 s6-supervise nfs-server-log
root    301    1 0 Mar17 ?    00:00:00 s6-supervise nfs-server-srv
root 1169    2 0 Mar17 ? 00:00:00 [nfsiod]
s6log 20413   261 0 11:44 ? 00:00:00 s6-log -d3 -b -- n3 s2000000 T /var/log/nfs-server
lap76:[craig]:~$ show rpc
root    249    1 0 Mar17 ?    00:00:00 s6-supervise rpcbind-srv
root    303    1 0 Mar17 ?    00:00:00 s6-supervise rpcbind-log
s6log    1124   303 0 Mar17 ?   00:00:00 s6-log -d3 -b -- n3 s2000000 T /var/log/rpcbind
rpc    1142   249 0 Mar17 ? 00:00:00 rpcbind -f
rpcuser   1147   306 0 Mar17 ? 00:00:00 rpc.statd -F -d
root 1158    2 0 Mar17 ? 00:00:00 [rpciod]
root    20419   301 0 11:44 ? 00:00:00 rpc.mountd --foreground

Looking at /etc/s6/sv/nfs-server-srv/run, i notice that one process i'm not seeing in the above is 'rpc.nfsd'.
If i just run that, as root, /proc/fs/nfsd is mounted:

Quote

lap76:[craig]:~$ mount|grep nfs
nfsd on /proc/fs/nfsd type nfsd (rw,relatime)

8 [nfsd] processes are being run by root, and the client mount then works. At this point, rpc.nfsd still isn't running, though, so i guess it doesn't hang around after starting up the nfsd processes.
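
The mount check described above can be scripted. This is only a minimal sketch, assuming the usual Linux /proc/mounts layout; the function name nfsd_mounted and its optional file argument are made up for illustration (the argument exists only so the check can be tried against a sample file).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the nfsd filesystem is mounted on
# /proc/fs/nfsd. Reads /proc/mounts unless another file is given.
nfsd_mounted() {
    mounts=${1:-/proc/mounts}
    # /proc/mounts fields: device mountpoint fstype options dump pass
    awk '$2 == "/proc/fs/nfsd" && $3 == "nfsd" { found = 1 }
         END { exit !found }' "$mounts"
}

if nfsd_mounted; then
    echo "nfsd filesystem is mounted"
else
    echo "nfsd filesystem is not mounted; rpc.nfsd probably never ran"
fi
```

Running this right after bringing nfs-server up shows whether the run script got as far as a working rpc.nfsd call.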

Throughout all this -- and i tried many things -- /var/log/nfs-server/current was always empty. The only logging i saw was that when i started the nfs server, a message (from memory) about neither subtree_check nor no_subtree_check being specified in /etc/exports appeared in /run/uncaught-logs/current (which doesn't happen anymore, after i specified no_subtree_check) and when i stop the nfs server, i see this in the same logfile:

Quote

@400000006415e804280004fb umount: /proc/fs/nfsd: not mounted.
@400000006415e80428159800 foreground: warning: unable to spawn umount-l: No such file or directory

Which makes sense, as /proc/fs/nfsd isn't being mounted (nor nfsd processes being run), unless i run rpc.nfsd manually.

Anyone have any idea what is going wrong?

Re: NFS server flakiness

Reply #1 – 19 March 2023, 20:03:27

Quote from: craigduncan – on 18 March 2023, 19:25:09

Code:
@400000006415e80428159800 foreground: warning: unable to spawn umount-l: No such file or directory
Which makes sense, as /proc/fs/nfsd isn't being mounted (nor nfsd processes being run), unless i run rpc.nfsd manually.

Anyone have any idea what is going wrong?

Haven't used NFS for 2 decades, but that 'umount-l' should be 'umount -l', notice the space. Or I don't know what I'm talking about.

Re: NFS server flakiness

Reply #2 – 20 March 2023, 04:22:47

I assume that is just a typo in the error output. I'd check the source code if i knew where it was. I don't believe it has anything to do with the problem, though. If you haven't used NFS for decades, is this for lack of wanting to share filesystems or you know of a better way? I'd take an alternative that works properly, but... all i know of is samba, and i don't see that as better.

Re: NFS server flakiness

Reply #3 – 20 March 2023, 12:07:36

Can you get it to work entirely manually?
I.e. no services, just in a terminal.

Re: NFS server flakiness

Reply #4 – 20 March 2023, 12:29:51

If, after starting the server (sudo s6-rc -u change nfs-server) i run:

Code:

~$ sudo rpc.nfsd
rpc.nfsd: knfsd is currently down
rpc.nfsd: Writing version string to kernel: -2 +3
rpc.nfsd: Created AF_INET TCP socket.
rpc.nfsd: Created AF_INET6 TCP socket.
~$

everything then works fine. As i said, nfsd is now mounted on /proc/fs/nfsd and 8 nfsd processes are running.
If i just bring the nfs-server service down and back up again, nfsd is not mounted and there are no nfsd processes running.

Code:

~$ cat /etc/s6/sv/nfs-server-srv/run
#!/bin/execlineb -P
foreground { modprobe sunrpc }
foreground { modprobe nfs }
foreground { modprobe nfsd }
foreground { mountpoint -q /var/lib/nfs/rpc_pipefs }
redirfd -w 1 /dev/null
foreground { exportfs -ra }
foreground { rpc.nfsd -- = 4 }
fdmove -c 2 1
exec rpc.mountd --foreground
~$

When i start nfs-server, rpc.mountd is running, when i stop it, it's not. But it doesn't appear that the above is running rpc.nfsd.
I assume the above runs things (like rpc.nfsd) as root. I don't know what the "=4" is about on that line.

Re: NFS server flakiness

Reply #5 – 20 March 2023, 12:39:18

My apologies, you had already stated what I asked.
If it's a problem with the S6 service that's beyond me. Very little experience.

Re: NFS server flakiness

Reply #6 – 20 March 2023, 15:04:38

Quote from: craigduncan – on 20 March 2023, 12:29:51

I assume the above runs things (like rpc.nfsd) as root. I don't know what the "=4" is about on that line.

Uh yeah, that's just a typo that no one ever noticed until now. It should have been just "rpc.nfsd -- 4" (copied from the runit script). I pushed a new version of nfs-utils-s6 to testing so hopefully that does the trick. The daemon appears to run over here at least.
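
To catch a stray argument like that "=" before it ever reaches rpc.nfsd, the thread count could be validated first. A minimal sketch, written in shell rather than execline; is_thread_count is a hypothetical helper, not something from nfs-utils or the Artix service scripts.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if $1 is a non-empty string of digits.
is_thread_count() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Try it on the intended argument and on the typo.
for arg in 4 '='; do
    if is_thread_count "$arg"; then
        echo "'$arg' is usable as a thread count"
    else
        echo "'$arg' is not a number"
    fi
done
```

A guard like this only helps in a wrapper script; in the execline run script the simpler fix is just to drop the "=".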

Re: NFS server flakiness

Reply #7 – 20 March 2023, 17:11:16

Quote from: craigduncan – on 20 March 2023, 04:22:47

If you haven't used NFS for decades, is this for lack of wanting to share filesystems or you know of a better way? I'd take an alternative that works properly, but... all i know of is samba, and i don't see that as better.

I've found samba to be enough for my home media sharing needs, but I'm not saying it's any better or easier to set up. NFS is pretty straightforward anyway. I also tend to use sshfs a lot, which is dead simple and only requires sshd on the serving side.

Re: NFS server flakiness

Reply #8 – 20 March 2023, 21:44:27

Quote

...that's just a typo that no one ever noticed until now. It should have been just "rpc.nfsd -- 4"

I changed that bit of code in the script to just "4" as the only arg, and then *no* args. Either way, rpc.nfsd still didn't appear to run. What i don't understand (among many things) is why there is no log output. I've been reading up on execline to try to figure out how to get some logging output. Before the call to rpc.nfsd (in the script) stdout is redirected to /dev/null ("redirfd -w 1 /dev/null"), *immediately* after which "exportfs -ra" is called, then "rpc.nfsd".
I'm not conversant enough with execline, though, to know how this redirection affects the running of rpc.nfsd. I haven't entirely wrapped my mind around how execline handles the chaining of execution in this regard.

Tangentially, as nous pointed out "umount-l" is wrong. I'd thought that was in relation to trying to unmount nfsd. But then i noticed what looks like another typo -- this in /etc/s6/sv/nfs-server-srv/finish -- which seems to be responsible for this. Note:

Code:

foreground { umount-l /var/lib/nfs/rpc_pipefs }

Just fmi, how do you get a quote into the post that is prefaced by something like "Quote from: craigduncan...",
as i see many of the quotes above have done?

Also, i'm not familiar with sshfs. Thanks. I'll have to check that out.

Re: NFS server flakiness

Reply #9 – 20 March 2023, 21:55:49

Quote from: craigduncan – on 20 March 2023, 21:44:27

Just fmi, how do you get a quote into the post that is prefaced by something like "Quote from: craigduncan...",
as i see many of the quotes above have done?

Quote button bottom right of a post.

Quote

/dev/null

Semi no idea what I'm talking about, but can you not just temporarily edit that to point at a file?

Quote

/tmp/nfslog.txt

See if there's anything useful written into there?
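
In shell terms (the run script itself is execline, so this is only an analogy), sending both output streams of a command to a file looks roughly like this. run_logged is a made-up helper name, and the demonstration uses a harmless command rather than rpc.nfsd.

```shell
#!/bin/sh
# Hypothetical helper: run a command with stdout and stderr appended
# to the log file given as the first argument.
run_logged() {
    log=$1
    shift
    "$@" >>"$log" 2>&1
}

# Demonstration with a command that writes to both streams.
log=$(mktemp)
run_logged "$log" sh -c 'echo "to stdout"; echo "to stderr" >&2'
cat "$log"
rm -f "$log"
```

On the server, something like run_logged /tmp/nfslog.txt rpc.nfsd -- 4 (as root) would collect whatever the daemon prints on either stream.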

Re: NFS server flakiness

Reply #10 – 20 March 2023, 22:31:55

Quote from: craigduncan – on 20 March 2023, 21:44:27

I changed that bit of code in the script to just "4" as the only arg, and then *no* args. Either way, rpc.nfsd still didn't appear to run. What i don't understand (among many things) is why there is no log output. I've been reading up on execline to try to figure out how to get some logging output. Before the call to rpc.nfsd (in the script) stdout is redirected to /dev/null ("redirfd -w 1 /dev/null"), *immediately* after which "exportfs -ra" is called, then "rpc.nfsd".
I'm not conversant enough with execline, though, to know how this redirection affects the running of rpc.nfsd. I haven't entirely wrapped my mind around how execline handles the chaining of execution in this regard.

Unless you rebuilt and updated the s6-rc database, merely changing the script isn't going to have any effect. Admittedly, I don't really remember why I put the redirfd line in the middle there. I would expect that it silences all log output after that point, which seems wrong. I'll revisit it later.
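
A rough sketch of that edit-then-reload cycle, assuming Artix's s6-db-reload helper (the command named later in this thread) and the nfs-server service name used earlier. The function name reload_nfs_server and the DRY_RUN switch are made up for illustration.

```shell
#!/bin/sh
# Sketch: rebuild the s6-rc database, then restart the service so the
# edited run script is actually used. Needs root when run for real.
# Set DRY_RUN=1 to print the commands instead of running them.
reload_nfs_server() {
    run=${DRY_RUN:+echo}
    $run s6-db-reload &&
    $run s6-rc -d change nfs-server &&
    $run s6-rc -u change nfs-server
}

# Print the three commands without executing them.
DRY_RUN=1 reload_nfs_server
```

Calling reload_nfs_server as root without DRY_RUN runs the commands for real, stopping at the first one that fails.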

Quote

Tangentially, as nous pointed out "umount-l" is wrong. I'd thought that was in relation to trying to unmount nfsd. But then i noticed what looks like another typo -- this in /etc/s6/sv/nfs-server-srv/finish -- which seems to be responsible for this. Note:
Code:
foreground { umount-l /var/lib/nfs/rpc_pipefs }

Ah yep, duh. That's wrong too. I just pushed -2 to testing for this.

SOLVED: Re: NFS server flakiness

Reply #11 – 21 March 2023, 00:47:30

I did *not* reload the s6 database. Things just hadn't clicked re that until you mentioned it. But... it *was* the '=' that was causing rpc.nfsd to fail. Once i started running s6-db-reload results improved greatly. :-) I couldn't get any error output, though, when i inserted "redirfd -w 2 /tmp/nfs.log" right before the call to rpc.nfsd (leaving '=' as an arg)... even when reloading the db. Need to do more studying of execline i guess.

Anyway, that solves that. Thanks for fixing those scripts. It's nice to have nfs working properly.