Channel: Raspberry Pi Forums

Troubleshooting • crazy issues with NFS and 2 pi's

Hi,

I've fallen into NFS hell again. I recently added a pi5 with 4xSATA hat to my homelab and moved the 2 SSDs I had previously connected to other linux servers via USB enclosures to this new one. It worked nicely for 1-2 weeks. Tonight, the weirdest of issues happened. This was ultimately resolved by a reboot, but I'd like to understand.

Setup:
- 1 pi 4 running many things including Kodi, with a projector connected via HDMI, and jellyfin
- 1 pi 5 with the SATA hat and the drives, running nfs-kernel-server and samba (the pi4 is connected to it via nfs)
- other linux servers (x86)
- all the above connected via a tp-link switch, itself connected to my ISP's modem/router/wifi point
- a macbook connected to the router over wifi

The issue as I went through it:

1. watching a movie over jellyfin from the macbook in the late afternoon worked, but was very slow to start, when normally everything is super snappy; didn't think too much of it at first

2. later, trying to watch another one from the pi4 with kodi, it would load extremely slowly, stutter, buffer... it was unusable, so I started wondering if something was wrong with the file server

3. connected to the pi5, I found thousands of dmesg log lines (typically ~50 within the same second) like

Code:

Oct 17 20:49:23 pi5 kernel: rpc-srv/tcp: nfsd: sent 1045898 when sending 1045896 bytes - shutting down socket
The exact same line, with those exact numbers, every time.
I tried to see what could be wrong on the server side. I restarted nfs, rebooted several times, tried to tune some parameters a bit randomly based on what I could see at https://serverfault.com/questions/88049 ... ng-1048708, apt upgrade'd (triggering a kernel upgrade) and rebooted once more, rebooted my ISP's router just in case... nothing solved it other than shutting down nfs-kernel-server, which wasn't very useful as all my linux servers depend on this.

4. it occurred to me to ssh into the pi4 as well, to see if it had interesting nfs logs on its side. It didn't. I had a number of HDMI CEC timeout errors, which is probably a separate issue that I need to solve. But nothing related to nfs. However, I tried ls-ing the nfs mount and it was very slow to respond (though it did), so even such a basic operation was being affected.

5. I thought of connecting to the other linux servers I have and doing the same ls command (on the same mount that they all share). It was super snappy, no issue on those!

6. That's when I thought: well, maybe I should reboot the pi4. I did. Problem solved: I watched the whole movie, and not a single nfsd error line appeared on the pi5.


So... rebooting the pi4, ONE of the several nfs clients connected to the pi5's nfs server, solved the issue that was not being logged anywhere on the pi4 itself but was being logged in the pi5's logs. The issue that apparently had also affected my macbook earlier on - but not the other linux servers. My brain explodes.

While googling the rpc-srv/tcp error, I found only fairly old posts (2016-2018). I found not a single one where, like in my case, the complaint was that the bytes sent were MORE than the bytes "sending" -- all the posts I found were about "sent ONLY x bytes when sending y bytes" where y > x. And nothing at all in that vein from the last few years.
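
For anyone wanting to check their own logs for the same pattern, here is a rough Python sketch that counts these nfsd lines per second and tallies the difference between the "sent" and "sending" byte counts. The regular expression is an assumption built from the single log line quoted above (plus the older "sent only" wording mentioned in other posts), and `summarize` is just a hypothetical helper name, so it may need adjusting for other log formats.

```python
import re
from collections import Counter

# Assumed layout: syslog-style prefix followed by the nfsd message quoted above.
# "only " is optional so the older "sent only x when sending y" wording also matches.
NFSD_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"rpc-srv/tcp: nfsd: sent (?:only )?(?P<sent>\d+) "
    r"when sending (?P<expected>\d+) bytes"
)

def summarize(lines):
    """Return (lines per timestamp, counts of sent-minus-expected deltas)."""
    per_second = Counter()
    deltas = Counter()
    for line in lines:
        match = NFSD_RE.search(line)
        if match is None:
            continue  # not an nfsd short/long-send message
        per_second[match.group("stamp")] += 1
        deltas[int(match.group("sent")) - int(match.group("expected"))] += 1
    return per_second, deltas

# The one sample line from this post; real input would be the full kernel log.
SAMPLE = [
    "Oct 17 20:49:23 pi5 kernel: rpc-srv/tcp: nfsd: sent 1045898 "
    "when sending 1045896 bytes - shutting down socket",
]

per_second, deltas = summarize(SAMPLE)
for stamp, count in per_second.most_common(5):
    print(f"{stamp}: {count} line(s)")
for delta, count in sorted(deltas.items()):
    print(f"sent - expected = {delta:+d} bytes: {count} line(s)")
```

A positive delta corresponds to the "sent MORE than sending" case described here, while a negative one would be the "sent only x" case from the older posts. To run it on real data, `SAMPLE` could be replaced with lines read from a saved copy of the kernel log.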

=> does anyone have any idea of what happened here?
=> can I do anything to avoid this happening again in the future?

Feel free to ask for any additional details such as configurations etc. I can't think of anything relevant to include, to be honest.

Thanks!
P.

Statistics: Posted by pierric — Thu Oct 17, 2024 8:21 pm


