CPU load over 70 means I can't even ssh into my server

PlutoniumAcid@lemmy.world · edit-2 2 years ago

Neuromancer@lemm.ee · 2 years ago

Run top and paste the top portion of the output.

I suspect it is I/O wait. You can get into disk contention if you have multiple containers fighting for the disk. You will notice the I/O queue building up, which shows you are waiting on I/O transactions.

%Cpu(s): 67.4 us, 13.0 sy, 0.0 ni, 19.4 id, 0.2 wa, 0.0 hi, 0.0 si, 0.0 st

See the field labeled wa: that is wait time, basically the time spent waiting for I/O to complete.
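
If you want to pull the wa value out of top non-interactively (for example from `top -b -n 1`), here is a minimal Python sketch that parses a `%Cpu(s)` summary line like the one above. The function name is hypothetical, and it assumes top's default English layout with a dot as the decimal separator.

```python
import re


def parse_top_cpu_line(line: str) -> dict[str, float]:
    """Parse a top '%Cpu(s):' summary line into {label: percent}."""
    # Drop the '%Cpu(s):' prefix, then grab every '<number> <two-letter label>' pair.
    _, _, rest = line.partition(":")
    return {label: float(value) for value, label in re.findall(r"([\d.]+)\s+([a-z]{2})", rest)}


line = "%Cpu(s): 67.4 us, 13.0 sy, 0.0 ni, 19.4 id, 0.2 wa, 0.0 hi, 0.0 si, 0.0 st"
fields = parse_top_cpu_line(line)
print(fields["wa"])  # prints 0.2
```

In this particular sample, wa is only 0.2, so that snapshot by itself shows little I/O wait.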

If that is high, you can increase the cache used by Linux, BUT if the system crashes you risk losing data that has not yet been written to disk.

atzanteol@sh.itjust.works · 2 years ago

“load” is not “CPU usage.” It’s “system usage”: it includes disk and network activity, and swapping if you’re low on memory.
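
On Linux, the load average counts runnable tasks plus tasks in uninterruptible sleep (state D, usually blocked on disk I/O), which is why load can sit far above the CPU count while the CPUs are not fully busy. A minimal Python sketch, assuming a Linux /proc filesystem, that prints the load averages next to the CPU count and lists processes currently in state D. The helper name `tasks_in_state` is hypothetical, and it only looks at processes, not individual threads.

```python
import os


def tasks_in_state(state: str = "D", proc: str = "/proc") -> list[tuple[int, str]]:
    """Return (pid, command) for each process whose state letter equals `state`.

    Reads /proc/<pid>/stat, i.e. processes only, not individual threads.
    """
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                data = f.read()
        except OSError:
            continue  # the process exited between listdir() and open()
        # Format is "pid (comm) S ...". comm may contain spaces or ')', so use the LAST ')'.
        close = data.rfind(")")
        comm = data[data.find("(") + 1 : close]
        rest = data[close + 2 :].split()
        if rest and rest[0] == state:
            found.append((int(entry), comm))
    return found


if __name__ == "__main__":
    one, five, fifteen = os.getloadavg()
    print(f"load {one:.2f} {five:.2f} {fifteen:.2f} on {os.cpu_count()} CPUs")
    for pid, comm in tasks_in_state("D"):
        print(f"uninterruptible (D): pid {pid} {comm}")
```

A process that keeps showing up in state D across several runs is a good candidate to look at with iotop.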

vmstat can show you what your disk I/O looks like, and iotop can help narrow it down to a process.
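
As a rough illustration of where figures like vmstat's wa and b (blocked processes) columns come from, here is a minimal Python sketch, assuming Linux, that samples /proc/stat twice and reports the iowait percentage over the interval plus the current procs_blocked count. The helper names are hypothetical.

```python
import time


def read_proc_stat(text: str) -> tuple[list[int], int]:
    """Return (aggregate 'cpu' counters in jiffies, procs_blocked) from /proc/stat text."""
    cpu: list[int] = []
    blocked = 0
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "cpu":  # aggregate line; per-core lines are cpu0, cpu1, ...
            cpu = [int(x) for x in parts[1:]]
        elif parts[0] == "procs_blocked":  # tasks currently blocked waiting for I/O
            blocked = int(parts[1])
    return cpu, blocked


def iowait_percent(before: list[int], after: list[int]) -> float:
    """Percentage of CPU time spent in iowait between two /proc/stat samples."""
    # Columns: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already counted inside user/nice, so only the first 8 are summed.
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total > 0 else 0.0


def sample(path: str = "/proc/stat") -> tuple[list[int], int]:
    with open(path) as f:
        return read_proc_stat(f.read())


if __name__ == "__main__":
    before, _ = sample()
    time.sleep(1.0)
    after, blocked = sample()
    print(f"iowait {iowait_percent(before, after):.1f}%, procs_blocked {blocked}")
```

Taking the difference between two samples matters: the raw counters are cumulative since boot, so a single reading says little about what is happening right now.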

Black Xanthus@lemmy.world · edit-2 2 years ago

The last time I saw this was on a slow-failing HDD.

A quick fsck might get you a few answers; see its man page for details. It could just be one or two bad blocks that you can recover to fix the problem (though, of course, it’s time to back up your data).

The other, slightly unusual time I’ve seen it was with mixed RAM. 16 GB made of 2x6 GB and 2x4 GB did some really odd things to the system. If it’s not the disk, and your box will boot with one stick of RAM, try that to see if it fixes the issue. It could be that your RAM speeds are mismatched (or you’re like me and just put in two sticks you had lying around, and it basically worked until it didn’t).

An outlier that I haven’t seen on modern machines is I/O wait for a CD-ROM to spin up, even if you’re not accessing the CD-ROM. It’s normally caused by bad cabling. Given the age of your machine, this is unlikely, but it might be worth unplugging devices to see if one is faulty and not reporting properly.

This is, of course, assuming dmesg shows no errors.
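
To scan the kernel log for the kind of messages a failing drive tends to leave, a small Python sketch that filters `dmesg` output. The list of phrases is an assumption (common patterns, not an exhaustive or authoritative list), and `dmesg` may need root on systems where the `kernel.dmesg_restrict` sysctl is set.

```python
import re
import subprocess

# Assumed, non-exhaustive phrases that often show up in the kernel log when a disk is failing.
DISK_ERROR = re.compile(
    r"I/O error|blk_update_request|Medium Error|EXT4-fs error|failed command|hard resetting link",
    re.IGNORECASE,
)


def suspicious_lines(log_text: str) -> list[str]:
    """Return the kernel-log lines that match one of the disk-error phrases."""
    return [line for line in log_text.splitlines() if DISK_ERROR.search(line)]


def kernel_log() -> str:
    """Return dmesg output, or '' if dmesg is missing or not permitted for this user."""
    try:
        result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    except FileNotFoundError:
        return ""
    return result.stdout


if __name__ == "__main__":
    for line in suspicious_lines(kernel_log()):
        print(line)
```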

Final thought: check whether you’re running SELinux. If you are, turn it off and try again. Those policies are complex, and something installed in a non-standard place could cause SELinux to slow I/O as it fills your logs with warnings.
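
The usual way to check is the `getenforce` command. As a tool-free alternative, a minimal Python sketch that reads the selinuxfs `enforce` file; it assumes selinuxfs is mounted at /sys/fs/selinux and treats a missing file as SELinux being disabled.

```python
from pathlib import Path


def selinux_mode(enforce_file: str = "/sys/fs/selinux/enforce") -> str:
    """Return 'enforcing', 'permissive', or 'disabled'.

    Assumes selinuxfs is mounted at /sys/fs/selinux; if the file is absent,
    SELinux is treated as disabled.
    """
    path = Path(enforce_file)
    if not path.exists():
        return "disabled"
    # The file holds "1" in enforcing mode and "0" in permissive mode.
    return "enforcing" if path.read_text().strip() == "1" else "permissive"


if __name__ == "__main__":
    print(selinux_mode())
```

If it reports enforcing, `setenforce 0` switches to permissive mode until the next reboot, which is a less drastic way to test the theory than disabling SELinux outright.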

Hope that helps,

PlutoniumAcid@lemmy.world · 2 years ago

Do not run fsck on a mounted device

So how do I run this on /dev/sda? I can’t very well unmount the OS drive…
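
fsck checks a filesystem, so it is normally pointed at a partition (for example /dev/sda1) rather than the whole disk. A quick Python sketch, assuming Linux's /proc/mounts, that lists which partitions of a disk are currently mounted; the device name is illustrative and the helper name is hypothetical. Anything it reports would have to be unmounted before fsck can safely check it.

```python
def mounts_on(device: str, mounts_text: str) -> list[tuple[str, str]]:
    """Return (source, mount point) for every mount whose source starts with `device`.

    Prefix matching means /dev/sda also catches /dev/sda1, /dev/sda2, ...
    (and /dev/sdaa on hosts with many disks, so check the output). Some systems
    list the root filesystem as /dev/root, which this prefix match will not catch.
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith(device):
            # /proc/mounts escapes spaces in mount points as \040.
            hits.append((fields[0], fields[1].replace("\\040", " ")))
    return hits


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for source, target in mounts_on("/dev/sda", f.read()):  # illustrative device
            print(f"{source} is mounted on {target}; unmount it before running fsck on it")
```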

PuppyOSAndCoffee@lemmy.ml · 2 years ago

Many people aren’t running containers on a Raspberry Pi. While feasible, it was notoriously poor until the 8GB Pi 4, and it is still easily bottlenecked by SD card I/O. Do you have docker stats output, so you can see the disk and network I/O of each container?
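
`docker stats` can print a one-off snapshot with `--no-stream`, and `--format` selects columns such as `{{.BlockIO}}` and `{{.NetIO}}`. A minimal Python sketch, assuming the docker CLI is installed and the current user can talk to the daemon, that ranks running containers by total block I/O. The helper names are hypothetical, and the unit table is an assumption covering the decimal (kB, MB) and binary (MiB) suffixes docker typically prints.

```python
import re
import subprocess

# Assumed unit table: decimal suffixes (as in BlockIO/NetIO) plus binary ones for safety.
_UNITS = {"b": 1, "kb": 10**3, "mb": 10**6, "gb": 10**9, "tb": 10**12, "pb": 10**15,
          "kib": 2**10, "mib": 2**20, "gib": 2**30, "tib": 2**40}


def to_bytes(size: str) -> float:
    """Convert a size such as '1.5GB' or '512MiB' to bytes."""
    m = re.fullmatch(r"\s*([\d.]+)\s*([a-zA-Z]+)\s*", size)
    if not m:
        raise ValueError(f"unrecognised size: {size!r}")
    return float(m.group(1)) * _UNITS[m.group(2).lower()]


def parse_stats(output: str) -> list[tuple[str, float, float]]:
    """Parse 'name<TAB>blockIO<TAB>netIO' lines into (name, block bytes, net bytes), busiest disk first."""
    rows = []
    for line in output.strip().splitlines():
        name, block_io, net_io = line.split("\t")
        # Each column is "read / written" (block) or "received / sent" (net); sum both directions.
        block = sum(to_bytes(part) for part in block_io.split("/"))
        net = sum(to_bytes(part) for part in net_io.split("/"))
        rows.append((name, block, net))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def docker_io_snapshot() -> list[tuple[str, float, float]]:
    """Run docker stats once and return the parsed, sorted rows."""
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.Name}}\t{{.BlockIO}}\t{{.NetIO}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_stats(out)
```

The BlockIO and NetIO figures are cumulative since each container started, so comparing two snapshots taken a minute apart shows which container is busy right now.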

Xianshi@lemm.ee · 2 years ago

I’d try each application one by one. Maybe write a script to monitor load and stop the program if it goes past your desired threshold and notify you.
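
A minimal sketch of that monitoring script in Python: it polls the 1-minute load average and, on the first reading above a threshold, calls a stop action and then a notify action. The threshold, interval, and container name in the usage line are hypothetical; stopping a Docker container is just one possible stop action, and notify could be anything from `print` to a mail or chat hook.

```python
import os
import subprocess
import time
from typing import Callable, Optional


def watch_load(
    threshold: float,
    stop: Callable[[], None],
    notify: Callable[[str], None],
    read_load: Callable[[], float] = lambda: os.getloadavg()[0],
    interval: float = 10.0,
    max_checks: Optional[int] = None,
) -> bool:
    """Poll the load average; on the first reading above `threshold`, call stop() then notify().

    Returns True if the threshold was crossed, False if `max_checks` readings passed without it.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        load = read_load()
        if load > threshold:
            stop()
            notify(f"load {load:.1f} exceeded threshold {threshold}, stop action ran")
            return True
        checks += 1
        time.sleep(interval)
    return False


def stop_container(name: str) -> None:
    """One possible stop action: `docker stop` a container by name."""
    subprocess.run(["docker", "stop", name], check=False)
```

Usage might look like `watch_load(8.0, stop=lambda: stop_container("photoprism"), notify=print)`. Passing the actions in as functions keeps the loop testable and lets you swap in a different notifier. Since the 1-minute load average is itself smoothed, expect the script to react with some lag rather than instantly.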

It could also be a setting in an app like PhotoPrism or Immich; I think one of them uses TensorFlow to classify images. That would increase the load if it’s running in the background.

Maybe try them with an empty directory so there is no data to process and see if you encounter the error. Then add some data and see how the load is.