[SOLVED] runit hell #1

[SOLVED] runit hell #1

17 March 2022, 04:05:22

I can't get services of my own making to run at boot most of the time.

Some work, some don't, and I can find no explanation.

I have multiple reverse ssh tunnels that start up with a watchdog.

The reverse ssh tunnels work, but the watchdog function is ignored no matter what I do.

So, I created a crude/sloppy workaround thing that will make the machine reboot itself if it can't reach itself via the tunnel 3 times in a row, timer, etc... This works, sort of... I cannot put the command directly in the "/sv/run" file. If I do, it tells me:

Code: [Select]

timeout: down: [service name]: 1s, normally up, want up

...and refuses to do its job. But if I make a shell script and tell it to run that, it works.

I have no clue why the exact same thing works as a shell script and not directly in the /sv/run file. Don't ask me why. I've read the docs a thousand times and I know nothing more now than when I first heard of runit. I'm not even going to bother looking at it again.

Further down the list of troubles, I have two more services I need to launch at boot, but I have found no way to force runit to do its job. Tried the "make it a script and then put the script in the runit file" trick like above, but it doesn't work. It simply will not run them. I have to manually run a startup script every time the machine is rebooted. The startup script works fine. Same exact thing will not run when called by runit, and will not run from the /sv/run file... Manual execution, fine, no problem...

I wish I could give you some useful diagnostic output, logs, etc.. But there's nothing of note, anywhere... It simply tells me:

Code: [Select]

timeout: down: [service name]: 1s, normally up, want up

I have never managed to get any other information out of runit for any of these things.

Runit seems to be a total mess. It will tell me nothing more than

Code: [Select]

timeout: down: [service name]: 1s, normally up, want up

for any and all of its failures to function. It seems to be totally random whether it will do its job or not.

Re: runit hell #1

Reply #1 – 17 March 2022, 09:53:44

runit works perfectly with the scripts I write myself. You give no info about your scripts, which are most likely the root of the problem, not runit.
> Taking a minimal service manager
> Complaining about the fact it is minimal

Re: runit hell #1

Reply #2 – 17 March 2022, 13:15:36

Quote from: camosoul – on 17 March 2022, 04:05:22

Runit seems to be a total mess. It will tell me nothing more than
Code: [Select]
timeout: down: [service name]: 1s, normally up, want up
for any and all of its failures to function. It seems to be totally random whether it will do its job or not.

runit uses shell scripts. How those scripts perform and what they do, including debug output, is completely up to the person writing the scripts within the environment of that init system.

Writing services for any init system is not something an average user would need to or know how to do. It requires knowledge of at least the basics of that init system and the shell.

Re: runit hell #1

Reply #3 – 17 March 2022, 13:59:41

Posting whatever scripts you are trying to run would be helpful.

Re: runit hell #1

Reply #4 – 17 March 2022, 21:24:58

Quote from: Dudemanguy – on 17 March 2022, 13:59:41

Posting whatever scripts you are trying to run would be helpful.

This works, sort of:

Code: [Select]

$ sudo cat /etc/runit/sv/autotunnel-[port]/run 
#!/bin/sh
exec chpst -u autotunnel /usr/bin/autossh -M 0 -i /home/autotunnel/.ssh/[sshkey] -NTR [muhport]:localhost:[muhport] [user]@[ip.ad.dre.ss] -o ServerAliveInterval=10 -o ServerAliveCountMax=5

While this runs and establishes the tunnel on any port I see fit, the ServerAliveCountMax option is ignored.

The intention was that it would self-terminate if 5 "ping" failures occur and then be restarted by the process monitor, reestablishing the reverse ssh tunnel. I never got to test that because autossh fails to obey ServerAliveCountMax. It never self-terminates no matter how many "ping" failures there are, so I have no idea if a monitor would or would not restart it. Not a runit problem. Yet... If it ever self-terminates as directed, I fully expect that runit will not restart the service. I have no faith in runit actually doing its job. I've come to expect a cascade of failures/dysfunction and I'll have another nonsensical hell to sort out...

So, the crude/sloppy/stupid workarounds begin...

I created a second service to function as a watchdog. It attempts to curl through the port being forwarded. If this fails, the machine reboots itself, and this causes the autotunnel to run at boot and re-establish. Yes. I know that this is a terrible idea, but since doing it correctly isn't an option (autossh refuses to obey ServerAliveCountMax), I settled on it as a dumb way to at least get something working, sort of. It's also double terrible because it has to be run as root. The whole point of having an isolated service user for autossh was to avoid this kind of stupidity. But I am forced to find a workaround because, again, autossh doesn't obey its own ServerAliveCountMax directive.

It looks like this:

Code: [Select]

$ sudo cat /etc/runit/sv/check-connectivity/run 
#!/bin/sh
exec chpst -u root /root/check.sh

Code: [Select]

$ sudo cat /root/check.sh
#!/bin/bash
while true
do
sleep 5m
curl -m 5 -L --silent https://[domain].com/ > /dev/null
fail=$?
if [ "$fail" -ne "0" ]
then
  /usr/bin/reboot
fi
done

I don't like this, but it gets the job done.

This has been runit hell #1.

Runit hell #2 consists of those things that runit simply will not do at all no matter what.

1) start a VM at boot. It just won't.

Code: [Select]

$ cat ./startvms.sh 
VBoxManage startvm [vm01] --type headless
VBoxManage startvm [vm02] --type headless ...

There also exists a manual VM shutdown shell script.

2) recently, I was tinkering with ssh-chat. I got it working, manually, as I see fit. However, I have the same problem with it as I do the VMs. Runit simply won't do it no matter what.

Code: [Select]

$ sudo cat /root/ssh-chat.sh
runuser -u ssh-chat -- ssh-chat --bind=[ip.ad.dre.ss]:[port] --identity=/path/to/key --motd=/path/to/motd.txt --allowlist=/path/to/allowlist

The simple shell scripts (really just a single command) run fine when manually executed. But nothing can convince runit to execute these commands. Yes. I understand that the syntax isn't going to work in a runit script. I tried the correct form (exec chpst -u etc...), modeled on the autotunnel scripts, which do work, and it still gives me the useless:

Code: [Select]

timeout: down: [service name]: 1s, normally up, want up

It also refuses to do the cheesy "execute the shell script" workaround as seen in the crude curl/reboot example above. So, the exact same thing works there, but not here... [shrug].

Since R-ing the FM has been pointless and I have learned absolutely nothing from it, I looked at several other repo-provided runit service files to see if I could learn from the examples, but they don't help. It's incredibly simple, as intended. I'm left holding the bag of "What works there doesn't work here; no explanation. It just doesn't."

Also, I have tried both of these:

Code: [Select]

#!/bin/bash
#!/bin/sh

It either works, or it doesn't, and this line makes no difference.

Re: runit hell #1

Reply #5 – 17 March 2022, 21:54:32

http://smarden.org/runit/faq.html#userservices

Quote from: camosoul – on 17 March 2022, 21:24:58

But I am forced to find a workaround because, again, autossh doesn't obey its own ServerAliveCountMax directive.

Then blame autossh instead of runit.

Quote from: camosoul – on 17 March 2022, 21:24:58

Code: [Select]

$ sudo cat /root/check.sh
#!/bin/bash
while true
do
sleep 5m
curl -m 5 -L --silent https://[domain].com/ > /dev/null
fail=$?
if [ "$fail" -ne "0" ]
then
  /usr/bin/reboot
fi
done

If you need to periodically run some script, it is more efficient to use crond(8). About this shell script, it has some unnecessary code and can be made more compact:

Code: [Select]

#!/bin/sh
# (no need for bash as no bash-specific features are used)
while true; do
    sleep 5m

    # === this part could be made into a script and executed by cron every 5min ===
    if ! curl -m 5 -L --silent https://[domain].com/ > /dev/null; then
        /usr/bin/reboot    # need to be root for this
    fi
    # === end of part ===

done

And that's without getting into why a reboot is needed at all.
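For illustration, here is a minimal sketch of the cron approach, not a tested replacement. It assumes a root crontab entry runs the script every five minutes; the names note_result, STATE and LIMIT and the script path are made up for this example. It acts only after several consecutive failures, which matches the "3 times in a row" behaviour described in the first post.

```shell
#!/bin/sh
# Sketch: one connectivity check per cron run, acting only after several
# consecutive failures. STATE, LIMIT and note_result are hypothetical names.

STATE="${STATE:-/tmp/tunnel-check.fails}"   # assumed location of the failure counter
LIMIT="${LIMIT:-3}"                         # consecutive failures tolerated

# note_result STATUS: update the counter from a command's exit status.
# Returns 0 once LIMIT consecutive failures have been recorded, 1 otherwise.
note_result() {
    if [ "$1" -eq 0 ]; then
        rm -f "$STATE"                      # a success resets the streak
        return 1
    fi
    fails=$(cat "$STATE" 2>/dev/null || echo 0)
    fails=$((fails + 1))
    echo "$fails" > "$STATE"
    [ "$fails" -ge "$LIMIT" ]
}

# Intended use, from a root crontab line such as (path is hypothetical):
#   */5 * * * * /usr/local/bin/tunnel-check
#
#   curl -m 5 -L --silent https://[domain].com/ > /dev/null
#   if note_result "$?"; then
#       sv restart autotunnel-[port]   # or /usr/bin/reboot, as in the original
#   fi
```

Restarting only the tunnel service is shown as the less drastic option; whether that is enough to bring the tunnel back depends on the setup.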

Re: runit hell #1

Reply #6 – 18 March 2022, 00:20:57

Meh, never mind. I'll accept that runit doesn't work.

My more detailed posts keep getting deleted and it's pissing me off even more.

SOLVED: RUNIT DOESN'T WORK. THE DOCS ARE BOGUS. UNHELPFUL TROLLING IN RESPONSE TO INQUIRY.

Re: runit hell #1

Reply #7 – 18 March 2022, 00:57:04

Well you obviously aren't interested in solving the problem, but for future reference you can redirect stdout/stderr output to a file when testing these scripts to see what is going on. I don't recall if runit works like this or not, but for example s6 executes everything in a clean environment. This might mess up some programs that require certain environment variables to be present. It could explain why something appears to work when you run it locally in shell and not in runit (if it indeed works like this).
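As a rough sketch of that debugging approach: the function below runs a command with an almost empty environment and appends everything it prints to a log file, so the real error message becomes visible. run_clean and LOG are names invented for this example, and whether a given runit setup actually starts services with such a sparse environment is an assumption to verify, not a given.

```shell
#!/bin/sh
# Sketch: reproduce a stripped-down service environment by hand and keep
# the output. LOG and run_clean are hypothetical names.

LOG="${LOG:-/tmp/service-debug.log}"   # assumed log location

# run_clean CMD [ARGS...]: run CMD with only PATH set, appending its
# stdout and stderr to LOG. Returns CMD's exit status.
run_clean() {
    {
        echo "--- $(date): $*"
        env -i PATH="$PATH" "$@"
    } >>"$LOG" 2>&1
}

# Example, from a root shell:
#   run_clean /root/ssh-chat.sh; echo "exit status: $?"
#
# The same redirection can go at the top of a ./run file while testing:
#   exec >>/tmp/service-debug.log 2>&1
```

If the command fails under run_clean but works in a login shell, a missing variable is a likely suspect. Note that chpst -u changes the user and group but does not set HOME, so a program that looks for its configuration under $HOME may behave differently under runit.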

Re: runit hell #1

Reply #8 – 18 March 2022, 09:10:26

Thing is, runit does exactly what is described in the manual. There is no reason to blame it because whatever you are executing as a service doesn't work as expected.

I guess you also never read the manuals: judging by your check script, you didn't even write it as a check script as defined in sv(8).
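For reference, the check script that sv(8) describes is a file named check in the service directory; sv runs it when asked to confirm that a service is up, and treats exit status 0 as "available". A minimal sketch, assuming the tunnel can be probed with the same curl call used earlier in the thread (tunnel_up is a made-up helper name):

```shell
#!/bin/sh
# Sketch of /etc/runit/sv/autotunnel-[port]/check (placeholders as in the thread).

# tunnel_up URL: exit 0 only if URL answers through the tunnel within 5 seconds.
tunnel_up() {
    curl -m 5 -L --silent "$1" > /dev/null
}

# The real ./check file would end with this line, so that the script's
# exit status is the result of the probe:
#   tunnel_up "https://[domain].com/"
```

Note that ./check only affects what sv reports (for example with "sv check"); it does not by itself make runsv restart anything.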

Just looks like a PEBKAC moment to me.

Re: runit hell #1

Reply #9 – 20 March 2022, 01:37:38

We all know how this ends.