supervisor: Retry, with backoff, to connect to supervisor socket.

If `zulip-puppet-apply` is run during an upgrade, it will immediately
try to re-`stop-server` before running migrations; if the last step in
the puppet application was to restart `supervisor`, it may not be
listening on its UNIX socket yet.  In such cases, `socket.connect()`
throws a `FileNotFoundError`:

```
Traceback (most recent call last):
  File "./scripts/stop-server", line 53, in <module>
    services = list_supervisor_processes(services, only_running=True)
  File "./scripts/lib/supervisor.py", line 34, in list_supervisor_processes
    processes = rpc().supervisor.getAllProcessInfo()
  File "/usr/lib/python3.9/xmlrpc/client.py", line 1116, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python3.9/xmlrpc/client.py", line 1456, in __request
    response = self.__transport.request(
  File "/usr/lib/python3.9/xmlrpc/client.py", line 1160, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python3.9/xmlrpc/client.py", line 1172, in single_request
    http_conn = self.send_request(host, handler, request_body, verbose)
  File "/usr/lib/python3.9/xmlrpc/client.py", line 1285, in send_request
    self.send_content(connection, request_body)
  File "/usr/lib/python3.9/xmlrpc/client.py", line 1315, in send_content
    connection.endheaders(request_body)
  File "/usr/lib/python3.9/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.9/http/client.py", line 1010, in _send_output
    self.send(msg)
  File "/usr/lib/python3.9/http/client.py", line 950, in send
    self.connect()
  File "./scripts/lib/supervisor.py", line 10, in connect
    self.sock.connect(self.host)
FileNotFoundError: [Errno 2] No such file or directory
```

Catch the `FileNotFoundError` and retry twice more, with backoff.  If
it fails repeatedly, point to `service supervisor status` for further
debugging, as `FileNotFoundError` is rather misleading -- the file
exists, it simply is not accepting connections.
This commit is contained in:
Alex Vandiver 2023-05-12 18:15:26 +00:00 committed by Tim Abbott
parent b8f53ab6e8
commit 0da62e7cda
1 changed files with 14 additions and 1 deletions

View File

@ -1,4 +1,5 @@
import socket import socket
import time
from http.client import HTTPConnection from http.client import HTTPConnection
from typing import Dict, List, Optional, Tuple, Union from typing import Dict, List, Optional, Tuple, Union
from xmlrpc import client from xmlrpc import client
@ -7,7 +8,19 @@ from xmlrpc import client
class UnixStreamHTTPConnection(HTTPConnection): class UnixStreamHTTPConnection(HTTPConnection):
def connect(self) -> None: def connect(self) -> None:
self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
self.sock.connect(self.host) connected = False
for i in range(0, 2):
try:
self.sock.connect(self.host)
connected = True
break
except FileNotFoundError:
# Backoff and retry
time.sleep(2**i)
if not connected:
raise Exception(
"Failed to connect to supervisor -- check that it is running, by running 'service supervisor status'"
)
class UnixStreamTransport(client.Transport): class UnixStreamTransport(client.Transport):