I have a standard droplet (Ubuntu 20.04) and a managed MySQL 8 database, both in the same VPC. An application on the droplet connects to the database using the hostname intended for use within the VPC: private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com
The application occasionally reports that it cannot resolve this VPC hostname, with the error message: SQLSTATE[HY000] [2002] php_network_getaddresses: getaddrinfo failed: Temporary failure in name resolution
Below are some timestamps to give an idea of the frequency. Each episode is short: in the log below, the errors cluster into bursts lasting from a few seconds to about half a minute.
2022/05/08 19:32:14 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:15 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:19 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:19 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:21 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:25 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:25 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:30 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:31 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:34 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:35 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:36 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:37 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:38 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 21:43:10 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 21:43:14 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 21:43:16 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 21:43:17 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 21:43:22 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 21:43:23 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 21:43:24 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 21:43:32 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 21:43:33 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 21:43:33 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 21:43:34 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 23:58:38 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 23:58:39 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 23:58:42 getaddrinfo failed: Temporary failure in name resolution
2022/05/10 00:25:43 getaddrinfo failed: Temporary failure in name resolution
2022/05/10 00:25:43 getaddrinfo failed: Temporary failure in name resolution
2022/05/10 00:25:44 getaddrinfo failed: Temporary failure in name resolution
2022/05/10 00:25:44 getaddrinfo failed: Temporary failure in name resolution
2022/05/10 00:25:54 getaddrinfo failed: Temporary failure in name resolution
2022/05/10 00:25:55 getaddrinfo failed: Temporary failure in name resolution
2022/05/10 00:25:57 getaddrinfo failed: Temporary failure in name resolution
2022/05/10 00:26:05 getaddrinfo failed: Temporary failure in name resolution
2022/05/10 00:26:09 getaddrinfo failed: Temporary failure in name resolution
2022/05/10 00:26:11 getaddrinfo failed: Temporary failure in name resolution
2022/05/10 00:26:15 getaddrinfo failed: Temporary failure in name resolution
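To measure how long each episode lasts instead of estimating it from the list, the error lines can be grouped into bursts. The sketch below is a minimal example, assuming each relevant line starts with `YYYY/MM/DD HH:MM:SS` as above and that GNU date is available (the Ubuntu default). The helper names `summarise_bursts` and `print_burst` are hypothetical, and the 60-second gap that separates two bursts is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: group "getaddrinfo failed" log lines into bursts.
# Assumes every relevant line starts with "YYYY/MM/DD HH:MM:SS" and that
# GNU date is installed.

# Print one burst: start time, duration in seconds, number of errors.
print_burst() {
  local start=$1 end=$2 count=$3
  echo "$(date -u -d "@$start" '+%Y/%m/%d %H:%M:%S') duration=$(( end - start ))s errors=$count"
}

# Read log lines on stdin; errors more than GAP seconds apart start a new burst.
summarise_bursts() {
  local gap=${1:-60} d t rest ts start="" prev="" count=0
  while read -r d t rest; do
    [[ $d =~ ^[0-9]{4}/[0-9]{2}/[0-9]{2}$ ]] || continue  # skip other lines
    # Parse as UTC only to get stable arithmetic (no DST jumps); the same
    # wall-clock string is printed back by print_burst.
    ts=$(date -u -d "${d//\//-} $t" +%s)
    if [[ -n $start ]] && (( ts - prev > gap )); then
      print_burst "$start" "$prev" "$count"
      start=""
    fi
    if [[ -z $start ]]; then
      start=$ts
      count=0
    fi
    prev=$ts
    count=$(( count + 1 ))
  done
  if [[ -n $start ]]; then
    print_burst "$start" "$prev" "$count"
  fi
}
```

Usage, with a hypothetical file name for the application's error log: `summarise_bursts 60 < app-error.log`.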
The file /etc/resolv.conf contains:
nameserver 127.0.0.53
options edns0 trust-ad
which has not been modified since the droplet was created.
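The address 127.0.0.53 is the local stub listener of systemd-resolved, which forwards queries to upstream DNS servers. systemd-resolved lists those upstream servers in /run/systemd/resolve/resolv.conf, and `resolvectl status` shows them per network link. The small sketch below (the helper name `list_nameservers` is hypothetical) prints the nameserver entries of any resolv.conf-style file, so the stub file and the upstream list can be compared.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "nameserver" addresses in a resolv.conf-style
# file (defaults to /etc/resolv.conf). Commented-out lines are ignored because
# their first field is "#", not "nameserver".
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

On this droplet, `list_nameservers /etc/resolv.conf` should print only 127.0.0.53, while `list_nameservers /run/systemd/resolve/resolv.conf` should show the upstream servers. Querying one of those upstream addresses directly (`dig @ADDRESS hostname`, with ADDRESS as a placeholder) in the same loop could help show whether the stub or the upstream server stops answering.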
So I wrote two scripts to investigate the issue. The first one continuously pings the database VPC hostname and writes each reply to a file, prefixed with the date and time.
ping private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com | while read pong; do echo "$(date): $pong"; done > "/root/ping-result.log"
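The `while read` loop above can also be written as a reusable function. In this sketch (the name `timestamp_lines` is hypothetical), `read -r` keeps backslashes literal, `IFS=` preserves leading whitespace, and the timestamp uses the same `YYYY/MM/DD HH:MM:SS` format as the application log, which makes it easier to line the two logs up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefix each line from stdin with a timestamp in the
# application log's format.
timestamp_lines() {
  local line
  # IFS= keeps leading whitespace; -r keeps backslashes literal.
  while IFS= read -r line; do
    printf '%s: %s\n' "$(date '+%Y/%m/%d %H:%M:%S')" "$line"
  done
}
```

Usage, with HOST standing for the VPC hostname: `ping HOST | timestamp_lines > /root/ping-result.log`.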
The second one continuously queries the DNS server for that database VPC hostname. It runs every 0.5 seconds so that no outage window is missed.
while true; do
sleep 0.5
dig private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com &>> ~/dig-$(date +"%Y%m%d-%H").log
done
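A more compact alternative is to log one timestamped OK/FAIL line per lookup instead of the full dig output. This is a sketch under assumptions: `resolve` and `probe_dns` are hypothetical names. By default, `resolve` uses `getent ahosts`, which looks the name up through getaddrinfo(3), the same call that fails in the PHP application. To query 127.0.0.53 directly instead, `resolve` could run `dig +time=1 +tries=1 "$1"`. dig exits with status 9 when no server replies, and these options shorten its wait.

```shell
#!/usr/bin/env bash
# Resolve a name through getaddrinfo(), the same path the application uses.
resolve() {
  getent ahosts "$1"
}

# Log one timestamped OK/FAIL line per lookup.
# Usage: probe_dns HOST [COUNT (0 = forever)] [INTERVAL_SECONDS]
probe_dns() {
  local host=$1 count=${2:-0} interval=${3:-0.5} i=0 ts rc
  while (( count == 0 || i < count )); do
    # Timestamp taken before the lookup, so a slow failure is logged
    # at the moment it started.
    ts=$(date '+%Y/%m/%d %H:%M:%S')
    resolve "$host" >/dev/null 2>&1
    rc=$?
    if (( rc == 0 )); then
      echo "$ts OK"
    else
      echo "$ts FAIL (exit $rc)"
    fi
    i=$(( i + 1 ))
    sleep "$interval"
  done
}
```

With one short line per probe, the start and end of an outage window can be read directly from the log, and the timestamps match the application log format.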
Then I only had to wait for the next failure, which I share below. The most recent error from the log above (2022/05/10 00:26:15) is used as the sample.
ping-result.log shows no problems. The database machine stayed reachable, which was expected: ping queries DNS for the IP address only once, when it starts, so a running ping loop is not affected by DNS issues. Below is the time span during which the ‘Temporary failure in name resolution’ error was recorded.
Tue May 10 00:26:10 CEST 2022: 64 bytes from 10.110.64.5 (10.110.64.5): icmp_seq=15891 ttl=64 time=0.636 ms
Tue May 10 00:26:11 CEST 2022: 64 bytes from 10.110.64.5 (10.110.64.5): icmp_seq=15892 ttl=64 time=0.660 ms
Tue May 10 00:26:12 CEST 2022: 64 bytes from 10.110.64.5 (10.110.64.5): icmp_seq=15893 ttl=64 time=0.455 ms
Tue May 10 00:26:13 CEST 2022: 64 bytes from 10.110.64.5 (10.110.64.5): icmp_seq=15894 ttl=64 time=0.632 ms
Tue May 10 00:26:14 CEST 2022: 64 bytes from 10.110.64.5 (10.110.64.5): icmp_seq=15895 ttl=64 time=0.622 ms
Tue May 10 00:26:15 CEST 2022: 64 bytes from 10.110.64.5 (10.110.64.5): icmp_seq=15896 ttl=64 time=0.645 ms
Tue May 10 00:26:16 CEST 2022: 64 bytes from 10.110.64.5 (10.110.64.5): icmp_seq=15897 ttl=64 time=0.549 ms
Tue May 10 00:26:17 CEST 2022: 64 bytes from 10.110.64.5 (10.110.64.5): icmp_seq=15898 ttl=64 time=0.630 ms
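Instead of checking ping-result.log by eye, missing replies can be found by looking for gaps in the icmp_seq numbers. A minimal sketch follows (the helper name `find_seq_gaps` is hypothetical). It prints nothing when every reply arrived.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report gaps in icmp_seq numbers in a ping log
# (file argument, or stdin when omitted). No output means no lost replies.
find_seq_gaps() {
  awk 'match($0, /icmp_seq=[0-9]+/) {
         # "icmp_seq=" is 9 characters long; the rest of the match is the number
         seq = substr($0, RSTART + 9, RLENGTH - 9) + 0
         if (seen && seq != prev + 1) print "gap after icmp_seq=" prev ", next is " seq
         prev = seq; seen = 1
       }' "${1:--}"
}
```

Usage: `find_seq_gaps /root/ping-result.log`.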
Moving on to the ‘dig’ DNS query log.
The file dig-20220510-00.log shows the interesting part. I’m posting it in separate blocks to make clear which output belongs to each ‘dig’ request.
; <<>> DiG 9.16.1-Ubuntu <<>> private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 51792
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com. IN A
;; ANSWER SECTION:
private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com. 0 IN A 10.110.64.5
;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Tue May 10 00:25:29 CEST 2022
;; MSG SIZE rcvd: 116
; <<>> DiG 9.16.1-Ubuntu <<>> private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com
;; global options: +cmd
;; connection timed out; no servers could be reached
; <<>> DiG 9.16.1-Ubuntu <<>> private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com
;; global options: +cmd
;; connection timed out; no servers could be reached
; <<>> DiG 9.16.1-Ubuntu <<>> private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com
;; global options: +cmd
;; connection timed out; no servers could be reached
; <<>> DiG 9.16.1-Ubuntu <<>> private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27130
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com. IN A
;; ANSWER SECTION:
private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com. 14 IN A 10.110.64.5
;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Tue May 10 00:26:16 CEST 2022
;; MSG SIZE rcvd: 116
I included two successful ‘dig’ queries (the first and the last one); in between, three queries timed out. Because a successful ‘dig’ query returns almost instantly, successful queries are logged roughly every 0.5 seconds. Judging by these two successful queries, the DNS server appears to have been unreachable for about 47 seconds (00:25:29 to 00:26:16).
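The 47-second gap between the successful queries at 00:25:29 and 00:26:16 matches dig’s default retry behaviour, assuming the loop ran dig without extra options. By default, dig waits 5 seconds per try and makes 3 tries, so each timed-out query blocks for about 15 seconds. The loop also sleeps 0.5 seconds before each of the three failing queries and before the final successful one:

```latex
\underbrace{3 \times (3 \times 5\,\mathrm{s})}_{\text{three timed-out queries}}
+ \underbrace{4 \times 0.5\,\mathrm{s}}_{\text{sleeps}}
= 45\,\mathrm{s} + 2\,\mathrm{s} = 47\,\mathrm{s}
```

This accounts for the whole gap, so the three timeouts shown are the only queries the loop ran in that window.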
Now I would like to know: is this an issue that only DigitalOcean can resolve? Is there more I can investigate? Is there anything else I can share that may lead to the real root cause of the getaddrinfo failed: Temporary failure in name resolution errors?
In the meantime, I will set up another fresh droplet within that VPC and run the same scripts.
Based on the current results, I think it is reasonable to conclude that the DigitalOcean DNS services have network or stability issues that need to be resolved.