author | Suren A. Chilingaryan <csa@suren.me> | 2019-10-06 05:00:55 +0200 |
---|---|---|
committer | Suren A. Chilingaryan <csa@suren.me> | 2019-10-06 05:00:55 +0200 |
commit | ba144fab071258a97cf3c42a0defeb0aae41a353 (patch) | |
tree | 2e738d4e4774d754b56d79021cc8781b3c0835a5 /docs/logs.txt | |
parent | efe4b9bbe3c9cb950378de9697eed2030ac49ca2 (diff) | |
Document latest problems with docker images and resource reclamation, add docker performance checks in the monitoring scripts, helpers to filter the logs
Diffstat (limited to 'docs/logs.txt')
-rw-r--r-- | docs/logs.txt | 10 |
1 files changed, 9 insertions, 1 deletions
diff --git a/docs/logs.txt b/docs/logs.txt
index e27b1ff..d33ef0a 100644
--- a/docs/logs.txt
+++ b/docs/logs.txt
@@ -2,6 +2,10 @@
 =================
  - Various RPC errors.
     ... rpc error: code = # desc = xxx ...
+
+ - PLEG is not healthy: pleg was last seen active 3m0.448988393s ago; threshold is 3m0s
+    This is severe and indicates communication probelm (or at least high latency) with docker daemon. As result the node can be marked
+    temporary NotReady and cause eviction of all resident pods.
  - container kill failed because of 'container not found' or 'no such process':
     Cannot kill container ###: rpc error: code = 2 desc = no such process"
     Despite the errror, the containers are actually killed and pods destroyed. However, this error likely triggers
@@ -25,10 +29,14 @@
     There are no adverse effects to this. It is a potential kernel issue, but should be just ignored by the customer. Nothing is going to break.
     https://bugzilla.redhat.com/show_bug.cgi?id=1425278
-
  - E0625 03:59:52.438970 23953 watcher.go:210] watch chan error: etcdserver: mvcc: required revision has been compacted
     seems fine and can be ignored.
+ - E0926 09:29:50.744454 93115 mount_linux.go:172] Mount failed: exit status 1
+    Output: Failed to start transient scope unit: Connection timed out
+    It seems caused by too many parallel mounts (about 500 per-node) may cause systemd to hang.
+    Details: https://github.com/kubernetes/kubernetes/issues/79194
+      * Suggested to use 'setsid' to mount volumes instead of 'systemd-run'
 /var/log/openvswitch/ovs-vswitchd.log
 =====================================
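The messages documented in this diff could be picked out of a node log with a small filter. The sketch below is only an illustration and is not one of the helpers added by this commit: the names `KNOWN_MESSAGES`, `classify` and `filter_known` and the labels are hypothetical, and the labels merely paraphrase the notes in docs/logs.txt.

```python
import re

# Hypothetical pattern table: each entry pairs a substring pattern taken from
# the messages quoted in docs/logs.txt with a label paraphrasing the note there.
KNOWN_MESSAGES = [
    (re.compile(r"PLEG is not healthy"), "severe"),
    (re.compile(r"Failed to start transient scope unit"), "mount-timeout"),
    (re.compile(r"required revision has been compacted"), "ignore"),
]


def classify(line):
    """Return the label of the first matching pattern, or None if none match."""
    for pattern, label in KNOWN_MESSAGES:
        if pattern.search(line):
            return label
    return None


def filter_known(lines):
    """Yield (label, line) pairs for the lines that match a known message."""
    for line in lines:
        label = classify(line)
        if label is not None:
            yield label, line
```

A call such as `list(filter_known(open(path)))` would then return only the recognised lines, each tagged with its label; `path` here stands for whichever log file is being inspected.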