Troubleshooting Oracle OCI Cluster Startup Failures

Diagnosing and Fixing Problems
Preventing Problems

Diagnosing and Fixing Problems

The table that follows lists some common error messages that may be logged when a cluster fails to start, describes the underlying causes, and provides remedies:

.. list-table::
   :header-rows: 1

   * - Error message text
     - Cause
     - What to do
   * - ``Hadoop Bring up failed. File: <filename> could only be replicated to 0 nodes…``
     - Coordinator daemon cannot talk to worker daemon, or worker is down or out of disk space.
     - Make sure you have configured the subnet so as to allow communication among all nodes: see Configuring Oracle OCI Resources.
   * - ``The limit for this tenancy has been exceeded``
     - Bringing up this cluster would exceed this tenancy’s limit for instances of this type.
     - Decrease the cluster size, or change the instance type, and try again. If that fails, ask Oracle support for a higher limit.
   * - ``HEALTH-CHECK-FAILED. Reason: Failed to create socks proxy for cluster...``
     - QDS cannot contact the cluster coordinator node via SSH.
     - Make sure you have whitelisted port 22 for the QDS NAT (52.44.223.209); use the subnet’s security list to do this.
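The lookup described by the table can be sketched as a small script that scans a log line for the known message fragments and returns the matching cause and remedy. This is only an illustration: the names `KNOWN_FAILURES` and `diagnose` are hypothetical and not part of QDS, and the fragments are simply the fixed portions of the messages listed above.

```python
# Hypothetical helper: maps a fragment of each error message in the table
# to its cause and remedy. Not part of QDS or any Oracle tooling.
KNOWN_FAILURES = [
    (
        "could only be replicated to 0 nodes",
        "Coordinator daemon cannot talk to worker daemon, "
        "or worker is down or out of disk space.",
        "Configure the subnet to allow communication among all nodes.",
    ),
    (
        "The limit for this tenancy has been exceeded",
        "Bringing up this cluster would exceed the tenancy's limit "
        "for instances of this type.",
        "Decrease the cluster size or change the instance type; "
        "if that fails, ask Oracle support for a higher limit.",
    ),
    (
        "Failed to create socks proxy for cluster",
        "QDS cannot contact the cluster coordinator node via SSH.",
        "Whitelist port 22 for the QDS NAT (52.44.223.209) "
        "in the subnet's security list.",
    ),
]


def diagnose(log_line):
    """Return (cause, remedy) for a known startup failure, else None."""
    for fragment, cause, remedy in KNOWN_FAILURES:
        if fragment in log_line:
            return cause, remedy
    return None
```

Calling `diagnose()` on each line of a startup log and printing the non-`None` results gives a quick first pass before reading the log in detail.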

Preventing Problems

Here are some guidelines to help you prevent similar problems in the future.

* Make sure you’ve read and understood the relevant Qubole and Oracle OCI documentation, in particular Configuring Oracle OCI Resources.
* Make sure you have configured each subnet so as to allow communication among all nodes.
* Make sure you have whitelisted port 22 for the QDS NAT (52.44.223.209).
* Make sure that starting the cluster will not put you over the limit for your tenancy.
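Two of these guidelines lend themselves to a simple pre-flight check. The sketch below assumes you have already copied your security list's ingress rules and your instance counts into plain Python values. The rule dictionary shape (`protocol`, `source`, `port_range`) is a simplified, hypothetical format for illustration, not the Oracle SDK's, and the function names are likewise invented.

```python
import ipaddress

QDS_NAT_IP = "52.44.223.209"  # QDS NAT address given in the guidelines above


def allows_ssh_from(ingress_rules, source_ip=QDS_NAT_IP, port=22):
    """Return True if any rule admits TCP traffic from source_ip to port.

    Each rule is a dict in an assumed, simplified shape:
      {"protocol": "6" or "all", "source": "<CIDR>", "port_range": (min, max) or None}
    where "6" is the IANA protocol number for TCP and None means all ports.
    """
    addr = ipaddress.ip_address(source_ip)
    for rule in ingress_rules:
        if rule.get("protocol") not in ("6", "all"):
            continue  # not a TCP (or all-protocol) rule
        if addr not in ipaddress.ip_network(rule["source"], strict=False):
            continue  # rule does not cover the source address
        ports = rule.get("port_range")
        if ports is None or ports[0] <= port <= ports[1]:
            return True
    return False


def within_limit(in_use, requested, limit):
    """Return True if starting `requested` more instances stays within `limit`."""
    return in_use + requested <= limit
```

For example, `allows_ssh_from([{"protocol": "6", "source": "52.44.223.209/32", "port_range": (22, 22)}])` returns `True`, while a rule list that only admits `10.0.0.0/16` returns `False`. A passing check only reflects the data you supplied; it does not query your tenancy.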