
Staging Debugging Guide

Jorge Silva edited this page Mar 30, 2016 · 11 revisions

So you yell and you yell and no one is fixing staging. Well, now you can debug it yourself :)

  1. Go to runnable.io.
  2. Ensure RabbitMQ, Redis, and Mongo are up. If any of these are down, simply click start container (DO NOT REBUILD THESE CONTAINERS). If you did rebuild them anyway:
     • Mongo is flushed
     • Redis is flushed
  3. Ensure supporting services are up:
  • api
  • api-worker: the WORKER-do-not-delete branch, updated to the latest code (git fetch && git merge origin/master), then restart the container
  • runnable-angular
  • mavis: hit http://mavis-staging-codenow.runnableapp.com/docks to ensure docks are available
  • optimus
  • navi: getting navi to work on staging takes something special; see "using navi on staging".
To ensure things are up, do the following:
  1. Go to the terminal of the service.
  2. curl the service's port, e.g. curl localhost:80 for api. If curl prints nothing (or hangs), stop and start the container, or click the latest commit.
  3. Run ip addr | grep ethwe and ensure the output contains ethwe. If it does not, rebuild the container.
  4. Inspect the logs by opening a terminal and running:
     1. npm install bunyan -g
     2. LOG_LEVEL_STDOUT=trace npm start | bunyan
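The curl and ip addr checks above can be wrapped in two small shell helpers to paste into a service's terminal. This is a sketch, not a script that ships with any of the repos: check_port and has_ethwe are hypothetical names, and the 5-second timeout is an arbitrary stand-in for "it hangs".

```shell
#!/usr/bin/env bash
# Sketch of the manual checks above; run inside the service's terminal.
# ASSUMPTION: a 5-second timeout is long enough to call a service "hung".

# Succeeds if something on localhost:PORT answers with a non-empty body.
# Usage: check_port 80   (80 is the api example from this guide)
check_port() {
  local port="$1" body
  # -s: no progress meter; --max-time: give up instead of hanging forever
  body="$(curl -s --max-time 5 "localhost:${port}")" || return 1
  [ -n "$body" ]
}

# Succeeds if `ip addr` output (read from stdin) mentions the ethwe interface.
# Usage: ip addr | has_ethwe
has_ethwe() {
  grep -q ethwe
}
```

If check_port fails, stop and start the container or click the latest commit; if ip addr | has_ethwe fails, rebuild the container, as the steps above describe.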

Possible Problems

1. Builds Are Not Working

Things to check:

1.1 Is the WORKER-DO-NOT-DELETE branch up and not throwing any errors?

  1. Go to runnable.io
  2. Look for WORKER-DO-NOT-DELETE branch in api.
  3. Make sure it's green.
  4. Go into CMD logs and see if there are any errors.
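Errors are easy to miss among INFO lines in the CMD logs. Assuming the log is in bunyan's pretty-printed form (like the EHOSTUNREACH sample below, e.g. "[2016-03-30T18:47:10.179Z]  WARN: api/12 ..."), a grep such as this hypothetical bunyan_problems helper keeps only WARN, ERROR, and FATAL records. If you have the raw JSON stream instead, bunyan's own level filter (bunyan -l warn) does the same job.

```shell
#!/usr/bin/env bash
# Sketch: print WARN/ERROR/FATAL records from pretty-printed bunyan output on stdin.
# Exits non-zero (grep's convention) when no such records are found.
# Usage: paste the CMD log into a file (worker.log is a hypothetical name), then
#   bunyan_problems < worker.log
bunyan_problems() {
  # "[timestamp]" then one or more spaces, then the level name and a colon
  grep -E '^\[[^]]+\] +(WARN|ERROR|FATAL):'
}
```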

1.2 Are all of the following services up in Runnable?

  1. Go to runnable.io
  2. Check that the following repos are green:
  • Swarm
  • Sauron
  • docker-listener
  • Neo4j

1.3 Are all of the following services up in delta-staging-data?

  1. SSH into delta-staging-data (ssh delta-staging-data)
  2. Run sudo docker ps
  3. Check that all of the following are running
  • MongoDB
  • Redis
  • RabbitMQ
  • Consul
  • Vault
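Eyeballing sudo docker ps for five services can be scripted. In this sketch the keyword list comes from the services named above, but matching on a lowercase substring of each container's image or name is an assumption; if the containers on delta-staging-data are named differently, adjust the list. check_data_services is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: report which expected data services are missing from `docker ps`
# output read on stdin. ASSUMPTION: each container's image or name contains
# one of these keywords (matched case-insensitively).
# Usage (on delta-staging-data): sudo docker ps | check_data_services
check_data_services() {
  local ps_output svc missing=0
  ps_output="$(cat)"
  for svc in mongo redis rabbitmq consul vault; do
    # -i: case-insensitive, so "RabbitMQ" and "rabbitmq:3" both match
    if ! grep -qi -- "$svc" <<< "$ps_output"; then
      echo "missing: $svc"
      missing=1
    fi
  done
  return "$missing"
}
```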

1.4 Error: Container action start failed: connect EHOSTUNREACH

[2016-03-30T18:47:10.179Z]  WARN: api/12 on c7263d6ab6fa (/api/node_modules/ponos/lib/worker.js:231 in unknownErrRetry): Task failed, retrying (environment=staging, module=lib/models/rabbitmq/index.js, queue=on-image-builder-container-create, nextAttemptDelay=1048576)
    Error: Container action start failed: connect EHOSTUNREACH
        at Object.exports.create (/api/node_modules/dat-middleware/node_modules/boom/lib/index.js:21:17)
        at Docker.<anonymous> (/api/lib/models/apis/docker.js:1112:17)
        at Object.callback (/api/lib/models/apis/docker.js:1067:49)
        at /api/node_modules/dockerode/lib/container.js:180:10
        at done (/api/node_modules/dogerode/index.js:28:7)
        at Modem.buildPayload (/api/node_modules/docker-modem/lib/modem.js:225:19)
        at ClientRequest.<anonymous> (/api/node_modules/docker-modem/lib/modem.js:210:10)
        at ClientRequest.EventEmitter.emit (events.js:95:17)
        at CleartextStream.socketErrorListener (http.js:1547:9)
        at CleartextStream.EventEmitter.emit (events.js:95:17)
        at Socket.onerror (tls.js:1445:17)
        at Socket.EventEmitter.emit (events.js:117:20)
        at net.js:440:14
        at process._tickDomainCallback (node.js:463:13)

If you see this error, the API worker is probably failing to connect to Docker. Either Swarm is down, or there is some kind of DNS issue resolving the Swarm container URL (swarm-staging-codenow.runnableapp.com).
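To tell the two causes apart, you can check name resolution and TCP reachability separately from the api-worker container's terminal. This is a sketch under assumptions: diagnose_swarm is a hypothetical helper, it relies on getent and timeout being installed, and SWARM_PORT is a placeholder because this guide does not say which port the worker uses to reach Swarm.

```shell
#!/usr/bin/env bash
# Sketch: separate "DNS is broken" (returns 2) from "Swarm is down" (returns 1).
# Usage: diagnose_swarm HOST PORT
#   e.g. diagnose_swarm swarm-staging-codenow.runnableapp.com SWARM_PORT
# SWARM_PORT is a placeholder, not a value from this guide.
diagnose_swarm() {
  local host="$1" port="$2"
  # getent goes through the system resolver (/etc/hosts, then DNS)
  if ! getent hosts "$host" > /dev/null; then
    echo "DNS: cannot resolve $host"
    return 2
  fi
  # bash's /dev/tcp pseudo-device opens a TCP connection; timeout stops a hang
  if ! timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2> /dev/null; then
    echo "TCP: $host resolves but $host:$port is unreachable (is Swarm down?)"
    return 1
  fi
  echo "OK: $host:$port accepts TCP connections"
}
```

A DNS failure points at the resolver or the record for the Swarm URL; a TCP failure with working DNS points at Swarm itself (or the network path to it).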

1.5 Is Sauron connected to Swarm?

  1. Go to the swarm container in runnable.io
  2. Go to Terminal
  3. Type docker info
  4. You should see a list similar to this with more than one container and more than one image:
root@7785a0fe6162:/sauron# docker info                                          
Containers: 156                                                                 
Images: 54                                                                      
Storage Driver:                                                                 
Role: primary                                                                   
Strategy: spread                                                                
Filters: health, port, dependency, affinity, constraint                         
Nodes: 4                                                                        
 ip-10-8-166-119.2335750: 10.8.166.119:4242                                     
  └ Status: Healthy                                                             
  └ Containers: 60                                                              
  └ Reserved CPUs: 0 / 2                                                        
  └ Reserved Memory: 0 B / 8.187 GiB                                            
  └ Labels: executiondriver=native-0.2, kernelversion=3.13.0-79-generic, operatingsystem=Ubuntu 14.04.4 LTS, org=2335750, storagedriver=aufs
  └ Error: (none)                                                               
  └ UpdatedAt: 2016-03-30T18:44:59Z                                             
...     
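With several nodes, reading that output by eye gets tedious. The hypothetical check_swarm_info helper below is a sketch that assumes the docker info layout shown above (top-level "Containers:", "Images:", "Nodes:" lines and an indented "└ Status:" line per node); it prints a one-line summary and fails if there are no nodes, any node is not Healthy, or there is at most one container or image.

```shell
#!/usr/bin/env bash
# Sketch: summarize Swarm's `docker info` output (stdin) and fail if it looks
# disconnected. ASSUMPTION: the layout matches the sample above.
# Usage, in the swarm container's terminal: docker info | check_swarm_info
check_swarm_info() {
  awk '
    /^Containers:/ { containers = $2 + 0 }  # top-level only; per-node lines are indented
    /^Images:/     { images = $2 + 0 }
    /^Nodes:/      { nodes = $2 + 0 }
    /Status:/      { if ($NF == "Healthy") healthy++; else unhealthy++ }
    END {
      printf "nodes=%d healthy=%d unhealthy=%d containers=%d images=%d\n",
             nodes, healthy, unhealthy, containers, images
      exit !(nodes > 0 && unhealthy == 0 && containers > 1 && images > 1)
    }'
}
```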

1.6 Is the dock connected to Swarm?

  1. Using the docks CLI, list all the staging docks: docks list -e stage (or docks aws list -e stage)
  2. SSH into one of those docks
  3. Type sudo docker info
  4. You should see a list similar to this with more than one container and more than one image:
root@7785a0fe6162:/sauron# docker info                                          
Containers: 156                                                                 
Images: 54   
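The same "more than one container and more than one image" rule of thumb can be run over several docks in one go. This sketch assumes you copy the dock hostnames or IPs out of docks list -e stage by hand (its output format is not described here) and that ssh plus sudo work non-interactively on each dock; dock_looks_connected and check_docks are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: succeed if `docker info` output on stdin reports more than one
# container and more than one image.
dock_looks_connected() {
  awk '
    /^Containers:/ { c = $2 + 0 }
    /^Images:/     { i = $2 + 0 }
    END { exit !(c > 1 && i > 1) }'
}

# Sketch: run the check over SSH on each dock passed as an argument.
# ASSUMPTION: non-interactive ssh + sudo; an ssh failure counts as "suspect".
# Usage: check_docks DOCK_HOST_1 DOCK_HOST_2 ...
check_docks() {
  local dock rc=0
  for dock in "$@"; do
    if ssh "$dock" sudo docker info 2> /dev/null | dock_looks_connected; then
      echo "ok:      $dock"
    else
      echo "suspect: $dock"
      rc=1
    fi
  done
  return "$rc"
}
```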

2. URLs Are Not Being Redirected

Things to check:
