====== Internal Environment issue ======

The purpose of this page is to gather a list of resolutions that anyone can use to recover an OpenStack environment and keep the system up. As the environment is not monitored the way a production environment would be, situations such as high disk space usage can be alerted in the Slack channels but not acted upon in a timely manner.

[[https://errigaloperations.slack.com/archives/CRD88TB71|Watchdog Internal Slack Channel]]

[[https://errigaloperations.slack.com/archives/CMR7XSYDA|Prometheus Internal Slack Channel]]

===== Troubleshooting =====

Check disk space. Typically start with the IDMS loadbalancer host and work through Apps1, Apps2, DB1, DB2:

<code>
ssh scotty@hostlb1.err
sudo su -
cd /
du -sh * | sort -h
</code>

Example output:

<code>
1.2G  run
3.2G  root
3.3G  home
4.0G  usr
5.0G  swapfile
14G   var
</code>

====== RabbitMQ Space resolution - Internal Env only ======

**NOTE** This will wipe all data, so apply with care and only on the Internal environment.

The RMQ data is stored in ''/var/lib/rabbitmq'', so above we can see 14G locked in the ''var'' folder. As this is an internal environment, we can reclaim space by removing the persistent store ''/var/lib/rabbitmq/mnesia/HOSTHERE/msg_stores/vhosts/UUIDFOLDER/msg_store_persistent''.

Find the largest folder store, and delete all files present.

====== CAS / You do not have permission to access this. ======

When all normal user profile issues are checked (username, password, account active), checking the CAS log can be a useful start: ''logs/grails/cas.log''

If the following is present:

<code>
[org.jasig.cas.CentralAuthenticationServiceImpl] - ServiceManagement: Unauthorized Service Access. Service [http://qascoapps1.err:8081/ReportingManager/shiro-cas] is not found in service registry.
</code>

Verify the URL resolves with a simple ''ping qascoapps1.err''. If this fails, then CAS authentication cannot succeed, and it points to a DNS issue.
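The "find the largest folder store" step above can be sketched as a small shell helper. This is a sketch only, assuming the standard mnesia layout described above (''HOSTHERE''/''UUIDFOLDER'' vary per install); the helper name ''largest_store'' is made up for illustration:

```shell
# Hedged sketch, Internal environment only: print the path of the biggest
# msg_store_persistent folder so you know which one to clear out.
# "largest_store" is a hypothetical helper name; the glob matches the
# .../mnesia/HOSTHERE/msg_stores/vhosts/UUIDFOLDER/ layout described above.
largest_store() {
  du -s "$1"/*/msg_stores/vhosts/*/msg_store_persistent 2>/dev/null \
    | sort -rn | head -1 | cut -f2
}

# Usage (stop rabbitmq-server first, and ONLY on Internal):
#   target=$(largest_store /var/lib/rabbitmq/mnesia)
#   sudo rm -f "$target"/*
```

Stopping RabbitMQ before deleting the store avoids removing files the broker still has open.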
**Configurations**

Check the CAS services and make sure they contain the correct URLs. You'll find these on the handlers at ///usr/local/conf/cas/services//.

If you see in the CAS logs that the service URL isn't matching the supplied service URL, as in the example below, it might be a configuration issue in HTTPD on the loadbalancer:

<code>
ERROR [org.jasig.cas.CentralAuthenticationServiceImpl] - Service ticket [ST-62-LcMoIN7pmxTjPxI9eNkb-qascoapps1] with service [https://sco.errigal.com/ReportingManager/shiro-cas] does not match supplied service [http://qascoapps1.err:8081/ReportingManager/shiro-cas]
</code>

As you can see, the supplied service is using **qascoapps1.err** but the existing service is for **sco.errigal.com**. So, ssh into the loadbalancer and navigate to ///etc/httpd/conf///. You'll need to check and potentially edit ''mod-jk.conf'' and ''workers.properties''.

**mod-jk.conf**

For the Grails applications, we don't use ProxyPass and ProxyPassReverse (those are for the Spring Boot applications).
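For contrast, this is the kind of entry a Spring Boot application would get instead of the JkMount lines. A sketch only: the application name ''springboot-api'' and port ''8082'' here are hypothetical, not taken from this environment:

```apache
# Hedged example: Spring Boot apps are proxied directly over HTTP rather
# than through mod_jk. App name and port are hypothetical placeholders.
ProxyPass        /springboot-api http://qascoapps1.err:8082/springboot-api
ProxyPassReverse /springboot-api http://qascoapps1.err:8082/springboot-api
```

The Grails applications below use JkMount/AJP instead, so do not add ProxyPass lines for them.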
Add the **JkMount** lines at the bottom for the relevant applications:

<code>
JkMount /ReportingManager/* ReportingManagerLoadBalancer
JkMount /ReportingManager ReportingManagerLoadBalancer
</code>

**workers.properties**

At the very top, make sure your application LoadBalancer entry is in the list:

<code>
# Create virtual workers
worker.list=jkstatus,SnmpManagerLoadBalancer,NocPortalLoadBalancer,ReportingManagerLoadBalancer,SupportPageLoadBalancer,casLoadBalancer,rdfLoadBalancer,SnmpManagerEMSLoadBalancer,TicketerLoadBalancer
</code>

Next, add the lines to configure the loadbalancer instances:

<code>
# Configure ReportingManager load balanced instances
worker.ReportingManagerLoadBalancer.type=lb
worker.ReportingManagerLoadBalancer.sticky_session=1

# Declare Tomcat server workers 1 through n
worker.ReportingManagerWorker1.reference=worker.ajptemplate
worker.ReportingManagerWorker1.host=qascoapps1.err
worker.ReportingManagerWorker1.port=8011
worker.ReportingManagerWorker1.reply_timeout=600000
</code>

Finally, at the end of the file, add those instances to the application loadbalancer worker:

<code>
worker.ReportingManagerLoadBalancer.balance_workers=ReportingManagerWorker1
</code>

Save those files and restart the **httpd** service:

<code>
sudo service httpd restart
</code>
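Before restarting httpd, it can be worth sanity-checking that the loadbalancer name you added actually appears in ''worker.list''. A minimal sketch; the helper name ''check_worker'' is hypothetical:

```shell
# Hedged sketch: verify a loadbalancer name appears in worker.list of a
# workers.properties file. "check_worker" is a hypothetical helper name.
check_worker() {
  # $1 = loadbalancer name, $2 = path to workers.properties
  sed -n 's/^worker\.list=//p' "$2" | tr ',' '\n' | grep -qx "$1"
}

# Usage:
#   check_worker ReportingManagerLoadBalancer /etc/httpd/conf/workers.properties \
#     && echo "present" || echo "missing from worker.list"
```

A name missing from ''worker.list'' is a common reason the JkMount target silently fails after a restart.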