The purpose of this page is to gather resolutions anyone can use to recover an OpenStack environment and keep the system up.
Because this environment is not monitored as closely as production, situations such as disks filling up may be alerted in the Slack channels but not acted on in a timely manner.
Watchdog Internal Slack Channel
Prometheus Internal Slack Channel
Check disk space, typically starting with the IDMS Loadbalancer host and working through Apps1, Apps2, DB1 and DB2.
ssh scotty@hostlb1.err
sudo su -
cd /
du -hs * | sort -h

Example output:

1.2G run
3.2G root
3.3G home
4.0G usr
5.0G swapfile
14G var
The RMQ data is stored in /var/lib/rabbitmq, so the 14G locked in the var folder above is largely RabbitMQ's persistent message store.
As this is an internal environment, we can free space by removing the persistent store:

/var/lib/rabbitmq/mnesia/HOSTHERE/msg_stores/vhosts/UUIDFOLDER/msg_store_persistent

Find the largest store folder and delete all files present.

NOTE: This will wipe all queued message data, so apply with care and only on the Internal environment.
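The locate-the-largest-store step above can be sketched as a small helper. This is a sketch only: largest_store is a hypothetical function, and it is demonstrated against a throwaway temp directory with made-up vhost names rather than the real /var/lib/rabbitmq path.

```shell
# Hypothetical helper: print the largest subdirectory of a vhosts tree.
# On the real host the argument would be
# /var/lib/rabbitmq/mnesia/HOSTHERE/msg_stores/vhosts
largest_store() {
  du -s "$1"/* 2>/dev/null | sort -n | tail -1 | cut -f2
}

# Demo on a temp dir standing in for the vhosts folder:
BASE=$(mktemp -d)
mkdir -p "$BASE/small-vhost" "$BASE/big-vhost"
head -c 1024    /dev/zero > "$BASE/small-vhost/msg_store_persistent"
head -c 1048576 /dev/zero > "$BASE/big-vhost/msg_store_persistent"

TARGET=$(largest_store "$BASE")
echo "$TARGET"
# rm -f "$TARGET"/*   # the destructive step -- internal environment only
```

Leaving the rm commented out keeps the helper safe to run while you confirm the target is the folder you expect.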
When the usual user profile checks (username, password, account active) come up clean, the CAS log can be a useful starting point: logs/grails/cas.log
If the following is present
[org.jasig.cas.CentralAuthenticationServiceImpl] - ServiceManagement: Unauthorized Service Access. Service [http://qascoapps1.err:8081/ReportingManager/shiro-cas] is not found in service registry.
Verify the hostname resolves with a simple ping qascoapps1.err
If this fails to resolve, CAS authentication cannot succeed, which points to a DNS issue.
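That resolution check can also be scripted. This is a sketch: check_dns is a hypothetical helper, and qascoapps1.err is the hostname from the log line above.

```shell
# Print "resolves" or "no DNS entry" for a hostname; getent consults the
# same resolver stack (hosts file + DNS) that the application will use.
check_dns() {
  if getent hosts "$1" > /dev/null; then
    echo "resolves"
  else
    echo "no DNS entry"
  fi
}

check_dns qascoapps1.err   # on the internal network this should print "resolves"
check_dns localhost
```

getent has the advantage over ping of not depending on ICMP being allowed through any firewall.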
Configurations

Check the CAS services and make sure they contain the correct URLs. You'll find these on the handlers at /usr/local/conf/cas/services
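To see which service definition (if any) mentions the URL a client is sending, a recursive grep over that directory is usually enough. This sketch wraps it in a hypothetical find_service helper and demonstrates it on a throwaway directory; the file name and serviceId line are made up for illustration, and the real layout under /usr/local/conf/cas/services may differ.

```shell
# List the service definition files that mention a given URL fragment.
find_service() {
  grep -rl "$2" "$1" 2>/dev/null
}

# Demo against a temp dir standing in for /usr/local/conf/cas/services:
SVC=$(mktemp -d)
echo 'serviceId=http://qascoapps1.err:8081/ReportingManager/shiro-cas' > "$SVC/ReportingManager.properties"

find_service "$SVC" 'qascoapps1.err'   # prints the matching file
```

No output means no registered service matches that URL, which lines up with the "not found in service registry" error above.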
If the CAS logs show that the service URL doesn't match the supplied service URL, as in the example below, it may be a configuration issue in HTTPD on the loadbalancer:
ERROR [org.jasig.cas.CentralAuthenticationServiceImpl] - Service ticket [ST-62-LcMoIN7pmxTjPxI9eNkb-qascoapps1] with service [https://sco.errigal.com/ReportingManager/shiro-cas] does not match supplied service [http://qascoapps1.err:8081/ReportingManager/shiro-cas]
As you can see, the supplied service is using qascoapps1.err but the existing service is for sco.errigal.com.
So, ssh into the loadbalancer and navigate to /etc/httpd/conf/
You'll need to check and potentially edit mod-jk.conf and workers.properties
mod-jk.conf

For the Grails applications, we don't use ProxyPass and ProxyPassReverse (those are for the Spring Boot applications).
Add the JkMount lines at the bottom for the relevant applications:
JkMount /ReportingManager/* ReportingManagerLoadBalancer
JkMount /ReportingManager ReportingManagerLoadBalancer
workers.properties

At the very top, make sure your application LoadBalancer entry is in the list:
# Create virtual workers
worker.list=jkstatus,SnmpManagerLoadBalancer,NocPortalLoadBalancer,ReportingManagerLoadBalancer,SupportPageLoadBalancer,casLoadBalancer,rdfLoadBalancer,SnmpManagerEMSLoadBalancer,TicketerLoadBalancer
Next, add the lines to configure the loadbalancer instances
# Configure ReportingManager load balanced instances
worker.ReportingManagerLoadBalancer.type=lb
worker.ReportingManagerLoadBalancer.sticky_session=1
# Declare Tomcat server workers 1 through n
worker.ReportingManagerWorker1.reference=worker.ajptemplate
worker.ReportingManagerWorker1.host=qascoapps1.err
worker.ReportingManagerWorker1.port=8011
worker.ReportingManagerWorker1.reply_timeout=600000
Finally, at the end of the file add those instances to the application loadbalancer worker
worker.ReportingManagerLoadBalancer.balance_workers=ReportingManagerWorker1
Save those files, and restart the httpd service
sudo service httpd restart
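One refinement worth considering (a sketch, not part of the original runbook): have httpd validate the edited config before restarting, so a typo in mod-jk.conf or workers.properties doesn't take the loadbalancer down. safe_restart is a hypothetical wrapper; the check and restart commands are parameters so the function can be dry-run with stubs.

```shell
# Restart only if the config check passes. Defaults target httpd, but both
# commands can be overridden for a dry run.
safe_restart() {
  check_cmd="${1:-sudo httpd -t}"
  restart_cmd="${2:-sudo service httpd restart}"
  if $check_cmd; then
    $restart_cmd
  else
    echo "config check failed - restart skipped" >&2
    return 1
  fi
}

# Dry run with stub commands in place of httpd:
safe_restart true "echo restarted"
```

On the loadbalancer itself you would simply call safe_restart with no arguments after saving the files.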