====== Internal Environment issue ======
The purpose of this page is to gather a list of resolutions that anyone can use to recover an OpenStack environment and keep the system up.
As the environment is not monitored as a production environment, situations such as high disk-space usage may be alerted in the Slack channels but not acted on in a timely manner.
[[https://errigaloperations.slack.com/archives/CRD88TB71|Watchdog Internal Slack Channel]]
[[https://errigaloperations.slack.com/archives/CMR7XSYDA|Prometheus Internal Slack Channel]]
===== Troubleshooting =====
Check disk space; typically start with the IDMS Loadbalancer host and work your way through Apps1, Apps2, DB1, DB2.
<code>
ssh scotty@hostlb1.err
sudo su -
cd /
du -hs -- * | sort -h
</code>
Example output:
<code>
1.2G  run
3.2G  root
3.3G  home
4.0G  usr
5.0G  swapfile
14G   var
</code>
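The check above can be repeated across the hosts in one pass. A minimal sketch, assuming key-based ssh and passwordless ''sudo'' for ''scotty''; only ''hostlb1.err'' and ''qascoapps1.err'' are named on this page, so substitute the real Apps/DB hostnames:

```shell
#!/bin/sh
# Report the five largest top-level directories on a host over ssh.
# "du -hs -- /*" summarises each top-level dir; "sort -h" orders the
# human-readable sizes; "tail -5" keeps only the biggest offenders.
check_host() {
  echo "== $1 =="
  ssh "scotty@$1" 'sudo du -hs -- /* 2>/dev/null | sort -h | tail -5'
}

# Work through the hosts in the order suggested above (uncomment to run):
# check_host hostlb1.err
# check_host qascoapps1.err
```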
====== RabbitMQ Space resolution - Internal Env only ======
**NOTE** This will wipe all persisted message data, so apply with care and only on the Internal environment.
The RMQ data is stored under ''/var/lib/rabbitmq'', so above we can see 14G locked in the ''var'' folder.
As this is an internal environment, we can reclaim space by removing the persistent store:
''/var/lib/rabbitmq/mnesia/HOSTHERE/msg_stores/vhosts/UUIDFOLDER/msg_store_persistent''
Find the largest store folder and delete all files inside it.
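The find-and-delete step can be sketched as follows. Internal environment ONLY — this permanently deletes persisted messages. It assumes the standard ''rabbitmq-server'' systemd unit name and the mnesia path from above:

```shell
#!/bin/sh
# Clear the largest persistent message store under the RabbitMQ mnesia dir.
# INTERNAL ENVIRONMENT ONLY: this deletes persisted messages permanently.
clear_largest_store() {
  base=${1:-/var/lib/rabbitmq/mnesia}
  # Stop the broker so it is not writing to the store while we clean it.
  sudo systemctl stop rabbitmq-server
  # Pick the largest store by size; du output is "<blocks><TAB><path>".
  store=$(sudo du -s "$base"/*/msg_stores/vhosts/*/msg_store_persistent \
            | sort -n | tail -1 | cut -f2)
  echo "Clearing: $store"
  sudo rm -f "$store"/*
  sudo systemctl start rabbitmq-server
}

# clear_largest_store    # uncomment to run on the RMQ host
```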
====== CAS / You do not have permission to access this. ======
When all the usual user-profile issues have been checked (username, password, account active), the CAS log at ''logs/grails/cas.log'' is a useful place to start.
If the following is present:
''[org.jasig.cas.CentralAuthenticationServiceImpl] - ServiceManagement: Unauthorized Service Access. Service [http://qascoapps1.err:8081/ReportingManager/shiro-cas] is not found in service registry.''
Verify the hostname resolves with a simple ''ping qascoapps1.err''.
If this fails, CAS authentication cannot succeed, and it points to a DNS issue.
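As a sketch, the same resolution check can be done with ''getent'', which consults the same resolver order (''/etc/hosts'' then DNS) as most services on the host:

```shell
#!/bin/sh
# Confirm the service hostname from the CAS error actually resolves.
host=qascoapps1.err
if getent hosts "$host" >/dev/null; then
  echo "$host resolves"
else
  echo "$host does NOT resolve -- check /etc/hosts or DNS" >&2
fi
```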
**Configurations**
Check the CAS services and make sure they contain the correct URLs. You'll find these on the handlers at ///usr/local/conf/cas/services//.
If the CAS logs show that the service URL doesn't match the supplied service URL, as in the example below, it may be a configuration issue in HTTPD on the loadbalancer:
<code>
ERROR [org.jasig.cas.CentralAuthenticationServiceImpl] - Service ticket [ST-62-LcMoIN7pmxTjPxI9eNkb-qascoapps1] with service [https://sco.errigal.com/ReportingManager/shiro-cas] does not match supplied service [http://qascoapps1.err:8081/ReportingManager/shiro-cas]
</code>
As you can see, the supplied service is using **qascoapps1.err** but the existing service is for **sco.errigal.com**.
So, ssh into the loadbalancer and navigate to ///etc/httpd/conf///
You'll need to check and potentially edit mod-jk.conf and workers.properties
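A quick way to see which of these files references the stale hostname is to grep the conf directory for it (hostname taken from the error above; the path defaults to the ///etc/httpd/conf// directory mentioned earlier):

```shell
#!/bin/sh
# List every line under the httpd conf directory that mentions the
# hostname from the "supplied service" part of the CAS error.
conf_dir=${1:-/etc/httpd/conf}
grep -rn "qascoapps1.err" "$conf_dir" 2>/dev/null || echo "no matches under $conf_dir"
```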
**mod-jk.conf**
For the Grails applications we don't use ProxyPass and ProxyPassReverse (those are for the Spring Boot applications).
Add the **JkMount** lines at the bottom for the relevant applications:
<code>
JkMount /ReportingManager/* ReportingManagerLoadBalancer
JkMount /ReportingManager ReportingManagerLoadBalancer
</code>
**workers.properties**
At the very top, make sure your application's LoadBalancer entry is in the list:
<code>
# Create virtual workers
worker.list=jkstatus,SnmpManagerLoadBalancer,NocPortalLoadBalancer,ReportingManagerLoadBalancer,SupportPageLoadBalancer,casLoadBalancer,rdfLoadBalancer,SnmpManagerEMSLoadBalancer,TicketerLoadBalancer
</code>
Next, add the lines to configure the loadbalancer instances
<code>
# Configure ReportingManager load balanced instances
worker.ReportingManagerLoadBalancer.type=lb
worker.ReportingManagerLoadBalancer.sticky_session=1
# Declare Tomcat server workers 1 through n
worker.ReportingManagerWorker1.reference=worker.ajptemplate
worker.ReportingManagerWorker1.host=qascoapps1.err
worker.ReportingManagerWorker1.port=8011
worker.ReportingManagerWorker1.reply_timeout=600000
</code>
Finally, at the end of the file add those instances to the application loadbalancer worker
<code>
worker.ReportingManagerLoadBalancer.balance_workers=ReportingManagerWorker1
</code>
Save those files, and restart the **httpd** service
<code>
sudo service httpd restart
</code>
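It is worth validating the edited configuration before restarting, so a typo can't take the loadbalancer down. A hedged sketch, assuming the standard Apache ''httpd -t'' syntax check is available on the host:

```shell
#!/bin/sh
# "httpd -t" parses the config and exits non-zero on a syntax error,
# so the restart only runs when the configuration is valid.
# "sudo -n" fails rather than prompts if a password would be needed.
if sudo -n httpd -t 2>/dev/null; then
  sudo service httpd restart
else
  echo "httpd config check failed or unavailable; not restarting" >&2
fi
```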