====== Internal Environment issue ======

The purpose of this page is to gather a list of resolutions that anyone can use to recover an OpenStack environment and keep the system up. As the environment is not monitored the way a production environment would be, situations such as high disk space usage can be alerted in the Slack channels but not acted upon in a timely manner.

[[https://errigaloperations.slack.com/archives/CRD88TB71|Watchdog Internal Slack Channel]]

[[https://errigaloperations.slack.com/archives/CMR7XSYDA|Prometheus Internal Slack Channel]]

===== Troubleshooting =====

Check disk space. Typically start with the IDMS loadbalancer host and work through Apps1, Apps2, DB1, DB2:

<code>
ssh scotty@hostlb1.err
sudo su -
cd /
du -sh * | sort -h
</code>

Example output:

<code>
1.2G  run
3.2G  root
3.3G  home
4.0G  usr
5.0G  swapfile
14G   var
</code>

====== RabbitMQ Space resolution - Internal Env only ======

**NOTE** This will wipe all data, so apply with care and only on the Internal environment.

The RMQ data is stored in ''/var/lib/rabbitmq'', so above we can see 14G locked in the ''var'' folder. As this is an internal environment, we can reclaim space by removing the persistent store ''/var/lib/rabbitmq/mnesia/HOSTHERE/msg_stores/vhosts/UUIDFOLDER/msg_store_persistent''.

Find the largest folder store, and delete all files present.

====== CAS / You do not have permission to access this. ======

When all normal user profile issues are checked (username, password, account active), checking the CAS log can be a useful start: ''logs/grails/cas.log''

If the following is present:

<code>
[org.jasig.cas.CentralAuthenticationServiceImpl] - ServiceManagement: Unauthorized Service Access. Service [http://qascoapps1.err:8081/ReportingManager/shiro-cas] is not found in service registry.
</code>

Verify the URL resolves with a simple ''ping qascoapps1.err''. If this fails, then CAS authentication cannot succeed, and it points to a DNS issue.
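The "find the largest folder store" step above can be sketched as a small shell helper. This is a sketch only, assuming the standard mnesia layout described above (''HOSTHERE''/''UUIDFOLDER'' vary per install); the helper name ''largest_store'' is made up for illustration:

```shell
# Hedged sketch, Internal environment only: print the path of the biggest
# msg_store_persistent folder so you know which one to clear out.
# "largest_store" is a hypothetical helper name; the glob matches the
# .../mnesia/HOSTHERE/msg_stores/vhosts/UUIDFOLDER/ layout described above.
largest_store() {
  du -s "$1"/*/msg_stores/vhosts/*/msg_store_persistent 2>/dev/null \
    | sort -rn | head -1 | cut -f2
}

# Usage (stop rabbitmq-server first, and ONLY on Internal):
#   target=$(largest_store /var/lib/rabbitmq/mnesia)
#   sudo rm -f "$target"/*
```

Stopping RabbitMQ before deleting the store avoids removing files the broker still has open.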
**Configurations**

Check the CAS services and make sure they contain the correct URLs. You'll find these on the handlers at ///usr/local/conf/cas/services//.

If you see in the CAS logs that the service URL isn't matching the supplied service URL, as in the example below, it might be a configuration issue in HTTPD on the loadbalancer:

<code>
ERROR [org.jasig.cas.CentralAuthenticationServiceImpl] - Service ticket [ST-62-LcMoIN7pmxTjPxI9eNkb-qascoapps1] with service [https://sco.errigal.com/ReportingManager/shiro-cas] does not match supplied service [http://qascoapps1.err:8081/ReportingManager/shiro-cas]
</code>

As you can see, the supplied service is using **qascoapps1.err** but the existing service is for **sco.errigal.com**. So, ssh into the loadbalancer and navigate to ///etc/httpd/conf///. You'll need to check and potentially edit ''mod-jk.conf'' and ''workers.properties''.

**mod-jk.conf**

For the Grails applications, we don't use ProxyPass and ProxyPassReverse (those are for the Spring Boot applications).
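For contrast, this is the kind of entry a Spring Boot application would get instead of the JkMount lines. A sketch only: the application name ''springboot-api'' and port ''8082'' here are hypothetical, not taken from this environment:

```apache
# Hedged example: Spring Boot apps are proxied directly over HTTP rather
# than through mod_jk. App name and port are hypothetical placeholders.
ProxyPass        /springboot-api http://qascoapps1.err:8082/springboot-api
ProxyPassReverse /springboot-api http://qascoapps1.err:8082/springboot-api
```

The Grails applications below use JkMount/AJP instead, so do not add ProxyPass lines for them.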
Add the **JkMount** lines at the bottom for the relevant applications:

<code>
JkMount /ReportingManager/* ReportingManagerLoadBalancer
JkMount /ReportingManager ReportingManagerLoadBalancer
</code>

**workers.properties**

At the very top, make sure your application LoadBalancer entry is in the list:

<code>
# Create virtual workers
worker.list=jkstatus,SnmpManagerLoadBalancer,NocPortalLoadBalancer,ReportingManagerLoadBalancer,SupportPageLoadBalancer,casLoadBalancer,rdfLoadBalancer,SnmpManagerEMSLoadBalancer,TicketerLoadBalancer
</code>

Next, add the lines to configure the loadbalancer instances:

<code>
# Configure ReportingManager load balanced instances
worker.ReportingManagerLoadBalancer.type=lb
worker.ReportingManagerLoadBalancer.sticky_session=1

# Declare Tomcat server workers 1 through n
worker.ReportingManagerWorker1.reference=worker.ajptemplate
worker.ReportingManagerWorker1.host=qascoapps1.err
worker.ReportingManagerWorker1.port=8011
worker.ReportingManagerWorker1.reply_timeout=600000
</code>

Finally, at the end of the file, add those instances to the application loadbalancer worker:

<code>
worker.ReportingManagerLoadBalancer.balance_workers=ReportingManagerWorker1
</code>

Save those files and restart the **httpd** service:

<code>
sudo service httpd restart
</code>
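Before restarting httpd, it can be worth sanity-checking that the loadbalancer name you added actually appears in ''worker.list''. A minimal sketch; the helper name ''check_worker'' is hypothetical:

```shell
# Hedged sketch: verify a loadbalancer name appears in worker.list of a
# workers.properties file. "check_worker" is a hypothetical helper name.
check_worker() {
  # $1 = loadbalancer name, $2 = path to workers.properties
  sed -n 's/^worker\.list=//p' "$2" | tr ',' '\n' | grep -qx "$1"
}

# Usage:
#   check_worker ReportingManagerLoadBalancer /etc/httpd/conf/workers.properties \
#     && echo "present" || echo "missing from worker.list"
```

A name missing from ''worker.list'' is a common reason the JkMount target silently fails after a restart.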