OpenStack Internal Environment Issues
This page collects resolutions that anyone can use to recover the internal OpenStack environment and keep the system up.
Because the environment is not monitored to production standards, problems such as high disk usage may be raised as alerts in the Slack channels but not acted on promptly.
Watchdog Internal Slack Channel
Prometheus Internal Slack Channel
Troubleshooting
Check disk space first. Typically start with the IDMS load balancer host, then work through Apps1, Apps2, DB1 and DB2.
    ssh scotty@hostlb1.err
    sudo su -
    cd /
    du -hs * | sort -h

Example output

    1.2G run
    3.2G root
    3.3G home
    4.0G usr
    5.0G swapfile
    14G var
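The check above can be wrapped in a small helper so it is easy to repeat on each host. This is a minimal sketch rather than part of the original procedure: the function name `largest_dirs` and its arguments are made up for illustration, and it assumes a `du` and `sort` that support the `-h` option (as the GNU versions do).

```shell
# Hypothetical helper: list the N largest entries directly under a directory.
# Usage: largest_dirs [dir] [count]   (defaults: / and 5)
largest_dirs() {
    dir="${1:-/}"
    count="${2:-5}"
    # du -sh prints one human-readable total per entry; sort -h orders them
    # smallest to largest, so tail keeps the biggest ones.
    du -sh -- "${dir%/}"/* 2>/dev/null | sort -h | tail -n "$count"
}
```

Run as root on a host, `largest_dirs / 3` would print the three largest top-level entries, and `largest_dirs /var` can then be used to drill down into whichever folder is holding the space.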
RabbitMQ Space resolution - Internal Env only
NOTE: This will wipe all data, so apply with care and only on the Internal environment.
RabbitMQ (RMQ) data is stored in /var/lib/rabbitmq, which is why var shows 14G in the example output above.
As this is an internal environment, we can free space by clearing the persistent message store:
/var/lib/rabbitmq/mnesia/HOSTHERE/msg_stores/vhosts/UUIDFOLDER/msg_store_persistent
Find the largest store folder and delete all the files in it.
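The search for the largest store can be sketched as below. The helper names `largest_msg_store` and `clear_msg_store` are hypothetical, and the glob simply mirrors the path layout shown above, with `*` standing in for the host and UUID folders. This page does not say whether RabbitMQ should be stopped first, so confirm that before deleting anything.

```shell
# Hypothetical helper: print the largest msg_store_persistent directory
# under a RabbitMQ mnesia directory (default: /var/lib/rabbitmq/mnesia).
largest_msg_store() {
    base="${1:-/var/lib/rabbitmq/mnesia}"
    # du -sk prints the size in KiB, a tab, then the path.
    du -sk "$base"/*/msg_stores/vhosts/*/msg_store_persistent 2>/dev/null \
        | sort -n | tail -n 1 | cut -f2-
}

# Hypothetical helper: delete the regular files directly inside a store
# directory. Internal environment only: the messages are not recoverable.
clear_msg_store() {
    store="$1"
    [ -d "$store" ] || { echo "not a directory: $store" >&2; return 1; }
    find "$store" -maxdepth 1 -type f -delete
}

# Example (commented out so that sourcing this file deletes nothing):
# store=$(largest_msg_store)
# echo "$store"
# clear_msg_store "$store"
```

Printing the chosen path before clearing it gives a chance to check that it really is under the Internal environment's RabbitMQ directory.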
CAS / You do not have permission to access this.
Once the usual user profile issues have been ruled out (username, password, account active), the CAS log is a useful place to start: logs/grails/cas.log
If the following entry is present:
[org.jasig.cas.CentralAuthenticationServiceImpl] - ServiceManagement: Unauthorized Service Access. Service [http://qascoapps1.err:8081/ReportingManager/shiro-cas] is not found in service registry.
Verify that the host name in the URL resolves with a simple ping:
    ping qascoapps1.err
If the name fails to resolve, CAS authentication cannot succeed, and the failure points to a DNS issue.
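When several services are affected, the rejected service URLs can be pulled out of the CAS log and each host name checked against the resolver in one pass. This is a minimal sketch, assuming the log line format shown above and a system that provides `getent`; both function names are hypothetical.

```shell
# Hypothetical helper: print the unique host names of services that CAS
# reported as missing from the service registry.
# Usage: rejected_service_hosts logs/grails/cas.log
rejected_service_hosts() {
    grep 'is not found in service registry' "$1" \
        | sed -n 's|.*Service \[[a-z]*://\([^]:/]*\).*|\1|p' \
        | sort -u
}

# Hypothetical helper: read host names on stdin and print the ones that
# do not resolve through the system resolver.
unresolved_hosts() {
    while IFS= read -r host; do
        getent hosts "$host" >/dev/null 2>&1 || printf '%s\n' "$host"
    done
}

# Example (commented out so that sourcing this file runs nothing):
# rejected_service_hosts logs/grails/cas.log | unresolved_hosts
```

Any host name printed by the pipeline is a candidate for the DNS issue described above. No output means every rejected host resolved, so the cause lies elsewhere.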