AirScan
AirScan devices are cellular-based network monitoring units deployed at customer sites. They connect to the Errigal platform over a WireGuard VPN tunnel using a cellular modem for backhaul, run an RDF Agent to execute discovery tasks from the orchestrator, and report results back for visualization and alarming.
Each device runs two main applications:
- AirScan Modem Manager — manages the cellular modem via AT commands, maintains internet connectivity, and auto-reconnects on failure.
- RDF Agent — polls the RDF Orchestrator for discovery tasks (topology, performance, alarms, etc.), executes them, and pushes results back.
Devices are configured and deployed through a Jenkins pipeline that reads a Google Sheet, generates Ansible inventory, and runs deployment playbooks.
Architecture
Google Sheet (config source)
│
▼
Jenkins Pipeline (airscanautoconfiguration)
│
├── Generates Ansible inventory from sheet
├── Configures WireGuard VPN tunnels
├── Deploys AirScan Modem Manager
├── Registers elements in DB (airscan_load_elements)
└── Deploys RDF Agent
│
▼
AirScan Device ──WireGuard VPN──► OAT Server ──► Orchestrator
│ ▲
└── Cellular modem (APN) heartbeat (SNMP) ────┘
Connectivity Chain
- Cellular modem connects to the carrier network via an APN (managed by Modem Manager).
- WireGuard tunnel runs from the device through `rdflb_server` (jump host) to `oat_server`.
- RDF Agent on the device communicates with the orchestrator through the tunnel.
- RDF Agent sends SNMP heartbeat traps to SnmpManager, which monitors them and manages the `network_element` entry.
- Orchestrator manages the `element` entry linked via `entry_point_id`.
Component Interaction
┌─────────────────────────────────────────────────────┐
│ AirScan Device │
│ │
│ ┌──────────────────┐ ┌────────────────────────┐ │
│ │ Modem Manager │◄───│ RDF Agent │ │
│ │ (Flask :5000) │ │ (Spring Boot :8081) │ │
│ │ │ │ │ │
│ │ AT commands to │ │ Polls orchestrator │ │
│ │ cellular modem │ │ Runs discovery tasks │ │
│ │ Auto-reconnect │ │ Sends SNMP heartbeats │ │
│ └───────┬──────────┘ └──────────┬─────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────┐ ┌─────────────────┐ │
│ │ Cellular │ │ WireGuard VPN │ │
│ │ Modem (usb0) │ │ Tunnel │ │
│ └──────┬───────┘ └────────┬────────┘ │
└─────────┼──────────────────────────┼────────────────┘
│ │
▼ ▼
Carrier Network rdflb_server (jump host)
│
▼
oat_server
│
tasks ↓ ↑ results + heartbeats
│
┌──────────────┐
│ Orchestrator │
│ SnmpManager │
└──────────────┘
Components
AirScan Modem Manager
Repository: errigal/apps/airscanmodemmanager
Language: Python 3.12 / Flask 2.3
Registry: registry.errigal.com/airscan/airscanmodemmanager
Runs on: Port 5000 (host network, privileged container)
The Modem Manager controls the cellular modem on AirScan devices using AT commands over a serial interface. It disables ModemManager and prevents NetworkManager from managing the modem, using pure AT commands for the most reliable carrier connectivity.
How It Works
- Device discovery: Scans `/dev/ttyUSB*`, sends `AT` to each, uses the first responding device.
- APN configuration: Sets PDP context with `AT+CGDCONT=1,"IP","{SIM_APN}"` and activates with `AT+CGACT=1,1`.
- Carrier selection: Auto-select with `AT+COPS=0` or specific carrier with `AT+COPS=1,2,"{PLMN}"`.
- Auto-reconnect: Background job runs every `AUTO_RECONNECT_INTERVAL` seconds. Pings `PING_TEST_HOST` on eth0, wlan0, and the modem interface. If all fail, performs a network scan, band unlock, reconnect, and PDP reconfiguration.
- Supported modems: Quectel (RG50xQ, RM5xxQ) and Simcom (SIM7500, SIM7600).
- Band unlock: Simcom modems require `AT+CNBP=…` for 4G/5G band unlock. Quectel is a no-op.
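The APN bring-up steps above can be sketched as a command sequence. This is a minimal illustration, not the Modem Manager's actual code: `build_apn_commands` is a hypothetical helper, and the real app writes these commands over the serial device found during discovery.

```python
def build_apn_commands(sim_apn, plmn=None):
    """Return the AT command sequence for PDP context setup.

    If plmn is given, lock to that specific carrier (AT+COPS=1,2,...);
    otherwise let the modem auto-select (AT+COPS=0).
    """
    return [
        "AT",                                       # probe: expect "OK"
        f'AT+CGDCONT=1,"IP","{sim_apn}"',           # define PDP context 1
        "AT+COPS=0" if plmn is None else f'AT+COPS=1,2,"{plmn}"',
        "AT+CGACT=1,1",                             # activate context 1
    ]

# Auto-select carrier on a hypothetical APN:
for cmd in build_apn_commands("internet.example.apn"):
    print(cmd)
```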
Environment Variables
Application-level defaults are defined in the Dockerfile and app/app.py.
Deployment-time overrides are set by the Ansible role templates:
Health Check and Recovery
Docker healthcheck: `ls /dev/ttyUSB*` every 10 seconds — checks that the modem device is present. An autoheal container automatically restarts the Modem Manager if the healthcheck fails.
Recovery script: A cron job runs every `airscanmodemmanager_device_recovery_interval_mins` minutes (default 5). It calls `http://localhost:5000/status` and checks `last_connectivity_timestamp`. If the device is unreachable for `airscanmodemmanager_device_unreachable_interval_hours` (default 6) and the last reboot was more than `airscanmodemmanager_device_reboot_interval_hours` (default 6) ago, the device is rebooted.
Recovery logs are at /var/log/airscanmodemmanager_recovery/airscanmodemmanager_recovery.log (10MB rotation, 5 files).
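The recovery decision can be modelled as follows. This is an illustrative sketch of the thresholds described above, assuming Unix timestamps; `should_reboot` and the constant names are hypothetical, not taken from the real script.

```python
import time

# Defaults from the role variables above, expressed in seconds
UNREACHABLE_SECS = 6 * 3600       # device_unreachable_interval_hours
REBOOT_COOLDOWN_SECS = 6 * 3600   # device_reboot_interval_hours

def should_reboot(last_connectivity_ts, last_reboot_ts, now=None):
    """Reboot only if the device has been offline longer than the
    unreachable threshold AND the last reboot is outside the cooldown."""
    now = time.time() if now is None else now
    offline_too_long = (now - last_connectivity_ts) > UNREACHABLE_SECS
    cooldown_elapsed = (now - last_reboot_ts) > REBOOT_COOLDOWN_SECS
    return offline_too_long and cooldown_elapsed
```

The cooldown guard matters: without it, a device with a dead SIM would reboot every five minutes forever.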
Source code: bitbucket.org/errigal/airscanmodemmanager
RDF Agent
Repository: errigal/apps/rdf_agent
Language: Java 17 / Spring Boot 3.3
Registry: registry.errigal.com/rdf_agent
Runs on: Port 8081 (bound to 127.0.0.1), management on port 8080 (Actuator/Prometheus)
The RDF Agent polls the RDF Orchestrator for discovery tasks, executes them against target devices, and pushes results back. On AirScan devices it runs in privileged Docker with host networking. It has no inbound API requirement — it only needs outbound connectivity to the orchestrator.
How It Works
- Task polling: `DiscoveryTaskPoller` GETs from `api/v2/task` every `POLL_INTERVAL_MS` (default 5000 ms).
- Permanent tasks: `PermanentTaskPoller` GETs from `api/v1/permanent/tasks` every `POLL_FOR_PERMANENT_TASKS_MS` (default 60 s).
- Task routing: `IncomingRequestProcessor` routes tasks to the correct processor based on discovery type and technology.
- Result submission: `OutgoingMessagePusher` POSTs results to `api/v2/task`.
- Status reporting: `StatusReporter` POSTs to `api/v1/agent/status` every 20 seconds with version and hostname.
- SNMP heartbeat: `SnmpTrapListener` sends heartbeat traps every `HEARTBEAT_INTERVAL_MS` (default 60 s) to SnmpManager using OID `.1.3.6.1.4.1.33582.1.1.2.5.1`.
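The poll/execute/push cycle above reduces to a simple control flow. The agent itself is Java/Spring Boot; this Python model is only illustrative, and `fetch_task` / `run_task` / `push_result` are stand-ins for the HTTP calls to `api/v2/task`.

```python
def poll_cycle(fetch_task, run_task, push_result):
    """One polling iteration: get a task, run it, push the result.

    Returns True if a task was processed, False if the queue was empty.
    """
    task = fetch_task()      # stands in for GET api/v2/task
    if task is None:
        return False         # nothing to do until the next interval
    result = run_task(task)  # route to the matching processor
    push_result(result)      # stands in for POST api/v2/task
    return True

# Exercise the cycle with in-memory stubs:
results = []
poll_cycle(lambda: {"id": 1, "type": "PERFORMANCE"},
           lambda t: {"task_id": t["id"], "status": "DONE"},
           results.append)
print(results)
```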
AirScan-Specific Behavior
When IS_AIRSCAN=true, the agent:
- Talks to the Modem Manager at `MODEM_MANAGER_URL` (default `http://localhost:5000`) for cellular metrics, handoff tests, and carrier operations.
- Collects local metrics from Prometheus Node Exporter (`NODE_EXPORTER_URL`) via `prom2json`.
- Measures bandwidth with `vnstat` (modem, eth0, wlan0 interfaces).
- Runs `iperf3` speed tests against a configured server.
- Uses `AirScanPerformanceProcessor` and `CellularProcessor` for performance discovery.
Configuration
Application-level defaults are in src/main/resources/application.properties.
Deployment-time overrides are set by the Ansible role templates:
Source code: bitbucket.org/errigal/rdf_agent
WireGuard VPN
WireGuard provides the encrypted tunnel from AirScan devices to the platform infrastructure.
Topology
AirScan Device ──► rdflb_server (jump host / WireGuard server) ──► oat_server
IP Calculation
Each device's WireGuard IP is calculated as:
wireguard_ip = {internal_subnet base}.{wireguard_peer + 1}
Example: internal_subnet=10.13.20.0, wireguard_peer=20 → wireguard_ip=10.13.20.21
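The calculation can be expressed as a one-line helper (illustrative; the real logic lives in the pipeline's templates):

```python
def wireguard_ip(internal_subnet, wireguard_peer):
    """Combine the /24 base of internal_subnet with (peer + 1)."""
    base = internal_subnet.rsplit(".", 1)[0]   # "10.13.20.0" -> "10.13.20"
    return f"{base}.{wireguard_peer + 1}"

print(wireguard_ip("10.13.20.0", 20))  # 10.13.20.21
```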
SSH Access
AirScan devices are not directly reachable from the corporate network. SSH access goes through rdflb_server as a jump host, then over the WireGuard tunnel to the device's internal IP.
┌──────────────┐          ┌─────────────────────┐          ┌──────────────────┐
│  Your        │   SSH    │  rdflb_server       │   SSH    │  AirScan Device  │
│  Workstation ├─────────►│  (jump host)        ├─────────►│                  │
│              │          │                     │  via WG  │                  │
│              │          │  Public/private IP  │  tunnel  │  WireGuard IP    │
│              │          │  e.g. 10.0.87.50    │          │  e.g. 10.13.20.21│
└──────────────┘          └─────────────────────┘          └──────────────────┘
Manual SSH with -J (ProxyJump):
```shell
ssh -J {rdflb_user}@{rdflb_host} {device_user}@{wireguard_ip}
# Example:
ssh -J admin@10.0.87.50 root@10.13.20.21
```
Ansible equivalent (auto-generated in inventory):
The pipeline sets `ansible_ssh_common_args` with `-o ProxyCommand="ssh -W %h:%p -q {rdflb_user}@{rdflb_host}"`, which achieves the same jump transparently for all playbook runs.
Ansible Roles
- `wireguard_server` — Runs WireGuard in Docker on `rdflb_server`, generates peer configs, distributes to clients.
- `wireguard_client` — Installs WireGuard on the device, copies peer config, starts the service, writes `wireguard_ip` back to the Google Sheet.
Google Sheet Configuration
Sheet ID: 1j7rOK5vZhmIj84YJOGzkQ3u4_dUUftE57IbuOh3bnHo
URL: https://docs.google.com/spreadsheets/d/1j7rOK5vZhmIj84YJOGzkQ3u4_dUUftE57IbuOh3bnHo
Service account: scotty@environment-app-versions.iam.gserviceaccount.com
Tab Naming Convention
Tabs are named {customer}/{sheet_name}, where {customer} maps to a folder in env-configuration/. The tab name is used as the Jenkins source parameter.
Current tabs:
- `cts/production`
- `qaatc/production`
- `prodatc/production`
- `qanova/errigal_demo_airscan`
- `prodsco/errigal`
- `prodsco/shared_access`
- `blackbox/airscan`
Column Reference
| Column | Maps To | Used In |
|---|---|---|
| `hostname` | Ansible inventory hostname | Inventory generation |
| `configure` | If "yes", host is added to airscan/rdfagent/wireguard_client groups | Inventory generation |
| `name_in_platform` | `snmp_manager.network_element.name` | DB registration |
| `private_ip` | `ansible_host` for infrastructure servers | Inventory generation |
| `wireguard_ip` | `ansible_host` for airscan devices (via ProxyJump) | Inventory + DB registration |
| `ssh_user` / `ssh_pass` | SSH credentials for the device | Inventory generation |
| `wireguard_peer` | WireGuard peer ID (IP = internal_subnet base + peer + 1) | WireGuard config |
| `apn` | `apn_name` for cellular APN config | Modem Manager deployment |
| `cluster_name` | Cluster assignment in snmp_manager | DB registration |
| `site_name` | Site assignment in snmp_manager + orchestrator | DB registration |
| `iperf3_port` | Port for iperf3 testing | iperf3 config |
| `rdf_agent_version` | Target RDF Agent version (Docker tag) | RDF Agent deployment |
| `airscan_modem_manager_version` | Target Modem Manager version (Docker tag) | Modem Manager deployment |
Special Rows
| Row | Behavior |
|---|---|
| GLOBAL | Non-empty columns become all.vars (e.g. wireguard_network, wireguard_port, internal_subnet) |
| wireguard_server | WireGuard VPN server; uses private_ip as ansible_host |
| iperf3_server | iperf3 test server; uses private_ip as ansible_host |
| oat_server | Constructed from vars_for_airscan.yml (extracted from hosts.ini); wireguard_peer from the sheet |
Jenkins Pipeline
Jenkinsfile: airscanautoconfiguration/Jenkinsfile
Parameters
| Parameter | Default | Description |
|---|---|---|
| `source` | (job config) | Google Sheet tab name, e.g. `cts/production` |
| `CONFIGURE_WIREGUARD` | false | Configure WireGuard VPN on clients |
| `CONFIGURE_AIRSCAN_MODEM_MANAGER` | false | Deploy AirScan Modem Manager |
| `CONFIGURE_RDF_AGENT` | false | Deploy RDF Agent |
| `CONFIGURE_IPERF3` | false | Configure iperf3 server |
Derived Variables
```groovy
envVar            = source.split('/')[0]   // e.g. "cts"
invFile           = source.split('/')[1]   // e.g. "production"
worksheet_name    = source                 // e.g. "cts/production"
extraVarsLocation = "env-configuration/{envVar}/vars_for_airscan.yml"
```
`envVar` determines:
- Which `hosts.ini` to use: `env-configuration/{envVar}/hosts.ini`
- Which vault password credential: `{envVar}_ansible_vault_pass`
Pipeline Stages
| # | Stage | Condition | Description |
|---|---|---|---|
| 1 | Preparation | Always | Clone env-configuration (master) and deployment-playbooks (branch). Set build display name. |
| 2 | Build Docker Image | Always | Build registry.errigal.com/airscanautoconfiguration:{tag} from airscan_config/Dockerfile. |
| 3 | Generate Extra Vars | Always | Run airscan_extract_vars_for_ansible_autoconfig.yml against hosts.ini to produce vars_for_airscan.yml with OAT/RDFLB host, user, password, DB hosts, etc. Uses vault credential. |
| 4 | Generate Inventory | Always | Run google_sheet_to_ansible_inv.py in Docker to read Google Sheet tab and produce {invFile}.yml under env-configuration/{envVar}/. |
| 5 | Remove WireGuard from OAT | Always | Run remove_wireguard_from_oat.yml on oat_server. Sets WIREGUARD_INTERFACE_EXISTS flag. |
| 6 | Configure WireGuard on RDFLB | WIREGUARD_INTERFACE_EXISTS == true | Configure WireGuard on wireguard_server and rdflb_server. |
| 7 | Configure WireGuard | CONFIGURE_WIREGUARD == true | Configure WireGuard for all clients except oat_server. |
| 8 | Deploy Modem Manager | CONFIGURE_AIRSCAN_MODEM_MANAGER == true | Deploy Modem Manager to airscan hosts via airscanmodemmamanger-deploy.yml. |
| 9 | Deploy RDF Agent | CONFIGURE_RDF_AGENT == true | Run airscan_load_elements (DB registration) then deploy RDF Agent via rdf-agent-docker-deploy.yml. |
| 10 | Configure iperf3 | CONFIGURE_IPERF3 == true | Configure iperf3 server via generate_ansible_iperf3_config.yml. |
Credentials
- Vault password: Jenkins credential `{envVar}_ansible_vault_pass` for Ansible vault decryption.
- Google API: `service_account.json` baked into the Docker image (service account scotty@environment-app-versions.iam.gserviceaccount.com).
- Docker registry: `errigal_docker_registry_username` / `errigal_docker_registry_password` for `registry.errigal.com`.
Inventory Generation (google_sheet_to_ansible_inv.py)
The Python script:
- Authenticates with Google Sheets API via `service_account.json`.
- Opens the sheet tab matching `SHEET_NAME` (e.g. `cts/production`).
- Reads all rows; first row = headers.
- GLOBAL row: Non-empty columns become `all.vars`.
- Special rows (`wireguard_server`, `iperf3_server`): Use `private_ip` as `ansible_host`.
- oat_server / rdflb_server: Built from `vars_for_airscan.yml` (OAT/RDFLB credentials from `hosts.ini`).
- Device rows: Use `wireguard_ip` as `ansible_host` with ProxyJump via `rdflb_server`. If `configure == "yes"`, add host to `airscan`, `rdfagent`, `wireguard_client` groups.
- Writes YAML inventory to `{invFile}.yml`.
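The row-to-inventory mapping above can be sketched like this. It is a simplified model, not the script's exact output: `rows_to_inventory` is a hypothetical helper, and the group layout is a plausible reduction of the generated YAML.

```python
def rows_to_inventory(rows, rdflb_user, rdflb_host):
    """Build a minimal Ansible-style inventory dict from sheet rows."""
    proxy = f'-o ProxyCommand="ssh -W %h:%p -q {rdflb_user}@{rdflb_host}"'
    inv = {"all": {"vars": {}, "children": {"airscan": {"hosts": {}}}}}
    for row in rows:
        if row.get("hostname") == "GLOBAL":
            # GLOBAL row: non-empty columns become all.vars
            inv["all"]["vars"].update(
                {k: v for k, v in row.items() if k != "hostname" and v})
        elif row.get("configure") == "yes":
            # Device rows reach the host over the WireGuard IP via ProxyJump
            inv["all"]["children"]["airscan"]["hosts"][row["hostname"]] = {
                "ansible_host": row["wireguard_ip"],
                "ansible_ssh_common_args": proxy,
            }
    return inv
```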
Ansible Roles Reference
airscan_load_elements
Path: deployment-playbooks/roles/airscan_load_elements/
Purpose: Registers AirScan devices in the SNMP Manager and Orchestrator databases.
Runs as part of: rdf-agent-docker-deploy.yml (before RDF Agent deployment, only on airscan hosts).
Database Operations
SNMP Manager (snmp_manager schema):
- Check if `network_element` exists by `ip_address`
- Insert `site` if missing
- Insert `network_element` (technology=AirScan, ne_type=Controller)
- Delete + re-insert `site_network_element`
- Insert `expected_heartbeat` (15-minute interval)
Orchestrator (orchestrator schema):
- Insert `customer_site`
- Insert `agent` and `user_role`
- Insert or update `element` (links to SNMP Manager via `entry_point_id`)
- Insert `schedule` (hourly)
- Delete old `schedule_config` for PERFORMANCE/POLL
API calls:
- Login to Orchestrator
- Get short install code for the agent
- Fetch agent install script to extract `rdf_access_token`
Variables
Defaults and the full list of variables used by this role are in roles/airscan_load_elements/defaults/main.yml. The SQL operations and variable usage can be seen in roles/airscan_load_elements/tasks/main.yml.
Variables come from two sources:
- Google Sheet (via inventory) — device IP, name, cluster, site
- `vars_for_airscan.yml` (generated from `hosts.ini`) — DB hosts, orchestrator URL, credentials
airscan_modem_manager
Path: deployment-playbooks/roles/airscan_modem_manager/
Purpose: Deploys the Modem Manager application and configures networking on the device.
Deployment Steps
- Stop and disable ModemManager
- Configure systemd-networkd for modem interface
- Configure NetworkManager to leave `usb0` and `eth0` unmanaged
- Create `/opt/services/airscan/`
- Render `docker-compose.yml` and `.env` from templates
- Docker login and pull image
- `docker compose up -d`
- Wait for port 5000
- Verify modem responds: `curl http://localhost:5000/modem/at` with body `AT` — expect `OK`
- Optionally write version to Google Sheet
- Install recovery cron job
- Install vnstat for bandwidth monitoring
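The post-deploy modem verification step can be sketched as a small HTTP check. This is a hedged illustration: the `/modem/at` route is from the API reference below, but `check_modem` and the request shape are hypothetical, so treat it as a sketch rather than the role's actual task.

```python
import urllib.request

def check_modem(base_url="http://localhost:5000", opener=None):
    """POST the probe command 'AT' to the Modem Manager and look for 'OK'."""
    req = urllib.request.Request(
        f"{base_url}/modem/at", data=b"AT", method="POST")
    open_fn = opener or urllib.request.urlopen  # injectable for testing
    with open_fn(req) as resp:
        return b"OK" in resp.read()
```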
Variables
Defaults and the full list of variables are in roles/airscan_modem_manager/defaults/main.yml.
rdf-agent
Path: deployment-playbooks/roles/rdf-agent/
Purpose: Deploys the RDF Agent application on AirScan (and non-AirScan) hosts.
Deployment Steps
- Create `/opt/services/rdfagent/`
- Render `docker-compose.yml` and `.env` from templates
- Docker login and pull image
- `docker compose up -d`
- Optionally write version to Google Sheet
AirScan vs Non-AirScan
| Aspect | AirScan | Non-AirScan |
|---|---|---|
| Network mode | host | Bridge (ports 8080, 162/udp) |
| Privileged | Yes | No |
| `IS_AIRSCAN` | true | false |
| Volumes | /var/lib/vnstat mounted | None |
| SNMP listener IP | wireguard_ip | Default |
Variables
Defaults and the full list of variables are in roles/rdf-agent/defaults/main.yml.
airscan_write_to_google_sheet
Path: deployment-playbooks/roles/airscan_write_to_google_sheet/
Purpose: Writes deployment results back to the Google Sheet.
Runs the update_google_sheet.py script in a one-off Docker container (airscanautoconfiguration image). Updates a single row by hostname with key-value pairs.
Used by:
- `wireguard_client` — writes `wireguard_ip`
- `airscan_modem_manager` — writes `airscan_modem_manager_version_actual`
- `rdf-agent` — writes `rdf_agent_version_actual`
Database Element Registration
Initial Registration (Ansible)
When the Jenkins pipeline runs with RDF Agent deployment, the airscan_load_elements role performs:
- Check by IP: `SELECT id FROM snmp_manager.network_element WHERE ip_address = '{wireguard_ip}'`
- If not found: INSERT new `network_element` with `name = '{name_in_platform}'`, `ip_address = '{wireguard_ip}'`, `technology = 'AirScan'`
- Link to site: DELETE + re-INSERT `site_network_element`
- Add heartbeat: INSERT `expected_heartbeat` for monitoring (15-minute interval)
- Orchestrator: INSERT IGNORE `customer_site`, `agent`, `element` with `entry_point_id = {ne_id}`
The matching key is IP address (`wireguard_ip`), not device name. If a `network_element` already exists for that IP, the INSERT is skipped.
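This IP-keyed idempotency can be modelled with an in-memory table. A toy sketch (`register_element` is hypothetical) showing why re-running the pipeline with a new `name_in_platform` does not rename an existing element:

```python
def register_element(db, wireguard_ip, name_in_platform):
    """Insert a network_element row keyed by IP; reuse the row if the IP exists."""
    for ne_id, row in db.items():
        if row["ip_address"] == wireguard_ip:
            return ne_id  # existing element reused; name is NOT updated
    ne_id = max(db, default=0) + 1
    db[ne_id] = {"ip_address": wireguard_ip,
                 "name": name_in_platform,
                 "technology": "AirScan"}
    return ne_id

db = {}
first = register_element(db, "10.13.20.21", "EAS-001")
second = register_element(db, "10.13.20.21", "EAS-001-renamed")
print(first == second, db[first]["name"])  # same row, original name kept
```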
Ongoing Sync (RDFElementSyncJob)
SnmpManager runs a scheduled sync job every 60 seconds:
- `NetworkElement.afterUpdate()` / `afterInsert()` writes to `network_element_change_sync`
- `RDFElementSyncJob` reads change records (< 5 days old, with valid IP)
- For each change, POSTs to orchestrator `/api/v1/element/update`
- Orchestrator `ElementService.updateElement()` finds element by `entry_point_id`
- If not found: creates new element
Key code paths:
- Change trigger: `snmpmanager_grails3/…/domain/…/NetworkElement.groovy` (`afterUpdate`, `afterInsert`, `addChangeSyncRecord`)
- Sync service: `snmpmanager_grails3/…/services/…/RdfElementSyncService.groovy`
- Orchestrator handler: `rdf_orchestrator/…/service/element/ElementService.java`
Known Duplicate Element Issue
The orchestrator's unique constraint is on `(entry_point_id, customer_site_id)`, not just `entry_point_id`. For AirScan elements, `task_processing_agent_override` should always be set: it ensures a direct correlation between the agent and the element so that tasks for the element are processed on the correct agent running on that device. When the override is set, `customer_site_id` is not used for agent routing. This can cause problems:
- The sync job's `findByEntryPointId()` does not filter by `customer_site_id`
- If multiple elements exist for the same `entry_point_id` with different `customer_site_id`, the lookup may return an unpredictable one
- The Ansible `INSERT IGNORE` can silently fail if a row already exists with different data
- If the `customer_site_id` mapping changes (e.g. site renamed), a new element can be created alongside the old one
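The lookup hazard can be demonstrated with a toy model. `find_by_entry_point_id` here mimics the sync job's behaviour as described above (first match wins, no `customer_site_id` filter); the data and column subset are illustrative, not real rows.

```python
def find_by_entry_point_id(elements, entry_point_id):
    """First match wins: no customer_site_id filter, like the sync job."""
    for e in elements:
        if e["entry_point_id"] == entry_point_id:
            return e
    return None

elements = [
    {"id": 10, "entry_point_id": 7, "customer_site_id": 1},  # stale duplicate
    {"id": 11, "entry_point_id": 7, "customer_site_id": 2},  # current element
]
# The sync job may update the stale element instead of the current one:
print(find_by_entry_point_id(elements, 7)["id"])  # 10
```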
Modem Manager API Reference
All endpoints are served on port 5000. Routes are defined using Flask-Classful across three files in the airscanmodemmanager repository:
| Route base | Source file | Description |
|---|---|---|
| `/` | app/app.py | Root routes: health check (`/`), status (`/status`) |
| `/modem/` | app/modem/modemApi.py | Modem operations: AT commands, carrier connect, handoff, ICCID, signal info, band unlock |
| `/system/` | app/system/systemApi.py | System utilities: ping via specific interface |
Deployment
Deploying a New AirScan Device
- Add device to Google Sheet: Add a row in the appropriate tab (e.g. `cts/production`) with `hostname`, `configure=yes`, `wireguard_peer`, `apn`, `name_in_platform`, `cluster_name`, `site_name`, and desired versions.
- Run Jenkins pipeline with `source` matching the sheet tab. Enable all relevant parameters:
  - `CONFIGURE_WIREGUARD=true` — sets up VPN tunnel
  - `CONFIGURE_AIRSCAN_MODEM_MANAGER=true` — deploys modem manager
  - `CONFIGURE_RDF_AGENT=true` — registers DB elements and deploys agent
- Verify:
  - WireGuard tunnel is up: `sudo wg show` on `rdflb_server`
  - Modem Manager responds: `curl http://{wireguard_ip}:5000/modem/at` (via tunnel)
  - RDF Agent container running: `docker ps | grep rdf` on the device
  - Element exists in DB: check `snmp_manager.network_element` and `orchestrator.element`
Updating Application Versions
- Update `rdf_agent_version` or `airscan_modem_manager_version` in the Google Sheet row.
- Run Jenkins pipeline with the appropriate parameter enabled (`CONFIGURE_RDF_AGENT` or `CONFIGURE_AIRSCAN_MODEM_MANAGER`).
- The role pulls the new image, restarts the container, and writes the actual deployed version back to the sheet (`rdf_agent_version_actual` / `airscan_modem_manager_version_actual`).
Docker Images
| Image | Registry Path | Build |
|---|---|---|
| Modem Manager | registry.errigal.com/airscan/airscanmodemmanager:{version} | Jenkins (jenkinsCommon), multi-arch (amd64, arm64, arm/v8) |
| RDF Agent | registry.errigal.com/rdf_agent:{version} | Drone CI, JAR uploaded to S3 |
| Autoconfiguration | registry.errigal.com/airscanautoconfiguration:{tag} | Built during pipeline run |
| Ansible runner | registry.errigal.com/ansibledockerimage:latest | Pre-built image for running playbooks |
Troubleshooting
Device Not Connecting
Check in order:
1. WireGuard Tunnel
SSH to rdflb_server and check if the device's peer is active:
```shell
sudo wg show
# Look for the device's peer — check "latest handshake" time
```
The device's WireGuard IP: {internal_subnet base}.{wireguard_peer + 1}
Example: internal_subnet=10.13.20.0, wireguard_peer=20 → IP 10.13.20.21
2. Google Sheet
Check the sheet tab for the customer:
- Is the device listed with `configure = yes`?
- Is `name_in_platform` correct and matching the DB?
- Are `wireguard_ip` and `wireguard_peer` correct?
3. Database: snmp_manager.network_element
```sql
-- Find the device by IP
SELECT id, name, ip_address, on_air, cluster_name, site_name
FROM snmp_manager.network_element
WHERE ip_address = '{wireguard_ip}';

-- Check for duplicates by name
SELECT id, name, ip_address, on_air, cluster_name
FROM snmp_manager.network_element
WHERE name LIKE '%{device_identifier}%';
```
4. Database: orchestrator.element
```sql
-- Find element by entry_point_id (= network_element.id)
SELECT e.id, e.entry_point_id, e.external_ip, e.internal_ip, e.on_air,
       e.customer_site_id, e.task_processing_agent_override
FROM orchestrator.element e
WHERE e.entry_point_id = {ne_id};

-- Check for duplicate elements by IP
SELECT e.id, e.entry_point_id, e.external_ip, cs.name AS site_name
FROM orchestrator.element e
JOIN orchestrator.customer_site cs ON e.customer_site_id = cs.id
WHERE e.external_ip = '{wireguard_ip}';
```
5. Container Status on Device
SSH to the device (via ProxyJump through rdflb_server):
```shell
# Check both containers
docker ps

# RDF Agent logs
docker logs rdfagent

# Modem Manager logs
docker logs airscanmodemmanager
```
6. Heartbeat Monitoring
```sql
SELECT * FROM snmp_manager.expected_heartbeat WHERE network_element_id = {ne_id};
```
Device Name Changed — Element Mismatch
Symptom: Customer changed the device name in the platform UI. Device may stop working or show stale data.
What happens:
- `network_element` name changes
- `afterUpdate()` fires, creating a `network_element_change_sync` record
- Sync job pushes updated data to orchestrator
- Orchestrator matches by `entry_point_id` (not name), so the element updates correctly
However, if the pipeline is re-run with a different `name_in_platform`:
- The playbook checks by IP address, not name
- If the IP exists, it skips the INSERT (existing element is reused)
- The name in `network_element` is NOT updated by the playbook
If duplicates exist:
```sql
-- Find duplicate network_elements
SELECT id, name, ip_address, on_air, cluster_name
FROM snmp_manager.network_element
WHERE technology = 'AirScan'
  AND (name LIKE '%{old_name}%'
       OR name LIKE '%{new_name}%'
       OR ip_address = '{wireguard_ip}');

-- Find duplicate orchestrator elements
SELECT e.id, e.entry_point_id, e.external_ip, e.on_air,
       e.customer_site_id, cs.name AS site_name
FROM orchestrator.element e
LEFT JOIN orchestrator.customer_site cs ON e.customer_site_id = cs.id
WHERE e.external_ip = '{wireguard_ip}'
   OR e.entry_point_id IN (
     SELECT id FROM snmp_manager.network_element
     WHERE name LIKE '%{old_name}%' OR name LIKE '%{new_name}%'
   );
```
The correct element should have:
- `entry_point_id` matching the `snmp_manager.network_element.id` for that IP
- `customer_site_id` matching the correct site
- `task_processing_agent_override` pointing to the correct agent
Update `name_in_platform` in the Google Sheet to match, then re-run the pipeline if needed.
Common Pipeline Failures
| Failure | Cause | Fix |
|---|---|---|
| Vault password error | {envVar}_ansible_vault_pass missing in Jenkins | Add credential in Jenkins |
| Sheet access denied | Service account lacks access | Share sheet with scotty@environment-app-versions.iam.gserviceaccount.com |
| Tab not found | source param doesn't match sheet tab name | Verify tab name matches {customer}/{sheet_name} exactly |
| Missing host groups | hosts.ini lacks rdf-orchestrator, rdf-lb, etc. | Update env-configuration/{envVar}/hosts.ini |
| WireGuard timeout | Peer unreachable or interface down | Check wireguard_server, peer config, firewall |
| Element INSERT fails | Cluster or site doesn't exist in DB | Create cluster/site first, or check cluster_name/site_name in sheet |
| Docker pull fails | Registry auth or image not found | Check errigal_docker_registry credentials and image tag |
Useful Log Locations
| Component | Location |
|---|---|
| SnmpManager | Application logs — search for RDFElementSync entries |
| Orchestrator | Application logs — search for Updating Element with EntryPointId |
| Jenkins | Build console output — includes debug from inventory generation |
| WireGuard | sudo wg show on wireguard_server or journalctl -u wg-quick@{interface} |
| Modem Manager | docker logs on device, or /var/log/airscanmodemmanager_recovery/ for recovery |
| RDF Agent | docker logs on device, or file at LOGGING_FILE_PATH |
Customer Environment Reference
Each customer has:
- `env-configuration/{customer}/hosts.ini` — main infrastructure inventory
- `env-configuration/{customer}/group_vars/all/30_all.yml` — environment variables
- Google Sheet tab `{customer}/…` — dynamic AirScan configuration
The pipeline generates a dynamic inventory at env-configuration/{customer}/{invFile}.yml from the Google Sheet.
CTS Example
| Item | Value |
|---|---|
| hosts.ini | env-configuration/cts/hosts.ini — defines ctsapps1/2, ctslb1, ctsoat1/2, ctsesk1/2 |
| Google Sheet tab | cts/production — dynamic config with EAS-prefixed hostnames |
| OAT servers | ctsoat1 (10.0.87.65), ctsoat2 (10.0.87.115) |
| DB host | cts-master-prod.cl0y2kknu458.us-east-1.rds.amazonaws.com |
| WireGuard | port 51822, network "cts", subnet 10.13.20.0 |
| Orchestrator URL | http://10.13.20.2:8079 |