====== AirScan ======
AirScan devices are cellular-based network monitoring units deployed at customer sites. They connect to the Errigal platform over a WireGuard VPN tunnel using a cellular modem for backhaul, run an RDF Agent to execute discovery tasks from the orchestrator, and report results back for visualization and alarming.
Each device runs two main applications:
* **AirScan Modem Manager** — manages the cellular modem via AT commands, maintains internet connectivity, and auto-reconnects on failure.
* **RDF Agent** — polls the RDF Orchestrator for discovery tasks (topology, performance, alarms, etc.), executes them, and pushes results back.
Devices are configured and deployed through a Jenkins pipeline that reads a Google Sheet, generates Ansible inventory, and runs deployment playbooks.
----
===== Architecture =====
Google Sheet (config source)
│
▼
Jenkins Pipeline (airscanautoconfiguration)
│
├── Generates Ansible inventory from sheet
├── Configures WireGuard VPN tunnels
├── Deploys AirScan Modem Manager
├── Registers elements in DB (airscan_load_elements)
└── Deploys RDF Agent
│
▼
AirScan Device ──WireGuard VPN──► OAT Server ──► Orchestrator
│ ▲
└── Cellular modem (APN) heartbeat (SNMP) ────┘
==== Connectivity Chain ====
- Cellular modem connects to the carrier network via an APN (managed by Modem Manager).
- WireGuard tunnel runs from the device through ''rdflb_server'' (jump host) to ''oat_server''.
- RDF Agent on the device communicates with the orchestrator through the tunnel.
- RDF Agent sends SNMP heartbeat traps to SnmpManager, which monitors them and manages the ''network_element'' entry.
- Orchestrator manages the ''element'' entry linked via ''entry_point_id''.
==== Component Interaction ====
┌─────────────────────────────────────────────────────┐
│ AirScan Device │
│ │
│ ┌──────────────────┐ ┌────────────────────────┐ │
│ │ Modem Manager │◄───│ RDF Agent │ │
│ │ (Flask :5000) │ │ (Spring Boot :8081) │ │
│ │ │ │ │ │
│ │ AT commands to │ │ Polls orchestrator │ │
│ │ cellular modem │ │ Runs discovery tasks │ │
│ │ Auto-reconnect │ │ Sends SNMP heartbeats │ │
│ └───────┬──────────┘ └──────────┬─────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────┐ ┌─────────────────┐ │
│ │ Cellular │ │ WireGuard VPN │ │
│ │ Modem (usb0) │ │ Tunnel │ │
│ └──────┬───────┘ └────────┬────────┘ │
└─────────┼──────────────────────────┼────────────────┘
│ │
▼ ▼
Carrier Network rdflb_server (jump host)
│
▼
oat_server
│
tasks ↓ ↑ results + heartbeats
│
┌──────────────┐
│ Orchestrator │
│ SnmpManager │
└──────────────┘
----
===== Components =====
==== AirScan Modem Manager ====
**Repository:** ''errigal/apps/airscanmodemmanager''\\
**Language:** Python 3.12 / Flask 2.3\\
**Registry:** ''registry.errigal.com/airscan/airscanmodemmanager''\\
**Runs on:** Port 5000 (host network, privileged container)
The Modem Manager controls the cellular modem on AirScan devices using AT commands over a serial interface. It disables ModemManager and stops NetworkManager from managing the modem interface, driving the modem with AT commands alone for more predictable carrier connectivity.
=== How It Works ===
* **Device discovery:** Scans ''/dev/ttyUSB*'', sends ''AT'' to each, uses the first responding device.
* **APN configuration:** Sets PDP context with ''AT+CGDCONT=1,"IP","{SIM_APN}"'' and activates with ''AT+CGACT=1,1''.
* **Carrier selection:** Auto-select with ''AT+COPS=0'' or specific carrier with ''AT+COPS=1,2,"{PLMN}"''.
* **Auto-reconnect:** Background job runs every ''AUTO_RECONNECT_INTERVAL'' seconds. Pings ''PING_TEST_HOST'' on eth0, wlan0, and the modem interface. If all fail, performs a network scan, band unlock, reconnect, and PDP reconfiguration.
* **Supported modems:** Quectel (RG50xQ, RM5xxQ) and Simcom (SIM7500, SIM7600).
* **Band unlock:** Simcom modems require ''AT+CNBP=...'' for 4G/5G band unlock. Quectel is a no-op.
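The discovery and APN steps above can be sketched in a few lines of Python. Everything here is illustrative: the function names are invented, and a real implementation writes to a serial port rather than calling a ''probe'' callback.

```python
from typing import Callable, Iterable, Optional

def first_responding_port(ports: Iterable[str],
                          probe: Callable[[str, str], str]) -> Optional[str]:
    """Send a bare AT to each candidate port (/dev/ttyUSB*) and use the
    first one that answers OK, as the discovery step describes."""
    for port in ports:
        if "OK" in probe(port, "AT"):
            return port
    return None

def pdp_context_commands(apn: str) -> list:
    """AT sequence to set the PDP context for an APN, then activate it."""
    return [f'AT+CGDCONT=1,"IP","{apn}"', "AT+CGACT=1,1"]

# Stand-in probe for illustration; only /dev/ttyUSB2 "responds" here.
def fake_probe(port: str, command: str) -> str:
    return "OK" if port == "/dev/ttyUSB2" else ""

chosen = first_responding_port(["/dev/ttyUSB0", "/dev/ttyUSB2"], fake_probe)
```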
=== Environment Variables ===
Application-level defaults are defined in the [[https://bitbucket.org/errigal/airscanmodemmanager/src/master/Dockerfile|Dockerfile]] and [[https://bitbucket.org/errigal/airscanmodemmanager/src/master/app/app.py|app/app.py]].
Deployment-time overrides are set by the Ansible role templates:
* [[https://bitbucket.org/errigal/deployment-playbooks/src/master/roles/airscan_modem_manager/templates/docker-compose.yml.j2|roles/airscan_modem_manager/templates/docker-compose.yml.j2]]
* [[https://bitbucket.org/errigal/deployment-playbooks/src/master/roles/airscan_modem_manager/templates/.env.j2|roles/airscan_modem_manager/templates/.env.j2]]
* [[https://bitbucket.org/errigal/deployment-playbooks/src/master/roles/airscan_modem_manager/defaults/main.yml|roles/airscan_modem_manager/defaults/main.yml]]
=== Health Check and Recovery ===
**Docker healthcheck:** ''ls /dev/ttyUSB*'' every 10 seconds — checks that the modem device node is present. An ''autoheal'' container automatically restarts the Modem Manager if the healthcheck fails.
**Recovery script:** A cron job runs every ''airscanmodemmanager_device_recovery_interval_mins'' minutes (default 5). It calls ''http://localhost:5000/status'' and checks ''last_connectivity_timestamp''. If the device is unreachable for ''airscanmodemmanager_device_unreachable_interval_hours'' (default 6) and the last reboot was more than ''airscanmodemmanager_device_reboot_interval_hours'' (default 6) ago, the device is rebooted.
Recovery logs are at ''/var/log/airscanmodemmanager_recovery/airscanmodemmanager_recovery.log'' (10MB rotation, 5 files).
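The recovery decision described above reduces to two time comparisons. A minimal sketch, with invented names mirroring the role variables:

```python
def should_reboot(last_connectivity_ts: float, last_reboot_ts: float, now: float,
                  unreachable_hours: float = 6.0,
                  reboot_interval_hours: float = 6.0) -> bool:
    """Reboot only when the device has been unreachable for longer than
    unreachable_hours AND the previous reboot happened more than
    reboot_interval_hours ago (defaults mirror the role's defaults)."""
    unreachable_for = now - last_connectivity_ts
    since_reboot = now - last_reboot_ts
    return (unreachable_for > unreachable_hours * 3600
            and since_reboot > reboot_interval_hours * 3600)
```

Both conditions must hold: a device that is unreachable but was rebooted an hour ago is left alone until the reboot interval has elapsed.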
Source code: [[https://bitbucket.org/errigal/airscanmodemmanager|bitbucket.org/errigal/airscanmodemmanager]]
----
==== RDF Agent ====
**Repository:** ''errigal/apps/rdf_agent''\\
**Language:** Java 17 / Spring Boot 3.3\\
**Registry:** ''registry.errigal.com/rdf_agent''\\
**Runs on:** Port 8081 (bound to 127.0.0.1), management on port 8080 (Actuator/Prometheus)
The RDF Agent polls the RDF Orchestrator for discovery tasks, executes them against target devices, and pushes results back. On AirScan devices it runs in privileged Docker with host networking. It has no inbound API requirement — it only needs outbound connectivity to the orchestrator.
=== How It Works ===
* **Task polling:** ''DiscoveryTaskPoller'' GETs from ''api/v2/task'' every ''POLL_INTERVAL_MS'' (default 5000ms).
* **Permanent tasks:** ''PermanentTaskPoller'' GETs from ''api/v1/permanent/tasks'' every ''POLL_FOR_PERMANENT_TASKS_MS'' (default 60s).
* **Task routing:** ''IncomingRequestProcessor'' routes tasks to the correct processor based on discovery type and technology.
* **Result submission:** ''OutgoingMessagePusher'' POSTs results to ''api/v2/task''.
* **Status reporting:** ''StatusReporter'' POSTs to ''api/v1/agent/status'' every 20 seconds with version and hostname.
* **SNMP heartbeat:** ''SnmpTrapListener'' sends heartbeat traps every ''HEARTBEAT_INTERVAL_MS'' (default 60s) to SnmpManager using OID ''.1.3.6.1.4.1.33582.1.1.2.5.1''.
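Although the agent itself is Java/Spring Boot, the poll-execute-push cycle above can be sketched language-neutrally (here in Python, with all names invented):

```python
from typing import Callable, Optional

def poll_once(fetch_task: Callable[[], Optional[dict]],
              run_task: Callable[[dict], dict],
              push_result: Callable[[dict], None]) -> bool:
    """One iteration of the loop: GET a task, execute it, POST the result.
    Returns True when a task was handled, False when the queue was empty."""
    task = fetch_task()              # GET api/v2/task
    if task is None:
        return False                 # nothing queued; caller sleeps POLL_INTERVAL_MS
    push_result(run_task(task))      # POST result back to api/v2/task
    return True

results = []
handled = poll_once(lambda: {"type": "PERFORMANCE"},
                    lambda t: {**t, "status": "done"},
                    results.append)
```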
=== AirScan-Specific Behavior ===
When ''IS_AIRSCAN=true'', the agent:
* Talks to the Modem Manager at ''MODEM_MANAGER_URL'' (default ''http://localhost:5000'') for cellular metrics, handoff tests, and carrier operations.
* Collects local metrics from Prometheus Node Exporter (''NODE_EXPORTER_URL'') via ''prom2json''.
* Measures bandwidth with ''vnstat'' (modem, eth0, wlan0 interfaces).
* Runs ''iperf3'' speed tests against a configured server.
* Uses ''AirScanPerformanceProcessor'' and ''CellularProcessor'' for performance discovery.
=== Configuration ===
Application-level defaults are in [[https://bitbucket.org/errigal/rdf_agent/src/master/src/main/resources/application.properties|src/main/resources/application.properties]].
Deployment-time overrides are set by the Ansible role templates:
* [[https://bitbucket.org/errigal/deployment-playbooks/src/master/roles/rdf-agent/templates/docker-compose.yml.j2|roles/rdf-agent/templates/docker-compose.yml.j2]]
* [[https://bitbucket.org/errigal/deployment-playbooks/src/master/roles/rdf-agent/templates/.env.j2|roles/rdf-agent/templates/.env.j2]]
* [[https://bitbucket.org/errigal/deployment-playbooks/src/master/roles/rdf-agent/defaults/main.yml|roles/rdf-agent/defaults/main.yml]]
Source code: [[https://bitbucket.org/errigal/rdf_agent|bitbucket.org/errigal/rdf_agent]]
----
==== WireGuard VPN ====
WireGuard provides the encrypted tunnel from AirScan devices to the platform infrastructure.
=== Topology ===
AirScan Device ──► rdflb_server (jump host / WireGuard server) ──► oat_server
=== IP Calculation ===
Each device's WireGuard IP is calculated as:
wireguard_ip = {internal_subnet base}.{wireguard_peer + 1}
Example: ''internal_subnet=10.13.20.0'', ''wireguard_peer=20'' → ''wireguard_ip=10.13.20.21''
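As a sanity check, the rule above can be written as a small Python helper (the function name is ours, not from the codebase):

```python
def wireguard_ip(internal_subnet: str, wireguard_peer: int) -> str:
    """Last octet of the device IP is wireguard_peer + 1, appended to the
    internal_subnet base (its first three octets)."""
    base = internal_subnet.rsplit(".", 1)[0]   # "10.13.20.0" -> "10.13.20"
    return f"{base}.{wireguard_peer + 1}"

# Matches the documented example: peer 20 on 10.13.20.0 -> 10.13.20.21
device_ip = wireguard_ip("10.13.20.0", 20)
```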
=== SSH Access ===
AirScan devices are not directly reachable from the corporate network. SSH access goes through ''rdflb_server'' as a jump host, then over the WireGuard tunnel to the device's internal IP.
┌──────────────┐ ┌─────────────────────┐ ┌──────────────────┐
│ Your │ SSH │ rdflb_server │ SSH │ AirScan Device │
│ Workstation ├─────────►│ (jump host) ├─────────►│ │
│ │ │ │ via WG │ │
│ │ │ Public/private IP │ tunnel │ WireGuard IP │
│ │ │ e.g. 10.0.87.50 │ │ e.g. 10.13.20.21│
└──────────────┘ └─────────────────────┘ └──────────────────┘
**Manual SSH with ''-J'' (ProxyJump):**
ssh -J {rdflb_user}@{rdflb_host} {device_user}@{wireguard_ip}
# Example:
ssh -J admin@10.0.87.50 root@10.13.20.21
**Ansible equivalent (auto-generated in inventory):**
The pipeline sets ''ansible_ssh_common_args'' with ''-o ProxyCommand="ssh -W %h:%p -q {rdflb_user}@{rdflb_host}"'', which achieves the same jump transparently for all playbook runs.
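For illustration only, a generated inventory entry carrying this setting could look like the fragment below; the hostname, user, and IPs are placeholders, not values from any real environment:

```yaml
airscan:
  hosts:
    eas-device-01:                     # hostname column from the sheet
      ansible_host: 10.13.20.21        # wireguard_ip column
      ansible_ssh_common_args: '-o ProxyCommand="ssh -W %h:%p -q admin@10.0.87.50"'
```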
=== Ansible Roles ===
* **''wireguard_server''** — Runs WireGuard in Docker on ''rdflb_server'', generates peer configs, distributes to clients.
* **''wireguard_client''** — Installs WireGuard on the device, copies peer config, starts the service, writes ''wireguard_ip'' back to the Google Sheet.
----
===== Google Sheet Configuration =====
**Sheet ID:** ''1j7rOK5vZhmIj84YJOGzkQ3u4_dUUftE57IbuOh3bnHo''\\
**URL:** [[https://docs.google.com/spreadsheets/d/1j7rOK5vZhmIj84YJOGzkQ3u4_dUUftE57IbuOh3bnHo]]\\
**Service account:** ''scotty@environment-app-versions.iam.gserviceaccount.com''
==== Tab Naming Convention ====
Tabs are named ''{customer}/{sheet_name}'', where ''{customer}'' maps to a folder in ''env-configuration/''. The tab name is used as the Jenkins ''source'' parameter.
**Current tabs:**
* ''cts/production''
* ''qaatc/production''
* ''prodatc/production''
* ''qanova/errigal_demo_airscan''
* ''prodsco/errigal''
* ''prodsco/shared_access''
* ''blackbox/airscan''
==== Column Reference ====
^ Column ^ Maps To ^ Used In ^
| ''hostname'' | Ansible inventory hostname | Inventory generation |
| ''configure'' | If ''"yes"'', host is added to airscan/rdfagent/wireguard_client groups | Inventory generation |
| ''name_in_platform'' | ''snmp_manager.network_element.name'' | DB registration |
| ''private_ip'' | ''ansible_host'' for infrastructure servers | Inventory generation |
| ''wireguard_ip'' | ''ansible_host'' for airscan devices (via ProxyJump) | Inventory + DB registration |
| ''ssh_user'' / ''ssh_pass'' | SSH credentials for the device | Inventory generation |
| ''wireguard_peer'' | WireGuard peer ID (IP = internal_subnet base + peer + 1) | WireGuard config |
| ''apn'' | ''apn_name'' for cellular APN config | Modem Manager deployment |
| ''cluster_name'' | Cluster assignment in snmp_manager | DB registration |
| ''site_name'' | Site assignment in snmp_manager + orchestrator | DB registration |
| ''iperf3_port'' | Port for iperf3 testing | iperf3 config |
| ''rdf_agent_version'' | Target RDF Agent version (Docker tag) | RDF Agent deployment |
| ''airscan_modem_manager_version'' | Target Modem Manager version (Docker tag) | Modem Manager deployment |
==== Special Rows ====
^ Row ^ Behavior ^
| **GLOBAL** | Non-empty columns become ''all.vars'' (e.g. ''wireguard_network'', ''wireguard_port'', ''internal_subnet'') |
| **wireguard_server** | WireGuard VPN server; uses ''private_ip'' as ''ansible_host'' |
| **iperf3_server** | iperf3 test server; uses ''private_ip'' as ''ansible_host'' |
| **oat_server** | Constructed from ''vars_for_airscan.yml'' (extracted from ''hosts.ini''); ''wireguard_peer'' from the sheet |
----
===== Jenkins Pipeline =====
**Jenkinsfile:** ''airscanautoconfiguration/Jenkinsfile''
==== Parameters ====
^ Parameter ^ Default ^ Description ^
| ''source'' | (job config) | Google Sheet tab name, e.g. ''cts/production'' |
| ''CONFIGURE_WIREGUARD'' | ''false'' | Configure WireGuard VPN on clients |
| ''CONFIGURE_AIRSCAN_MODEM_MANAGER'' | ''false'' | Deploy AirScan Modem Manager |
| ''CONFIGURE_RDF_AGENT'' | ''false'' | Deploy RDF Agent |
| ''CONFIGURE_IPERF3'' | ''false'' | Configure iperf3 server |
==== Derived Variables ====
envVar = source.split('/')[0] // e.g. "cts"
invFile = source.split('/')[1] // e.g. "production"
worksheet_name = source // e.g. "cts/production"
extraVarsLocation = "env-configuration/{envVar}/vars_for_airscan.yml"
''envVar'' determines:
* Which ''hosts.ini'' to use: ''env-configuration/{envVar}/hosts.ini''
* Which vault password credential: ''{envVar}_ansible_vault_pass''
==== Pipeline Stages ====
^ # ^ Stage ^ Condition ^ Description ^
| 1 | Preparation | Always | Clone ''env-configuration'' (master) and ''deployment-playbooks'' (branch). Set build display name. |
| 2 | Build Docker Image | Always | Build ''registry.errigal.com/airscanautoconfiguration:{tag}'' from ''airscan_config/Dockerfile''. |
| 3 | Generate Extra Vars | Always | Run ''airscan_extract_vars_for_ansible_autoconfig.yml'' against ''hosts.ini'' to produce ''vars_for_airscan.yml'' with OAT/RDFLB host, user, password, DB hosts, etc. Uses vault credential. |
| 4 | Generate Inventory | Always | Run ''google_sheet_to_ansible_inv.py'' in Docker to read Google Sheet tab and produce ''{invFile}.yml'' under ''env-configuration/{envVar}/''. |
| 5 | Remove WireGuard from OAT | Always | Run ''remove_wireguard_from_oat.yml'' on ''oat_server''. Sets ''WIREGUARD_INTERFACE_EXISTS'' flag. |
| 6 | Configure WireGuard on RDFLB | ''WIREGUARD_INTERFACE_EXISTS == true'' | Configure WireGuard on ''wireguard_server'' and ''rdflb_server''. |
| 7 | Configure WireGuard | ''CONFIGURE_WIREGUARD == true'' | Configure WireGuard for all clients except ''oat_server''. |
| 8 | Deploy Modem Manager | ''CONFIGURE_AIRSCAN_MODEM_MANAGER == true'' | Deploy Modem Manager to ''airscan'' hosts via ''airscanmodemmanager-deploy.yml''. |
| 9 | Deploy RDF Agent | ''CONFIGURE_RDF_AGENT == true'' | Run ''airscan_load_elements'' (DB registration) then deploy RDF Agent via ''rdf-agent-docker-deploy.yml''. |
| 10 | Configure iperf3 | ''CONFIGURE_IPERF3 == true'' | Configure iperf3 server via ''generate_ansible_iperf3_config.yml''. |
==== Credentials ====
* **Vault password:** Jenkins credential ''{envVar}_ansible_vault_pass'' for Ansible vault decryption.
* **Google API:** ''service_account.json'' baked into the Docker image (service account ''scotty@environment-app-versions.iam.gserviceaccount.com'').
* **Docker registry:** ''errigal_docker_registry_username'' / ''errigal_docker_registry_password'' for ''registry.errigal.com''.
==== Inventory Generation (google_sheet_to_ansible_inv.py) ====
The Python script:
- Authenticates with Google Sheets API via ''service_account.json''.
- Opens the sheet tab matching ''SHEET_NAME'' (e.g. ''cts/production'').
- Reads all rows; first row = headers.
- **GLOBAL row:** Non-empty columns become ''all.vars''.
- **Special rows** (''wireguard_server'', ''iperf3_server''): Use ''private_ip'' as ''ansible_host''.
- **oat_server / rdflb_server:** Built from ''vars_for_airscan.yml'' (OAT/RDFLB credentials from ''hosts.ini'').
- **Device rows:** Use ''wireguard_ip'' as ''ansible_host'' with ProxyJump via ''rdflb_server''. If ''configure == "yes"'', add host to ''airscan'', ''rdfagent'', ''wireguard_client'' groups.
- Writes YAML inventory to ''{invFile}.yml''.
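The row-handling rules above can be condensed into a toy Python sketch (not the real script; column handling is reduced to the essentials and all structure is illustrative):

```python
def rows_to_inventory(rows: list) -> dict:
    """Toy version of google_sheet_to_ansible_inv.py's core rules:
    the GLOBAL row feeds all.vars; rows with configure == "yes" join
    the airscan, rdfagent and wireguard_client groups."""
    groups = ("airscan", "rdfagent", "wireguard_client")
    inv = {"all": {"vars": {}, "children": {g: {"hosts": {}} for g in groups}}}
    for row in rows:
        if row.get("hostname") == "GLOBAL":
            # Non-empty columns of the GLOBAL row become all.vars
            inv["all"]["vars"].update(
                {k: v for k, v in row.items() if v and k != "hostname"})
        elif row.get("configure") == "yes":
            host_vars = {"ansible_host": row.get("wireguard_ip")}
            for g in groups:
                inv["all"]["children"][g]["hosts"][row["hostname"]] = host_vars
    return inv

inv = rows_to_inventory([
    {"hostname": "GLOBAL", "wireguard_port": "51822", "configure": ""},
    {"hostname": "eas-dev-01", "configure": "yes", "wireguard_ip": "10.13.20.21"},
])
```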
----
===== Ansible Roles Reference =====
==== airscan_load_elements ====
**Path:** ''deployment-playbooks/roles/airscan_load_elements/''\\
**Purpose:** Registers AirScan devices in the SNMP Manager and Orchestrator databases.
**Runs as part of:** ''rdf-agent-docker-deploy.yml'' (before RDF Agent deployment, only on ''airscan'' hosts).
=== Database Operations ===
**SNMP Manager (''snmp_manager'' schema):**
- Check if ''network_element'' exists by ''ip_address''
- Insert ''site'' if missing
- Insert ''network_element'' (technology=''AirScan'', ne_type=''Controller'')
- Delete + re-insert ''site_network_element''
- Insert ''expected_heartbeat'' (15-minute interval)
**Orchestrator (''orchestrator'' schema):**
- Insert ''customer_site''
- Insert ''agent'' and ''user_role''
- Insert or update ''element'' (links to SNMP Manager via ''entry_point_id'')
- Insert ''schedule'' (hourly)
- Delete old ''schedule_config'' for PERFORMANCE/POLL
**API calls:**
- Login to Orchestrator
- Get short install code for the agent
- Fetch agent install script to extract ''rdf_access_token''
=== Variables ===
Defaults and the full list of variables used by this role are in [[https://bitbucket.org/errigal/deployment-playbooks/src/master/roles/airscan_load_elements/defaults/main.yml|roles/airscan_load_elements/defaults/main.yml]]. The SQL operations and variable usage can be seen in [[https://bitbucket.org/errigal/deployment-playbooks/src/master/roles/airscan_load_elements/tasks/main.yml|roles/airscan_load_elements/tasks/main.yml]].
Variables come from two sources:
* **Google Sheet** (via inventory) — device IP, name, cluster, site
* **''vars_for_airscan.yml''** (generated from ''hosts.ini'') — DB hosts, orchestrator URL, credentials
----
==== airscan_modem_manager ====
**Path:** ''deployment-playbooks/roles/airscan_modem_manager/''\\
**Purpose:** Deploys the Modem Manager application and configures networking on the device.
=== Deployment Steps ===
- Stop and disable ModemManager
- Configure systemd-networkd for modem interface
- Configure NetworkManager to leave ''usb0'' and ''eth0'' unmanaged
- Create ''/opt/services/airscan/''
- Render ''docker-compose.yml'' and ''.env'' from templates
- Docker login and pull image
- ''docker compose up -d''
- Wait for port 5000
- Verify modem responds: ''curl http://localhost:5000/modem/at'' with body ''AT'' — expect ''OK''
- Optionally write version to Google Sheet
- Install recovery cron job
- Install vnstat for bandwidth monitoring
=== Variables ===
Defaults and the full list of variables are in [[https://bitbucket.org/errigal/deployment-playbooks/src/master/roles/airscan_modem_manager/defaults/main.yml|roles/airscan_modem_manager/defaults/main.yml]].
----
==== rdf-agent ====
**Path:** ''deployment-playbooks/roles/rdf-agent/''\\
**Purpose:** Deploys the RDF Agent application on AirScan (and non-AirScan) hosts.
=== Deployment Steps ===
- Create ''/opt/services/rdfagent/''
- Render ''docker-compose.yml'' and ''.env'' from templates
- Docker login and pull image
- ''docker compose up -d''
- Optionally write version to Google Sheet
=== AirScan vs Non-AirScan ===
^ Aspect ^ AirScan ^ Non-AirScan ^
| Network mode | ''host'' | Bridge (ports 8080, 162/udp) |
| Privileged | Yes | No |
| ''IS_AIRSCAN'' | ''true'' | ''false'' |
| Volumes | ''/var/lib/vnstat'' mounted | None |
| SNMP listener IP | ''wireguard_ip'' | Default |
=== Variables ===
Defaults and the full list of variables are in [[https://bitbucket.org/errigal/deployment-playbooks/src/master/roles/rdf-agent/defaults/main.yml|roles/rdf-agent/defaults/main.yml]].
----
==== airscan_write_to_google_sheet ====
**Path:** ''deployment-playbooks/roles/airscan_write_to_google_sheet/''\\
**Purpose:** Writes deployment results back to the Google Sheet.
Runs the ''update_google_sheet.py'' script in a one-off Docker container (''airscanautoconfiguration'' image). Updates a single row by hostname with key-value pairs.
**Used by:**
* ''wireguard_client'' — writes ''wireguard_ip''
* ''airscan_modem_manager'' — writes ''airscan_modem_manager_version_actual''
* ''rdf-agent'' — writes ''rdf_agent_version_actual''
----
===== Database Element Registration =====
==== Initial Registration (Ansible) ====
When the Jenkins pipeline runs with RDF Agent deployment, the ''airscan_load_elements'' role performs:
- **Check by IP:** ''SELECT id FROM snmp_manager.network_element WHERE ip_address = '{wireguard_ip}' ''
- **If not found:** INSERT new ''network_element'' with ''name = '{name_in_platform}' '', ''ip_address = '{wireguard_ip}' '', ''technology = 'AirScan' ''
- **Link to site:** DELETE + re-INSERT ''site_network_element''
- **Add heartbeat:** INSERT ''expected_heartbeat'' for monitoring (15-minute interval)
- **Orchestrator:** INSERT IGNORE ''customer_site'', ''agent'', ''element'' with ''entry_point_id = {ne_id}''
The matching key is **IP address** (''wireguard_ip''), not device name. If a ''network_element'' already exists for that IP, the INSERT is skipped.
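A small Python sketch of this match-by-IP idempotency (names invented; the real logic is SQL in the role's tasks):

```python
def register_network_element(existing: dict, wireguard_ip: str,
                             name_in_platform: str) -> dict:
    """The playbook matches on IP, not name: if a network_element already
    exists for wireguard_ip it is reused unchanged, so its name is NOT
    updated even if name_in_platform differs."""
    if wireguard_ip in existing:
        return existing[wireguard_ip]        # INSERT skipped
    row = {"name": name_in_platform, "ip_address": wireguard_ip,
           "technology": "AirScan"}
    existing[wireguard_ip] = row
    return row

db = {}
first = register_network_element(db, "10.13.20.21", "EAS-OLD")
# Re-running the pipeline with a different name reuses the old row:
second = register_network_element(db, "10.13.20.21", "EAS-NEW")
```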
==== Ongoing Sync (RDFElementSyncJob) ====
SnmpManager runs a scheduled sync job every 60 seconds:
- ''NetworkElement.afterUpdate()'' / ''afterInsert()'' writes to ''network_element_change_sync''
- ''RDFElementSyncJob'' reads change records (< 5 days old, with valid IP)
- For each change, POSTs to orchestrator ''/api/v1/element/update''
- Orchestrator ''ElementService.updateElement()'' finds element by ''entry_point_id''
- If found: updates IPs, credentials, technology, onAir status
- If not found: creates new element
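The sync job's record selection (under 5 days old, with a valid IP) can be sketched as follows; the field names here are ours, not the schema's:

```python
import ipaddress
from datetime import datetime, timedelta

def eligible_changes(changes: list, now: datetime) -> list:
    """Keep only change records under 5 days old that carry a valid IP,
    matching how the sync job's selection is described."""
    cutoff = now - timedelta(days=5)
    keep = []
    for change in changes:
        try:
            ipaddress.ip_address(change["ip_address"])
        except ValueError:
            continue  # no valid IP -> never pushed to the orchestrator
        if change["created"] >= cutoff:
            keep.append(change)
    return keep

now = datetime(2025, 1, 10)
kept = eligible_changes([
    {"ip_address": "10.13.20.21", "created": now - timedelta(days=1)},  # kept
    {"ip_address": "not-an-ip",   "created": now - timedelta(days=1)},  # bad IP
    {"ip_address": "10.13.20.22", "created": now - timedelta(days=6)},  # too old
], now)
```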
**Key code paths:**
* Change trigger: ''snmpmanager_grails3/.../domain/.../NetworkElement.groovy'' (''afterUpdate'', ''afterInsert'', ''addChangeSyncRecord'')
* Sync service: ''snmpmanager_grails3/.../services/.../RdfElementSyncService.groovy''
* Orchestrator handler: ''rdf_orchestrator/.../service/element/ElementService.java''
==== Known Duplicate Element Issue ====
The orchestrator's unique constraint covers ''(entry_point_id, customer_site_id)'', not ''entry_point_id'' alone, so the same ''entry_point_id'' can exist under more than one site. For AirScan elements, ''task_processing_agent_override'' should always be set: it ties the element directly to the agent running on that device, so the element's tasks are processed by that agent, and ''customer_site_id'' then plays no part in agent routing. This combination can cause problems:
* The sync job's ''findByEntryPointId()'' does **not** filter by ''customer_site_id''
* If multiple elements exist for the same ''entry_point_id'' with different ''customer_site_id'', the lookup may return an unpredictable one
* The Ansible ''INSERT IGNORE'' can silently fail if a row already exists with different data
* If the ''customer_site_id'' mapping changes (e.g. site renamed), a new element can be created alongside the old one
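A toy demonstration of why the lookup is unpredictable when duplicates exist (all names invented; the real lookup is a database query):

```python
from typing import Optional

def find_by_entry_point_id(elements: list, entry_point_id: int) -> Optional[dict]:
    """Mimics the sync job's lookup: match on entry_point_id alone,
    ignoring customer_site_id. With duplicates, which row wins is
    effectively arbitrary (here: whatever happens to come first)."""
    for element in elements:
        if element["entry_point_id"] == entry_point_id:
            return element
    return None

duplicates = [
    {"id": 1, "entry_point_id": 42, "customer_site_id": 7},
    {"id": 2, "entry_point_id": 42, "customer_site_id": 9},
]
hit = find_by_entry_point_id(duplicates, 42)  # id=1 purely by ordering
```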
----
===== Modem Manager API Reference =====
All endpoints are served on port 5000. Routes are defined using Flask-Classful across three files in the [[https://bitbucket.org/errigal/airscanmodemmanager|airscanmodemmanager]] repository:
^ Route base ^ Source file ^ Description ^
| ''/'' | [[https://bitbucket.org/errigal/airscanmodemmanager/src/master/app/app.py|app/app.py]] | Root routes: health check (''/''), status (''/status'') |
| ''/modem/'' | [[https://bitbucket.org/errigal/airscanmodemmanager/src/master/app/modem/modemApi.py|app/modem/modemApi.py]] | Modem operations: AT commands, carrier connect, handoff, ICCID, signal info, band unlock |
| ''/system/'' | [[https://bitbucket.org/errigal/airscanmodemmanager/src/master/app/system/systemApi.py|app/system/systemApi.py]] | System utilities: ping via specific interface |
----
===== Deployment =====
==== Deploying a New AirScan Device ====
- **Add device to Google Sheet:** Add a row in the appropriate tab (e.g. ''cts/production'') with hostname, ''configure=yes'', ''wireguard_peer'', ''apn'', ''name_in_platform'', ''cluster_name'', ''site_name'', and desired versions.
- **Run Jenkins pipeline** with ''source'' matching the sheet tab. Enable all relevant parameters:
* ''CONFIGURE_WIREGUARD=true'' — sets up VPN tunnel
* ''CONFIGURE_AIRSCAN_MODEM_MANAGER=true'' — deploys modem manager
* ''CONFIGURE_RDF_AGENT=true'' — registers DB elements and deploys agent
- **Verify:**
* WireGuard tunnel is up: ''sudo wg show'' on ''rdflb_server''
* Modem Manager responds: ''curl http://{wireguard_ip}:5000/modem/at'' (via tunnel)
* RDF Agent container running: ''docker ps | grep rdf'' on the device
* Element exists in DB: check ''snmp_manager.network_element'' and ''orchestrator.element''
==== Updating Application Versions ====
- Update ''rdf_agent_version'' or ''airscan_modem_manager_version'' in the Google Sheet row.
- Run Jenkins pipeline with the appropriate parameter enabled (''CONFIGURE_RDF_AGENT'' or ''CONFIGURE_AIRSCAN_MODEM_MANAGER'').
- The role pulls the new image, restarts the container, and writes the actual deployed version back to the sheet (''rdf_agent_version_actual'' / ''airscan_modem_manager_version_actual'').
==== Docker Images ====
^ Image ^ Registry Path ^ Build ^
| Modem Manager | ''registry.errigal.com/airscan/airscanmodemmanager:{version}'' | Jenkins (jenkinsCommon), multi-arch (amd64, arm64, arm/v8) |
| RDF Agent | ''registry.errigal.com/rdf_agent:{version}'' | Drone CI, JAR uploaded to S3 |
| Autoconfiguration | ''registry.errigal.com/airscanautoconfiguration:{tag}'' | Built during pipeline run |
| Ansible runner | ''registry.errigal.com/ansibledockerimage:latest'' | Pre-built image for running playbooks |
----
===== Troubleshooting =====
==== Device Not Connecting ====
Check in order:
=== 1. WireGuard Tunnel ===
SSH to ''rdflb_server'' and check if the device's peer is active:
sudo wg show
# Look for the device's peer — check "latest handshake" time
The device's WireGuard IP: ''{internal_subnet base}.{wireguard_peer + 1}''\\
Example: ''internal_subnet=10.13.20.0'', ''wireguard_peer=20'' → IP ''10.13.20.21''
=== 2. Google Sheet ===
Check the sheet tab for the customer:
* Is the device listed with ''configure = yes''?
* Is ''name_in_platform'' correct and matching the DB?
* Are ''wireguard_ip'' and ''wireguard_peer'' correct?
=== 3. Database: snmp_manager.network_element ===
-- Find the device by IP
SELECT id, name, ip_address, on_air, cluster_name, site_name
FROM snmp_manager.network_element
WHERE ip_address = '{wireguard_ip}';
-- Check for duplicates by name
SELECT id, name, ip_address, on_air, cluster_name
FROM snmp_manager.network_element
WHERE name LIKE '%{device_identifier}%';
=== 4. Database: orchestrator.element ===
-- Find element by entry_point_id (= network_element.id)
SELECT e.id, e.entry_point_id, e.external_ip, e.internal_ip, e.on_air,
e.customer_site_id, e.task_processing_agent_override
FROM orchestrator.element e
WHERE e.entry_point_id = {ne_id};
-- Check for duplicate elements by IP
SELECT e.id, e.entry_point_id, e.external_ip, cs.name as site_name
FROM orchestrator.element e
JOIN orchestrator.customer_site cs ON e.customer_site_id = cs.id
WHERE e.external_ip = '{wireguard_ip}';
=== 5. Container Status on Device ===
SSH to the device (via ProxyJump through ''rdflb_server''):
# Check both containers
docker ps
# RDF Agent logs
docker logs rdfagent
# Modem Manager logs
docker logs airscanmodemmanager
=== 6. Heartbeat Monitoring ===
SELECT * FROM snmp_manager.expected_heartbeat
WHERE network_element_id = {ne_id};
----
==== Device Name Changed — Element Mismatch ====
**Symptom:** Customer changed the device name in the platform UI. Device may stop working or show stale data.
**What happens:**
- ''network_element'' name changes
- ''afterUpdate()'' fires, creating a ''network_element_change_sync'' record
- Sync job pushes updated data to orchestrator
- Orchestrator matches by ''entry_point_id'' (not name), so the element updates correctly
**However**, if the pipeline is re-run with a **different** ''name_in_platform'':
* The playbook checks by **IP address**, not name
* If the IP exists, it skips the INSERT (existing element is reused)
* The name in ''network_element'' is **NOT** updated by the playbook
**If duplicates exist:**
-- Find duplicate network_elements
SELECT id, name, ip_address, on_air, cluster_name
FROM snmp_manager.network_element
WHERE technology = 'AirScan'
AND (name LIKE '%{old_name}%' OR name LIKE '%{new_name}%' OR ip_address = '{wireguard_ip}');
-- Find duplicate orchestrator elements
SELECT e.id, e.entry_point_id, e.external_ip, e.on_air, e.customer_site_id,
cs.name as site_name
FROM orchestrator.element e
LEFT JOIN orchestrator.customer_site cs ON e.customer_site_id = cs.id
WHERE e.external_ip = '{wireguard_ip}'
OR e.entry_point_id IN (
SELECT id FROM snmp_manager.network_element
WHERE name LIKE '%{old_name}%' OR name LIKE '%{new_name}%'
);
The correct element should have:
* ''entry_point_id'' matching the ''snmp_manager.network_element.id'' for that IP
* ''customer_site_id'' matching the correct site
* ''task_processing_agent_override'' pointing to the correct agent
Update ''name_in_platform'' in the Google Sheet to match, then re-run the pipeline if needed.
----
==== Common Pipeline Failures ====
^ Failure ^ Cause ^ Fix ^
| Vault password error | ''{envVar}_ansible_vault_pass'' missing in Jenkins | Add credential in Jenkins |
| Sheet access denied | Service account lacks access | Share sheet with ''scotty@environment-app-versions.iam.gserviceaccount.com'' |
| Tab not found | ''source'' param doesn't match sheet tab name | Verify tab name matches ''{customer}/{sheet_name}'' exactly |
| Missing host groups | ''hosts.ini'' lacks ''rdf-orchestrator'', ''rdf-lb'', etc. | Update ''env-configuration/{envVar}/hosts.ini'' |
| WireGuard timeout | Peer unreachable or interface down | Check wireguard_server, peer config, firewall |
| Element INSERT fails | Cluster or site doesn't exist in DB | Create cluster/site first, or check ''cluster_name''/''site_name'' in sheet |
| Docker pull fails | Registry auth or image not found | Check ''errigal_docker_registry'' credentials and image tag |
----
==== Useful Log Locations ====
^ Component ^ Location ^
| SnmpManager | Application logs — search for ''RDFElementSync'' entries |
| Orchestrator | Application logs — search for ''Updating Element with EntryPointId'' |
| Jenkins | Build console output — includes debug from inventory generation |
| WireGuard | ''sudo wg show'' on wireguard_server or ''journalctl -u wg-quick@{interface}'' |
| Modem Manager | ''docker logs'' on device, or ''/var/log/airscanmodemmanager_recovery/'' for recovery |
| RDF Agent | ''docker logs'' on device, or file at ''LOGGING_FILE_PATH'' |
----
===== Customer Environment Reference =====
Each customer has:
* ''env-configuration/{customer}/hosts.ini'' — main infrastructure inventory
* ''env-configuration/{customer}/group_vars/all/30_all.yml'' — environment variables
* Google Sheet tab ''{customer}/...'' — dynamic AirScan configuration
The pipeline generates a dynamic inventory at ''env-configuration/{customer}/{invFile}.yml'' from the Google Sheet.
==== CTS Example ====
^ Item ^ Value ^
| hosts.ini | ''env-configuration/cts/hosts.ini'' — defines ctsapps1/2, ctslb1, ctsoat1/2, ctsesk1/2 |
| Google Sheet tab | ''cts/production'' — dynamic config with EAS-prefixed hostnames |
| OAT servers | ctsoat1 (10.0.87.65), ctsoat2 (10.0.87.115) |
| DB host | ''cts-master-prod.cl0y2kknu458.us-east-1.rds.amazonaws.com'' |
| WireGuard | port 51822, network "cts", subnet 10.13.20.0 |
| Orchestrator URL | ''http://10.13.20.2:8079'' |