Stratusphere Hub performance issue caused by Interval Logging in Network Station policies

Problem:

Stratusphere Hub performance issue caused by Interval Logging in Network Station policies.

Symptoms:

Stratusphere Hub’s UI becomes sluggish, and reports that involve network statistics take a very long time to finish. Stratusphere Hub and the Network Stations show a large number of pending audit records.

Diagnosis:

Stratusphere Hub (if standalone) or the Stratusphere Database appliance may show a very high load average. The appliance may show the following when you execute the command su - postgres -c 'pg_top' as root:

[Screenshot: Screen_Shot_2013-11-01_at_1.02.15_PM.png]

Also, when you inspect Network Stations under Station Administration in the UI, you may see the Status of a Station listed as WARN. Select the Station and click View Properties; the status shows that many pending audit records are cached on the Station.
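Before opening pg_top, it can help to compare the appliance's load average with its CPU count. The following is a minimal sketch, assuming a Linux appliance with /proc/loadavg available; the function names and the 2x multiplier are illustrative assumptions, not Liquidware thresholds.

```shell
#!/bin/sh
# Sketch: flag a high 1-minute load average relative to the CPU count.
# The default factor of 2 is an illustrative assumption, not a vendor threshold.

# load_is_high LOAD CPUS [FACTOR]
# Exit status 0 when LOAD is greater than CPUS * FACTOR, non-zero otherwise.
load_is_high() {
    awk -v load="$1" -v cpus="$2" -v factor="${3:-2}" \
        'BEGIN { exit !(load > cpus * factor) }'
}

# report_load: read the live values and print a one-line verdict.
report_load() {
    load=$(cut -d ' ' -f 1 /proc/loadavg)   # first field is the 1-minute average
    cpus=$(getconf _NPROCESSORS_ONLN)       # number of online CPUs
    if load_is_high "$load" "$cpus"; then
        echo "High load: $load on $cpus CPU(s)"
    else
        echo "Load looks normal: $load on $cpus CPU(s)"
    fi
}

# Run the live check only where /proc/loadavg is readable.
if [ -r /proc/loadavg ]; then
    report_load
fi
```

A high result here only indicates that the appliance is busy; pg_top is still needed to confirm that the database is the source of the load.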

Possible Resolution(s):

(We recommend that you contact Liquidware Labs support when you experience this issue, as other possible causes may have a similar effect.)

This issue may be caused by interval logging on long-running TCP connections. When a Network Station Policy contains Full Logging at Interval, the Station creates a network audit record for each long-running TCP connection it sees. At each audit interval (1 minute by default), it adds another record and updates all the previous records associated with that connection. When there are many long-running connections and Stations have been tracking them for a very long time, this traversing update process puts a huge load on the database, hence the high load average you see when you run ‘pg_top’. The following steps may resolve this issue:

1. Update your Stratusphere appliances to the latest version. Stratusphere 5.5.1 includes an improved process that resolves this issue. You can find the upgrade procedures in this KB (https://liquidwarelabs.zendesk.com/entries/24930252-Upgrade-Stratusphere-appliances-online-version-5-0-or-above-) and in the Release Notes (http://www.liquidwarelabs.com/docs/LiquidwareLabs.Stratusphere.ReleaseNotes.pdf).
2. Inspect your Network Station Policies. Turn off any Full Logging at Interval that you do not need. You can keep Full Logging, but avoid ‘at Interval’ where you don’t need it. Push the Network Station Policies to the Stations after the change.
3. Remove all pending audit records that are cached on the Stations. Review each Network Station’s status; if it shows WARN, verify that the Station has a large number of pending audit records. Log on to the Station and run the following commands: ***See optional automated script below***

ls -l /var/log/audit1/*.data | wc -l (this shows the number of pending records)

rm -f /var/log/audit1/*.data (remove pending records)

sv restart tnt-backend tnt-backend-priv (restart backend services)

Continue on to the next Station affected by this problem.
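The manual cleanup on a single Station can also be wrapped in small functions. The following is a sketch under stated assumptions: the function names are hypothetical, and find is swapped in for the shell glob because a glob that matches a very large number of files can fail with "Argument list too long". The directory and service names are the ones given above.

```shell
#!/bin/sh
# Sketch of the manual cleanup steps, wrapped in functions.
# Function names are hypothetical; the default directory comes from the article.

AUDIT_DIR="${AUDIT_DIR:-/var/log/audit1}"

# count_pending DIR: print the number of *.data files directly inside DIR.
count_pending() {
    find "$1" -maxdepth 1 -type f -name '*.data' | wc -l | tr -d ' '
}

# clear_pending DIR: delete the *.data files directly inside DIR.
# Other files in DIR are left untouched.
clear_pending() {
    find "$1" -maxdepth 1 -type f -name '*.data' -exec rm -f {} +
}

# Example use on a Station, as root:
#   count_pending "$AUDIT_DIR"                 # how many records are pending
#   clear_pending "$AUDIT_DIR"                 # remove them
#   sv restart tnt-backend tnt-backend-priv    # restart backend services
```

Checking the count before clearing makes it easy to confirm that the Station really is holding a large backlog before any records are discarded.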

***Optional automated script method to clear cached network records from ALL network stations attached to a given hub***

1. Upload the attached clear-stations.tgz file to your Stratusphere HUB appliance using WinSCP, logging in as friend/sspassword. Place it in the first directory shown in WinSCP (/home/friend).

2. SSH to the HUB using PuTTY as friend/sspassword

Execute the following commands:

3. tar -zxvf clear-stations.tgz

4. cd clear-stations

5. su

6. ./clear-stations.sh

This script requires that all passwords (both friend and root) are the default sspassword and that all Network Stations are already joined to or registered with the HUB in question.

The database's load average should drop dramatically after the change.
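The size of the drop follows from how the write volume grows with connection age. As a simplified model of the behavior described in the resolution above (one new record per interval, plus an update to every earlier record for that connection), a connection tracked for N intervals costs N inserts and N(N-1)/2 updates. The function name below is hypothetical, and the actual product may batch or optimize these writes differently.

```shell
#!/bin/sh
# Simplified model, not the product's actual implementation:
# at interval k, one record is inserted and the k-1 earlier records are updated.

# writes_for_connection INTERVALS: print total row writes for one connection.
writes_for_connection() {
    n=$1
    echo $(( n + n * (n - 1) / 2 ))
}

writes_for_connection 60      # one hour at the default 1-minute interval: 1830
writes_for_connection 1440    # one day at the default 1-minute interval: 1037520
```

Under this model the cost grows roughly with the square of the connection's age, which is why a few very long-lived connections can dominate the database load.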

Product: Stratusphere Fit/UX

Product Version: 5.x or above

Expires on: 365 days from publish date

Updated: November 1, 2013

Attachment: clear-stations.tgz (1 KB)