PyAlarm Device Server User Guide¶
Contents
- PyAlarm Device Server User Guide
Description¶
This device server is used as a alarm logger, it connects to the list of attributes provided and verifies its values.
Its focused on notifying Alarms by log files, Mail, SMS and (some day in the future) electronic logbook.
You can acknowledge these alarms by a proper command.
Internal Structure¶
The device server behaviour relies on three python objects: AlarmAPI, updateAlarms thread and TangoEval.
Each alarm is independent in terms of formula and receivers; but all alarms within the same PyAlarm device will share a common evaluation environment determined by PyAlarm properties.
The AlarmAPI¶
This object encapsulates the access to the alarm configurations database. Tango Database is used by default, all alarm configurations are stored as device properties of each declared PyAlarm device (AlarmList, AlarmReceivers, AlarmSeverities).
The api object allows to load alarms, reconfigure them and transparently move Alarms between PyAlarm devices.
The updateAlarms thread¶
This thread will be executed periodically at a rate specified by the PollingPeriod. All Enabled alarms will be evaluated at each cycle; and if evaluated to a True value (understood as any value not in (0,””,None,False,[],{})).
Once an Alarm has been active by a number of cycles equal to the device AlarmThreshold it will become Active. Then the PyAlarm will process all elements of the AlarmReceivers list.
The TangoEval engine¶
This engine will automatically replace each Tango attribute name in the formula by its value. It will also provide several methods for searching attribute names in the tango database.
Amongst other features, all values are kept in a cache with a depth equal to the AlarmThreshold+1. This cache allows to create alarms using .delta or inspecting the cache for specific behaviors.
Alarm Syntax Recipes¶
Alarms are parsed and evaluated using fandango.TangoEval class.
Sending a Test Message at Startup¶
This alarm formula is just “True” ; therefore will be enabled immediately sendin an email message to test@tester.com
AlarmList -> DEBUG:True
AlarmDescriptions -> DEBUG:The PyAlarm Device $NAME has been restarted
AlarmReceivers -> DEBUG: test@tester.com
Testing a device availability¶
It is done if you put directly the name of the device or its State as a condition by itself. In the second case and alarm will be triggered either if the Pressure is above threshold or the device is not reachable.
PRESSURE:SR/VC/VGCT/Pressure > 1e-4
STATE_AND_PRESSURE:?SR/VC/VGCT and SR/VC/VGCT/Pressure > 1e-4
Getting Tango state/attribute/value/quality/time/delta in formulas¶
The Alarm syntax allows to add the following clauses to the attribute name (value returned by default):
some/device/name{/attribute}{.value/all/time/quality/delta/exception}
attribute: if no attribute name is given, then device state is read.
PLC_Alarm: BL22/CT/EPS-PLC-01 == FAULT
value: default, returns the value of the attribute
Pressure_Alarm: BL22/CT/EPS-PLC-01/CC1_AF.value > 1e-5
time: returns the epoch in seconds of the last value read
Not_Updated: BL22/CT/EPS-PLC-01/CPU_Status.time < (now-60)
quality : returns the tango quality value (ATTR_VALID, ATTR_INVALID, ATTR_WARNING, ATTR_ALARM).
Temperature_Alarm: BL22/CT/EPS-PLC-01/OP_WBAT_OH01_01_TC11.quality == ATTR_ALARM
delta : returns the variation of the value in the last N=AlarmThreshold reads (stored in TangoEval.cache array of size AlarmThreshold+1)
Valve_Just_Closed: BL22/CT/EPS-PLC-01/VALVE_11.delta == -1
exception : True if the attribute is unreadable, False otherwise
Not_Found: BL22/CT/EPS-PLC-01/I_Dont_Exist.exception
all : returns the raw attribute object as returned by PyTango.DeviceProxy.read_attribute method.
Creating a periodic self-reset alarm¶
A simple clock alarm would use the current time and will set AlarmThreshold, PollingPeriod and AutoReset properties. See this example:
A single formula clock would be more hackish; this alarm will execute a command on its own formula
PERIODIC:(FrontEnds/VC/Elotech-01/Temperature and FrontEnds/VC/VGCT-01/P1 \
and (1920<(now%3600)<3200)) or (ResetAlarm('PERIODIC') and False)
Enabling search, expression matching and list comprehensions¶
Having the syntax dom/fam/mem/attr.quality whould allow us to call attrs like:
any([ATTR_ALARM==s+'.quality' for s in FIND('dom/fam/*/pressure')])
One way may be using QUALITY, VALUE, TIME key functions:
any([ATTR_ALARM==QUALITY(s) for s in FIND('dom/fam/*/pressure')])
The use of FIND allows PyAlarm to prepare a list Taurus models that can be redirected from an <pre>event_received(...)</pre> hook.
Some list comprehension examples¶
any([s for s in FIND(SR/ID/SCW01/Cooler*Err*)])
equals to
any(FIND(SR/ID/SCW01/Cooler*Err*))
The negate:
any([s==0 for s in FIND(SR/ID/SCW01/Cooler*Err*)])
is equivalent to
any(not s for s in FIND(SR/ID/SCW01/Cooler*Err*)])
is equivalent to
not all(FIND(SR/ID/SCW01/Cooler*Err*))
is equivalent to
[s for s in FIND(SR/ID/SCW01/Cooler*Err*) if not s]
Grouping Alarms in Formulas¶
The proper way is (for readability I use upper case letters for alarms):
ALARM_1: just/my/tango/attribute_1
ALARM_2: just/my/tango/attribute_2
then:
ALARM_1_OR_2: ALARM_1 or ALARM_2
or:
ALARM_1_OR_2: any(( ALARM_1 , ALARM_2 ))
or:
ALARM_ANY: any( FIND(my/alarm/device/ALARM_*) )
Any alarm you declare becomes both a PyAlarm attribute and a variable that you can anywhere (also in other PyAlarm devices). You don’t trigger any new read because you just use the result of the formula already evaluated.
The GROUP is used to tell you that a set of conditions has changed from its previous state. GROUP instead will be triggered not if any is True, but if any of them toggles to True. It forces you to put the whole path to the alarm:
GROUP(my/alarm/device/ALARM_[12])
PyAlarm Device Properties¶
Distributing Alarms between servers¶
Alarms can be distributed between PyAlarm servers using the PyAlarm/AlarmsList property. A Panic system works well with 1200+ alarms distributed in 75 devices, with loads between 5 and 70 attrs/device. But instead of thinking in terms of N attrs/pyalarm you must distribute load trying to group all attributes from the same host or subsystem.
There are two reasons to do that (and also apply to Archiving):
- When a host is down you’ll have a lot of proxy threads in background trying to reconnect to lost devices. If alarms are distributed on rough numbers it becomes a lot of timeouts spreading through the system. When alarms are grouped by host you isolate the problems.
- Same applies for very event-intensive devices. Devices that generate a lot of information will need lower attrs/pyalarm ratio than devices that do not change so much.
But, it is a good advice to keep the overall number of alarms in the system below 10K alarms. For manageability of the log system and avoid avalanches of useless information the logical number of alarms should be around or below 1000.
Alarm Declaration Properties¶
AlarmList¶
Format of alarms will be:
TAG1:LT/VC/Dev1
TAG2:LT/VC/Dev1/State
TAG3:LT/VC/Dev1/Pressure > 1e-4
NOTE: This property was previously called AlarmsList; it is still loaded if AlarmList is empty for backward compatibility
AlarmDescriptions¶
Description to be included in emails for each alarm. The format is:
TAG:AlarmDescriptions...
NOTE: Special Tags like $NAME (for name of PyAlarm device) or $TAG (for name of the Alarm) will be automatically replaced in description.
AlarmReceivers¶
TAG1:vacuum@accelerator.es,SMS:+34935924381,file:/tmp/err.log
vacuum@accelerator.es:TAG1,TAG2,TAG3
Other options are SNAP or ACTION:
user@cells.es,
SMS:+34666777888, #If SMS sending available
SNAP, #Alarm changes will be recorded in SNAP database.
ACTION(alarm:command,mach/alarm/beep/play_sequence,$DESCRIPTION)
Adding ACTION as receiver¶
Executing a command on alarm/disable/reset/acknowledge:
ACTION(alarm:command,mach/alarm/beep/play_sequence,$DESCRIPTION)
The syntax allow both attribute/command execution and the usage of multiple typed arguments:
ACTION(alarm:command,mach/dummy/motor/move,int(1),int(10))
ACTION(reset:attribute,mach/dummy/motor/position,int(0))
Also commands added to the Class property @AllowedCommands@ can be executed:
ACTION(alarm:system:beep&)
PhoneBook (not implemented yet)¶
File where alarm receivers aliases are declared; e.g.
User:user@accelerator.es;SMS:+34666555666
Default location is: `` $HOME/var/alarm_phone_book.log ``
If User and Operator are defined in phonebook, AlarmsReceivers can be:
TAG2:User,Operator
REMINDER / RECOVERED / AUTORESET messages¶
Reminder¶
If a number of seconds is set, a reminder mail will be sent while the alarm is still active, if 0 no Reminder will be sent.
AlertOnRecovery¶
A message is sent if an alarm is active but the conditions of the attributes return to a safe value. To enable the message the content of this property must contain ‘email’, ‘sms’ or both. If disabled no RECOVERY/AUTO-RESET messages are sent.
Snapshot properties¶
UseSnap¶
If false no snapshots will be trigered (unless specifically added to receivers using “SNAP” ),
CreateNewContexts¶
It enables PyAlarm to create new contexts for alarms if no matching context exists in the database.
Alarm Configuration Properties¶
(In future releases these properties could be individually configurable for each alarm)
Enable : If False forces the device to Disabled state and avoids messaging.
LogFile : File where alarms are logged Default: “/tmp/alarm_$NAME.log”
FlagFile : File where a 1 or 0 value will be written depending if theres active alarms or not.n<br>This file can be used by other notification systems. Default: “/tmp/alarm_ds.nagios”
PollingPeriod : Periode in seconds. in which all attributes not event-driven will be polled. Default: 60000
MaxAlarmsPerDay : Max Number of Alarms to be sent each day to the same receiver. Default: 3
AlarmThreshold : Min number of consecutive Events/Pollings that must trigger an Alarm. Default: 3
FromAddress : Address that will appear as Sender in mail and SMS Default: “controls”
SMSConfig : Arguments for sendSMS command Default: ”:”
MaxMessagesPerAlarm : To avoid the previous property to send a lot of messages continuously this property has been added to limit the maximum number of messages to be sent each time that an alarm is enabled/recovered/reset.
StartupDelay : Time that PyAlarm waits before starting the Alarm evaluation threads.
EvalTimeout : Timeout for read_attribute calls, in milliseconds .
UseProcess : To create new OS processes instead of threads.
Device Server Example¶
These will be the typical properties of a PyAlarm device
#---------------------------------------------------------
# SERVER PyAlarm/AssemblyArea, PyAlarm device declaration
#---------------------------------------------------------
PyAlarm/AssemblyArea/DEVICE/PyAlarm: "LAB/VC/Alarms"
# --- LAB/VC/Alarms properties
LAB/VC/Alarms->AlarmDescriptions: "OVENPRESSURE:The pressure in the Oven exceeds Range",\
"ADIXENPRESSURE:The pressure in the Roughing Station exceeds Range",\
"OVENTEMPERATURE:The Temperature of the Oven exceeds Range",\
"DEBUG:Just for debugging purposes"
LAB/VC/Alarms->AlarmReceivers: OVENPRESSURE:somebody@cells.es,someone_else@cells.es,SMS:+34999666333,\
ADIXENPRESSURE:somebody@cells.es,someone_else@cells.es,SMS:+34999666333,\
OVENTEMPERATURE:somebody@cells.es,someone_else@cells.es,SMS:+34999666333,\
DEBUG:somebody@cells.es
LAB/VC/Alarms->AlarmsList: "OVENPRESSURE:LAB/VC/BestecOven-1/Pressure_mbar > 5e-4",\
"OVENRUNNING:LAB/VC/BestecOven-1/MaxValue > 70",\
"ADIXENPRESSURE:LAB/VC/Adixen-01/P1 > 1e-4 and OVENRUNNING",\
"OVENTEMPERATURE:LAB/VC/BestecOven-1/MaxValue > 220",\
"DEBUG:OVENRUNNING and not PCISDOWN"
LAB/VC/Alarms->PollingPeriod: 30
LAB/VC/Alarms->SMSConfig: ...
Mail Messages¶
Format of Alarm message
Subject: LAB/VC/Alarms: Alarm RECOVERED (OVENTEMPERATURE)
Date: Wed, 12 Nov 2008 11:52:39 +0100
TAG: OVENTEMPERATURE
LAB/VC/BestecOven-1/MaxValue > 220 was RECOVERED at Wed Nov 12 11:52:39 2008
Alarm receivers are:
somebody@cells.es
someone_else@cells.es
Other Active Alarms are:
DEBUG:Fri Nov 7 18:37:35 2008:OVENRUNNING and not PCISDOWN
OVENRUNNING:Fri Nov 7 18:37:17 2008:LAB/VC/BestecOven-1/MaxValue > 70
Past Alarms were:
OVENTEMPERATURE:Fri Nov 7 20:49:46 2008
Format of Recovered message
Subject: LAB/VC/Alarms: Alarm RECOVERED (OVENTEMPERATURE)
Date: Wed, 12 Nov 2008 11:52:39 +0100
TAG: OVENTEMPERATURE
LAB/VC/BestecOven-1/MaxValue > 220 was RECOVERED at Wed Nov 12 11:52:39 2008
Alarm receivers are:
somebody@cells.es
someone_else@cells.es
Other Active Alarms are:
DEBUG:Fri Nov 7 18:37:35 2008:OVENRUNNING and not PCISDOWN
OVENRUNNING:Fri Nov 7 18:37:17 2008:LAB/VC/BestecOven-1/MaxValue > 70
Past Alarms were:
OVENTEMPERATURE:Fri Nov 7 20:49:46 2008