Monitor timeout due to random lags in State command

Hi everyone!

I would like to share with you our problem. For the moment I will explain it on a very specific scenario that we have here at Alba. We will work on preparing an isolated case, but maybe meanwhile some expert could already see an error on our side or a bug.

So let’s start…

The problem was discovered when integrating Keithley in the continuous scan.

The simplified Tango architecture is as follows (all written in Python):

Keithley device emits data ready events when it is involved in an acquisition.

On the client side (Sardana controller) we are listening to those events and in the event callback we execute a command on that device in order to obtain data. Apart of that, the acquisition action (a dedicated thread) executes periodically State command on that device in order to check if the acquisition have finished. All of that is done using a single DeviceProxy.

This causes random and sporadic timeouts when acquiring the monitor on the push_data_ready call.

DevFailed[
DevError[
    desc = Not able to acquire serialization (dev, class or process) monitor
  origin = TangoMonitor::get_monitor
  reason = API_CommandTimedOut
severity = ERR]

We have run the Keithley DS with v5 Tango logging.

We were able to reproduce the problem three times (two of them produce the similar traces) and all of them points to the State command which locks the monitor for more than 3 seconds.

NORMAL EXECUTION OF STATE COMMAND:

1464270693 [140091939538688] In get_monitor() rpastor/keithely2010/01, thread = 6, ctr = 0
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor DeviceImpl::state (attribute) arrived
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor SubDevDiag::get_associated_device() entering ... 
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor SubDevDiag::get_associated_device() found : No associated device name!
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor SubDevDiag::set_associated_device() entering ... 
1464270693 [140091939538688] INFO rpastor/keithely2010/01 2016-05-26 15:51:33.541613: In rpastor/keithely2010/01::always_executed_hook()
1464270693 [140091939538688] DEBUG rpastor/keithely2010/01 Entering in state_machine ((Keithley2010(rpastor/keithely2010/01),))
1464270693 [140091939538688] DEBUG rpastor/keithely2010/01 Exiting state_machine
1464270693 [140091939538688] INFO rpastor/keithely2010/01 2016-05-26 15:51:33.541784: After state_machine rpastor/keithely2010/01::always_executed_hook()
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 6, ctr = 1
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor owner_thread !!
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor Attribute::fire_change_event() entring ...
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 2, thread = 6
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 6, ctr = 1
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor owner_thread !!
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor Attribute::fire_change_event() entring ...
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 2, thread = 6
1464270693 [140091939538688] INFO rpastor/keithely2010/01 2016-05-26 15:51:33.541945: Out rpastor/keithely2010/01::always_executed_hook()
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 1, thread = 6
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor Signalling !
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor SubDevDiag::set_associated_device() entering ... 
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor Leaving DeviceImpl::state (attribute)

CASE 1 (coincides with emitting Heartbeat event) OCCURENCE 1:

1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 6, ctr = 0
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor DeviceImpl::state (attribute) arrived
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor SubDevDiag::get_associated_device() entering ... 
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor SubDevDiag::get_associated_device() found : No associated device name!
1464270693 [140091931145984] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 15, ctr = 1
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor SubDevDiag::set_associated_device() entering ... 
1464270693 [140091931145984] DEBUG dserver/Keithley/rpastor Thread 15: waiting !!
1464270694 [140091947931392] DEBUG dserver/Keithley/rpastor ----------> Time = 462270694,977220 Sending event heartbeat
1464270694 [140091947931392] DEBUG dserver/Keithley/rpastor ZmqEventSupplier::push_heartbeat_event(): delta time since last heartbeat 9
1464270694 [140091947931392] DEBUG dserver/Keithley/rpastor ZmqEventSupplier::push_heartbeat_event(): detected heartbeat event for tango://pc255.cells.es:10000/dserver/keithley/rpastor.heartbeat
1464270694 [140091947931392] DEBUG dserver/Keithley/rpastor ZmqEventSupplier::push_heartbeat_event(): delta _time 9
1464270694 [140091947931392] DEBUG dserver/Keithley/rpastor ZmqEventSupplier::push_heartbeat_event(): nb_event = 1
1464270694 [140091947931392] DEBUG dserver/Keithley/rpastor Event heartbeat, next wake_up at 462270703,977598
1464270694 [140091947931392] DEBUG dserver/Keithley/rpastor Sub device property storage, next wake_up at 462272410,474407
1464270694 [140091947931392] DEBUG dserver/Keithley/rpastor Sleep for : 9000
1464270696 [140091922753280] DEBUG dserver/Keithley/rpastor DeviceImpl::ping arrived
1464270696 [140091922753280] DEBUG dserver/Keithley/rpastor Leaving DeviceImpl::ping
1464270696 [140091922753280] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 8, ctr = 1
1464270696 [140091922753280] DEBUG dserver/Keithley/rpastor Thread 8: waiting !!
1464270696 [140091931145984] DEBUG dserver/Keithley/rpastor TIME OUT for thread 15
DevFailed[
DevError[
    desc = Not able to acquire serialization (dev, class or process) monitor
  origin = TangoMonitor::get_monitor
  reason = API_CommandTimedOut
severity = ERR]
]
1464270696 [140091939538688] INFO rpastor/keithely2010/01 2016-05-26 15:51:36.779552: In rpastor/keithely2010/01::always_executed_hook()
1464270696 [140091939538688] DEBUG rpastor/keithely2010/01 Entering in state_machine ((Keithley2010(rpastor/keithely2010/01),))
1464270696 [140091939538688] DEBUG rpastor/keithely2010/01 Exiting state_machine
1464270696 [140091939538688] INFO rpastor/keithely2010/01 2016-05-26 15:51:36.779722: After state_machine rpastor/keithely2010/01::always_executed_hook()
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 6, ctr = 1
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor owner_thread !!
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor Attribute::fire_change_event() entring ...
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 2, thread = 6
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 6, ctr = 1
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor owner_thread !!
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor Attribute::fire_change_event() entring ...
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 2, thread = 6
1464270696 [140091939538688] INFO rpastor/keithely2010/01 2016-05-26 15:51:36.780422: Out rpastor/keithely2010/01::always_executed_hook()
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 1, thread = 6
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor Signalling !
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor SubDevDiag::set_associated_device() entering ... 
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor Leaving DeviceImpl::state (attribute)

CASE 1 (coincides with emitting Heartbeat event) OCCURRENCE 2:

1464274066 [139868710455040] In get_monitor() rpastor/keithely2010/01, thread = 6, ctr = 0
1464274066 [139868710455040] DEBUG dserver/Keithley/rpastor DeviceImpl::state (attribute) arrived
1464274066 [139868710455040] DEBUG dserver/Keithley/rpastor SubDevDiag::get_associated_device() entering ... 
1464274066 [139868710455040] DEBUG dserver/Keithley/rpastor SubDevDiag::get_associated_device() found : No associated device name!
1464274066 [139868710455040] DEBUG dserver/Keithley/rpastor SubDevDiag::set_associated_device() entering ... 
1464274066 [139868710455040] INFO rpastor/keithely2010/01 2016-05-26 16:47:46.932663: In rpastor/keithely2010/01::always_executed_hook()
1464274066 [139868710455040] DEBUG rpastor/keithely2010/01 Entering in state_machine ((Keithley2010(rpastor/keithely2010/01),))
1464274066 [139868710455040] DEBUG rpastor/keithely2010/01 Exiting state_machine
1464274066 [139868710455040] INFO rpastor/keithely2010/01 2016-05-26 16:47:46.932833: After state_machine rpastor/keithely2010/01::always_executed_hook()
1464274066 [139868710455040] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 6, ctr = 1
1464274066 [139868710455040] DEBUG dserver/Keithley/rpastor owner_thread !!
1464274066 [139868361455360] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 8, ctr = 2
1464274066 [139868361455360] DEBUG dserver/Keithley/rpastor Thread 8: waiting !!
1464274069 [139868718847744] DEBUG dserver/Keithley/rpastor ----------> Time = 462274069,533521 Sending event heartbeat
1464274069 [139868718847744] DEBUG dserver/Keithley/rpastor ZmqEventSupplier::push_heartbeat_event(): delta time since last heartbeat 25
1464274069 [139868718847744] DEBUG dserver/Keithley/rpastor ZmqEventSupplier::push_heartbeat_event(): detected heartbeat event for tango://pc255.cells.es:10000/dserver/keithley/rpastor.heartbeat
1464274069 [139868718847744] DEBUG dserver/Keithley/rpastor ZmqEventSupplier::push_heartbeat_event(): delta _time 25
1464274069 [139868718847744] DEBUG dserver/Keithley/rpastor ZmqEventSupplier::push_heartbeat_event(): nb_event = 2
1464274069 [139868718847744] DEBUG dserver/Keithley/rpastor Event heartbeat, next wake_up at 462274078,534326
1464274069 [139868718847744] DEBUG dserver/Keithley/rpastor Sub device property storage, next wake_up at 462275844,004265
1464274069 [139868718847744] DEBUG dserver/Keithley/rpastor Sleep for : 9000
1464274069 [139868344669952] DEBUG dserver/Keithley/rpastor DeviceImpl::ping arrived
1464274069 [139868344669952] DEBUG dserver/Keithley/rpastor Leaving DeviceImpl::ping
1464274069 [139868344669952] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 10, ctr = 2
1464274069 [139868344669952] DEBUG dserver/Keithley/rpastor Thread 10: waiting !!
1464274070 [139868361455360] DEBUG dserver/Keithley/rpastor TIME OUT for thread 8
DevFailed[
DevError[
    desc = Not able to acquire serialization (dev, class or process) monitor
  origin = TangoMonitor::get_monitor
  reason = API_CommandTimedOut
severity = ERR]
]
1464274070 [139868710455040] DEBUG dserver/Keithley/rpastor Attribute::fire_change_event() entring ...
1464274070 [139868710455040] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 2, thread = 6
1464274070 [139868710455040] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 6, ctr = 1
1464274070 [139868710455040] DEBUG dserver/Keithley/rpastor owner_thread !!
1464274070 [139868710455040] DEBUG dserver/Keithley/rpastor Attribute::fire_change_event() entring ...
1464274070 [139868710455040] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 2, thread = 6
1464274070 [139868710455040] INFO rpastor/keithely2010/01 2016-05-26 16:47:50.147497: Out rpastor/keithely2010/01::always_executed_hook()
1464274070 [139868710455040] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 1, thread = 6
1464274070 [139868710455040] DEBUG dserver/Keithley/rpastor Signalling !
1464274070 [139868710455040] DEBUG dserver/Keithley/rpastor SubDevDiag::set_associated_device() entering ... 
1464274070 [139868710455040] DEBUG dserver/Keithley/rpastor Leaving DeviceImpl::state (attribute)

CASE 2 (no Hearbeat emit):

1464272713 [140721105397504] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 16, ctr = 2
1464272713 [140721105397504] DEBUG dserver/Keithley/rpastor Thread 16: waiting !!
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor DeviceImpl::ping arrived
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor Leaving DeviceImpl::ping
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 8, ctr = 2
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor Thread 8: waiting !!
1464272716 [140721105397504] DEBUG dserver/Keithley/rpastor TIME OUT for thread 16
DevFailed[
DevError[
    desc = Not able to acquire serialization (dev, class or process) monitor
  origin = TangoMonitor::get_monitor
  reason = API_CommandTimedOut
severity = ERR]
]
1464272716 [140721113790208] DEBUG dserver/Keithley/rpastor Attribute::fire_change_event() entring ...
1464272716 [140721113790208] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 2, thread = 6
1464272716 [140721113790208] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 6, ctr = 1
1464272716 [140721113790208] DEBUG dserver/Keithley/rpastor owner_thread !!
1464272716 [140721113790208] DEBUG dserver/Keithley/rpastor Attribute::fire_change_event() entring ...
1464272716 [140721113790208] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 2, thread = 6
1464272716 [140721113790208] INFO rpastor/keithely2010/01 2016-05-26 16:25:16.446286: Out rpastor/keithely2010/01::always_executed_hook()
1464272716 [140721113790208] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 1, thread = 6
1464272716 [140721113790208] DEBUG dserver/Keithley/rpastor Signalling !
1464272716 [140721113790208] DEBUG dserver/Keithley/rpastor SubDevDiag::set_associated_device() entering ... 
1464272716 [140721113790208] DEBUG dserver/Keithley/rpastor Leaving DeviceImpl::state (attribute)
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor DeviceImpl::state (attribute) arrived
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor SubDevDiag::get_associated_device() entering ... 
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor SubDevDiag::get_associated_device() found : No associated device name!
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor SubDevDiag::set_associated_device() entering ... 
1464272716 [140721097004800] INFO rpastor/keithely2010/01 2016-05-26 16:25:16.446446: In rpastor/keithely2010/01::always_executed_hook()
1464272716 [140721097004800] DEBUG rpastor/keithely2010/01 Entering in state_machine ((Keithley2010(rpastor/keithely2010/01),))
1464272716 [140721097004800] DEBUG rpastor/keithely2010/01 Exiting state_machine
1464272716 [140721097004800] INFO rpastor/keithely2010/01 2016-05-26 16:25:16.446530: After state_machine rpastor/keithely2010/01::always_executed_hook()
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 8, ctr = 1
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor owner_thread !!
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor Attribute::fire_change_event() entring ...
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 2, thread = 8
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 8, ctr = 1
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor owner_thread !!
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor Attribute::fire_change_event() entring ...
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 2, thread = 8
1464272716 [140721097004800] INFO rpastor/keithely2010/01 2016-05-26 16:25:16.446616: Out rpastor/keithely2010/01::always_executed_hook()
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 1, thread = 8
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor Signalling !
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor SubDevDiag::set_associated_device() entering ... 
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor Leaving DeviceImpl::state (attribute)

We are using:
Tango 8.1.2 (with patches applied)
PyTango 8.1.1

If no one comments we will proceed in preparing an isolated scenario.

Many thanks in advance for any suggestions!

Hi Zibi,

Your case seems interesting. What could help even further is if you could generate a log file for your problem.
To do so in Jive select your DS admin device (dserver/xxx/yyy) and select the logging entry.
For the Logging level, set DEBUG, for the Logging target, set file and for the Logging RFT, set 20000.
Then restart your DS. In /tmp/tango-dserver/DS/instance/dserver_xxx_yyy you should now have a log file generated.
In fact the logging system will manage two log files in a round robbin way. The file size will be 20000 kB each.
Once you get your error, stop your DS and send us these files.

Ideally, if you could do this using Tango 9…

Cheers

Emmanuel

Hi Manu,

Thanks for the hint. Please see attached the log files, for both the admin device and the keithley device.
I see the logs a little bit different than the ones from my initial message. This may be due to the fact that now we have done the tests with Tango 9.1.0 and PyTango 9.1.0. Apart of that I attach you the output of the server around the time of the “Not able to acquire serialization” error (so you could find it easier in the log files…)


1464684463 [140515282249472] DEBUG dserver/Keithley/monitor_serialization In get_monitor() test/rpastor/1, thread = 8, ctr = 0
1464684463 [140515282249472] DEBUG dserver/Keithley/monitor_serialization DeviceImpl::state (attribute) arrived
1464684463 [140515265464064] DEBUG dserver/Keithley/monitor_serialization In get_monitor() test/rpastor/1, thread = 20, ctr = 1
1464684463 [140515265464064] DEBUG dserver/Keithley/monitor_serialization Thread 20: waiting !!
1464684463 [140515282249472] DEBUG dserver/Keithley/monitor_serialization SubDevDiag::get_associated_device() entering ... 
1464684463 [140515282249472] DEBUG dserver/Keithley/monitor_serialization SubDevDiag::get_associated_device() found : No associated device name!
1464684463 [140515282249472] DEBUG dserver/Keithley/monitor_serialization SubDevDiag::set_associated_device() entering ... 
1464684464 [140515500812032] DEBUG dserver/Keithley/monitor_serialization ----------> Time = 462684464,455918 Sending event heartbeat
1464684464 [140515500812032] DEBUG dserver/Keithley/monitor_serialization ZmqEventSupplier::push_heartbeat_event(): delta time since last heartbeat 300
1464684464 [140515500812032] DEBUG dserver/Keithley/monitor_serialization ZmqEventSupplier::push_heartbeat_event(): detected heartbeat event for tango://controls04.cells.es:10000/d
server/keithley/monitor_serialization.heartbeat
1464684464 [140515500812032] DEBUG dserver/Keithley/monitor_serialization ZmqEventSupplier::push_heartbeat_event(): delta _time 300
1464684464 [140515500812032] DEBUG dserver/Keithley/monitor_serialization ZmqEventSupplier::push_heartbeat_event(): nb_event = 2
1464684464 [140515500812032] DEBUG dserver/Keithley/monitor_serialization Event heartbeat, next wake_up at 462684473,456568
1464684464 [140515500812032] DEBUG dserver/Keithley/monitor_serialization Sub device property storage, next wake_up at 462685964,631057
1464684464 [140515500812032] DEBUG dserver/Keithley/monitor_serialization Sleep for : 9000
1464684466 [140515273856768] DEBUG dserver/Keithley/monitor_serialization DeviceImpl::ping arrived
1464684466 [140515273856768] DEBUG dserver/Keithley/monitor_serialization Leaving DeviceImpl::ping
1464684466 [140515273856768] DEBUG dserver/Keithley/monitor_serialization In get_monitor() test/rpastor/1, thread = 9, ctr = 1
1464684466 [140515273856768] DEBUG dserver/Keithley/monitor_serialization Thread 9: waiting !!
1464684466 [140515265464064] DEBUG dserver/Keithley/monitor_serialization TIME OUT for thread 20
 ----------------------Event generated lost: DevFailed[
DevError[
    desc = Not able to acquire serialization (dev, class or process) monitor
  origin = TangoMonitor::get_monitor
  reason = API_CommandTimedOut
severity = ERR]
]
1464684466 [140515282249472] INFO test/rpastor/1 In test/rpastor/1::always_executed_hook()
1464684466 [140515282249472] DEBUG test/rpastor/1 Entering in state_machine ((Keithley2010(test/rpastor/1),))
1464684466 [140515282249472] DEBUG test/rpastor/1 Exiting state_machine
1464684466 [140515282249472] DEBUG dserver/Keithley/monitor_serialization In get_monitor() test/rpastor/1, thread = 8, ctr = 1
1464684466 [140515282249472] DEBUG dserver/Keithley/monitor_serialization owner_thread !!
1464684466 [140515282249472] DEBUG dserver/Keithley/monitor_serialization Attribute::fire_change_event() entering ...
1464684466 [140515282249472] DEBUG dserver/Keithley/monitor_serialization In rel_monitor() test/rpastor/1, ctr = 2, thread = 8
1464684466 [140515282249472] DEBUG dserver/Keithley/monitor_serialization In get_monitor() test/rpastor/1, thread = 8, ctr = 1
1464684466 [140515282249472] DEBUG dserver/Keithley/monitor_serialization owner_thread !!
1464684466 [140515282249472] DEBUG dserver/Keithley/monitor_serialization Attribute::fire_change_event() entering ...
1464684466 [140515282249472] DEBUG dserver/Keithley/monitor_serialization In rel_monitor() test/rpastor/1, ctr = 2, thread = 8
1464684466 [140515282249472] DEBUG dserver/Keithley/monitor_serialization In rel_monitor() test/rpastor/1, ctr = 1, thread = 8
1464684466 [140515282249472] DEBUG dserver/Keithley/monitor_serialization Signalling !
1464684466 [140515282249472] DEBUG dserver/Keithley/monitor_serialization SubDevDiag::set_associated_device() entering ... 
1464684466 [140515282249472] DEBUG dserver/Keithley/monitor_serialization Leaving DeviceImpl::state (attribute)

Please let me know if you need more information, or you want us to reproduce it in a reduced setup.

Cheers,
Zibi

Hi Zibi,

If I tell you where to add some more debug info, could you re-compile Tango C++ lib and re-run your test?

Manu

Hi Manu,

Sure, either post them here, or attach a patch file.

Thanks for you help!
Zibi

Re,

Well, if you could add in the file server/blackbox.cpp in the method insert_corba_attr(xx) a cout5 log before and after the mutex
lock/unlock at the method beginning and at the method end. Something like


cout5 << "Before taking lock" << endl;

Please, also check that in the Keithley class, the log message in the always_executed_hook() method is the first line of code
in this method.
This will not solve the problem (may be some more changes will be required later on) but this should help us to determine precisely in which method the thread executing the state() request is waiting.

Then, re-run your test with the modified lib and send me the log files.

Thank’s for your help

Manu

Re,

Oups sorry, it’s not cout5 but cout4.

Cheers

Manu

Hi Manu,

Again please find attached the log files, for both the admin device and the keithley device.

The following lines were added:

  • Before taking lock
  • After taking lock
  • Before releasing lock
  • After releasing lock

Also, the self.info_stream(‘In %s::always_executed_hook()’ % self.get_name()) is the first line in the always_executed_hook method of the Keithley class.

And this is the output of the server around the error:


464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization In get_monitor() test/rpastor/1, thread = 6, ctr = 0
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization DeviceImpl::state (attribute) arrived
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization SubDevDiag::get_associated_device() entering ... 
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization SubDevDiag::get_associated_device() found : No associated device name!
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization SubDevDiag::set_associated_device() entering ... 
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization Before taking lock
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization After taking lock
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization Before releasing lock
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization After releasing lock
1464708083 [140488084489984] INFO test/rpastor/1 In test/rpastor/1::always_executed_hook()
1464708083 [140488084489984] DEBUG test/rpastor/1 Entering in state_machine ((Keithley2010(test/rpastor/1),))
1464708083 [140488084489984] DEBUG test/rpastor/1 Exiting state_machine
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization In get_monitor() test/rpastor/1, thread = 6, ctr = 1
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization owner_thread !!
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization Attribute::fire_change_event() entering ...
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization In rel_monitor() test/rpastor/1, ctr = 2, thread = 6
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization In get_monitor() test/rpastor/1, thread = 6, ctr = 1
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization owner_thread !!
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization Attribute::fire_change_event() entering ...
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization In rel_monitor() test/rpastor/1, ctr = 2, thread = 6
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization In rel_monitor() test/rpastor/1, ctr = 1, thread = 6
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization Signalling !
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization SubDevDiag::set_associated_device() entering ... 
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization Leaving DeviceImpl::state (attribute)
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization In get_monitor() test/rpastor/1, thread = 6, ctr = 0
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization DeviceImpl::state (attribute) arrived
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization SubDevDiag::get_associated_device() entering ... 
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization SubDevDiag::get_associated_device() found : No associated device name!
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization SubDevDiag::set_associated_device() entering ... 
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization Before taking lock
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization After taking lock
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization Before releasing lock
1464708083 [140488084489984] DEBUG dserver/Keithley/monitor_serialization After releasing lock
56
1464708083 [140488067704576] DEBUG dserver/Keithley/monitor_serialization In get_monitor() test/rpastor/1, thread = 14, ctr = 1
1464708083 [140488067704576] DEBUG dserver/Keithley/monitor_serialization Thread 14: waiting !!
1464708086 [140488076097280] DEBUG dserver/Keithley/monitor_serialization DeviceImpl::ping arrived
1464708086 [140488076097280] DEBUG dserver/Keithley/monitor_serialization Leaving DeviceImpl::ping
1464708086 [140488076097280] DEBUG dserver/Keithley/monitor_serialization In get_monitor() test/rpastor/1, thread = 7, ctr = 1
1464708086 [140488076097280] DEBUG dserver/Keithley/monitor_serialization Thread 7: waiting !!
1464708087 [140488067704576] DEBUG dserver/Keithley/monitor_serialization TIME OUT for thread 14
 ----------------------Event generated lost: DevFailed[
DevError[
    desc = Not able to acquire serialization (dev, class or process) monitor
  origin = TangoMonitor::get_monitor
  reason = API_CommandTimedOut
severity = ERR]
]
1464708087 [140488084489984] INFO test/rpastor/1 In test/rpastor/1::always_executed_hook()
1464708087 [140488084489984] DEBUG test/rpastor/1 Entering in state_machine ((Keithley2010(test/rpastor/1),))
1464708087 [140488084489984] DEBUG test/rpastor/1 Exiting state_machine
1464708087 [140488084489984] DEBUG dserver/Keithley/monitor_serialization In get_monitor() test/rpastor/1, thread = 6, ctr = 1
1464708087 [140488084489984] DEBUG dserver/Keithley/monitor_serialization owner_thread !!
1464708087 [140488084489984] DEBUG dserver/Keithley/monitor_serialization Attribute::fire_change_event() entering ...
1464708087 [140488084489984] DEBUG dserver/Keithley/monitor_serialization In rel_monitor() test/rpastor/1, ctr = 2, thread = 6
1464708087 [140488084489984] DEBUG dserver/Keithley/monitor_serialization In get_monitor() test/rpastor/1, thread = 6, ctr = 1
1464708087 [140488084489984] DEBUG dserver/Keithley/monitor_serialization owner_thread !!
1464708087 [140488084489984] DEBUG dserver/Keithley/monitor_serialization Attribute::fire_change_event() entering ...
1464708087 [140488084489984] DEBUG dserver/Keithley/monitor_serialization In rel_monitor() test/rpastor/1, ctr = 2, thread = 6
1464708087 [140488084489984] DEBUG dserver/Keithley/monitor_serialization In rel_monitor() test/rpastor/1, ctr = 1, thread = 6
1464708087 [140488084489984] DEBUG dserver/Keithley/monitor_serialization Signalling !
1464708087 [140488084489984] DEBUG dserver/Keithley/monitor_serialization SubDevDiag::set_associated_device() entering ... 
1464708087 [140488084489984] DEBUG dserver/Keithley/monitor_serialization Leaving DeviceImpl::state (attribute)
1464708087 [140488076097280] DEBUG dserver/Keithley/monitor_serialization DeviceImpl::state (attribute) arrived
1464708087 [140488076097280] DEBUG dserver/Keithley/monitor_serialization SubDevDiag::get_associated_device() entering ... 
1464708087 [140488076097280] DEBUG dserver/Keithley/monitor_serialization SubDevDiag::get_associated_device() found : No associated device name!
1464708087 [140488076097280] DEBUG dserver/Keithley/monitor_serialization SubDevDiag::set_associated_device() entering ... 
1464708087 [140488076097280] DEBUG dserver/Keithley/monitor_serialization Before taking lock
1464708087 [140488076097280] DEBUG dserver/Keithley/monitor_serialization After taking lock
1464708087 [140488076097280] DEBUG dserver/Keithley/monitor_serialization Before releasing lock
1464708087 [140488076097280] DEBUG dserver/Keithley/monitor_serialization After releasing lock
1464708087 [140488076097280] INFO test/rpastor/1 In test/rpastor/1::always_executed_hook()
1464708087 [140488076097280] DEBUG test/rpastor/1 Entering in state_machine ((Keithley2010(test/rpastor/1),))
1464708087 [140488076097280] DEBUG test/rpastor/1 Exiting state_machine
1464708087 [140488076097280] DEBUG dserver/Keithley/monitor_serialization In get_monitor() test/rpastor/1, thread = 7, ctr = 1
1464708087 [140488076097280] DEBUG dserver/Keithley/monitor_serialization owner_thread !!
1464708087 [140488076097280] DEBUG dserver/Keithley/monitor_serialization Attribute::fire_change_event() entering ...
1464708087 [140488076097280] DEBUG dserver/Keithley/monitor_serialization In rel_monitor() test/rpastor/1, ctr = 2, thread = 7
1464708087 [140488076097280] DEBUG dserver/Keithley/monitor_serialization In get_monitor() test/rpastor/1, thread = 7, ctr = 1
1464708087 [140488076097280] DEBUG dserver/Keithley/monitor_serialization owner_thread !!
1464708087 [140488076097280] DEBUG dserver/Keithley/monitor_serialization Attribute::fire_change_event() entering ...
1464708087 [140488076097280] DEBUG dserver/Keithley/monitor_serialization In rel_monitor() test/rpastor/1, ctr = 2, thread = 7
1464708087 [140488076097280] DEBUG dserver/Keithley/monitor_serialization In rel_monitor() test/rpastor/1, ctr = 1, thread = 7
1464708087 [140488076097280] DEBUG dserver/Keithley/monitor_serialization Signalling !
1464708087 [140488076097280] DEBUG dserver/Keithley/monitor_serialization SubDevDiag::set_associated_device() entering ... 
1464708087 [140488076097280] DEBUG dserver/Keithley/monitor_serialization Leaving DeviceImpl::state (attribute)

If you need anything else, just let me know.

Cheers,
Zibi

Hi Zibi,

The mystery is still there… We see code executed correctly in the C++ library but then the thread stops…

Could you send us your DS code?
Do you also confirm that some client is requiring the device state every 10 mS (or closed to 10 mS)?

Thank’s in advance

Manu

Hi Manu,

Yes, the acquisition thread (in the client) is verifying the State every 10 mS.

My colleague will shortly attach the DS code.

Cheers,
Zibi

Hi Manu,

first of all, thanks for your help.
device.pyI attach the DS code as you requested. It is a little bit messy, but I hope it will be useful ().

keithley.tar.gzIn case you need all the module, I also attach it in a tar-gz ()

If you require anything else let us know.
Again, many thanks.
Roger

Hi again Manu,

MonitorLockSerializationServer.pyMonitorLockSerializationClientWe have generated some simple scripts (server and client ) to reproduce the problem.

The server has three attributes, one that generates data_ready events, one that generate change events instead and finally a sleep_time variable that, guess what, we use to execute sleeps (quite unexpected).
Both the attr that generate events start to do so when written. Their write method starts a thread that generates events waiting between each generation.
To stop the event generation we have implemented a StopThread command that kills 'em all.
Finally, but not less important for it, we have implemented a command that makes sleeps (CommandSleep).

Our client simply receive a parameter and executes a method that, depending on the input parameter of the client, starts one of the events generation of the server, then it makes a subscription to the attr and, finally, it starts a loop that executes a command_inout of the device (the CommandSleep)

The setup is based in this scripts but to build it, it is necessary to:

Start the server:
Open a console and define the device as follows:
$ tango_admin --add-server MonitorLockSerializationServer/LockTest MonitorLockSerialization test/monitor_lock/1
Once the device has been defined, start it from the directory where its file is located with:
$ python MonitorLockSerializationServer.py LockTest -v4

To start the client
From another console of a host with the same TANGO_DB we run the script from the directory where it is located like:
$ ./MonitorLockSerializationClient ChangeEvent
if we want to test the setup with change events.
$ ./MonitorLockSerializationClient DataReadyEvent
if we want to test the setup with data_ready events instead.

If we run the setup using the ChangeEvent parameter no error will happen and the system will work as expected.
If we run the setup using the DataReadyEvent parameter, after some loops (10 exactly, as the CommandSleep executes 10 times faster than the push data ready) the client will crash from a timeout executing the CommandSleep and the server will generate the Not able to acquire serialization monitor error. Playing with the sleeps time of the client we can force the problem to happen at the first loop (i.e. increasing the sleep time on the CommandSleep) but as it is now, the error happens quite fast either way.

Cheers,
Roger

Hi,

Thank’s for your minimal test case. I have reproduced the problem here. Will look into it ASAP

Cheers

Manu

Re,

After some analysis of what happens with Tiago, we found out the reason of the problem. It’s a PyTango bug related to the
Python GIL management. The change event and the data ready event are not managed the same way in PyTango. For change_event,
there is some stuff related to GIL but not for data_ready_event and this is the reason of the problem.

Tiago will look after this.

Have a nice evening

Tiago + manu

Thanks to both!

Then we will wait for the fix in PyTango.

I Zibi,

Would you be kind and make a bug report on PyTango github

Thanks in advance

Tiago