Attribute events get missed

Hi All,

We have subscribed to an attribute change event. The polling period of the attribute is set to 200 ms. We have observed that some of the events are being missed. If the polling period is reduced drastically, for example to 10 ms, performance degrades.
Has anyone observed the same problem? How can it be resolved?

Thanks & Regards
TCS-GMRT

Hi,

Is your problem the same as the one reported by Andy in “CPU load when device has large number of attributes” (issue #424 in tango-controls/cppTango on GitHub)?
If not, please give us more details about your device server, ideally with the source code of a device server that reproduces your problem.
Knowing the configuration of your device would be very interesting for us too.
How many attributes are there in the server?
Are they all polled?
Are they all polled at the same frequency?
How many clients are connected? Are all the clients using events?
Do the clients send commands from time to time?
Are you pushing events from code?
Do you get any “polling thread is late” errors?

Kind regards,
Reynald

Hi,
Moreover, are those “missed” events not sent by the device, or sent but not received by the client?
Cheers,
Lorenzo

Hi Reynald,

We are facing this problem as well, in addition to the missing-events problem, which is not mentioned there.

[quote=“Reynald”]If not, please give us more details on your device server and ideally the source code of a device server to reproduce your problem…
Knowing the configuration of your device would be very interesting for us too.[/quote]
The details of the configuration are as follows:

The server has 40 static attributes and 40-50 dynamic attributes.

Yes, they are all polled.

No, they are not all polled at the same frequency. Some attributes have a 1000 ms polling period and some have 200 ms. The 200 ms period is used for the attributes for which we expect frequent events.

Up to 10 clients are connected, and all of them use events.

Yes, the clients send commands from time to time.

No, we are not pushing events from code.

No, this error has not occurred.

[quote=“lorenzo”]Hi,
moreover, are those “missed” events not sent by the device or not received by the client?
Cheers,
Lorenzo[/quote]

Hi Lorenzo,

The values are sent by the device, i.e. the polled attribute is updated by the device, but the events are not received by the client.

You have many attributes, so you might be hitting the issue reported by Andy (“CPU load when device has large number of attributes”, issue #424 in tango-controls/cppTango on GitHub).
We are currently working on it.

Are the commands sent by the clients taking a long time to execute?

It looks like you didn’t receive any Missed Event errors on the client side… Can you confirm that?
Tango has a mechanism on the client side to detect missed ZMQ events: a counter field is associated with each event received. If an event is received with a counter value higher than expected, an API_MissedEvents exception should be received on the client side, with an error description like “Missed some events! Zmq queue has reached HWM?”. Did you see anything like that on the client side?

If not, the reason you are not seeing an event for every update of the polled value could be your event configuration. Did you specify an absolute or relative change threshold in the event configuration of the attributes that are missing events? Maybe the server is not sending events because the conditions are not met (the attribute value didn’t change enough to send an event)?
Could you please verify that?
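The counter-based detection described above can be illustrated with a small Python model. It is only a sketch of the idea (each event carries a counter, and a jump of more than one means events were lost), not the actual Tango client code; the class `EventCounterChecker` and the subscription key format are hypothetical.

```python
class EventCounterChecker:
    """Simplified model of counter-based missed-event detection."""

    def __init__(self):
        # Last counter value seen, per subscription (e.g. device/attribute name).
        self._last = {}

    def check(self, key, counter):
        """Return how many events were missed just before this one (0 if none)."""
        last = self._last.get(key)
        self._last[key] = counter
        if last is None:
            return 0  # first event of this subscription only sets the baseline
        return max(0, counter - last - 1)


checker = EventCounterChecker()
for counter in (1, 2, 3, 6):
    missed = checker.check("sys/dev/1/messages", counter)
    if missed:
        # In Tango this situation is reported as an API_MissedEvents error.
        print(f"Missed some events! {missed} lost before counter {counter}")
```

Note that a check of this kind can only notice events that were generated and then lost in transit; it says nothing about value updates for which no event was generated in the first place.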

Cheers,
Reynald

One additional point to consider: almost 100 attributes at a 200 ms polling period leaves a 2 ms execution time slice per method, with no spare CPU time for the device, which may be on the low side. I know your setup also uses longer polling periods, but you may want to check the real execution time of each method (e.g. attribute read or command) because, as Reynald suggested, slow ones may also impact the fast ones (delaying or even dropping their execution, depending on the Tango version).
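The 2 ms figure is the polling period divided by the number of polled attributes. The Python sketch below turns that into a quick load estimate, assuming (as a simplification) that all polled attributes are served by a single polling thread and that reads never overlap. The 3 ms read time is a made-up placeholder, to be replaced with the execution times you actually measure.

```python
def time_slice_ms(period_ms: float, n_attributes: int) -> float:
    """Time available per attribute if all reads must fit in one period."""
    return period_ms / n_attributes


def polling_load(attrs) -> float:
    """Fraction of a single polling thread's time spent reading attributes.

    attrs: iterable of (polling_period_ms, read_time_ms) pairs.
    A result close to or above 1.0 means the thread cannot keep up.
    """
    return sum(read_ms / period_ms for period_ms, read_ms in attrs)


print(time_slice_ms(200, 100))  # 2.0

# Hypothetical mix loosely based on this thread: 40 attributes polled at
# 200 ms and 50 at 1000 ms, each ASSUMED to take 3 ms to read.
mix = [(200, 3.0)] * 40 + [(1000, 3.0)] * 50
print(round(polling_load(mix), 2))  # 0.75
```

With these placeholder numbers the thread would already be busy about 75% of the time, leaving little margin for commands or for a few slower reads.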
Lorenzo

Hi TCS_GMRT,

Are you using Java to implement your device server? If so, the fix for issue #424 that Reynald is working on is not going to help, because it is only for C++. When do you start losing events? To quantify the event loss, can you try with only a few attributes and then increase the number until you start losing events? Have you tried splitting the attributes across multiple device servers? Does it still happen? Is there anything in the logs?
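The ramp test suggested above can be scripted so that each step is recorded the same way. The Python sketch below assumes a hypothetical `run_trial(n)` that you would write for your own setup (start the server with `n` polled attributes, run for a fixed time, then count the events sent according to the server log and the events received by the client); `fake_trial` only stands in for it here.

```python
from typing import Callable, Iterable, Optional, Tuple


def first_lossy_count(run_trial: Callable[[int], Tuple[int, int]],
                      counts: Iterable[int]) -> Optional[int]:
    """Return the first attribute count at which events are lost.

    run_trial(n) must return (events_sent, events_received) for a run with
    n polled attributes; None is returned if no loss is observed.
    """
    for n in counts:
        sent, received = run_trial(n)
        print(f"{n:3d} attributes: sent={sent} received={received}")
        if received < sent:
            return n
    return None


def fake_trial(n: int) -> Tuple[int, int]:
    # Made-up behaviour for illustration only: losses start above 40.
    sent = n * 50
    received = sent if n <= 40 else sent - (n - 40)
    return sent, received


print(first_lossy_count(fake_trial, range(10, 101, 10)))  # 50
```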

Andy

Hi Reynald,

Thanks for the reply.

No. Commands execute within 3 seconds.

[quote=“Reynald”]It looks like you didn’t receive any Missed Event errors on the client side… Can you confirm that?
Tango has a mechanism on the client side to detect missed ZMQ events because there is a counter field associated with each event received. If an event is received with a counter value unexpectedly too high, an API_MissedEvents exception should be received on the client side with an error description like “Missed some events! Zmq queue has reached HWM?”. Did you see anything like that on the clients side?[/quote]
We are not getting any such error.

No absolute or relative threshold is specified; both are left at the default value (None). Do we need to specify such a configuration?

[quote=“Reynald”]Maybe the server is not sending events because the conditions are not met (the attribute value didn’t change enough to send an event)?
Could you please verify that?[/quote]
The attribute values are definitely changing (any change in the spectrum attribute value should send an event).

Hi Lorenzo,

[quote=“lorenzo”]One additional point you may want to consider is that almost 100 attributes at 200ms polling period means 2ms execution time slice per method with no spare CPU time for the device… which may be on the low side. I know your setup uses larger polling periods as well, but you may want to check the real execution time slice of each method (e.g. attribute/command) as, as per Reynald suggestion, slow ones may impact the fast ones as well (delaying, or even dropping, execution depending on TANGO version).
Lorenzo[/quote]

Agree with your points.
We will check it and let you know.

Hi Andy,

Yes, we are using Java to implement the device server.

We will try this and let you know the results and observations.

Hi,

I think we still need more details.
What is the data type of the attributes for which you are missing events? Spectrum attributes? DevDouble, DevFloat, DevState, DevBoolean…?
What are your clients? Atkpanels? Java clients? C++ clients?

Are the clients subscribing to change events only or also to periodic events?

You wrote that your commands do not take a long time to execute (less than 3 seconds).
If a command takes 2 seconds to execute, this means that during all this time no attribute can be read, because Tango has a locking mechanism which, by default, makes it impossible to read an attribute in parallel with executing a command or reading another attribute. This simplifies the device server programmer’s life and makes device servers thread-safe.
So a 2-second command execution, for instance, would be a very long execution time in your case if you want to be able to poll your attributes at 200 ms, or even 10 ms!
So please check your command execution times and make sure they are not the cause of the event losses.
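To put numbers on this, the Python sketch below counts how many polling cycles come due while a command is running, under the simplifying assumption stated above that no attribute can be read while the command holds the device lock. The function name is hypothetical, and the count is a range because it depends on when the command starts relative to the polling schedule.

```python
import math


def polls_due_during(command_ms: float, polling_period_ms: float):
    """Return (min, max) polling cycles that come due while a command runs.

    Under the locking assumption, every one of these cycles is delayed
    until the command finishes.
    """
    ratio = command_ms / polling_period_ms
    return math.floor(ratio), math.ceil(ratio)


print(polls_due_during(2000, 200))  # (10, 10)
print(polls_due_during(2000, 10))   # (200, 200)
print(polls_due_during(500, 200))   # (2, 3)
```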

If your attribute is a numeric scalar, spectrum or image attribute (types DevDouble, DevFloat, Dev(U)Short, Dev(U)Long(64), DevUChar), you need to configure an absolute change or relative change threshold, unless you push events manually from code without checking the thresholds. Otherwise you should get errors when subscribing (example for a C++ device server):

Error reason = API_EventPropertiesNotSet
Desc : Event properties (abs_change or rel_change) for attribute devfloatspectrumro are not set
Origin : DServer::event_subscription
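The rule above can be modelled as two checks: one at subscription time, and one each time a new polled value is compared with the previous one. This Python sketch is a simplified model under stated assumptions: thresholds are treated as symmetric, spectrum values are compared element by element, a size change always counts, and quality changes and any other conditions under which Tango also emits change events are ignored. The function names are hypothetical; the exact rules are in the event-related configuration parameters section of the Tango reference documentation.

```python
from typing import Optional, Sequence

# Numeric types listed above, for which a threshold is required.
NUMERIC_TYPES = {"DevDouble", "DevFloat", "DevShort", "DevUShort", "DevLong",
                 "DevULong", "DevLong64", "DevULong64", "DevUChar"}


def subscription_allowed(data_type: str, abs_change: Optional[float],
                         rel_change: Optional[float],
                         pushed_from_code: bool = False) -> bool:
    """False models the API_EventPropertiesNotSet error at subscription."""
    if data_type not in NUMERIC_TYPES or pushed_from_code:
        return True
    return abs_change is not None or rel_change is not None


def should_send_change_event(old: Sequence[float], new: Sequence[float],
                             abs_change: Optional[float],
                             rel_change: Optional[float]) -> bool:
    """Simplified rule: fire if any element moved by at least a threshold."""
    if len(old) != len(new):
        return True  # assumption: a size change always counts as a change
    for o, n in zip(old, new):
        delta = abs(n - o)
        if delta == 0:
            continue
        if abs_change is not None and delta >= abs_change:
            return True
        if rel_change is not None:
            # Relative change in percent of the previous value (assumed rule).
            pct = delta * 100.0 / abs(o) if o != 0 else 100.0
            if pct >= rel_change:
                return True
    return False


print(subscription_allowed("DevFloat", None, None))                  # False
print(should_send_change_event([1.0, 2.0], [1.05, 2.0], 0.1, None))  # False
print(should_send_change_event([1.0, 2.0], [1.0, 2.5], 0.1, None))   # True
```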

If you are using a generic client like atkpanel, I think it tries to subscribe to change events by default and, if this fails, it subscribes to periodic events. If you didn’t change the period configuration parameter of the periodic event for the attribute you are subscribing to, you will get the default period value, which is 1 second.
So, if you subscribe to a numeric attribute with a generic client like atkpanel and did not specify an absolute or relative change threshold, you will end up receiving an event every second, even if you specified a polling period below 1 second.
In atkpanel, you can identify that easily by clicking on “View->Diagnostics…” in the menu, and then choosing the Attribute tab. You will see if ATK is receiving events for your attribute, and also what kind of event (Change event, Periodic event).
You can use the “Update fields” button at the bottom of this ATK diagnostic window to refresh the values.
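As rough arithmetic, the Python sketch below compares how many values the polling produces with how many periodic events a client receives at the default 1 second period over the same window. It is not a model of ATK, only the counting argument behind the 1 second periodic fallback described above, under the assumption that the value changes at every poll and ignoring timing jitter.

```python
def updates_vs_periodic(window_ms: int, polling_ms: int,
                        periodic_ms: int = 1000):
    """Return (polled values, periodic events) produced during window_ms.

    Rough arithmetic only: assumes the value changes at every poll.
    """
    return window_ms // polling_ms, window_ms // periodic_ms


polled, periodic = updates_vs_periodic(10_000, 200)
print(polled, periodic)  # 50 10
print(f"{polled - periodic} of {polled} polled values are never delivered "
      "as individual events")
```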

Hoping this helps,

Kind regards,
Reynald

[quote=“TCS_GMRT”] Reynald
Maybe the server is not sending events because the conditions are not met (the attribute value didn’t change enough to send an event)?
Could you please verify that?

Attribute values are getting changed surely. (any change in spectrum attribute value should send an event)[/quote]

Just for clarity, in addition to Reynald’s reply: any change in the spectrum attribute value should send an event, provided you specified the event (change, archive) thresholds in the attribute event configuration. If not, no event will be sent, and you should get an exception when subscribing.

[quote=“lorenzo”][quote=“TCS_GMRT”] Reynald
Maybe the server is not sending events because the conditions are not met (the attribute value didn’t change enough to send an event)?
Could you please verify that?

Attribute values are getting changed surely. (any change in spectrum attribute value should send an event)[/quote]

Just for clarity, in addition to Reynald reply, any change in spectrum attribute value should send an event provided you specified the event (change, archive) thresholds in the Attribute event configuration. If not, no event will be sent, and you should get an exception when subscribing…[/quote]
Sorry for the late reply.
What exactly do you mean by attribute event configuration? When subscribing, the client subscribes to the attribute as a change event.
No attribute event configuration is done on the server side.
Actually, the problem is not that we never receive events: we do get change events most of the time, but sometimes they get missed.
This problem occurs more frequently when one client subscribes to attribute events from more than 5 device servers and the attributes of all these device servers change frequently. In this scenario, out of 5 device servers, we miss the events of at least 2.
Any idea why this might happen, and how we can resolve it?

[quote=“Reynald”]You were writing your commands are not taking a long time to execute, less than 3 seconds.
If a command is taking 2 seconds to execute, this means that during all this time, no attribute can be read because there is a locking mechanism in Tango which by default makes it impossible to read an attribute in parallel of executing a command or reading another attribute. This is to simplify the device server programmer’s life and make thread-safe device servers.
So 2 seconds command execution for instance would be a very long execution time in your case if you want to be able to poll your attributes at 200 ms, or even 10 ms!
So please check your commands execution times and ensure they are not the cause for event losses.[/quote]
Hi Reynald,
The command execution time is short. I don’t think this is the problem, because whenever we have only 1 or 2 device servers and a client subscribes to the change events of the spectrum attribute of both device servers, the client gets all the events. No event is missed.
The problem appears when we increase the number of device servers while keeping a single client.
The spectrum attributes of all the device servers change simultaneously, so the client gets some events and misses others.
Any suggestion regarding this would be helpful.

I mean the event related configuration parameters described in the following section of the documentation: http://tango-controls.readthedocs.io/en/latest/development/advanced/reference.html?highlight=change%20event#the-event-related-configuration-parameters.
You can configure them in Jive and/or in the code using POGO (see the attached pictures).

These parameters are important because Tango uses them to decide whether it should send an event or not.

To help you, we need more details about your current configuration.
You didn’t answer some of my previous questions.
What is your client?
Is it a generic client (ATKPanel, TaurusGUI, custom one), written in C++, Java, Python?

What is the exact type of all the attributes you are subscribing to, and their size in the case of spectrum or image attributes?
What is the polling period of all the attributes you are subscribing to?
Are all these attributes exported by JAVA device servers?

How do you detect that you missed some events? Do you have some kind of counter in the attributes you subscribed to?
What are these attributes monitoring (hardware related values)?

It would be great if you could find a way to reproduce your issue with simple device servers and clients we could execute from anywhere.

Kind regards
Reynald

Hi Reynald,

Thanks for the quick response.

[quote=“Reynald”]I mean the event related configuration parameters described in the following section of the documentation: http://tango-controls.readthedocs.io/en/latest/development/advanced/reference.html?highlight=change%20event#the-event-related-configuration-parameters.
You can configure them in Jive and/or in the code using POGO (See the attached pictures)

These parameters are important because Tango will use to decide whether it should send an event or not.[/quote]
Okay. In our case, I think abs_change needs to be set (as any change in the attribute should raise an event). Am I correct? What would be the ideal value for it?

[quote=“Reynald”]To help you, we need more details about your current configuration.
You didn’t answer to some of my previous questions.
What is your client?
Is it a generic client (ATKPanel, TaurusGUI, custom one), written in C++, Java, Python?[/quote]
The issue has occurred with TaurusGUI as well as with a Java client.

[quote=“Reynald”]What is the excact type of the all attributes you are subscribing to and their size in case of spectrum of images?
What is the polling period of all the attributes you are subscribing to?
Are all these attributes exported by JAVA device servers?[/quote]
The attributes we are subscribing to are spectrum attributes of type String, of length 6.
The polling period of these attributes is set to 100 ms. All the attributes are exported by Java device servers.

We can see this in the logs of the device servers and the client: the changed attribute value appears in the device server logs, but the corresponding change event does not appear in the client log.

These attributes represent application-specific asynchronous messages.

I am working on extracting the piece of code that reproduces this issue. I will post it once it is ready.

Hi,

I have one more question. When subscribing to an event, we need to specify a callback. In our case, we use the same callback for all subscriptions. Should each subscription have a different callback?

[quote=“TCS_GMRT”]The attributes we are subscribing are of spectrum of String of length 6.
The polling period of these attributes is set to 100. All the attributes are exported by JAVA device servers[/quote]

Can you confirm all the attributes you are subscribing to are spectrum of String of length 6?

For String, State and Boolean attributes, the relative change and absolute change thresholds do not make sense.
An event should be sent as soon as the attribute value changes in this case.
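When the change event is generated by polling, “as soon as the value changes” in practice means “at the next polling cycle where the polled value differs from the previous one”. The Python sketch below is a simplified model of that behaviour (an illustration under assumptions, not Tango code), using spectrum-of-string values like the 6-element attributes discussed in this thread. It shows one effect worth checking against a 100 ms polling period: if the value is overwritten several times between two polls, only the last value produces an event, and since no event was ever generated for the intermediate values, a counter-based missed-event check has nothing to report. The function name and input format are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Message = Tuple[str, ...]  # one spectrum of strings, e.g. 6 elements


def events_from_polling(updates: Sequence[Tuple[int, Message]],
                        polling_ms: int, duration_ms: int) -> List[Message]:
    """Simplified model of change events generated by a polling thread.

    updates: (time_ms, value) pairs, sorted by time, giving the moments the
    device-side value was overwritten (hypothetical input).
    Returns the values for which a change event would be emitted.
    """
    emitted: List[Message] = []
    last: Optional[Message] = None
    current: Optional[Message] = None
    i = 0
    for t in range(0, duration_ms + 1, polling_ms):
        # Apply every update that happened up to this polling instant.
        while i < len(updates) and updates[i][0] <= t:
            current = updates[i][1]
            i += 1
        # Any difference in size or in any element counts as a change.
        if current is not None and current != last:
            emitted.append(current)
            last = current
    return emitted


blank = ("",) * 5
updates = [(10, ("A",) + blank), (30, ("B",) + blank),
           (60, ("C",) + blank), (250, ("D",) + blank)]
for value in events_from_polling(updates, polling_ms=100, duration_ms=300):
    print(value[0])  # prints C then D: A and B were overwritten between polls
```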

How big are the strings contained in the spectrum?

[quote=“TCS_GMRT”]Hi,

I had one more doubt. While subscribing event we need to specify callback. In our case we have set same callback for all subscription. Whether callback should be different for all subscription?[/quote]

It depends on your use case. You can use the same callback or a different callback. Both are correct.
In some cases, it is more convenient to have a separate callback and in some cases, a generic callback is better.
It all depends on what you need to do.
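As an illustration of the single generic callback option, here is a plain-Python sketch that routes events from several devices to per-attribute handlers and counts value and error events per attribute, which can help compare, device by device, what the client received with what the device server logs show. `Event` is a hypothetical stand-in for the real event object; the `push_event` method name follows the PyTango convention for callback objects. In a real client, the attribute name carried by an event may be fully qualified (including the database host), so the registration keys would have to use the same form.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Event:
    """Hypothetical stand-in for the event object passed to a Tango callback."""
    attr_name: str
    value: Any = None
    err: bool = False


class GenericCallback:
    """One callback object shared by all subscriptions."""

    def __init__(self):
        self.handlers: Dict[str, Callable[[Any], None]] = {}
        self.received = Counter()  # value events received, per attribute
        self.errors = Counter()    # error events received, per attribute

    def register(self, attr_name: str, handler: Callable[[Any], None]) -> None:
        # Tango names are case-insensitive, so normalise the key.
        self.handlers[attr_name.lower()] = handler

    def push_event(self, event: Event) -> None:
        key = event.attr_name.lower()
        if event.err:
            self.errors[key] += 1
            return
        self.received[key] += 1
        handler = self.handlers.get(key)
        if handler is not None:
            handler(event.value)  # keep handlers short


seen = []
cb = GenericCallback()
cb.register("sys/dev/1/messages", seen.append)
cb.push_event(Event("SYS/dev/1/messages", ("OK",) * 6))
cb.push_event(Event("sys/dev/2/messages", err=True))
print(seen, dict(cb.received), dict(cb.errors))
```

Whichever option you choose, keeping the callback body short (for example, handing the value to a queue processed by another thread) helps avoid holding up the processing of the following events.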