Set_change_event() with nodb option - detect not working

slee · July 25, 2016, 1:38pm

Hi,

The manual says

[quote]
set_change_event(bool implemented, bool detect = true);

where implemented=true indicates that events are pushed manually from the code and detect=true (when used) triggers the verification of the same event properties as for events send by the polling thread. When setting detect=false, no value checking is done on the pushed value[/quote]
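
To make the expected behaviour concrete, here is a minimal sketch of what detect=true is supposed to do for a scalar attribute with an absolute change criterion. It does not use the Tango library: `ChangeDetector` and `should_push` are hypothetical names, and the comparison rule (the first value always passes, later values pass only when they differ from the last sent value by at least abs_change) is an assumption modelled on the manual text, not the actual Tango implementation.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the check performed when detect=true.
// Assumption: an event is sent when |value - last_sent| >= abs_change.
class ChangeDetector
{
public:
	explicit ChangeDetector(double abs_change)
		: abs_change_(abs_change), has_last_(false), last_sent_(0.0)
	{
		assert(abs_change > 0.0);
	}

	// Returns true when the pushed value should reach the client.
	// With detect=false every push is forwarded without any value check.
	bool should_push(double value, bool detect = true)
	{
		if (!detect || !has_last_ || std::fabs(value - last_sent_) >= abs_change_)
		{
			has_last_ = true;
			last_sent_ = value;
			return true;
		}
		return false;
	}

private:
	double abs_change_;
	bool has_last_;
	double last_sent_;
};
```

Under these assumptions, with abs_change set to 1, pushing the same value several times forwards only the first push, while detect=false forwards every push.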

A. Test procedure

Setting set_change_event(true, true);
Calling push_change_event() with same value several times.

B. Result

Test with database option

From the second call onward, events are ignored.

Test with no database option

Every call arrives at the client, with the same value.

I’m curious whether this is as designed.
Or am I testing the wrong way?

I’m using Tango v9.2.2 with Windows.

rbourtembourg · July 25, 2016, 3:27pm

Hi,

I just tried it on Linux with Tango 9.2.2, it works as expected with the no database option (same result as your case #1).

So it could be a bug present only on Windows, or something else in your tests…
Or a bug present in a case different than the one I just tested (DevDouble ReadOnly attribute).
What did you put as Absolute Change and/or Relative Change criteria in the code?
Did you configure these criteria in the database as well in your case #1?
What is the type of your attribute? Is it Read/Only or Read/Write?
Are you testing with the exact same device server code in both cases?

Can you publish a sample of code which triggers the problem in your case?

Kind regards
Reynald

rbourtembourg · July 25, 2016, 3:39pm

Additional question:

Are you using localhost in your device definition?
For instance: localhost:1234/my/device/name/my_attribute#dbase=no

I’ve just noticed this does not seem to behave well, on Linux at least.
And the client receives some “Event channel is not responding anymore” errors.
You will typically receive this error every 10 seconds. It will be followed by a reconnection attempt, which does a synchronous call to read your attribute.

This might be what you are observing?

If this is the case, can you try using the real name of your machine instead of localhost?

Kind regards
Reynald

slee · July 26, 2016, 4:34am

Reynald

The attributes are created dynamically, and the change criterion is set at the same time, using the code below:


try
{
	a = new DynamicFINSAttribute(cfg);
	a->set_disp_level(Tango::OPERATOR);

	Tango::UserDefaultAttrProp event_prop;
	event_prop.set_event_abs_change("1");

	//- set other attribute properties : label, display format,...
	a->set_default_properties(event_prop);
	a->set_change_event(true, true);

	this->m_dyn_attrs.add(a); 

	return;
}
catch (const std::bad_alloc &)
{
	ERROR_STREAM << "TOmronServer::add_dynamic_attribute caught bad_alloc" << std::endl;
}
catch (Tango::DevFailed &e)
{
	ERROR_STREAM << "TOmronServer::add_dynamic_attribute caught DevFailed " << e << std::endl;
}
catch (...)
{
	ERROR_STREAM << "TOmronServer::add_dynamic_attribute caught (...)" << std::endl;
}

The database does not contain any static attributes.

Read/Only

Yes

I will.

Yes.

Nop.

An additional question: is the push_change_event() call thread-safe?
It is currently called simultaneously by several threads derived from omni_thread.

slee · July 26, 2016, 5:09am

There is another issue.

In debug mode, when the client calls subscribe_event() on a non-database server,
the server stops immediately.

See the attached “server-side.png” for the server side and “client-side.png” for the client side.
See also “crash_point.png” for the line where the client causes this problem.

In release mode, it seems OK.

This happens regardless of whether I use the computer name or the IP address in the TANGO_HOST environment variable.

So I’m having a hard time publishing sample code for my initial issue. è_é

rbourtembourg · July 26, 2016, 8:08am

Hi,

It looks like you are trying to subscribe to attributes using wildcards, this is why you get this error on the client side:


Desc: * attribute not found
Origin: MultiAttribute::get_attr_ind_by_name

Remember that we said this is not supported?
You should subscribe to each attribute individually.

push_change_event() should be thread-safe.

Hoping this helps,
Reynald

slee · July 26, 2016, 8:23am

Reynald,

Sorry I posted the wrong picture. (client side)

It says

Re-posted with new picture.

rbourtembourg · July 26, 2016, 9:14am

Thank you, this fits with the fact that the device server stops.

What would be interesting for us is to know the code executed when you read the attribute named E000095.
It would actually be interesting to know if it happens always on the same attribute.
Is your client trying to subscribe to thousands of attributes? This kind of information could be very useful too.
How many threads do you have, how many attributes in the server, how many commands…

We need to find out what could cause this invalid argument exception on the server side.
This seems to happen when a command is executed.
Are there any commands defined for this server?
Is the server throwing non-CORBA (non-Tango) exceptions in these commands?
Is there a client executing commands?
Or could it be that when you read the E000095 attribute, this is executing a command?

It is still not clear to me why it would happen only in debug mode…
One of the differences I’m thinking of is that in debug mode, the server will be slower.

I hope this can help you a bit to debug this issue…

Kind regards
Reynald

slee · July 26, 2016, 2:49pm

Reynald,

Let me get back to the initial issue (regarding the detect not working issue).

I found something strange.

1. TANGO_HOST => 127.0.0.1:10000
   The detect option does not work.
   Events keep being generated with the same values
   (repeated at intervals of about 10 seconds).
2. TANGO_HOST => mycomputername:10000
3. TANGO_HOST => localhost:10000
   In both cases (2 and 3), the detect option works.
   Events are shown initially, then no more.

But in cases 2 and 3, the subscribe_event() call is really slow è_é
(about 5 seconds per call).

Hope this clears something.

rbourtembourg · July 26, 2016, 4:48pm

OK, so what you are observing in case 1 is a problem with the events heartbeat.

As far as I know, the event subscription mechanism works as follows:
During the subscription, a synchronous call to read the value of the attribute the client just subscribed to is executed.
This generates the first event and is used to know the value of the attribute at the time of the subscription.
A thread called the KeepAliveThread will check periodically whether the remote device server is still alive and is expecting to receive some heartbeat events regularly.
If no heartbeat event has been received in the last 10 seconds, an event containing an API_EventTimeout error (flag err is true in the EventData structure) is generated on the client side.
Then, the client automatically retries to subscribe to the event in question, so in case of success, a synchronous call to read the value of the attribute is sent again and an event is generated on the client side.
This is what you are probably observing in case #1.
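
The mechanism described above can be sketched as a small model. This is not Tango code: `KeepAliveModel` and its methods are hypothetical names, times are plain seconds passed in by the caller instead of a real clock, and resetting the heartbeat timer on resubscription is an assumption made to keep the model simple.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the client-side behaviour described above.
class KeepAliveModel
{
public:
	// 10 s is the heartbeat timeout mentioned in this thread.
	explicit KeepAliveModel(double timeout_s = 10.0)
		: timeout_s_(timeout_s), last_heartbeat_s_(0.0) {}

	// Subscription: the synchronous read produces the first value event.
	void subscribe(double now_s)
	{
		last_heartbeat_s_ = now_s;
		events_.push_back("value");
	}

	// A heartbeat event arrived from the server.
	void on_heartbeat(double now_s) { last_heartbeat_s_ = now_s; }

	// Periodic check: on timeout, report an error event, then resubscribe,
	// which triggers another synchronous read and another value event.
	void check(double now_s)
	{
		if (now_s - last_heartbeat_s_ >= timeout_s_)
		{
			events_.push_back("API_EventTimeout");
			subscribe(now_s);
		}
	}

	const std::vector<std::string> &events() const { return events_; }

private:
	double timeout_s_;
	double last_heartbeat_s_;
	std::vector<std::string> events_;
};
```

In this model, a client that never receives heartbeats sees an error event followed by a fresh value event at every timeout, even though the attribute value never changed. That matches the repeated events with identical values seen roughly every 10 seconds.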

Please make sure the TANGO_HOST variable has the same definition on the client and server sides.
This is what was recommended by Andy in a previous post (http://www.tango-controls.org/community/forum/c/general/development/no-heartbeat-error-on-event-subscription/?page=4#post-654) for someone having similar issues.
Try to avoid mixing different forms of the host name (IP address, localhost, machine name): there are so many cases to handle that some of them are not yet handled correctly, as you can see.

About the long subscription time: it could be due to the synchronous call made at subscription time (if reading the attribute value takes a while, or if the server is busy answering many requests in parallel and the call has to wait for a synchronization lock), or to a network issue (or …?).

Best regards,
Reynald

slee · July 27, 2016, 4:20am

Hi Reynald,

The workaround for this problem in my case is:

add 192.168.0.100 my_computer_name
(in c:/windows/system32/drivers/etc/hosts)
use TANGO_HOST => my_computer_name:10000

After that, there is no more event heartbeat problem and no more delay when calling subscribe_event().

I will stick to this setting for the time being.
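
For anyone who wants to catch this kind of configuration early, a program could check the TANGO_HOST value at startup. The sketch below is only an illustration: `tango_host_name` and `uses_loopback` are hypothetical helpers, not part of Tango, and treating "localhost" and 127.x.x.x as suspicious is an assumption based on the observations in this thread.

```cpp
#include <string>

// Hypothetical helper: extracts the host part of a "host:port" value.
// If there is no ':' the whole string is returned.
std::string tango_host_name(const std::string &tango_host)
{
	return tango_host.substr(0, tango_host.find(':'));
}

// Hypothetical helper: returns true when the host part is a loopback form
// ("localhost" or an address starting with "127.").
bool uses_loopback(const std::string &tango_host)
{
	const std::string host = tango_host_name(tango_host);
	return host == "localhost" || host.compare(0, 4, "127.") == 0;
}
```

A server or client could read the variable with std::getenv("TANGO_HOST"), check the result for a null pointer, and log a warning when `uses_loopback` returns true.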

slee · July 27, 2016, 4:24am

Let me get back to the second issue: the device server stopping in debug mode (nodb option).

I tried running the server program and the client program side by side on one computer,
with TANGO_HOST set to my_computer_name:10000.
So there is no mismatch between the server and client settings.

The exception occurs when the client side executes this line in “devapi_base.cpp”:
ret = api_ptr->get_zmq_event_consumer()->subscribe_event(this, attr_name,event, callback, filters, stateless);

The call stack is:

[quote] TOmronClient.exe!Tango::DeviceProxy::subscribe_event(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & attr_name="E000000.0", Tango::EventType event=CHANGE_EVENT, Tango::CallBack * callback=0x00000000030a6ba0, const std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > & filters=0, bool stateless=false) line 7682 C++
TOmronClient.exe!Tango::DeviceProxy::subscribe_event(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & attr_name="E000000.0", Tango::EventType event=CHANGE_EVENT, Tango::CallBack * callback=0x00000000030a6ba0) line 842 + 0x40 bytes C++
TOmronClient.exe!subscribeEvent(Tango::DeviceProxy * device=0x00000000030a8ba0, const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & attr_name="E000000.0", DoubleEventCallBack * double_callback=0x00000000030a6ba0) line 515 + 0x23 bytes C++
[/quote]

Then the server program catches an exception in “zmqeventsupplier.cpp” at:
event_pub_sock->setsockopt(ZMQ_SNDHWM,&hwm,sizeof(hwm));

The call stack is:

TOmronServer.exe!_CxxThrowException(void * pExceptionObject=0x00000000045fb528, const _s__ThrowInfo * pThrowInfo=0x00007ff7547eb258) line 157 C++
TOmronServer.exe!zmq::socket_t::setsockopt(int option_=23, const void * optval_=0x00000000045fb6a8, unsigned __int64 optvallen_=4) line 243 C++
TOmronServer.exe!Tango::ZmqEventSupplier::create_event_socket() line 476 C++
TOmronServer.exe!Tango::DServer::zmq_event_subscription_change(const Tango::DevVarStringArray * argin=0x0000000003077bb0) line 953 C++
TOmronServer.exe!Tango::ZmqEventSubscriptionChangeCmd::execute(Tango::DeviceImpl * device=0x000000000265ac50, const CORBA::Any & in_any={…}) line 1357 + 0x12 bytes C++
TOmronServer.exe!Tango::DeviceClass::command_handler(Tango::DeviceImpl * device=0x000000000265ac50, std::basic_string<char,std::char_traits<char>,std::allocator<char> > & command="ZmqEventSubscriptionChange", const CORBA::Any & in_any={…}) line 1172 + 0x3e bytes C++

See the two captured images for the call stacks of the two programs.

[quote]It would actually be interesting to know if it happens always on the same attribute.
Is your client trying to subscribe to thousands of attributes?[/quote]

This happens on the first iteration of the loop.

try 
{
	cerr << "Fetching attribute name list " << pInfo->device->name() << " ..." << endl;
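	// get_attribute_list() returns a heap-allocated vector owned by the caller;
	// std::unique_ptr (from <memory>) releases it on every exit path
	std::unique_ptr<vector<string> > attribute_name_list(pInfo->device->get_attribute_list());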
	cerr << "Attribute name size = " << attribute_name_list->size() << "." << endl;

	/* loop over all attributes */
	//====> the number of attributes is from 200 to 1500
	for (size_t i = 0; i < attribute_name_list->size(); i++)
	{
		// show progress
		cerr << ".";

		Tango::AttributeInfo ai = pInfo->device->get_attribute_config(attribute_name_list->at(i));

		// skip write-only attribute
		if (ai.writable == WRITE)
		{
			continue;
		}

		if ((ai.name == "State") || (ai.name == "Status"))
			continue;


		//====> below line causes server stop (when i == 0)
		int event_id = subscribeEvent(pInfo->device, ai.name, pInfo->double_callback);

		if (event_id >= 0)
			pInfo->event_ids.push_back(event_id);
	}

	cerr << "Connecting to device " << pInfo->device->name() << " done." << endl;

	return true;
}
catch (DevFailed &e) 
{ 
	Except::print_exception(e);
	return false;
}

[quote]How many threads do you have
how many attributes in the server
how many commands[/quote]

The original server program has ten device patterns which inherit from one base pattern.
I’m testing by building 10 members under one class (no database option).
And each device has

from 200 to 1000 attributes
2 to 5 threads
10 commands

But for testing purposes, I’m shrinking it to

one device
one thread
same command set

[quote]Are there any commands defined for this server?
Is the server throwing not-CORBA (not-TANGO) exceptions in these commands?
Is there a client executing commands?[/quote]

Actually, the client sends the Mode command to the server before subscribing to events,
and it passes without any problem.

I don’t think this exception comes from a user-defined command.

rbourtembourg · July 27, 2016, 5:33am

Hi,

It looks like you are using forbidden characters in your attribute name.
You have an attribute named “E000000.0”.
I just tried to create a static attribute with this name with Pogo and it returns the following error:

Syntax error in name: Do not use '.' char.

It looks like Tango lets you create an attribute with this name as a dynamic attribute, but since Pogo doesn’t let you name an attribute with a ‘.’ char in it, I would really avoid creating attributes with special characters in their names.

Maybe the problem you are encountering is not linked to that but I would already start by changing that.
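
As an illustration of the advice above, a server could validate names before creating dynamic attributes. This is a sketch only: `is_safe_attribute_name` and `sanitize_attribute_name` are hypothetical helpers, and the allowed character set (letters, digits and '_') is an assumption. The only rule confirmed in this thread is that Pogo rejects the '.' character.

```cpp
#include <cctype>
#include <string>

// Hypothetical check applied before creating a dynamic attribute.
// Assumption: only letters, digits and '_' are treated as safe.
bool is_safe_attribute_name(const std::string &name)
{
	if (name.empty())
		return false;
	for (std::string::size_type i = 0; i < name.size(); ++i)
	{
		const unsigned char c = static_cast<unsigned char>(name[i]);
		if (!std::isalnum(c) && c != '_')
			return false;
	}
	return true;
}

// Hypothetical sanitizer: replaces every unsafe character with '_'.
std::string sanitize_attribute_name(std::string name)
{
	for (std::string::size_type i = 0; i < name.size(); ++i)
	{
		const unsigned char c = static_cast<unsigned char>(name[i]);
		if (!std::isalnum(c) && c != '_')
			name[i] = '_';
	}
	return name;
}
```

With these helpers, a name such as E000000.0 would be rejected, and the sanitizer would turn it into E000000_0.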

Happy to hear you found network settings that work in your case with (c:/windows/system32/drivers/etc/hosts).

Cheers,
Reynald

slee · July 27, 2016, 7:59am

Reynald,

I think I found the place that actually triggers the exception.

I thought the first exception caught by the development environment was the place,
but I was wrong.

The place was at DServer::zmq_event_subscription_change(const Tango::DevVarStringArray *argin) in the eventcmds.cpp.

In no-database mode there seems to be no alternate event endpoint,
so get_alternate_event_endpoint() returns a zero-size vector,
and tmp_str = ev->get_alternate_event_endpoint()[loop]; triggers the exception.

The proper code should be

           if (ev->get_alternate_event_endpoint().size() != 0)
           {
                tmp_str = ev->get_alternate_event_endpoint()[loop];
                ret_data->svalue[((loop + 1) << 1) + 1] = CORBA::string_dup(tmp_str.c_str());
           }

As for release mode, I think it simply does not check.

Sorry for the confusion caused by my previous posts.

rbourtembourg · July 27, 2016, 8:55am

Hello,

Well, that’s clearly a bug in the DServer::zmq_event_subscription_change code.
In the same method, there is similar code a bit above but the size of the vector is tested before accessing any of its elements. This is what should be done.
This is not crashing in release mode on Windows but I’m afraid this might corrupt the memory.
There might be some side effects…
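
The size test described above can be sketched independently of Tango. `endpoint_or_empty` is a hypothetical helper, not the actual fix applied in the library. It compares the index against the vector size, which is slightly stricter than only testing for a non-empty vector. Calling operator[] with an out-of-range index on a std::vector is undefined behaviour, which would be consistent with a debug build detecting the error while a release build silently continues.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns endpoints[index], or an empty string when
// the vector has no element at that index (for example when it is empty).
std::string endpoint_or_empty(const std::vector<std::string> &endpoints,
                              std::vector<std::string>::size_type index)
{
	if (index < endpoints.size())
		return endpoints[index];
	return std::string();
}
```

With a guard like this, an empty vector yields an empty string instead of reading memory that does not belong to the vector.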

Could you please create a ticket on https://sourceforge.net/p/tango-cs/bugs to keep track of this bug?
If you cannot or don’t want to, I can create it. Just let me know.

This should be fixed in the next Tango release.
Thank you very much for reporting this.

slee · July 27, 2016, 11:14am

Reynald,

Please create a ticket for me.
Next time I find a bug like this, I’ll do it myself.

rbourtembourg · July 27, 2016, 11:27am

Here is the link to the ticket:
https://sourceforge.net/p/tango-cs/bugs/805