Yesterday (28.05.2015) I set up a simple test to catch timeout exceptions in the Java API (TangORB-9.0.1):
TangoProxy proxy = TangoProxies.newDeviceProxyWrapper("sys/tg_test/1");
while (true) {
    LOGGER.debug(String.valueOf(proxy.readAttribute("double_scalar")));
}
In the morning I found the computer on which this test was running almost completely unresponsive. No wonder: my client had created 6.8K live ReplyReceiverTimer threads (org.jacorb.orb.ReplyReceiver.Timer). And it seems that already-terminated threads are not actually freed.
So starting jive, for instance, gives this:
khokhria@hzgcttest:~$ jive &
[1] 18627
khokhria@hzgcttest:~$ Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at org.jacorb.orb.ReplyReceiver.<init>(ReplyReceiver.java:92)
at org.jacorb.orb.Delegate.invoke_internal(Delegate.java:1005)
at org.jacorb.orb.Delegate.invoke(Delegate.java:939)
at org.jacorb.orb.Delegate.is_a(Delegate.java:1420)
at org.omg.CORBA.portable.ObjectImpl._is_a(ObjectImpl.java:130)
at fr.esrf.TangoApi.ConnectionDAODefaultImpl.createDevice(ConnectionDAODefaultImpl.java:648)
at fr.esrf.TangoApi.ConnectionDAODefaultImpl.connect_to_dbase(ConnectionDAODefaultImpl.java:878)
at fr.esrf.TangoApi.ConnectionDAODefaultImpl.init(ConnectionDAODefaultImpl.java:385)
at fr.esrf.TangoApi.Connection.<init>(Connection.java:324)
at fr.esrf.TangoApi.Database.<init>(Database.java:230)
at fr.esrf.TangoApi.ApiUtilDAODefaultImpl.get_db_obj(ApiUtilDAODefaultImpl.java:291)
at fr.esrf.TangoApi.ApiUtil.get_db_obj(ApiUtil.java:272)
at jive3.MainPanel.initComponents(MainPanel.java:91)
at jive3.MainPanel.<init>(MainPanel.java:66)
at jive3.MainPanel.main(MainPanel.java:743)
Environment setup:
khokhria@hzgcttest:~$ uname -a
Linux hzgcttest 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7u1 x86_64 GNU/Linux
khokhria@hzgcttest:~$ java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
Just sharing this experience. But if someone has any idea how to work around or fix this, please let me know!
This sounds like a bug. What I don’t understand is that there were “only” 6.8k live threads. I would assume you created many more proxies. Does this mean some are being garbage collected, but not all, or not fast enough? I vaguely remember that we tried, or wanted, to cache proxies to the same device in the same process. I presume this is not done in the current code, but would it help as a way of reducing the number of open connections and threads to the same device?
Naive question - could one say this is a bug in your software for not reusing the same proxy? When do you need this use case?
In the meantime I do not have a work around, sorry!
There is only one proxy created. Then I sequentially read an attribute from this proxy. Each read creates a special thread at the jacORB level. This thread waits 3 s (the timeout) and, if it is not notified within that time, throws a timeout exception.
Apart from the 6.8K live threads of this type, there are tons of finished threads of the same type (I suspect they are not properly freed, so the OS runs out of memory allocated for the threads’ stacks).
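To make the failure mode concrete, here is a small standalone sketch (hypothetical code, not the actual TangORB/JacORB implementation; all names are made up) of the per-call timer pattern described above. Every “read” spawns a watchdog thread that sleeps for the timeout; in a tight read loop the threads are spawned far faster than they expire, so live timer threads pile up:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the per-call timeout-thread pattern:
// each synchronous "read" starts a watchdog thread that sleeps for
// the 3 s timeout (in the real code it would be interrupted when the
// reply arrives). Issued back-to-back, reads accumulate live threads.
public class TimerLeakSketch {

    static Thread startReplyTimer(long timeoutMs) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(timeoutMs); // interrupted on reply in real code
            } catch (InterruptedException ignored) {
            }
        }, "ReplyReceiverTimer-sketch");
        t.start();
        return t;
    }

    public static void main(String[] args) {
        List<Thread> timers = new ArrayList<>();
        // simulate 100 reads issued in a tight loop, each with a 3 s timeout
        for (int i = 0; i < 100; i++) {
            timers.add(startReplyTimer(3000));
        }
        long alive = timers.stream().filter(Thread::isAlive).count();
        System.out.println("live timer threads: " + alive);
    }
}
```

At 100 reads the 100 timers are all still sleeping when the loop finishes; scale the loop up and the process eventually hits the native-thread limit seen in the OutOfMemoryError above.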
This sounds even more serious than I thought. You are saying a simple read of an attribute from a Java client creates a thread which is not always being freed. Do you know how many times you read the attribute in the time it took to reach 6.8k threads? If we say a call takes approximately 200 microseconds, then in 12 hours we would expect roughly 216 million reads. Does this match your measurements? Does this mean that for every ~30k synchronous calls you have one dangling thread? This must be visible in all Java client applications. I wonder if other Java programmers have noticed this?
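The back-of-the-envelope numbers in this estimate can be checked directly (assuming, as above, 200 µs per call, a 12 h run, and the observed 6.8k live timer threads):

```java
// Sanity check of the estimate above: 200 us per synchronous call
// over a 12 h run, compared against ~6.8k observed live timer threads.
public class LeakRateEstimate {
    public static void main(String[] args) {
        long callMicros = 200;                    // assumed cost of one read
        long runMicros = 12L * 3600 * 1_000_000;  // 12 hours in microseconds
        long reads = runMicros / callMicros;      // total reads in 12 h
        long leaked = 6_800;                      // observed live timer threads
        System.out.println("reads in 12 h: " + reads);
        System.out.println("calls per dangling thread: " + (reads / leaked));
    }
}
```

This gives 216 million reads and roughly one dangling thread per ~32k calls, consistent with the figures quoted in the thread.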
One solution is to use events instead of synchronous calls. Events generate much less traffic and do not use JacORB.
Hi
For your information, I ran a test overnight and did not hit the new-native-thread problem after 75 million read_attribute() calls.
The bug discussed on the JacORB mailing list concerns release 2.3 (2010).
The latest TangORB.jar is built with JacORB 3.5.
I am not sure this bug still exists.
Cheers
Pascal