[SOLVED] java.lang.OutOfMemoryError: unable to create new native thread

Hi All,

Yesterday (28.05.2015) I set up a simple test to catch timeout exceptions in Java API (TangORB-9.0.1):


TangoProxy proxy = TangoProxies.newDeviceProxyWrapper("sys/tg_test/1");

while (true) {
     LOGGER.debug(String.valueOf(proxy.readAttribute("double_scalar")));
}

In the morning I found the computer this test was running on almost completely unresponsive. No wonder – my client had created 6.8k live ReplyReceiverTimer threads (org.jacorb.orb.ReplyReceiver.Timer). And it seems that threads that have already terminated are not really freed.

So starting Jive, for instance, gives this:


khokhria@hzgcttest:~$ jive &
[1] 18627
khokhria@hzgcttest:~$ Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method)
	at java.lang.Thread.start(Thread.java:714)
	at org.jacorb.orb.ReplyReceiver.<init>(ReplyReceiver.java:92)
	at org.jacorb.orb.Delegate.invoke_internal(Delegate.java:1005)
	at org.jacorb.orb.Delegate.invoke(Delegate.java:939)
	at org.jacorb.orb.Delegate.is_a(Delegate.java:1420)
	at org.omg.CORBA.portable.ObjectImpl._is_a(ObjectImpl.java:130)
	at fr.esrf.TangoApi.ConnectionDAODefaultImpl.createDevice(ConnectionDAODefaultImpl.java:648)
	at fr.esrf.TangoApi.ConnectionDAODefaultImpl.connect_to_dbase(ConnectionDAODefaultImpl.java:878)
	at fr.esrf.TangoApi.ConnectionDAODefaultImpl.init(ConnectionDAODefaultImpl.java:385)
	at fr.esrf.TangoApi.Connection.<init>(Connection.java:324)
	at fr.esrf.TangoApi.Database.<init>(Database.java:230)
	at fr.esrf.TangoApi.ApiUtilDAODefaultImpl.get_db_obj(ApiUtilDAODefaultImpl.java:291)
	at fr.esrf.TangoApi.ApiUtil.get_db_obj(ApiUtil.java:272)
	at jive3.MainPanel.initComponents(MainPanel.java:91)
	at jive3.MainPanel.<init>(MainPanel.java:66)
	at jive3.MainPanel.main(MainPanel.java:743)


Environment setup:


khokhria@hzgcttest:~$ uname -a
Linux hzgcttest 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7u1 x86_64 GNU/Linux


khokhria@hzgcttest:~$ java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

Just sharing this experience.

But if someone has any idea how to work around or fix this, please let me know!

Best regards,

Igor.

Hi Igor,

this sounds like a bug. What I don’t understand is why there were “only” 6.8k live threads; I would assume you created many more proxies. Does this mean some are being garbage collected, but not all, or not fast enough? I vaguely remember that we tried, or wanted, to cache proxies to the same device in the same process. I presume this is not the case in the current code, but would it help as a way of reducing the number of open connections and threads to the same device?
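The proxy-caching idea could be sketched in plain Java like this (ProxyCache and DeviceHandle are hypothetical names for illustration, not the TangORB API – DeviceHandle stands in for a real TangoProxy):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ProxyCache {
    // Stand-in for TangoProxy; the real type would come from TangORB.
    static final class DeviceHandle {
        final String name;
        DeviceHandle(String name) { this.name = name; }
    }

    private final Map<String, DeviceHandle> cache = new ConcurrentHashMap<>();

    // Return the one shared handle per device name, creating it on first use.
    DeviceHandle get(String deviceName) {
        return cache.computeIfAbsent(deviceName, DeviceHandle::new);
    }

    public static void main(String[] args) {
        ProxyCache proxies = new ProxyCache();
        DeviceHandle a = proxies.get("sys/tg_test/1");
        DeviceHandle b = proxies.get("sys/tg_test/1");
        // Same instance both times -> one connection, one set of threads.
        System.out.println(a == b);
    }
}
```

With such a cache, every part of a process asking for "sys/tg_test/1" would share one connection instead of opening its own.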

A naive question: could one say this is a bug in your software for not reusing the same proxy? When do you need this use case?

In the meantime I do not have a workaround, sorry!

Andy

Hi Andy,

There is only one proxy created; I then sequentially read an attribute through this proxy. Each read creates a dedicated thread at the jacORB level. This thread counts down the 3 s timeout and throws a timeout exception if it is not notified within that time.

Apart from the 6.8k live threads of this kind, there are tons of finished threads of the same type (I suspect they are not properly freed, so the OS runs out of the memory allocated for thread stacks).
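The mechanism can be sketched in plain Java (an illustration of the pattern only, not jacORB’s actual ReplyReceiver code – the leak is that one such watchdog thread is spawned per call, and their native stacks apparently are not reclaimed):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ReplyTimerSketch {
    // Spawn one watchdog thread per request. If replyArrived is not
    // counted down within timeoutSeconds, the watchdog flags a timeout
    // (jacORB would raise a timeout exception here instead).
    static Thread startWatchdog(CountDownLatch replyArrived,
                                long timeoutSeconds,
                                boolean[] timedOut) {
        Thread t = new Thread(() -> {
            try {
                if (!replyArrived.await(timeoutSeconds, TimeUnit.SECONDS)) {
                    timedOut[0] = true;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch reply = new CountDownLatch(1);
        boolean[] timedOut = {false};
        Thread watchdog = startWatchdog(reply, 3, timedOut);
        reply.countDown();   // the reply shows up in time
        watchdog.join();     // the watchdog exits without flagging a timeout
        System.out.println("timed out: " + timedOut[0]);
    }
}
```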

Best regards,

Igor

Hi Igor,

this sounds even more serious than I thought. You are saying a simple attribute read from a Java client creates a thread that is never freed. Do you know how many times you read the attribute in the time it took to reach 6.8k threads? If we say a call takes approximately 200 microseconds, we would expect roughly 216 million reads in 12 hours. Does this compare with your measurements? Does this mean that for every ~30k synchronous calls you get a dangling thread? This must be visible in all Java client applications; I wonder if other Java programmers have noticed it?
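Spelling out the back-of-the-envelope numbers (the 200 µs per call is an assumption):

```java
public class CallRateEstimate {
    public static void main(String[] args) {
        long microsPerCall = 200;                          // assumed cost of one synchronous call
        long twelveHoursInMicros = 12L * 3600 * 1_000_000;
        long reads = twelveHoursInMicros / microsPerCall;  // 216,000,000 reads in 12 hours
        long liveThreads = 6_800;
        long callsPerThread = reads / liveThreads;         // ~31.8k, i.e. roughly 30k
        System.out.println(reads + " reads, one live thread per ~"
                + callsPerThread + " calls");
    }
}
```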

One solution is to use events instead of synchronous calls. Events generate much less traffic and do not use JacORB.

Andy

Seems to be the same problem: http://www.jacorb.org/bugzilla/show_bug.cgi?id=632

No clear fix though…

I agree

Hi,
For your information, I ran a test overnight and did not hit the native-thread problem after 75 million read_attribute() calls.
The JacORB bug report mentioned above talks about release 2.3 (2010).
The latest TangORB.jar is built with JacORB 3.5.
I am not sure this bug still exists.
Cheers,
Pascal

Well, it seems the problem is in Oracle’s JVM for Linux.

When I run the same test using


java version "1.7.0_75"
OpenJDK Runtime Environment (IcedTea 2.5.4) (7u75-2.5.4-1~deb7u1)
OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode)

everything runs smoothly.

Interesting and good news!

Just found a related discussion in jacORB mailing list: http://lists.spline.inf.fu-berlin.de/pipermail/jacorb-developer/2014-June/000498.html

Quote: