Yesterday (28.05.2015) I set up a simple test to catch timeout exceptions in the Java API (TangORB-9.0.1):
TangoProxy proxy = TangoProxies.newDeviceProxyWrapper("sys/tg_test/1");
while (true) {
    LOGGER.debug(String.valueOf(proxy.readAttribute("double_scalar")));
}
In the morning I found the computer on which this test was running almost completely unresponsive. No wonder: my client had created 6.8K live ReplyReceiverTimer threads (org.jacorb.orb.ReplyReceiver.Timer). And it seems that already-terminated threads are not actually freed.
So starting jive, for instance, gives this:
khokhria@hzgcttest:~$ jive &
[1] 18627
khokhria@hzgcttest:~$ Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at org.jacorb.orb.ReplyReceiver.<init>(ReplyReceiver.java:92)
at org.jacorb.orb.Delegate.invoke_internal(Delegate.java:1005)
at org.jacorb.orb.Delegate.invoke(Delegate.java:939)
at org.jacorb.orb.Delegate.is_a(Delegate.java:1420)
at org.omg.CORBA.portable.ObjectImpl._is_a(ObjectImpl.java:130)
at fr.esrf.TangoApi.ConnectionDAODefaultImpl.createDevice(ConnectionDAODefaultImpl.java:648)
at fr.esrf.TangoApi.ConnectionDAODefaultImpl.connect_to_dbase(ConnectionDAODefaultImpl.java:878)
at fr.esrf.TangoApi.ConnectionDAODefaultImpl.init(ConnectionDAODefaultImpl.java:385)
at fr.esrf.TangoApi.Connection.<init>(Connection.java:324)
at fr.esrf.TangoApi.Database.<init>(Database.java:230)
at fr.esrf.TangoApi.ApiUtilDAODefaultImpl.get_db_obj(ApiUtilDAODefaultImpl.java:291)
at fr.esrf.TangoApi.ApiUtil.get_db_obj(ApiUtil.java:272)
at jive3.MainPanel.initComponents(MainPanel.java:91)
at jive3.MainPanel.<init>(MainPanel.java:66)
at jive3.MainPanel.main(MainPanel.java:743)
Environment setup:
khokhria@hzgcttest:~$ uname -a
Linux hzgcttest 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7u1 x86_64 GNU/Linux
khokhria@hzgcttest:~$ java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
Just sharing this experience. But if someone has any idea how to work around or fix this, please let me know!
This sounds like a bug. What I don’t understand is that there were “only” 6.8k live threads. I would assume you created many more proxies. Does this mean some are being garbage collected, but not all, or not fast enough? I vaguely remember that we tried, or wanted, to cache proxies to the same device in the same process. I presume this is not done in the current code, but would it help as a way of reducing the number of open connections and threads to the same device?
Naive question - could one say this is a bug in your software for not reusing the same proxy? When do you need this use case?
In the meantime I do not have a work around, sorry!
There is only one proxy created. Then I sequentially read an attribute from this proxy. Each read creates a special thread at the jacORB level. This thread waits 3 s (the timeout) and, if it is not notified within that time, throws a timeout exception.
Apart from the 6.8K live threads of this type, there are tons of finished threads of the same type (I suspect they are not properly freed, so the OS runs out of memory allocated for the threads’ stacks).
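To make the failure mode concrete, here is a small standalone sketch (hypothetical code, not the actual TangORB/JacORB implementation; all names are made up) of the per-call timer pattern described above. Every “read” spawns a watchdog thread that sleeps for the timeout; in a tight read loop the threads are spawned far faster than they expire, so live timer threads pile up:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the per-call timeout-thread pattern:
// each synchronous "read" starts a watchdog thread that sleeps for
// the 3 s timeout (in the real code it would be interrupted when the
// reply arrives). Issued back-to-back, reads accumulate live threads.
public class TimerLeakSketch {

    static Thread startReplyTimer(long timeoutMs) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(timeoutMs); // interrupted on reply in real code
            } catch (InterruptedException ignored) {
            }
        }, "ReplyReceiverTimer-sketch");
        t.start();
        return t;
    }

    public static void main(String[] args) {
        List<Thread> timers = new ArrayList<>();
        // simulate 100 reads issued in a tight loop, each with a 3 s timeout
        for (int i = 0; i < 100; i++) {
            timers.add(startReplyTimer(3000));
        }
        long alive = timers.stream().filter(Thread::isAlive).count();
        System.out.println("live timer threads: " + alive);
    }
}
```

At 100 reads the 100 timers are all still sleeping when the loop finishes; scale the loop up and the process eventually hits the native-thread limit seen in the OutOfMemoryError above.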
This sounds even more serious than I thought. You are saying a simple read of an attribute from a Java client creates a thread which is not always being freed. Do you know how many times you read the attribute in the time it took to reach 6.8k threads? If we say a call takes approximately 200 microseconds, then in 12 hours we would expect roughly 216 million reads. Does this match your measurements? Does this mean that for every ~30k synchronous calls you have one dangling thread? This must be visible in all Java client applications. I wonder if other Java programmers have noticed this?
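The back-of-the-envelope numbers in this estimate can be checked directly (assuming, as above, 200 µs per call, a 12 h run, and the observed 6.8k live timer threads):

```java
// Sanity check of the estimate above: 200 us per synchronous call
// over a 12 h run, compared against ~6.8k observed live timer threads.
public class LeakRateEstimate {
    public static void main(String[] args) {
        long callMicros = 200;                    // assumed cost of one read
        long runMicros = 12L * 3600 * 1_000_000;  // 12 hours in microseconds
        long reads = runMicros / callMicros;      // total reads in 12 h
        long leaked = 6_800;                      // observed live timer threads
        System.out.println("reads in 12 h: " + reads);
        System.out.println("calls per dangling thread: " + (reads / leaked));
    }
}
```

This gives 216 million reads and roughly one dangling thread per ~32k calls, consistent with the figures quoted in the thread.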
One solution is to use events instead of synchronous calls. Events generate much less traffic and do not use JacORB.
Hi
For your information, I ran a test overnight and did not hit the new-native-thread problem after 75 million read_attribute() calls.
The bug discussed on the JacORB mailing list concerns release 2.3 (2010).
The latest TangORB.jar is built with JacORB 3.5.
I am not sure this bug still exists.
Cheers
Pascal