Loss of connection with database

fpourchayre · March 7, 2019, 2:20pm

Hi everyone,

We have a big laser system in Romania and we use Tango for our supervision.

Sometimes we lose the connection to the Tango database. When we then try to open Jive or Astor, we get the message “Connection to database failed”. We have observed that this happens when several clients such as Jive or Astor are running (Astor causes more problems than Jive).
The database_ds.exe process is still listed in Task Manager when the problem occurs.

Here is some information about the database:

We use Tango 8 / Windows.
We modified “start-db.bat” to set PoolSize to 50.
The database is fairly large: about 850 devices and 280 Starters.
We use a MySQL database.

The problem is really important for us.

I saw that poolSize is the number of connections to the database. Would it be useful to increase it?
Is this a known problem?
Could it be a performance problem in database_ds? Or could it be our MySQL database?
Did anything change in this area in Tango 9?

Thank you for your answers.

Best regards,
Florian Pourchayre
Thales group

rbourtembourg · March 7, 2019, 4:49pm

Hi Florian,

Emmanuel Taurel pointed me to another thread on the Tango forum with a long discussion about Tango Database performance:
http://www.tango-controls.org/community/forum/c/general/development/tango-database

A few years ago, we identified some performance issues in the way the Database DS handled memorized attributes.
We concluded that the problem came from the fact that the history of the last 10 values (by default, I think) was saved in the database each time a memorized attribute was written.
This was inefficient and caused database performance issues, especially when memorized attributes are written at high frequency, so it was eventually decided to remove the history feature for memorized attributes.
This was done at the Database server level.
I think DataBase-Release-5.2 should have the patch.
It seems this version of the Database server was released with Tango distribution 9.1.0.

So if you use memorized attributes extensively, I would definitely advise upgrading.

Did you manage to correlate these errors with a specific event (massive restart of device servers, start of a specific experiment/operating mode, …)?
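One way to look for such a correlation is to log, at a fixed period, whether a fresh connection to the database succeeds, and then compare the failure timestamps with restarts or operating-mode changes. The sketch below is only an illustration: `probe_once`, `watch` and `tango_connect` are hypothetical helper names, and `tango_connect` assumes that PyTango is installed and that TANGO_HOST points at the database.

```python
import time
from datetime import datetime
from typing import Callable, List, Tuple

Sample = Tuple[str, bool, float, str]


def probe_once(connect: Callable[[], object]) -> Sample:
    """Try one connection; return (timestamp, ok, elapsed seconds, error text)."""
    stamp = datetime.now().isoformat(timespec="seconds")
    start = time.monotonic()
    try:
        connect()
        return stamp, True, time.monotonic() - start, ""
    except Exception as exc:  # with PyTango this would typically be a DevFailed
        return stamp, False, time.monotonic() - start, str(exc)


def watch(connect: Callable[[], object], samples: int,
          period_s: float = 10.0) -> List[Sample]:
    """Probe the connection `samples` times, waiting `period_s` between probes."""
    results = []
    for i in range(samples):
        results.append(probe_once(connect))
        if i < samples - 1:
            time.sleep(period_s)
    return results


def tango_connect() -> str:
    """Open a new Database connection through PyTango (assumed to be installed)."""
    import tango  # imported lazily so the helpers above work without PyTango
    return tango.Database().get_info()
```

Calling `watch(tango_connect, samples=360, period_s=10.0)` would cover about one hour. The failed samples (second field `False`) give timestamps that can be compared with the operations log.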

fpourchayre · March 8, 2019, 3:33pm

Hi Reynald.

Thank you very much for your answer.

I read the discussion and it could be the solution to our problem. We will have to run more tests to be sure.

We don’t actually know how to reproduce the problem; it happens randomly. All we know is that it happens when we use several clients (Astor / Jive) and when we are controlling the equipment.

We will try your solution and report the result.

Thank you again.

Best regards,
Florian Pourchayre
Thales Group

agotz · March 8, 2019, 4:25pm

Hi Florian,

I second Reynald’s proposal. I presume you know that the database server can be upgraded without upgrading the rest of Tango, i.e. you can run Database device server V9.x with Tango device servers and clients linked with Tango library V8.x.

Have you monitored the timing performance of the Database server? If not, you should, because it will let you quantify the problem and see when it occurs. The Database server takes timing measurements and exposes them as attributes that you can monitor. One easy way of doing this is to plot them with atkpanel. Simply click on the database device, e.g. sys/database/2, and open the Monitor panel. Then select the Timing average / maximum / info tabs to monitor the timing of calls on the database server. You should also correlate the timings with the number of calls and check whether some process is overloading the database.

I have attached one example for a database running at the ESRF. The peak is for DbGetDataServerCache, which is called very rarely. Otherwise the call averages are mostly below 100 milliseconds.
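The same timing attributes can also be read from a script. The following is a minimal sketch, assuming PyTango is installed and that the Database device exposes the attributes `Timing_index` (command names), `Timing_average` (in milliseconds) and `Timing_calls`. Please check these names and units against your Database server version. `slow_commands` and `read_timing` are hypothetical helper names, and the 100 ms default simply mirrors the figure quoted above.

```python
from typing import List, Sequence, Tuple


def slow_commands(names: Sequence[str], averages_ms: Sequence[float],
                  calls: Sequence[float],
                  threshold_ms: float = 100.0) -> List[Tuple[str, float, int]]:
    """Return (command, average ms, call count) above the threshold, slowest first."""
    rows = [
        (str(name), float(avg), int(count))
        for name, avg, count in zip(names, averages_ms, calls)
        if float(avg) > threshold_ms
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def read_timing(device_name: str = "sys/database/2"):
    """Read the timing attributes of the Database device through PyTango."""
    import tango  # imported lazily so slow_commands works without PyTango
    proxy = tango.DeviceProxy(device_name)
    # Attribute names below are assumptions; verify them on your installation.
    names = proxy.read_attribute("Timing_index").value
    averages = proxy.read_attribute("Timing_average").value
    calls = proxy.read_attribute("Timing_calls").value
    return names, averages, calls
```

Running `slow_commands(*read_timing())` periodically and logging the result shows which commands are slow and how often they are called, which helps to spot a client that is overloading the database.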

Cheers

Andy

fpourchayre · March 12, 2019, 9:07am

Hi Andy,

Thank you for your answer. We will try the Tango 9 Database device server to see whether that resolves the problem.

We will also look at the timing figures that ATK reports for the Database device server.

I will post on the forum to keep you informed.

Best regards,
Florian Pourchayre
Thales Group