We have a big laser system in Romania and we use Tango for our supervision.
Sometimes we lose the connection to the Tango database. When we try to open Jive or Astor, we get the message “Connection to database failed”. We observed that it happens when several clients such as Jive or Astor are running (Astor causes more problems than Jive).
The database_ds.exe process is still listed in Task Manager when the problem occurs.
Here is some information on the database:
We use Tango 8 / Windows.
We modified “start-db.bat” to set PoolSize to 50.
The database is fairly large: about 850 devices and 280 Starters.
We use a MySQL database.
The problem is really important for us.
I saw that PoolSize is the number of connections to the database. Would it be useful to increase it?
Is this a known problem?
Could it be a performance problem in database_ds? Or could it be our MySQL database?
Were there any relevant changes in Tango 9?
So, a few years ago, we identified some performance issues in the way the Database DS was handling the memorized attributes.
We concluded that the problem was coming from the fact that the history of the last 10 (by default I think) values was saved in the database when writing a memorized attribute.
This was not efficient and was causing some database performance issues (especially if some memorized attributes are written at high frequency) so it was decided at some point to remove this history feature for the memorized attributes.
This was done at the Database server level.
I think DataBase-Release-5.2 should have the patch.
It seems this version of the Database server was released with Tango distribution 9.1.0.
So if you are using memorized attributes extensively, I would definitely advise upgrading.
Did you manage to correlate these errors with a special event (massive restart of device servers, start of a specific experiment/operating mode, …)?
I read the discussion and it could be the solution to our problem. We will have to run more tests to be sure.
We don’t actually know how to reproduce the problem. It happens randomly. The only facts we have are that it happens when we use several clients (Astor / Jive) and when we control the equipment.
I second Reynald’s proposal. I presume you know that the database server can be upgraded without upgrading the rest of Tango, i.e. you can run Database device server V9.x with Tango device servers and clients linked with Tango library V8.x.
Have you monitored the timing performance of the Database server? If not, you should, because it will allow you to quantify the problem and see when it happens. There are timing measurements in the Database server which are exposed as attributes that you can monitor. One easy way of doing this is to plot them with atkpanel. Simply click on the database device, e.g. sys/database/2, and open the Monitor panel. Then select the Timing average / maximum / info tabs to monitor the timing of calls on the database server. You should also correlate the timings with the number of calls to see whether some process is overloading the database.
I have attached one example for a database running at the ESRF. The peak is for DbGetDataServerCache which is called very rarely. Otherwise all the call averages are mostly less than 100 milliseconds.
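If you prefer to log the timing values rather than watch them in atkpanel, something like the following PyTango sketch might help. It is only a rough illustration under some assumptions: I am assuming the attributes behind the Monitor tabs are named Timing_index (command names), Timing_average (in milliseconds) and Timing_calls, so please check the exact names in Jive first. The helper names slowest_calls and read_timing and the 100 ms threshold are made up for this example.

```python
def slowest_calls(names, averages_ms, calls, threshold_ms=100.0):
    """Return (name, average_ms, n_calls) for every command whose average
    time exceeds threshold_ms, slowest first."""
    rows = [
        (name, float(avg), int(n))
        for name, avg, n in zip(names, averages_ms, calls)
        if float(avg) > threshold_ms
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def read_timing(device_name="sys/database/2"):
    """Read the timing attributes from the Database device.

    The attribute names below are assumptions; verify them on your server.
    """
    try:
        import tango  # module name in recent PyTango releases
    except ImportError:
        import PyTango as tango  # module name in older PyTango releases

    db = tango.DeviceProxy(device_name)
    names = db.read_attribute("Timing_index").value
    averages = db.read_attribute("Timing_average").value
    calls = db.read_attribute("Timing_calls").value
    return list(names), list(averages), list(calls)
```

Calling slowest_calls(*read_timing()) every few seconds and writing the result to a file with a timestamp should show which commands slow down, and how often they are called, around the time the “Connection to database failed” message appears.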