HDB++ for Cassandra

Hi,

I have successfully installed HDB++ for Cassandra. I am using the Tango 9 VM which is available on the Tango website.

I have made the following changes in the class properties of

  1. HdbConfigurationManager
  2. HdbEventSubscriber
    (I have attached snapshots of the same).

I have also created devices for the same.

After that, I executed the following steps:

  1. Executed the SQL script provided in the code for Cassandra.
  2. Executed the newly generated executables for the configuration manager and the event subscriber with their instance names, and got the following errors:
  • Configuration Manager
    • HdbPPMySQL: mysql connect db error: Access denied for user 'cassandra'@'localhost' (using password: YES)
  • Event Subscriber
    • HdbPPMySQL: mysql connect db error: Access denied for user 'cassandra'@'localhost' (using password: YES)

Even though it was compiled for Cassandra, it is trying to connect to MySQL.

Are there any other configurations required for this?
Please help in resolving this issue.

You can download a compressed folder of the HDB++ installation for Cassandra from the link below.

Thanks,
Sandeep

Hi,

I have resolved the above problem, but am facing some other issues.

I changed the SQL script for Cassandra: when creating the keyspace, I used “SimpleStrategy” instead of “NetworkTopologyStrategy”, and then executed the script.

Whenever I try to configure an attribute for archiving using the HDB++ configurator, I get the following error on the command prompt where the configuration manager is running:

find_attr_id_type: ERROR in query=SELECT att_conf_id,data_type FROM hdb.att_conf WHERE att_name = ? AND cs_name = ?
Cassandra Error: All hosts in current policy attempted and were either unavailable or failed
Cassandra Error: All hosts in current policy attempted and were either unavailable or failed
insert_attr_conf: ERROR executing query=INSERT INTO hdb.att_conf (att_conf_id,cs_name,att_name,data_type) VALUES (?, ?, ?, ?)
configure_Attr(tango://tango9-vm:10000/t/v/1/waterlevel): Error inserting into att_conf table (error = -1)

This is the stack trace from the HDB++ configurator (Configuration query error):

fr.esrf.TangoApi.ConnectionFailed
at
fr.esrf.TangoDs.Except.throw_connection_failed(Except.java:616)
fr.esrf.TangoDs.Except.throw_connection_failed(Except.java:569)
fr.esrf.TangoApi.ConnectionDAODefaultImpl.command_inout(ConnectionDAODefaultImpl.java:922)
fr.esrf.TangoApi.ConnectionDAODefaultImpl.command_inout(ConnectionDAODefaultImpl.java:944)
fr.esrf.TangoApi.Connection.command_inout(Connection.java:388)
org.tango.hdbcpp.tools.ArchiverUtils.addAttribute(ArchiverUtils.java:151)
org.tango.hdbcpp.configurator.HdbConfigurator.addSpecifiedAttribute(HdbConfigurator.java:1169)
org.tango.hdbcpp.configurator.AttributeTree.addAttribute(AttributeTree.java:408)
org.tango.hdbcpp.configurator.AttributeTree.treeMouseClicked(AttributeTree.java:175)
org.tango.hdbcpp.configurator.AttributeTree.access$300(AttributeTree.java:61)
org.tango.hdbcpp.configurator.AttributeTree$2.mouseClicked(AttributeTree.java:147)
java.awt.AWTEventMulticaster.mouseClicked(AWTEventMulticaster.java:270)
java.awt.Component.processMouseEvent(Component.java:6538)
javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
java.awt.Component.processEvent(Component.java:6300)
java.awt.Container.processEvent(Container.java:2236)
java.awt.Component.dispatchEventImpl(Component.java:4891)
java.awt.Container.dispatchEventImpl(Container.java:2294)
java.awt.Component.dispatchEvent(Component.java:4713)
java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
java.awt.LightweightDispatcher.processMouseEvent(Container.java:4534)
java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
java.awt.Container.dispatchEventImpl(Container.java:2280)
java.awt.Window.dispatchEventImpl(Window.java:2750)
java.awt.Component.dispatchEvent(Component.java:4713)
java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
java.awt.EventQueue.access$500(EventQueue.java:97)
java.awt.EventQueue$3.run(EventQueue.java:709)
java.awt.EventQueue$3.run(EventQueue.java:703)
java.security.AccessController.doPrivileged(Native Method)
java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
java.awt.EventQueue$4.run(EventQueue.java:731)
java.awt.EventQueue$4.run(EventQueue.java:729)
java.security.AccessController.doPrivileged(Native Method)
java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
java.awt.EventDispatchThread.run(EventDispatchThread.java:82)

I have also attached a snapshot of the error.

Please help me to resolve this.

Thanks,
Sandeep

Dear Sandeep,

I’m glad you fixed your previous problem (I guess it was a wrong PATH or LD_LIBRARY_PATH, or you forgot to install some of the libraries in the correct location).
Please try to configure Cassandra without a username and password.
Support for usernames and passwords is not yet implemented in libhdb++cassandra.
I will add that soon.

If this does not work, please first verify that you can connect to Cassandra using the Cassandra tools (nodetool status, cqlsh).
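Beyond the Cassandra tools, a cheap TCP probe of the native transport port can tell you whether the node is reachable at all. Here is a minimal sketch in Python (illustrative only, not part of HDB++; the host and port below are examples, 9042 being the default CQL native port):

```python
import socket

def can_reach(host, port, timeout=2.0):
    """Return True if a plain TCP connection to host:port succeeds.

    A refused or timed-out connection to the CQL native port would be one
    cheap explanation for errors like "All hosts in current policy
    attempted and were either unavailable or failed".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: probe the local Cassandra node before blaming the HDB++ layer
# can_reach("127.0.0.1", 9042)
```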

Cheers,

Reynald

By the way,

Applying the following patch in LibHdb++Cassandra.cpp might be enough…

--- LibHdb++Cassandra.cpp	2015-11-17 12:59:35.559739000 +0100
+++ LibHdb++Cassandra.cpp.new	2015-11-17 12:59:22.641060000 +0100
@@ -44,6 +44,7 @@
   	//cass_log_set_level(CASS_LOG_DEBUG);
 #endif
 	cass_cluster_set_contact_points(mp_cluster,contact_points.c_str());
+	cass_cluster_set_credentials(mp_cluster, user, password);
 	
 	/* Latency-aware routing enabled with the default settings */
 	cass_cluster_set_latency_aware_routing(mp_cluster, cass_true);

You can try and let me know if it works.

Hi Reynald,

I am able to connect to Cassandra using cqlsh.

I have tried making those property values blank, and also removing them, but I am still facing the same issue.

When I start the event subscriber, the following is the output on the terminal:

tango-cs@tango9-vm:~/HDB++_svn/tango-cs-code/archiving/hdb++$ /home/tango-cs/HDB++_svn/tango-cs-code/archiving/hdb++/hdb++es/trunk/bin/hdb++es-srv 01
HdbPPCassandra: VERSION: $Build: eadURL: Nov 10 2015 11:37:37 $ file:src/LibHdb++Cassandra.cpp $Id: $
1447762063.158 [WARN] (src/connection.cpp:791:void cass::Connection::notify_error(const string&, cass::Connection::ConnectionError)): Host 127.0.0.1 received invalid protocol response Invalid or unsupported protocol version: 4
1447762063.158 [WARN] (src/control_connection.cpp:204:virtual void cass::ControlConnection::on_close(cass::Connection*)): Lost control connection on host 127.0.0.1
1447762063.158 [WARN] (src/control_connection.cpp:223:virtual void cass::ControlConnection::on_close(cass::Connection*)): Host 127.0.0.1 does not support protocol version 4. Trying protocol version 3…
find_attr_id_in_db: ERROR in query=SELECT att_conf_id FROM hdb.att_conf WHERE att_name = ? AND cs_name = ?
Cassandra Error: All hosts in current policy attempted and were either unavailable or failed
find_last_event: ERROR in query=SELECT event FROM hdb.att_history WHERE att_conf_id = ? ORDER BY time DESC LIMIT 1
Cassandra Error: Invalid version for TimeUUID type.
Cassandra Error: Invalid version for TimeUUID type.
insert_history_event: ERROR executing query=INSERT INTO hdb.att_history (att_conf_id,event,time,time_us) VALUES ( ?, ?, ?, ?)
event_AttrError adding start event to history table for attribute tango://tango9-vm:10000/sim/motor/1/position (src/LibHdb++Cassandra.cpp:1550)
find_attr_id_in_db: ERROR in query=SELECT att_conf_id FROM hdb.att_conf WHERE att_name = ? AND cs_name = ?
Cassandra Error: All hosts in current policy attempted and were either unavailable or failed
find_last_event: ERROR in query=SELECT event FROM hdb.att_history WHERE att_conf_id = ? ORDER BY time DESC LIMIT 1
Cassandra Error: Invalid version for TimeUUID type.
Cassandra Error: Invalid version for TimeUUID type.
insert_history_event: ERROR executing query=INSERT INTO hdb.att_history (att_conf_id,event,time,time_us) VALUES ( ?, ?, ?, ?)
event_AttrError adding start event to history table for attribute tango://tango9-vm:10000/sys/tg_test/1/double_scalar (src/LibHdb++Cassandra.cpp:1550)
find_attr_id_in_db: ERROR in query=SELECT att_conf_id FROM hdb.att_conf WHERE att_name = ? AND cs_name = ?
Cassandra Error: All hosts in current policy attempted and were either unavailable or failed
find_last_event: ERROR in query=SELECT event FROM hdb.att_history WHERE att_conf_id = ? ORDER BY time DESC LIMIT 1
Cassandra Error: Invalid version for TimeUUID type.
Cassandra Error: Invalid version for TimeUUID type.
insert_history_event: ERROR executing query=INSERT INTO hdb.att_history (att_conf_id,event,time,time_us) VALUES ( ?, ?, ?, ?)
event_AttrError adding start event to history table for attribute tango://tango9-vm:10000/sys/tg_test/1/float_scalar (src/LibHdb++Cassandra.cpp:1550)
find_attr_id_in_db: ERROR in query=SELECT att_conf_id FROM hdb.att_conf WHERE att_name = ? AND cs_name = ?
Cassandra Error: All hosts in current policy attempted and were either unavailable or failed
find_last_event: ERROR in query=SELECT event FROM hdb.att_history WHERE att_conf_id = ? ORDER BY time DESC LIMIT 1
Cassandra Error: Invalid version for TimeUUID type.
Cassandra Error: Invalid version for TimeUUID type.
insert_history_event: ERROR executing query=INSERT INTO hdb.att_history (att_conf_id,event,time,time_us) VALUES ( ?, ?, ?, ?)
event_AttrError adding start event to history table for attribute tango://tango9-vm:10000/sys/tg_test/1/long_scalar (src/LibHdb++Cassandra.cpp:1550)
find_attr_id_in_db: ERROR in query=SELECT att_conf_id FROM hdb.att_conf WHERE att_name = ? AND cs_name = ?
Cassandra Error: All hosts in current policy attempted and were either unavailable or failed
find_last_event: ERROR in query=SELECT event FROM hdb.att_history WHERE att_conf_id = ? ORDER BY time DESC LIMIT 1
Cassandra Error: Invalid version for TimeUUID type.
Cassandra Error: Invalid version for TimeUUID type.
insert_history_event: ERROR executing query=INSERT INTO hdb.att_history (att_conf_id,event,time,time_us) VALUES ( ?, ?, ?, ?)
event_AttrError adding start event to history table for attribute tango://tango9-vm:10000/sys/tg_test/1/wave (src/LibHdb++Cassandra.cpp:1550)
find_attr_id_in_db: ERROR in query=SELECT att_conf_id FROM hdb.att_conf WHERE att_name = ? AND cs_name = ?
Cassandra Error: All hosts in current policy attempted and were either unavailable or failed
insert_param_Attr: Could not find ID for attribute tango://tango9-vm:10000/sim/motor/1/position
find_attr_id_in_db: ERROR in query=SELECT att_conf_id FROM hdb.att_conf WHERE att_name = ? AND cs_name = ?
Cassandra Error: All hosts in current policy attempted and were either unavailable or failed
insert_Attr: Could not find ID for attribute tango://tango9-vm:10000/sim/motor/1/position
find_attr_id_in_db: ERROR in query=SELECT att_conf_id FROM hdb.att_conf WHERE att_name = ? AND cs_name = ?
Cassandra Error: All hosts in current policy attempted and were either unavailable or failed
insert_param_Attr: Could not find ID for attribute tango://tango9-vm:10000/sys/tg_test/1/double_scalar
find_attr_id_in_db: ERROR in query=SELECT att_conf_id FROM hdb.att_conf WHERE att_name = ? AND cs_name = ?
Cassandra Error: All hosts in current policy attempted and were either unavailable or failed
insert_Attr: Could not find ID for attribute tango://tango9-vm:10000/sys/tg_test/1/double_scalar
find_attr_id_in_db: ERROR in query=SELECT att_conf_id FROM hdb.att_conf WHERE att_name = ? AND cs_name = ?
Cassandra Error: All hosts in current policy attempted and were either unavailable or failed
insert_param_Attr: Could not find ID for attribute tango://tango9-vm:10000/sys/tg_test/1/float_scalar
find_attr_id_in_db: ERROR in query=SELECT att_conf_id FROM hdb.att_conf WHERE att_name = ? AND cs_name = ?
Cassandra Error: All hosts in current policy attempted and were either unavailable or failed
insert_Attr: Could not find ID for attribute tango://tango9-vm:10000/sys/tg_test/1/float_scalar
find_attr_id_in_db: ERROR in query=SELECT att_conf_id FROM hdb.att_conf WHERE att_name = ? AND cs_name = ?
Cassandra Error: All hosts in current policy attempted and were either unavailable or failed
insert_param_Attr: Could not find ID for attribute tango://tango9-vm:10000/sys/tg_test/1/long_scalar
find_attr_id_in_db: ERROR in query=SELECT att_conf_id FROM hdb.att_conf WHERE att_name = ? AND cs_name = ?
Cassandra Error: All hosts in current policy attempted and were either unavailable or failed
insert_Attr: Could not find ID for attribute tango://tango9-vm:10000/sys/tg_test/1/long_scalar
find_attr_id_in_db: ERROR in query=SELECT att_conf_id FROM hdb.att_conf WHERE att_name = ? AND cs_name = ?
Cassandra Error: All hosts in current policy attempted and were either unavailable or failed
insert_param_Attr: Could not find ID for attribute tango://tango9-vm:10000/sys/tg_test/1/wave
find_attr_id_in_db: ERROR in query=SELECT att_conf_id FROM hdb.att_conf WHERE att_name = ? AND cs_name = ?
Cassandra Error: All hosts in current policy attempted and were either unavailable or failed
insert_Attr: Could not find ID for attribute tango://tango9-vm:10000/sys/tg_test/1/wave
Ready to accept request
find_attr_id_in_db: ERROR in query=SELECT att_conf_id FROM hdb.att_conf WHERE att_name = ? AND cs_name = ?
Cassandra Error: All hosts in current policy attempted and were either unavailable or failed
insert_Attr: Could not find ID for attribute tango://tango9-vm:10000/sim/motor/1/position

The problem looks the same as the one I posted before.

Thanks,
Sandeep

What version of Cassandra are you using?

Hi Reynald,

Below is the output of cqlsh:

Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.1.11 | CQL spec 3.2.1 | Native protocol v3]

We’ve installed Cassandra as a service. If you need any additional details please let us know.

Regards,
Sandeep

I suppose you are running the device servers on the same machine where Cassandra is installed, right?

Yes.

It looks like the C++ Cassandra driver you are using speaks the v4 native protocol,
while your Cassandra version supports up to protocol v3, I think.
I would suggest installing an older version of the Cassandra C++ driver and recompiling the HDB++ libraries and device servers.
Here at the ESRF, we are still using version 2.0.1 of the C++ Cassandra driver.
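The downgrade visible in the WARN messages earlier in the thread ("Trying protocol version 3…") is essentially the following logic. This is a simplified sketch of the behaviour, not the driver's actual code:

```python
def negotiate_protocol(driver_max, server_max, driver_min=1):
    """Step down from the driver's preferred native protocol version
    until one is accepted by the server.

    Returns the highest common version, or None if there is no overlap
    (in which case every connection attempt fails, as seen in the logs).
    """
    for version in range(driver_max, driver_min - 1, -1):
        if version <= server_max:
            return version
    return None
```

In Sandeep's case the driver started at v4, Cassandra 2.1 only accepts up to v3, so the control connection had to retry with v3.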

I would strongly encourage you to use that version, since some parts of the code might not work with the latest version of the C++ Cassandra driver, which is still evolving a lot.

Hoping this helps,

Reynald

Hi Reynald,

Thanks for your input. We have successfully configured HDB++ for Cassandra after changing the C++ Cassandra driver version.

But I had to change something in the keyspace to make it work:

Updated:
CREATE KEYSPACE IF NOT EXISTS hdb WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor': '1' } AND durable_writes = true;

Original:
CREATE KEYSPACE IF NOT EXISTS hdb WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3 };

Thanks,
Sandeep

Hi Sandeep,

Good news! :slight_smile:
If you are playing, and plan to keep playing, with only one datacenter, I think it indeed makes sense to use SimpleStrategy. If you plan to add a new datacenter at some point, you should use NetworkTopologyStrategy, as described in the Cassandra documentation (I guess you already saw that).

The default value of durable_writes is true, so this part should not be necessary.

Please remember that HDB++ Cassandra is still under development (as you noticed, the installation procedure should be simplified, and some documentation still has to be written and published), and some upcoming changes may impact you.
I wouldn’t recommend using it in production right now.
An important change still needs to be implemented (partitioning per hour instead of per day). This will improve the performance and robustness of the system.
If you use the current version to store production data, you will need to convert the already-stored data to the new per-hour partitioning scheme once it is implemented.
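To make the planned change concrete, such partitioning amounts to deriving a time-bucket string that is used as (part of) the partition key. The sketch below is illustrative only; the bucket format is an assumption, not the actual HDB++ schema:

```python
from datetime import datetime

def period_key(ts, per_hour=False):
    """Time-bucket string acting as (part of) a partition key:
    one bucket per day currently, one per hour after the planned change.

    The string format here is a made-up example for illustration.
    """
    return ts.strftime("%Y-%m-%d_%H" if per_hour else "%Y-%m-%d")
```

All rows whose timestamps map to the same bucket land in the same partition, which is why a chatty attribute can blow up a one-day partition.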

Cheers,
Reynald

Sorry for replying to a very old post, but I found this thread and was curious about it, since we are considering deploying HDB++ with Cassandra at MAXIV.

You (Reynald) mention that you are changing the period to be per hour instead of per day. Is this change implemented in the latest version of HDB++? Otherwise, is it still on the roadmap?

I have been trying the latest version, but unfortunately I have not found a way to use it with Cassandra 3.0, since HDB++ is not quite compatible with the newer versions of the C++ driver that it requires. This is actually a regression, since the old version we have been testing with so far works fine with Cassandra 3.0 through C++ driver version 2.2.2. I guess you are not using Cassandra 3 and have not had a reason to try newer library versions?

Cheers,
Johan

Hi,
This change is not implemented yet in the latest version of HDB++.
It is still on the roadmap, because one-day partitions per attribute can cause trouble in Cassandra (partitions getting too big), especially if some attributes receive several events per second.
The best would be to find a way to configure this per attribute.

Hi, I’m surprised the old HDB++ version is compatible with Cassandra 3.0 but the newer one is not.
In any case, as you guessed, we have not tried Cassandra 3.0 yet, because we thought it was still a bit early to move to a new major release of Cassandra.
Our experience has shown that it is better to wait some time for the bugs to be corrected in new Cassandra versions,
especially with Cassandra 3.0, which has a new storage engine.
We are using Cassandra 2.2(.4), and HDB++ should work with Cassandra 2.2.x versions.

Cheers,
Reynald

OK, that is good to know. Any guess to the possible time frame on this?

Is there a significant cost to making the period shorter? I suppose querying over lots of periods might be a bit slower?

I see, and this makes sense. However, as far as I can see, DataStax is now shipping Cassandra 3.0 as part of their “enterprise” product, so I assume it is considered stable for production now. Perhaps we can have a look at fixing the driver incompatibility, since we may be the only ones who care right now :slight_smile:

Thanks for your reply,
Johan

[quote=“johfor”][quote=“Reynald”]
Hi,
This change is not implemented yet in the latest version of HDB++.
It is still on the roadmap, because one-day partitions per attribute can cause trouble in Cassandra (partitions getting too big), especially if some attributes receive several events per second.
The best would be to find a way to configure this per attribute.
[/quote]

OK, that is good to know. Any guess to the possible time frame on this?
[/quote]

Difficult to say right now… I’m quite busy with the Tango kernel (and other things!). We are hiring someone who will work on HDB++ soon, but I suspect this change won’t be released for several months.

Yeah, the main drawback is that we will have to execute many more queries, even if there is not much data in the partitions we are querying. In terms of performance it might be a bit slower, but maybe not that much actually; this has to be evaluated.
If we have attributes which send archive events only once every hour, we will have partitions containing a single row, which is clearly sub-optimal (even a partition per day is a bit overkill in this case).
The best would be if Cassandra were able to adapt the partition size itself, but this is not the case.
So this is not so easy to handle in our case, because we are event-based: there can be periods where many events are sent within the same second, while in other periods only a few events per day are sent.
I think there is room for improvement in Cassandra itself: it complains in the logs when partitions get too big, but does nothing else about it.
I guess there must be a way to improve HDB++ so that it automatically creates reasonably sized partitions. Using Spark to reorganise the data into optimally sized partitions, and recording which partitions are available for a given attribute in a given period, would be one idea, for instance.
But there must be much cleverer ideas… :wink:
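The extra-queries cost can be illustrated by the bucket enumeration a reader has to perform to cover a time range. This is a sketch under the assumption of simple date-string buckets, not actual HDB++ code:

```python
from datetime import date, timedelta

def day_partitions(start, end):
    """All daily bucket keys a reader must query to cover [start, end],
    inclusive.

    With hourly buckets this list would be roughly 24 times longer,
    which is the extra-queries cost discussed above.
    """
    out, d = [], start
    while d <= end:
        out.append(d.isoformat())
        d += timedelta(days=1)
    return out
```

For a one-month query, that is 31 partitions per day versus roughly 744 per hour; whether the smaller per-partition reads compensate is exactly what has to be evaluated.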

[quote=“johfor”]

I see, and this makes sense, however as far as I can see, DataStax are now using Cassandra 3.0 as part of their “enterprise” product and so I assume it’s considered stable for production now. Perhaps we can have a look at fixing the driver incompatibility, since we may be the only ones who care right now :slight_smile:

Thanks for your reply,
Johan[/quote]

Any contribution is welcome! :slight_smile:

Agreed, it would be nice to be able to tune the partition size per-attribute.

I suppose that if you were to somehow change the size of partitions dynamically, the way partition keys work would also need to change, since the current scheme assumes you can know beforehand the key of the partition your data is in.

The Cassandra docs mention 100,000 points / 100 MB as a “rule of thumb” maximum partition size, but it seems these numbers are really limitations from the pre-2.x days. I also see that 3.0 has undergone a lot of changes in how rows and partitions are stored on disk; it might be interesting to compare.

[quote=“johfor”]Agreed, it would be nice to be able to tune the partition size per-attribute.

I suppose that if you were to somehow change the size of partitions dynamically, the way partition keys work would also need to change, since the current scheme assumes you can know beforehand the key of the partition your data is in.
[/quote]
Exactly. If nothing is done on the Cassandra side, we would need to find a way to tell the clients which partitions are available for the attribute and period of time they want to retrieve.
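Such a client-side hint could look like the following purely hypothetical sketch: an index that records, per attribute, which buckets actually hold data, so readers skip empty partitions. None of these names exist in HDB++; this is only one way the idea could be shaped:

```python
from collections import defaultdict

class PartitionIndex:
    """Hypothetical client-side index: for each attribute, remember
    which time buckets actually contain data, so readers can skip
    empty partitions entirely."""

    def __init__(self):
        self._buckets = defaultdict(set)

    def record(self, att_id, bucket):
        # Write path: note that a row landed in this bucket.
        self._buckets[att_id].add(bucket)

    def to_query(self, att_id, candidate_buckets):
        # Read path: keep only candidate buckets known to hold data.
        available = self._buckets.get(att_id, set())
        return [b for b in candidate_buckets if b in available]
```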

[quote=“johfor”]
The Cassandra docs mention 100,000 points / 100 MB as a “rule of thumb” maximum partition size, but it seems these numbers are really limitations from the pre-2.x days. I also see that 3.0 has undergone a lot of changes in how rows and partitions are stored on disk; it might be interesting to compare.[/quote]

Might be interesting indeed.
FYI, another improvement planned in HDB++/Cassandra, which will reduce the partition size, is to move the timestamps used for diagnostics (recv_time, recv_time_us, insert_time and insert_time_us) to a separate, optional table.

Hi Reynald,

I need to clarify my understanding of how the HDB++ event subscribers connect to Cassandra.

In the Event Subscriber class properties, we specify DbHost, DbName and DbPort. This implies that all the event subscribers deployed in a single Tango facility will use these class properties and connect to the specified Cassandra node of a cluster.

Suppose a Cassandra cluster comprises two nodes, Cassandra Node A and Cassandra Node B, deployed on different machines. I want to distribute the HDB++ archiving operations between the Cassandra nodes, i.e. some event subscribers will write to Node A while other event subscribers will write to Node B. For this, I defined DbHost, DbName and DbPort as device properties, which override the class properties.

Please correct me if I’m wrong.

Kind regards,
Jyotin

Hi Jyotin,
you’re right: the device properties, when defined, override the class properties.
Cheers,
Lorenzo
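Lorenzo's confirmation boils down to a simple resolution order, sketched below. This is an illustration of the precedence rule only, not the real Tango database API; the property names are just the ones from Jyotin's question:

```python
def effective_property(name, device_props, class_props, default=None):
    """Resolution order sketch: a property defined at the device level
    overrides the same property defined at the class level.

    Illustrative only; the real Tango property lookup lives in the
    Tango database, not in a dict.
    """
    if name in device_props:
        return device_props[name]
    return class_props.get(name, default)
```

So a subscriber whose device defines DbHost gets its own host, while subscribers without a device-level value fall back to the class-wide one.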