Does anyone know whether there is a Java library for parsing a MySQL schema? In code, I want to be able to determine the tables and fields specified in a schema. Or will I need to write my own?
Thanks Richard.
Edit: Just want to avoid re-inventing the wheel unnecessarily :)
Answering my own question:
I am using jsqlparser: http://jsqlparser.sourceforge.net/
It parses individual statements, not multiple statements such as are found in a schema, so the schema has to be split on ';'. It also doesn't like the '`' character, so backticks need to be stripped out. Code to get the column names for a particular table:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import net.sf.jsqlparser.JSQLParserException;
import net.sf.jsqlparser.parser.CCJSqlParserManager;
import net.sf.jsqlparser.statement.Statement;
import net.sf.jsqlparser.statement.create.table.ColumnDefinition;
import net.sf.jsqlparser.statement.create.table.CreateTable;

public class BUDataColumnsFinder {

    // Read the schema file, strip backticks (jsqlparser rejects them)
    // and split it into individual statements on ';'.
    public static String[] readSql(String schema) throws IOException {
        BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(schema)));
        StringBuilder mysql = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            mysql.append(line);
        }
        br.close();
        return mysql.toString().replaceAll("`", "").split(";");
    }

    public static List<String> getColumnNames(String tableName, String schemaFile)
            throws JSQLParserException, IOException {
        CCJSqlParserManager pm = new CCJSqlParserManager();
        List<String> columnNames = new ArrayList<String>();
        for (String sqlStatement : readSql(schemaFile)) {
            Statement statement = pm.parse(new StringReader(sqlStatement));
            if (statement instanceof CreateTable) {
                CreateTable create = (CreateTable) statement;
                if (create.getTable().getName().equalsIgnoreCase(tableName)) {
                    for (ColumnDefinition def : create.getColumnDefinitions()) {
                        columnNames.add(def.getColumnName());
                    }
                    break;
                }
            }
        }
        return columnNames;
    }

    public static void main(String[] args) throws Exception {
        String schemaFile = "/home/john/config/bu-schema.sql";
        String tableName = "records";
        for (String name : BUDataColumnsFinder.getColumnNames(tableName, schemaFile)) {
            System.out.println("name: " + name);
        }
    }
}
You may want to consider using code from Alibaba's Druid project. Although designed as a sophisticated connection pooling library, this project also contains a very advanced parser and AST for ANSI SQL and non-ANSI dialects such as MySQL, Oracle, SQL Server, etc. The project is open source and bears the very liberal Apache License Version 2.0.
The main entry point into this part of the library is SQLUtils.java. You can use the values returned from SQLUtils.parseStatements to access a typed model of the statements:
List<SQLStatement> statements = SQLUtils.parseStatements(sql, JdbcConstants.MYSQL);

for (SQLStatement statement : statements) {
    if (statement instanceof MySqlCreateTableStatement) {
        MySqlCreateTableStatement createTable = (MySqlCreateTableStatement) statement;
        // Use methods like: createTable.getTableSource()
    }
}
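For example, to pull the table and column names out of a parsed CREATE TABLE, something like the following should work (a sketch extending the snippet above; it assumes Druid's getTableElementList() and SQLColumnDefinition API, which is worth checking against your Druid version):

for (SQLStatement statement : SQLUtils.parseStatements(sql, JdbcConstants.MYSQL)) {
    if (statement instanceof MySqlCreateTableStatement) {
        MySqlCreateTableStatement createTable = (MySqlCreateTableStatement) statement;
        System.out.println("table: " + createTable.getTableSource());
        // Table elements include column definitions, indexes and constraints,
        // so filter for the column definitions.
        for (Object element : createTable.getTableElementList()) {
            if (element instanceof SQLColumnDefinition) {
                System.out.println("column: " + ((SQLColumnDefinition) element).getName());
            }
        }
    }
}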
Why not just use DatabaseMetaData to find out the tables and columns? This presumes that the schema expressed in SQL has been run against the database you're connected to, but that's not a difficult assumption to satisfy.
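A minimal sketch of that approach, using only standard java.sql classes (the URL and credentials here are placeholders for your own connection details):

Connection conn = DriverManager.getConnection(
        "jdbc:mysql://localhost/testdb", "(db user)", "(db password)");
DatabaseMetaData md = conn.getMetaData();

// List every table, then the columns of each one.
ResultSet tables = md.getTables(null, null, "%", new String[] { "TABLE" });
while (tables.next()) {
    String table = tables.getString("TABLE_NAME");
    System.out.println("table: " + table);
    ResultSet cols = md.getColumns(null, null, table, "%");
    while (cols.next()) {
        System.out.println("  column: " + cols.getString("COLUMN_NAME"));
    }
    cols.close();
}
conn.close();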
MySQL might be able to simply import the data if you have the data in CSV format. I'd dig deeper into MySQL tools before I'd write Java code to do such a thing. If that doesn't work, I'd find an ETL tool to help me. Writing Java would be my solution of last resort.
I have this query code in my application:
@Override
public MyParameter loadMyParameterSetByVersion(Long version) {
    StringBuilder sb = new StringBuilder();
    sb.append("SELECT mp FROM MyParameter mp ");
    sb.append("INNER JOIN FETCH mp.priceParametersGood good ");
    sb.append("WHERE mp.objId = :version ");
    sb.append("ORDER BY good.isBc, good.isGd, good.priceFrom");
    QueryBuilder builder = createQueryBuilder(sb.toString());
    builder.addParameter("version", version);
    List<MyParameter> result = executeQuery(builder.createQuery());
    if (result.size() > 0) {
        return result.get(0);
    } else {
        return null;
    }
}
I did not write this method; I only added the ORDER BY clause, because it's needed right now. My problem is that the results are still not sorted that way. Do I need to rewrite this? If yes, what should I use to make it work?
I tried this query directly against my Oracle DB and there the results are sorted, so I assume the problem is with this Query.
I think an ORDER BY is added for you when no orderBy() is specified.
So essentially this QueryBuilder will override the ORDER BY that is in your string by adding yet another one.
Instead of hard-coding the SQL string, you should use the DSL syntax that QueryBuilder offers.
Using some code I found here, you will see that you are using QueryBuilder in the wrong way and that the original code should be rewritten.
Your code should look more like the article's:
Statement statement = QueryBuilder.select().all().from(table.tableName())
        .where(cName).and(cBtm).and(cTop).orderBy(order);
// ResultSet is Iterable<Row>, so it can be iterated directly;
// an Iterator cannot be used as the source of a for-each loop.
final ResultSet rows = session.execute(statement);
for (Row row : rows) {
    ...
}
We are using MongoDB for saving and fetching data.
All calls that put data into collections go through a common method and work fine.
All calls that fetch data from collections also go through a common method and work fine most of the time.
But sometimes, and only for one of the collections, my calls get stuck forever and consume CPU. I have to kill the threads manually, otherwise they consume my whole CPU.
Mongo Connection
MongoClient mongo = new MongoClient(hostName, Integer.valueOf(port));
DB mongoDb = mongo.getDB(dbName);
Code To fetch
DBCollection collection = mongoDb.getCollection(collectionName);
DBObject dbObject = new BasicDBObject("_id", key);
DBCursor cursor = collection.find(dbObject);
I have figured out which collection causes the issue, but how can I improve on this, given that it occurs only for this particular collection, and only sometimes?
EDIT
Code to save
DBCollection collection = mongoDb.getCollection(collectionName);
DBObject query = new BasicDBObject("_id", key);
DBObject update = new BasicDBObject();
update.put("$set", JSON.parse(value));
collection.update(query, update, true, false);
Bulk Write / collection
DB mongoDb = controllerFactory.getMongoDB();
DBCollection collection = mongoDb.getCollection(collectionName);
BulkWriteOperation bulkWriteOperation = collection.initializeUnorderedBulkOperation();
Map<String, Object> dataMap = (Map<String, Object>) JSON.parse(value);
for (Entry<String, Object> entrySet : dataMap.entrySet()) {
    BulkWriteRequestBuilder bulkWriteRequestBuilder =
            bulkWriteOperation.find(new BasicDBObject("_id", entrySet.getKey()));
    DBObject update = new BasicDBObject();
    update.put("$set", entrySet.getValue());
    bulkWriteRequestBuilder.upsert().update(update);
}
bulkWriteOperation.execute(); // execute the queued upserts
How can I set a timeout for the fetch calls?
A different approach is to use the API proposed for the MongoDB 3.2 driver. Keep in mind that you have to update your .jar libraries (if you haven't already) to the latest version.
public static MongoClient connectToClient(String hostName, String port) {
    try {
        return new MongoClient(hostName, Integer.valueOf(port));
    } catch (MongoClientException e) {
        System.err.println("Cannot connect to client.");
        return null;
    }
}

public static MongoDatabase connectToDB(MongoClient client, String databaseName) {
    try {
        return client.getDatabase(databaseName);
    } catch (Exception e) {
        System.err.println("Error in connecting to database " + databaseName);
        return null;
    }
}

public static void closeConnection(MongoClient client) {
    client.close();
}

public static void findDoc(MongoDatabase db, String collectionName, Object key) {
    MongoCollection<Document> collection = db.getCollection(collectionName);
    try {
        FindIterable<Document> iterable = collection
                .find(new Document("_id", key));
        Document doc = iterable.first();
        // For an Int64 field named 'special_id'
        long specialId = doc.getLong("special_id");
    } catch (MongoException e) {
        System.err.println("Error in retrieving document.");
    } catch (NullPointerException e) {
        System.err.println("Document with _id " + key + " does not exist.");
    }
}

public static void insertToDB(MongoDatabase db, String collectionName) {
    try {
        db.getCollection(collectionName).insertOne(new Document()
                .append("special_id", 5)
                // Append anything else you need
        );
    } catch (MongoException e) {
        System.err.println("Error in inserting new document.");
    }
}

public static void updateDoc(MongoDatabase db, String collectionName, long id) {
    MongoCollection<Document> collection = db.getCollection(collectionName);
    try {
        collection.updateOne(new Document("_id", id),
                new Document("$set", new Document("special_id", 7)));
    } catch (MongoException e) {
        System.err.println("Error in updating new document.");
    }
}

public static void main(String[] args) {
    String hostName = "myHost";
    String port = "myPort";
    String databaseName = "myDB";
    String collectionName = "myCollection";
    MongoClient client = connectToClient(hostName, port);
    if (client != null) {
        MongoDatabase db = connectToDB(client, databaseName);
        if (db != null) {
            findDoc(db, collectionName, "myKey"); // the _id to look up
        }
        closeConnection(client);
    }
}
EDIT: As the others suggested, check from the command line whether finding the document by its ID is slow there too. If so, maybe the problem is with your hard drive. The _id field is supposed to be indexed, but for better or for worse, re-create the index on the _id field.
The answers posted by others are great, but did not solve my problem.
The issue was actually in my existing code itself: my cursor was waiting in a while loop for an infinite time.
I was missing a few checks, which has now been resolved.
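For anyone hitting the same symptom, here is a minimal sketch of the kind of guard that was missing, using the same legacy driver API as the question (collection and key are placeholders):

DBCursor cursor = collection.find(new BasicDBObject("_id", key));
try {
    // Bound the loop with hasNext(); spinning on the cursor
    // without checking it can loop forever.
    while (cursor.hasNext()) {
        DBObject doc = cursor.next();
        // process doc
    }
} finally {
    cursor.close(); // always release the server-side cursor
}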
Just some possible explanations/thoughts.
In general "query by id" has to be fast since _id is supposed to be indexed, always. The code snippet looks correct, so probably the reason is in mongo itself. This leads me to a couple of suggestions:
Try to connect to mongo directly from the command line and run the "find" from there. The chances are that you'll still be able to observe occasional slowness.
In this case:
Maybe its about the disks (maybe this particular server is deployed on the slow disk or at least there is a correlation with some slowness of accessing the disk).
Maybe your have a sharded configuration and one shard is slower than others
Maybe its a network issue that occurs sporadically. If you run mongo locally/on staging env. with the same collection does this reproduce?
Maybe (Although I hardly believe that) the query runs in sub un-optimal way. In this case you can use "explain()" as someone has already suggested here.
If you happen to have replica set, please figure out what is the [Read Preference]. Who knows, maybe you prefer to get this id from the sub-optimal server
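On the explain() and timeout points: both are available on DBCursor in the legacy driver used in the question. A sketch (collection and key are placeholders; maxTime needs java.util.concurrent.TimeUnit):

// Inspect the query plan to confirm the _id index is being used.
DBCursor cursor = collection.find(new BasicDBObject("_id", key));
System.out.println(cursor.explain());

// Cap how long the server may spend on a query; the driver throws
// MongoExecutionTimeoutException when the limit is exceeded.
DBCursor bounded = collection.find(new BasicDBObject("_id", key))
        .maxTime(5, TimeUnit.SECONDS);
while (bounded.hasNext()) {
    DBObject doc = bounded.next();
    // process doc
}
bounded.close();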
I have the following rows with these keys in HBase table "mytable":
user_1
user_2
user_3
...
user_9999999
I want to use the HBase shell to delete rows from:
user_500 to user_900
I know there is no built-in way to delete a range like this, but is there a way I could use the "BulkDeleteProcessor" to do it?
I see here:
https://github.com/apache/hbase/blob/master/hbase-examples/src/test/java/org/apache/hadoop/hbase/coprocessor/example/TestBulkDeleteProtocol.java
I want to just paste in the imports and then paste this into the shell, but have no idea how to go about it. Does anyone know how I can use this endpoint from the JRuby HBase shell?
Table ht = TEST_UTIL.getConnection().getTable(TableName.valueOf("my_table"));
long noOfDeletedRows = 0L;
Batch.Call<BulkDeleteService, BulkDeleteResponse> callable =
        new Batch.Call<BulkDeleteService, BulkDeleteResponse>() {
    ServerRpcController controller = new ServerRpcController();
    BlockingRpcCallback<BulkDeleteResponse> rpcCallback =
            new BlockingRpcCallback<BulkDeleteResponse>();

    public BulkDeleteResponse call(BulkDeleteService service) throws IOException {
        Builder builder = BulkDeleteRequest.newBuilder();
        builder.setScan(ProtobufUtil.toScan(scan));
        builder.setDeleteType(deleteType);
        builder.setRowBatchSize(rowBatchSize);
        if (timeStamp != null) {
            builder.setTimestamp(timeStamp);
        }
        service.delete(controller, builder.build(), rpcCallback);
        return rpcCallback.get();
    }
};
Map<byte[], BulkDeleteResponse> result = ht.coprocessorService(
        BulkDeleteService.class, scan.getStartRow(), scan.getStopRow(), callable);
for (BulkDeleteResponse response : result.values()) {
    noOfDeletedRows += response.getRowsDeleted();
}
ht.close();
If there is no way to do this through JRuby, a Java or other way to quickly delete multiple rows is fine.
Do you really want to do it in the shell? There are various better ways. One way is using the native Java API:
Construct a list of Delete objects
Pass this list to the Table.delete method
Method 1: if you already know the range of keys.
public void massDelete(byte[] tableName) throws IOException {
    HTable table = (HTable) hbasePool.getTable(tableName);
    String tablePrefix = "user_";
    int startRange = 500;
    int endRange = 999;
    List<Delete> listOfBatchDelete = new ArrayList<Delete>();
    for (int i = startRange; i <= endRange; i++) {
        String key = tablePrefix + i;
        Delete d = new Delete(Bytes.toBytes(key));
        listOfBatchDelete.add(d);
    }
    try {
        table.delete(listOfBatchDelete);
    } finally {
        if (hbasePool != null && table != null) {
            hbasePool.putTable(table);
        }
    }
}
Method 2: If you want to do a batch delete on the basis of a scan result.
public void bulkDelete(final HTable table) throws IOException {
    Scan s = new Scan();
    List<Delete> listOfBatchDelete = new ArrayList<Delete>();
    // Add your filter to the scanner, e.g. s.setFilter(yourFilter);
    ResultScanner scanner = table.getScanner(s);
    for (Result rr : scanner) {
        Delete d = new Delete(rr.getRow());
        listOfBatchDelete.add(d);
    }
    try {
        table.delete(listOfBatchDelete);
    } catch (Exception e) {
        LOGGER.log(e);
    }
}
Now, coming to the use of a coprocessor: one piece of advice, DON'T USE a coprocessor unless you are an expert in HBase.
Coprocessors have many built-in issues; if you need, I can provide a detailed description.
Secondly, when you delete anything from HBase it is never deleted directly; a tombstone marker is attached to the record, and it is actually removed later, during a major compaction. So there is no need to use a coprocessor, which is highly resource-exhaustive.
Modified code to support batch operation.
int batchSize = 50;
int batchCounter = 0;
for (int i = startRange; i <= endRange; i++) {
    String key = tablePrefix + i;
    Delete d = new Delete(Bytes.toBytes(key));
    listOfBatchDelete.add(d);
    batchCounter++;
    if (batchCounter == batchSize) {
        table.delete(listOfBatchDelete);
        listOfBatchDelete.clear();
        batchCounter = 0;
    }
}
// Flush any remaining deletes that didn't fill a whole batch.
if (!listOfBatchDelete.isEmpty()) {
    table.delete(listOfBatchDelete);
}
Creating HBase conf and getting table instance.
Configuration hConf = HBaseConfiguration.create(conf);
hConf.set("hbase.zookeeper.quorum", "Zookeeper IP");
hConf.set("hbase.zookeeper.property.clientPort", ZookeeperPort);
HTable hTable = new HTable(hConf, tableName);
If you are already aware of the rowkeys of the records that you want to delete from the HBase table, then you can use the following approach.
1. First, create a List of Delete objects with these rowkeys:
for (int rowKey = 1; rowKey <= 10; rowKey++) {
    deleteList.add(new Delete(Bytes.toBytes(rowKey + "")));
}
2. Then get the Table object using an HBase Connection:
Table table = connection.getTable(TableName.valueOf(tableName));
3. Once you have the Table object, call delete() by passing it the list:
table.delete(deleteList);
The complete code will look like this:
Configuration config = HBaseConfiguration.create();
config.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
config.addResource(new Path("/etc/hadoop/conf/core-site.xml"));

String tableName = "users";
Connection connection = ConnectionFactory.createConnection(config);
Table table = connection.getTable(TableName.valueOf(tableName));

List<Delete> deleteList = new ArrayList<Delete>();
for (int rowKey = 500; rowKey <= 900; rowKey++) {
    deleteList.add(new Delete(Bytes.toBytes("user_" + rowKey)));
}
table.delete(deleteList);

// Release resources when done.
table.close();
connection.close();
I have 2 input dates, myStartDate and myEndDate, and a table TEST_TABLE with columns
TEST_ID, TEST_USER, TEST_START, TEST_END
I need to check whether the range of dates between myStartDate and myEndDate has corresponding records in TEST_TABLE.
I also need to ensure that I don't retrieve duplicate records.
Here's a sample of the logic I have so far:
Assuming,
myStartDate = 06/06/2012; myEndDate = 06/09/2012

int diff = myEndDate - myStartDate; // in this case, 3

String myQuery = "SELECT * FROM TEST_TABLE WHERE "
        + myStartDate + " BETWEEN TEST_START AND TEST_END OR "
        + (myStartDate + 1) + " BETWEEN TEST_START AND TEST_END OR "
        + (myStartDate + 2) + " BETWEEN TEST_START AND TEST_END OR "
        + (myStartDate + 3) + " BETWEEN TEST_START AND TEST_END";

List<TestTableData> myList = new ArrayList<TestTableData>();
// Execute query & save results into myList using the add method
I want to know if there's any way to test the range of dates between myStartDate and myEndDate using a for loop in Java code, instead of the approach used above in myQuery. Also, how can I avoid duplicates?
I'm new to Java, so any help would be appreciated!
Use a ResultSet to iterate over the output, like the code below.
while (res.next()) {
    String col1 = res.getString("col1");
    String col2 = res.getString("col2");
}
If you use a Set implementation, it does not allow duplicate elements, and hence there is no need to check for them.
But if you must use a List, then you could use the following code to remove any duplicate elements.
public static <T> void removeDuplicates(List<T> list) {
    Set<T> set = new HashSet<T>();
    List<T> newList = new ArrayList<T>();
    for (T element : list) {
        // add() returns false if the element was already in the set
        if (set.add(element)) {
            newList.add(element);
        }
    }
    list.clear();
    list.addAll(newList);
}
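If preserving encounter order is the goal, the same de-duplication is available as a one-liner, since a LinkedHashSet drops duplicates while keeping insertion order (TestTableData here stands in for the element type from the question):

List<TestTableData> deduped =
        new ArrayList<TestTableData>(new LinkedHashSet<TestTableData>(myList));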
I think what you are asking are some generic questions about how to read a database and how to handle dates in Java. I will give you some sample code below. But I suggest you look at the Java database tutorial http://docs.oracle.com/javase/tutorial/jdbc/index.html and the Date API docs http://docs.oracle.com/javase/1.5.0/docs/api/java/sql/Date.html for more info.
Here is some sample code that demonstrates how to implement what you asked:
// get the input dates
// they are hard coded in this example
// but would probably normally be passed in
String startDateStr = "2/3/03";
String endDateStr = "3/1/03";

// unfortunately, there are 2 "Date" classes in this code and you need to differentiate
// java.util.Date is the standard java class for manipulating dates
// java.sql.Date is used to handle dates in the database
// name conflicts like this are rare in Java
SimpleDateFormat dateFmt = new SimpleDateFormat("M/d/yy");
java.util.Date myStartDate = dateFmt.parse(startDateStr);
java.util.Date myEndDate = dateFmt.parse(endDateStr);

// connect to the database
// I am using mysql and its driver class is "com.mysql.jdbc.Driver"
// if using a different database, you would use its driver instead
// make sure the jar containing the driver is in your classpath (library list)
// you also have to know the url string that connects to your database
Class.forName("com.mysql.jdbc.Driver").newInstance(); // loads the driver
Connection dbConn = DriverManager.getConnection(
        "jdbc:mysql://localhost/testdb", "(db user)", "(db password)");

// get the database rows from the db table
// my table is named "testtable"
// my columns are named "DateStart" and "DateEnd"
Statement st = dbConn.createStatement();
String sqlStr = "Select * from testtable";
ResultSet rs = st.executeQuery(sqlStr);

// loop through the rows until you find a row with the right date range
boolean foundRange = false;
while (rs.next()) {
    java.util.Date dbStartDate = rs.getDate("DateStart");
    java.util.Date dbEndDate = rs.getDate("DateEnd");
    if (myStartDate.before(dbStartDate)) continue;
    if (myEndDate.after(dbEndDate)) continue;
    foundRange = true;
    break;
}

if (foundRange) {
    // code that executes when range is found in db
} else {
    // code that executes if range not found in db
}

dbConn.close();
Hope this helps you get started.
I use a RowSet to pass query results in my Selenium framework. Occasionally the data access object throws the following:
java.sql.SQLException: No suitable driver found for jdbc:jtds:sqlserver://MYDatabasename:1433/DB
It uses this same driver and RowSet for every access and only fails occasionally. Any help would be appreciated.
RowSet:
public static RowSet GetRowSet(String SqlQuery, String[] Parameters, String DB) {
    CachedRowSet rs = null;
    String ROWSET_IMPL_CLASS = "com.sun.rowset.CachedRowSetImpl";
    try {
        Class<?> c = Class.forName(ROWSET_IMPL_CLASS);
        rs = (CachedRowSet) c.newInstance();
        rs.setUrl(Configuration.DBConnString + DB);
        rs.setUsername(Configuration.DBUser);
        rs.setPassword(Configuration.DBPwd);
        rs.setReadOnly(true);
        rs.setCommand(SqlQuery);
        for (int p = 0; p < Parameters.length; p++) {
            rs.setString(p + 1, Parameters[p]);
        }
        rs.execute();
    } catch (Exception e) {
        e.printStackTrace(); // error handling elided in the original paste
    }
    return rs;
}
Example of code:
public void examplevoid(String string, String string2) throws Exception {
    RowSet RoS = Example.GetExample(string, string2);
    while (RoS.next()) {
        String Example = RoS.getString("Example");
        selenium.click(Example);
        selenium.waitForPageToLoad(setup.timeoutsetting);
    }
    RoS.close();
}
Which in turn calls the RowSet:
public static RowSet GetExample(String string, String string2) throws Exception {
    String[] Parameters = { string, string2 };
    RowSet ExampleRowSet = DataAccess.GetRowSet("Some SQL HERE", Parameters, Configuration.DB);
    return ExampleRowSet;
}
That seems impossible. Either the driver class is loaded, or it isn't. Once loaded, successive calls to DriverManager.getConnection() with the same JDBC URL should never give that error. What else is going on?
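One thing worth ruling out is driver registration: if nothing has loaded the jTDS driver class before the failing call, DriverManager cannot match the jdbc:jtds: URL. A sketch of registering it explicitly (net.sourceforge.jtds.jdbc.Driver is jTDS's standard driver class):

// Register the jTDS driver once, e.g. in a static initializer,
// before any code asks DriverManager for a jdbc:jtds: connection.
static {
    try {
        Class.forName("net.sourceforge.jtds.jdbc.Driver");
    } catch (ClassNotFoundException e) {
        throw new ExceptionInInitializerError(e);
    }
}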
Edit: The only questionable thing I see is that all of your Configuration.* properties appear to be fields in a class somewhere. If some of those properties change values between tests, maybe your JDBC driver is throwing that exception because of a bad property value, like Configuration.DB or Configuration.DBConnString. If it's fairly repeatable, try changing
rs.setUrl(Configuration.DBConnString + DB);
to
String url = Configuration.DBConnString + DB;
log.debug("Using JDBC URL: " + url);
rs.setUrl(url);
When the exception happens, see if the string looks different.