CQLSSTableWriter not exporting .csv fully to SSTable - Cassandra - java

I have a 32 GB CSV with almost 150 million rows. My plan is to use sstableloader to load the data into Cassandra on EC2, and to generate the SSTables I used the Java code below.
The problem is that on the server I only get 12k rows, the generated SSTable is just 28 MB, and the process does not throw any error.
Moreover, if I run it on another .csv with only 10 rows, there are no issues and I get all 10 rows.
if (args.length < 2) {
    System.out.println("Something wrong with parameters, heres pattern: <CSV_URL> <Default_Output_Dir>");
    return;
}
CSV_URL = args[0];
DEFAULT_OUTPUT_DIR = args[1];

// magic!
Config.setClientMode(true);

// Create output directory that has keyspace and table name in the path
File outputDir = new File(DEFAULT_OUTPUT_DIR + File.separator + KEYSPACE + File.separator + TABLE);
if (!outputDir.exists() && !outputDir.mkdirs()) {
    throw new RuntimeException("Cannot create output directory: " + outputDir);
}

// Prepare SSTable writer
CQLSSTableWriter.Builder builder = CQLSSTableWriter.builder();
// set output directory
builder.inDirectory(outputDir)
       // set target schema
       .forTable(SCHEMA)
       // set CQL statement to put data
       .using(INSERT_STMT)
       // set partitioner if needed
       // default is Murmur3Partitioner so set if you use different one.
       .withPartitioner(new Murmur3Partitioner());
CQLSSTableWriter writer = builder.build();

try (
    BufferedReader reader = new BufferedReader(new FileReader(CSV_URL));
    CsvListReader csvReader = new CsvListReader(reader, CsvPreference.STANDARD_PREFERENCE)
) {
    //csvReader.getHeader(true);

    // Write to SSTable while reading data
    List<String> line;
    while ((line = csvReader.read()) != null) {
        writer.addRow(
            Integer.parseInt(line.get(0)),
            ..
            new BigDecimal(line.get(22)),
            new BigDecimal(line.get(23))
        );
    }
} catch (Exception e) {
    e.printStackTrace();
}

try {
    writer.close();
} catch (IOException ignore) {}
And here's the schema:
CREATE KEYSPACE IF NOT EXISTS ma WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE ma;
CREATE TABLE IF NOT EXISTS cassie (PKWID int,DX varchar,......, QS decimal,PRIMARY KEY (PKWID));
I'm using Cassandra 2.2.x.
Java driver for creating SSTable

Related

Reading large file A, searching for records matching file B, and writing file C in Java

I have two files; assume both are already sorted.
This is just example data; in reality I'll have around 30-40 million records in each file, with file sizes of 7-10 GB, since the row length is large and fixed.
They are simple text files. Once a matching record is found, I'll do some update and write it to a file.
File A may contain 0 or more records matching an ID from File B.
The goal is to complete this processing in the least amount of time possible.
I am able to do it, but it's a time-consuming process...
Suggestions are welcome.
File A
1000000001,A
1000000002,B
1000000002,C
1000000002,D
1000000002,D
1000000003,E
1000000004,E
1000000004,E
1000000004,E
1000000004,E
1000000005,E
1000000006,A
1000000007,A
1000000008,B
1000000009,B
1000000010,C
1000000011,C
1000000012,C
File B
1000000002
1000000004
1000000006
1000000008
1000000010
1000000012
1000000014
1000000016
1000000018
// Not working as of now, because the logic is wrong.
private static void readAndWriteFile() {
    System.out.println("Read Write File Started.");
    long time = System.currentTimeMillis();
    try (
        BufferedReader in = new BufferedReader(new FileReader(Commons.ROOT_PATH + "input.txt"));
        BufferedReader search = new BufferedReader(new FileReader(Commons.ROOT_PATH + "search.txt"));
        FileWriter myWriter = new FileWriter(Commons.ROOT_PATH + "output.txt");
    ) {
        String inLine = in.readLine();
        String searchLine = search.readLine();
        boolean isLoopEnd = true;
        while (isLoopEnd) {
            if (searchLine == null || inLine == null) {
                isLoopEnd = false;
                break;
            }
            if (searchLine.substring(0, 10).equalsIgnoreCase(inLine.substring(0, 10))) {
                System.out.println("Record Found - " + inLine.substring(0, 10) + " | " + searchLine.substring(0, 10));
                myWriter.write(inLine + System.lineSeparator());
                inLine = in.readLine();
            } else {
                inLine = in.readLine();
            }
        }
        in.close();
        myWriter.close();
        search.close();
    } catch (Exception e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    System.out.println("Read and Write to File done in - " + (System.currentTimeMillis() - time));
}
My suggestion would be to use a database, as said in this answer. Using txt files has a big disadvantage compared to DBs, mostly because of the lack of indexes and the other points mentioned in that answer.
So what I would do is create a database (there are lots of good ones out there, such as MySQL, PostgreSQL, etc.), create the tables that are needed, and read the file afterwards. Insert each line of the file into the DB and use the DB to search and update the records; a rough sketch of this idea follows below.
Maybe this is not an answer to your concrete question of
"Motive is to complete this processing in the least amount of time possible."
but it is a worthy suggestion nonetheless. Good luck.
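For illustration only, a minimal sketch of that idea using an embedded H2 database. The H2 driver on the classpath, the table layout, the path constant, and the 10-character ID prefix are assumptions on my side, not something from the question:

// Hedged sketch: load file A into an embedded H2 table with an index on the ID,
// so lookups for the IDs from file B become indexed queries instead of file scans.
// Table name, column sizes and the path below are placeholders.
import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class LoadIntoDb {
    public static void main(String[] args) throws Exception {
        String rootPath = "/data/";  // placeholder for the question's Commons.ROOT_PATH
        try (Connection con = DriverManager.getConnection("jdbc:h2:" + rootPath + "recordsdb");
             BufferedReader in = new BufferedReader(new FileReader(rootPath + "input.txt"))) {
            con.setAutoCommit(false);
            try (Statement ddl = con.createStatement()) {
                ddl.execute("CREATE TABLE IF NOT EXISTS file_a (id VARCHAR(10), payload VARCHAR(64))");
                ddl.execute("CREATE INDEX IF NOT EXISTS idx_file_a_id ON file_a(id)");
            }
            try (PreparedStatement ps = con.prepareStatement("INSERT INTO file_a (id, payload) VALUES (?, ?)")) {
                String line;
                int n = 0;
                while ((line = in.readLine()) != null) {
                    ps.setString(1, line.substring(0, 10));   // first 10 chars are the ID (per the sample data)
                    ps.setString(2, line.substring(11));      // rest of the fixed-length row
                    ps.addBatch();
                    if (++n % 10_000 == 0) ps.executeBatch(); // flush in chunks
                }
                ps.executeBatch();
            }
            con.commit();
            // Matching against file B then becomes a simple indexed query per ID:
            // SELECT * FROM file_a WHERE id = ?
        }
    }
}

With an index on id, each lookup for an ID from file B is an indexed query instead of a file scan; for two files that are already sorted, though, the merge approach in the answer below avoids the load step entirely.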
With this approach I am able to process 50M records in 150 seconds on an i3 with 4 GB RAM and an SSD hard drive.
private static void readAndWriteFile() {
    System.out.println("Read Write File Started.");
    long time = System.currentTimeMillis();
    try (
        BufferedReader in = new BufferedReader(new FileReader(Commons.ROOT_PATH + "input.txt"));
        BufferedReader search = new BufferedReader(new FileReader(Commons.ROOT_PATH + "search.txt"));
        FileWriter myWriter = new FileWriter(Commons.ROOT_PATH + "output.txt");
    ) {
        String inLine = in.readLine();
        String searchLine = search.readLine();
        boolean isLoopEnd = true;
        while (isLoopEnd) {
            if (searchLine == null || inLine == null) {
                isLoopEnd = false;
                break;
            }
            // Since both files are already sorted, compare the numeric IDs
            // to decide which pointer to advance (sorted-merge idea).
            String searchLineSubString = searchLine.substring(0, 10);
            String inputLineSubString = inLine.substring(0, 10);
            long searchInt = Long.parseLong(searchLineSubString);
            long inInt = Long.parseLong(inputLineSubString);
            if (searchLineSubString.equalsIgnoreCase(inputLineSubString)) {
                System.out.println("Record Found - " + inputLineSubString + " | " + searchLineSubString);
                myWriter.write(inLine + System.lineSeparator());
            }
            // Which pointer to move..
            if (searchInt < inInt) {
                searchLine = search.readLine();
            } else {
                inLine = in.readLine();
            }
        }
        in.close();
        myWriter.close();
        search.close();
    } catch (Exception e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    System.out.println("Read and Write to File done in - " + (System.currentTimeMillis() - time));
}
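One further tweak worth trying (my own suggestion, not part of the original answer): wrap the FileWriter in a BufferedWriter so matched rows are flushed in larger chunks rather than one small write at a time. Only the third resource and the write call change:

// Hypothetical tweak: buffer the output writes; the merge loop stays the same.
try (
    BufferedReader in = new BufferedReader(new FileReader(Commons.ROOT_PATH + "input.txt"));
    BufferedReader search = new BufferedReader(new FileReader(Commons.ROOT_PATH + "search.txt"));
    BufferedWriter myWriter = new BufferedWriter(new FileWriter(Commons.ROOT_PATH + "output.txt"));
) {
    // ... same merge loop as above, but write matches with:
    myWriter.write(inLine);
    myWriter.newLine();   // instead of myWriter.write(inLine + System.lineSeparator());
}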

PrintWriter new CSV file after validating information from old CSV file and LOAD DATA INFILE into MySQL

I'm currently working with a CSV file that has lots of entries. I did some validation and inserted the validated entries into a new CSV file. Finally, I ran a LOAD DATA INFILE SQL query for the new CSV file, but I got 0 results in the database. The LOAD DATA INFILE query is able to load the old CSV file, though.
The code below calls the method
createNewUserCSV(validatedUsers, OUTPUT_PATH);
String csvDirectory = OUTPUT_PATH + File.separator + "newUser.csv";
String getCSVDirectory = csvDirectory.replace(File.separator, "/");
userDAO.insertUserByFile(getCSVDirectory);
This is my createNewUserCSV method. (This code seems to work fine, as I can see newUser.csv created in the folder I specified.)
private void createNewUserCSV(ArrayList<User> validatedUsers, String outputPath) {
    try {
        PrintWriter writer = new PrintWriter(new FileOutputStream(outputPath + File.separator + "newUser.csv", true));
        StringBuilder sb = new StringBuilder();
        sb.append("name");
        sb.append(',');
        sb.append("password");
        sb.append(',');
        sb.append("email");
        sb.append(',');
        sb.append("gender");
        sb.append('\n');
        for (User user : validatedUsers) {
            sb.append(user.getName());
            sb.append(',');
            sb.append(user.getPassword());
            sb.append(',');
            sb.append(user.getEmail());
            sb.append(',');
            sb.append(user.getGender());
            sb.append('\n');
        }
        writer.write(sb.toString());
        writer.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
However, when I try using LOAD DATA INFILE for newUser.csv it does not work; it works for oldUser.csv with the same query, though.
public void insertUserByFile(String csvFileDirectory) {
    //boolean success = true;
    try {
        conn = dbController.getConnection();
        //Load data infile
        query = "LOAD DATA LOCAL INFILE '" + csvFileDirectory + "' INTO TABLE user "
                + "FIELDS TERMINATED BY ',' "
                + "ENCLOSED BY '\"' "
                + "LINES TERMINATED BY '\r\n' "
                + "IGNORE 1 LINES";
        pstmt = conn.prepareStatement(query);
        pstmt.executeUpdate();
    } catch (Exception e) {
        e.printStackTrace();
        //success = false;
    } finally {
        if (dbController != null) {
            dbController.close(conn, pstmt, rs);
        }
    }
    //return success;
}
All testing was done locally
Output from MySQL upon uploading the newUser.csv
0 row(s) affected Records: 0 Deleted: 0 Skipped: 0 Warnings: 0
Output from MySQL upon uploading the oldUser.csv
21396 row(s) affected
Does my newUser.csv have an issue?
It does have the correct number of validated entries.
It seems that you are appending \n inside createNewUserCSV(), but you are calling LOAD DATA INFILE using \r\n. So replace:
LINES TERMINATED BY '\r\n'
with:
LINES TERMINATED BY '\n'
or append \r\n inside createNewUserCSV(), as in the small sketch below.
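A minimal sketch of that second option; only the line-break append inside createNewUserCSV() changes, everything else stays as posted:

// In the header row and in the loop, replace sb.append('\n') with:
sb.append("\r\n");   // Windows-style line break, matching LINES TERMINATED BY '\r\n'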
Also, regarding LOAD DATA INFILE, it may be that you are running into an error, possibly without actually seeing it on screen:
ERROR 1290 (HY000): The MySQL server is running with the
--secure-file-priv option so it cannot execute this statement
It means that LOAD DATA INFILE has no permission to read the CSV file (newUser.csv). In that case you must set secure-file-priv in the configuration file of your database (my.cnf or my.ini) yourself, like this:
[mysqld]
secure-file-priv = "<PATH-TO-FOLDER-CONTAINING-THE-CSV-FILES>/"
Also note that [by default] you can only read and write files stored in sacred locations, as specified by the variable @@GLOBAL.secure_file_priv
Ubuntu 16.04 (EASY): Find out where you are allowed to write
mysql> SELECT @@GLOBAL.secure_file_priv;
+---------------------------+
| @@GLOBAL.secure_file_priv |
+---------------------------+
| /var/lib/mysql-files/     |
+---------------------------+
1 row in set (0.00 sec)
Then, just write there
mysql> SELECT * FROM train INTO OUTFILE '/var/lib/mysql-files/test.csv' FIELDS TERMINATED BY ',';
Query OK, 992931 rows affected (1.65 sec)
mysql>

What is best practice to store 50000+ records in mysql in a single transaction

Input set: thousands (>10,000) of CSV files, each containing >50,000 entries.
Output: store that data in a MySQL DB.
Approach taken:
Read each file and store the data in the database. Below is the code snippet for this. Please suggest whether this approach is OK or not.
PreparedStatement pstmt2 = null;
try {
    pstmt1 = con.prepareStatement(sqlQuery);
    result = pstmt1.executeUpdate();
    con.setAutoCommit(false);
    sqlQuery = "insert into "
            + tableName
            + " (x,y,z,a,b,c) values(?,?,?,?,?,?)";
    pstmt2 = con.prepareStatement(sqlQuery);
    Path file = Paths.get(filename);
    lines = Files.lines(file, StandardCharsets.UTF_8);
    final int batchsz = 5000;
    for (String line : (Iterable<String>) lines::iterator) {
        pstmt2.setString(1, "somevalue");
        pstmt2.setString(2, "somevalue");
        pstmt2.setString(3, "somevalue");
        pstmt2.setString(4, "somevalue");
        pstmt2.setString(5, "somevalue");
        pstmt2.setString(6, "somevalue");
        pstmt2.addBatch();
        if (++linecnt % batchsz == 0) {
            pstmt2.executeBatch();
        }
    }
    int batchResult[] = pstmt2.executeBatch();
    pstmt2.close();
    con.commit();
} catch (BatchUpdateException e) {
    log.error(Utility.dumpExceptionMessage(e));
} catch (IOException ioe) {
    log.error(Utility.dumpExceptionMessage(ioe));
} catch (SQLException e) {
    log.error(Utility.dumpExceptionMessage(e));
} finally {
    lines.close();
    try {
        pstmt1.close();
        pstmt2.close();
    } catch (SQLException e) {
        Utility.dumpExceptionMessage(e);
    }
}
I've used LOAD DATA INFILE in situations like this in the past.
The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed. LOAD DATA INFILE is the complement of SELECT ... INTO OUTFILE. (See Section 14.2.9.1, “SELECT ... INTO Syntax”.) To write data from a table to a file, use SELECT ... INTO OUTFILE. To read the file back into a table, use LOAD DATA INFILE. The syntax of the FIELDS and LINES clauses is the same for both statements. Both clauses are optional, but FIELDS must precede LINES if both are specified.
The IGNORE number LINES option can be used to ignore lines at the start of the file. For example, you can use IGNORE 1 LINES to skip over an initial header line containing column names:
LOAD DATA INFILE '/tmp/test.txt' INTO TABLE test IGNORE 1 LINES;
http://dev.mysql.com/doc/refman/5.7/en/load-data.html
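If you would rather stay inside JDBC than shell out to the mysql client, the same statement can also be issued through a plain Statement. The following is only a sketch: the connection URL, credentials, file path and table name are placeholders, and loading a client-side file needs the Connector/J property allowLoadLocalInfile=true plus local_infile enabled on the server.

// Hedged sketch: run LOAD DATA LOCAL INFILE over JDBC.
// URL, credentials, file path and table name below are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CsvLoader {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/mydb?allowLoadLocalInfile=true";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             Statement st = con.createStatement()) {
            int rows = st.executeUpdate(
                    "LOAD DATA LOCAL INFILE '/tmp/test.txt' INTO TABLE test "
                  + "FIELDS TERMINATED BY ',' "
                  + "IGNORE 1 LINES");
            System.out.println(rows + " rows loaded");
        }
    }
}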
As @Ridrigo has already pointed out, LOAD DATA INFILE is the way to go; Java is not really needed at all.
If the format of your CSV is not something that can be inserted directly into the database, your Java code can re-enter the picture. Use it to reorganize/transform the CSV and save it as another CSV file instead of writing it into the database.
You can also use the Java code to iterate through the folder that contains the CSV files, and then execute the system command for the import:
Runtime r = Runtime.getRuntime();
Process p = r.exec("mysql -p password -u user database -e 'LOAD DATA INFILE ...'");
You will find that this is much, much faster than running individual SQL queries for each row of the CSV file.
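A rough sketch of the folder-iteration idea mentioned above (my own illustration, not the original answer's code): the directory, credentials and table name are placeholders, and ProcessBuilder is used instead of Runtime.exec so the arguments don't need shell quoting.

// Hedged sketch: run one LOAD DATA per CSV file in a folder via the mysql client.
// Directory, credentials and table name are placeholders.
import java.io.File;

public class BulkCsvImport {
    public static void main(String[] args) throws Exception {
        File[] csvFiles = new File("/data/csv").listFiles((dir, name) -> name.endsWith(".csv"));
        if (csvFiles == null) return;                        // folder missing or not readable
        for (File csv : csvFiles) {
            ProcessBuilder pb = new ProcessBuilder(
                    "mysql", "-u", "user", "-ppassword", "database",
                    "-e", "LOAD DATA LOCAL INFILE '" + csv.getAbsolutePath()
                        + "' INTO TABLE mytable FIELDS TERMINATED BY ','");
            pb.inheritIO();                                  // show mysql output/errors on the console
            int exit = pb.start().waitFor();                 // one import at a time
            if (exit != 0) {
                System.err.println("Import failed for " + csv.getName());
            }
        }
    }
}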

JSONArray throwing out of memory exception in android

In my app I sync some data to the app server at the end of the day. For this I wrap all my data in a JSONArray of JSONObjects. The data mainly consists of about 50 pictures, each approximately 50 KB (along with some text data), all encoded using Base64. Everything works fine when the pictures uploaded are few in number, but when I upload a large number of pictures, say around 50, I can see in the logs that all the data is properly formed into the JSONArray; however, when I try to display the JSONArray using the array.toString() method I get an out-of-memory exception. I believe this is due to the heap getting full (when I set android:largeHeap="true" in the manifest everything works fine, but I want to avoid that approach since it is not good practice). My intention is simply to write this JSONArray value into a file and then break that file into small chunks and send it across to the server.
Please guide me on the best approach for writing the JSONArray value to the file that won't lead to OOM issues. Thanks!
Following is the format of the JSONArray:
[{"pid":"000027058451111","popup_time":"2014-01-13 23:36:01","picture":"...base64encoded string......","punching_time":"Absent","status":"Absent"},{"pid":"000027058451111","popup_time":"2014-01-13 23:36:21","picture":"...base64encoded string......","punching_time":"Absent","status":"Absent"}]
Following are the main snippets of my code:
JSONObject aux;
JSONArray array = new JSONArray();
.
.
// Looping through each record in the cursor
for (int i = 0; i < count; i++) {
    aux = new JSONObject();
    try {
        aux.put("pid", c.getString(c.getColumnIndex("pid")));
        aux.put("status", c.getString(c.getColumnIndex("status")));
        aux.put("pop_time", c.getString(c.getColumnIndex("pop_time")));
        aux.put("punching_time", c.getString(c.getColumnIndex("punching_time")));
        aux.put("picture", c.getString(c.getColumnIndex("image_str"))); // stores base64encoded picture
    } catch (Exception e) {
        e.printStackTrace();
    }
    array.put(aux); // Inserting individual objects into the array, works perfectly fine, no error here
    c.moveToNext(); // Moving the cursor to the next record
}
Log.d("Log", "length of json array - " + array.length()); // shows me the total no of JSONObjects in the JSONArray, works fine, no error
// HAD GOT OOM HERE
//Log.d("Log", "JSONArray is - " + array.toString());
if (array.length() != 0) {
    try {
        String responseCode = writeToFile(array); // Writing the JSONArray value to file, which will then send file to server.
        if (responseCode.equals("200"))
            Log.d("Log", "Data sent successfully from app to app server");
        else
            Log.d("Log", "Data NOT sent successfully from app to app server");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
.
.
private String writeToFile(JSONArray data) {
    Log.d("Log", "Inside writeToFile");
    File externalStorageDir = new File(Environment.getExternalStorageDirectory().getPath(), "Pictures/File");
    if (!externalStorageDir.exists()) {
        externalStorageDir.mkdirs();
    }
    String responseCode = "";
    File dataFile = new File(externalStorageDir, "File");
    /* FileWriter writer;
    String responseCode = "";
    try {
        writer = new FileWriter(dataFile);
        writer.append(data);
        writer.flush();
        writer.close();
        responseCode = sendFileToServer(dataFile.getPath(), AppConstants.url_app_server); // Sends the file to server, worked fine for few pictures
    } catch (IOException e) {
        e.printStackTrace();
    }*/
    try {
        FileWriter file = new FileWriter("storage/sdcard0/Pictures/File/File");
        file.write(data.toString()); // GOT OOM here.
        file.flush();
        file.close();
        Log.d("Log", "data written from JSONArray to file");
        responseCode = sendFileToServer(dataFile.getPath(), AppConstants.url_app_server); // Sends the file to server, worked fine for few pictures
    } catch (IOException e) {
        e.printStackTrace();
    }
    return responseCode;
}
public String sendFileToServer(String filename, String targetUrl) {
    .
    .
    // Sends the file to server, worked fine for few pictures
    .
    .
    return response;
}
Here's the issue. You're trying to load your entire dataset into memory. And you're running out of memory.
Android's JSON classes (and some other JSON libraries) are designed to take a Java object (in memory), serialize it to a parse tree of objects (e.g. JSONObject, JSONArray) (in memory), then convert that tree to a String (in memory) and write it out somewhere.
Specifically in your case (at the moment) it appears that when it converts the parse tree into a String it runs out of memory; that String effectively doubles the amount of memory required at that point.
To solve your issue you have a few different choices, I'll offer 3:
Don't use JSON at all. Refactor to simply send files and information to your server.
Refactor things so that you only read X images into memory at a time and have multiple output files. Where X is some number of images. Note this is still problematic if your image sizes vary greatly / aren't predictable.
Switch to using Jackson as a JSON library. It supports streaming operations where you can stream the JSON to the output file as you create each object in the array.
Edit to add: for your code, it would look something like this using Jackson:
// Before you get here, have created your `File` object
JsonFactory jsonfactory = new JsonFactory();
JsonGenerator jsonGenerator =
    jsonfactory.createJsonGenerator(file, JsonEncoding.UTF8);

jsonGenerator.writeStartArray();

// Note: I don't know what `c` is, but if it's a cursor of some sort it
// should have a "hasNext()" or similar you should be using instead of
// this for loop
for (int i = 0; i < count; i++) {
    jsonGenerator.writeStartObject();
    jsonGenerator.writeStringField("pid", c.getString(c.getColumnIndex("pid")));
    jsonGenerator.writeStringField("status", c.getString(c.getColumnIndex("status")));
    jsonGenerator.writeStringField("pop_time", c.getString(c.getColumnIndex("pop_time")));
    jsonGenerator.writeStringField("punching_time", c.getString(c.getColumnIndex("punching_time")));
    // stores base64encoded picture
    jsonGenerator.writeStringField("picture", c.getString(c.getColumnIndex("image_str")));
    jsonGenerator.writeEndObject();
    c.moveToNext(); // Moving the cursor to the next record
}
jsonGenerator.writeEndArray();
jsonGenerator.close();
The above is untested, but it should work (or at least get you going in the right direction).
First and foremost, thanks a billion to Brian Roach for assisting me. His inputs helped me solve the problem. I am sharing my answer.
What was I trying to solve? In my project I had some user data (name, age, picture_time) and corresponding pictures for each record. At the end of the day I needed to sync all this data to the app server. However, when I tried to sync a lot of pictures (say 50 of approx. 50 KB each) I faced an OOM (Out of Memory) issue. Initially I was trying to upload all the data using a conventional JSONArray approach, but I soon found that I was hitting OOM. I attribute this to the heap getting full when I was trying to access the JSONArray (which had loads of values, and why not? After all, I was encoding the pictures with Base64, which, trust me, produces a huge amount of string data!).
Brian's inputs suggested that I write all my data into a file piece by piece. So, after the whole process is complete, I get one single file that has all the data (name, age, picture_time, Base64-encoded pictures, etc.) in it, and then I stream this file to the server.
Following is the code snippet which takes the user data from the app database and the corresponding pictures from the SD card, loops through all the records, creates a JSONArray of JSONObjects using the Jackson JSON library (which you need to include in your libs folder, should you use this code), and stores them in a file. This file is then streamed to the server (that snippet is not included). Hope this helps someone!
// Sync the values in DB to the server
Log.d("SyncData", "Opening db to read files");
SQLiteDatabase db = context.openOrCreateDatabase("data_monitor", Context.MODE_PRIVATE, null);
db.execSQL("CREATE TABLE IF NOT EXISTS user_data(device_id VARCHAR,name VARCHAR,age VARCHAR,picture_time VARCHAR);");
Cursor c = db.rawQuery("SELECT * FROM user_data", null);
int count = c.getCount();
if (count > 0) {
    File file = new File(Environment.getExternalStorageDirectory().getPath(), "Pictures/UserFile/UserFile");
    JsonFactory jsonfactory = new JsonFactory();
    JsonGenerator jsonGenerator = null;
    try {
        jsonGenerator = jsonfactory.createJsonGenerator(file, JsonEncoding.UTF8);
        jsonGenerator.writeStartObject();
        jsonGenerator.writeArrayFieldStart("user_data"); // Name for the JSONArray
    } catch (IOException e3) {
        e3.printStackTrace();
    }
    c.moveToFirst();
    // Looping through each record in the cursor
    for (int i = 0; i < count; i++) {
        try {
            jsonGenerator.writeStartObject(); // Start of inner object '{'
            jsonGenerator.writeStringField("device_id", c.getString(c.getColumnIndex("device_id")));
            jsonGenerator.writeStringField("name", c.getString(c.getColumnIndex("name")));
            jsonGenerator.writeStringField("age", c.getString(c.getColumnIndex("age")));
            jsonGenerator.writeStringField("picture_time", c.getString(c.getColumnIndex("picture_time")));
        } catch (Exception e) {
            e.printStackTrace();
        }
        // creating a fourth column for the input of corresponding image from the sd card
        Log.d("SyncData", "Name of image - " + c.getString(c.getColumnIndex("picture_time")));
        image = c.getString(c.getColumnIndex("picture_time")).replaceAll("[^\\d]", ""); // Removing everything except digits
        Log.d("SyncData", "imagename - " + image);
        File f = new File(Environment.getExternalStorageDirectory().getPath(), "Pictures/UserPic/" + image + ".jpg");
        Log.d("SyncData", "------------size of " + image + ".jpg" + "= " + f.length());
        String image_str;
        if (!f.exists() || f.length() == 0) {
            Log.d("SyncData", "Image has either size of 0 or does not exist");
            try {
                jsonGenerator.writeStringField("picture", "Error Loading Image");
            } catch (Exception e) {
                e.printStackTrace();
            }
        } else {
            try {
                // Reusing bitmaps to avoid Out Of Memory
                Log.d("SyncData", "Image exists, encoding underway...");
                if (bitmap_reuse == 0) { // ps: bitmap_reuse was initialized to 0 at the start of the code, not included in this snippet
                    // Create bitmap to be re-used, based on the size of one of the bitmaps
                    mBitmapOptions = new BitmapFactory.Options();
                    mBitmapOptions.inJustDecodeBounds = true;
                    BitmapFactory.decodeFile(f.getPath(), mBitmapOptions);
                    mCurrentBitmap = Bitmap.createBitmap(mBitmapOptions.outWidth, mBitmapOptions.outHeight, Bitmap.Config.ARGB_8888);
                    mBitmapOptions.inJustDecodeBounds = false;
                    mBitmapOptions.inBitmap = mCurrentBitmap;
                    mBitmapOptions.inSampleSize = 1;
                    BitmapFactory.decodeFile(f.getPath(), mBitmapOptions);
                    bitmap_reuse = 1;
                }
                BitmapFactory.Options bitmapOptions = null;
                // Re-use the bitmap by using BitmapOptions.inBitmap
                bitmapOptions = mBitmapOptions;
                bitmapOptions.inBitmap = mCurrentBitmap;
                mCurrentBitmap = BitmapFactory.decodeFile(f.getPath(), mBitmapOptions);
                if (mCurrentBitmap != null) {
                    ByteArrayOutputStream stream = new ByteArrayOutputStream();
                    try {
                        mCurrentBitmap.compress(Bitmap.CompressFormat.JPEG, 35, stream);
                        Log.d("SyncData", "------------size of " + "bitmap_compress" + "= " + mCurrentBitmap.getByteCount());
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                    byte[] byte_arr = stream.toByteArray();
                    Log.d("SyncData", "------------size of " + "image_str" + "= " + byte_arr.length);
                    stream.close();
                    stream = null;
                    image_str = Base64.encodeToString(byte_arr, Base64.DEFAULT);
                    jsonGenerator.writeStringField("picture", image_str);
                }
            } catch (Exception e1) {
                e1.printStackTrace();
            }
        }
        try {
            jsonGenerator.writeEndObject(); // End of inner object '}'
        } catch (JsonGenerationException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        c.moveToNext(); // Moving the cursor to the next record
    }
    try {
        jsonGenerator.writeEndArray(); // close the array ']'
        //jsonGenerator.writeStringField("file_size", "0"); // If need be, place another object here.
        jsonGenerator.writeEndObject();
        jsonGenerator.flush();
        jsonGenerator.close();
    } catch (JsonGenerationException e1) {
        e1.printStackTrace();
    } catch (IOException e1) {
        e1.printStackTrace();
    }
    c.close();
    db.close();
}

Insert into Star Schema efficiently using JDBC

I have a star schema model in which the Server table contains information about server names, the Information table describes the information that I want for a specific server, and the Actual Data table records which server contains which information.
Server Table
Information Table
Actual Data Table
Now the problem I am having is this: I am trying to insert data into the Actual Data table using JDBC, but I am unsure how I should do it in a star schema model. Should I connect to the database and insert a row every time for each piece of information, or is there a way to do it by communicating with the database only once? This is my code where I gather all the information for each server; IndexData is the class through which I insert values into the Oracle database.
public void fetchlog() {
    InputStream is = null;
    InputStream isUrl = null;
    FileOutputStream fos = null;
    try {
        is = HttpUtil.getFile(monitorUrl);
        if (monitorUrl.contains("stats.jsp") || monitorUrl.contains("monitor.jsp")) {
            trimUrl = monitorUrl.replaceAll("(?<=/)(stats|monitor).jsp$", "ping");
        }
        isUrl = HttpUtil.getFile(trimUrl);
        BufferedReader in = new BufferedReader(new InputStreamReader(is));
        String line;
        int i = 0, j = 0, k = 0;
        while ((line = in.readLine()) != null) {
            if (line.contains("numDocs")) {
                docs = in.readLine().trim();
                // So should I keep on inserting into the database for each piece of information, like this?
                //IndexData id = new IndexData(timeStamp, ServerName, InformationName, docs);
            } else if (line.contains("indexSize")) {
                indexSize = in.readLine().trim();
                // For this information -- the same way?
                //IndexData id = new IndexData(timeStamp, ServerName, InformationName, indexSize);
            } else if (line.contains("cumulative_lookups")) {
                cacheHits = in.readLine().trim();
                // For this information too -- the same way?
                //IndexData id = new IndexData(timeStamp, ServerName, InformationName, cacheHits);
            } else if (line.contains("lastCommitTime")) {
                lastCommitTime = in.readLine().trim();
                // For this information too -- the same way?
                //IndexData id = new IndexData(timeStamp, ServerName, InformationName, lastCommitTime);
            }
            BufferedReader inUrl = new BufferedReader(new InputStreamReader(isUrl));
            String lineUrl;
            Pattern regex = Pattern.compile("<str name=\"status\">(.*?)</str>");
            while ((lineUrl = inUrl.readLine()) != null) {
                System.out.println(lineUrl);
                if (lineUrl.contains("str name=\"status\"")) {
                    Matcher regexMatcher = regex.matcher(lineUrl);
                    if (regexMatcher.find()) {
                        upDown = regexMatcher.group(1);
                        // For this information too -- the same way?
                        //IndexData id = new IndexData(timeStamp, ServerName, InformationName, upDown);
                    }
                    System.out.println("Status:- " + status);
                }
            }
        }
        // Or is there some other way to insert directly into the database, communicating
        // with it only one time rather than once for each piece of information?
        //IndexData id = new IndexData(timeStamp, ServerName, InformationName, Value);
        fos = new FileOutputStream(buildTargetPath());
        IOUtils.copy(is, fos);
    } catch (FileNotFoundException e) {
        log.error("File Exception in fetching monitor logs :" + e);
    } catch (IOException e) {
        log.error("Exception in fetching monitor logs :" + e);
    }
}
I hope the question is clear to everyone. Any suggestions will be appreciated.
There are two things I would suggest you look at. First, use a batch insert to perform all of the associated inserts in one JDBC transaction. For more information:
JDBC Batch Insert Example
I would also strongly recommend that you use a JDBC connection pooling library. We use c3p0 with our Postgres database. You can find more information here:
c3p0 Project Page
The basic idea is to create a connection pool at startup time, then create JDBC batches for each set of related inserts.
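For illustration, a minimal sketch of that combination. The pooled DataSource, the actual_data table and its columns, and the Metric holder class are assumptions of mine, not taken from the question:

// Hedged sketch: one pooled connection, one transaction, one batched INSERT
// for a whole set of related fact rows. Table/column names and the Metric
// holder are hypothetical placeholders.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.List;
import javax.sql.DataSource;

public class ActualDataDao {

    private final DataSource dataSource;   // e.g. a c3p0 ComboPooledDataSource created at startup

    public ActualDataDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void insertBatch(List<Metric> metrics) throws SQLException {
        String sql = "INSERT INTO actual_data (server_id, information_id, value, ts) VALUES (?, ?, ?, ?)";
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(sql)) {
            con.setAutoCommit(false);
            for (Metric m : metrics) {
                ps.setLong(1, m.serverId);
                ps.setLong(2, m.informationId);
                ps.setString(3, m.value);
                ps.setTimestamp(4, m.timestamp);
                ps.addBatch();
            }
            ps.executeBatch();   // one round trip for the whole set
            con.commit();        // single transaction instead of one per row
        }
    }

    /** Hypothetical value holder for one reading gathered in fetchlog(). */
    public static class Metric {
        long serverId;
        long informationId;
        String value;
        Timestamp timestamp;
    }
}

In fetchlog() you would then just collect one Metric per parsed value and call insertBatch(...) once at the end, so the database is contacted a single time per run instead of once per piece of information.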
