How to insert data as fast as possible with Hibernate - java

I read a file, create an object from it, and store it in a PostgreSQL database. The file contains 100,000 documents that I read from a single file, split, and finally store in the database.
I can't build a List<> holding every document, because I don't have much RAM. My code to read and write to the database is below, but my JVM heap fills up and I cannot continue to store more documents. How can I read the file and store to the database efficiently?
public void readFile() {
    StringBuilder wholeDocument = new StringBuilder();
    try {
        bufferedReader = new BufferedReader(new FileReader(files));
        String line;
        int count = 0;
        while ((line = bufferedReader.readLine()) != null) {
            if (line.contains("<page>")) {
                wholeDocument.append(line);
                while ((line = bufferedReader.readLine()) != null) {
                    wholeDocument = wholeDocument.append("\n" + line);
                    if (line.contains("</page>")) {
                        System.out.println(count++);
                        addBodyToDatabase(wholeDocument.toString());
                        wholeDocument.setLength(0);
                        break;
                    }
                }
            }
        }
        wikiParser.commit();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            bufferedReader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

public void addBodyToDatabase(String wholeContent) {
    Page page = new Page(new Timestamp(System.currentTimeMillis()), wholeContent);
    database.addPageToDatabase(page);
}

public static int counter = 1;

public void addPageToDatabase(Page page) {
    session.save(page);
    if (counter % 3000 == 0) {
        commit();
    }
    counter++;
}

First of all, you should apply a fork-join approach here.
The main task parses the file and sends batches of at most 100 items to an ExecutorService. The ExecutorService should have a number of worker threads equal to the number of available database connections. If you have 4 CPU cores, let's say the database can take 8 concurrent connections without too much context switching.
You should then configure a connection pooling DataSource with minSize equal to maxSize, both set to 8. Try HikariCP or ViburDBCP for connection pooling.
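For instance, a minimal HikariCP setup along those lines might look like this (the JDBC URL and credentials are placeholders):
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://localhost:5432/wiki"); // placeholder URL
config.setUsername("user");                                 // placeholder credentials
config.setPassword("password");
config.setMinimumIdle(8);     // minSize equal to maxSize, ...
config.setMaximumPoolSize(8); // ... so the pool never resizes
HikariDataSource dataSource = new HikariDataSource(config);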
Then you need to configure JDBC batching. If you're using MySQL, the IDENTITY generator will disable batching. If you're using a database that supports sequences, make sure you also use the enhanced identifier generators (they are the default option in Hibernate 5.x).
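For example, in your Hibernate configuration (a batch size of 50 is just a starting point to tune):
hibernate.jdbc.batch_size=50
hibernate.order_inserts=true
hibernate.order_updates=true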
This way the entity insert process is parallelized and decoupled from the main parsing thread. The main thread should wait for the ExecutorService to finish processing all tasks before shutting it down.
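A rough sketch of that orchestration, where parseFileIntoBatches and persistBatch are illustrative names (persistBatch would open its own session, insert one batch, and commit):
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

ExecutorService executor = Executors.newFixedThreadPool(8); // one worker per pooled connection
for (List<String> batch : parseFileIntoBatches(100)) {      // at most 100 documents per batch
    executor.submit(() -> persistBatch(batch));             // each task uses its own session/transaction
}
executor.shutdown();                          // stop accepting new tasks
executor.awaitTermination(1, TimeUnit.HOURS); // wait for all inserts (handle InterruptedException in real code)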

Actually it is hard to make suggestions without doing real profiling to find out what makes your code slow or inefficient.
However, there are several things we can see from your code:
You are using StringBuilder inefficiently.
wholeDocument.append("\n" + line); should be written as wholeDocument.append("\n").append(line); instead,
because what you originally wrote is translated by the compiler to wholeDocument.append(new StringBuilder("\n").append(line).toString()). You can see how many unnecessary StringBuilders you create :)
Considerations in using Hibernate
I am not sure how you manage your session or how you implemented your commit(); I assume you have done it right, but there are still more things to consider:
Have you properly set up the batch size in Hibernate (hibernate.jdbc.batch_size)? By default, the JDBC batch size is something around 5. You may want to set it to a bigger size, so that internally Hibernate will send the inserts in bigger batches.
Given that you do not need the entities in the first-level cache for later use, you may want to do an intermittent session flush() + clear() (see the sketch below) to:
- trigger the batch inserts mentioned in the previous point
- clear out the first-level cache
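A sketch of that intermittent flush/clear inside addPageToDatabase (assuming batchSize is kept in sync with hibernate.jdbc.batch_size):
int batchSize = 50; // keep in sync with hibernate.jdbc.batch_size
session.save(page);
if (counter % batchSize == 0) {
    session.flush(); // sends the queued inserts to the database as JDBC batches
    session.clear(); // evicts the managed entities from the first-level cache
}
counter++;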
Switch away from Hibernate for this feature.
Hibernate is cool, but it is not a panacea for everything. Given that in this feature you are just saving records into the DB based on text file content, you neither need any entity behavior nor need the first-level cache for later processing, so there is not much reason to use Hibernate here given the extra processing and space overhead. Simply doing JDBC with manual batch handling is going to save you a lot of trouble.
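A minimal sketch of that JDBC route, assuming a pages table with created_at and content columns (table, column, and variable names are illustrative):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

try (Connection con = dataSource.getConnection();
     PreparedStatement ps = con.prepareStatement(
             "INSERT INTO pages (created_at, content) VALUES (?, ?)")) {
    con.setAutoCommit(false);
    int count = 0;
    for (String document : documents) { // however you iterate the parsed <page> blocks
        ps.setTimestamp(1, new Timestamp(System.currentTimeMillis()));
        ps.setString(2, document);
        ps.addBatch();
        if (++count % 1000 == 0) {
            ps.executeBatch(); // one round trip per 1000 rows
        }
    }
    ps.executeBatch(); // flush the remainder
    con.commit();
} catch (SQLException e) {
    e.printStackTrace();
}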

I used @RookieGuy's answer.
stackoverflow.com/questions/14581865/hibernate-commit-and-flush
I use
session.flush();
session.clear();
and finally, after reading all the documents and storing them in the database,
tx.commit();
session.close();
and changed
wholeDocument = wholeDocument.append("\n" + line);
to
wholeDocument.append("\n" + line);

I'm not very sure about the structure of your data file. It would be easier to understand if you could provide a sample of your file.
The root cause of the memory consumption is the way you read/iterate the file. Once something is read, it stays in memory. You should rather use either java.io.FileInputStream or org.apache.commons.io.FileUtils.
Here is a sample code to iterate with java.io.FileInputStream
try (FileInputStream inputStream = new FileInputStream("/tmp/sample.txt");
     Scanner sc = new Scanner(inputStream, "UTF-8")) {
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        addBodyToDatabase(line);
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
Here is a sample code to iterate with org.apache.commons.io.FileUtils
File file = new File("/tmp/sample.txt");
LineIterator it = FileUtils.lineIterator(file, "UTF-8");
try {
while (it.hasNext()) {
String line = it.nextLine();
addBodyToDatabase(line);
}
} finally {
LineIterator.closeQuietly(it);
}

You should begin a transaction, do the save operations, and then commit the transaction. (Don't begin a transaction after save!) You can also try using a StatelessSession to exclude the memory consumed by the cache.
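A sketch of the StatelessSession variant, which bypasses the first-level cache entirely (sessionFactory and the documents iteration are assumed from the surrounding code):
StatelessSession statelessSession = sessionFactory.openStatelessSession();
Transaction tx = statelessSession.beginTransaction();
for (String wholeContent : documents) { // illustrative iteration over the parsed pages
    statelessSession.insert(new Page(new Timestamp(System.currentTimeMillis()), wholeContent));
}
tx.commit();
statelessSession.close();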
And use a smaller value, for example 20, in this code:
if (counter % 20 == 0)
You can also try to pass the StringBuilder as a method argument as far as possible.

Related

Editing a file using async threads in Java

I'm a junior Java developer currently working on a Discord bot that I made in Java. One of the features of my bot is a leveling system for whenever anyone sends a message (there are other conditions, but they are irrelevant to the problem I'm encountering).
Whenever someone sends a message, an event is fired and a thread is created to compute how much exp the user should gain, and eventually the function to edit the storage file is called.
This works fine when called sparsely, but if two threads try to write to the file at once, the file usually gets deleted or truncated, either of which is undesired behavior.
I then tried to make a queuing system; it worked for over 24h but still failed once, so it is more stable in a way. I only know the basics of how threads work, so I may have skipped over an important thing that causes the problem.
the function looks like this
Thread editingThread = null;
public boolean editThreadStarted = false;
HashMap<String, String> queue = new HashMap<>();

public final boolean editParameter(String key, String value) {
    queue.put(key, value);
    if (!editThreadStarted) {
        editingThread = new Thread(new Runnable() {
            @Override
            public void run() {
                while (queue.keySet().size() > 0) {
                    String key = (String) queue.keySet().toArray()[0];
                    String value = queue.get(key);
                    File inputFile = getFile();
                    File tempFile = new File(getFile().getName() + ".temp");
                    try {
                        tempFile.createNewFile();
                    } catch (IOException e) {
                        DemiConsole.error("Failed to create temp file");
                        handleTrace(e);
                        continue;
                    }
                    //System.out.println("tempFile.isFile = " + tempFile.isFile());
                    try (BufferedReader reader = new BufferedReader(new FileReader(inputFile));
                         BufferedWriter writer = new BufferedWriter(new FileWriter(tempFile))) {
                        String currentLine;
                        while ((currentLine = reader.readLine()) != null) {
                            String trimmedLine = currentLine.trim();
                            if (trimmedLine.startsWith(key)) {
                                writer.write(key + ":" + value + System.getProperty("line.separator"));
                                continue;
                            }
                            writer.write(currentLine + System.getProperty("line.separator"));
                        }
                        writer.close();
                        reader.close();
                        inputFile.delete();
                        tempFile.renameTo(inputFile);
                    } catch (IOException e) {
                        DemiConsole.error("Caught an IO exception while attempting to edit parameter (" + key + ") in file (" + getFile().getName() + "), returning false");
                        handleTrace(e);
                        continue;
                    }
                    queue.remove(key);
                }
                editThreadStarted = false;
            }
        });
        editThreadStarted = true;
        editingThread.start();
    }
    return true;
}
getFile() returns the file the function is meant to write to
the file format is
memberid1:expamount
memberid2:expamount
memberid3:expamount
memberid4:expamount
The way the editing works is by creating a temporary file to which I write all of the original file's data line by line, checking whether the memberid matches the one I want to edit. If it does, then instead of writing the original file's line, I write the new edited line with the new expamount, before continuing with the rest of the lines. Once that is done, the original file is deleted and the temporary file is renamed to the original file, replacing it.
This function will always be called asynchronously, so making the whole thing synchronous is not an option.
Thanks in advance
Edit(1) :
I've been suggested to use semaphores, and after digging into it a little (I had never heard of semaphores before) it seems to be a really good option and would remove the need for a queue: simply acquire at the beginning and release at the end, nothing more required!
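For reference, a minimal sketch of that acquire/release pattern around the existing editParameter logic:
import java.util.concurrent.Semaphore;

private final Semaphore fileLock = new Semaphore(1); // only one thread may edit the file at a time

public boolean editParameter(String key, String value) {
    try {
        fileLock.acquire(); // block until no other thread is editing
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
    }
    try {
        // ... the existing read/rewrite/rename logic goes here ...
        return true;
    } finally {
        fileLock.release(); // always release, even if the edit fails
    }
}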
I ended up using semaphores as per user207421's suggestion and it seems to work perfectly.
To test it, I simply put delays between each line write to artificially make the task longer and make it easier to have multiple threads trying to write at once, and they all wait for their turns!
Thanks

Is it OK to use a parallel stream with FileWriter?

I want to write a Stream to a file. However, the Stream is big (a few GB when written to a file), so I want to process it in parallel. At the end of the process, I would like to write it to a file (I am using FileWriter).
I would like to ask whether this can potentially cause any problems in the file.
Here is some code
function to write stream to file
public static void writeStreamToFile(Stream<String> ss, String fileURI) {
    try (FileWriter wr = new FileWriter(fileURI)) {
        ss.forEach(line -> {
            try {
                if (line != null) {
                    wr.write(line + "\n");
                }
            } catch (Exception ex) {
                System.err.println("error when write file");
            }
        });
    } catch (IOException ex) {
        Logger.getLogger(OaStreamer.class.getName()).log(Level.SEVERE, null, ex);
    }
}
how I use my stream
Stream<String> ss = Files.lines(path).parallel()
    .map(x -> dosomething(x))
    .map(x -> dosomethingagain(x));
writeStreamToFile(ss, "path/to/output.csv");
As others have mentioned, this approach should work; however, you should question whether it is the best method. Writing to a file is a shared operation between threads, meaning you are introducing thread contention.
While it is easy to think that having multiple threads will speed up performance, in the case of I/O operations the opposite is true. Remember, I/O operations are finitely bounded, so more threads will not increase performance. In fact, this I/O contention will slow down access to the shared resource because of the constant locking/unlocking of the ability to write to the resource.
The bottom line is that only one thread can write to a file at a time, so parallelizing write operations is counterproductive.
Consider using multiple threads to handle your CPU-intensive tasks, and then having all threads post to a queue/buffer. A single thread can then pull from the queue and write to your file. This solution (and more detail) was suggested in this answer.
Check out this article for more info on thread contention and locks.
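A sketch of that queue/single-writer pattern, reusing the dosomething/dosomethingagain steps from the question (the sentinel value and queue capacity are illustrative):
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

static void writeInParallel(Path path, String fileURI) throws IOException, InterruptedException {
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);
    String poison = "\u0000EOF\u0000"; // sentinel no real line will equal

    Thread writer = new Thread(() -> {
        try (BufferedWriter wr = new BufferedWriter(new FileWriter(fileURI))) {
            String line;
            while (!(line = queue.take()).equals(poison)) { // blocks until a line arrives
                wr.write(line);
                wr.newLine();
            }
        } catch (IOException | InterruptedException e) {
            e.printStackTrace();
        }
    });
    writer.start();

    Files.lines(path).parallel()     // the CPU-heavy work stays parallel
         .map(x -> dosomething(x))
         .map(x -> dosomethingagain(x))
         .forEach(line -> {
             try {
                 queue.put(line);    // writes are serialized through the queue
             } catch (InterruptedException e) {
                 Thread.currentThread().interrupt();
             }
         });

    queue.put(poison); // tell the writer thread it can finish
    writer.join();
}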
Yes, it is OK to use FileWriter as you are using it; here are some other approaches which may be helpful to you.
As you are dealing with large files, FileChannel can be faster than standard IO. The following code writes a String to a file using FileChannel:
@Test
public void givenWritingToFile_whenUsingFileChannel_thenCorrect()
        throws IOException {
    RandomAccessFile stream = new RandomAccessFile(fileName, "rw");
    FileChannel channel = stream.getChannel();
    String value = "Hello";
    byte[] strBytes = value.getBytes();
    ByteBuffer buffer = ByteBuffer.allocate(strBytes.length);
    buffer.put(strBytes);
    buffer.flip();
    channel.write(buffer);
    stream.close();
    channel.close();
    // verify
    RandomAccessFile reader = new RandomAccessFile(fileName, "r");
    assertEquals(value, reader.readLine());
    reader.close();
}
Reference : https://www.baeldung.com/java-write-to-file
You can use Files.write with stream operations as below, which converts the Stream to an Iterable:
Files.write(Paths.get(filepath), (Iterable<String>)yourstream::iterator);
For example:
Files.write(Paths.get("/dir1/dir2/file.txt"),
        (Iterable<String>) IntStream.range(0, 1000).mapToObj(String::valueOf)::iterator);
If you have a stream of custom objects, you can always add a .map(Object::toString) step to apply the toString() method.
It is not a problem as long as it is okay for the file to have the lines in random order. You are processing content in parallel, not in sequence, therefore you have no guarantee at which point any line comes in for processing.
That is the only thing to keep in mind here.

Accessing Bitcoin BlockChain Transactions with Bitcoinj

I'm trying to access the transactions contained in the blocks I have downloaded but none of the blocks have any transactions; the size of every Transaction list returned is zero. Am I conceptually misunderstanding something about the bitcoin blockchain or is there something wrong with my code?
static NetworkParameters params = MainNetParams.get();
static WalletAppKit kit = new WalletAppKit(params, new java.io.File("."), "chain");

/* store_TX() gets Transactions from blocks and stores them in a file */
static protected void store_TX() throws BlockStoreException, FileNotFoundException, UnsupportedEncodingException {
    File txf = new File("TX.txt");
    PrintWriter hwriter = new PrintWriter("TX.txt", "UTF-8");
    BlockChain chain = kit.chain();
    BlockStore block_store = chain.getBlockStore();
    StoredBlock stored_block = block_store.getChainHead();
    // if stored_block.prev() returns null then break, otherwise get block transactions
    while (stored_block != null) {
        Block block = stored_block.getHeader();
        List<Transaction> tx_list = block.getTransactions();
        if (tx_list != null && tx_list.size() > 0) {
            hwriter.println(block.getHashAsString());
        }
        stored_block = stored_block.getPrev(block_store);
    }
    hwriter.close();
}

public static void main(String[] args) {
    BriefLogFormatter.init();
    synchronized (kit.startAndWait()) {
        try {
            store_TX();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        } catch (BlockStoreException e) {
            e.printStackTrace();
        }
    }
} //end main
You need to use FullPrunedBlockChain; the BlockChain class only supports SPV.
See https://bitcoinj.github.io/full-verification
It depends on how you downloaded those blocks. If you downloaded them, for example, via the BlocksDownloadedEventListener, then you only received the block headers, which do not contain the transactions. If you want the transactions too, you can use Peer.getBlock(blockHash) to request a download of the full block from that peer, which will also contain the transactions and information related to them (i.e. the block reward).
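A rough sketch of that request (exception handling omitted); in the bitcoinj versions I know, Peer.getBlock returns a future for the full block, so double-check against your version:
Peer peer = kit.peerGroup().getDownloadPeer();
Sha256Hash blockHash = stored_block.getHeader().getHash();
Block fullBlock = peer.getBlock(blockHash).get(); // waits for the peer to serve the full block
List<Transaction> tx_list = fullBlock.getTransactions(); // now actually populated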
Also, you would need to use another type of BlockStore for persisting your blocks, as the SPVBlockStore (which is the standard for WalletAppKit) only saves block headers (so no transactions).
You can find all types of BlockStores here, so you can choose what suits you best, but always read the description of what they save, so you don't run into this problem again.

Confused about reading data from Registry

I am trying to get some data from the registry. The problem is the while loop: when the application runs in debug mode I can view the value of the line variable, but at the end of the loop the line variable is assigned null.
private final String DESKTOP_PATH = "\"HKEY_CURRENT_USER\\Software\\Microsoft\\Windows\\"
        + "CurrentVersion\\Explorer\\Shell Folders\" /v Desktop";
private final String REG = "REG_SZ";
private final String EXACUTE_STR = "reg query " + DESKTOP_PATH;

private String getDesktopPath() throws IOException {
    Process p = null;
    String line = null;
    try {
        p = Runtime.getRuntime().exec(EXACUTE_STR);
        p.waitFor();
        InputStream stream = p.getInputStream();
        BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
        while ((line = reader.readLine()) != null) {
            line += reader.readLine();
        }
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (InterruptedException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    return line;
}
Consider using JNA, the Java Native Access library, which provides registry handling utilities.
The reason I suggest a different toolkit instead of simply solving your problem has to do with a few items. When launching an external executable, you have to handle input and output in a buffered manner, and there are two streams of output (normal and error).
In short, typically all of these input and output streams need to be available, but you won't notice failures when they are not present in the typical development environment. Later, in production environments (headless, console-less, etc.) these problems become apparent.
To solve these problems with a CLI call, you typically set up buffer collectors to capture the output in independent threads, and sometimes you need to stand up a fake buffer provider (some programs check that input is readable, even if they don't read any input!). The JNA library uses JNI, which greatly reduces the issues by side-stepping the CMD shell that wraps your executable call.
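With JNA's platform utilities (the jna-platform artifact), the whole registry read reduces to one call, for example:
import com.sun.jna.platform.win32.Advapi32Util;
import com.sun.jna.platform.win32.WinReg;

String desktopPath = Advapi32Util.registryGetStringValue(
        WinReg.HKEY_CURRENT_USER,
        "Software\\Microsoft\\Windows\\CurrentVersion\\Explorer\\Shell Folders",
        "Desktop");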
However, if you only wanted to know about the logic error in your code, JimW did a good job explaining it.
You are replacing the contents of line with what you read from reader.readLine(); when that finally returns null, you return line, which is of course null.
Instead, create a StringBuilder before starting the loop and append to that.
StringBuilder buffer = new StringBuilder();
while ((line = reader.readLine()) != null) {
    buffer.append(line);
}
// buffer.toString() is the String you are looking for
You could also .trim() inside the append if you want to remove line endings.

How to get and write data in text file Like Database

I'm trying to develop a small program. I want to write data to and read data from a text file, as if it were a database. I have some data, e.g.:
User = abc,
Age = 12,
No = 154
I want to write that data to the text file, and afterwards I want to search the data by User. I don't know how to do that. Can anyone tell me how?
BufferedWriter writer = null;
try {
    writer = new BufferedWriter(new FileWriter("./output.txt"));
    writer.write("your data here");
} catch (IOException e) {
    System.err.println(e);
} finally {
    if (writer != null) {
        try {
            writer.close();
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}
May I know why you want this? As read and write requests increase, this code will become a bottleneck: you would be performing heavy I/O operations to fetch lightweight data, and disk I/O has its own concurrent-read restrictions. So I would not suggest this approach for lightweight data. Putting heavy data like images, videos, or songs in files under some unique ID would be a good approach, but not this. If you still want to do it, go for property files, which work on keys and values. Put the values token-separated and split them at consumption time.
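A sketch of that property-file approach with java.util.Properties (the file name and the comma-separated value layout are illustrative):
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Properties;

Properties users = new Properties();

// write: the key is the user name, the value packs the other fields token-separated
users.setProperty("abc", "12,154"); // Age,No
try (FileOutputStream out = new FileOutputStream("users.properties")) {
    users.store(out, "user data");
} catch (IOException e) {
    System.err.println(e);
}

// read back and search by user
try (FileInputStream in = new FileInputStream("users.properties")) {
    users.load(in);
    String[] fields = users.getProperty("abc").split(","); // ["12", "154"]
    System.out.println("Age=" + fields[0] + ", No=" + fields[1]);
} catch (IOException e) {
    System.err.println(e);
}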
