Handling large data files [closed] - java

Let's say I have a very big data file. A parser parses it and keeps the data in the following class:
    class Data {
        DataHeader header;
        List<DataLine> lines;
        ...
    }
Before persisting this data in the DB, I do some validation and processing on it, and I persist it only if it contains no errors.
The data file looks like:
DATAHEADER|.......
DATALINE|1|....
DATALINE|2|....
... and so on
Given limited JVM memory, the file should be processed in batches, while still ensuring that it is persisted only if it contains no errors. I'd appreciate your help designing the solution.

With big files, you can't always load everything into memory. You sometimes have to create a temp table to store the information.
- Read a few lines and store them in a list.
- Check the lines, making sure the data is correct.
- If they're good, store them in the temp table in the database.
- If they're bad, delete the data in the temp table and stop the process with an error.
When the whole file has been loaded into the temp table:
- Do your global checks (try to do them in the database; don't fetch everything back into the application).
- If everything is good, copy the data from the temp table into the live table, then delete the temp table.
- If anything is bad, delete the data in the temp table and stop the process with an error.
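Here is a minimal JDBC sketch of that flow. The staging table DATA_LINE_TEMP(LINE_NO, PAYLOAD) and the per-line check are hypothetical placeholders; substitute your real schema and validation rules:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;

    public class BatchFileLoader {

        private static final int BATCH_SIZE = 1000;
        private int lineNo = 0;

        /** Stages the file into DATA_LINE_TEMP in batches; rolls everything back on the first bad line. */
        public void load(Connection con, String path) throws IOException, SQLException {
            con.setAutoCommit(false);
            String sql = "INSERT INTO DATA_LINE_TEMP (LINE_NO, PAYLOAD) VALUES (?, ?)";
            try (BufferedReader reader = Files.newBufferedReader(Paths.get(path));
                 PreparedStatement ps = con.prepareStatement(sql)) {

                List<String> batch = new ArrayList<>(BATCH_SIZE);
                String line;
                while ((line = reader.readLine()) != null) {
                    batch.add(line);
                    if (batch.size() == BATCH_SIZE) {
                        validateAndStage(ps, batch);
                        batch.clear();
                    }
                }
                validateAndStage(ps, batch); // trailing partial batch

                // Global checks would run here, inside the database (e.g. compare a
                // COUNT(*) of DATA_LINE_TEMP against a total declared in the header),
                // followed by an INSERT INTO ... SELECT to promote the rows.
                con.commit();
            } catch (Exception e) {
                con.rollback(); // discards everything staged so far
                throw e;
            }
        }

        private void validateAndStage(PreparedStatement ps, List<String> batch) throws SQLException {
            for (String line : batch) {
                // Placeholder validation; substitute your real business checks.
                if (!line.startsWith("DATAHEADER|") && !line.startsWith("DATALINE|")) {
                    throw new IllegalStateException("Invalid line " + lineNo + ": " + line);
                }
                ps.setInt(1, ++lineNo);
                ps.setString(2, line);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }

Because everything runs in one transaction, a single rollback() discards all staged rows on failure, and the final promotion from the temp table to the live table can be one INSERT INTO ... SELECT executed entirely inside the database.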

Related

How to keep user input even after rerunning the program (JAVA) [closed]

I only started coding about a month ago and I am trying to add code to my program so that it stores user input and keeps it even after the program is rerun. Confused? For example, when you make an account on Facebook, they keep your email and password, so next time you just log in without making the account again. Something like a database to store values, I am guessing?
When a program terminates, it loses all the values stored in variables.
This is because those values live only in the process's memory (RAM), which is released when the program exits.
In your case, to persist data across restarts, you need to write it to some external persistent store rather than keeping it only in memory.
Your external store can be a database or a simple file.
Create a file and store your values in it; when you restart the program, read the values back from it.
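A minimal sketch of that file approach using java.util.Properties; the file name user.properties and the key names are hypothetical:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.Properties;

    public class UserStore {

        private static final Path FILE = Paths.get("user.properties"); // hypothetical file name

        /** Saves a value so it survives a restart. */
        public static void save(String key, String value) throws IOException {
            Properties props = loadAll();
            props.setProperty(key, value);
            try (OutputStream out = Files.newOutputStream(FILE)) {
                props.store(out, "saved user data");
            }
        }

        /** Reads a previously saved value, or null if none was saved yet. */
        public static String load(String key) throws IOException {
            return loadAll().getProperty(key);
        }

        private static Properties loadAll() throws IOException {
            Properties props = new Properties();
            if (Files.exists(FILE)) {
                try (InputStream in = Files.newInputStream(FILE)) {
                    props.load(in);
                }
            }
            return props;
        }
    }

Usage: call UserStore.save("email", "me@example.com") before exiting, then UserStore.load("email") on the next run. For real credentials, never store a plain-text password; store a salted hash instead.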
You would require storage to hold the data, as rerunning erases all previous values. Any filesystem or DB can be used to solve your issue.

Fastest way to insert a lot of rows in Oracle DB with Java [closed]

I am parsing text files to get all the words (around 6000-7000 words per file) and I do this on multiple files. I want to insert each word into my Oracle database, and if it already exists, add one to a counter. What I have done is call an Oracle procedure (with my word) from my Java application; the procedure checks whether the word exists and then inserts or updates the record.
The problem is that it takes around 5 minutes per file. Is there another way to do what I want that is faster?
Why not build an in-memory map of the counts in Java and process all the records in memory?
So if there were 50 instances of word A, your map will end up with A:50.
You can then use the Oracle MERGE statement (an upsert) to either update an existing database row by adding the in-memory count, or insert a new row with that count. Use JDBC batching to execute the merges.
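A sketch of that approach, assuming a hypothetical table WORD_COUNTS(WORD, CNT); one batched MERGE per distinct word replaces one procedure call per occurrence:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.HashMap;
    import java.util.Map;

    public class WordCountLoader {

        // Hypothetical table: WORD_COUNTS(WORD VARCHAR2 PRIMARY KEY, CNT NUMBER)
        private static final String MERGE_SQL =
            "MERGE INTO WORD_COUNTS t " +
            "USING (SELECT ? AS WORD, ? AS CNT FROM dual) s " +
            "ON (t.WORD = s.WORD) " +
            "WHEN MATCHED THEN UPDATE SET t.CNT = t.CNT + s.CNT " +
            "WHEN NOT MATCHED THEN INSERT (WORD, CNT) VALUES (s.WORD, s.CNT)";

        public void store(Connection con, String text) throws SQLException {
            // 1. Aggregate in memory: one entry per distinct word.
            Map<String, Integer> counts = new HashMap<>();
            for (String word : text.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
            // 2. One batched MERGE per distinct word instead of one round trip per occurrence.
            con.setAutoCommit(false);
            try (PreparedStatement ps = con.prepareStatement(MERGE_SQL)) {
                for (Map.Entry<String, Integer> e : counts.entrySet()) {
                    ps.setString(1, e.getKey());
                    ps.setInt(2, e.getValue());
                    ps.addBatch();
                }
                ps.executeBatch();
                con.commit();
            }
        }
    }

Because the map collapses duplicate words before touching the database, the number of round trips drops from one per word occurrence to a handful of batched statements per file, which is usually where the 5 minutes go.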

Android speed of ArrayList vs Database [closed]

I have to store 500-1000 values somewhere, but I can't decide which one is more efficient (ArrayList vs. database). Can you explain the pros and cons of an ArrayList and a database?
Thanks...
In-memory data structures are always faster than a DB.
But the tradeoff is between the size of the data structure and the available memory. For your needs an ArrayList will be faster, but the data will be gone once the application is stopped or killed.
A database is a persistent data store. If you only need to store temporary data then an ArrayList is suitable; if you need to store it permanently then you have two options:
- Database (the standard way; APIs available, standard practice)
- Filesystem (keep your data in a data structure until the application stops, then write it to a file, in encrypted form if security is required; see the sketch below)
If you need to, also explore in-memory databases for Android (e.g., SQLite in in-memory mode). That may suit you best.
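A plain-Java sketch of that filesystem option for a few hundred to a thousand values; the file name values.txt is a placeholder (on Android you would point it at the app's internal storage instead):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;

    public class ValueStore {

        private static final Path FILE = Paths.get("values.txt"); // hypothetical location
        private final List<String> values = new ArrayList<>();

        /** Fast in-memory access while the app runs. */
        public List<String> values() { return values; }

        /** Load previously saved values at startup (empty list on first run). */
        public void load() throws IOException {
            if (Files.exists(FILE)) {
                values.addAll(Files.readAllLines(FILE));
            }
        }

        /** Write the list out before the app stops so it survives a restart. */
        public void save() throws IOException {
            Files.write(FILE, values);
        }
    }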
The uses of an ArrayList and a database are different.
If you don't want to store your records permanently, you can use an ArrayList; but if you want to store them permanently, you need to use a database.
When you store records in an ArrayList, they are kept in RAM and never reach secondary storage; when you store them in a database, they are persisted on disk.
If your concern is the speed of accessing the data, an ArrayList is faster than a database.

Which one is faster to search an item? Hitting DB or iterating list of values fetched from Db [closed]

In my web application I have an employee table with employee id, name, designation, salary, ... as attributes, and it may contain thousands of records. I want to search by employee name. Which will work faster: hitting the DB on every search, or creating a list of employee names once in a Java bean and iterating over it for every search? Which one is better?
By far, even if you have millions of records, it is better to hit the database per request. To make this faster, add an index on the name field of your employee table.
In case the data in your employee table doesn't vary much, you have another option: a cache in front of the employee table. With this, access to the data will be even faster, since the employee is looked up in the cache (usually RAM), but it comes with a more complex design: you need retrieval policies for the cache and periods for refreshing the cached data.
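A minimal sketch of the indexed lookup; the table and column names are assumptions:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;

    public class EmployeeDao {

        // Assumes an index exists on the name column, e.g.:
        //   CREATE INDEX idx_employee_name ON employee (name);

        public List<String> findByName(Connection con, String name) throws SQLException {
            String sql = "SELECT employee_id, name FROM employee WHERE name = ?";
            List<String> result = new ArrayList<>();
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setString(1, name);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        result.add(rs.getInt("employee_id") + ":" + rs.getString("name"));
                    }
                }
            }
            return result;
        }
    }

With the index in place this is a single indexed lookup instead of a full table scan, and the PreparedStatement also protects against SQL injection.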
This depends on a few things.
Hitting the DB is an I/O action, so if you have a specific screen/process that performs many lookups in one flow, it can be better to load the list from the DB once and use it several times. That assumes you can be sure the employee list won't be changed in the DB by another process, or that such changes are not critical for you.
If the screen/process makes only a few lookups, it should hit the DB.
Remember that hitting the DB very often also loads the DB and can slow it down; it can't handle an unlimited number of requests.
Hope that helps

how to properly insert data to database from txt file [closed]

Simple question: any ideas how this should properly be done? I have 3 txt files with lots of information. I created a class that is in charge of reading the data from the txt files and returning it as a list of DTOs (yes, the information can be bundled into such logical units), depending on the txt file. After that, the client will use a DAO to insert that list into a local database (SQLite). My concern is that holding such a list could be memory-demanding. Should I avoid building the list and somehow insert the data through the DAO directly, without bundling it into DTOs and a final list?
You are asking a good question and partially answering it yourself.
Yes, if you really have a lot of information you should not read all of it from the file and only then store it in the DB. You should read the information chunk by chunk, or, if the application allows it, even line by line, storing each line in the DB as you go.
In that case you need memory for one line only at any time.
You can design the application as follows (a sketch follows below):
- a file parser that returns Iterable<Row>,
- a DB writer that accepts Iterable<Row> and stores the rows in the DB,
- a manager that calls both.
This way the logic responsible for reading and writing the file is encapsulated in separate modules, and no extra memory consumption is required.
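One lightweight way to realize this design, swapping in java.util.stream.Stream as the lazy Iterable-like abstraction; the Row class, the table my_table, and the batch size are all placeholders:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.stream.Stream;

    public class FileToDbManager {

        /** Placeholder DTO for one parsed line. */
        static final class Row {
            final String value;
            Row(String value) { this.value = value; }
        }

        /** Parser: exposes the file as a lazy stream of rows, one line in memory at a time. */
        Stream<Row> parse(Path file) throws IOException {
            return Files.lines(file).map(Row::new);
        }

        /** Writer: consumes rows and stores them, flushing periodically so the batch stays small. */
        void write(Connection con, Stream<Row> rows) throws SQLException {
            String sql = "INSERT INTO my_table (value) VALUES (?)"; // hypothetical table
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                int[] pending = {0}; // effectively-final counter usable inside the lambda
                rows.forEach(row -> {
                    try {
                        ps.setString(1, row.value);
                        ps.addBatch();
                        if (++pending[0] == 500) {
                            ps.executeBatch();
                            pending[0] = 0;
                        }
                    } catch (SQLException e) {
                        throw new RuntimeException(e);
                    }
                });
                ps.executeBatch(); // flush the remainder
            }
        }

        /** Manager: wires the two together. */
        void run(Connection con, Path file) throws IOException, SQLException {
            try (Stream<Row> rows = parse(file)) {
                write(con, rows);
            }
        }
    }

Files.lines is lazy, so only one line is materialized at a time, and closing the stream in the manager closes the underlying file.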
Do not return a list, but an iterator, as in this example: Iterating over the content of a text file line by line - is there a best practice? (vs. PMD's AssignmentInOperand)
You have to modify that iterator to return your DTO instead of a String:

    for (MyDTO line : new BufferedReaderIterator(br)) {
        // do some work
    }

Now you will iterate over the file line by line, but you will get DTOs instead of raw lines. Such a solution has a small memory footprint.
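A sketch of what the modified BufferedReaderIterator might look like; MyDTO is your own DTO class from above, and MyDTO.parse is an assumed factory method on it:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.util.Iterator;
    import java.util.NoSuchElementException;

    /** Wraps a BufferedReader and parses each line into a DTO on demand. */
    public class BufferedReaderIterator implements Iterable<MyDTO>, Iterator<MyDTO> {

        private final BufferedReader reader;
        private String nextLine;

        public BufferedReaderIterator(BufferedReader reader) {
            this.reader = reader;
            advance();
        }

        private void advance() {
            try {
                nextLine = reader.readLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override public Iterator<MyDTO> iterator() { return this; }

        @Override public boolean hasNext() { return nextLine != null; }

        @Override public MyDTO next() {
            if (nextLine == null) throw new NoSuchElementException();
            MyDTO dto = MyDTO.parse(nextLine); // assumed parsing method on your DTO
            advance();                         // read ahead one line; only it stays in memory
            return dto;
        }
    }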
