Simple question: any ideas how this should properly be done? I have 3 txt files with lots of information. I created a class that is in charge of reading the data from the txt files and returning it as a list of DTOs (yes, the information can be bundled into such logical units), depending on the txt file. After that the client will use a DAO to take that list and insert the data into a local database (SQLite). My concern is that such a list could be memory demanding. Should I avoid using such a list and somehow insert the data through the DAO directly, without bundling it into DTOs and, finally, such a list?
You are asking a good question and partially answering it yourself.
Yes: if you really have a lot of information, you should not read it all from the file and only then store it in the DB. You should read the information chunk by chunk, or even (if the application allows it) line by line, and store each line in the DB as you go.
In this case you only ever need memory for a single line.
You can design the application as follows:
A file parser that returns an Iterable<Row>.
A DB writer that accepts an Iterable<Row> and stores the rows in the DB.
A manager that calls both.
In this case the logic responsible for reading the file and writing to the DB is encapsulated in separate modules, and no extra memory consumption is required.
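For instance, here is a minimal sketch of that layering (assuming SQLite via the sqlite-jdbc driver, a made-up pipe-delimited line format and a made-up rows table; it uses a lazy Stream instead of a hand-rolled Iterable<Row>, so only the line currently being written is held in memory):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.stream.Stream;

// Simple row DTO (hypothetical columns).
record Row(String id, String value) {}

// Layer 1: file parser that exposes the file as a lazy sequence of rows.
class FileParser {
    Stream<Row> rows(Path file) throws IOException {
        return Files.lines(file)               // lines are read lazily, one at a time
                    .map(line -> {
                        String[] parts = line.split("\\|");
                        return new Row(parts[0], parts[1]);
                    });
    }
}

// Layer 2: DB writer that consumes the rows and stores them.
class DbWriter {
    private final Connection connection;

    DbWriter(Connection connection) {
        this.connection = connection;
    }

    void write(Stream<Row> rows) throws SQLException {
        try (PreparedStatement ps =
                 connection.prepareStatement("INSERT INTO rows(id, value) VALUES (?, ?)")) {
            rows.forEach(row -> {
                try {
                    ps.setString(1, row.id());
                    ps.setString(2, row.value());
                    ps.executeUpdate();
                } catch (SQLException e) {
                    throw new RuntimeException(e);
                }
            });
        }
    }
}

// Layer 3: manager that wires the two together.
class Manager {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:data.db");
             Stream<Row> rows = new FileParser().rows(Path.of("data.txt"))) {
            new DbWriter(conn).write(rows);
        }
    }
}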
Do not return a list, but an iterator, as in this example: Iterating over the content of a text file line by line - is there a best practice? (vs. PMD's AssignmentInOperand)
You have to modify that iterator to return your DTO instead of a String:
for (MyDTO line : new BufferedReaderIterator(br)) {
    // do some work
}
Now you iterate over the file line by line, but return DTOs instead of raw lines. Such a solution has a small memory footprint.
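A sketch of how such an iterator could look (this is not the BufferedReaderIterator from the linked answer, just a generic variant along the same lines; the line-to-DTO conversion is passed in as a function):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Wraps a BufferedReader and converts each line to a DTO on demand,
// so only the current line/DTO is ever held in memory.
class DtoReaderIterable<T> implements Iterable<T> {
    private final BufferedReader reader;
    private final Function<String, T> lineParser;

    DtoReaderIterable(BufferedReader reader, Function<String, T> lineParser) {
        this.reader = reader;
        this.lineParser = lineParser;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<>() {
            private String nextLine = readLine();

            private String readLine() {
                try {
                    return reader.readLine();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            @Override
            public boolean hasNext() {
                return nextLine != null;
            }

            @Override
            public T next() {
                if (nextLine == null) {
                    throw new NoSuchElementException();
                }
                T dto = lineParser.apply(nextLine); // e.g. line -> new MyDTO(line)
                nextLine = readLine();
                return dto;
            }
        };
    }
}

With that in place, the loop above becomes for (MyDTO line : new DtoReaderIterable<>(br, MyDTO::new)) { ... }, assuming MyDTO has a constructor that takes a single line as a String.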
I am implementing a Spring Batch program. My scenario is:
I have a file called A which has a unique key field called rollNumber.
The rollNumber is in sorted (ascending) order.
I want to get the row whose rollNumber is 101.
Is there any search algorithm that can be implemented for this?
I am able to read the file using an ItemReader and find the row, but the problem is that I have 1 million records to process, so the time complexity is very high.
A linear search takes too much time, since the file contains a large volume of data.
Because the records are ordered by key, the natural way to implement this would be a binary search, but I'm not sure whether that is possible with Spring Batch.
The point of Spring Batch is to read the whole file and process it somehow, not to "jump" around in the file and do specific lookups. For such an operation you should implement your own mechanism (it can, however, be implemented e.g. in a Tasklet instance if the operation must be performed within the scope of the Spring Batch process).
The fact is that the FlatFileItemReader implementation of ItemReader does include the method
public void setLinesToSkip(int linesToSkip) {
this.linesToSkip = linesToSkip;
}
but still, after skipping those lines it will not let you go back or skip further ones during the same Spring Batch process.
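Outside Spring Batch (or inside a Tasklet), the binary search itself is straightforward with a RandomAccessFile. A minimal sketch, assuming fixed-length, newline-terminated records sorted ascending by a numeric roll number in the first pipe-delimited field (variable-length lines would need a seek-and-scan-to-the-next-newline variant); the file name and record length are made up:

import java.io.IOException;
import java.io.RandomAccessFile;

class SortedFileSearch {

    // Binary search over a file of fixed-length records, sorted ascending
    // by the numeric first field (the roll number).
    static String findByRollNumber(RandomAccessFile file, int recordLength, long target)
            throws IOException {
        long low = 0;
        long high = file.length() / recordLength - 1;
        byte[] buffer = new byte[recordLength];

        while (low <= high) {
            long mid = (low + high) / 2;
            file.seek(mid * recordLength);           // jump straight to record "mid"
            file.readFully(buffer);
            String record = new String(buffer).trim();
            long rollNumber = Long.parseLong(record.split("\\|")[0]);

            if (rollNumber == target) {
                return record;
            } else if (rollNumber < target) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return null; // not found
    }

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile("A.txt", "r")) {
            System.out.println(findByRollNumber(file, 64, 101)); // 64-byte records, rollNumber 101
        }
    }
}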
I am working on a project where I will have a binary file. The file is split into multiple sections, each of which represents a list of primitive values. I need a solution where I can have a collection of objects, each of which represents a section of the file. These collections are then all held within a "file" object that represents the file as a whole.
Each collection object will need to provide sequential access to each value in the section of the file it represents. What method would provide the fastest data retrieval without loading all the data into memory first?
Also it would be nice if two separate collections of the same "file" object could be accessed by two separate Threads, but this is not as important.
A good approach is to divide the solution into layers: one for the file I/O, mapping bytes to Java shorts and ints, and another for the abstraction of the file sections and the file as a whole.
java.nio's MappedByteBuffer provides a good interface between the "byte array" of a random access file and what you need for getting the Java typed data from that.
As Kayaman has mentioned, FileChannel.map() returns a MappedByteBuffer and you can navigate easily on that with its methods.
The implementation should make use of the OS feature for mapping memory pages to file pages, so that only what you actually access in memory is read from the file. (I've used this recently with Java 8 and Linux, and it performed well on files exceeding even the capacity of a single MappedByteBuffer.)
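A rough sketch of those two layers, assuming a made-up layout in which a section is simply a run of little-endian ints starting at a known byte offset (and Java 13+ for the indexed ByteBuffer.slice(index, length)):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// One section of the file: a window over the mapped buffer that hands out ints sequentially.
class IntSection {
    private final ByteBuffer slice;

    IntSection(MappedByteBuffer fileBuffer, int offset, int count) {
        // Each section gets its own slice with an independent position,
        // so two sections can be read from two threads at the same time.
        slice = fileBuffer.slice(offset, count * Integer.BYTES)
                          .order(ByteOrder.LITTLE_ENDIAN);
    }

    boolean hasNext() {
        return slice.hasRemaining();
    }

    int next() {
        return slice.getInt(); // the OS faults pages in only as they are touched
    }
}

// The "file" object: maps the whole file once and builds section views over it.
class SectionedFile {
    public static void main(String[] args) throws IOException {
        try (FileChannel channel = FileChannel.open(Path.of("data.bin"), StandardOpenOption.READ)) {
            MappedByteBuffer mapped =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());

            IntSection section = new IntSection(mapped, 0, 1024); // hypothetical section layout
            while (section.hasNext()) {
                System.out.println(section.next());
            }
        }
    }
}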
So I have this app, a Java servlet. It uses a dictionary object that reads words from a file specified as a constructor parameter on instantiation and then serves queries.
I can do basically the same in PHP, but it's my understanding that the class will be instantiated on each and every request, and the file will be read again every time. In fact, I did it and it works, but it collapses my humble Amazon EC2 micro instance at the ridiculous rate of 11 requests per second or more.
My question is: Shouldn't some kind of compiler/file system optimization be kicking in and making the performance impact insignificant when the file does not change at all?
If the answer is no, I guess my design is quite poor and I should try to improve it. In that case, my second question is: What would be the best approach to improve it?
Building a servlet-like service so the code is properly reused?
Using memcached to keep the words file content in memory?
Using an RDBMS instead of a plain text file and having my dictionary query it? (Despite the dictionary being only a few KB of static data, and despite having to perform some complex queries, such as selecting a (cryptographically safe) random word from those having a length greater than some per-request user setting, and so on?)
Something else?
Your best bet is to generate a PHP file which contains the final structure of the dictionary in PHP code. You could then include() that cache file into your code or write a new one when the file changes. You should store it on the filesystem, no databases. You could cache it in memory as well. But I don't think this is really needed at this point.
I'm trying to count the number of times multiple IDs appear in a database table, then use those numbers to build a JFreeChart.
I currently don't know how to do this and cannot find example code online.
Well, I'm certainly no expert, but I recently did something similar in an application. You can hard-code the required SQL query strings into your application code and use them to retrieve the data from your database. You will need a database connector for this; which one depends on the language you are writing in and the database you are using.
You will receive result sets in which the data from your database can be read back as Strings. This includes data that was stored in numeric format, so you may need to convert the strings to another type if that is what you require. You can feed the result set into a collection such as an ArrayList (if you are using Java, for example). Since you are trying to count IDs, you can use the collection's search methods to check whether the strings in it are duplicates; a set (a collection that can't contain duplicates) may be useful for comparison purposes. There isn't much detail in your question, but this should help: count the duplicates in your collection and keep track of the totals.
At the end of this you will have a collection of numerical values, which you simply feed to a class that imports the required JFreeChart classes and uses the data to create a chart.
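For example, a minimal sketch along those lines, assuming a made-up JDBC URL and a records table with an id column; the duplicates are counted in a map and the totals fed into a JFreeChart bar chart:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

import org.jfree.chart.ChartFactory;
import org.jfree.chart.JFreeChart;
import org.jfree.chart.plot.PlotOrientation;
import org.jfree.data.category.DefaultCategoryDataset;

class IdCountChart {
    public static void main(String[] args) throws Exception {
        Map<String, Integer> counts = new LinkedHashMap<>();

        // Pull every id from the table and count how often each one appears.
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:app.db");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id FROM records")) {
            while (rs.next()) {
                counts.merge(rs.getString("id"), 1, Integer::sum);
            }
        }

        // Feed the totals into a dataset and build the chart.
        DefaultCategoryDataset dataset = new DefaultCategoryDataset();
        counts.forEach((id, count) -> dataset.addValue(count, "occurrences", id));

        JFreeChart chart = ChartFactory.createBarChart(
                "Id counts", "Id", "Count", dataset,
                PlotOrientation.VERTICAL, false, true, false);
        // render the chart, e.g. in a ChartPanel or by saving it as a PNG
    }
}

Depending on your database, it may be simpler to let it do the counting with SELECT id, COUNT(*) FROM records GROUP BY id and skip the in-memory counting entirely.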
Let's say I have a very big file containing data. A parser parses it and keeps the data in the following class:
class Data {
    DataHeader header;
    List<DataLine> lines;
    ...
}
Before persisting this data in the DB, I do some validations and processing on it. And I persist it only if there are no errors in it.
Data file looks like:
DATAHEADER|.......
DATALINE|1|....
DATALINE|2|....
... and so on
To process this file within limited JVM memory, it should be processed in batches, while also making sure that it is persisted only if it does not contain any errors. I'd appreciate your help in designing the solution.
With big files, you can't always load everything into memory. You sometimes have to create a temp table to store the information.
Read a few lines and store them in a list
Check the lines, making sure the data is correct
If it's good, store the line in the temp table in the database
If it's bad, delete the data in the temp table and stop the process with an error
When the file has been loaded into the temp table:
Do your global checks (try to do them in the database; don't fetch everything back into the application)
If it's good, copy the data from the temp table into the live table. Delete the temp table
If it's bad, delete the data in the temp table and stop the process with an error
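A condensed sketch of that flow, assuming made-up table names (data_line_tmp and data_line), the pipe-delimited format shown above, and very simple per-line validation; rows are inserted in JDBC batches so only a bounded number of lines is in memory at once, and the load is rolled back on the first error:

import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

class BatchLoader {
    private static final int BATCH_SIZE = 1000;

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:data.db");
             BufferedReader reader = Files.newBufferedReader(Path.of("data.txt"))) {
            conn.setAutoCommit(false);

            try (PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO data_line_tmp(line_no, payload) VALUES (?, ?)")) {
                String line;
                int pending = 0;
                while ((line = reader.readLine()) != null) {
                    if (!line.startsWith("DATALINE|")) {
                        continue; // header/other lines handled elsewhere in this sketch
                    }
                    String[] parts = line.split("\\|", 3);
                    if (parts.length < 3) {           // per-line validation failed
                        conn.rollback();              // drop everything loaded so far
                        throw new IllegalStateException("Invalid line: " + line);
                    }
                    insert.setInt(1, Integer.parseInt(parts[1]));
                    insert.setString(2, parts[2]);
                    insert.addBatch();
                    if (++pending == BATCH_SIZE) {    // flush a batch, keep memory bounded
                        insert.executeBatch();
                        pending = 0;
                    }
                }
                insert.executeBatch();
            }

            // Global checks would run here, in the database, then the rows are promoted.
            try (Statement stmt = conn.createStatement()) {
                stmt.executeUpdate("INSERT INTO data_line SELECT * FROM data_line_tmp");
                stmt.executeUpdate("DELETE FROM data_line_tmp");
            }
            conn.commit();
        }
    }
}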