I'm pulling a List (ArrayList) of data that represents a single row from a database view. While each column normally has a single value, some columns have a delimited String value, such as this:
CompanyID | CompanyName  | ContactIDs | Contacts
49        | Test Company | 5;9        | Alice;Bob
Currently, I'm pulling a sub-list of the first values, and then parsing the rest with String.split(), but I'm worried about performance, especially when I'm loading several hundred of these objects at a time. Here is my method:
public void loadFromData(List data) {
    // Pulls 49 and "Test Company" and loads them into a Company object
    getCompany().load(data.subList(0, 2));

    // getContacts() returns an ArrayList of Contact objects
    getContacts().clear();

    String[] contactIds = ((String) data.get(2)).split(";");
    String[] contactNames = ((String) data.get(3)).split(";");
    for (int i = 0; i < contactIds.length; i++) {
        List contactData = new ArrayList();
        contactData.add(contactIds[i]);
        contactData.add(getCompany().getCompanyId());
        contactData.add(contactNames[i]);
        getContacts().add(new Contact().load(contactData));
    }
}
Is there a better way to go about doing this? Or is this probably the most efficient way to divvy up the List that I'm given?
Assume that I cannot change the List itself; the joining via ';' is done server-side in the database before I receive it.
Thanks in advance!
Well, String.split() is the most straightforward way. It is specified in terms of regex, which can be a bit slow if you do it a lot (though recent JDKs take a regex-free fast path when the delimiter is a single non-metacharacter, as ';' is). But since you're doing database access, which is far slower, I wouldn't worry about it just yet. Run a profiler to see whether you actually have a problem before trying to get rid of String.split(). Optimization is not something you do just because you feel that something is slow.
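If profiling ever does show the split as a hotspot, a regex-free splitter is easy to write. Here's a minimal, untested sketch (splitFast is my name for it, not a library method) that walks the string once with indexOf:

import java.util.ArrayList;
import java.util.List;

// Regex-free split on a single-character delimiter; one pass with indexOf.
static List<String> splitFast(String s, char delimiter) {
    List<String> parts = new ArrayList<>();
    int start = 0;
    int end;
    while ((end = s.indexOf(delimiter, start)) != -1) {
        parts.add(s.substring(start, end));
        start = end + 1;
    }
    parts.add(s.substring(start)); // the segment after the last delimiter
    return parts;
}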
How to join a list of millions of values into a single String by appending '\n' at the end of each line
Input data is in a List:
list[0] = And the good south wind still blew behind,
list[1] = But no sweet bird did follow,
list[2] = Nor any day for food or play
list[3] = Came to the mariners' hollo!
Below code joins the list into a string by appending new line character at the end -
String joinedStr = list.stream().collect(Collectors.joining("\n", "{", "}"));
The problem is that if the list has millions of entries, the joining fails. My guess is that a single String object can't hold millions of lines because of its sheer size.
Please give suggestions.
The problem with trying to compose a gigantic string is that you have to keep the entire thing in memory before you do anything further with it.
If the string is too big to fit in memory, you have only two options:
increase the available memory, or
avoid keeping a huge string in memory in the first place
This string is presumably destined for some further processing - maybe it's being written to a blob in a database, or maybe it is the body of an HTTP response. It's not being constructed just for fun.
It is probably preferable to write to some kind of stream (perhaps an implementation of OutputStream, or a Writer) that the consumer can read incrementally. The consumer can buffer based on the delimiter if they are aware of the context of what you're sending, or they can wait until they have the entire thing.
Preferably you would use something which supports back pressure so that you can pause writing if the consumer is too slow.
Exactly how this looks will depend on what you're trying to accomplish.
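For instance, if the destination is a file, a minimal sketch could look like the following (the path parameter is a placeholder; BufferedWriter keeps only a small buffer in memory rather than the whole joined string):

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Writes each element followed by '\n' straight to the file,
// so the full joined string never exists in memory.
static void writeJoined(List<String> list, String path) throws IOException {
    try (BufferedWriter out = Files.newBufferedWriter(Paths.get(path))) {
        for (String s : list) {
            out.write(s);
            out.write('\n');
        }
    }
}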
Maybe you can do it with a StringBuilder, which is designed for building large Strings efficiently. Here's how I'd do it:
StringBuilder sb = new StringBuilder();
for (String s : list) sb.append(s).append("\n");
return sb.toString();
I haven't tested this code, but it should work. (Note that the result still has to fit in memory as a single String.)
We have a system that processes a flat file and (with only a couple of validations) inserts the lines into a database.
This code:
// there can be 8 million lines
for (String line : lines) {
    // obj is parsed from the line; the parsing code is omitted in this snippet
    if (!Class.isBranchNoValid(validBranchNoArr, obj.branchNo)) {
        continue;
    }
    list.add(line);
}
definition of isBranchNoValid:
//the array length ranges from 2 to 5 only
public static boolean isBranchNoValid(String[] validBranchNoArr, String branchNo) {
for (int i = 0; i < validBranchNoArr.length; i++) {
if (validBranchNoArr[i].equals(branchNo)) {
return true;
}
}
return false;
}
The validation is at line level (we have to filter out, i.e. skip, any line whose branchNo is not in the array). Earlier, this filtering wasn't done.
Now, severe performance degradation is troubling us.
I understand (maybe I'm wrong) that this repeated function call causes a lot of stack-frame creation, resulting in very frequent GC invocations.
I can't figure out a way (if it's even possible) to perform this filter without this high performance cost (a small difference is fine).
This is not a stack problem, for sure: your function is not recursive, so nothing is kept on the stack between calls; each call's frame is discarded as soon as it returns.
You could put the valid numbers in a Set for some optimization, but in your case I'm not sure it will bring any benefit at all, since you have at most 5 elements.
So there are several possible bottlenecks in your scenario:
1. reading the lines of the file
2. parsing each line to construct the object to insert into the database
3. checking the applicability of the object (i.e. the branch-no filter)
4. inserting into the DB
Generally you'd say I/O is the slowest, so 1. and 4. You're saying nothing except 3. changed, right? That is weird.
Anyway, if you want to optimize this, I wouldn't pass the array around 8 million times, and I wouldn't iterate it every time either. Since your valid branches are known up front, create a HashSet from them - it has O(1) lookups.
Set<String> validBranches = Arrays.stream(branches)
.collect(Collectors.toCollection(HashSet::new));
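(As a side note, since HashSet has a copying constructor, new HashSet<>(Arrays.asList(branches)) builds the same set without streams, if you prefer; Arrays.asList lives in java.util.)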
Then, iterate the lines
for (String line : lines) {
YourObject obj = parse(line);
if (validBranches.contains(obj.branchNo)) {
writeToDb(obj);
}
}
or, in the stream version
Files.lines(yourPath)
.map(this::parse)
.filter(o -> validBranches.contains(o.branchNo))
.forEach(this::writeToDb);
I'd also check whether it's more efficient to first collect a batch of objects and then write them to the DB in one go (see the sketch below). It's also possible that handling the lines in parallel gains some speed, in case the parsing is time-intensive.
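A rough sketch of what batching could look like with plain JDBC - connection, the SQL statement, and the target table are placeholders, not your real code:

// Hypothetical batching sketch: accumulate rows and flush every 1000.
try (PreparedStatement ps = connection.prepareStatement(
        "INSERT INTO target_table (branch_no, line) VALUES (?, ?)")) {
    int pending = 0;
    for (String line : lines) {
        YourObject obj = parse(line);
        if (!validBranches.contains(obj.branchNo)) {
            continue; // skip invalid branches before touching the DB
        }
        ps.setString(1, obj.branchNo);
        ps.setString(2, line);
        ps.addBatch();
        if (++pending == 1000) {
            ps.executeBatch(); // one round-trip for 1000 rows
            pending = 0;
        }
    }
    if (pending > 0) {
        ps.executeBatch(); // flush the remainder
    }
}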
I have a query with a result set of half a million records; for each record I create an object and add it to an ArrayList.
How can I optimize this operation to avoid memory issues? I'm currently getting an out-of-heap-space error.
This is a fragment of the code:
while (rs.next()) {
lista.add(sd.loadSabanaDatos_ResumenLlamadaIntervalo(rs));
}
public SabanaDatos loadSabanaDatos_ResumenLlamadaIntervalo(ResultSet rs)
{
SabanaDatos sabanaDatos = new SabanaDatos();
try {
sabanaDatos.setId(rs.getInt("id"));
sabanaDatos.setHora(rs.getString("hora"));
sabanaDatos.setDuracion(rs.getInt("duracion"));
sabanaDatos.setNavegautenticado(rs.getInt("navegautenticado"));
sabanaDatos.setIndicadorasesor(rs.getInt("indicadorasesor"));
sabanaDatos.setLlamadaexitosa(rs.getInt("llamadaexitosa"));
sabanaDatos.setLlamadanoexitosa(rs.getInt("llamadanoexitosa"));
sabanaDatos.setTipocliente(rs.getString("tipocliente"));
} catch (SQLException e) {
logger.info("dip.sabana.SabanaDatos SQLException : "+ e);
e.printStackTrace();
}
return sabanaDatos;
}
NOTE: The reason for using a list is that this is a critical system, and I can only make one call to the DB every 2 hours. I don't have permission to make more frequent calls, but I need to show data every 10 minutes. Example: the first query returns 10 rows, and I show 1 row per minute after the SQL query.
I don't have permission to create a local database, write files, or anything similar - just access to memory.
First of all, it is not good practice to read half a million objects into memory at once.
You can think about breaking the number of records to be read into smaller chunks.
As a solution, you can consider the following options:
1 - Use a CachedRowSet. It holds the same data as a ResultSet but disconnects afterwards; it is bad practice to keep a ResultSet open, since it ties up a database connection. If you copy everything into an ArrayList on top of that, you are again performing extra operations and using more memory.
For more info on CachedRowSet, see https://docs.oracle.com/javase/tutorial/jdbc/basics/cachedrowset.html
2 - You can think about using an in-memory database such as HSQLDB or H2. They are very lightweight and fast, provide a JDBC interface, and let you run SQL queries as well.
For an HSQLDB walkthrough, see https://www.tutorialspoint.com/hsqldb/
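To give a feel for the H2 option, here is a minimal sketch; the database name, table, and columns are made up, and it assumes the H2 jar is on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Opens an H2 database that lives purely in memory; DB_CLOSE_DELAY=-1
// keeps it alive for the life of the JVM, not just the first connection.
static Connection openStaging() throws SQLException {
    Connection conn = DriverManager.getConnection("jdbc:h2:mem:staging;DB_CLOSE_DELAY=-1");
    try (Statement st = conn.createStatement()) {
        st.execute("CREATE TABLE resumen (id INT, hora VARCHAR(8), duracion INT)");
    }
    return conn;
}

You could load the half-million rows into it after each 2-hourly query, then SELECT just the small slice you need every 10 minutes.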
It might help to intern the Strings, so that two occurrences of the same value share a single object.
import java.util.HashMap;
import java.util.Map;

public class StringCache {
    private Map<String, String> identityMap = new HashMap<>();

    public String cached(String s) {
        if (s == null) {
            return null;
        }
        String t = identityMap.get(s);
        if (t == null) {
            t = s;
            identityMap.put(t, t);
        }
        return t;
    }
}
StringCache horaMap = new StringCache();
StringCache tipoclienteMap = new StringCache();
sabanaDatos.setHora(horaMap.cached(rs.getString("hora")));
sabanaDatos.setTipocliente(tipoclienteMap.cached(rs.getString("tipocliente")));
Increasing the memory has already been mentioned.
A speed-up is also possible by using column numbers instead of names; if needed, get them from the column names once, before the loop (e.g. via rs.findColumn() or rs.getMetaData()).
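For example (rs.findColumn() is part of the standard ResultSet API; only the column names here come from the question):

// Resolve the column indices once, then use the int overloads in the loop.
int idCol = rs.findColumn("id");
int horaCol = rs.findColumn("hora");
while (rs.next()) {
    SabanaDatos sd = new SabanaDatos();
    sd.setId(rs.getInt(idCol));
    sd.setHora(horaMap.cached(rs.getString(horaCol)));
    // ... remaining columns likewise ...
}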
Option 1:
If you need all the items in the list at the same time, you need to increase the heap space of the JVM by adding the argument -Xmx2G, for example, when you launch the app (java -Xmx2G -jar yourApp.jar).
Option 2:
Divide the SQL into more than one call.
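A sketch of what splitting the query up could look like, assuming a java.sql.Connection named conn; paging syntax differs per database, so the LIMIT/OFFSET clause, table, and columns here are purely illustrative:

// Pull the data in pages instead of one half-million-row result set.
String sql = "SELECT id, hora, duracion FROM resumen ORDER BY id LIMIT ? OFFSET ?";
int pageSize = 50_000;
for (int offset = 0; ; offset += pageSize) {
    int rows = 0;
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setInt(1, pageSize);
        ps.setInt(2, offset);
        try (ResultSet page = ps.executeQuery()) {
            while (page.next()) {
                rows++;
                // process or display one row at a time
            }
        }
    }
    if (rows < pageSize) {
        break; // last page reached
    }
}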
Some of your options:
Use a local database, such as SQLite. That's a very lightweight database management system which is easy to install – you don't need any special privileges to do so – its data is held in a single file in a directory of your choice (such as the directory that holds your Java application) and can be used as an alternative to a large Java data structure such as a List.
If you really must use an ArrayList, make sure you take up as little space as possible. Try the following:
a. If you know the approximate number of rows, then construct your ArrayList with an appropriate initialCapacity to avoid reallocations. Estimate the maximum number of rows your database will grow to, and add another few hundred to your initialCapacity just in case.
b. Make sure your SabanaDatos objects are as small as they can be. For example, make sure the id field is an int and not an Integer. If the hora field is just a time of day, it can be held more efficiently in a short than in a String. Similarly for other fields, e.g. duracion - perhaps it can even fit into a byte, if its range allows? If you have several flag/Boolean fields, they can be packed into a single byte or short as bits (see the sketch after this list). If you have String fields with many repeated values, you can intern them as per Joop's suggestion.
c. If you still get out-of-memory errors, increase your heap space using the JVM flags -Xms and -Xmx.
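To make suggestion (b) concrete, here is an illustrative, untested slimmed-down version of the record class; the field names follow the question, everything else is an assumption:

// Packs the four flag columns into one byte and stores the time as
// minutes-since-midnight in a short instead of an "HH:mm" String.
public class SabanaDatosCompact {
    static final int NAVEGAUTENTICADO = 0, INDICADORASESOR = 1,
                     LLAMADAEXITOSA = 2, LLAMADANOEXITOSA = 3;

    private int id;
    private short hora;         // minutes since midnight
    private int duracion;
    private byte flags;         // one bit per flag column
    private String tipocliente; // intern/cache repeated values

    void setFlag(int bit, boolean value) {
        if (value) flags |= (1 << bit);
        else       flags &= ~(1 << bit);
    }

    boolean getFlag(int bit) {
        return (flags & (1 << bit)) != 0;
    }
}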
I'm not actually sure how to ask this question because it's very confusing. I have a Java app with an MVC structure, which gets data from a database. I retrieve a String ArrayList of data via a JDBC query. It contains information about competitors in a race (e.g. Name, Race_no, Start_Time, Swim_Time, Bike_Time, Finish_Time, etc.). The list size will vary week to week depending on the number of competitors who raced that week.

I have no problem getting the data from the database, but when I pass the information to the controller to pass on to the view, I'm having trouble assigning the data to JLabels. So far the data is sent as one large array, so I need to somehow split the array up into blocks of 12 (which is how many JLabels have to be set for each competitor). I then set each block of 12 JLabels into its own JPanel ArrayList, then dynamically add them to one JPanel for printing.

My question is: how do I split the ArrayList to get the first 12, then the second lot of 12, and so on? I have tried a nested loop with the size set to 12, but of course that only gets the first 12 every time. Maybe I need to store the data from the JDBC result set as something else. I really need some guidance on this, as I've been stuck for days.
NB: I had this working as one large method in the data handler class, where I used while (rs.next()) to do all the work, but because of the MVC structure I need to break the code up. This is the desired outcome:
EDIT:
I have implemented this code, which gives me the desired output, but I'm now having trouble assigning the JLabel variables with the data in the [j] loop:
public void getRaceLabels()
{
    ArrayList<String[]> raceLabels = dh.getRaceTimeLabels();
    for (int i = 0; i < raceLabels.size(); i++)
    {
        String[] element = raceLabels.get(i); // no cast needed with generics
        for (int j = 0; j < element.length; j++)
        {
            System.out.print(element[j] + " ,");
        }
        System.out.println("break");
    }
}
Create yourself a POJO which represents the basic information you need for a single record.
When loading the data from the database, load each row into one of these POJOs.
Now, for each panel, you just need to pass it a single object, meaning you can get away with using a simple iterator instead.
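For example, a sketch; the field names are guessed from your question (it mentions 12 values per competitor, of which only these six are named):

// Hypothetical POJO: one object per competitor instead of 12 loose strings.
public class Competitor {
    private final String name;
    private final int raceNo;
    private final String startTime, swimTime, bikeTime, finishTime;

    public Competitor(String name, int raceNo, String startTime,
                      String swimTime, String bikeTime, String finishTime) {
        this.name = name;
        this.raceNo = raceNo;
        this.startTime = startTime;
        this.swimTime = swimTime;
        this.bikeTime = bikeTime;
        this.finishTime = finishTime;
    }

    public String getName()       { return name; }
    public int getRaceNo()        { return raceNo; }
    public String getStartTime()  { return startTime; }
    public String getSwimTime()   { return swimTime; }
    public String getBikeTime()   { return bikeTime; }
    public String getFinishTime() { return finishTime; }
}

The data handler would then return a List<Competitor> built inside the while (rs.next()) loop, and the view iterates it once, filling one panel's labels per object - no splitting into blocks of 12 needed.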
Context
I've written a small Java app for basic testing of data migration from Oracle to Microsoft.
The app does the following things:
Queries the Oracle USER_TAB_COLUMNS view to gather details about each table and its fields.
Generates SELECT statements from the information gathered.
Runs the SELECT statements on both the Oracle and Microsoft versions of the database, saving the results as a String for each row within a Table object.
For each table, compares the rows to find discrepancies.
Outputs a text file for each table, listing the mismatched rows (for analysis).
Issue
The issue I'm having is in the comparison of the two lists of row Strings I have (Oracle rows and Microsoft rows).
For some tables, there can be almost a million rows of data. Though my current code can match 1000 Oracle rows against the Microsoft ones within a few seconds, the time adds up.
Current Attempts at Fixing Issue
Concatenating into a single 'row' String when reading in the data rather than during comparison. (Before, I had each field as its own String and concatenated them just before comparing.)
Breaking from the inner loop once match has been found for a row.
Removing 'oracleTable.getRows().size()' from the loop as to only perform this calculation once.
Ideas
Removing the row counter. (Would this make much of a difference? It's harder to observe the progress/speed without the counter, so it's hard to tell.)
Removing each matched Microsoft row from its List. (I thought it would be a good idea to remove the String from the list of Microsoft rows so that the same row isn't compared twice. I'm unsure whether this will add more processing than it saves, as it's difficult to remove from a List while iterating through it.)
Code
numRowsOracle = oracleTable.getRows().size();
numRowsMicrosoft = msTable.getRows().size();
int orRowCounter = 0;
boolean matched;

// Each Oracle row
for (String or : oracleTable.getRows()) {
    matched = false;
    orRowCounter++;
    if (orRowCounter % 1000 == 0) {
        System.out.println("Oracle Row: " + orRowCounter + " / " + numRowsOracle);
    }
    // Each Microsoft row
    for (String mr : msTable.getRows()) {
        if (mr.equalsIgnoreCase(or)) {
            matched = true;
            break;
        }
    }
    if (!matched) { // add row to the list of unmatched rows
        unmatchedRowStrings.add(or);
    }
}

// Writing report on table.
exportlogs.writeTableLog(td.getTableName(), unmatchedRowStrings.size(),
        unmatchedRowStrings, numRowsOracle, numRowsMicrosoft);
Any suggestions on how to speed this up? I'd accept ideas not only for speeding up the comparison of the two lists, but also for storing the data differently. I haven't used other kinds of String storage, such as HashMaps. Would something different be quicker?
This is untested, so take it with a grain of salt, especially if you're using non-ASCII characters.
You can make a lowercase (or uppercase) version of the data in a single pass and then use a HashSet to validate membership.
// make a single pass over oracle rows, so O(n)
Set<String> oracleLower = new HashSet<>();
for(String or : oracleTable.getRows()) {
oracleLower.add(or.toLowerCase());
}
// make a single pass over msft rows, so O(n)
Set<String> msftLower = new HashSet<>();
for(String ms : microsoftTable.getRows()) {
msftLower.add(ms.toLowerCase());
}
// make a single pass over oracle rows, again O(n)
for(String or : oracleLower) {
// backed by a hash table, this has a constant time lookup
if(!msftLower.contains(or)) {
unmatched.add(or);
}
}
Each operation is O(n), thanks to the hash table. This requires double the space requirements, though. Optimizations may be necessary to only make one collection lowercase (probably msft) and only make the other one (probably oracle) lowercase inside the loop - then it would be more like for(String or : oracleTable.getRows()) { or = or.toLowerCase(); if(!msftLower.contains(or)) { ... } }