I have a CSV file named abc.csv which contains data as follows:
All,Friday,0:00,315.06,327.92,347.24
All,Friday,1:00,316.03,347.73,370.55
and so on .....
I wish to import the data into Redis. How can I do this through the Java API?
Please suggest the steps to do this.
I wish to run the jar and have the data imported into the Redis DB.
Any help on mass insert would also be helpful in case the Java option is not possible.
You can do it by using Jedis (https://github.com/xetorthio/jedis), a Java client for Redis. Just create a class with a main method, create a connection, and set the keys.
void processData() throws IOException {
    List<String> lines = Files.readAllLines(
            Paths.get("/path/abc.csv"), Charset.forName("UTF-8"));
    Jedis connection = new Jedis("host", port);
    Pipeline p = connection.pipelined();
    for (String line : lines) {
        String key = getKey(line);     // derive the key from the CSV line
        String value = getValue(line); // derive the value from the CSV line
        p.set(key, value);
    }
    p.sync(); // send all queued commands and read the replies
}
If the file is big, you can create an InputStream out of it and read line by line instead of loading the whole file. You should then also call p.sync() in batches, as sketched below: just keep a counter and do a modulo with the batch size.
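For illustration, a minimal sketch of that streaming, batched variant, assuming a recent Jedis (where Jedis is Closeable), a Redis instance on localhost:6379, and the same hypothetical getKey/getValue helpers as above:

void processDataStreaming(Path csv) throws IOException {
    final int batchSize = 10000; // flush the pipeline every 10k commands
    try (BufferedReader reader = Files.newBufferedReader(csv, StandardCharsets.UTF_8);
         Jedis connection = new Jedis("localhost", 6379)) {
        Pipeline p = connection.pipelined();
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            p.set(getKey(line), getValue(line));
            if (++count % batchSize == 0) {
                p.sync(); // send this batch and drain the replies
            }
        }
        p.sync(); // flush the last partial batch
    }
}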
Redisson allows you to combine multiple commands into a single batch; this is Redis pipelining. Here is an example:
List<String> lines = ...
RBatch batch = redisson.createBatch();
// (a batched list could be used instead: RList<String> list = batch.getList("yourList");)
for (String line : lines) {
    String key = getKey(line);
    batch.getBucket(key).set(line);
}
// send all queued commands to Redis in a single round trip
batch.execute();
You can use the Jedis library. Here I have extended my Java object to provide the key and the hash to set. In your case, you can send the parsed CSV values and keys directly as a list.
private static final int BATCH_SIZE = 1000;

public void setRedisHashInBatch(List<? extends RedisHashMaker> objects) {
    try (Jedis jedis = jedisPool.getResource()) {
        log.info("Connection IP -{}", jedis.getClient().getSocket().getInetAddress());
        Pipeline p = jedis.pipelined();
        log.info("Setting Records in Cache. Size: " + objects.size());
        int j = 0;
        for (RedisHashMaker obj : objects) {
            p.hmset(obj.getUniqueKey(), obj.getRedisHashStringFromObject());
            j++;
            if (j == BATCH_SIZE) {
                p.sync(); // flush each full batch
                j = 0;
            }
        }
        p.sync(); // flush the remaining commands
    }
}
I want to fetch files from SFTP which were created after a given timestamp (the time of the last pull) in Java. I am using j2ssh as of now. Please let me know if some other API supports such a feature.
JSch supports the ls command, which brings back all the attributes of the remote files. You can write a little code there to filter down to the files you want to retrieve.
Java Doc: http://epaul.github.io/jsch-documentation/javadoc/
This example compares the remote file timestamps to find the oldest file; it wouldn't be much of a stretch to modify it to compare your last run date against the remote file date, then do the download as part of the loop (a sketch adapted to that case follows the code below).
Code from Finding file size and last modified of SFTP file using Java
try {
    list = Main.chanSftp.ls("*.xml");
    if (list.isEmpty()) {
        fileFound = false;
    } else {
        lsEntry = (ChannelSftp.LsEntry) list.firstElement();
        oldestFile = lsEntry.getFilename();
        attrs = lsEntry.getAttrs();
        currentOldestTime = attrs.getMTime();
        for (Object sftpFile : list) {
            lsEntry = (ChannelSftp.LsEntry) sftpFile;
            nextName = lsEntry.getFilename();
            attrs = lsEntry.getAttrs();
            nextTime = attrs.getMTime();
            if (nextTime < currentOldestTime) {
                oldestFile = nextName;
                currentOldestTime = nextTime;
            }
        }
    }
} catch (SftpException e) {
    e.printStackTrace();
}
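Adapting that idea to the question, here is a hedged sketch that downloads only files modified after the last pull. The names channel, lastPullEpochSeconds, and localDir are assumptions for an already connected ChannelSftp, the persisted time of the last run (in epoch seconds, the unit getMTime() returns), and the local download directory:

@SuppressWarnings("unchecked")
void pullNewFiles(ChannelSftp channel, long lastPullEpochSeconds, String localDir)
        throws SftpException {
    Vector<ChannelSftp.LsEntry> entries = channel.ls("*.xml");
    for (ChannelSftp.LsEntry entry : entries) {
        SftpATTRS attrs = entry.getAttrs();
        // getMTime() is the remote modification time in seconds since the epoch
        if (!attrs.isDir() && attrs.getMTime() > lastPullEpochSeconds) {
            channel.get(entry.getFilename(), localDir + "/" + entry.getFilename());
        }
    }
}

After a successful run, persist the current time (or the largest modification time seen) as the new last-pull timestamp.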
I am new to HDInsight and the Hadoop MapReduce concept. I am trying to merge multiple XML files into a single XML file using a MapReduce program. My intention is to merge each XML file into a destination XML file, prepending and appending the file name as start and end tags.
For example, the XML files below should be merged into the single XML shown below.
Input XML Files
<xml><a></a></xml>
<xml><b></b></xml>
<xml><c></c></xml>
Output XML File
<xml>
<File1Name><xml><a></a></xml></File1Name>
<File2Name><xml><b></b></xml></File2Name>
<File3Name><xml><c></c></xml></File3Name>
</xml>
Question 1: Is it possible to map one XML file to each mapper and create a key/value pair, with the key as the file name and the value as the XML content with the file name prepended and appended as start and end tags, and then have the reducer merge all the XMLs into a single context and output the XML shown above?
Question 2: How can I get the file name as the key in the mapper code?
Answer 1:
I don't suggest sending just a single XML file to a mapper unless the files are over 1 GB apiece. You can send a list of XML locations to your mapper and then, in your mapper code, open each location and extract the data into your output.
Answer 2:
If you are using Azure blob storage, you can list all the blobs in a container and assign them to the input split.
How to create your list of InputSplits:
ArrayList<InputSplit> ret = new ArrayList<InputSplit>();
/* Do this for each path we receive. Creates a directory of splits in this order:
   s = input path (s1,1),(s2,1)...(sN,1),(s1,2),(sN,2),(sN,3) etc. */
for (int i = numMinNameHashSplits; i <= Math.min(numMaxNameHashSplits, numNameHashSplits - 1); i++) {
    for (Path inputPath : inputPaths) {
        ret.add(new ParseDirectoryInputSplit(inputPath.toString(), i));
        System.out.println(i + " " + inputPath.toString());
    }
}
return ret;
    }
}
Once the List<InputSplit> is assembled, each InputSplit is handed to a RecordReader class where each key/value pair is read and then passed to the map task. The initialization of the RecordReader class uses the InputSplit, a string representing the location of a "folder" of invoices in blob storage, to return a list of all blobs within the folder (the blobs variable below). The Java code below demonstrates the creation of the record reader for each hash slot and the resulting list of blobs in that location.
public class ParseDirectoryFileNameRecordReader
        extends RecordReader<IntWritable, Text> {

    private int nameHashSlot;
    private int numNameHashSlots;
    private Path myDir;
    private Path currentPath;
    private Iterator<ListBlobItem> blobs;
    private int currentLocation;

    public void initialize(InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        myDir = ((ParseDirectoryInputSplit) split).getDirectoryPath();

        // getNameHashSlot tells us which slot this record reader is responsible for
        nameHashSlot = ((ParseDirectoryInputSplit) split).getNameHashSlot();

        // gets the total number of hash slots
        numNameHashSlots = getNumNameHashSplits(context.getConfiguration());

        // gets the input credentials for the storage account assigned to this record reader
        String inputCreds = getInputCreds(context.getConfiguration());

        // break the directory path apart to get the account name
        String[] authComponents = myDir.toUri().getAuthority().split("#");
        String accountName = authComponents[1].split("\\.")[0];
        String containerName = authComponents[0];
        String accountKey = Utils.returnInputkey(inputCreds, accountName);
        System.out.println("This mapper is assigned the following account:" + accountName);

        StorageCredentials creds = new StorageCredentialsAccountAndKey(accountName, accountKey);
        CloudStorageAccount account = new CloudStorageAccount(creds);
        CloudBlobClient client = account.createCloudBlobClient();
        CloudBlobContainer container = client.getContainerReference(containerName);

        blobs = container.listBlobs(myDir.toUri().getPath().substring(1) + "/", true,
                EnumSet.noneOf(BlobListingDetails.class), null, null).iterator();
        currentLocation = -1;
        return;
    }
Once initialized, the record reader is used to pass the next key to the map task. This is controlled by the nextKeyValue method, which is called each time the framework needs the next record. The Java code below demonstrates this.
    // This checks whether the next key/value is assigned to this task or to another mapper.
    // If it is assigned to this task, the location is passed to the mapper; otherwise return false.
    @Override
    public boolean nextKeyValue() throws IOException, InterruptedException {
        while (blobs.hasNext()) {
            ListBlobItem currentBlob = blobs.next();
            // Returns a number between 1 and the number of hash slots. If it matches the number
            // assigned to this mapper and its length is greater than 0, return the path to the map function.
            if (doesBlobMatchNameHash(currentBlob) && getBlobLength(currentBlob) > 0) {
                String[] pathComponents = currentBlob.getUri().getPath().split("/");
                String pathWithoutContainer =
                        currentBlob.getUri().getPath().substring(pathComponents[1].length() + 1);
                currentPath = new Path(myDir.toUri().getScheme(), myDir.toUri().getAuthority(), pathWithoutContainer);
                currentLocation++;
                return true;
            }
        }
        return false;
    }
The logic in the map function is then simply as follows, with inputStream containing the entire XML string:
Path inputFile = new Path(value.toString());
FileSystem fs = inputFile.getFileSystem(context.getConfiguration());
//Input stream contains all data from the blob in the location provided by Text
FSDataInputStream inputStream = fs.open(inputFile);
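To tie this back to the question, a hedged sketch of how the rest of the map method might finish the job, assuming the mapper's output types are Text/Text; the constant key and the tag-name handling are illustrative, not from the original code:

// Read the whole blob into a string.
StringBuilder sb = new StringBuilder();
BufferedReader br = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8));
String line;
while ((line = br.readLine()) != null) {
    sb.append(line);
}
br.close();

// Wrap the content in <FileName>...</FileName>; strip the extension so the tag is a valid XML name.
String fileName = inputFile.getName().replaceAll("\\.xml$", "");
String wrapped = "<" + fileName + ">" + sb + "</" + fileName + ">";

// Emit under a single constant key so one reducer receives every wrapped file and can
// write them all out between an opening and closing <xml> root element.
context.write(new Text("merged"), new Text(wrapped));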
Resources:
http://www.andrewsmoll.com/3-hacks-for-hadoop-and-hdinsight-clusters/ "Hack 3"
http://blogs.msdn.com/b/mostlytrue/archive/2014/04/10/merging-small-files-on-hdinsight.aspx
I'm building a Java application that connects to an MQQueueManager and extracts information about queues. I'm able to get data like QueueType, MaximumMessageLength and more. However, I also want the name of the cluster the queue might be in. There is no method on MQQueue that gives me this information. After searching the internet I found several things pointing in this direction, but no examples.
A part of my function that gives me the MaximumDepth is:
queueManager = makeConnection(host, portNo, qMgr, channelName);
queue = queueManager.accessQueue(queueName, CMQC.MQOO_INQUIRE);
maxQueueDepth = queue.getMaximumDepth();
(makeConnection is not shown here, it is the function that makes the actual connection to the QueueManager; I also left out the try/catch/finally for less clutter)
How do I get ClusterName and perhaps other data, that doesn't have a function like queue.getMaximumDepth()?
There are two ways to get information about a queue.
The API Inquire call gets the operational status of a queue. This includes things like the name the MQOpen call resolved to, or the depth if the queue is local. Much of the q.inquire functionality has been superseded by getter and setter methods on the queue. If you are not using the v8.0 client with the latest functionality, you are highly advised to upgrade; it can access all versions of queue manager.
The following code is from Getting and setting attribute values in WebSphere MQ classes for Java
// inquire on a queue
final static int MQIA_DEF_PRIORITY = 6;
final static int MQCA_Q_DESC = 2013;
final static int MQ_Q_DESC_LENGTH = 64;

int[] selectors = new int[2];
int[] intAttrs = new int[1];
byte[] charAttrs = new byte[MQ_Q_DESC_LENGTH];

selectors[0] = MQIA_DEF_PRIORITY;
selectors[1] = MQCA_Q_DESC;
queue.inquire(selectors, intAttrs, charAttrs);

System.out.println("Default Priority = " + intAttrs[0]);
System.out.println("Description : " + new String(charAttrs, 0));
For things that are not part of the API Inquire call, a PCF command is needed. Programmable Command Format, commonly abbreviated as PCF, is a message format used to pass messages to the command queue and for reading messages from the command queue, event queues and others.
To use a PCF command the calling application must be authorized with +put on SYSTEM.ADMIN.COMMAND.QUEUE and for +dsp on the object being inquired upon.
IBM provides sample code.
On Windows, please see: %MQ_FILE_PATH%\Tools\pcf\samples
In UNIX flavors, please see: /opt/mqm/samp/pcf/samples
The locations may vary depending on where MQ was installed.
Please see: Handling PCF messages with IBM MQ classes for Java. The following snippet is from the PCF_DisplayActiveLocalQueues.java sample program.
public static void DisplayActiveLocalQueues(PCF_CommonMethods pcfCM)
        throws PCFException, MQDataException, IOException {

    // Create the PCF message type for the inquire.
    PCFMessage pcfCmd = new PCFMessage(MQConstants.MQCMD_INQUIRE_Q);

    // Add the inquire rules.
    // Queue name = wildcard.
    pcfCmd.addParameter(MQConstants.MQCA_Q_NAME, "*");

    // Queue type = LOCAL.
    pcfCmd.addParameter(MQConstants.MQIA_Q_TYPE, MQConstants.MQQT_LOCAL);

    // Queue depth filter = "WHERE depth > 0".
    pcfCmd.addFilterParameter(MQConstants.MQIA_CURRENT_Q_DEPTH, MQConstants.MQCFOP_GREATER, 0);

    // Execute the command. The returned object is an array of PCF messages.
    PCFMessage[] pcfResponse = pcfCM.agent.send(pcfCmd);

    // For each returned message, extract the message from the array and display the
    // required information.
    System.out.println("+-----+------------------------------------------------+-----+");
    System.out.println("|Index|                   Queue Name                   |Depth|");
    System.out.println("+-----+------------------------------------------------+-----+");

    for (int index = 0; index < pcfResponse.length; index++) {
        PCFMessage response = pcfResponse[index];
        System.out.println("|"
                + (index + pcfCM.padding).substring(0, 5)
                + "|"
                + (response.getParameterValue(MQConstants.MQCA_Q_NAME) + pcfCM.padding).substring(0, 48)
                + "|"
                + (response.getParameterValue(MQConstants.MQIA_CURRENT_Q_DEPTH) + pcfCM.padding).substring(0, 5)
                + "|");
    }

    System.out.println("+-----+------------------------------------------------+-----+");
    return;
    }
}
After more research I finally found what I was looking for.
This example of IBM: Getting and setting attribute values in WebSphere MQ classes helped me to set up the inquiry.
The necessary values I found in this list: Constant Field Values.
I also needed to expand the openOptionsArg of accessQueue(), otherwise cluster queues cannot be inquired.
Final result:
(without makeConnection())
public class QueueManagerServices {
    final static int MQOO_INQUIRE_TOTAL =
            CMQC.MQOO_FAIL_IF_QUIESCING | CMQC.MQOO_INPUT_SHARED | CMQC.MQOO_INQUIRE;

    MQQueueManager queueManager = null;
    String cluster = null;
    MQQueue queue = null;

    public String getcluster(String host, int portNo, String qMgr, String channelName, String queueName) {
        try {
            queueManager = makeConnection(host, portNo, qMgr, channelName);
            queue = queueManager.accessQueue(queueName, MQOO_INQUIRE_TOTAL);

            int MQCA_CLUSTER_NAME = 2029;
            int MQ_CLUSTER_NAME_LENGTH = 48;

            int[] selectors = new int[1];
            int[] intAttrs = new int[1];
            byte[] charAttrs = new byte[MQ_CLUSTER_NAME_LENGTH];

            selectors[0] = MQCA_CLUSTER_NAME;
            queue.inquire(selectors, intAttrs, charAttrs);
            cluster = new String(charAttrs);
        } catch (MQException e) {
            System.out.println(e);
        } finally {
            try {
                if (queue != null) {
                    queue.close();
                }
                if (queueManager != null) {
                    queueManager.disconnect();
                }
            } catch (MQException e) {
                System.out.println(e);
            }
        }
        return cluster;
    }
}
I am trying to take a very long file of strings and convert it to XML according to a schema I was given. I used JAXB to create classes from that schema. Since the file is very large, I created a thread pool to improve performance, but since then each thread only processes one line of the file and marshals it to the XML file.
Below is my home class, where I read from the file. Each line is a record of a transaction; for every new user encountered, a list is made to store all of that user's transactions, and each list is put into a HashMap. I made it a ConcurrentHashMap because multiple threads will work on the map simultaneously; is this the correct thing to do?
After the lists are created, a thread is made for each user. Each thread runs the ProcessCommands method below and receives from home the list of transactions for its user.
public class home{
public static File XMLFile = new File("LogFile.xml");
Map<String,List<String>> UserMap= new ConcurrentHashMap<String,List<String>>();
String[] UserNames = new String[5000];
int numberOfUsers = 0;
try{
BufferedReader reader = new BufferedReader(new FileReader("test.txt"));
String line;
while ((line = reader.readLine()) != null)
{
parsed = line.split(",|\\s+");
if(!parsed[2].equals("./testLOG")){
if(Utilities.checkUserExists(parsed[2], UserNames) == false){ //User does not already exist
System.out.println("New User: " + parsed[2]);
UserMap.put(parsed[2],new ArrayList<String>()); //Create list of transactions for new user
UserMap.get(parsed[2]).add(line); //Add First Item to new list
UserNames[numberOfUsers] = parsed[2]; //Add new user
numberOfUsers++;
}
else{ //User Already Existed
UserMap.get(parsed[2]).add(line);
}
}
}
reader.close();
} catch (IOException x) {
System.err.println(x);
}
//get start time
long startTime = new Date().getTime();
tCount = numberOfUsers;
ExecutorService threadPool = Executors.newFixedThreadPool(tCount);
for(int i = 0; i < numberOfUsers; i++){
System.out.println("Starting Thread " + i + " for user " + UserNames[i]);
Runnable worker = new ProcessCommands(UserMap.get(UserNames[i]),UserNames[i], XMLfile);
threadPool.execute(worker);
}
threadPool.shutdown();
while(!threadPool.isTerminated()){
}
System.out.println("Finished all threads");
}
Here is the ProcessCommands class. The thread receives the list for its user and creates a marshaller. From what I understand, marshalling is not thread safe, so it is best to create one marshaller per thread; is this the best way to do that?
When I create the marshallers, I know that each thread will want to access the created file, causing conflicts, so I used synchronized; is that correct?
As the thread iterates through its list, each line calls for a certain case. There are a lot, so I just made pseudo-cases for clarity. Each case calls the function below.
public class ProcessCommands implements Runnable{
private static final boolean DEBUG = false;
private List<String> list = null;
private String threadName;
private File XMLfile = null;
public Thread myThread;
public ProcessCommands(List<String> list, String threadName, File XMLfile){
this.list = list;
this.threadName = threadName;
this.XMLfile = XMLfile;
}
public void run(){
Date start = null;
int transactionNumber = 0;
String[] parsed = new String[8];
String[] quoteParsed = null;
String[] universalFormatCommand = new String[9];
String userCommand = null;
Connection connection = null;
Statement stmt = null;
Map<String, UserObject> usersMap = null;
Map<String, Stack<BLO>> buyMap = null;
Map<String, Stack<SLO>> sellMap = null;
Map<String, QLO> stockCodeMap = null;
Map<String, BTO> buyTriggerMap = null;
Map<String, STO> sellTriggerMap = null;
Map<String, USO> usersStocksMap = null;
String SQL = null;
int amountToAdd = 0;
int tempDollars = 0;
UserObject tempUO = null;
BLO tempBLO = null;
SLO tempSLO = null;
Stack<BLO> tempStBLO = null;
Stack<SLO> tempStSLO = null;
BTO tempBTO = null;
STO tempSTO = null;
USO tempUSO = null;
QLO tempQLO = null;
String stockCode = null;
String quoteResponse = null;
int usersDollars = 0;
int dollarAmountToBuy = 0;
int dollarAmountToSell = 0;
int numberOfSharesToBuy = 0;
int numberOfSharesToSell = 0;
int quoteStockInDollars = 0;
int shares = 0;
Iterator<String> itr = null;
int transactionCount = list.size();
System.out.println("Starting "+threadName+" - listSize = "+transactionCount);
//UO dollars, reserved
usersMap = new HashMap<String, UserObject>(3); //userName -> UO
//USO shares
usersStocksMap = new HashMap<String, USO>(); //userName+stockCode -> shares
//BLO code, timestamp, dollarAmountToBuy, stockPriceInDollars
buyMap = new HashMap<String, Stack<BLO>>(); //userName -> Stack<BLO>
//SLO code, timestamp, dollarAmountToSell, stockPriceInDollars
sellMap = new HashMap<String, Stack<SLO>>(); //userName -> Stack<SLO>
//BTO code, timestamp, dollarAmountToBuy, stockPriceInDollars
buyTriggerMap = new ConcurrentHashMap<String, BTO>(); //userName+stockCode -> BTO
//STO code, timestamp, dollarAmountToBuy, stockPriceInDollars
sellTriggerMap = new HashMap<String, STO>(); //userName+stockCode -> STO
//QLO timestamp, stockPriceInDollars
stockCodeMap = new HashMap<String, QLO>(); //stockCode -> QLO
//create user object and initialize stacks
usersMap.put(threadName, new UserObject(0, 0));
buyMap.put(threadName, new Stack<BLO>());
sellMap.put(threadName, new Stack<SLO>());
try {
//Marshaller marshaller = getMarshaller();
synchronized (this){
Marshaller marshaller = init.jc.createMarshaller();
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
marshaller.setProperty(Marshaller.JAXB_FRAGMENT, true);
marshaller.marshal(LogServer.Root,XMLfile);
marshaller.marshal(LogServer.Root,System.out);
}
} catch (JAXBException M) {
M.printStackTrace();
}
Date timing = new Date();
//universalFormatCommand = new String[8];
parsed = new String[8];
//iterate through workload file
itr = this.list.iterator();
while(itr.hasNext()){
userCommand = (String) itr.next();
itr.remove();
parsed = userCommand.split(",|\\s+");
transactionNumber = Integer.parseInt(parsed[0].replaceAll("\\[", "").replaceAll("\\]", ""));
universalFormatCommand = Utilities.FormatCommand(parsed, parsed[0]);
if(transactionNumber % 100 == 0){
System.out.println(this.threadName + " - " +transactionNumber+ " - "+(new Date().getTime() - timing.getTime())/1000);
}
/*System.out.print("UserCommand " +transactionNumber + ": ");
for(int i = 0;i<8;i++)System.out.print(universalFormatCommand[i]+ " ");
System.out.print("\n");*/
//switch for user command
switch (parsed[1].toLowerCase()) {
case "One"
*Do Stuff"
LogServer.create_Log(universalFormatCommand, transactionNumber, CommandType.ADD);
break;
case "Two"
*Do Stuff"
LogServer.create_Log(universalFormatCommand, transactionNumber, CommandType.ADD);
break;
}
}
}
The function create_Log has multiple cases, so as before, for clarity I just left one. The case "QUOTE" only calls one object-creation function, but other cases can create multiple objects. The type 'log' is a complex XML type that defines all the other object types, so in each call to create_Log I create a log type called Root. The class 'log' generated by JAXB includes a function to create a list of objects. The statement:
Root.getUserCommandOrQuoteServerOrAccountTransaction().add(quote_QuoteType);
takes the root element I created, creates a list, and adds the newly created object 'quote_QuoteType' to that list. Before I added threading, this method successfully created a list of as many objects as I wanted and then marshalled them. So I'm pretty positive the bit in class 'LogServer' is not the issue. It is something to do with the marshalling and synchronization in the ProcessCommands class above.
public class LogServer{
public static log Root = new log();
public static QuoteServerType Log_Quote(String[] input, int TransactionNumber){
ObjectFactory factory = new ObjectFactory();
QuoteServerType quoteCall = factory.createQuoteServerType();
// ...populate the QuoteServerType object called quoteCall...
return quoteCall;
}
public static void create_Log(String[] input, int TransactionNumber, CommandType Command){
System.out.print("TRANSACTION "+TransactionNumber + " is " + Command + ": ");
for(int i = 0; i<input.length;i++) System.out.print(input[i] + " ");
System.out.print("\n");
switch(input[1]){
case "QUOTE":
System.out.print("QUOTE CASE");
QuoteServerType quote_QuoteType = Log_Quote(input,TransactionNumber);
Root.getUserCommandOrQuoteServerOrAccountTransaction().add(quote_QuoteType);
break;
}
}
So you wrote a lot of code, but have you checked whether it actually works? After a quick look I doubt it. You should test your code logic part by part, not go all the way to the end. It seems you are just starting with Java. I would recommend practicing first on simple single-threaded applications. Sorry if I sound harsh, but I will try to be constructive as well:
Per convention, class names start with a capital letter and variable names with a lowercase one; you do it the other way around.
You should make a method in your home (Home) class rather than putting all your code in the static block.
You are reading the whole file into memory; you do not process it line by line. After home is initialized, literally the whole content of the file will be under the UserMap variable. If the file is really large you will run out of heap memory. If you expect large files you cannot do it this way and have to redesign your app to store partial results somewhere. If your file is smaller than memory you could keep it like that (but you said it is large).
There is no need for UserNames; UserMap.containsKey will do the job.
Your thread pool size should be in the range of your core count, not the number of users, or you will get thread thrashing (if you have blocking operations in your code, make tCount = 2 * processors; if not, keep it at the number of processors). Once one ProcessCommands finishes, the executor will start another one until you have finished them all, and you will be using all your processor cores efficiently.
DO NOT busy-wait with while(!threadPool.isTerminated()); this will completely consume one processor because it checks constantly. Call awaitTermination instead (see the sketch after this list).
Your ProcessCommands has a few map variables that will only ever hold one entry because, as you said, each thread processes data from one user.
The synchronized (this) in ProcessCommands will not work, as each thread synchronizes on a different object (a different instance of ProcessCommands).
I believe creating a marshaller from the shared JAXBContext is thread safe (check it), so there is no need for the synchronization at all.
You save your log (whatever it is) before you have done the actual processing of the transaction lists.
The marshalling will overwrite the content of the file with the current state of LogServer.Root. If it is shared between your ProcessCommands instances (it seems so), what is the point of saving it in each thread? Do it once you are finished.
You don't need itr.remove();
The log class (for the Root variable!) needs to be thread safe, as all the threads will call operations on it (so the list inside the log class must be a concurrent list, etc.).
And so on.....
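As mentioned in the list above, a minimal sketch of the pool sizing and shutdown pattern (the 30-minute timeout is an arbitrary placeholder; TimeUnit is java.util.concurrent.TimeUnit):

int cores = Runtime.getRuntime().availableProcessors();
ExecutorService threadPool = Executors.newFixedThreadPool(cores); // size by cores, not by user count
// ... submit the ProcessCommands workers here as before ...
threadPool.shutdown(); // stop accepting new tasks; already submitted tasks keep running
try {
    // Block until all workers finish (or the timeout expires) instead of spin-waiting.
    if (!threadPool.awaitTermination(30, TimeUnit.MINUTES)) {
        System.err.println("Workers did not finish in time, forcing shutdown");
        threadPool.shutdownNow();
    }
} catch (InterruptedException e) {
    threadPool.shutdownNow();
    Thread.currentThread().interrupt();
}
System.out.println("Finished all threads");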
I would recommend that you:
Start with a simple single-threaded version that actually works.
Deal with processing line by line (store results for each user in a different file; you can keep a cache with transactions for recently used users so you are not writing to the disk all the time, see Guava cache).
Process each user's transactions into your user log objects with multiple threads (again, if there are a lot you have to save them to disk rather than keep them all in memory).
Write code that combines the logs from different users into one (again, you may want to do this multithreaded, though it will be mostly IO operations, so there is not much gain and it is trickier to do).
Good luck
I am getting a Java heap space error while writing large data from a database to an Excel sheet.
I don't want to use the JVM -Xmx option to increase memory.
Following are the details:
1) I am using the org.apache.poi.hssf API for the Excel sheet writing.
2) JDK version 1.5
3) Tomcat 6.0
The code I have written works well for around 23 thousand records, but it fails for more than 23K records.
Following is the code:
ArrayList l_objAllTBMList= new ArrayList();
l_objAllTBMList = (ArrayList) m_objFreqCvrgDAO.fetchAllTBMUsers(p_strUserTerritoryId);
ArrayList l_objDocList = new ArrayList();
m_objTotalDocDtlsInDVL= new HashMap();
Object l_objTBMRecord[] = null;
Object l_objVstdDocRecord[] = null;
int l_intDocLstSize=0;
VisitedDoctorsVO l_objVisitedDoctorsVO=null;
int l_tbmListSize=l_objAllTBMList.size();
System.out.println(" getMissedDocDtlsList_NSM ");
for(int i=0; i<l_tbmListSize;i++)
{
l_objTBMRecord = (Object[]) l_objAllTBMList.get(i);
l_objDocList = (ArrayList) m_objGenerateVisitdDocsReportDAO.fetchAllDocDtlsInDVL_NSM((String) l_objTBMRecord[1], p_divCode, (String) l_objTBMRecord[2], p_startDt, p_endDt, p_planType, p_LMSValue, p_CycleId, p_finYrId);
l_intDocLstSize=l_objDocList.size();
try {
l_objVOFactoryForDoctors = new VOFactory(l_intDocLstSize, VisitedDoctorsVO.class);
/* Factory class written to create and maintain limited no of Value Objects (VOs)*/
} catch (ClassNotFoundException ex) {
m_objLogger.debug("DEBUG:getMissedDocDtlsList_NSM :Exception:"+ex);
} catch (InstantiationException ex) {
m_objLogger.debug("DEBUG:getMissedDocDtlsList_NSM :Exception:"+ex);
} catch (IllegalAccessException ex) {
m_objLogger.debug("DEBUG:getMissedDocDtlsList_NSM :Exception:"+ex);
}
for(int j=0; j<l_intDocLstSize;j++)
{
l_objVstdDocRecord = (Object[]) l_objDocList.get(j);
l_objVisitedDoctorsVO = (VisitedDoctorsVO) l_objVOFactoryForDoctors.getVo();
if (((String) l_objVstdDocRecord[6]).equalsIgnoreCase("-"))
{
if (String.valueOf(l_objVstdDocRecord[2]) != "null")
{
l_objVisitedDoctorsVO.setPotential_score(String.valueOf(l_objVstdDocRecord[2]));
l_objVisitedDoctorsVO.setEmpcode((String) l_objTBMRecord[1]);
l_objVisitedDoctorsVO.setEmpname((String) l_objTBMRecord[0]);
l_objVisitedDoctorsVO.setDoctorid((String) l_objVstdDocRecord[1]);
l_objVisitedDoctorsVO.setDr_name((String) l_objVstdDocRecord[4] + " " + (String) l_objVstdDocRecord[5]);
l_objVisitedDoctorsVO.setDoctor_potential((String) l_objVstdDocRecord[3]);
l_objVisitedDoctorsVO.setSpeciality((String) l_objVstdDocRecord[7]);
l_objVisitedDoctorsVO.setActualpractice((String) l_objVstdDocRecord[8]);
l_objVisitedDoctorsVO.setLastmet("-");
l_objVisitedDoctorsVO.setPreviousmet("-");
m_objTotalDocDtlsInDVL.put((String) l_objVstdDocRecord[1], l_objVisitedDoctorsVO);
}
}
}// End of While
writeExcelSheet(); // Pasting this method at the end
// Clean up code
l_objVOFactoryForDoctors.resetFactory();
m_objTotalDocDtlsInDVL.clear();// Clear the used map
l_objDocList=null;
l_objTBMRecord=null;
l_objVstdDocRecord=null;
}// End of While
l_objAllTBMList=null;
m_objTotalDocDtlsInDVL=null;
-------------------------------------------------------------------
private void writeExcelSheet() throws IOException
{
HSSFRow l_objRow = null;
HSSFCell l_objCell = null;
VisitedDoctorsVO l_objVisitedDoctorsVO = null;
Iterator l_itrDocMap = m_objTotalDocDtlsInDVL.keySet().iterator();
while (l_itrDocMap.hasNext())
{
Object key = l_itrDocMap.next();
l_objVisitedDoctorsVO = (VisitedDoctorsVO) m_objTotalDocDtlsInDVL.get(key);
l_objRow = m_objSheet.createRow(m_iRowCount++);
l_objCell = l_objRow.createCell(0);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(String.valueOf(l_intSrNo++));
l_objCell = l_objRow.createCell(1);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getEmpname() + " (" + l_objVisitedDoctorsVO.getEmpcode() + ")"); // TBM Name
l_objCell = l_objRow.createCell(2);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getDr_name());// Doc Name
l_objCell = l_objRow.createCell(3);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getPotential_score());// Freq potential score
l_objCell = l_objRow.createCell(4);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getDoctor_potential());// Freq potential score
l_objCell = l_objRow.createCell(5);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getSpeciality());//CP_GP_SPL
l_objCell = l_objRow.createCell(6);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getActualpractice());// Actual practise
l_objCell = l_objRow.createCell(7);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getPreviousmet());// Lastmet
l_objCell = l_objRow.createCell(8);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getLastmet());// Previousmet
}
// Write OutPut Stream
try {
out = new FileOutputStream(m_objFile);
outBf = new BufferedOutputStream(out);
m_objWorkBook.write(outBf);
} catch (Exception ioe) {
ioe.printStackTrace();
System.out.println(" Exception in chunk write");
} finally {
if (outBf != null) {
outBf.flush();
outBf.close();
out.close();
l_objRow=null;
l_objCell=null;
}
}
}
Instead of populating the complete list in memory before starting to write to Excel, you need to modify the code so that each object is written to the file as it is read from the database. Take a look at this question to get some idea of the other approach.
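A rough sketch of that shape against the code in the question, reusing its own names (m_objSheet, m_objCellStyle4, l_objDocList and so on); the column indexes are illustrative. Note that HSSF still keeps the whole workbook in memory, so this removes the intermediate map but you may still need chunking or a different format, as the next answer suggests:

for (int j = 0; j < l_intDocLstSize; j++) {
    Object[] l_objVstdDocRecord = (Object[]) l_objDocList.get(j);
    // Write the row as soon as the record is read; nothing is cached in a map,
    // so no VisitedDoctorsVO is created and nothing goes into m_objTotalDocDtlsInDVL.
    HSSFRow l_objRow = m_objSheet.createRow(m_iRowCount++);
    HSSFCell l_objCell = l_objRow.createCell(0);
    l_objCell.setCellStyle(m_objCellStyle4);
    l_objCell.setCellValue(String.valueOf(l_intSrNo++));
    l_objCell = l_objRow.createCell(1);
    l_objCell.setCellStyle(m_objCellStyle4);
    l_objCell.setCellValue((String) l_objVstdDocRecord[4] + " " + (String) l_objVstdDocRecord[5]); // doctor name
    // ...create the remaining cells the same way as in writeExcelSheet()...
}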
Well, I'm not sure whether POI can handle incremental updates, but if so you might want to write chunks of, say, 10,000 rows to the file. If not, you might have to use CSV instead (so no formatting) or increase memory.
The problem is that you need to make the objects already written to the file eligible for garbage collection (no references from a live thread anymore) before writing of the file is finished (before all rows have been generated and written to the file).
Edit:
If you can write smaller chunks of data to the file, you will also have to load only the necessary chunks from the DB. It doesn't make sense to load 50,000 records at once and then try to write 5 chunks of 10,000, since those 50,000 records are likely to consume a lot of memory already.
As Thomas points out, you have too many objects taking up too much space and need a way to reduce that. There are a couple of strategies for this I can think of:
Do you need to create a new factory each time in the loop, or can you reuse it?
Can you start with a loop getting the information you need into a new structure, and then discarding the old one?
Can you split the processing into a thread chain, sending information forwards to the next step, avoiding building a large memory consuming structure at all?