Process Large Files in Java - java

I need to process records from a text file and insert them into, or update them in, a table. The code I have written is below. With 50,000 records in the file it takes more than 30 minutes to process them, and with close to 80,000 records an out-of-memory error is thrown. Can anyone suggest a way to optimize this code to improve performance?
public static String insertIntoCHG_PNT_Table(String FILE_NAME) throws NumberFormatException, IOException
{
Date DATE_INSERTED = new Date();
String strLine = "";
FileReader fr = new FileReader(FILE_NAME);
BufferedReader br = new BufferedReader(fr);
long SEQ = 0;
double consumption = 1;
String returnString = "";
CHG_PNT insertObj = null;
long KY_PREM_NO = 0;
long KY_SPT = 0;
String COD_COT_TYP = "";
String DT_EFF = "";
String TS_KY_TOT = "";
String COD_COT = "";
String ACL_VLE = "";
String ACL_QTY = "";
String WTR_VLE = "";
String WTR_QTY = "";
String SWG_VLE = "";
String SWG_QTY = "";
String CD_TYPE_ACT = "";
String DT_TERM = "";
String CD_STAT = "";
String DT_STAT = "";
String VLN_PPE_SIZ_COD = "";
String WTR_PPE_SIZ_MTD = "";
String SWG_PPE_SIZ_MTD = "";
while( (strLine = br.readLine()) != null){
/*
* Meter Serial No, Property No, Current Meter Index, Previous meter index, Consumption needs to be added
*
*
*/
String[] split = strLine.split("\\;");
KY_PREM_NO = Long.parseLong(split[0].trim());
KY_SPT = Long.parseLong(split[1].trim());
COD_COT_TYP = split[2].trim();
DT_EFF = split[3].trim();
TS_KY_TOT = split[4].trim();
COD_COT = split[5].trim();
ACL_VLE = split[6].trim();
ACL_QTY = split[7].trim();
WTR_VLE = split[8].trim();
WTR_QTY = split[9].trim();
SWG_VLE = split[10].trim();
SWG_QTY = split[11].trim();
CD_TYPE_ACT = split[12].trim();
DT_TERM = split[13].trim();
CD_STAT = split[14].trim();
DT_STAT = split[15].trim();
VLN_PPE_SIZ_COD = split[16].trim();
WTR_PPE_SIZ_MTD = split[17].trim();
SWG_PPE_SIZ_MTD = split[18].trim();
long counter = 0;
long newCounter = 0;
CHG_PNT checkRecordCount = null;
checkRecordCount = checkAndUpdateRecord(KY_PREM_NO,KY_SPT,COD_COT_TYP,TS_KY_TOT);
try {
if(checkRecordCount == null)
insertObj = new CHG_PNT();
else
insertObj = checkRecordCount;
insertObj.setKY_PREM_NO(KY_PREM_NO);
//insertObj.setSEQ_NO(SEQ);
insertObj.setKY_SPT(KY_SPT);
insertObj.setCOD_COT_TYP(COD_COT_TYP);
insertObj.setDT_EFF(DT_EFF);
insertObj.setTS_KY_TOT(TS_KY_TOT);
insertObj.setCOD_COT(COD_COT);
insertObj.setACL_VLE(Double.parseDouble(ACL_VLE));
insertObj.setACL_QTY(Double.parseDouble(ACL_QTY));
insertObj.setWTR_VLE(Double.parseDouble(WTR_VLE));
insertObj.setWTR_QTY(Double.parseDouble(WTR_QTY));
insertObj.setSWG_VLE(Double.parseDouble(SWG_VLE));
insertObj.setSWG_QTY(Double.parseDouble(SWG_QTY));
insertObj.setCD_TYPE_ACT(CD_TYPE_ACT);
insertObj.setDT_TERM(DT_TERM);
insertObj.setCD_STAT(Double.parseDouble(CD_STAT));
insertObj.setDT_STAT(DT_STAT);
insertObj.setVLN_PPE_SIZ_COD(VLN_PPE_SIZ_COD);
insertObj.setWTR_PPE_SIZ_MTD(WTR_PPE_SIZ_MTD);
insertObj.setSWG_PPE_SIZ_MTD(SWG_PPE_SIZ_MTD);
insertObj.setDATE_INSERTED(DATE_INSERTED);
if(checkRecordCount == null)
{
insertObj.setDATE_INSERTED(DATE_INSERTED);
insertObj.insert();
}
else
{
insertObj.setDATE_MODIFIED(DATE_INSERTED);
insertObj.update();
}
BSF.getObjectManager()._commitTransactionDirect(true);
}catch(Exception e)
{
String abc = e.getMessage();
}
}
fr.close();
br.close();
String localPath = FILE_NAME;
File f = new File(FILE_NAME);
String fullPath = f.getParent();
String fileName = f.getName();
String SubStr1 = new String("Processing");
int index = fullPath.lastIndexOf(SubStr1);
String path = fullPath.substring(0, index);
String destPath = path+"\\Archive\\"+fileName;
PMP_PROPERTIES.copyFile(new File(localPath),new File(destPath));
File file = new File(FILE_NAME);
file.delete();
return null;
}

There are two main problems. The first is a performance problem and, contrary to your intuition, the bottleneck is the database insertion speed.
You are committing each record in a separate transaction. You should not do that if you want your inserts to be quick. Introduce a counter variable and commit only after every N inserts, plus once at the end for the remaining records.
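int commitStep = 100;
int modCount = 0;
while ((strLine = br.readLine()) != null) {
    //... your code, without the per-record commit
    modCount++;
    if (modCount % commitStep == 0) {
        BSF.getObjectManager()._commitTransactionDirect(true);
    }
}
// commit the records of the last, partial batch
if (modCount % commitStep != 0) {
    BSF.getObjectManager()._commitTransactionDirect(true);
}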
You can read more about speeding up SQL inserts in the question "Sql insert speed up".
The second problem is, possibly, file reading scalability. It will work for smaller files, but not for larger ones. The question "Read large files in Java" has some good answers to your problem.
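As a minimal sketch of the commit-every-N idea, the following reads the input line by line and runs a commit callback once per full batch and once more for the leftover records. `BatchedLoader`, `loadInBatches` and the two callbacks are hypothetical names used only for illustration; in the code from the question, the commit callback would wrap `BSF.getObjectManager()._commitTransactionDirect(true)` and the line handler would contain the parsing and insert/update logic.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper, not part of the asker's framework.
class BatchedLoader {

    /**
     * Reads lines one at a time, hands each to handleLine, and runs commit
     * after every commitStep lines and once more for a trailing partial batch.
     * Returns the number of commits performed.
     */
    static int loadInBatches(Reader source, int commitStep,
                             Consumer<String> handleLine, Runnable commit) {
        int pending = 0;
        int commits = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                handleLine.accept(line);      // insert or update one record
                pending++;
                if (pending == commitStep) {  // a full batch is ready
                    commit.run();
                    commits++;
                    pending = 0;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (pending > 0) {                    // commit whatever is left over
            commit.run();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        StringBuilder data = new StringBuilder();
        for (int i = 0; i < 250; i++) {
            data.append("record;").append(i).append('\n');
        }
        int commits = loadInBatches(new StringReader(data.toString()), 100,
                line -> { /* parse the line and insert/update here */ },
                () -> { /* commit the open transaction here */ });
        System.out.println("commits: " + commits);
    }
}
```

Because only one line is held in memory at a time, the reading side of this loop does not grow with the file size; the batch size of 100 is just a starting point to tune.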

Related

java:get different parts of two similar strings

I want to get the differing parts of two similar strings.
Examples:
1. String1="bjsqzctjjzxyxgs", String2="bjqzctjjxxzxyxgs", result: String[] s3 = {"s","xx"}
2. String1="bjssdwxxjsyxgs", String2="bjsdwxxjsyxgs", result: String[] s3 = {"s"}
3. String1="bjydcrwljskjyxgs", String2="bjydcrjswlkjyxgs", result: String[] s3 = {"wljs","jswl"}
For example 1, I convert the strings to char arrays and can then get the strings "sqzctjj" and "qzctjjxx", but I can't get the result shown in the example, {"s","xx"}.
I hope someone can point me in the right direction.
Thanks.
public static Map<String,String> getPreviousStrAndLastStr(String shortCompanyName, String shortSelectCompanyName){
char[] shortCompanyNameCharArray = (shortCompanyName).toCharArray();
char[] shortSelectCompanyNameCharArray = shortSelectCompanyName.toCharArray();
Map<String,String> map = new HashMap<String,String>();
int cirNum = 0;
int previousDifIndex = 0;
int lastDifIndex = 0;
String longSplitStr = "";
String shortSplitStr = "";
if(shortCompanyNameCharArray.length>shortSelectCompanyNameCharArray.length){
cirNum = shortSelectCompanyNameCharArray.length;
longSplitStr = shortCompanyName;
shortSplitStr = shortSelectCompanyName;
}else{
cirNum = shortCompanyNameCharArray.length;
longSplitStr = shortSelectCompanyName;
shortSplitStr = shortCompanyName;
}
for(int i=0;i<cirNum;i++){
if (shortCompanyNameCharArray[i]!=shortSelectCompanyNameCharArray[i]){
System.out.println(shortCompanyNameCharArray[i]+"--------"+shortSelectCompanyNameCharArray[i]);
previousDifIndex = i;
break;
}
}
if(previousDifIndex != 0){
for(int i=0;i<cirNum;i++){
if (shortCompanyNameCharArray[shortCompanyNameCharArray.length-i-1]!=shortSelectCompanyNameCharArray[shortSelectCompanyNameCharArray.length-i-1]){
System.out.println(shortCompanyNameCharArray[shortCompanyNameCharArray.length-i-1]+"--------"+shortSelectCompanyNameCharArray[shortSelectCompanyNameCharArray.length-i-1]);
lastDifIndex =shortSplitStr.length() - i;
break;
}
if(previousDifIndex==(cirNum-i)){
lastDifIndex = shortSplitStr.length() - i;;
break;
}
}
}
String previousStr = shortSplitStr.substring(0,previousDifIndex);
String lastStr = shortSplitStr.substring(lastDifIndex,shortSplitStr.length());
String diffStr1 = longSplitStr.replace(previousStr,"").replace(lastStr,"");
String diffStr2 = shortSplitStr.replace(previousStr,"").replace(lastStr,"");
map.put("previousStr",previousStr);
map.put("lastStr",lastStr);
map.put("diffStr1",diffStr1);
map.put("diffStr2",diffStr2);
map.put("previousDifIndex",previousDifIndex+"");
map.put("lastDifIndex",lastDifIndex+"");
return map;
}
Input: "bjsqzctjjzxyxgs" and "bjqzctjjxxzxyxgs"
Output: {diffStr2=sqzctjj, previousStr=bj, diffStr1=qzctjjxx, lastDifIndex=9, lastStr=zxyxgs, previousDifIndex=2}
So I get diffStr1="qzctjjxx" and diffStr2="sqzctjj". How do I reduce these to
String[] s3 = {"s","xx"}?

CSV result is printed more than once

I am using CSVWriter to produce output in CSV format in Java.
I have a procedure which is returning data in following format:
result didNumber businessName actualNumber
91930921800 TEMP NAME
917930921800 TEMP NAME
NOANSWER 917930921800 TEMP NAME 9845618737
NOANSWER 917930921800 TEMP NAME 9845618737
NOANSWER 917930921800 TEMP NAME 8951577970
NOANSWER 917930921801 TEMP NAME 9920036823
NOANSWER 917930921802 TEMP NAME 8762224646
NOANSWER 917930921802 TEMP NAME 9845618737
NOANSWER 917930921803 TEMP NAME 9448017755
NOANSWER 917930921803 TEMP NAME 9845618737
NOANSWER 917930921895 TEMP NAME 9448017755
DIALING 917930921800 TEMP NAME 8951577970
DIALING 917930921802 TEMP NAME 8762224646
Here is my java code to write in csv file.
public void writeCSVData(ResultSet cdrs,String fileName) throws IOException, SQLException {
String methodName = "writeCSVData";
logger.debug("entering in " + methodName + " in " + className);
BufferedWriter writer = new BufferedWriter(new FileWriter(createFile(fileName).getAbsoluteFile()));
CSVWriter csvWriter = new CSVWriter(writer);
String currentDid = null;
String prevDid = null;
String businessName = null;
String businessReference = null;
String actualNumber = null;
StringBuilder result = new StringBuilder();
String[] entries = null;
int count = 0;
while (cdrs.next()) {
currentDid = cdrs.getString("didNumber");
if (count == 0){
prevDid = currentDid;
result.append(currentDid);
result.append(",");
businessName = cdrs.getString("businessName");
result.append(businessName);
result.append(",");
businessReference = cdrs.getString("businessReference");
result.append(businessReference);
result.append(",");
actualNumber = cdrs.getString("actualNumber");
result.append(actualNumber);
count++;
continue;
}
if(!prevDid.equals(currentDid)){
entries = result.toString().split(",");
result.append("\n");
csvWriter.writeNext(entries);
prevDid = currentDid;
currentDid = cdrs.getString("didNumber");
result.append(currentDid);
result.append(",");
businessName = cdrs.getString("businessName");
result.append(businessName);
result.append(",");
businessReference = cdrs.getString("businessReference");
result.append(businessReference);
result.append(",");
actualNumber = cdrs.getString("actualNumber");
result.append(actualNumber);
result.append(",");
continue;
}
if(prevDid.equals(currentDid)){
prevDid = currentDid;
result.append(",");
actualNumber = cdrs.getString("actualNumber");
result.append(actualNumber);
result.append(",");
continue;
}
}
}
Basically, I am writing all the actual numbers for a didNumber on one line of the CSV.
The output is correct, except that every line is repeated more than once.
Here is my output:
"917930921800","TEMP NAME","null","null","null","","9845618737","","8951577970","","9845618737","","8951577970"
"917930921800","TEMP NAME","null","null","null","","9845618737","","8951577970","","9845618737","","8951577970","
917930921801","TEMP NAME","null","9920036823"
"917930921800","TEMP NAME","null","null","null","","9845618737","","8951577970","","9845618737","","8951577970","
917930921801","TEMP NAME","null","9920036823","
917930921802","TEMP NAME","null","8762224646","","8762224646","","9845618737"
"917930921800","TEMP NAME","null","null","null","","9845618737","","8951577970","","9845618737","","8951577970","
917930921801","TEMP NAME","null","9920036823","
917930921802","TEMP NAME","null","8762224646","","8762224646","","9845618737","
917930921803","TEMP NAME","null","9448017755","","9845618737
Ideally the response should be like this:
"917930921800","TEMP NAME","null","null","null","","9845618737","","8951577970","","9845618737","","8951577970"
917930921801","TEMP NAME","null","9920036823","
917930921802","TEMP NAME","null","8762224646","","8762224646","","9845618737"
917930921803","TEMP NAME","null","9448017755","","9845618737
Please show me what I might have done wrong.
I would recommend using a library such as opencsv, as Lutz suggested.
The likely problem is here:
StringBuilder result = new StringBuilder();
You declare this builder outside the loop and never reset it after writing a line, so every write repeats all the earlier lines.
Add the following line below csvWriter.writeNext(entries);
result.delete(0, result.length());
It will reset the builder.
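To illustrate the reset, here is a minimal sketch in plain Java (no CSV library) that collects all numbers of one DID on a single line and clears the buffer after each line is emitted. `DidGrouper` and `groupByDid` are hypothetical names, each row is assumed to be `{didNumber, businessName, actualNumber}`, and the rows are assumed to arrive sorted by `didNumber`, as in the procedure output shown in the question. It also flushes the last group after the loop, which the code in the question never does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class DidGrouper {

    /** Builds one comma-separated line per didNumber from sorted rows. */
    static List<String> groupByDid(List<String[]> rows) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String prevDid = null;
        for (String[] row : rows) {
            String did = row[0];
            if (!did.equals(prevDid)) {
                if (prevDid != null) {
                    lines.add(current.toString());
                    current.setLength(0);   // reset the buffer, otherwise old lines repeat
                }
                current.append(did).append(',').append(row[1]);
                prevDid = did;
            }
            // rows without an actual number only contribute the DID and name
            if (row[2] != null && !row[2].isEmpty()) {
                current.append(',').append(row[2]);
            }
        }
        if (prevDid != null) {
            lines.add(current.toString());  // flush the last group
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"917930921800", "TEMP NAME", null});
        rows.add(new String[] {"917930921800", "TEMP NAME", "9845618737"});
        rows.add(new String[] {"917930921801", "TEMP NAME", "9920036823"});
        for (String line : groupByDid(rows)) {
            System.out.println(line);
        }
    }
}
```

`setLength(0)` has the same effect as `result.delete(0, result.length())`; a real CSV writer would additionally take care of quoting values that contain commas.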

Iterating List using DAO Java

In this code I declared a variable, acode, that holds a List of values. I can see the values in this List, but I don't know how to iterate over them when calling adao.findAllacctDesc(acode). How can I iterate over this list so that the options show the values?
Here is the code:
<%
TblTaxTypeDAO tdao = DAOFactory.getDaoManager(TblTaxType.class);
TblAccountCodesDAO adao = DAOFactory.getDaoManager(TblAccountCodes.class);
List<TblTaxType> acode = null;
String tcode = request.getParameter("taxt");
String bcode = request.getParameter("bfns");
acode = tdao.findAllAcctCode(bcode, tcode);
Debugger.print(acode);
List<TblAccountCodes> acctdesclist = null;
acctdesclist = adao.findAllacctDesc(acode); // <= error on this line because acode is a List, not a String
String acctdescoptions = "";
if( acctdesclist!=null) {
if( acctdesclist.size()>0 ) {
for(int i=0; i<acctdesclist.size();i++) {
TblAccountCodes acctcode = (TblAccountCodes) acctdesclist.get(i);
acctdescoptions += "<option value='"+acctcode.getAcctCode()+"'>"+acctcode.getAcctDesc()+"</option>";
acctcode = null;
}
}
}
adao = null;
acctdesclist = null;
%>
<%=acctdescoptions%>
Change
List<TblAccountCodes> acctdesclist = null;
acctdesclist = adao.findAllacctDesc(acode); // <= error on this line because acode is a List, not a String
to
List<TblAccountCodes> acctdesclist = new ArrayList<TblAccountCodes>();
for (TblTaxType t : acode) {
    // findAllacctDesc returns a List, so use addAll rather than add
    acctdesclist.addAll(adao.findAllacctDesc(t.getString()));
}
Is this what you want?
Here getString() stands for whichever method in TblTaxType returns the string you need.
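The same pattern, calling a lookup once per element and merging the returned lists, can be sketched without the DAO classes. `LookupCollector` and `collectAll` are hypothetical names, and the `lookup` function stands in for a call such as `adao.findAllacctDesc(...)`; the null checks are an assumption about what a DAO might return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper for illustration only.
class LookupCollector {

    /** Calls lookup once per key and concatenates all returned lists. */
    static <K, V> List<V> collectAll(List<K> keys, Function<K, List<V>> lookup) {
        List<V> all = new ArrayList<>();   // start with an empty list, never null
        if (keys == null) {
            return all;
        }
        for (K key : keys) {
            List<V> found = lookup.apply(key);
            if (found != null) {
                all.addAll(found);         // addAll merges a whole list at once
            }
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> codes = Arrays.asList("A", "B");
        List<String> descriptions =
                collectAll(codes, code -> Arrays.asList(code + "-desc1", code + "-desc2"));
        System.out.println(descriptions);
    }
}
```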

How would I tidy this code into a loop in java?

public class TagHandler {
private final String START = "<START ";
private final String END = "<END ";
public String handleTag(String buf, String[] attrList) {
String startPattern1 = START+attrList[0]+">";
String endPattern1 = END+attrList[0]+">";
String startPattern2 = START+attrList[1]+">";
String endPattern2 = END+attrList[1]+">";
String startPattern3 = START+attrList[2]+">";
String endPattern3 = END+attrList[2]+">";
String startPattern4 = START+attrList[3]+">";
String endPattern4 = END+attrList[3]+">";
String startPattern5 = START+attrList[4]+">";
String endPattern5 = END+attrList[4]+">";
String extract1 = new String(buf);
String extract2 = new String(buf);
String extract3 = new String(buf);
String extract4 = new String(buf);
String extract5 = new String(buf);
extract1 = extract1.substring(extract1.indexOf(startPattern1)+startPattern1.length(), extract1.indexOf(endPattern1));
extract2 = extract2.substring(extract2.indexOf(startPattern2)+startPattern2.length(), extract2.indexOf(endPattern2));
extract3 = extract3.substring(extract3.indexOf(startPattern3)+startPattern3.length(), extract3.indexOf(endPattern3));
extract4 = extract4.substring(extract4.indexOf(startPattern4)+startPattern4.length(), extract4.indexOf(endPattern4));
extract5 = extract5.substring(extract5.indexOf(startPattern5)+startPattern5.length(), extract5.indexOf(endPattern5));
String s = ("BLOPABP"+extract1) + ("\nBLOPCALL"+extract2) +("\nBLOPEXP"+extract3) +("\nBLOPHEAD"+extract4)+("\nBLOPMAJ"+extract5);
return s;
}
How would I tidy up the code above into some sort of loop? Basically, I'm reading a file and extracting the data within the tags. I pass the tags into this TagHandler method, and it returns the extracted data as a string with the "<START ...>" and "<END ...>" markers removed, leaving only a header in place of each start tag.
Here you go. This should do what you want.
public class TagHandler {
private final String START = "<START ";
private final String END = "<END ";
public String handleTag(String buf, String[] attrList) {
String[] blop = {"BLOPABP", "BLOPCALL", "BLOPEXP", "BLOPHEAD", "BLOPMAJ"};
String s = "";
for (int i = 0; i < attrList.length; i++) {
String startPattern = START+attrList[i]+">";
String endPattern = END+attrList[i]+">";
String extract = buf.substring(buf.indexOf(startPattern)+startPattern.length(), buf.indexOf(endPattern));
s += blop[i]+extract;
if (i < attrList.length-1) {
s += "\n";
}
}
return s;
}
}
Look out for an out of bounds exception, if attrList has more than 5 elements.
You can try something like this, optimize it if you can :
public String handleTag(String buf, String[] attrList) {
StringBuilder temp = new StringBuilder();
final String[] prefix = {"BLOPABP","\nBLOPCALL","\nBLOPEXP",
"\nBLOPHEAD","\nBLOPMAJ"};
for(int i=0;i<attrList.length;i++){
String startPattern = START+attrList[i]+">";
String endPattern = END+attrList[i]+">";
String extract = new String(buf);
extract = extract.substring(
extract.indexOf(startPattern)+startPattern.length(),
extract.indexOf(endPattern));
temp.append(prefix[i%5]+extract);
}
return temp.toString();
}
This should work. You can replace new ArrayList<String>() with new ArrayList<>() if you're using Java 7 or later.
private final String START = "<START ";
private final String END = "<END ";
List<String> startPatterns = new ArrayList<String>(); // can use new ArrayList<>() instead on Java 7+
List<String> endPatterns = new ArrayList<String>();
List<String> stringExtracts = new ArrayList<String>();
final String[] tags = new String[]{"BLOPABP","\nBLOPCALL","\nBLOPEXP","\nBLOPHEAD","\nBLOPMAJ"};
public String handleTag(String buf, String[] attrList) {
    int numPatterns = tags.length;
    String s = ""; // must be initialized before += is used
    for (int i = 0; i < numPatterns; i++) {
        String startPattern = START + attrList[i] + ">";
        startPatterns.add(startPattern);
        String endPattern = END + attrList[i] + ">";
        endPatterns.add(endPattern);
        // always search the original buffer, not a previous extract
        String extract = buf.substring(buf.indexOf(startPattern) + startPattern.length(), buf.indexOf(endPattern));
        stringExtracts.add(extract);
        s += tags[i] + extract;
    }
    return s;
}
This assumes that you need access to the individual startPatterns, endPatterns and stringExtracts again, not just s. If you only need s, discard the ArrayLists; it will work like this:
private final String START = "<START ";
private final String END = "<END ";
final String[] tags = new String[]{"BLOPABP","\nBLOPCALL","\nBLOPEXP","\nBLOPHEAD","\nBLOPMAJ"};
public String handleTag(String buf, String[] attrList) {
    int numPatterns = tags.length;
    String s = ""; // must be initialized before += is used
    for (int i = 0; i < numPatterns; i++) {
        String startPattern = START + attrList[i] + ">";
        String endPattern = END + attrList[i] + ">";
        // always search the original buffer, not a previous extract
        String extract = buf.substring(buf.indexOf(startPattern) + startPattern.length(), buf.indexOf(endPattern));
        s += tags[i] + extract;
    }
    return s;
}
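For comparison, here is a self-contained sketch of the loop that guards against the two failure modes mentioned in the answers: more attributes than headers, and a tag that is missing from the buffer. The class name `TagExtractor` and the choice to skip missing tags silently are assumptions, not something the original code specifies.

```java
// Hypothetical variant of the TagHandler from the question.
class TagExtractor {
    private static final String START = "<START ";
    private static final String END = "<END ";
    private static final String[] HEADERS =
            {"BLOPABP", "BLOPCALL", "BLOPEXP", "BLOPHEAD", "BLOPMAJ"};

    static String handleTag(String buf, String[] attrList) {
        StringBuilder out = new StringBuilder();
        // never index past the shorter of the two arrays
        int n = Math.min(attrList.length, HEADERS.length);
        for (int i = 0; i < n; i++) {
            String startPattern = START + attrList[i] + ">";
            String endPattern = END + attrList[i] + ">";
            int from = buf.indexOf(startPattern);
            int to = buf.indexOf(endPattern);
            if (from < 0 || to < from + startPattern.length()) {
                continue;                    // tag missing or malformed: skip it
            }
            if (out.length() > 0) {
                out.append('\n');            // separator between entries
            }
            out.append(HEADERS[i]).append(buf, from + startPattern.length(), to);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String buf = "<START A>one<END A><START B>two<END B>";
        System.out.println(handleTag(buf, new String[] {"A", "B"}));
    }
}
```

Using a `StringBuilder` avoids creating a new `String` on every `+=`, which matters little for five tags but keeps the loop cheap if the header list grows.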

Java File I/O help

I have a problem with my code. I need to do several operations on a log file with this structure:
190.12.1.100 2011-03-02 12:12 test.html
190.12.1.100 2011-03-03 13:18 data.html
128.33.100.1 2011-03-03 15:25 test.html
128.33.100.1 2011-03-04 18:30 info.html
I need to get the number of visits per month, the number of visits per page and the number of unique visitors based on the IP. That is not the question; I managed to get all three operations working. The problem is that only the first choice runs correctly, while the other choices just return values of 0 afterwards, as if the file were empty, so I am guessing I made a mistake with the I/O somewhere. Here's the code:
import java.io.*;
import java.util.*;
public class WebServerAnalyzer {
private Map<String, Integer> hm1;
private Map<String, Integer> hm2;
private int[] months;
private Scanner input;
public WebServerAnalyzer() throws IOException {
hm1 = new HashMap<String, Integer>();
hm2 = new HashMap<String, Integer>();
months = new int[12];
for (int i = 0; i < 12; i++) {
months[i] = 0;
}
File file = new File("webserver.log");
try {
input = new Scanner(file);
} catch (FileNotFoundException fne) {
input = null;
}
}
public String nextLine() {
String line = null;
if (input != null && input.hasNextLine()) {
line = input.nextLine();
}
return line;
}
public int getMonth(String line) {
StringTokenizer tok = new StringTokenizer(line);
if (tok.countTokens() == 4) {
String ip = tok.nextToken();
String date = tok.nextToken();
String hour = tok.nextToken();
String page = tok.nextToken();
StringTokenizer dtok = new StringTokenizer(date, "-");
if (dtok.countTokens() == 3) {
String year = dtok.nextToken();
String month = dtok.nextToken();
String day = dtok.nextToken();
int m = Integer.parseInt(month);
return m;
}
}
return -1;
}
public String getIP(String line) {
StringTokenizer tok = new StringTokenizer(line);
if (tok.countTokens() == 4) {
String ip = tok.nextToken();
String date = tok.nextToken();
String hour = tok.nextToken();
String page = tok.nextToken();
StringTokenizer dtok = new StringTokenizer(date, "-");
return ip;
}
return null;
}
public String getPage(String line) {
StringTokenizer tok = new StringTokenizer(line);
if (tok.countTokens() == 4) {
String ip = tok.nextToken();
String date = tok.nextToken();
String hour = tok.nextToken();
String page = tok.nextToken();
StringTokenizer dtok = new StringTokenizer(date, "-");
return page;
}
return null;
}
public void visitsPerMonth() {
String line = null;
do {
line = nextLine();
if (line != null) {
int m = getMonth(line);
if (m != -1) {
months[m - 1]++;
}
}
} while (line != null);
// Print the result
String[] monthName = {"JAN ", "FEB ", "MAR ",
"APR ", "MAY ", "JUN ", "JUL ", "AUG ", "SEP ",
"OCT ", "NOV ", "DEC "};
for (int i = 0; i < 12; i++) {
System.out.println(monthName[i] + months[i]);
}
}
public int count() throws IOException {
InputStream is = new BufferedInputStream(new FileInputStream("webserver.log"));
try {
byte[] c = new byte[1024];
int count = 0;
int readChars = 0;
while ((readChars = is.read(c)) != -1) {
for (int i = 0; i < readChars; ++i) {
if (c[i] == '\n')
++count;
}
}
return count;
} finally {
is.close();
}
}
public void UniqueIP() throws IOException{
String line = null;
for (int x = 0; x <count(); x++){
line = nextLine();
if (line != null) {
if(hm1.containsKey(getIP(line)) == false) {
hm1.put(getIP(line), 1);
} else {
hm1.put(getIP(line), hm1.get(getIP(line)) +1 );
}
}
}
Set set = hm1.entrySet();
Iterator i = set.iterator();
System.out.println("\nNumber of unique visitors: " + hm1.size());
while(i.hasNext()) {
Map.Entry me = (Map.Entry)i.next();
System.out.print(me.getKey() + " - ");
System.out.println(me.getValue() + " visits");
}
}
public void pageVisits() throws IOException{
String line = null;
for (int x = 0; x <count(); x++){
line = nextLine();
if (line != null) {
if(hm2.containsKey(getPage(line)) == false)
hm2.put(getPage(line), 1);
else
hm2.put(getPage(line), hm2.get(getPage(line)) +1 );
}
}
Set set = hm2.entrySet();
Iterator i = set.iterator();
System.out.println("\nNumber of pages visited: " + hm2.size());
while(i.hasNext()) {
Map.Entry me = (Map.Entry)i.next();
System.out.print(me.getKey() + " - ");
System.out.println(me.getValue() + " visits");
}
}
Any help figuring out the problem would be much appreciated as I am quite stuck.
I haven't read the code thoroughly yet, but I guess you're not setting the read position back to the beginning of the file when you start a new operation. The first operation consumes the whole Scanner, so in every later operation nextLine() returns null.
You should create a new Scanner for each operation and close it afterwards. AFAIK Scanner doesn't provide a method to go back to the first byte.
I can also think of three alternatives:
Use a BufferedReader, call mark() with a large enough read-ahead limit right after opening it, and call reset() before each new operation so that the reader goes back to the start of the file.
Read the file contents once and iterate over the lines in memory, i.e. put all lines into a List<String> and run each operation over that list.
Read the file once, parse each line and construct an appropriate data structure that contains the data you need. For example, you could use a TreeMap<Date, Map<Page, Map<IPAdress, List<Visit>>>>, i.e. you'd store the visits per IP address per page for each date. You could then select the appropriate submaps by date, page and IP address.
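The second alternative (read once, then iterate in memory) could look roughly like the sketch below. `InMemoryLog` and its method names are hypothetical, and the field positions (IP first, page fourth) are taken from the log format shown in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only.
class InMemoryLog {

    /** Reads every non-blank line once; the list can then be traversed any number of times. */
    static List<String> readAll(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    lines.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** One operation: number of distinct IP addresses (first field). */
    static int uniqueIps(List<String> lines) {
        Set<String> ips = new HashSet<>();
        for (String line : lines) {
            ips.add(line.trim().split("\\s+")[0]);
        }
        return ips.size();
    }

    /** Another operation over the same list: visits to one page (fourth field). */
    static int visitsTo(List<String> lines, String page) {
        int visits = 0;
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length == 4 && fields[3].equals(page)) {
                visits++;
            }
        }
        return visits;
    }
}
```

Passing `new FileReader("webserver.log")` as the source would read the real file; each operation then starts from the first list element, so none of them sees an exhausted input.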
The reset() method of BufferedReader that Thomas recommended only works if you called mark() earlier with a read-ahead limit large enough to cover everything read since; without a prior mark(), reset() throws an IOException.
I would recommend reading through the file once and updating your maps and month array for each line. BTW, you don't need a Scanner just to read lines; BufferedReader has a readLine() method itself.
BufferedReader br = ...;
String line;
while (null != (line = br.readLine())) {
String ip = getIP(line);
String page = getPage(line);
int month = getMonth(line);
// update hashmaps and arrays
}
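A fuller version of that single pass might look like the following sketch. `LogStats` and `analyze` are hypothetical names, and silently skipping malformed lines is an assumption that mirrors the `-1`/`null` returns of the parsing methods in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only.
class LogStats {
    final int[] months = new int[12];
    final Map<String, Integer> visitsPerIp = new HashMap<>();
    final Map<String, Integer> visitsPerPage = new HashMap<>();

    /** Reads the log once and updates all three statistics for each valid line. */
    static LogStats analyze(Reader source) {
        LogStats stats = new LogStats();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] fields = line.trim().split("\\s+");
                if (fields.length != 4) {
                    continue;                       // not "ip date time page"
                }
                String[] date = fields[1].split("-");
                if (date.length != 3) {
                    continue;                       // not "yyyy-MM-dd"
                }
                int month;
                try {
                    month = Integer.parseInt(date[1]);
                } catch (NumberFormatException e) {
                    continue;
                }
                if (month < 1 || month > 12) {
                    continue;
                }
                stats.months[month - 1]++;
                stats.visitsPerIp.merge(fields[0], 1, Integer::sum);
                stats.visitsPerPage.merge(fields[3], 1, Integer::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return stats;
    }
}
```

After one call to `analyze`, the number of unique visitors is `visitsPerIp.size()`, and each of the three menu choices only has to print from the already filled structures instead of reading the file again.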
