How to parse a log file in Java? - java

I have a file that has this two rows of data.
Jan 1 22:54:17 drop %LOGSOURCE% >eth1 rule: 7; rule_uid: {C1336766-9489- 4049-9817-50584D83A245}; src: 70.77.116.190; dst: %DSTIP%; proto: tcp; product: VPN-1 & FireWall-1; service: 445; s_port: 2612;
Jan 1 23:02:56 accept %LOGSOURCE% >eth1 inzone: External; outzone: Local; rule: 3; rule_uid: {723F81EF-75C9-4CBB-8913-0EBB3686E0F7}; service_id: icmp-proto; ICMP: Echo Request; src: 24.188.22.101; dst: %DSTIP%; proto: icmp; ICMP Type: 8; ICMP Code: 0; product: VPN-1 & FireWall-1;
May I really know what's the code to parse them into different columns? One problem is
eth1 rule:7;
eth1 inzone: External; outzone: Local;
I want to let them fall under the same column. I really need some desperate help as I have no knowledge of programming and I'm tasked to do this ><

You'd probably start with Java's split function for strings:
Oracle Doc
Look at example 3
I assume you could lump your first column as everything from the start to '>' after %LOGSOURCE%. I'm also guessing there are other columns that would be lumped together and that in the end you'd only expect a certain amount of columns per row.
You could use code like this:
//a line of the log can be split on '>' and ';' for the other columns of interest
//logLine is a line off the your log, I'm assuming it's a string object
string[] splitLine = logLine.split("[>;]+");
//I'm pretending there are 7 columns, for simplicity sake I'm using an ArrayList
// of string arrays (ArraList<string[]>) that would get declared
//above all this called logList
string[] logEntry = new string[7];
//Save the time stamp of the log entry by iterating through splitLine
for(int counter1 = 0; counter1 < splitLine.length; counter1++)
{
//Timestamp column
if(counter1 == 0)
logEntry[0] = splitLine[counter1];
//First column
if(counter1 == 1)
logEntry[1] = splitLine[counter1];
//Logic to determine what needs to get appended to second column,
//could be many if statements
if(...)
logEntry[1] += splitLine[counter1];
//Logic to determine what starts third column
if(...)
logEntry[2] = splitLine[counter1];
//Logic to determine what needs to get appended to third column,
//could be many if statements
if(...)
logEntry[2] += splitLine[counter1];
//And so on... till you fill all your columns up or as much as you want
}
//Add your columned log to your list for use after you've parsed up the file
logList.add(logEntry);
You'd probably stick all this logic in a for loop that continually grabs a line off your log into the logLine string used at the top of the code sample. It's not the most efficient way, but it's pretty straightforward. Hopefully this gives you a start to approaching your problem.

Related

Get parameters of nested functions

I am trying to implement a parser in Java to extract the arguments of some functions.
When I have a function like:
max(1, 2, 3)
I just simply can use a Regular Expresion to extract the args.
But all my functions are not like that. If I have some nested function, eg:
max(sum(1, max(1,2,sum(2,5)), 3, 5, mult(3,3))
I would like to obtain:
sum(1, max(1,2,sum(2,5))
3
5
mult(3,3)
I tried using a Regular Expression, but I asume the language is not regular. Another approach was splliting by ',', but did not work as well.
Is there any method for extracting the arguments of a function? I do not really know how this type of problem can be solved since there is no a pattern to use for extracting the arguments.
Any help or insight would be really appreciated. Thanks!!
Parsing a source code into an some abstract model is quite complex topic, depending on the language complexity.
But first step is usually tokenization, where you read one character at a time and detect full tokens (like variable names, function names, operators, literals etc).
Since you presented only very limited scope for the problem , you have very small set of tokens:
name of a function
( and ) to indicate method call
, to separate arguments
numbers
Reading one symbol at the time, you should be able to very easily detect when one token ends and the next one begins. Also your tokens are very distinct (i.e. you don't have to differentiate function name from variable name), you can very easily categorize them.
Once you have a token, you know the grammar (you have only function calls), you can easily build a syntax tree (where at the root you have top level function call with its arguments being children nodes).
From that structure you can easily fetch whichever parts you wish.
If you are more interested in how it works in the javac compiler, you can always check out its source code (it's open source after all):
https://github.com/openjdk/jdk/blob/master/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/JavacParser.java
https://github.com/openjdk/jdk/blob/master/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/JavaTokenizer.java
However, that may be quite a long read.
Finally found a method that works:
public List<String> parseArgs(String l){
int startIdx = l.indexOf("(") + 1;
int endIdx = l.lastIndexOf(")") - 1;
int count = 0;
int argIdx = startIdx;
List<String> args = new ArrayList<>();
for (int i = startIdx; i < endIdx; i++) {
if (l.charAt(i) == '(')
count -= 1;
else if (l.charAt(i) == ')'){
count += 1;
}
else if (l.charAt(i) == ',' && count == 0){
args.add(l.substring(argIdx, i).trim());
argIdx = i + 1;
}
}
args.add(l.substring(argIdx, endIdx + 1).trim());
return args;
}
String l = "max(sum(1, max(1,2,sum(2,5))), 3, 5, mult(3,3))";
parseArgs(l).forEach(System.out::println);
//Prints
sum(1, max(1,2,sum(2,5)))
3
5
mult(3,3)

what is the fastest way to get dimensions of a csv file in java

My regular procedure when coming to the task on getting dimensions of a csv file as following:
Get how many rows it has:
I use a while loop to read every lines and count up through each successful read. The cons is that it takes time to read the whole file just to count how many rows it has.
then get how many columns it has:
I use String[] temp = lineOfText.split(","); and then take the size of temp.
Is there any smarter method? Like:
file1 = read.csv;
xDimention = file1.xDimention;
yDimention = file1.yDimention;
I guess it depends on how regular the structure is, and whether you need an exact answer or not.
I could imagine looking at the first few rows (or randomly skipping through the file), and then dividing the file size by average row size to determine a rough row count.
If you control how these files get written, you could potentially tag them or add a metadata file next to them containing row counts.
Strictly speaking, the way you're splitting the line doesn't cover all possible cases. "hello, world", 4, 5 should read as having 3 columns, not 4.
Your approach won't work with multi-line values (you'll get an invalid number of rows) and quoted values that might happen to contain the deliminter (you'll get an invalid number of columns).
You should use a CSV parser such as the one provided by univocity-parsers.
Using the uniVocity CSV parser, that fastest way to determine the dimensions would be with the following code. It parses a 150MB file to give its dimensions in 1.2 seconds:
// Let's create our own RowProcessor to analyze the rows
static class CsvDimension extends AbstractRowProcessor {
int lastColumn = -1;
long rowCount = 0;
#Override
public void rowProcessed(String[] row, ParsingContext context) {
rowCount++;
if (lastColumn < row.length) {
lastColumn = row.length;
}
}
}
public static void main(String... args) throws FileNotFoundException {
// let's measure the time roughly
long start = System.currentTimeMillis();
//Creates an instance of our own custom RowProcessor, defined above.
CsvDimension myDimensionProcessor = new CsvDimension();
CsvParserSettings settings = new CsvParserSettings();
//This tells the parser that no row should have more than 2,000,000 columns
settings.setMaxColumns(2000000);
//Here you can select the column indexes you are interested in reading.
//The parser will return values for the columns you selected, in the order you defined
//By selecting no indexes here, no String objects will be created
settings.selectIndexes(/*nothing here*/);
//When you select indexes, the columns are reordered so they come in the order you defined.
//By disabling column reordering, you will get the original row, with nulls in the columns you didn't select
settings.setColumnReorderingEnabled(false);
//We instruct the parser to send all rows parsed to your custom RowProcessor.
settings.setRowProcessor(myDimensionProcessor);
//Finally, we create a parser
CsvParser parser = new CsvParser(settings);
//And parse! All rows are sent to your custom RowProcessor (CsvDimension)
//I'm using a 150MB CSV file with 1.3 million rows.
parser.parse(new FileReader(new File("c:/tmp/worldcitiespop.txt")));
//Nothing else to do. The parser closes the input and does everything for you safely. Let's just get the results:
System.out.println("Columns: " + myDimensionProcessor.lastColumn);
System.out.println("Rows: " + myDimensionProcessor.rowCount);
System.out.println("Time taken: " + (System.currentTimeMillis() - start) + " ms");
}
The output will be:
Columns: 7
Rows: 3173959
Time taken: 1279 ms
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
IMO, What you are doing is an acceptable way to do it. But here are some ways you could make it faster:
Rather than reading lines, which creates a new String Object for each line, just use String.indexOf to find the bounds of your lines
Rather than using line.split, again use indexOf to count the number of commas
Multithreading
I guess here are the options which will depend on how you use the data:
Store dimensions of your csv file when writing the file (in the first row or as in an additional file)
Use a more efficient way to count lines - maybe http://docs.oracle.com/javase/6/docs/api/java/io/LineNumberReader.html
Instead of creating an arrays of fixed size (assuming thats what you need the line count for) use array lists - this may or may not be more efficient depending on size of file.
To find number of rows you have to read the whole file. There is nothing you can do here. However your method of finding number of cols is a bit inefficient. Instead of split just count how many times "," appeard in the line. You might also include here special condition about fields put in the quotas as mentioned by #Vlad.
String.split method creates an array of strings as a result and splits using regexp which is not very efficient.
I find this short but interesting solution here:
https://stackoverflow.com/a/5342096/4082824
LineNumberReader lnr = new LineNumberReader(new FileReader(new File("File1")));
lnr.skip(Long.MAX_VALUE);
System.out.println(lnr.getLineNumber() + 1); //Add 1 because line index starts at 0
lnr.close();
My solution is simply and correctly process CSV with multiline cells or quoted values.
for example We have csv-file:
1,"""2""","""111,222""","""234;222""","""""","1
2
3"
2,"""2""","""111,222""","""234;222""","""""","2
3"
3,"""5""","""1112""","""10;2""","""""","1
2"
And my solution snippet is:
import java.io.*;
public class CsvDimension {
public void parse(Reader reader) throws IOException {
long cells = 0;
int lines = 0;
int c;
boolean qouted = false;
while ((c = reader.read()) != -1) {
if (c == '"') {
qouted = !qouted;
}
if (!qouted) {
if (c == '\n') {
lines++;
cells++;
}
if (c == ',') {
cells++;
}
}
}
System.out.printf("lines : %d\n cells %d\n cols: %d\n", lines, cells, cells / lines);
reader.close();
}
public static void main(String args[]) throws IOException {
new CsvDimension().parse(new BufferedReader(new FileReader(new File("test.csv"))));
}
}

String naming convention in java

I am currently trying to make a naming convention. The idea behind this is parsing.
Lets say I obtain an xml doc. Everything can be used once, but these 2 in the code below can be submitted several times within the xml document. It could be 1, or simply 100.
This states that ItemNumber and ReceiptType will be grabbed for the first element.
ItemNumber1 = eElement.getElementsByTagName("ItemNumber").item(0).getTextContent();
ReceiptType1 = eElement.getElementsByTagName("ReceiptType").item(0).getTextContent();
This one states that it will grab the second submission if they were in their twice.
ItemNumber2 = eElement.getElementsByTagName("ItemNumber").item(1).getTextContent();
ReceiptType2 = eElement.getElementsByTagName("ReceiptType").item(1).getTextContent();
ItemNumber and ReceiptType must both be submitted together. So if there is 30 ItemNumbers, there must be 30 Receipt Types.
However now I would like to set this in an IF statement to create variables.
I was thinking something along the lines of:
int cnt = 2;
if (eElement.getElementsByTagName("ItemNumber").item(cnt).getTextContent();)
**MAKE VARIABLE**
Then make a loop which adds one to count to see if their is a third or 4th. Now here comes the tricky part..I need them set to a generated variable. Example if ItemNumber 2 existed, it would set it to
String ItemNumber2 = eElement.getElementsByTagName("ItemNumber").item(cnt).getTextContent();
I do not wish to make pre-made variable names as I don't want to code a possible 1000 variables if that 1000 were to happen.
KUDOS for anyone who can help or give tips on just small parts of this as in the naming convention etc. Thanks!
You don't know beforehand how many ItemNumbers and ReceiptTypes you'll get ? Maybe consider using two Lists (java.util.List). Here is an example.
boolean finished = ... ; // true if there is no more item to process
List<String> listItemNumbers = new ArrayList<>();
List<String> listReceiptTypes = new ArrayList<>();
int cnt = 0;
while(!finished) {
String itemNumber = eElement.getElementsByTagName("ItemNumber").item(cnt).getTextContent();
String receiptType = eElement.getElementsByTagName("ReceiptType").item(cnt).getTextContent();
listItemNumbers.add(itemNumber);
listReceiptTypes.add(receiptType);
++cnt;
// update 'finished' (to test if there are remaining itemNumbers to process)
}
// use them :
int indexYouNeed = 32; // for example
String itemNumber = listItemNumbers.get(indexYouNeed); // index start from 0
String receiptType = listReceiptTypes.get(indexYouNeed);

resultSet iterating through rows within the same line

Ok that may not make much sense so I'll try and clarify with code, I've got a basic Result set like so: -
ResultSet results = stmt.executeQuery(constants.execQueueSQL());
I then want to iterate through this with a while loop like so: -
while (results.next()){
String val1 = results.getString("col 1");
String val2 = results.getString("col 2");
String val3 = results.getString("col 3");
}
Now, when I debug this within eclipse I can see that before executing the while line the currentRow is already showing as 1 in results, and as I step to each line within the while the currentRow count increases by 1!!
I know that I have 4 rows returned but by the time I get to retrieving col 3 I'm already out, and all subsequent values are wrong as they are being obtained from separate lines, why is this...?

How can Selenium RC process dynamic rows?

I have a table similar to the following:
Part #
Price
Status
1st Part #
$1.00
OK
2nd Part #
$2.00
Discontinued
Nth Part #
$N.00
Reordered
My java code will be looking for the status of "Nth Part #" where I have no idea how big the table is, how many columns it has, and no idea what N is (until run time). In Ruby/WATIR, I would have used the table's id to grab it's HTML, and then used Ruby to iterate over the rows until the part # matched, and then check that row's corresponding status in the Status column (whichever column that might be, but it's set in the hd header's row).
Selenium's standard table lookup function selenium.getTable("table.1.2") only works for static tables that contain the same contents for each test. The overkill selenium.get_html_source is a waste since selenium knows how to find the table already, plus then I have to parse the entire web page.
Any ideas on how I can grab the html of the table, and what would be the best way to iterate over the rows and/or columns?
Thanks in advance.
The easiest thing to do would be to use getTable like this
selenium.getTable("table." + (1 + n) + ".3")
to get the "Status" cell for the nth row if you know what n will be at runtime.
If you are trying to iterate over all of the rows in the table, you could do something like this
try {
for(int n = 1; true; n++) {
String cellContents = selenium.getTable("table." + n + ".3");
//do something with n
}
}
catch {
//handle end of table
}
or, alternatively
final int rowCount = (int)selenium.getXPathCount("id('table')/tbody/tr");
for(int n = 1; n < rowCount; n++) {
String cellContents = selenium.getTable("table." + n + ".3");
}
Remember that in getTable(locator.row.column), row and column start at 1.
Not exactly what you're asking for, but I solved a similar problem by assigning the unique id (part number it sounds like in your case) to be the html id of the tr. Then I used the Selenium xpath locators to get the row and columns I needed for my test.

Categories

Resources