Java - algorithm to find executable lines of code

I wrote a Java program that scans source files and counts executable lines of code (ELOC), blank lines of code (BLOC) and comment lines (CLOC), for Java and C++ files only. Here is my code:
if (extension.contains("java") || extension.contains("c++"))
{
    Scanner input = new Scanner(fileObject);
    while (input.hasNext())
    {
        String s = input.nextLine();
        if (s.length() == 0)
        {
            bloc++;
        }
        else if (s.contains("/*") || s.startsWith("/*"))
        {
            cloc++;
            while (!s.contains("*/"))
            {
                cloc++;
                s = input.nextLine();
            }
        }
        else if (s.contains("//"))
        {
            cloc++;
        }
        else
        {
            eloc++;
        }
    } // while
    System.out.println("ELOC: " + (eloc));
    System.out.println("Blank Lines: " + bloc);
    System.out.println("Comment Lines: " + cloc);
}
I ran it on various Java and C++ source files, but it does not always give the correct answer. What can I do to make it better? Is there any Java code online that I can use?
For this question, I'm only counting executable lines of code. If a line looks like the following:
int x=0;//some comment
then it should be counted as one executable line. Here is my updated code:
String extension = getExtension(fileObject.getName());
if (extension.contains("java") || extension.contains("c++"))
{
    Scanner input = new Scanner(fileObject);
    String s;
    while (input.hasNext())
    {
        s = input.nextLine().trim();
        eloc++; // eloc first counts every line; comments and blanks are subtracted at the end
        if (s.equals(""))
        {
            bloc++;
        }
        if (s.startsWith("//"))
        {
            cloc++;
        }
        if (s.contains("/*") && !s.contains("*/"))
        {
            // block comment that continues onto the following lines
            cloc++;
            while (!s.contains("*/"))
            {
                cloc++;
                eloc++;
                s = input.nextLine();
            }
        }
        else if (s.contains("/*") && s.contains("*/"))
        {
            // block comment that opens and closes on the same line
            cloc++;
        }
    }
    System.out.println("Total number of lines: " + eloc);
    System.out.println("ELOC: " + (eloc - (cloc + bloc)));
    System.out.println("Blank Lines: " + bloc);
    System.out.println("Comment Lines: " + cloc);
}
Any comments or advice will be appreciated. Thanks!

On a Unix system you could simply use cloc. This gives you the following output:
src$ cloc .
      51 text files.
      51 unique files.
     285 files ignored.
http://cloc.sourceforge.net v 1.53 T=0.5 s (82.0 files/s, 5854.0 lines/s)
-------------------------------------------------------------------------------
Language                      files          blank        comment           code
-------------------------------------------------------------------------------
Java                             39            618            119           2145
XML                               2              8              0             37
-------------------------------------------------------------------------------
SUM:                             41            626            119           2182
-------------------------------------------------------------------------------
When cloc counts Java, code lines exclude comments and blank lines but do include lines such as lone block braces or import statements.
There are other tools available, but this is the simplest if you just need to count code lines. Hope this helps.

Blank lines might not have a length of zero: they may contain whitespace (mine often do). Trim each line before checking its length to get a more accurate count.
The only other thing I can say is that your numbers will be off for lines that contain both code and a comment. Your current code counts an entire line as a comment if it even partially contains one. For example:
Validate(input); // This validates user input
This will not be counted as ELOC but as CLOC. This may not be a problem if the coding style is more like this:
// Validate user input
Validate(input);
But not every developer will use the second way. I personally use a mix of both depending on context.
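Putting both suggestions together, here is a minimal sketch. It assumes only `//` line comments matter, and deliberately ignores block comments and `//` inside string literals. It trims each line first, so whitespace-only lines count as blank. A line counts as a comment only when the trimmed text starts with `//`, so `Validate(input); // ...` stays ELOC. `SimpleLineCounter` and `count` are hypothetical names, and `List.of` needs Java 9 or later.

```java
import java.util.List;

public class SimpleLineCounter {
    // Hypothetical counters: index 0 = code (ELOC), 1 = comment-only (CLOC), 2 = blank (BLOC)
    static int[] count(List<String> lines) {
        int[] counts = new int[3];
        for (String raw : lines) {
            String s = raw.trim();      // whitespace-only lines become ""
            if (s.isEmpty()) {
                counts[2]++;            // blank line
            } else if (s.startsWith("//")) {
                counts[1]++;            // the whole line is a comment
            } else {
                counts[0]++;            // code, possibly followed by "// ..."
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "// Validate user input",
            "Validate(input); // This validates user input",
            "   \t ",
            "int x=0;//some comment");
        int[] c = count(lines);
        System.out.println("ELOC: " + c[0] + ", CLOC: " + c[1] + ", BLOC: " + c[2]); // ELOC: 2, CLOC: 1, BLOC: 1
    }
}
```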

Example which does not produce expected counts:
int a;
a = 7; // comment, yeah
int b /* my favorite variable */ = 3;
executeMethod(dataField,
moreData,excitingBoolean,resultSetFromMyGrandma,
anotherParameterTakingAnotherWholeLine);
Your program is not handling comments or multi-line statements very gracefully.
Edit
I would suggest parsing it fully into a tree, recognizing comments and executable lines of code by the grammar that Java compilers use, and counting from there. There are plenty of exceptions that simple checks might skip over. Additionally, consider the line:
String commentCodeFun = " // not a real comment ";
It's a nightmare for your current approach: the line contains "//" but is pure code.
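A full parse is the robust route. A lighter alternative, swapped in here and not a grammar-based parse, is a small lexical scanner. It tracks whether it is inside a string or char literal or a `/* ... */` block. It marks each physical line as CODE if any non-whitespace character appears outside a comment, as COMMENT if the line holds only comment text, and as BLANK otherwise. This is a sketch under stated assumptions. It ignores Java text blocks, C++ raw strings and line-splicing backslashes. A multi-line statement such as the `executeMethod(...)` call above still counts once per physical line. The class and method names are hypothetical, and `lines` could come from `Files.readAllLines(path)`.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class LexicalLineCounter {
    public enum Kind { BLANK, COMMENT, CODE }

    // CODE: some non-whitespace character outside any comment.
    // COMMENT: only comment text (or whitespace inside a block comment).
    // BLANK: nothing but whitespace, outside any comment.
    public static List<Kind> classify(List<String> lines) {
        List<Kind> kinds = new ArrayList<>();
        boolean inBlock = false;        // inside /* ... */, which may span lines
        for (String line : lines) {
            boolean code = false;
            boolean comment = inBlock;  // a line that starts inside a block comment
            char quote = 0;             // '"' or '\'' while inside a literal, else 0
            for (int i = 0; i < line.length(); i++) {
                char c = line.charAt(i);
                char next = i + 1 < line.length() ? line.charAt(i + 1) : 0;
                if (inBlock) {
                    if (c == '*' && next == '/') {
                        inBlock = false;
                        i++;            // skip the '/'
                    }
                } else if (quote != 0) {
                    if (c == '\\') {
                        i++;            // skip the escaped character
                    } else if (c == quote) {
                        quote = 0;      // literal closed
                    }
                } else if (c == '/' && next == '/') {
                    comment = true;
                    break;              // the rest of the line is a comment
                } else if (c == '/' && next == '*') {
                    comment = true;
                    inBlock = true;
                    i++;                // skip the '*'
                } else if (!Character.isWhitespace(c)) {
                    code = true;
                    if (c == '"' || c == '\'') {
                        quote = c;      // entering a string or char literal
                    }
                }
            }
            kinds.add(code ? Kind.CODE : comment ? Kind.COMMENT : Kind.BLANK);
        }
        return kinds;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "int a;",
            "a = 7; // comment, yeah",
            "int b /* my favorite variable */ = 3;",
            "",
            "/* a block comment",
            "   spanning two lines */",
            "String commentCodeFun = \" // not a real comment \";");
        Map<Kind, Integer> totals = new EnumMap<>(Kind.class);
        for (Kind k : classify(lines)) {
            totals.merge(k, 1, Integer::sum);
        }
        System.out.println(totals); // {BLANK=1, COMMENT=2, CODE=4}
    }
}
```

With this approach, `int b /* my favorite variable */ = 3;` is classified as CODE. The `commentCodeFun` line is never mistaken for a comment because the `//` sits inside a string literal.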

Related

How to know if there is some code before a comment in java using kotlin?

I'm interested in writing some code to detect a special case in java code where we have code with some comments on the same line.
int i = 10; //here is the comment
I would like to be able to detect such lines using Kotlin's File.forEachLine method. However, I don't know how to do it.
The only thing I've been able to do is to find whenever a line contains a comment, by doing:
File(fileName).forEachLine {
    if (it.contains("//")) {
        println("There is a comment!")
    }
}
I'm not interested in detecting comments but only those lines where there is code and a comment on the same line.
Note: fileName is a Java file (e.g. Test.java) that we read line by line.
You could use a combination of split and filter functions on each line of your file:
fun getCodeLinesWithComments(lines: List<String>): List<String> {
    val codeLinesWithComments = arrayListOf<String>()
    for (line in lines) {
        // two non-blank pieces around "//" means code followed by a comment;
        // isNotBlank (not isNotEmpty) so an indented "   // comment" is not counted
        if (line.split("//").filter { it.isNotBlank() }.size == 2) {
            codeLinesWithComments.add(line)
        }
    }
    return codeLinesWithComments
}
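The split heuristic also fires on `//` inside string literals, for example `String u = "http://x";`. Here is a minimal Java sketch of a stricter check. It assumes each line is examined in isolation, so block comments and text blocks are ignored. It looks for the first `//` outside a string or char literal, then asks whether anything but whitespace precedes it. The method name is hypothetical, `String.isBlank` needs Java 11, and the same logic ports directly into a Kotlin `forEachLine` lambda.

```java
public class CodeBeforeComment {
    // True when the line has non-blank code followed by a "//" comment.
    // "//" inside "..." or '...' literals is not treated as a comment.
    static boolean hasCodeAndLineComment(String line) {
        char quote = 0; // the open literal's quote character, or 0 outside literals
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quote != 0) {
                if (c == '\\') {
                    i++;                // skip the escaped character
                } else if (c == quote) {
                    quote = 0;          // literal closed
                }
            } else if (c == '"' || c == '\'') {
                quote = c;              // literal opened
            } else if (c == '/' && i + 1 < line.length() && line.charAt(i + 1) == '/') {
                return !line.substring(0, i).isBlank(); // any code before the comment?
            }
        }
        return false; // no line comment outside a literal
    }

    public static void main(String[] args) {
        System.out.println(hasCodeAndLineComment("int i = 10; //here is the comment")); // true
        System.out.println(hasCodeAndLineComment("    // only a comment"));             // false
        System.out.println(hasCodeAndLineComment("String u = \"http://x\";"));         // false
    }
}
```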

String split from a CSV - Java

I am having a small problem; I hope you can help.
I am reading a CSV file in Java, and one of the columns contains strings like these:
a. "123345"
b. "12345 - 67890"
I want to split each value into two separate columns, like this:
a. "123345", ""
b. "12345","67890"
Now, when I use Java's String.split function, I get the following:
a. "123345,"
b. "12345,67890" (which is still just a single string)
Any idea how I can achieve this? I have spent three hours on this and hope someone can help.
Here is my code:
while ((line = bufferReader.readLine()) != null)
{
    main = line.split(","); // splitting the CSV line on ","
    // I know that column #13 is the column where I can find values like "123-123",
    // therefore I have hard-coded it.
    if (main[12].contains("-"))
    {
        temp = main[12].split("-");
        // At this point, when I print temp, it still shows me a string.
        // What I have to do is write them to the CSV file.
        // E.g.: ["86409, 3567"] <-- Problem here!
    }
    else
    {
        // do nothing
    }
}
After this, I write the main[] array to the file.
Check whether java.util.StringTokenizer helps.
Example:
StringTokenizer tokenizer = new StringTokenizer(inputString, ";");
Manual: StringTokenizer docs
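The StringTokenizer Javadoc itself describes it as a legacy class whose use is discouraged in new code, and recommends String.split or java.util.regex instead. Below is a minimal split-based sketch of the asker's goal, assuming the fields are unquoted and contain no embedded commas. It splits the column value on a hyphen with optional surrounding spaces, then inserts the second half as a new column. So "12345 - 67890" becomes two columns, and "123345" becomes "123345" plus an empty column. The class and method names are hypothetical. Note that in the posted loop `temp` is never copied back into `main`, so writing `main[]` would still emit the original value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitRangeColumn {
    // Hypothetical helper: expands row[index] ("12345 - 67890") into two columns;
    // a value without '-' becomes (value, "").
    static String[] expandColumn(String[] row, int index) {
        String[] parts = row[index].split("\\s*-\\s*", 2); // at most two pieces
        String first = parts[0].trim();
        String second = parts.length > 1 ? parts[1].trim() : "";
        List<String> out = new ArrayList<>(Arrays.asList(row));
        out.set(index, first);
        out.add(index + 1, second);  // the extra column goes right after the original one
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] a = expandColumn("x,123345,y".split(",", -1), 1);
        String[] b = expandColumn("x,12345 - 67890,y".split(",", -1), 1);
        System.out.println(String.join(",", a)); // x,123345,,y
        System.out.println(String.join(",", b)); // x,12345,67890,y
    }
}
```

With the question's data this would be called as `expandColumn(line.split(",", -1), 12)`. The `-1` limit keeps trailing empty columns, which a plain `split(",")` drops. If fields can be quoted or contain commas, a proper CSV parser is the safer choice.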

IndexOf(), String index out of bounds: -1

I have no idea what is happening. I have a list of products, each with a number separated from it by a tab. When I use indexOf() to find the tab, I get a "String index out of bounds" error, and it says the index is -1. Here's the code:
package taxes;
import java.util.*;
import java.io.*;
public class Taxes {
    public static void main(String[] args) throws IOException {
        //File aFile = new File("H:\\java\\PrimeNumbers\\build\\classes\\primenumbers\\priceList.txt");
        File aFile = new File("C:\\Users\\Tim\\Documents\\NetBeansProjects\\Taxes\\src\\taxes\\priceList.txt");
        priceChange(aFile);
    }
    static void priceChange(File inFile) throws IOException {
        Scanner scan = new Scanner("priceList.txt");
        char tab = '\t';
        while (scan.hasNextLine()) {
            String line = scan.nextLine();
            int a = line.indexOf(tab);
            String productName = line.substring(0,a);
            String priceTag = line.substring(a);
        }
    }
}
And here's the input:
Plyer set 10
Jaw Locking Plyers 10
Cable Cutter 7
16 oz. Hammer 5
64 oz. Dead Blow Hammer 12
Sledge Hammer 20
Cordless Drill 22
Hex Impact Driver 50
Drill Bit Set 30
Miter Saw 200
Circular Saw 40
Scanner scan = new Scanner("priceList.txt");
This line of code is wrong. This Scanner instance will scan the String "priceList.txt". It doesn't contain a tab, therefore indexOf returns -1.
Change it to:
Scanner scan = new Scanner(inFile);
This uses the method argument, which is the File instance for your priceList.txt.
String.indexOf(char) returns -1 if the character isn't found.
Before proceeding, check that a isn't negative.
You can read more about the indexOf method in the String class documentation.
Because you call int a = line.indexOf(tab) on every iteration of the while loop, every single line of your document must contain a tab to avoid the error.
When your while (scan.hasNextLine()) loop runs into a line with no tab in it, the index is going to be -1, and you get the StringIndexOutOfBoundsException when trying to get line.substring(0,a), with a being -1.
while (scan.hasNextLine()) {
    String line = scan.nextLine();
    int a = line.indexOf(tab);
    if (a != -1) { // only split lines that actually contain a tab
        String productName = line.substring(0, a);
        String priceTag = line.substring(a + 1); // a + 1 skips the tab itself
    }
}
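Here is a minimal corrected sketch combining both fixes, assuming each valid line has the form `name<TAB>price`. It scans the `File` argument instead of the literal string "priceList.txt", skips lines with no tab instead of crashing on them, and uses `a + 1` so the price does not keep the leading tab. The `parsePriceLine` helper and the default file path in `main` are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

public class Taxes {
    // Hypothetical helper: "name<TAB>price" -> {name, price}; null when the line has no tab.
    static String[] parsePriceLine(String line) {
        int a = line.indexOf('\t');
        if (a == -1) {
            return null;                                // substring(0, -1) would throw here
        }
        String productName = line.substring(0, a).trim();
        String priceTag = line.substring(a + 1).trim(); // a + 1 skips the tab itself
        return new String[] { productName, priceTag };
    }

    static void priceChange(File inFile) throws IOException {
        try (Scanner scan = new Scanner(inFile)) {      // scan the file, not its name
            while (scan.hasNextLine()) {
                String line = scan.nextLine();
                String[] parts = parsePriceLine(line);
                if (parts == null) {
                    System.err.println("No tab in line: " + line);
                    continue;
                }
                System.out.println(parts[0] + " -> " + parts[1]);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path; pass the real location as the first argument.
        File f = new File(args.length > 0 ? args[0] : "priceList.txt");
        if (!f.exists()) {
            System.err.println("File not found: " + f);
            return;
        }
        priceChange(f);
    }
}
```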
If you look very carefully at the input lines you have posted, you'll see
Jaw Locking Plyers 10
...
Cordless Drill 22
Hex Impact Driver 50
Drill Bit Set 30
that the "Hex Impact Driver" line has its price two characters to the right of the prices in the lines before and after it. This indicates that "50" does not start at a tab stop, whereas "10" does (at the tab stop after the one used by "22" and "30").
The Q&A editor does preserve TABs, so presumably your editor does as well, and your program should be able to recognize a TAB in the input lines.
That said, a TAB entered by hand (!) is a very poor choice for a separator. As you have experienced, text file presentation doesn't show it. It would be much better to use a special character that does not occur in the product names. Plausible choices are '|', '#', and '\'.
Another good way would be to use pattern matching to find the numeric price at the end of a line; the product name is what remains after removing the price and calling trim() on the remaining string.
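Here is a minimal sketch of that pattern-matching approach. It assumes every valid line ends in a whole-number price separated from the name by at least one space or tab. The regular expression captures everything before the trailing digits as the name. `PriceLineParser` is a hypothetical name. Because `\s` matches both spaces and tabs, this works however the separator was typed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PriceLineParser {
    // Any text, then at least one whitespace character (space or tab), then digits at the end.
    private static final Pattern PRICE_AT_END = Pattern.compile("^(.*?)\\s+(\\d+)\\s*$");

    // Returns {productName, price}, or null when the line does not end in a number.
    static String[] parse(String line) {
        Matcher m = PRICE_AT_END.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1).trim(), m.group(2) };
    }

    public static void main(String[] args) {
        String[] p = parse("64 oz. Dead Blow Hammer 12");
        System.out.println(p[0] + " | " + p[1]); // 64 oz. Dead Blow Hammer | 12
    }
}
```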
Since it has been verified that indexOf(tab) returns -1, the question is: why does the line of text not contain a tab when you seem certain that it does?
The answer is most likely your IDE settings. For instance, I usually configure NetBeans to convert a tab to three spaces. So if you typed this input file yourself within an IDE, tab-to-space conversion is the likely problem.
Workarounds:
If you copy and paste text that includes tabs into NetBeans, the tabs are not converted to spaces.
The text file could be created with Notepad or any other simple text editor to avoid the problem.
Change the settings of your IDE, at least for this project.

Verifying unexpected empty lines in a file

Aside: I am using the penn.txt file for the problem. The link here is to my Dropbox but it is also available in other places such as here. However, I've not checked whether they are exactly the same.
Problem statement: I would like to do some word processing on each line of the penn.txt file which contains some words and syntactic categories. The details are not relevant.
Actual "problem" faced: I suspect that the file contains some consecutive blank lines (which ideally should not be present). I think my code confirms this, but I have not checked by eye because the file is fairly large (~1,300,000 lines). So I would like my Java code and conclusions checked for correctness.
I've used slightly modified versions of existing code for converting a file to a String and for counting the number of lines in a string. I'm not sure about the efficiency of splitting, but it works well enough for this case.
File file = new File("final_project/penn.txt"); //location
System.out.println(file.exists());
//converting file to String
byte[] encoded = null;
try {
    encoded = Files.readAllBytes(Paths.get("final_project/penn.txt"));
} catch (IOException e1) {
    // TODO Auto-generated catch block
    e1.printStackTrace();
}
String mystr = new String(encoded, StandardCharsets.UTF_8);
//splitting and checking "consecutiveness" of \n
for (int j = 1; ; j++) {
    String split = new String();
    for (int i = 0; i < j; i++) {
        split = split + "\n";
    }
    if (mystr.split(split).length == 1) break;
    System.out.print("(" + mystr.split(split).length + "," + j + ") ");
}
//counting using Scanner
int count = 0;
try {
    Scanner reader = new Scanner(new FileInputStream(file));
    while (reader.hasNext()) {
        count++;
        String entry = reader.next();
        //some word processing here
    }
    reader.close();
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
System.out.println(count);
The number of lines reported by Gedit (if I understand correctly) matched the number of \n characters found, 1,283,169. I have separately verified, using the same splitting idea, that the combined number of \r and \r\n sequences is 0. The full splitting output is shown below:
(1283169,1) (176,2) (18,3) (13,4) (11,5) (9,6) (8,7) (7,8) (6,9) (6,10) (5,11) (5,12) (4,13) (4,14) (4,15) (4,16) (3,17) (3,18) (3,19) (3,20) (3,21) (3,22) (3,23) (3,24) (3,25) (2,26) (2,27) (2,28) (2,29) (2,30) (2,31) (2,32) (2,33) (2,34) (2,35) (2,36) (2,37) (2,38) (2,39) (2,40) (2,41) (2,42) (2,43) (2,44) (2,45) (2,46) (2,47) (2,48) (2,49) (2,50)
Please answer whether the following statements are correct or not:
From this, what I understand is that there is one instance of 50 consecutive \n characters and because of that there are exactly two instances of 25 consecutive \n characters and so on.
The last count (using Scanner) gives 1,282,969, exactly 200 fewer. I think this means there are exactly 200 (or 199?) empty lines somewhere in the file.
Is there any way to separately verify this "discrepancy" of 200? (something like a set-theoretic counting of intersections maybe)
A partial answer to question (the last part) is as follows:
(Assuming the two statements in the question are true)
If, instead of printing the number of split parts, you print the number of occurrences of j consecutive \n characters (simply subtracting 1), you get:
(1283168,1) (175,2) (17,3) (12,4) (10,5) (8,6) (7,7) (6,8) (5,9) (5,10) (4,11) (4,12) (3,13) (3,14) (3,15) (3,16) (2,17) (2,18) (2,19) (2,20) (2,21) (2,22) (2,23) (2,24) (2,25) (1,26) (1,27) (1,28) (1,29) (1,30) (1,31) (1,32) (1,33) (1,34) (1,35) (1,36) (1,37) (1,38) (1,39) (1,40) (1,41) (1,42) (1,43) (1,44) (1,45) (1,46) (1,47) (1,48) (1,49) (1,50)
Note that for j > 3, the product of the two numbers is at most 50, which is your maximum. This means there is one place with 50 consecutive \n characters, and all the hits you get for j from 4 to 49 come from that same run.
However, for j = 3, the largest multiple of 3 below 50 is 48, which gives 16, while you have 17 occurrences. So there is an extra \n\n\n somewhere with a non-\n character on both sides.
Now for 2 (\n\n), we can subtract 25 (coming from the 50 \ns) and 1 (coming from the separate \n\n\n) to obtain 175-26 = 149.
To account for the discrepancy, sum (2-1)*149 + (3-1)*1 + (50-1)*1 = 149 + 2 + 49 = 200. The -1 appears because the first \n of each run is already accounted for in the Scanner count.
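To verify the 200 independently, the sketch below scans the text once and records the length of every maximal run of consecutive '\n' characters, instead of splitting repeatedly. A run of k newlines between two non-empty lines leaves k - 1 empty lines, so summing (k - 1) over all runs gives the empty-line count directly. It assumes '\n' line endings (the question reports no '\r') and a `text` argument holding the file contents, like `mystr` in the question. The class and method names are hypothetical. Also note that `Scanner.next()` counts whitespace-separated tokens, not lines. The Scanner count therefore equals the number of non-empty lines only if each such line holds exactly one token.

```java
import java.util.Map;
import java.util.TreeMap;

public class NewlineRuns {
    // Maps run length k -> number of maximal runs of exactly k consecutive '\n' characters.
    static Map<Integer, Integer> runHistogram(String text) {
        Map<Integer, Integer> runs = new TreeMap<>();
        int run = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                run++;
            } else if (run > 0) {
                runs.merge(run, 1, Integer::sum); // a run just ended
                run = 0;
            }
        }
        if (run > 0) {
            runs.merge(run, 1, Integer::sum);     // run at the very end of the text
        }
        return runs;
    }

    // Each run of k newlines leaves k - 1 empty lines between two non-empty ones.
    static long emptyLines(Map<Integer, Integer> runs) {
        long total = 0;
        for (Map.Entry<Integer, Integer> e : runs.entrySet()) {
            total += (long) (e.getKey() - 1) * e.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        String sample = "a\nb\n\nc\n\n\nd\n";
        Map<Integer, Integer> runs = runHistogram(sample);
        System.out.println(runs);             // {1=2, 2=1, 3=1}
        System.out.println(emptyLines(runs)); // 3
    }
}
```

If the analysis above is right, running this on `mystr` should report 149 runs of length 2, one of length 3 and one of length 50, for an empty-line total of 200. A run at the very start of the file would really contribute k empty lines rather than k - 1, so check the first character separately if that case matters.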

How to read a string with spaces in Java

I am trying to read a user input string which must contain spaces. Right now I'm using:
check = in.nextLine();
position = name.names.indexOf(check);
if (position != -1) {
    name.names.get(position);
} else {
    System.out.println("Name does not exist");
}
This just produces various errors.
Your question isn't very clear. Specifically, it looks like you are checking whether what the person typed matches a known list, not whether it does or doesn't contain spaces. But taking you at your word:
Read the whole line in, then check it using
a) a regex, or
b) indexOf(), if your check is very simple.
You may also want to do a length check on the input line (e.g. all lines should be < 255 chars or so), just to be paranoid.
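Scanner.nextLine() already returns the whole line, spaces included; it is next() that stops at whitespace. Here is a minimal sketch of the checks described above, using hypothetical names and the answer's arbitrary 255-character limit. It reads the full line, rejects empty or overlong input, and tests for a space with indexOf. The regex variant is noted in a comment.

```java
import java.util.Scanner;

public class ReadNameWithSpaces {
    static final int MAX_LENGTH = 255; // arbitrary "just to be paranoid" limit

    // True if the trimmed input is non-empty, shorter than MAX_LENGTH and contains a space.
    static boolean isAcceptable(String input) {
        String s = input.trim();
        if (s.isEmpty() || s.length() >= MAX_LENGTH) {
            return false;
        }
        return s.indexOf(' ') >= 0; // regex variant: s.matches(".*\\s.*")
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        System.out.print("Full name: ");
        if (!in.hasNextLine()) {
            return; // no input available
        }
        String check = in.nextLine(); // the whole line, internal spaces included
        System.out.println(isAcceptable(check) ? "OK: " + check.trim() : "Invalid name");
    }
}
```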
If you are doing something more like what your code sample suggests, then you would do something like:
ArrayList<String> knownListOfNames = .....
if (!knownListOfNames.contains(userEnteredString)) {
    System.out.println("Name not found");
}
Typically you would also do some basic input validation first; search for "SQL injection" if you want to know more.
