Count the number of lines and words in a string - Java

I want to count the number of words and lines in the contents of a string.
Here is my code:
private int[] getLineAndWordCount(final String textContent) {
int wordCount = 0;
int lineCount = 0;
if (textContent.length() > 0) {
// textContent is final and cannot be reassigned, so keep the result in a new variable
String normalized = textContent.replace("\t", " ");
String[] newLineArrays = normalized.split("\n");
lineCount = newLineArrays.length;
for (String newLineStr : newLineArrays) {
String[] wordsArray = newLineStr.trim().split(" ");
for (String word : wordsArray) {
if (word.length() > 0) {
wordCount++;
}
}
}
}
return new int[]{lineCount, wordCount};
}
This code works fine, but during execution it creates a lot of substrings. Is there a more efficient way to do the same thing? Thanks.

Try to use java.util.Scanner. For instance:
Scanner textScanner = new Scanner(text);
while (textScanner.hasNextLine()) {
linesCount++;
Scanner wordsScanner = new Scanner(textScanner.nextLine());
while (wordsScanner.hasNext()) {
wordsCount++;
wordsScanner.next();
}
}
A javadoc for java.util.Scanner: http://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html

You can try this way.
Scanner scanner=new Scanner(new File("Location"));
int numberOfLines=0;
StringTokenizer stringTokenizer=null;
int numberOfWords=0;
while (scanner.hasNextLine()){
stringTokenizer=new StringTokenizer(scanner.nextLine()," ");
numberOfWords=numberOfWords+stringTokenizer.countTokens();
numberOfLines++;
}
System.out.println("Number of lines :"+numberOfLines);
System.out.println("Number of words :"+numberOfWords);

Using regex:
String str = "A B C\n D E F\n";
Pattern compile = Pattern.compile("\n");
Matcher matcher = compile.matcher(str);
int count = 0;
while(matcher.find()){
count++;
}
System.out.println(count);//2
count=0;
Pattern compile1 = Pattern.compile("\\s+");
Matcher matcher1 = compile1.matcher(str);
while(matcher1.find()){
count++;
}
System.out.println(count);//6

You can also try this
int line=str.trim().split("\n").length;
int words=str.trim().split("\\s+").length;
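Since the question is about avoiding the intermediate substrings, here is a minimal sketch of a single pass over the characters that allocates no substrings at all. The class name LineWordCounter and the method count are made up for illustration, and the sketch assumes that a word is any run of non-whitespace characters and that one trailing newline ends the last line rather than starting a new one.

```java
final class LineWordCounter {

    /** Returns {lineCount, wordCount} for the given text, scanning it once. */
    static int[] count(String text) {
        if (text.isEmpty()) {
            return new int[] {0, 0};
        }
        int lines = 1;          // non-empty text has at least one line
        int words = 0;
        boolean inWord = false; // true while the scan is inside a word
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n' && i < text.length() - 1) {
                lines++;        // a newline at the very end does not start a new line
            }
            if (Character.isWhitespace(c)) {
                inWord = false;
            } else if (!inWord) {
                inWord = true;  // first character of a new word
                words++;
            }
        }
        return new int[] {lines, words};
    }

    public static void main(String[] args) {
        int[] counts = count("A B C\n D E F\n");
        System.out.println(counts[0] + " lines, " + counts[1] + " words");
        // prints "2 lines, 6 words"
    }
}
```

Texts that end in several blank lines are counted differently from the split("\n") version, which drops all trailing empty strings, so adjust the newline rule if that case matters.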

Related

How to remove stop words in Java

I am trying to find the top k words in a "data" text file, but I cannot remove the stop words listed in "stop.txt". Do I have to add the stop words manually one by one, or is there a way to read the stop.txt file and remove those words from the contents of data.txt?
try {
System.out.println("Enter value of 'k' words:: ");
Scanner in = new Scanner(System.in);
int n = in.nextInt();
w = new String[n];
r = new int[n];
Set<String> stopWords = new LinkedHashSet<String>();
BufferedReader SW = new BufferedReader(new FileReader("stop.txt"));
for(String line; (line = SW.readLine()) != null;)
stopWords.add(line.trim());
SW.close();
FileReader fr = new FileReader("data.txt");
BufferedReader br = new BufferedReader(fr);
String text = "";
String sz = null;
while((sz=br.readLine())!=null){
text = text.concat(sz);
}
String[] words = text.split(" ");
String[] uniqueLabels;
int count = 0;
uniqueLabels = getUniqLabels(words);
for(int j=0; j<n; j++){
r[j] = 0;
}
for(String l: uniqueLabels)
{
if("".equals(l) || null == l)
{
break;
}
for(String s : words)
{
if(l.equals(s))
{
count++;
}
}
for(int i=0; i<n; i++){
if(count>r[i]){
r[i] = count;
w[i] = l;
break;
}
}
count=0;
}
display(n);
} catch (Exception e) {
System.err.println("ERR "+e.getMessage());
}
Read file contents by:
List<String> stopwords = Files.readAllLines(Paths.get("english_stopwords.txt"));
Then use this for removing stop words:
ArrayList<String> allWords =
Stream.of(original.toLowerCase().split(" "))
.collect(Collectors.toCollection(ArrayList<String>::new));
allWords.removeAll(stopwords);
String result = allWords.stream().collect(Collectors.joining(" "));
See also: Removing Stopwords from a String in Java
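As a rough illustration of the same idea, the sketch below keeps the stop words in a HashSet so that each lookup is cheap, and filters the words in one loop. The class name StopWordFilter and the method removeStopWords are hypothetical, and the stop words are hard-coded here only to keep the example self-contained; in the question they would come from stop.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class StopWordFilter {

    /** Returns the words of text, lower-cased, with every stop word dropped. */
    static List<String> removeStopWords(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            if (!word.isEmpty() && !stopWords.contains(word)) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumption: in the question these would be read from stop.txt, one word per line.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "is"));
        System.out.println(removeStopWords("The cat is on a mat", stopWords));
        // prints [cat, on, mat]
    }
}
```

The remaining words can then be counted to find the top k, for example with a HashMap from word to count.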

How to read integers from a file that are separated with semi colon?

In my code, I am trying to read a file that looks like this:
100
22
123;22
123 342;432
but the output still includes the ";" (e.g. {100,22,123;22,123,342;432}).
I am trying to turn the file into an array (e.g. {100,22,123,22,123...}).
Is there a way to read the file but ignore the semicolons?
Thanks!
public static void main(String args [])
{
String[] inFile = readFiles("ElevatorConfig.txt");
for ( int i = 0; i <inFile.length; i = i + 1)
{
System.out.println(inFile[i]);
}
System.out.println(Arrays.toString(inFile));
}
public static String[] readFiles(String file)
{
int ctr = 0;
try{
Scanner s1 = new Scanner(new File(file));
while (s1.hasNextLine()){
ctr = ctr + 1;
s1.next();
}
String[] words = new String[ctr];
Scanner s2 = new Scanner(new File(file));
for ( int i = 0 ; i < ctr ; i = i + 1){
words[i] = s2.next();
}
return words;
}
catch(FileNotFoundException e)
{
return null;
}
}
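Replace the readFiles method with:
public static String[] readFiles(String file) throws FileNotFoundException {
List<String> retList = new ArrayList<String>();
Scanner s2 = new Scanner(new File(file));
// Read until the input is exhausted; no separate counting pass is needed
while (s2.hasNext()) {
String temp = s2.next();
// A token such as "123;22" is split into "123" and "22"
String[] tempArr = temp.split(";");
for(int k=0;k<tempArr.length;k++) {
retList.add(tempArr[k]);
}
}
s2.close();
// toArray() without an argument returns Object[], which cannot be cast to String[]
return retList.toArray(new String[0]);
}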
Use regex. Read the entire file into a String (read each token as a String and append a blank space after each token), then split it at whitespace and semicolons.
String x; // contains all contents of the file
String[] words = x.split("[\\s\\;]+");
The contents of words[] are:
"100", "22", "123", "22", "123", "342", "432"
Remember to parse them to int before using as numbers.
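A minimal sketch of the split-and-parse step, assuming the file contents are already in a String and contain only integers separated by whitespace or semicolons (no leading semicolon). The class name SemicolonNumbers and the method parse are illustrative names, not part of the original code.

```java
import java.util.Arrays;

final class SemicolonNumbers {

    /** Splits on runs of whitespace and/or ';' and parses each piece as an int. */
    static int[] parse(String contents) {
        String trimmed = contents.trim();
        if (trimmed.isEmpty()) {
            return new int[0];
        }
        String[] tokens = trimmed.split("[\\s;]+");
        int[] numbers = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            numbers[i] = Integer.parseInt(tokens[i]);
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Same layout as the sample file in the question.
        String contents = "100\n22\n123;22\n123 342;432\n";
        System.out.println(Arrays.toString(parse(contents)));
        // prints [100, 22, 123, 22, 123, 342, 432]
    }
}
```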
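A simple way is to use a BufferedReader: read the file line by line, then split on ';' and whitespace.
public static String[] readFiles(String file) throws IOException
{
StringBuilder sb = new StringBuilder();
// try-with-resources closes the reader even if reading fails
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
String line = br.readLine();
while (line != null) {
sb.append(line);
sb.append(System.lineSeparator());
line = br.readLine();
}
}
String allfilestring = sb.toString().trim();
// Split on whitespace as well as ';' because values are also separated by spaces and line breaks
String[] array = allfilestring.split("[\\s;]+");
return array;
}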
You can use split() with a regex to split the string into an array as required.
String s; // string you have read from the file
String[] s1 = s.split(" |;"); // s1 contains the strings separated by space and ";"
Hope it helps
Keep the code for counting the size of the array.
I would just change the way you input your values.
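// Assumes both scanners were set up with useDelimiter("[\\s;]+"); with the default
// delimiter, nextInt() throws InputMismatchException on a token such as "123;22".
for (int i = 0; i < ctr; i++) {
words[i] = "" + s2.nextInt(); // read from s2, since s1 was consumed while counting
}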
Another option is to replace all non digit characters in your complete file string with a space. That way any non number character is ignored.
StringBuilder sb = new StringBuilder();
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
String line = br.readLine();
while (line != null) {
sb.append(line).append(' '); // keep a separator so numbers on adjacent lines do not merge
line = br.readLine();
}
}
String str = sb.toString();
str = str.replaceAll("\\D+"," ").trim();
Now you have a string of numbers separated by single spaces, which can be split into number strings:
String[] numbers = str.split("\\s+"); // "final" is a reserved word and cannot be used as a variable name
Then convert each element to an int, for example with Integer.parseInt.

Swap the word in the String

input:-
1
Ans kot
Output:-
kot Ans
INPUT :
the first line of the input contains the number of test cases. Each test case consists of a single line containing the string.
OUTPUT :
Output the string with the words swapped as stated above.
Code:-
Scanner sc = new Scanner(System.in);
int a = sc.nextInt();
StringBuffer result = new StringBuffer();
for (int i = 0; i < a; i++) {
String b = sc.next();
String my[] = b.split(" ");
StringBuffer r = new StringBuffer();
for (int j = my.length - 1; j > 0; j--) {
r.append(my[j] + " ");
}
r.append(my[0] + "\n");
result.append(r.toString());
}
System.out.println(result.toString());
}
What is wrong with my code? The code above is what I have tried.
String my[] = b.split(" ");
StringBuffer r = new StringBuffer();
for (int j = my.length - 1; j > 0; j--) {
r.append(my[j] + " ");
}
This snippet of your code only reverses the sentence word by word, not character by character. Therefore, you need to reverse each word (my[j]) before you append it to the StringBuffer.
Use this
Scanner sc = new Scanner(System.in);
int a = sc.nextInt();
sc.nextLine();
StringBuffer result = new StringBuffer();
for (int i = 0; i < a; i++) {
String b = sc.nextLine();
String my[] = b.split(" ");
StringBuffer r = new StringBuffer();
for (int j = my.length - 1; j > 0; j--) {
r.append(my[j] + " ");
}
r.append(my[0] + "\n");
result.append(r.toString());
}
System.out.println(result.toString());
}
Multiple things:
You are using next(), which reads the input one word at a time, and the loop runs a times, so in your example only the first word is read. Use nextLine() instead, which reads the whole line, and then split it on spaces:
String b = sc.nextLine();
You read the count with nextInt() and then press Enter, so the line break is still in the input and the following nextLine() call returns an empty string. Instead use:
int a = Integer.parseInt(sc.nextLine());
StringBuffer synchronizes every call, which adds overhead you do not need here, so use StringBuilder instead.
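Putting those three points together, a possible sketch looks like this. The class name WordSwap and the methods swapWords and process are hypothetical, and the input is passed in as a String so that the example can be run without typing anything; a Scanner over System.in would work the same way.

```java
import java.util.Scanner;

final class WordSwap {

    /** Reverses the order of the words in one line. */
    static String swapWords(String line) {
        String[] words = line.trim().split("\\s+");
        StringBuilder swapped = new StringBuilder();
        for (int j = words.length - 1; j >= 0; j--) {
            swapped.append(words[j]);
            if (j > 0) {
                swapped.append(' ');
            }
        }
        return swapped.toString();
    }

    /** Reads the test-case count, then swaps the words of each following line. */
    static String process(String input) {
        Scanner sc = new Scanner(input);
        // Parsing the whole first line avoids the leftover line break after nextInt().
        int cases = Integer.parseInt(sc.nextLine().trim());
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < cases; i++) {
            result.append(swapWords(sc.nextLine())).append('\n');
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.print(process("1\nAns kot\n")); // prints "kot Ans"
    }
}
```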
This method takes a String and returns it with its characters in reverse order:
String reverse(String x) {
int i = x.length() - 1;
StringBuilder y = new StringBuilder();
while (i >= 0) {
y.append(x.charAt(i));
i--;
}
return y.toString();
}
public static String reverseWords(String input) {
Deque<String> words = new ArrayDeque<>();
for (String word: input.split(" ")) {
if (!word.isEmpty()) {
words.addFirst(word);
}
}
StringBuilder result = new StringBuilder();
while (!words.isEmpty()) {
result.append(words.removeFirst());
if (!words.isEmpty()) {
result.append(" ");
}
}
return result.toString();
}
You can run this code:
String[] splitted = yourString.split(" ");
for (int i = splitted.length-1; i>=0; i--){
System.out.println(splitted[i]);
}
Code:-
Scanner sc =new Scanner(System.in);
int a =Integer.parseInt(sc.nextLine());
StringBuffer result= new StringBuffer();
for (int i = 0; i <a; i++) {
String b=sc.nextLine();
String my[]= b.split(" ");
StringBuffer r = new StringBuffer();
for (int j = my.length-1; j >0; j--) {
r.append(my[j]+" ");
}
r.append(my[0] + "\n");
result.append(r.toString());
}
System.out.println(result.toString());

removing all character from string except a-z in array

I am trying to read words from a text file and store them in an array. The problem with the code I tried, shown below, is that it keeps punctuation, so I get tokens such as "words," and "read." when I only want "words" and "read" in the array.
public String[] openFile() throws IOException
{
int noOfWords=0;
Scanner sc2 = new Scanner(new File(path));
while(sc2.hasNext())
{
noOfWords++;
sc2.next();
}
Scanner sc3 = new Scanner(new File(path));
String bagOfWords[] = new String[noOfWords];
for(int i = 0;i<noOfWords;i++)
{
bagOfWords[i] =sc3.next();
}
sc3.close();
sc2.close();
return bagOfWords;
}
Use a regex replacement:
replaceAll("([^a-zA-Z]+)","");
Apply it to each token as it is read:
bagOfWords[i] = sc3.next().replaceAll("([^a-zA-Z]+)","");
Use this code:
for (int i = 0; i < noOfWords; i++) {
bagOfWords[i] = sc3.next().replaceAll("[^A-Za-z0-9 ]", "");
}
You probably want only letters. In this case, you can use Character.isLetter(char) method.
Snippet:
String token = "word1";
String newToken = "";
for (int i = 0; i < token.length(); i++) {
char c = token.charAt(i);
if(java.lang.Character.isLetter(c)){
newToken += c;
}
}
System.out.println(newToken);
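The same Character.isLetter idea can be wrapped in a small helper that uses a StringBuilder instead of repeated string concatenation. This is only a sketch; the class name LetterFilter and the method lettersOnly are made up for illustration. Note that Character.isLetter also accepts non-ASCII letters, unlike the [^a-zA-Z] pattern.

```java
final class LetterFilter {

    /** Returns token with every character that is not a letter removed. */
    static String lettersOnly(String token) {
        StringBuilder letters = new StringBuilder(token.length());
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (Character.isLetter(c)) {
                letters.append(c);
            }
        }
        return letters.toString();
    }

    public static void main(String[] args) {
        System.out.println(lettersOnly("words,")); // prints words
        System.out.println(lettersOnly("read."));  // prints read
    }
}
```

In the question's loop this would be used as bagOfWords[i] = lettersOnly(sc3.next()).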

Copying characters in a string

I am trying to remove every number from a string and repeat the letter that follows it that many times.
So for example the string 4a2b should output aaaabb.
So far my code looks like this:
Scanner scan= new Scanner(System.in);
String s = scan.nextLine();
String newString = s.replace(" ", "");
newString=newString.replaceAll("\\W+", "");
newString=newString.replaceAll("\\d+", "");
System.out.println(newString);
Is it possible to use regex and replaceAll to do that?
Try,
String newString = "4a2b";
String num = "";
StringBuilder res = new StringBuilder();
for (int i = 0; i < newString.length(); i++) {
char ch = newString.charAt(i);
if (Character.isDigit(ch)) {
num += ch;
} else if (Character.isLetter(ch)) {
if (num.length() > 0) {
for (int j = 0; j < Integer.parseInt(num); j++) {
res.append(ch);
}
}
num="";
}
}
System.out.println(res);
Try this:
public static void main(String[] args)
{
String str = "ae4a2bca";
Matcher m = Pattern.compile("(\\d+)(.)").matcher(str);
StringBuffer sb = new StringBuffer();
while (m.find())
{
m.appendReplacement(sb, times("$2", Integer.parseInt(m.group(1))));
}
m.appendTail(sb);
System.out.println(sb.toString());
}
private static String times(String string, int t)
{
String str = "";
for (int i = 0; i < t; ++i) str += string;
return str;
}
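To answer the replaceAll part of the question directly: assuming Java 9 or later, Matcher.replaceAll accepts a function that computes the replacement for each match, so the expansion can be written as one call. The class name RunLengthExpander and the method expand are illustrative, and the pattern assumes every number is followed by exactly one non-digit character to repeat.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RunLengthExpander {

    // A run of digits followed by the single non-digit character to repeat.
    private static final Pattern COUNT_AND_CHAR = Pattern.compile("(\\d+)(\\D)");

    /** Replaces each "<count><char>" pair with the char repeated count times. */
    static String expand(String input) {
        return COUNT_AND_CHAR.matcher(input).replaceAll(match -> {
            int count = Integer.parseInt(match.group(1));
            StringBuilder repeated = new StringBuilder();
            for (int i = 0; i < count; i++) {
                repeated.append(match.group(2));
            }
            // Escape '$' and '\' so the repeated text is inserted literally.
            return Matcher.quoteReplacement(repeated.toString());
        });
    }

    public static void main(String[] args) {
        System.out.println(expand("4a2b")); // prints aaaabb
    }
}
```

Characters that are not preceded by a number are left unchanged, so "ae4a2bca" becomes "aeaaaabbca".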
