How can I extract specific terms from each string line? - java

I am having trouble extracting terms from each line of a file. The file has a .csv extension but is not really CSV: each line is read as a single string, so every term ends up in line[0].
Here are a few example lines out of thousands:
test.csv
line1 : "31451    CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1CS#S[C##H]1CCCCC(=O)O "
line2 : "12232 COD05374044 23439353  C924O3S2    saponin   CCCC(=O)O "
line3 : "9048   CTD042032 23241  C3HO4O3S2 Berberine  [C##H]1CCCCC(=O)O "
I want to extract only "beta-lipoic acid", "saponin" and "Berberine", each of which is the 5th term on its line.
The terms are separated by runs of whitespace, which is why I call it the 5th position.
How can I extract the term in the 5th position from each line?
One more thing: the amount of whitespace between the six terms is not constant.
It can be one, two, three, four, five or more characters.

Another try:
import java.io.File;
import java.util.Scanner;
public class HelloWorld {
// The number of columns per row, where columns are separated by an arbitrary number
// of spaces or tabs
final static int COLS = 7;
public static void main(String[] args) {
System.out.println("Tokens:");
try (Scanner scanner = new Scanner(new File("input.txt")).useDelimiter("\\s+")) {
// Counts the current column id
int n = 0;
String tmp = "";
StringBuilder item = new StringBuilder();
// Operate on the token stream
while (scanner.hasNext()) {
tmp = scanner.next();
n += 1;
// If we have reached the fifth column, take its content and append the
// sixth column too, as the name we want consists of two space-separated
// words. Feel free to customize this if your name layout varies.
if (n % COLS == 5) {
item.setLength(0);
item.append(tmp);
item.append(" ");
item.append(scanner.next());
n += 1;
System.out.println(item.toString()); // Do something with the
// expression we got
}
}
}
catch(java.io.IOException e){
System.out.println(e.getMessage());
}
}
}

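If line[] is an array of String:
String s = line[0];
String[] split = s.trim().split("\\s+"); // split on runs of whitespace, not on a single space
return split[4]; // the fifth token (only the first word if the term itself contains a space)
For more precise control over the delimiter, you can use a regular expression.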

How are the columns separated? For example, if the columns are separated by a tab character, I believe you can use the split method. Try the following:
String[] parts = str.split("\\t");
Your expected result will be in parts[4].

Just use String.split() with a regex that matches at least 2 whitespace characters:
String foo = "31451    CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1CS#S[C##H]1CCCCC(=O)O";
String[] bar = foo.split("\\s{2,}"); // "\\s\\s" would match exactly two characters and leave empty or padded tokens
String name = bar[4]; // beta-lipoic acid
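The sample lines are not consistent about spacing: some fields in lines 2 and 3 are separated by a single space, and the name itself may contain a space. Here is a minimal sketch that copes with both. It assumes (this is not stated in the question) that the first four fields and the last field never contain whitespace, so that everything between them is the name. The class and method names are made up for illustration.

```java
import java.util.Arrays;

class FifthTerm {
    // Assumption: fields 1-4 and the last field contain no whitespace,
    // so every token between them belongs to the name.
    static String extractName(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 6) {
            throw new IllegalArgumentException("Expected at least 6 fields: " + line);
        }
        // Re-join the middle tokens with a single space.
        return String.join(" ", Arrays.copyOfRange(tokens, 4, tokens.length - 1));
    }

    public static void main(String[] args) {
        String[] lines = {
            "31451    CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1CS#S[C##H]1CCCCC(=O)O ",
            "12232 COD05374044 23439353  C924O3S2    saponin   CCCC(=O)O ",
            "9048   CTD042032 23241  C3HO4O3S2 Berberine  [C##H]1CCCCC(=O)O "
        };
        for (String line : lines) {
            System.out.println(extractName(line));
        }
    }
}
```

Splitting on every run of whitespace and re-joining the middle tokens avoids having to guess how many spaces count as a column separator.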

Related

What is the most efficient way to add 3 characters at a time to an ArrayList from a text file?

Say you have a text file with "abcdefghijklmnop" and you have to add 3 characters at a time to an array list of type string. So the first cell of the array list would have "abc", the second would have "def" and so on until all the characters are inputted.
public ArrayList<String> returnArray()throws FileNotFoundException
{
int i = 0;
ArrayList<String> list = new ArrayList<String>();
Scanner scanCharacters = new Scanner(file);
while (scanCharacters.hasNext())
{
list.add(scanCharacters.next().substring(i,i+3));
i+= 3;
}
scanCharacters.close();
return list;
}
Try the code below:
ArrayList<String> list = new ArrayList<String>();
int i = 0;
int x = 0;
Scanner scanCharacters = new Scanner(file);
scanCharacters.useDelimiter(System.getProperty("line.separator"));
String finalString = "";
while (scanCharacters.hasNext()) {
String[] tokens = scanCharacters.next().split("\t");
for (String str : tokens) {
finalString = StringUtils.deleteWhitespace(str);
for (i = 0; i < finalString.length(); i = i + 3) {
x = i + 3;
if (x < finalString.length()) {
list.add(finalString.substring(i, i + 3));
} else {
list.add(finalString.substring(i, finalString.length()));
}
}
}
}
System.out.println("list" + list);
Here I have used StringUtils.deleteWhitespace(str) from Apache Commons Lang to strip whitespace from each token. The if condition inside the for loop checks whether three more characters are available; if not, whatever characters are left are added to the list. My text file contains the following two lines:
asdfcshgfser ajsnsdxs
sasdsd fghfdgfd
After running the program, the result is:
list[asd, fcs, hgf, ser, ajs, nsd, xs, sas, dsd, fgh, fdg, fd]
public ArrayList<String> returnArray()throws FileNotFoundException
{
ArrayList<String> list = new ArrayList<String>(); // local variables cannot be declared private
Scanner scanCharacters = new Scanner(file);
String temp = "";
while (scanCharacters.hasNext())
{
temp+=scanCharacters.next();
}
while(temp.length() > 2){
list.add(temp.substring(0,3));
temp = temp.substring(3);
}
if(temp.length()>0){
list.add(temp);
}
scanCharacters.close();
return list;
}
In this example I read in all of the data from the file and then split it into groups of three. A Scanner cannot backtrack, so the way you are calling next() leaves out some of the data: each call returns a whole word (words are separated by whitespace, Scanner's default delimiter), and you then take only a three-character substring of it.
For example:
ALEXCY WOWZAMAN
would give you:
ALE and WOW
My example collects all of the letters into one string, repeatedly takes substrings of three letters off the front until fewer than three remain, and finally adds the remainder. As others have said, it would be worth reading up on a different reader such as BufferedReader. I also suggest reading up on substring and Scanner if you want to keep using your current method.
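The same chunking idea can be written with an index instead of repeatedly shortening the string. This is only a sketch: the class and method names are hypothetical, and it assumes the file contents have already been read into a single String.

```java
import java.util.ArrayList;
import java.util.List;

class Chunker {
    // Splits text into pieces of at most `size` characters, in order.
    static List<String> chunk(String text, int size) {
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < text.length(); start += size) {
            // The last piece may be shorter than `size`.
            int end = Math.min(start + size, text.length());
            parts.add(text.substring(start, end));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(chunk("abcdefghijklmnop", 3));
        // prints [abc, def, ghi, jkl, mno, p]
    }
}
```

Using an index avoids creating a new, shorter copy of the remaining text on every iteration.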

ArrayList: Get length of longest string, Get average length of string

In Java, I have a method that reads a text file containing all the words in the dictionary, each on its own line.
It reads each line in a for loop and adds each word to an ArrayList.
I want to get the length of the longest word (String) in the ArrayList. In addition, I want to get the average length of the words in the dictionary file. It would probably be easier to split this into several methods, but I don't know the syntax.
So far, the code I have is:
public class spellCheck {
static ArrayList<String> dictionary = new ArrayList<String>(); // the words from the dictionary file
/**
* load file
* @param fileName the file containing the dictionary
* @throws FileNotFoundException
*/
public static void loadDictionary(String fileName) throws FileNotFoundException {
Scanner in = new Scanner(new File(fileName));
while (in.hasNext())
{
for(int i = 0; i < fileName.length(); ++i)
{
String dictionaryword = in.nextLine();
dictionary.add(dictionaryword);
}
}
Assuming that each word is on its own line, you should be reading the file more like this:
try (Scanner in = new Scanner(new File(fileName))) {
while (in.hasNextLine()) {
String dictionaryword = in.nextLine();
dictionary.add(dictionaryword);
}
}
Remember, if you open a resource, you are responsible for closing it. See the Java tutorial on the try-with-resources statement for more details.
Calculating the metrics can be done after reading the file, but since you're here, you could do something like this:
int totalWordLength = 0;
String longest = "";
while (in.hasNextLine()) {
String dictionaryword = in.nextLine();
totalWordLength += dictionaryword.length();
dictionary.add(dictionaryword);
if (dictionaryword.length() > longest.length()) {
longest = dictionaryword;
}
}
int averageLength = Math.round(totalWordLength / (float)dictionary.size());
But you could just as easily loop through the dictionary afterwards and use the same idea.
(NB: I've used local variables, so you will either want to make them class fields or return them wrapped in some kind of "metrics" class; your choice.)
Before the while loop starts reading, set up two counters and a variable that holds the longest word found so far. For the average, increment one counter each time a line is read and add the number of characters in each word to the second counter. The total number of characters divided by the total number of words read (that is, the number of lines) is the average word length.
For the longest word, start with the empty string or some dummy value such as a single character. Each time you read a line, compare the current word with the longest word found so far (using the String's length() method) and, if it is longer, make it the new longest word.
Also, if all of this is in a file, I'd use a BufferedReader to read the input data.
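The counters described above can also be applied to a list that has already been loaded. A minimal sketch, assuming the words are in a List<String>; the class name, method names and sample words are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;

class WordMetrics {
    // Returns the first word with the greatest length, or "" for an empty list.
    static String longest(List<String> words) {
        String longest = "";
        for (String w : words) {
            if (w.length() > longest.length()) {
                longest = w;
            }
        }
        return longest;
    }

    // Returns the mean word length, or 0.0 for an empty list.
    static double averageLength(List<String> words) {
        if (words.isEmpty()) {
            return 0.0;
        }
        int total = 0;
        for (String w : words) {
            total += w.length();
        }
        return (double) total / words.size();
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("apple", "fig", "banana");
        System.out.println("Longest: " + longest(dictionary));
        System.out.println("Average length: " + averageLength(dictionary));
    }
}
```

Keeping the two calculations in separate methods matches the asker's wish to split the work up, at the cost of looping over the list twice.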
Maybe this could help:
String words = "Rookie never dissappoints, dont trust any Rookie";
// If you read the file contents into a String, the code below finds the longest word.
String ss[] = words.split(" ");
List<String> list = Arrays.asList(ss);
Map<Integer,String> set = new Hashtable<Integer,String>();
int i =0;
for(String str : list)
{
set.put(str.length(), str);
System.out.println(list.get(i));
i++;
}
Set<Integer> keys = set.keySet();
System.out.println(keys);
System.out.println(set);
Object j[]= keys.toArray();
Arrays.sort(j);
Object max = j[j.length-1];
set.get(max);
System.out.println("The longest word is "+set.get(max));
System.out.println("Length is "+max);

How to show sentence word by word in a separate line

The sentence String is expected to be a bunch of words separated by spaces, e.g. “Now is the time”.
showWords' job is to output the words of the sentence, one per line.
This is my homework, and I am trying, as you can see from the code below. I cannot figure out which loop to use, or how to use it, to output the sentence word by word. Please help.
import java.util.Scanner;
public class test {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
System.out.println("Enter the sentence");
String sentence = in.nextLine();
showWords(sentence);
}
public static void showWords(String sentence) {
int space = sentence.indexOf(" ");
sentence = sentence.substring(0,space) + "\n" + sentence.substring(space+1);
System.out.println(sentence);
}
}
You're on the right path. Your showWords method works for the first word; you just need to repeat it until there are no words left.
Loop through them, preferably with a while loop. If you use a while loop, think about when it needs to stop, which is when there are no more words.
To do this, you can either keep the index of the last word and search from there (until there are no more), or delete the word just printed until the sentence string is empty.
Since this is a homework question, I will not give you the exact code but I want you to look at the method split in the String-class. And then I would recommend a for-loop.
Another alternative is to replace in your String until there are no more spaces left (this can be done both with a loop and without a loop, depending on how you do it)
Using regex you could use a one-liner:
System.out.println(sentence.replaceAll("\\s+", "\n"));
with the added benefit that multiple spaces won't leave blank lines as output.
If you prefer an approach that uses only simple String methods, you could use split():
String[] split = sentence.split(" ");
StringBuilder sb = new StringBuilder();
for (String word : split) {
if (word.length() > 0) { // eliminate blank lines
sb.append(word).append("\n");
}
}
System.out.println(sb);
If you need an even more bare bones approach (down to String indexes) and more on the lines of your own code; you would need to wrap your code inside a loop and tweak it a bit.
int space, word = 0;
StringBuilder sb = new StringBuilder();
while ((space = sentence.indexOf(" ", word)) != -1) {
if (space != word) { // eliminate consecutive spaces
sb.append(sentence.substring(word, space)).append("\n");
}
word = space + 1;
}
// append the last word
sb.append(sentence.substring(word));
System.out.println(sb);
Java's String class has a replace method which you should look into. That'll make this homework pretty easy.
String.replace
Update
Use the split method of the String class to split the input string on the space character delimiter so you end up with a String array of words.
Then loop through that array using an enhanced for loop to print each item of the array.
import java.util.Scanner;
public class Test {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
System.out.println("Enter the sentence");
String sentence = in.nextLine();
showWords(sentence);
}
public static void showWords(String sentence) {
String[] words = sentence.split(" "); // split takes a String regex, not a char
for(String word : words) {
System.out.println(word);
}
}
}

Java format word in string

I have a text file containing a book, one line of text per file line, and a Java program that searches those lines for a particular word.
This is the program:
import java.io.File;
import java.util.ArrayList;
import java.util.Scanner;
public class AliceSearch {
public static void main(String[] args) throws Exception {
ArrayList<String> aiw = new ArrayList<String>();
ArrayList<String> matches = new ArrayList<String>();
Scanner scan = new Scanner(new File("aiw.txt"));
Scanner input = new Scanner(System.in);
while (scan.hasNext()){
aiw.add(scan.nextLine());
}
String searchTerm;
System.out.print("Please Input Search Parameter : ");
searchTerm = input.nextLine();
boolean itemFound = false;
String currItem = null;
for(int i = 0; i<aiw.size(); i++ ) {
currItem = (String)aiw.get(i);
if (currItem.contains(searchTerm)) {
matches.add(currItem);
itemFound = true;
}
}
System.out.println("");
if ( itemFound == false ) {
System.out.println ( "No results containing "+searchTerm );
}else{
System.out.println ( "We Found the following results : " );
for(int r = 0; r < matches.size(); r++){
System.out.println("");
System.out.println(matches.get(r));
}
}
scan.close();
input.close();
}
}
I would like the searchTerm in each resulting line to be in uppercase when it is output (or when it is placed in the matches ArrayList). How would I go about this? I know about .toUpperCase(), but I do not know how to change one word in a string of words.
Thanks in advance!
Instead of outputting it the way you do it right now:
System.out.println(matches.get(r));
you could use
System.out.println(matches.get(r).replace(searchTerm, searchTerm.toUpperCase()));
See the JavaDoc for the replace() method, which is used here to replace the found word with its uppercase version. It would be better to have
String uppercase = searchTerm.toUpperCase();
outside of the loop and then use
System.out.println(matches.get(r).replace(searchTerm, uppercase));
There is a method on String that should fit your use case exactly:
line.replace(word, word.toUpperCase());
This is easy to do with the replace functionality of Pattern. Presumably words with different capitalization (at the start of a sentence, for instance) should be highlighted too, so a plain String.replaceAll() call is not enough; create a Pattern with case-insensitive flags instead:
Pattern highlight = Pattern.compile(
Pattern.quote(word), Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE );
String hw = word.toUpperCase();
line = highlight.matcher(line).replaceAll(Matcher.quoteReplacement(hw));
The first two statements should be prepared in advance, as soon as the word is known; there is no need to recompute them for every matching line. Pattern.quote quotes reserved characters so they are given no special meaning in the pattern, and Matcher.quoteReplacement does the same for $ and \ in the replacement string.
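Put together as a runnable sketch, the case-insensitive highlighting might look like the following. The class and method names are hypothetical, and the sample sentence is invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Highlighter {
    // Replaces every case-insensitive occurrence of `word` in `line`
    // with the uppercase form of `word`.
    static String highlight(String line, String word) {
        Pattern pattern = Pattern.compile(
                Pattern.quote(word),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        // quoteReplacement keeps '$' and '\' in the word from being
        // interpreted as group references or escapes.
        String replacement = Matcher.quoteReplacement(word.toUpperCase());
        return pattern.matcher(line).replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(highlight("Alice was tired; so alice slept.", "alice"));
        // prints ALICE was tired; so ALICE slept.
    }
}
```

Note that this matches the search term anywhere, including inside longer words; restricting it to whole words would need word-boundary handling that the original question does not ask for.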

permutation(orderings) of a string of words but separated by a comma in between

I'm having difficulty getting this code to generate the permutations (orderings) of a string of words separated by commas. I can permute the letters of a plain string, but it is harder with comma-separated words.
To recognize the commas I used StringTokenizer and put the tokens into an ArrayList, but that is as far as I have gotten; I'm having trouble permuting the words. By permutations I mean orderings of the comma-separated words. An example is below, followed by my code. Thank you for your help, everyone!
For example, if the input coming in on the BufferedReader looked like:
red,yellow
one,two,three
the output on the PrintWriter should look like:
red,yellow
yellow,red

one,two,three
one,three,two
two,one,three
two,three,one
three,one,two
three,two,one
Note that the input had 3 lines total, including the blank line after "one,two,three" while the output had 11 lines total, including one blank line after "yellow,red" and two blank lines after "three,two,one". It is vital that you get the format exactly correct as the testing will be automated and will require this format. Also note that the order of the output lines for each problem does not matter. This means the first two lines of the output could also have been:
yellow,red
red,yellow
Here is the code I have so far. I have commented some parts out, so don't worry about those.
import java.io.*;
import java.util.*;
public class Solution
{
public static void run(BufferedReader in, PrintWriter out)
throws IOException
{
String str = new String(in.readLine());
while(!str.equalsIgnoreCase(""))
{
PermutationGenerator generator = new PermutationGenerator(str);
ArrayList<String> permutations = generator.getPermutations();
for(String str: permutations)
{
out.println(in.readLine());
}
out.println();
out.println();
}
out.flush();
}
public class PermutationGenerator
{
private String word;
public PermutationGenerator(String aWord)
{
word = aWord;
}
public ArrayList<String> getPermutations()
{
ArrayList<String> permutations = new ArrayList<String>();
//if(word.length() == 0)
//{
//permutations.add(word);
//return permutations;
//}
StringTokenizer tokenizer = new StringTokenizer(word,",");
while (tokenizer.hasMoreTokens())
{
permutations.add(word);
tokenizer.nextToken();
}
/*
for(int i = 0; i < word.length(); i++)
{
//String shorterWord = word.substring(0,i) + word.substring(i + 1);
PermutationGenerator shorterPermutationGenerator = new PermutationGenerator(word);
ArrayList<String> shorterWordPermutations =
shorterPermutationGenerator.getPermutations();
for(String s: shorterWordPermutations)
{
permutations.add(word.readLine(i)+ s);
}
}*/
//return permutations;
}
}
}
You can use String.split() ( http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#split(java.lang.String) ) to get the individual words as an array. Separately, generate all the permutations of the integers {1..N}, where N is the size of the word array. Then walk the word array using each numeric permutation as a list of indices.
Parse your input line (a comma-separated String of words) into an array of Strings (String[] words).
Use a permutation generator that works on an array; such generators are easy to find online. You want one that can be initialized with an Object[] and has a method like Object[] nextPermutation().
Put these together into your solution.
PS: You can also use an Integer permutation generator to generate all permutations of 0 to (words.length - 1); each such permutation gives you an array of indexes into words[] to print.
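As a sketch of how the pieces fit together, the following splits a line on commas and builds every ordering recursively. It is a plain backtracking approach rather than the nextPermutation()-style generator mentioned above, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class WordPermutations {
    // Returns every ordering of the comma-separated words in `line`,
    // each re-joined with commas.
    static List<String> permute(String line) {
        String[] words = line.split(",");
        List<String> result = new ArrayList<>();
        build(words, new boolean[words.length], new ArrayList<>(), result);
        return result;
    }

    // Extends `current` with each unused word in turn, backtracking afterwards.
    private static void build(String[] words, boolean[] used,
                              List<String> current, List<String> result) {
        if (current.size() == words.length) {
            result.add(String.join(",", current));
            return;
        }
        for (int i = 0; i < words.length; i++) {
            if (used[i]) {
                continue;
            }
            used[i] = true;
            current.add(words[i]);
            build(words, used, current, result);
            current.remove(current.size() - 1); // undo the choice
            used[i] = false;
        }
    }

    public static void main(String[] args) {
        for (String ordering : permute("one,two,three")) {
            System.out.println(ordering);
        }
    }
}
```

Working on whole words instead of characters is the only real change from a letter-based permutation routine; the blank-line output format required by the assignment would still need to be added around the call.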
