I have a string (taken from file):
Computer: intel, graphic card: Nvidia,
Mouse: razer, color: white
etc.
I need to take words between ":" and ",".
When I do it this way:
Scanner sc = new Scanner(new File(path));
String str = sc.nextLine();
ArrayList<String> list = new ArrayList<String>();
while (sc.hasNextLine()) {
for (int i = 0; i < str.length(); i++) {
list.add(str.substring(str.indexOf(":"), str.indexOf(",")));
}
System.out.println("test");
sc.nextLine();
}
I only get ": intel".
I don't know how to get the other words from the same line, and then the words from the next line.
Assuming the content of the file test.txt is as follows:
Computer: intel, graphic card: Nvidia
Mouse: razer, color: white
The following program will collect the value after each colon:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;
class Coddersclub {
public static void main(String[] args) throws FileNotFoundException {
Scanner sc = new Scanner(new File("test.txt"));
ArrayList<String> list = new ArrayList<String>();
String str = "";
while (sc.hasNextLine()) {
str = sc.nextLine();
String[] specs = str.split(",");
for (String item : specs) {
list.add(item.substring(item.indexOf(":") + 1).trim());
}
}
System.out.println(list);
}
}
Output:
[intel, Nvidia, razer, white]
Note: if you want the list to be [: intel, : Nvidia, : razer, : white], replace list.add(item.substring(item.indexOf(":") + 1).trim()); with list.add(item.substring(item.indexOf(":")).trim());.
Feel free to comment if you are looking for something else.
You are facing this problem because indexOf() returns the index of the first occurrence of the character in the string. Hence you are getting the substring between the first occurrence of ':' and the first occurrence of ','. To solve your problem, use indexOf() together with lastIndexOf() in str.substring (Java has no FirstIndexOf() method, and method names are case-sensitive). This will return the substring between the first occurrence of ':' and the last occurrence of ','. Note that this gives you one span, such as " intel, graphic card: Nvidia", rather than the individual values. I hope you find this answer helpful.
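To make the difference concrete, here is a small sketch (SpanDemo and firstColonToLastComma are hypothetical names, not from the question). Java's methods are indexOf and lastIndexOf; combined like this they return the single span between the first ':' and the last ',', not the separate values.

```java
class SpanDemo {
    // Returns the text between the first ':' and the last ',' (both excluded),
    // or "" when there is no ':' or no ',' after it.
    static String firstColonToLastComma(String line) {
        int start = line.indexOf(':');
        int end = line.lastIndexOf(',');
        if (start < 0 || end <= start) {
            return "";
        }
        return line.substring(start + 1, end);
    }

    public static void main(String[] args) {
        // prints "[ intel, graphic card: Nvidia]": one span, not two values
        System.out.println("[" + firstColonToLastComma("Computer: intel, graphic card: Nvidia,") + "]");
    }
}
```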
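An evergreen solution is:
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
String string = "Computer: intel, graphic card: Nvidia,";
// split on commas (with any surrounding spaces), then split each part once on ':' into key and value
Map<String,String> map = Pattern.compile("\\s*,\\s*")
.splitAsStream(string.trim())
.map(s -> s.split(":", 2))
.collect(Collectors.toMap(a -> a[0], a -> a.length>1? a[1]: ""));
System.out.println(map.values());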
Output:
[ Nvidia, intel]
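The two-argument Collectors.toMap above collects into a HashMap, so the order of map.values() is not guaranteed, and it throws an IllegalStateException if a key repeats. A minimal sketch that applies the same idea to several lines, trims the values and keeps the original order, using the four-argument toMap overload with LinkedHashMap::new (SpecParser and parse are hypothetical names):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SpecParser {
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Turns lines like "Computer: intel, graphic card: Nvidia," into key/value pairs,
    // keeping keys in the order they first appear.
    static Map<String, String> parse(List<String> lines) {
        return lines.stream()
                .flatMap(line -> COMMA.splitAsStream(line.trim()))
                .filter(part -> !part.isEmpty())          // ignore empty parts
                .map(part -> part.split(":", 2))
                .collect(Collectors.toMap(
                        a -> a[0].trim(),
                        a -> a.length > 1 ? a[1].trim() : "",
                        (earlier, later) -> later,        // a repeated key keeps the later value
                        LinkedHashMap::new));
    }

    public static void main(String[] args) {
        Map<String, String> specs = parse(List.of(
                "Computer: intel, graphic card: Nvidia,",
                "Mouse: razer, color: white"));
        System.out.println(specs.values()); // [intel, Nvidia, razer, white]
    }
}
```

Which value wins for a duplicate key is a design choice; here the later one does.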
You can use a regex for that.
To extract the text between the first : and a , that ends a line, use something like:
(?<=\:)(.*?)(?=\,\n)
You can then perform an operation like this:
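import java.util.regex.Matcher;
import java.util.regex.Pattern;
String mytext = "Computer: intel, graphic card: Nvidia,\n" +
"Mouse: razer, color: white etc.";
// "\:" and "\," are not valid escapes in a Java string literal, so : and , are written unescaped
Pattern pattern = Pattern.compile("(?<=:)(.*?)(?=,\n)");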
Matcher matcher = pattern.matcher(mytext);
if (matcher.find())
{
System.out.println(matcher.group(1));
}
The output will be:
intel, graphic card: Nvidia
Inspired by two other threads on this topic.
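The regex above stops after the first match (if (matcher.find())) and captures one long span. If you want each value separately, a variation is to loop over find() with a pattern that captures everything after a ':' up to the next ',' or line break. This pattern is my own suggestion, not from the original answer, and ValueExtractor is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ValueExtractor {
    // ':' then optional spaces/tabs, then capture everything up to the next ',' or line break
    private static final Pattern VALUE = Pattern.compile(":[ \\t]*([^,\\r\\n]+)");

    static List<String> values(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = VALUE.matcher(text);
        while (matcher.find()) {               // keep going instead of stopping at the first match
            result.add(matcher.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Computer: intel, graphic card: Nvidia,\nMouse: razer, color: white";
        System.out.println(values(text)); // [intel, Nvidia, razer, white]
    }
}
```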
Modify your code as follows to solve the issue:
Scanner sc = new Scanner(new File(path));
ArrayList<String> list = new ArrayList<String>();
while (sc.hasNextLine()) {
String str = sc.nextLine();
String[] sp = str.split(",");
for(int i=0;i<sp.length;i++){
list.add(sp[i].split(":")[1].trim());
}
}
// the list will contain intel, Nvidia, razer, white, ...
I think that, to get all the values between ':' and ',', it is best to split each line on ',', split each element of the line on ':', and then take the right-hand value.
Please try this code:
Scanner sc;
try {
sc = new Scanner(new File(path));
ArrayList<String> list = new ArrayList<String>();
while (sc.hasNextLine()) {
String informations = sc.nextLine();
String[] parts = informations.split(",");
for( String part : parts) {
// keep the text after the first ':' without the surrounding spaces
list.add(part.substring(part.indexOf(':')+1).trim());
}
}
sc.close();
}
catch (FileNotFoundException e) {
e.printStackTrace();
}
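The split-based answers above behave differently on a part without a colon: split(":")[1] throws an ArrayIndexOutOfBoundsException, while substring(indexOf(':') + 1) keeps the whole part, because indexOf returns -1. A sketch that skips such parts, assuming lines like "etc." should simply be ignored (ColonValues and read are hypothetical names):

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ColonValues {
    // Splits every line on ',' and keeps the trimmed text after the first ':'.
    // Parts without a ':' (such as a trailing "etc.") are skipped.
    // BufferedReader.lines() wraps read errors in an UncheckedIOException.
    static List<String> read(BufferedReader reader) {
        List<String> values = new ArrayList<>();
        reader.lines().forEach(line -> {
            for (String part : line.split(",")) {
                int colon = part.indexOf(':');
                if (colon >= 0) {
                    values.add(part.substring(colon + 1).trim());
                }
            }
        });
        return values;
    }

    public static void main(String[] args) {
        String text = "Computer: intel, graphic card: Nvidia,\nMouse: razer, color: white\netc.";
        System.out.println(read(new BufferedReader(new StringReader(text)))); // [intel, Nvidia, razer, white]
    }
}
```

For a real file you would pass something like Files.newBufferedReader(Paths.get("test.txt")) instead of the StringReader, and close it with try-with-resources.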
Related
I have an assignment that requires us to read a text file of COVID-19 codon sequences. I have read the first line as a string, and I am able to split this one line into 3-character substrings. However, my issue is now to do this for the rest of the file. When I add a hasNext loop, it doesn't work the way it does for my test line.
{
//Open the file
File file = new File("D://Downloads/covid19sequence.txt");
Scanner scan = new Scanner(file); String testLine = ""; String contents = ""; String codon2 = "";
double aTotal, lTotal, lPercentage;
ArrayList<String> codonList = new ArrayList<String>();
//Read a line in from the file and assign codons via substring
testLine = scan.nextLine();
for (int i = 0; i < testLine.length(); i += 3)
{
String codon = testLine.substring(i, i + 3);
codonList.add(codon);
}
while(scan.hasNext())
System.out.println(codonList);
}
For reference, here is the output for the test line:
[AGA, TCT, GTT, CTC, TAA, ACG, AAC, TTT, AAA, ATC, TGT, GTG, GCT, GTC, ACT, CGG, CTG, CAT, GCT, TAG]
Use while (scan.hasNextLine()) to go through the whole text file, for example:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.ArrayList;
public class Codons {
public static void main(String[] args) throws FileNotFoundException {
File file = new File("D://Downloads/covid19sequence.txt");
Scanner scan = new Scanner(file); String testLine = ""; String contents = ""; String codon2 = "";
double aTotal, lTotal, lPercentage;
ArrayList<String> codonList = new ArrayList<String>();
//Read a line in from the file and assign codons via substring
while(scan.hasNextLine()) {
testLine = scan.nextLine();
for (int i = 0; i < testLine.length(); i += 3)
{
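String codon = testLine.substring(i, Math.min(i + 3, testLine.length())); // the last codon may be shorter if the line length is not a multiple of 3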
codonList.add(codon);
}
}
scan.close();
System.out.println(codonList);
}
}
If a Scanner is used, it may be better to implement a separate method that reads the contents line by line and splits each line into 3-character chunks:
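import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Scanner;
static List<String> readCodons(Scanner input) {
List<String> codons = new ArrayList<>();
while (input.hasNextLine()) {
String line = input.nextLine();
// (?<=\G...) splits after every third character; a shorter remainder becomes the last chunk
Collections.addAll(codons, line.split("(?<=\\G...)"));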
}
return codons;
}
Test (using a Scanner over a multi-line String):
// each line contains 20 codons
String contents = "AGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAG\n"
+ "GATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGA\n"
+ "ATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGAG\n";
List<String> codons = readCodons(new Scanner(contents));
for (int i = 0; i < codons.size(); i++) {
if (i > 0 && i % 10 == 0) {
System.out.println();
}
System.out.print(codons.get(i) + " ");
}
Output
AGA TCT GTT CTC TAA ACG AAC TTT AAA ATC
TGT GTG GCT GTC ACT CGG CTG CAT GCT TAG
GAT CTG TTC TCT AAA CGA ACT TTA AAA TCT
GTG TGG CTG TCA CTC GGC TGC ATG CTT AGA
ATC TGT TCT CTA AAC GAA CTT TAA AAT CTG
TGT GGC TGT CAC TCG GCT GCA TGC TTA GAG
The same method works with a Scanner created on a text file:
try (Scanner input = new Scanner(new File("codons.data"))) {
List<String> codons = readCodons(input);
// print/process codons
}
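One thing neither answer handles is a codon that is split across a line break: if a line's length is not a multiple of 3, reading line by line shifts every following codon. Assuming the file holds one continuous sequence wrapped over several lines (an assumption, so check your data), a sketch that joins the lines before chunking could look like this; readJoinedCodons is a hypothetical name, and it drops a trailing group shorter than 3 bases:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class CodonReader {
    // Joins all lines into one sequence before chunking, so a codon that crosses
    // a line break stays intact. A final group shorter than 3 bases is dropped.
    static List<String> readJoinedCodons(Scanner input) {
        StringBuilder sequence = new StringBuilder();
        while (input.hasNextLine()) {
            sequence.append(input.nextLine().trim());
        }
        List<String> codons = new ArrayList<>();
        for (int i = 0; i + 3 <= sequence.length(); i += 3) {
            codons.add(sequence.substring(i, i + 3));
        }
        return codons;
    }

    public static void main(String[] args) {
        // "TCT" starts on the first line and ends on the second; the last "A" is dropped
        System.out.println(readJoinedCodons(new Scanner("AGATC\nTGTTA\n"))); // [AGA, TCT, GTT]
    }
}
```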
I am trying to index each word in a text file using Java.
By "index" I mean listing, for each word, the poem number and line number where it occurs.
This is my sample file: https://pastebin.com/hxB8t56p
(the actual file I want to index is much larger)
This is the code I have tried so far:
ArrayList<String> ar = new ArrayList<String>();
ArrayList<String> sen = new ArrayList<String>();
ArrayList<String> fin = new ArrayList<String>();
ArrayList<String> word = new ArrayList<String>();
String content = new String(Files.readAllBytes(Paths.get("D:\\folder\\poem.txt")), StandardCharsets.UTF_8);
String[] split = content.split("\\s"); // Split text file content
for(String b:split) {
ar.add(b); // add to ar; since content is split on whitespace, ar holds every word (not every line) of the poem
}
FileInputStream fstream = null;
String answer = "";
fstream = new FileInputStream("D:\\folder\\poem.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
int count = 1;
int songnum = 0;
while((strLine=br.readLine())!=null) {
String text = strLine.replaceAll("[0-9]", ""); // Replace numbers from txt
String nums = strLine.split("(?=\\D)")[0]; // get digits from strLine
if (nums.matches(".*[0-9].*")) {
songnum = Integer.parseInt(nums); // Parse string to int
}
String regex = ".*\\d+.*";
boolean result = strLine.matches(regex);
if (result == true) { // check if strLine contain digit
count = 1;
}
answer = songnum + "." + count + "(" + text + ")";
count++;
sen.add(answer); // added songnum + line number and text to sen
}
for(int i = 0;i<sen.size();i++) { // loop to match and get word+poem number+line number
for (int j = 0; j < ar.size(); j++) {
if (sen.get(i).contains(ar.get(j))) {
if (!ar.get(j).isEmpty()) {
String x = ar.get(j) + " - " + sen.get(i);
x = x.replaceAll("\\(.*\\)", ""); // replace single line sentence
String[] sp = x.split("\\s+");
word.add(sp[0]); // each word in the poem is added to the word arraylist
fin.add(x); // word+poem number+line number
}
}
}
}
Set<String> listWithoutDuplicates = new LinkedHashSet<String>(fin); // Remove duplicates
fin.clear();
fin.addAll(listWithoutDuplicates);
Locale lithuanian = new Locale("ta");
Collator lithuanianCollator = Collator.getInstance(lithuanian); // sort array
Collections.sort(fin,lithuanianCollator);
System.out.println(fin);
(What I want to change: blossom. - 0.2,1.2, the - 0.1,1.2 and then - 0.1,1.2, i.e. each word listed once with all of its locations.)
I will first copy the intended output for your pasted example, and then go over the code to find how to change it:
Poem.txt
0.And then the day came,
to remain blossom.
1.more painful
then the blossom.
Expected output
[blossom. - 0.2,1.2, came, - 0.1, day - 0.1, painful - 1.1, remain - 0.2, the - 0.1,1.2, then - 0.1,1.2, to - 0.2]
As Pal Laden notes in the comments, some words (the, and) are not being indexed. It is probable that stopwords are being ignored for indexing purposes.
Current output of code is
[blossom. - 0.2, blossom. - 1.2, came, - 0.1, day - 0.1, painful - 1.1, remain - 0.2, the - 0.1, the - 1.2, then - 0.1, then - 1.2, to - 0.2]
So, assuming you fix your stopwords, you are actually quite close. Your fin array contains word+poem number+line number, but it should contain word+*list* of poem number+line number. There are several ways to fix this. First, we will need to do stopword removal:
// build stopword-removal set "toIgnore"
String[] stopWords = new String[]{ "a", "the", "of", "more", /*others*/ };
Set<String> toIgnore = new HashSet<>();
for (String s: stopWords) toIgnore.add(s);
if ( ! toIgnore.contains(sp[0])) fin.add(x); // only process non-ignored words
// was: fin.add(x);
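Now, let's fix the list problem. The easiest (but ugly) way is to fix "fin" at the very end; this works because fin is already sorted, so the entries for the same word are next to each other: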
List<String> fixed = new ArrayList<>();
String prevWord = "";
String prevLocs = "";
for (String s : fin) {
String[] parts = s.split(" - ");
if (parts[0].equals(prevWord)) {
prevLocs += "," + parts[1];
} else {
if (! prevWord.isEmpty()) fixed.add(prevWord + " - " + prevLocs);
prevWord = parts[0];
prevLocs = parts[1];
}
}
// last iteration
if (! prevWord.isEmpty()) fixed.add(prevWord + " - " + prevLocs);
System.out.println(fixed);
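The prevWord loop only merges entries that are adjacent, so it depends on fin being sorted first. A sketch of the same merge with a LinkedHashMap, which groups locations per word regardless of order (MergeLocations and merge are hypothetical names; the "word - location" format is taken from the output above):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MergeLocations {
    // Turns entries like "the - 0.1" and "the - 1.2" into "the - 0.1,1.2".
    // Words keep the order in which they first appear.
    static List<String> merge(List<String> entries) {
        Map<String, List<String>> byWord = new LinkedHashMap<>();
        for (String entry : entries) {
            String[] parts = entry.split(" - ", 2);
            if (parts.length < 2) {
                continue; // skip malformed entries
            }
            byWord.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(parts[1]);
        }
        List<String> merged = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : byWord.entrySet()) {
            merged.add(e.getKey() + " - " + String.join(",", e.getValue()));
        }
        return merged;
    }

    public static void main(String[] args) {
        // the two "the" entries are not adjacent, but still end up together
        System.out.println(merge(List.of("blossom. - 0.2", "the - 0.1", "blossom. - 1.2", "the - 1.2")));
    }
}
```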
How to do it the right way (TM)
Your code can be much improved. In particular, using flat ArrayLists for everything is not always the best idea; maps are great for building indices:
// build stopwords
String[] stopWords = new String[]{ "and", "a", "the", "to", "of", "more", /*others*/ };
Set<String> toIgnore = new HashSet<>();
for (String s: stopWords) toIgnore.add(s);
// prepare always-sorted, quick-lookup set of terms
Collator lithuanianCollator = Collator.getInstance(new Locale("ta"));
Map<String, List<String>> terms = new TreeMap<>((o1, o2) -> lithuanianCollator.compare(o1, o2));
// read lines; if line starts with number, store separately
Pattern countPattern = Pattern.compile("([0-9]+)\\.(.*)");
String content = new String(Files.readAllBytes(Paths.get("/tmp/poem.txt")), StandardCharsets.UTF_8);
int poemCount = 0;
int lineCount = 1;
for (String line: content.split("[\n\r]+")) {
line = line.toLowerCase().trim(); // remove spaces on both sides
// update locations
Matcher m = countPattern.matcher(line);
if (m.matches()) {
poemCount = Integer.parseInt(m.group(1));
lineCount = 1;
line = m.group(2); // ignore number for word-finding purposes
} else {
lineCount ++;
}
// read words in line, with locations already taken care of
for (String word: line.split(" ")) {
if ( ! toIgnore.contains(word)) {
if ( ! terms.containsKey(word)) {
terms.put(word, new ArrayList<>());
}
terms.get(word).add(poemCount + "." + lineCount);
}
}
}
// output formatting to match that of your code
List<String> output = new ArrayList<>();
for (Map.Entry<String, List<String>> e: terms.entrySet()) {
output.add(e.getKey() + " - " + String.join(",", e.getValue()));
}
System.out.println(output);
Which gives me [blossom. - 0.2,1.2, came, - 0.1, day - 0.1, painful - 1.1, remain - 0.2, to - 0.2]. I have not fixed the list of stopwords to get a perfect match, but that should be easy to do.
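For reference, here is a sketch of the same Map-based approach wrapped in a method that takes the lines directly, so it can be tried without a file. It uses computeIfAbsent instead of containsKey/put, skips empty words and, as an assumption, orders terms with the natural String order instead of the Collator. PoemIndex and index are hypothetical names. With only "and" and "more" as stopwords, the sample poem produces exactly the expected output listed above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PoemIndex {
    // "0.And then ...": a number followed by a dot starts a new poem
    private static final Pattern NUMBERED = Pattern.compile("([0-9]+)\\.(.*)");

    static List<String> index(List<String> lines, Set<String> stopWords) {
        Map<String, List<String>> terms = new TreeMap<>(); // natural String order, not the Collator
        int poem = 0;
        int lineNo = 0;
        for (String raw : lines) {
            String line = raw.toLowerCase(Locale.ROOT).trim();
            if (line.isEmpty()) {
                continue; // blank lines are not counted
            }
            Matcher m = NUMBERED.matcher(line);
            if (m.matches()) {
                poem = Integer.parseInt(m.group(1));
                lineNo = 1;
                line = m.group(2).trim(); // the number itself is not a word
            } else {
                lineNo++;
            }
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty() && !stopWords.contains(word)) {
                    terms.computeIfAbsent(word, k -> new ArrayList<>()).add(poem + "." + lineNo);
                }
            }
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : terms.entrySet()) {
            output.add(e.getKey() + " - " + String.join(",", e.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("0.And then the day came,", "to remain blossom.",
                "1.more painful", "then the blossom.");
        System.out.println(index(lines, Set.of("and", "more")));
    }
}
```

This prints [blossom. - 0.2,1.2, came, - 0.1, day - 0.1, painful - 1.1, remain - 0.2, the - 0.1,1.2, then - 0.1,1.2, to - 0.2].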
I am trying to split some simple data from a .txt file. I have found some useful approaches on the internet, but none of them split the data the way I wanted. I get a string like this:
{X:0.8940594 Y:0.6853521 Z:1.470214}
And I want to transform it into this:
0.8940594
0.6853521
1.470214
Then I want to put them into a matrix, in the order X=[], Y=[], Z=[] (the data are the coordinates of an object).
Here is my code:
BufferedReader in = null; {
try {
in = new BufferedReader(new FileReader("file.txt"));
String read = null;
while ((read = in.readLine()) != null) {
String[] splited = read.split("\\s+");
for (String part : splited) {
System.out.println(part);
}
}
} catch (IOException e) {
System.out.println("There was a problem: " + e);
e.printStackTrace();
} finally {
try {
in.close();
} catch (Exception e) {
}
}
}
What do I need to add to my code to get the data the way I want?
Right now with this code I receive data like this:
{X:0.8940594
Y:0.6853521
Z:1.470214}
You can try using a regex similar to the following to match and capture the three numbers contained in each tuple:
{\s*X:(.*?)\s+Y:(.*?)\s+Z:(.*?)\s*}
Each quantity contained in parentheses is a capture group, and its value is available through Matcher.group after a match has taken place.
int size = 100; // replace with actual size of your vectors/matrix
double[] A = new double[size];
double[] B = new double[size];
double[] C = new double[size];
String input = "{X:0.8940594 Y:0.6853521 Z:1.470214}";
String regex = "\\{\\s*X:(.*?)\\s+Y:(.*?)\\s+Z:(.*?)\\s*\\}";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
int counter = 0;
while (m.find()) {
A[counter] = Double.parseDouble(m.group(1));
B[counter] = Double.parseDouble(m.group(2));
C[counter] = Double.parseDouble(m.group(3));
++counter;
}
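With a fixed size of 100 you need to know the number of tuples in advance; a 101st tuple would throw an ArrayIndexOutOfBoundsException. A sketch that collects the values into lists first and converts them to double[] at the end (CoordinateReader is a hypothetical name; the regex is the one from this answer):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CoordinateReader {
    private static final Pattern TUPLE =
            Pattern.compile("\\{\\s*X:(.*?)\\s+Y:(.*?)\\s+Z:(.*?)\\s*\\}");

    // Returns {xs, ys, zs}; the lists grow as needed, so no size has to be guessed.
    static double[][] read(List<String> lines) {
        List<Double> xs = new ArrayList<>();
        List<Double> ys = new ArrayList<>();
        List<Double> zs = new ArrayList<>();
        for (String line : lines) {
            Matcher m = TUPLE.matcher(line);
            while (m.find()) {
                xs.add(Double.parseDouble(m.group(1)));
                ys.add(Double.parseDouble(m.group(2)));
                zs.add(Double.parseDouble(m.group(3)));
            }
        }
        return new double[][] {toArray(xs), toArray(ys), toArray(zs)};
    }

    private static double[] toArray(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).toArray();
    }

    public static void main(String[] args) {
        double[][] xyz = read(List.of("{X:0.8940594 Y:0.6853521 Z:1.470214}"));
        System.out.println(Arrays.toString(xyz[0]) + " " + Arrays.toString(xyz[1]) + " " + Arrays.toString(xyz[2]));
    }
}
```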
You can use the regex -?\d+\.\d+, for example:
String input = "{X:0.8940594 Y:0.6853521 Z:1.470214}";
Pattern pattern = Pattern.compile("-?\\d+\\.\\d+");
Matcher matcher = pattern.matcher(input);
List<String> result = new ArrayList<>();
while (matcher.find()) {
result.add(matcher.group());
}
System.out.println(result);
In your case you want to match the real numbers: this regex matches an optional minus sign, one or more digits, a dot, and one or more digits.
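-?\d+\.\d+ requires a decimal point, so it would skip integers like 3 and truncate a value in exponent notation such as 1.2E-5 to 1.2. If your file can contain such values (an assumption; the sample does not), a widened pattern could look like this (NumberExtractor is a hypothetical name). Double.parseDouble accepts all three forms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberExtractor {
    // optional minus, digits, optional fraction, optional exponent such as E-5
    private static final Pattern NUMBER = Pattern.compile("-?\\d+(?:\\.\\d+)?(?:[eE][-+]?\\d+)?");

    static List<String> numbers(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = NUMBER.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(numbers("{X:1.2E-5 Y:3 Z:-0.5}")); // [1.2E-5, 3, -0.5]
    }
}
```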
This code will solve your problem.
String input = "{X:0.8940594 Y:0.6853521 Z:1.470214} ";
String[] parts = input.split("(?<= )");
List<String> output = new ArrayList<>();
for (int i = 0; i < parts.length; i++) {
String[] part = parts[i].split("(?<=:)");
String[] temp = part[1].split("}");
output.add(temp[0]);
}
System.out.println("This List contains numbers:" + output);
Output:
This List contains numbers:[0.8940594 , 0.6853521 , 1.470214]
How about this?
public class Test {
public static void main(String[] args) {
String s = "{X:0.8940594 Y:0.6853521 Z:1.470214}";
String x = s.substring(s.indexOf("X:")+2, s.indexOf("Y:")-1);
String y = s.substring(s.indexOf("Y:")+2, s.indexOf("Z:")-1);
String z = s.substring(s.indexOf("Z:")+2, s.lastIndexOf("}"));
System.out.println(x);
System.out.println(y);
System.out.println(z);
}
}
Your regex splits on whitespace, but does not remove the curly braces.
So instead of splitting on whitespace, you split on a class of characters: whitespace and curly braces.
The line with the regex then becomes:
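String[] splited = read.split("[\\s{}]+");
Inside a character class, + is a literal plus sign, so the quantifier goes after the brackets: [\s{}]+ splits on every run of whitespace and curly braces. Because the line starts with {, the first element of splited is an empty string, so skip it.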
After this, you'll want to split each of the resulting three strings on the : and parse the right-hand side. You can use Double.parseDouble for this purpose.
Personally, I would try to avoid long regexes; they are hard to debug.
It may be best to remove the curly braces first, then split the result on whitespace and colons. This is more lines of code, but it's more robust and easier to debug.
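The remove-the-braces-first approach described above might look like this minimal sketch; BraceFreeParser is a hypothetical name, and it assumes every pair has the form key:value with no spaces inside:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BraceFreeParser {
    // Removes the curly braces, splits on whitespace, then splits each "key:value" on the colon.
    static Map<String, Double> parse(String line) {
        String inner = line.replace("{", "").replace("}", "").trim();
        Map<String, Double> result = new LinkedHashMap<>();
        if (inner.isEmpty()) {
            return result;
        }
        for (String pair : inner.split("\\s+")) {
            String[] keyValue = pair.split(":", 2);
            if (keyValue.length == 2) {           // ignore tokens without a colon
                result.put(keyValue[0], Double.parseDouble(keyValue[1]));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("{X:0.8940594 Y:0.6853521 Z:1.470214}")); // {X=0.8940594, Y=0.6853521, Z=1.470214}
    }
}
```

Each value can then be appended to its X, Y or Z collection by key.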
I have to parse a csv file which has fields that can look like the following:
("FOO, BAR BAZ", 42)
And yield the two fields:
FOO, BAR BAZ
42
I'm not sure how to do this succinctly using Apache Commons CSV or OpenCSV, so I'm looking for some guidance. It may just be that I don't fully understand the org.apache.commons.csv.CSVFormat property "quoteChar" which is touched on in the documentation but never clearly explained anywhere I could find. If so, it'd be very helpful if you could point me towards better documentation of that feature.
Here's a brief example that shows my problem as well as what I've tried and the results:
String line = "(\"FOO, BAR BAZ\", 42)";
int numTries = 5;
CSVParser[] tries = new CSVParser[numTries];
tries[0] = CSVParser.parse(line, CSVFormat.DEFAULT.withRecordSeparator("\n"));//BAR BAZ"
tries[1] = CSVParser.parse(line, CSVFormat.DEFAULT.withQuote('"'));//BAR BAZ"
tries[2] = CSVParser.parse(line, CSVFormat.DEFAULT.withQuote(null));//BAR BAZ"
tries[3] = CSVParser.parse(line, CSVFormat.DEFAULT.withQuote('"').withQuoteMode(QuoteMode.NON_NUMERIC));//BAR BAZ"
tries[4] = CSVParser.parse(line, CSVFormat.DEFAULT.withRecordSeparator(")\n("));//BAR BAZ"
for(int i = 0; i < numTries; i++){
CSVRecord record = tries[i].getRecords().get(0);
System.out.println(record.get(1));//.equals("42"));
}
Note that it works fine if you exclude the parentheses from the input.
You can use OpenCSV's CSVReader to read the data and get the data elements as shown below:
public static void main(String[] args) {
try(FileReader fr = new FileReader(new File("C:\\Sample.txt"));
CSVReader csvReader = new CSVReader(fr);) {
String[] data = csvReader.readNext();
for(String data1 : data) {
System.out.println(data1);
}
} catch (IOException e) {
e.printStackTrace();
}
}
You can achieve this with opencsv as follows:
import com.opencsv.CSVReader;
import java.io.FileReader;
import java.io.IOException;
public class NewClass1 {
public static void main(String[] args) throws IOException {
String fileName = "C:\\yourFile.csv";
String [] nextLine;
// use the three-arg constructor to tell the reader which delimiter your file uses (2nd arg: here ',')
// you can change this to '\t' for a tab-separated file, or ';', ':' ... whatever your delimiter is
// (3rd arg) '"' if your fields are double-quoted, '\'' if single-quoted, or omit it if the fields are not quoted
CSVReader reader = new CSVReader(new FileReader(fileName), ',' ,'"');
// nextLine[] is an array of values from the line
// each line represented by String[], and each field as an element of the array
while ((nextLine = reader.readNext()) != null) {
System.out.println("nextLine[0]: " +nextLine[0]);
System.out.println("nextLine[1]: " +nextLine[1]);
}
}
}
For me, the default format of Commons CSV does the right thing for a correctly formatted CSV message:
Reader in = new StringReader("\"FOO, BAR BAZ\", 42");
Iterable<CSVRecord> records = CSVFormat.DEFAULT.parse(in);
for (CSVRecord record : records) {
for(int i = 0;i < record.size();i++) {
System.out.println("At " + i + ": " + record.get(i));
}
}
Leads to:
At 0: FOO, BAR BAZ
At 1: 42
For the specially formatted lines, you likely need to do a bit more handling to remove those parentheses:
BufferedReader lineReader = new BufferedReader(
new StringReader("(\"FOO, BAR BAZ\", 42)\n(\"FOO, BAR FOO\", 44)"));
while(true) {
String line = lineReader.readLine();
if (line == null) {
break;
}
String adjustedLine = line.substring(1, line.length() - 1);
records = CSVFormat.DEFAULT.parse(new StringReader(adjustedLine));
for (CSVRecord record : records) {
for (int i = 0; i < record.size(); i++) {
System.out.println("At " + i + ": " + record.get(i));
}
}
}
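As far as I understand Commons CSV's parser, a quote character only starts a quoted value at the beginning of a field; here the first field starts with ( so the quotes are read literally, which would explain why stripping the parentheses helps. If every line is exactly one tuple of a quoted text and one unquoted value (an assumption based on the example), a JDK-only alternative is a single regex per line; TupleLine is a hypothetical name, and doubled quotes "" inside the text are unescaped:

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TupleLine {
    // ( "text, possibly with "" escapes" , value )
    private static final Pattern TUPLE =
            Pattern.compile("^\\(\\s*\"((?:[^\"]|\"\")*)\"\\s*,\\s*(.*?)\\s*\\)$");

    static List<String> parse(String line) {
        Matcher m = TUPLE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a (\"text\", value) tuple: " + line);
        }
        return List.of(m.group(1).replace("\"\"", "\""), m.group(2));
    }

    public static void main(String[] args) {
        System.out.println(parse("(\"FOO, BAR BAZ\", 42)")); // [FOO, BAR BAZ, 42]
    }
}
```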
In the input file, there are 2 columns: 1) stem, 2) affixes. In my code, I treat each column as a token, i.e. tokens[1] and tokens[2]. However, tokens[2] contains several values: ng ny nge
stem affixes
---- -------
nyak ng ny nge
My problem is: how can I declare the contents under tokens[2]? Below is a snippet of my code:
try {
FileInputStream fstream2 = new FileInputStream(file2);
DataInputStream in2 = new DataInputStream(fstream2);
BufferedReader br2 = new BufferedReader(new InputStreamReader(in2));
String str2 = "";
String affixes = " ";
while ((str2 = br2.readLine()) != null) {
System.out.println("Original:" + str2);
tokens = str2.split("\\s");
if (tokens.length < 4) {
continue;
}
String stem = tokens[1];
System.out.println("stem is: " + stem);
// here is my point
affixes = tokens[3].split(" ");
for (int x=0; x < tokens.length; x++)
System.out.println("affix is: " + affixes);
}
in2.close();
} catch (Exception e) {
System.err.println(e);
} //end of try2
You are using tokens as an array (tokens[1]) and assigning the result of String.split() to it, so tokens is clearly of type String[].
Next, you are trying to set the value of affixes by splitting tokens[3]. tokens[3] is a String, so calling split on it yields another String[].
So the following is wrong, because it declares a String where you need a String[]:
String affixes = " ";
The correct declaration is:
String[] affixes = null;
Then you can go ahead and assign an array to it:
affixes = tokens[3].split(" ");
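A small sketch of the corrected idea, assuming each line is just the stem followed by its affixes separated by whitespace (so the stem is tokens[0] here, unlike the question's indices). It uses Arrays.copyOfRange to take everything after the stem, and prints each affix instead of the array itself, since printing a String[] directly only shows its type and hash code; StemAffixes, stem and affixes are hypothetical names:

```java
import java.util.Arrays;

class StemAffixes {
    // Assumes a line of the form "stem affix1 affix2 ..." separated by whitespace.
    static String stem(String line) {
        return line.trim().split("\\s+")[0];
    }

    static String[] affixes(String line) {
        String[] tokens = line.trim().split("\\s+");
        // everything after the stem; copyOfRange returns a new (possibly empty) String[]
        return Arrays.copyOfRange(tokens, 1, tokens.length);
    }

    public static void main(String[] args) {
        String line = "nyak ng ny nge";
        System.out.println("stem is: " + stem(line));
        for (String affix : affixes(line)) { // print each element, not the array reference
            System.out.println("affix is: " + affix);
        }
    }
}
```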
Are you looking for something like this?
public static void main(String[] args) {
String line = "nyak ng ny nge";
MyObject object = new MyObject(line);
System.out.println("Stem: " + object.stem);
System.out.println("Affixes: ");
for (String affix : object.affixes) {
System.out.println(" " + affix);
}
}
static class MyObject {
public final String stem;
public final String[] affixes;
public MyObject(String line) {
String[] stemSplit = line.split(" +", 2);
stem = stemSplit[0];
affixes = stemSplit[1].split(" +");
}
}
Output:
Stem: nyak
Affixes:
ng
ny
nge