Splitting a line and filling an array skipping blank values in Java

I have a list of lines that looks roughly like this:
A-NUMBER ROUTINF ACO AO L MISCELL
0-0 0 1-20
0-00
0-01 FDS 3-20
0-02 6 7 3-20
0-03 4 3-20
1-0 F=PRE
ANT=3
NAPI=1
1-1 F=PRE
ANT=3
I need to parse each line by column, skipping the columns that are blank, and build a new line like this:
ANUM = 0-0, ACO=0, L=1-20;
ANUM = 0-00;
ANUM = 0-01, ROUTINF=FDS, L=3-20;
ANUM = 0-02, ACO=6, AO=7, L=3-20;
ANUM = 0-03, AO=4,L=3-20;
ANUM = 1-0, F=PRE, ANT=3, NAPI=1;
ANUM = 1-1, F=PRE, ANT=3;
I can split the line, but my code can't remember which column each value belongs to or when to skip a value.
String[] splitted = null;
for (Integer i = 0; i < lines.size(); i++) {
splitted = lines.get(i).split("\\s+");
for(String str : splitted)
if(!(splitted.length == 1)){
anum = splitted[0];
routinf = splitted[1];
aco = splitted[2];
ao = splitted[3];
l = splitted[4];
}else {
miscell = splitted[0];
}
}

The columns in your file seem to be of fixed width (I don't see any other way to distinguish the columns). If that is the case, I would recommend using substring(start, end) instead of split.
Create a class to hold one single record.
class Record {
String aNumber;
List<String> routingf, aco, ao, l, miscell;
public Record(String aNumber) {
this.aNumber = aNumber;
this.routingf = new ArrayList<>();
// init other lists like above ...
}
public void addRoutingf(String routingf) {
// add only if not null and not empty after trimming
if (routingf != null && routingf.trim().length() > 0) {
this.routingf.add(routingf);
}
}
// implement add-methods for other lists like above ...
}
While parsing each line, remember the last created record. If A-NUMBER is empty in the current line, store the values in the last created record; otherwise create a new record and remember it as the current one so you can use it for the following lines if necessary.
Save all records in a list:
List<Record> records = new ArrayList<>();
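The approach above can be sketched roughly as follows. The column offsets in STARTS are hypothetical (they have to be measured in the real file), and the class name FixedWidthParser and its helpers are made up for illustration. To keep the sketch short, each record is built as a string rather than with the Record class.

```java
import java.util.ArrayList;
import java.util.List;

class FixedWidthParser {
    // Hypothetical layout: output names and start offsets of the six columns.
    static final String[] NAMES = {"ANUM", "ROUTINF", "ACO", "AO", "L", "MISCELL"};
    static final int[] STARTS = {0, 10, 20, 25, 30, 36};

    // Returns the trimmed content of column i, or "" if the line ends before it.
    static String column(String line, int i) {
        int start = STARTS[i];
        if (start >= line.length()) {
            return "";
        }
        int end = (i + 1 < STARTS.length)
                ? Math.min(STARTS[i + 1], line.length())
                : line.length();
        return line.substring(start, end).trim();
    }

    static List<String> parse(List<String> lines) {
        List<StringBuilder> records = new ArrayList<>();
        for (String line : lines) {
            String aNumber = column(line, 0);
            StringBuilder current;
            if (!aNumber.isEmpty()) {
                // A non-blank A-NUMBER starts a new record.
                current = new StringBuilder("ANUM=" + aNumber);
                records.add(current);
            } else if (!records.isEmpty()) {
                // A blank A-NUMBER continues the previous record.
                current = records.get(records.size() - 1);
            } else {
                continue; // continuation line with nothing to attach to
            }
            for (int i = 1; i < NAMES.length; i++) {
                String value = column(line, i);
                if (value.isEmpty()) {
                    continue; // skip blank columns
                }
                // MISCELL values already have the form KEY=VALUE.
                boolean miscell = (i == NAMES.length - 1);
                current.append(", ").append(miscell ? value : NAMES[i] + "=" + value);
            }
        }
        List<String> out = new ArrayList<>();
        for (StringBuilder sb : records) {
            out.add(sb.append(';').toString());
        }
        return out;
    }
}
```

Calling FixedWidthParser.parse(lines) on the list of raw lines returns one formatted string per A-NUMBER; lines whose A-NUMBER column is blank are folded into the preceding record.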

What is the common separator? Just split on that. Your + currently consumes any amount of whitespace; \s{1,4} will limit it to between 1 and 4 characters. Find the right numbers for your data.

If your input file uses exactly one whitespace character (for instance a tab) between columns, your code is almost OK:
String[] splitted = null;
for (Integer i = 0; i < lines.size(); i++) {
splitted = lines.get(i).split("\\s");
if(!(splitted.length == 1)){
anum = splitted[0];
routinf = splitted[1];
aco = splitted[2];
ao = splitted[3];
l = splitted[4];
}else {
miscell = splitted[0];
}
}
//print only not empty fields
Please note that I removed the unnecessary inner for loop and changed the split pattern from \s+ to \s.

Just a thought, but you could also experiment with keeping the empty strings in the split result to work out which column each value belongs to.
lines.get(i).split(yourDelimiter, -1);
It's hard to tell whether this helps without knowing exactly what your original files look like, but you could give it a try.
E.g. if the values are always at a fixed position in the split array, you can easily tell which column each belongs to and extract them.
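A small sketch of what the negative limit changes, assuming a hypothetical tab-separated row (the real delimiter depends on the file). The class name SplitKeepEmpty is made up for the example.

```java
import java.util.Arrays;

class SplitKeepEmpty {
    // Splits on a single tab and keeps every empty column, including trailing ones.
    static String[] columns(String line) {
        return line.split("\t", -1);
    }

    public static void main(String[] args) {
        String line = "0-03\t\t\t4\t3-20\t"; // hypothetical row with blank columns
        // With limit -1 the blank columns stay in place as empty strings.
        System.out.println(Arrays.toString(columns(line)));
        // Without a limit, trailing empty strings are dropped from the result.
        System.out.println(Arrays.toString(line.split("\t")));
    }
}
```

Because each blank column is preserved as an empty string, the array index of a value identifies its column directly.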

Related

What is the most efficient way to add 3 characters at a time to an ArrayList from a text file?

Say you have a text file with "abcdefghijklmnop" and you have to add 3 characters at a time to an array list of type string. So the first cell of the array list would have "abc", the second would have "def" and so on until all the characters are inputted.
public ArrayList<String> returnArray() throws FileNotFoundException
{
int i = 0;
ArrayList<String> list = new ArrayList<String>();
Scanner scanCharacters = new Scanner(file);
while (scanCharacters.hasNext())
{
list.add(scanCharacters.next().substring(i, i + 3));
i += 3;
}
scanCharacters.close();
return list;
}
Please use the code below:
ArrayList<String> list = new ArrayList<String>();
int i = 0;
int x = 0;
Scanner scanCharacters = new Scanner(file);
scanCharacters.useDelimiter(System.getProperty("line.separator"));
String finalString = "";
while (scanCharacters.hasNext()) {
String[] tokens = scanCharacters.next().split("\t");
for (String str : tokens) {
finalString = StringUtils.deleteWhitespace(str);
for (i = 0; i < finalString.length(); i = i + 3) {
x = i + 3;
if (x < finalString.length()) {
list.add(finalString.substring(i, i + 3));
} else {
list.add(finalString.substring(i, finalString.length()));
}
}
}
}
System.out.println("list" + list);
Here I have used StringUtils.deleteWhitespace(str) from Apache Commons Lang to remove whitespace from the tokens. The if condition inside the for loop checks whether three more characters are available in the string; if not, whatever characters are left go into the list. My text file contains
asdfcshgfser ajsnsdxs
on the first line and
sasdsd fghfdgfd
on the second line. After executing the program, the result is:
list[asd, fcs, hgf, ser, ajs, nsd, xs, sas, dsd, fgh, fdg, fd]
public ArrayList<String> returnArray()throws FileNotFoundException
{
ArrayList<String> list = new ArrayList<String>();
Scanner scanCharacters = new Scanner(file);
String temp = "";
while (scanCharacters.hasNext())
{
temp+=scanCharacters.next();
}
while(temp.length() > 2){
list.add(temp.substring(0,3));
temp = temp.substring(3);
}
if(temp.length()>0){
list.add(temp);
}
scanCharacters.close();
return list;
}
In this example I read all of the data from the file and then split it into groups of three. A Scanner can never backtrack, so the way you're using next() leaves out some of the data: each call returns a whole word (words are separated by whitespace, Scanner's default delimiter), and you then take only three characters of it, at an offset that grows by 3 with every word.
For example:
ALEXCY WOWZAMAN
would give you:
ALE and ZAM
My example collects all of the letters into one string and repeatedly takes substrings of three letters until fewer than three remain, and finally adds the remainder. As others have said, it would be good to read up on a different reader such as BufferedReader. In addition, I suggest you research substring and Scanner if you want to keep using your current method.
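The same chunking can also be written as an index-based loop that avoids rebuilding the string on every step. This is only a sketch; Chunker and chunks are hypothetical names, and reading the file into a single string is left out.

```java
import java.util.ArrayList;
import java.util.List;

class Chunker {
    // Splits text into consecutive pieces of the given size; the last piece may be shorter.
    static List<String> chunks(String text, int size) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < text.length(); i += size) {
            // Math.min keeps the end index inside the string for the final piece.
            result.add(text.substring(i, Math.min(i + size, text.length())));
        }
        return result;
    }
}
```

For the 16-character input from the question, Chunker.chunks("abcdefghijklmnop", 3) yields five full groups followed by the single leftover character.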

Using Files.lines with .map(line -> line.split("multiple delimiters"))

I have an input file with the following format:
Ontario:Brampton:43° 41' N:79° 45' W
Ontario:Toronto:43° 39' N:79° 23' W
Quebec:Montreal:45° 30' N:73° 31' W
...
I have a class named City where the values will go.
example:
Province: Ontario
City: Brampton
LatDegrees: 43
LatMinutes: 41
LatDirection: N
LongDegrees: 79 .... etc
I have already completed a method that parses this out correctly, but I'm trying to learn whether this can be done better with Java 8 streams and lambdas.
If I start with the following:
Files.lines(Paths.get(inputFile))
.map(line -> line.split("\\b+")) //this delimits everything
//.filter(x -> x.startsWith(":"))
.flatMap(Arrays::stream)
.forEach(System.out::println);
Can someone please help me reproduce the following?
private void parseLine(String data) {
int counter1 = 1; //1-2 province or city
int counter2 = 1; //1-2 LatitudeDirection,LongitudeDirection
int counter3 = 1; //1-4 LatitudeDegrees,LatitudeMinutes,LongitudeDegrees,LongitudeMinutes
City city = new City(); //create City object
//String read = Arrays.toString(data); //convert array element to String
String[] splited = data.split(":"); //set delimiter
for (String part : splited) {
//System.out.println(part);
char firstChar = part.charAt(0);
if(Character.isDigit(firstChar)){ //if the first char is a digit, then this part needs to be split again
String[] splited2 = part.split(" "); //split second time with space delimiter
for (String part2: splited2){
firstChar = part2.charAt(0);
if (Character.isDigit(firstChar)){ //if the first char is a digit, then needs trimming
String parseDigits = part2.substring(0, part2.length()-1); //trim trailing degrees or radians character
switch(counter2++){
case 1:
city.setLatitudeDegrees(Integer.parseInt(parseDigits));
//System.out.println("LatitudeDegrees: " + city.getLatitudeDegrees());
break;
case 2:
city.setLatitudeMinutes(Integer.parseInt(parseDigits));
//System.out.println("LatitudeMinutes: " + city.getLatitudeMinutes());
break;
case 3:
city.setLongitudeDegrees(Integer.parseInt(parseDigits));
//System.out.println("LongitudeDegrees: " + city.getLongitudeDegrees());
break;
case 4:
city.setLongitudeMinutes(Integer.parseInt(parseDigits));
//System.out.println("LongitudeMinutes: " + city.getLongitudeMinutes());
counter2 = 1; //reset counter2
break;
}
}else{
if(counter3 == 1){
city.setLatitudeDirection(part2.charAt(0));
//System.out.println("LatitudeDirection: " + city.getLatitudeDirection());
counter3++; //increment counter3 to use longitude next
}else{
city.setLongitudeDirection(part2.charAt(0));
//System.out.println("LongitudeDirection: " + city.getLongitudeDirection());
counter3 = 1; //reset counter 3
//System.out.println("Number of cities: " + cities.size());
cities.add(city);
}
}
}
}else{
if(counter1 == 1){
city.setProvince(part);
//System.out.println("\nProvince: " + city.getProvince());
counter1++;
}else if(counter1 == 2){
city.setCity(part);
//System.out.println("City: " + city.getCity());
counter1 = 1; //reset counter1
}
}
}
}
There's probably a better solution to my parseLine() method no doubt, but I would really like to condense that as outlined above.
Thanks !!
Let’s start with some general notes.
Your sequence .map(line -> line.split("\\b+")).flatMap(Arrays::stream) isn’t recommended. These two steps will first create an array before creating another stream wrapping that array. You can skip the array step by using splitAsStream though this requires you to deal with Pattern explicitly instead of hiding it within String.split:
.flatMap(Pattern.compile("\\b+")::splitAsStream)
but note that in this case, splitting into words doesn’t really pay off.
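A minimal sketch of splitAsStream inside flatMap. It splits on ":" instead of \b+ so the result is easy to predict; the class and method names are made up for the example, and the lines come from an in-memory stream rather than Files.lines.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SplitAsStreamDemo {
    // Compiled once and reused for every line.
    private static final Pattern COLON = Pattern.compile(":");

    // Flattens all lines into one list of tokens without creating intermediate arrays.
    static List<String> tokens(Stream<String> lines) {
        return lines.flatMap(COLON::splitAsStream).collect(Collectors.toList());
    }
}
```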
If you want to keep your original parseLine method, you can simply do
Files.lines(Paths.get(inputFile))
.forEach(this::parseLine);
and you’re done.
But seriously, that is not a real solution. To do pattern matching, you should use a library designed for pattern matching, e.g. the regex package. You are using it already when you split via split("\\b+"), but that falls far short of what it can do for you.
Let's define the pattern:
(…) forms a group that allows capturing the matching part so we can extract it for our result
[^:]* specifies a token consisting of arbitrary characters except the colon ([^:]) of arbitrary length (*)
\d+ defines a number (d = numeric digit, + = one or more)
[NS] and [WE] match a single character being either N or S, or either W or E, respectively
so the entire pattern you are looking for is
([^:]*):([^:]*):(\d+)° (\d+)' ([NS]):(\d+)° (\d+)' ([WE])
and the entire parse routine will be:
static Pattern CITY_PATTERN=Pattern.compile(
"([^:]*):([^:]*):(\\d+)° (\\d+)' ([NS]):(\\d+)° (\\d+)' ([WE])");
static City parseCity(String line) {
Matcher matcher = CITY_PATTERN.matcher(line);
if(!matcher.matches())
throw new IllegalArgumentException(line+" doesn't match "+CITY_PATTERN);
City city=new City();
city.setProvince(matcher.group(1));
city.setCity(matcher.group(2));
city.setLatitudeDegrees(Integer.parseInt(matcher.group(3)));
city.setLatitudeMinutes(Integer.parseInt(matcher.group(4)));
city.setLatitudeDirection(line.charAt(matcher.start(5)));
city.setLongitudeDegrees(Integer.parseInt(matcher.group(6)));
city.setLongitudeMinutes(Integer.parseInt(matcher.group(7)));
city.setLongitudeDirection(line.charAt(matcher.start(8)));
return city;
}
and I really hope you won't call your hard-to-read method “condensed” anymore…
Using the routine above, a clean Stream-based processing solution would look like
List<City> cities = Files.lines(Paths.get(inputFile))
.map(ContainingClass::parseCity).collect(Collectors.toList());
to collect a file into a new list of cities.
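To try the routine without an input file, here is a self-contained sketch that runs the same pattern over an in-memory list. The nested City class is a hypothetical stand-in reduced to three fields, not the asker's real class, and the degree sign is written as a Unicode escape purely to avoid source-encoding problems.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class CityParseDemo {
    // Hypothetical stand-in for the City class, reduced to three fields.
    static class City {
        final String province;
        final String name;
        final int latitudeDegrees;

        City(String province, String name, int latitudeDegrees) {
            this.province = province;
            this.name = name;
            this.latitudeDegrees = latitudeDegrees;
        }
    }

    // Same pattern as above; the escape stands for the degree sign (U+00B0).
    static final Pattern CITY_PATTERN = Pattern.compile(
            "([^:]*):([^:]*):(\\d+)\u00B0 (\\d+)' ([NS]):(\\d+)\u00B0 (\\d+)' ([WE])");

    static City parseCity(String line) {
        Matcher m = CITY_PATTERN.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException(line + " doesn't match " + CITY_PATTERN);
        }
        return new City(m.group(1), m.group(2), Integer.parseInt(m.group(3)));
    }

    // Maps every line to a City; a malformed line aborts with an exception.
    static List<City> parseAll(List<String> lines) {
        return lines.stream().map(CityParseDemo::parseCity).collect(Collectors.toList());
    }
}
```

Throwing on a non-matching line makes malformed input visible immediately; whether to skip such lines instead is a design choice that depends on the data.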

How can I extract specific terms from each string line?

I have a serious problem with extracting terms from each line of a file. To be more specific, I have a file with a .csv extension that is not actually in CSV format (each whole line ends up in line[0]).
Here are a few example lines out of thousands:
test.csv
line1 : "31451    CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1CS#S[C##H]1CCCCC(=O)O "
line2 : "12232 COD05374044 23439353  C924O3S2    saponin   CCCC(=O)O "
line3 : "9048   CTD042032 23241  C3HO4O3S2 Berberine  [C##H]1CCCCC(=O)O "
I want to extract only "beta-lipoic acid", "saponin" and "Berberine", which are in the 5th position.
There are large gaps between the terms, which is why I say 5th position.
How can I extract the term in the 5th position of each line?
One more thing: the amount of whitespace between the six terms is not always the same. It could be one, two, three, four, five characters and so on.
Another try:
import java.io.File;
import java.util.Scanner;
public class HelloWorld {
// The number of columns per row, where columns are separated by an arbitrary number
// of spaces or tabs
final static int COLS = 7;
public static void main(String[] args) {
System.out.println("Tokens:");
try (Scanner scanner = new Scanner(new File("input.txt")).useDelimiter("\\s+")) {
// Counts the current column index
int n = 0;
String tmp = "";
StringBuilder item = new StringBuilder();
// Process the token stream
while (scanner.hasNext()) {
tmp = scanner.next();
n += 1;
// If we have reached the fifth column, take its content and append the
// sixth column too, as the name we want consists of space-separated
// expressions. Feel free to customize if your name layout varies.
if (n % COLS == 5) {
item.setLength(0);
item.append(tmp);
item.append(" ");
item.append(scanner.next());
n += 1;
System.out.println(item.toString()); // Doing some stuff with that
//expression we got
}
}
}
catch(java.io.IOException e){
System.out.println(e.getMessage());
}
}
}
If line is a String[]:
String s = line[0];
String[] split = s.split(" ");
return split[4]; //which is the fifth item
This assumes exactly one space between columns; runs of spaces produce empty strings in the result. For a more precise delimiter, you can use a regular expression.
How is the column separated? For example, if the columns are separated by tab character, I believe you can use the split method. Try using the below:
String[] parts = str.split("\\t");
Your expected result will be in parts[4].
Just use String.split() with a regex for at least 2 whitespace characters:
String foo = "31451    CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1CS#S[C##H]1CCCCC(=O)O";
String[] bar = foo.split("\\s{2,}");
String name = bar[4]; // beta-lipoic acid
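Wrapped as a small helper, the idea looks like the sketch below. It assumes every column gap is at least two whitespace characters wide and that single spaces occur only inside a term; ColumnExtractor and column are hypothetical names.

```java
class ColumnExtractor {
    // Returns the zero-based column, treating runs of two or more whitespace
    // characters as the separator. Throws ArrayIndexOutOfBoundsException if the
    // line has fewer columns than requested.
    static String column(String line, int index) {
        // trim() removes leading and trailing blanks so they cannot form extra columns.
        return line.trim().split("\\s{2,}")[index];
    }
}
```

A line whose columns are separated by a single space would not split correctly this way, so it is worth checking the raw file for such rows first.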

Java Concordance Exercise

So I have a programming exercise that involves concordance. I am attempting to take a .txt file, use regex to parse it into strings containing all words, then create a Hashtable that has the key (the word) and the value (the number of times the word appears in the document). We are supposed to be able to account for both case sensitive and non-case sensitive scenarios by passing in a boolean.
Here is what I have:
public Concordance( String pathName, boolean is_case_sensitive ) {
Scanner file = new Scanner(pathName);
try {
file = new Scanner(new File(pathName));
} catch (FileNotFoundException e) {
System.out.println("No File Found");
}
String[] words;
Pattern pattern = Pattern.compile("\\W+");
words = pattern.split(file.nextLine());
String[] wordsCopy = new String[words.length];
for (int i = 0; i < words.length; i++){
wordsCopy[i] = new String(words[i]);
}
int counter = 0;
while (file.hasNext()){
for (String w : words){
counter = 0;
for (String w2 : wordsCopy){
if (is_case_sensitive == false){
if (w.equalsIgnoreCase(w2)){
counter++;
//w2 = null;
tableOfWords.put(w, counter);
file.next();
}
}
if (is_case_sensitive == true){
if (w.equals(w2)){
counter++;
//w2 = null;
tableOfWords.put(w, counter);
file.next();
}
}
}
}
}
}
To walk you through where I am and where I believe my error is:
I use the Scanner to read the file, then use the regex \W+ to extract all of the words. I split the line with the Pattern into a String array, then create a deep copy of that array to use during comparison (so I now have two String arrays, words and wordsCopy). I use an int counter to track how many times a word appears, and handle case sensitivity with an if statement and the equals/equalsIgnoreCase methods. I have been going back and forth on assigning w2 to null (it's currently commented out): intuitively, if it is not set to null the word will be counted twice, but I can't think it through properly. I think I am counting items in duplicate but can't figure out a solution. Any insight? Thanks!
You don't need an extra String[] to handle case sensitivity:
Pattern pattern = Pattern.compile("\\W+");
HashMap<String, AtomicInteger> tableOfWords = new HashMap<String, AtomicInteger>();
while (file.hasNextLine()){
words = pattern.split(file.nextLine());
for (String w : words){
String tmp = w;
if (!is_case_sensitive){
tmp = String.valueOf(w.toLowerCase());
}
AtomicInteger count = tableOfWords.get(tmp);
if (count == null){
count = new AtomicInteger(0);
}
count.incrementAndGet();
tableOfWords.put(tmp,count);
}
}
Convert each word to lower (or upper) case when case sensitivity is not required; then everything works.
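The same counting idea can be sketched with Map.merge (available since Java 8) in place of AtomicInteger. WordCounter and count are hypothetical names, and the lines are passed in directly instead of being read with a Scanner.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

class WordCounter {
    private static final Pattern NON_WORD = Pattern.compile("\\W+");

    // Counts how often each word occurs; folds case unless caseSensitive is true.
    static Map<String, Integer> count(Iterable<String> lines, boolean caseSensitive) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : NON_WORD.split(line)) {
                if (word.isEmpty()) {
                    continue; // a line starting with punctuation yields an empty first token
                }
                String key = caseSensitive ? word : word.toLowerCase(Locale.ROOT);
                // merge inserts 1 for a new key or adds 1 to the existing count.
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Each word is visited exactly once, so no second array and no nested comparison loop are needed.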
As far as I can see, you are actually counting words multiple times (more than twice as well).
I'll give you a simple for-each loop to show what you're doing:
int[] ints = {1, 2, 3, 4, 5};
int[] intcopy = ints;
for (int i : ints) {
for (int j : intcopy) {
System.out.println(j);
}
}
What you will end up printing is
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
So instead of counting 5 things you are counting 25 things. Hope this helps.

Why is the size of this vector 1?

When I use System.out.println to show the size of a vector after calling the following method then it shows 1 although it should show 2 because the String parameter is "7455573;photo41.png;photo42.png" .
private void getIdClientAndPhotonames(String csvClientPhotos)
{
Vector vListPhotosOfClient = new Vector();
String chainePhotos = "";
String photoName = "";
String photoDirectory = new String(csvClientPhotos.substring(0, csvClientPhotos.indexOf(';')));
chainePhotos = csvClientPhotos.substring(csvClientPhotos.indexOf(';')+1);
chainePhotos = chainePhotos.substring(0, chainePhotos.lastIndexOf(';'));
if (chainePhotos.indexOf(';') == -1)
{
vListPhotosOfClient.addElement(new String(chainePhotos));
}
else // aaa;bbb;...
{
for (int i = 0 ; i < chainePhotos.length() ; i++)
{
if (chainePhotos.charAt(i) == ';')
{
vListPhotosOfClient.addElement(new String(photoName));
photoName = "";
continue;
}
photoName = photoName.concat(String.valueOf(chainePhotos.charAt(i)));
}
}
}
So the vector should contain the two strings photo41.png and photo42.png, but when I print the vector's content I get only photo41.png.
So what is wrong with my code?
This answer is no longer valid for the question because it has been retagged java-me. It still holds for standard Java (as the question was originally tagged): use String#split if you need to handle CSV files.
It'd be far easier to split the string:
String[] parts = csvClientPhotos.split(";");
This will give a string array:
{"7455573","photo41.png","photo42.png"}
Then you'd simply copy parts[1] and parts[2] to your vector.
You have two immediate problems.
The first is with your initial manipulation of the string. The two lines:
chainePhotos = csvClientPhotos.substring(csvClientPhotos.indexOf(';')+1);
chainePhotos = chainePhotos.substring(0, chainePhotos.lastIndexOf(';'));
when applied to 7455573;photo41.png;photo42.png will end up giving you photo41.png.
That's because the first line removes everything up to the first ; (7455573;) and the second strips off everything from the final ; onwards (;photo42.png). If your intent is to just get rid of the 7455573; bit, you don't need the second line.
Note that fixing this issue alone will not solve all your ills, you still need one more change.
Even though your input string (to the loop) is the correct photo41.png;photo42.png, you still only add an item to the vector each time you encounter a delimiting ;. There is no such delimiter at the end of that string, meaning that the final item won't be added.
You can fix this by putting the following immediately after the for loop:
if (! photoName.equals(""))
vListPhotosOfClient.addElement(new String(photoName));
which will catch the case of the final name not being terminated with the ;.
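Putting both fixes together might look like the sketch below. It is written for Java SE (generics are assumed to be available, which may not hold on Java ME), and it returns the vector so the result can be inspected; PhotoNames and photoNames are hypothetical names.

```java
import java.util.Vector;

class PhotoNames {
    // Returns every photo name that follows the leading client id.
    static Vector<String> photoNames(String csvClientPhotos) {
        Vector<String> names = new Vector<>();
        // Fix 1: strip only the leading id; keep everything after the first ';'.
        String photos = csvClientPhotos.substring(csvClientPhotos.indexOf(';') + 1);
        String current = "";
        for (int i = 0; i < photos.length(); i++) {
            char c = photos.charAt(i);
            if (c == ';') {
                names.addElement(current);
                current = "";
            } else {
                current += c;
            }
        }
        // Fix 2: add the final name, which has no terminating ';'.
        if (!current.equals("")) {
            names.addElement(current);
        }
        return names;
    }
}
```

The final check also means an input with a trailing ; produces the same two names.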
These two lines are the problem:
chainePhotos = csvClientPhotos.substring(csvClientPhotos.indexOf(';') + 1);
chainePhotos = chainePhotos.substring(0, chainePhotos.lastIndexOf(';'));
After the first one, chainePhotos contains "photo41.png;photo42.png", but the second one reduces it to photo41.png, which triggers the if and ends the method with only one element in the vector.
EDITED: what a mess.
I ran it with correct input (as provided by the OP) and made a comment above.
I then fixed it as suggested above, while accidentally changing the input to 7455573;photo41.png;photo42.png; which worked, but is probably incorrect and doesn't match the explanation above input-wise.
I wish someone would un-answer this.
You can split the string manually. If the string contains the ; separator, just do it like this:
private void getIdClientAndPhotonames(String csvClientPhotos)
{
Vector vListPhotosOfClient = split(csvClientPhotos);
}
private Vector split(String original) {
Vector nodes = new Vector();
String separator = ";";
// Parse nodes into vector
int index = original.indexOf(separator);
while(index>=0) {
nodes.addElement( original.substring(0, index) );
original = original.substring(index+separator.length());
index = original.indexOf(separator);
}
// Get the last node
nodes.addElement( original );
return nodes;
}
