Splitting a specific bunch of text - Java

If I have this element:
<Comments type="ITEM_OUT_COMMENTS" xml:lang="en">Item text
203871: ATAG ZON POMPUPR 15-60 DO NOT DELETE SupplierAuxiliaryPartID : 395##!817##!N
Material PO text
Computers, Mainframe
COMPUTERS,MAINFRAME,SOURCED
</Comments>
Is it possible to get only this piece of text back: 395##!817##!N
This piece of text always comes after: SupplierAuxiliaryPartID :
But sometimes there are no spaces around the colon, like this: SupplierAuxiliaryPartID:395##!817##!N
<Comments type="ITEM_OUT_COMMENTS" xml:lang="en">Item text
203871: ATAG ZON POMPUPR 15-60 DO NOT DELETE SupplierAuxiliaryPartID:395##!817##!N
Material PO text
Computers, Mainframe
COMPUTERS,MAINFRAME,SOURCED
</Comments>
I tried several splits but every time I cannot get the right piece of text.

final String LABEL = "SupplierAuxiliaryPartID";

String getSupplierAuxiliaryPartId(String comments)
{
    // Split comments by line
    for (String line : comments.split("\n"))
    {
        // find where the label is
        int index = line.indexOf(LABEL);
        if (index == -1)
        {
            // no label on this line
            continue;
        }
        // find first colon after the label
        index = line.indexOf(":", index + LABEL.length());
        if (index == -1)
        {
            // label without colon, but maybe the next line has a valid one
            continue;
        }
        // return the rest of the line after the colon, stripping surrounding whitespace
        return line.substring(index + 1).trim();
    }
    // label with colon not present in comment
    return null;
}
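As an alternative to the index arithmetic above, the same lookup can be sketched with a regular expression. This is only a sketch under one assumption that the question does not confirm: the value itself never contains whitespace, so it ends at the next space or line break. The class and method names (PartIdExtractor, extractPartId) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartIdExtractor {
    // Label, optional whitespace, colon, optional whitespace,
    // then the value captured up to the next whitespace character.
    // Assumption: the value never contains whitespace.
    private static final Pattern PART_ID =
            Pattern.compile("SupplierAuxiliaryPartID\\s*:\\s*(\\S+)");

    static String extractPartId(String comments) {
        Matcher m = PART_ID.matcher(comments);
        // null signals that the label (with a colon) was not found
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String withSpaces = "203871: ATAG ZON POMPUPR 15-60 DO NOT DELETE "
                + "SupplierAuxiliaryPartID : 395##!817##!N\nMaterial PO text";
        String withoutSpaces = "203871: ATAG ZON POMPUPR 15-60 DO NOT DELETE "
                + "SupplierAuxiliaryPartID:395##!817##!N\nMaterial PO text";
        System.out.println(extractPartId(withSpaces));    // 395##!817##!N
        System.out.println(extractPartId(withoutSpaces)); // 395##!817##!N
    }
}
```

Because \s* also matches zero characters, the same pattern covers both the spaced and the unspaced form of the label.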

Related

How to merge many List<String> elements in one based on double quote delimiter in java

I have a CSV file generated in other platform (Salesforce), by default it seems Salesforce is not handling break lines in the file generation in some large text fields, so in my CSV file I have some rows with break lines like this that I need to fix:
"column1","column2","my column with text
here the text continues
more text in the same field
here we finish this","column3","column4"
Same idea using this piece of code:
List<String> listWords = new ArrayList<String>();
listWords.add("\"Hi all");
listWords.add("This is a test");
listWords.add("of how to remove");
listWords.add("");
listWords.add("breaklines and merge all in one\"");
listWords.add("\"This is a new Line with the whole text in one row\"");
In this case I would like to merge the elements. My first approach was to check for lines where the last char is not a double quote ("), concatenate the next line, and keep going until the last char is another double quote.
This is a non-working sample of what I was trying to achieve, but I hope it gives you an idea:
String[] csvLines = csvContent.split("\n");
Integer iterator = 0;
String mergedRows = "";
for(String row:csvLines){
newCsvfile.add(row);
if(row != null){
if(!row.isEmpty()){
String lastChar = String.valueOf(row.charAt(row.length()-1));
if(!lastChar.contains("\"")){
//row += row+" "+csvLines[iterator+1].replaceAll("\r", "").replaceAll("\n", "").replaceAll("","").replaceAll("\r\n?|\n", "");
mergedRows += row+" "+csvLines[iterator+1].replaceAll("\r", "").replaceAll("\n", "").replaceAll("","").replaceAll("\r\n?|\n", "");
row = mergedRows;
csvLines[iterator+1] = null;
}
}
newCsvfile.add(row);
}
iterator++;
}
My final result should look like (based on the list sample):
"Hi all This is a test of how to remove break lines and merge all in one"
"This is a new Line with the whole text in one row".
What is the best approach to achieve this?
In case you don't want to use a CSV reading library like @RealSkeptic suggested...
Going from your listWords to your expected solution is fairly simple:
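List<String> listSentences = new ArrayList<>();
String tmp = "";
for (String s : listWords) {
    if (s.isEmpty()) {
        continue; // skip blank lines inside a quoted field
    }
    // join the pieces with a single space, without a leading space
    tmp = tmp.isEmpty() ? s : tmp + " " + s;
    if (s.endsWith("\"")) {
        listSentences.add(tmp);
        tmp = "";
    }
}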
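The loop above works on the listWords sample, where every element is a single field. For the raw CSV text from the question, where a row holds several quoted columns, checking only the last character is not enough, because a broken line can also end in a quote. A possible sketch, assuming every field is wrapped in double quotes and embedded quotes are doubled (""): track whether the number of quotes seen so far is odd, and only close a record when it is even. The names CsvLineMerger and mergeBrokenRows are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvLineMerger {
    // Merges physical lines into logical CSV records by tracking quote parity.
    // Assumption: fields are quoted and embedded quotes are doubled ("").
    static List<String> mergeBrokenRows(String csvContent) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideQuotes = false;
        for (String line : csvContent.split("\\r?\\n")) {
            if (line.isEmpty()) {
                continue; // drop blank lines, inside or outside a field
            }
            if (current.length() > 0) {
                current.append(' '); // replace the line break with a space
            }
            current.append(line);
            // every quote toggles the state; an odd total means an open field
            for (int i = 0; i < line.length(); i++) {
                if (line.charAt(i) == '"') {
                    insideQuotes = !insideQuotes;
                }
            }
            if (!insideQuotes) {
                records.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString()); // unterminated last record, kept as-is
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "\"column1\",\"column2\",\"my column with text\n"
                + "here the text continues\n"
                + "more text in the same field\n"
                + "here we finish this\",\"column3\",\"column4\"\n"
                + "\"a\",\"b\"";
        for (String record : mergeBrokenRows(csv)) {
            System.out.println(record);
        }
    }
}
```

A CSV library remains the more robust choice; this sketch only illustrates the parity idea.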

Java Regex : How to search a text or a phrase in a large text

I have a large text file and I need to search for a word or a phrase in the file line by line and output each line that contains it.
For example, the sample text is
And the earth was without form,
Where [art] thou?
If the user searches for the word thou, the only line displayed should be
Where [art] thou?
and if the user searches for the earth, the first line should be displayed.
I tried using the contains function, but it also matches the word without when searching only for thou.
This is my sample code :
String[] verseList = TextIO.readFile("pentateuch.txt");
Scanner kbd = new Scanner(System.in);
int counter = 0;
for (int i = 0; i < verseList.length; i++) {
String[] data = verseList[i].split("\t");
String[] info3 = data[3].split(" ");
System.out.print("Search for: ");
String txtSearch = kbd.nextLine();
LinkedList<String> searchedList = new LinkedList<String>();
for (String bible : verseList){
if (bible.contains(txtSearch)){
searchedList.add(bible);
counter++;
}
}
if (searchedList.size() > 0){
for (String s : searchedList){
String[] searchedData = s.split("\t");
System.out.printf("%s - %s - %s - %s \n",searchedData[0], searchedData[1], searchedData[2], searchedData[3]);
}
}
System.out.print("Total: " + counter);
So I am thinking of using regex but I don't know how.
Can anyone help? Thank you.
Since sometimes variables have non-word characters at boundary positions, you cannot rely on \b word boundary.
In such cases, it is safer to use look-arounds (?<!\w) and (?!\w), i.e. in Java, something like:
"(?<!\\w)" + searchedData[n] + "(?!\\w)"
To match a String that contains a word, use this code:
String txtSearch; // eg "thou"
if (str.matches(".*?\\b" + txtSearch + "\\b.*"))
// it matches
This code builds a regex that only matches if both ends of txtSearch fall on the start/end of a word in the string, by using \b, which means "word boundary".
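Both answers concatenate the search text straight into the regex, so a search term such as [art] would be read as a character class. A minimal sketch that combines the look-around idea with Pattern.quote, so the user's input is treated literally, might look like this. The class and method names are invented for the example, and the two sample lines come from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class WholeWordSearch {
    // Returns the lines that contain the phrase as a whole word or phrase.
    static List<String> findLines(List<String> lines, String phrase) {
        // Pattern.quote makes regex metacharacters in the phrase literal;
        // the look-arounds require a non-word character (or the line edge)
        // on both sides of the match.
        Pattern p = Pattern.compile("(?<!\\w)" + Pattern.quote(phrase) + "(?!\\w)");
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (p.matcher(line).find()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> verses = Arrays.asList(
                "And the earth was without form,",
                "Where [art] thou?");
        System.out.println(findLines(verses, "thou"));      // only the second line
        System.out.println(findLines(verses, "the earth")); // only the first line
        System.out.println(findLines(verses, "[art]"));     // only the second line
    }
}
```

Using find() instead of matches() avoids wrapping the pattern in .* on both sides.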

Read nth line from string

I am trying to read the 7th line of a string so that I can filter the required text, but I am not getting any further (assume the string has n lines).
class Lastnthchar {
public static void main(String[] args) {
// TODO Auto-generated method stub
String alldata =" FORM"+"\n"+
" to get all data"+"\n"+
" PART A is mandatory"+"\n"+
" enclose all Certificate"+"\n"+
" Certificate No. SFDSFDFS Last updated on 12-Jun-2009"+"\n"+
" Name and address"+"\n"+
" Lisa Lawerence"+"\n"+
" 10/3 TOP FLOOR, Street no 22 ,NewYork"+"\n"+
" residence"+"\n"+
" zip-21232"+"\n"+
" C 78,New York"+"\n"+
" US"+"\n"+
" US"+"\n"+
" "+"\n"+
" worldwide";
String namerequired = new String ();
//BufferedReader br = new BufferedReader(alldata);
int lineno = 0;
for(lineno = 0; lineno <alldata.length(); lineno ++)
{
//what should i do?
}
}
}
so if any solution please help.
alldata.length() will return the length of the string (i.e. number of characters), not the number of lines.
To get the nth line you'll need to split the string at the line breaks, e.g.
alldata.split("\n")[6] to get the 7th line (provided there are at least 7 lines).
This also assumes you have line breaks (\n) in your string and not just carriage returns (\r). If you want to split at both individually or in combination, you can change the parameter of split() to "\r\n|\n|\r". If you want to skip empty lines, you can split at any sequence of at least one line break or carriage return, e.g. "[\r\n]+".
Example:
System.out.println("--- Input:");
String input = "A\nB\rC\n\nD\r\nE";
System.out.println(input);
System.out.println("--- 4th element, split by \\n:");
System.out.println(input.split("\n")[3]); // 4th element (index 3) will be "D\r"
System.out.println("--- 4th element, split by \\r\\n|\\n|\\r:");
System.out.println(input.split("\r\n|\n|\r")[3]); // 4th element (index 3) will be an empty string
System.out.println("--- 4th element, split by [\\r\\n]+:");
System.out.println(input.split("[\r\n]+")[3]); // 4th element (index 3) will be "D"
System.out.println("--- END");
Output:
--- Input:
A
B
C
D
E
--- 4th element, split by \n:
D
--- 4th element, split by \r\n|\n|\r:
--- 4th element, split by [\r\n]+:
D
--- END
Alternatively, if you're reading the text from some stream (e.g. from a file) you can use BufferedReader#readLine() and count the lines. Additionally you can initialize the BufferedReader with a FileReader, StringReader etc., depending on where you read the input from.
If you're reading from the console, the Console class also has a readLine() method.
If you use the BufferedReader you could do the following:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class Lastnthchar {
public static void main(String[] args) throws IOException {
String alldata =" FORM"+"\n"+
" to get all data"+"\n"+
" PART A is mandatory"+"\n"+
" enclose all Certificate"+"\n"+
" Certificate No. SFDSFDFS Last updated on 12-Jun-2009"+"\n"+
" Name and address"+"\n"+
" Lisa Lawerence"+"\n"+
" 10/3 TOP FLOOR, Street no 22 ,NewYork"+"\n"+
" residence"+"\n"+
" zip-21232"+"\n"+
" C 78,New York"+"\n"+
" US"+"\n"+
" US"+"\n"+
" "+"\n"+
" worldwide";
BufferedReader br = new BufferedReader(new StringReader(alldata));
String namerequired = null;
String line;
int counter = 0;
while ((line = br.readLine()) != null) {
if (counter == 6) {
namerequired = line; // 7th line (counter is zero-based)
break; // no need to read further
}
counter++;
}
}
}
One way to approach your problem is to search for "\n" the required number of times until you reach the line you need. I'm writing this off the top of my head, so treat it as a sketch of the logic:
public String readSpecifiedLine(String str, int lineNumber) {
    int lineStartIndex = 0;
    // find the start of the requested line (lineNumber is zero-based)
    for (int i = 0; i < lineNumber; i++) {
        int newline = str.indexOf("\n", lineStartIndex);
        if (newline == -1) {
            return null; // the string has fewer lines than requested
        }
        // "\n" is a single character, so step over it by adding 1
        lineStartIndex = newline + 1;
    }
    int nextLine = str.indexOf("\n", lineStartIndex); // end of the requested line
    // the last line has no trailing "\n"
    return nextLine == -1 ? str.substring(lineStartIndex) : str.substring(lineStartIndex, nextLine);
}
You might need to adjust the indexes if your text uses "\r\n" line endings.
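If Java 11 or later is available (an assumption, since the question does not state a version), String.lines() offers a shorter route: it splits on \n, \r and \r\n and returns a stream. A small sketch, with a hypothetical helper name nthLine and a 1-based line number:

```java
import java.util.Optional;

public class NthLine {
    // Returns the n-th line (1-based), or an empty Optional if the text is shorter.
    // Requires Java 11+ for String.lines().
    static Optional<String> nthLine(String text, int n) {
        if (n < 1) {
            return Optional.empty();
        }
        return text.lines().skip(n - 1).findFirst();
    }

    public static void main(String[] args) {
        String input = "A\nB\rC\r\nD";
        System.out.println(nthLine(input, 3).orElse("(no such line)"));  // C
        System.out.println(nthLine(input, 10).orElse("(no such line)")); // (no such line)
    }
}
```

Returning an Optional avoids the ArrayIndexOutOfBoundsException that split(...)[6] throws when the text has fewer than seven lines.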

JAVA - Ignore part of strings containing "#"

I'm having some difficulties in excluding part of strings after the "#" symbol.
Let me explain:
This is a sample input text a user could insert in a textbox:
Some Text
Some Text again #A comment
#A comment line
Another Text
Another Text again#Comment
I need to read this text and ignore all text after "#" symbol.
This should be the expected output:
Some Text;Some Text again;Another Text;Another Text again
As for now here's the code:
This replaces all newlines with ";"
readText = userInputTextArea.getText();
readTextAllInALine = readText.replaceAll("\\n", ";");
so the output after this is:
Some Text;Some Text again #A comment;#A comment line;Another Text;Another Text again#Comment
This code is to ignore all characters after the first "#" but works fine just for the first line if we read it all sequentially.
int startIndex = inputCommandText.indexOf("#");
int endIndex = inputCommandText.indexOf(";");
String toBeReplaced = inputCommandText.substring(startIndex, endIndex);
readTextAllInALine.replace(toBeReplaced, "");
I'm stuck in finding a way for having the expected output. I was thinking of using a StringTokenizer, processing every line, removing text after "#" or ignoring the whole line if it starts with "#", and then printing all tokens (i.e. all lines) separating them with ";" but I cannot make it work.
Any help will be appreciated.
Thank you very much in advance.
Regards.
Just call replaceAll on the single-line string (the text after the newlines have been replaced with semicolons). The regex #[^;]* matches everything from the hash up to, but not including, the next semicolon, and the match is replaced with an empty string.
public static void main(String[] args) {
String text = "Some Text;Some Text again #A comment;#A comment line;Another Text;Another Text again#Comment";
System.out.println(text);
text = text.replaceAll("#[^;]*", "");
System.out.println(text);
}
A regex is useful here, but it's tricky because your pattern is moderately complex. The comments are end-of-line comments, so they can appear in more than one arrangement.
I came up with the following two-pass replacement:
replaceAll(" *(#.*(?=\\n|$))", "").replaceAll("\\n+", ";");
The two-pass circumvents the fact that sometimes you get a duplicate line break. The first expression replaces comments but not new line characters and the second expression replaces multiple new line characters with a single semicolon.
The individual parts of the expression in the first pass are the following:
" *"
This includes zero or more leading spaces in the comment match. For example, in "...again #A...", we want to remove the space between n and #.
"(#.* )"
The start of the comment match: matches a # followed by zero or more characters. (Typically the . matches any character except a new line.)
"(?= )"
This is a positive lookahead and where the regex starts to get tricky. It looks for whatever is inside this expression but doesn't include it in the text that's matched. It asserts that the #.* is followed by a certain string but doesn't replace that certain string.
"\\n|$"
The lookahead finds a new line or the end anchor. This will find a comment ended with a new line character or a comment that is at the end of the String. But again, since it's inside the lookahead, the new line doesn't get replaced.
So given the input:
String text = (
"Some Text" + '\n' +
"Some Text again #A comment" + '\n' +
"#A comment line" + '\n' +
"Another Text" + '\n' +
"Another Text again#Comment"
);
System.out.println(
text.replaceAll(" *(#.*(?=\\n|$))", "").replaceAll("\\n+", ";")
);
The output is:
Some Text;Some Text again;Another Text;Another Text again
readText = userInputTextArea.getText();
readText = readText.replaceAll("\\s*#[^\n]*", "");
readText = readText.replaceAll("\n+", ";");
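The line-by-line approach that the question describes can also be written without a regex. The following is a sketch only: it cuts each line at the first "#", trims what is left, drops lines that end up empty, and joins the rest with ";". The class and method names are illustrative and not taken from the question.

```java
import java.util.ArrayList;
import java.util.List;

public class HashCommentStripper {
    // Removes "#" comments line by line and joins the remaining lines with ";".
    static String stripComments(String text) {
        List<String> kept = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            int hash = line.indexOf('#');
            // keep only the part before the first '#', if there is one
            String content = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!content.isEmpty()) {
                kept.add(content); // comment-only and blank lines are skipped
            }
        }
        return String.join(";", kept);
    }

    public static void main(String[] args) {
        String text = "Some Text\n"
                + "Some Text again #A comment\n"
                + "#A comment line\n"
                + "Another Text\n"
                + "Another Text again#Comment";
        // prints: Some Text;Some Text again;Another Text;Another Text again
        System.out.println(stripComments(text));
    }
}
```

Because comment-only lines are skipped before joining, no doubled semicolons appear in the result.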
Just to make it clear, Coxer's reply is the way to go. Far more precise and clean. But in any case, if you fancy experimenting here is a recursive solution that will work:
public class IgnoreHash {
@Test
public void test() {
String readTextAllInALine = "Some Text;Some Text again #A comment;#A comment line;Another Text;Another Text again#Comment;";
String actualResult = removeHashComments(readTextAllInALine);
Assert.assertEquals(actualResult, "Some Text;Some Text again ;Another Text;Another Text again");
}
private String removeHashComments(String input) {
StringBuffer result = new StringBuffer();
int hashIndex = input.indexOf("#");
int endIndex = input.indexOf(";");
if(hashIndex != -1){
result.append(input.substring(0, hashIndex));
//first line
if(hashIndex < endIndex ) {
result.append(removeHashComments(input.substring(endIndex)));
} // the case of ;#
else if (endIndex == hashIndex-1) {
int endIndex2 = input.indexOf(";", hashIndex+1);
result.append(removeHashComments(input.substring(endIndex2+1)));
}
else {
result.append(removeHashComments(input.substring(hashIndex)));
}
}
return result.toString();
}
}

Align Strings in columns in JTextArea

I want to print Strings in a JTextArea and align them properly. It's hard to explain, so I will upload a screenshot of what I am trying to achieve.
So Strings printed in each line are printed from Paper object which has parameters (id, title, author, date, rank). The data is read from a text file and is stored in a LinkedList using loadPaper() function.
Then displayPapers() function is used to display content of the Paper object to the JTextArea.
displayPapers() is listed below:
/** Print all Paper object present in the LinkedList paperList to textArea */
public void displayPapers(){
// clear textArea before displaying new content
displayTxtArea.setText("");
Paper currentPaper;
ListIterator<Paper> iter = paperList.listIterator();
while(iter.hasNext()){
currentPaper = iter.next();
String line = currentPaper.toString();
if("".equals(line)){
continue;
} // end if
String[] words = line.split(",");
displayTxtArea.append (" "
+ padString(words[0],30)
+ padString(words[1],30)
+ " "
+ padString(words[2],30)
+ " "
+ padString(words[3],30)
+ padString(words[4],30)
+ "\n");
System.out.println(words);
//displayTxtArea.append(currentPaper.toString());
} // end while
displayTxtArea.append(" Total " + noOfPapers + " entries!");
} // end showAllPaper
The padString() function adds spaces to the String so that all of them have the same number of characters. padString() is listed below:
/** Add spaces to Strings so that all of them have the same number of characters
* @param str String to be padded
* @param n total number of characters the String should be padded to
* @return str Padded string
*/
private String padString(String str, int n){
if(str.length() < n){
for(int j = str.length(); j < n; j++){
str += " ";
} // end for
} // end if
return str;
} // end padString
I have worked on this for a while but still can't find a solution. As you can see in the picture above, not everything is aligned as intended.
How do I align them perfectly so that it looks nicer? Thanks.
Output will be aligned "properly" in your JTextArea only if you use a mono-spaced font. "Andale Mono 14" for example would do the trick.
Also, in order to make your life easier and avoid the padding hell, use String.format with its format syntax.
String format = "%1$5s %2$-40s %3$-20s";
String someLine;
while (whatEver...) {
...
someLine = String.format(format, aNum, aName, aDate);
jTextArea1.append(someLine + "\n");
}
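To make the format string above concrete, here is a small self-contained sketch. The column widths and the sample rows are invented for illustration and are not taken from the question's Paper data. In "%-30s" the minus sign left-aligns the value and pads it with spaces to 30 characters; values longer than the width are not truncated, so choose widths that fit the data.

```java
public class ColumnFormatDemo {
    // Left-aligned columns of width 6, 30 and 20, separated by one space.
    // The widths are arbitrary example values.
    private static final String ROW_FORMAT = "%-6s %-30s %-20s";

    static String formatRow(String id, String title, String author) {
        return String.format(ROW_FORMAT, id, title, author);
    }

    public static void main(String[] args) {
        // Hypothetical rows, used only to show the alignment.
        System.out.println(formatRow("ID", "Title", "Author"));
        System.out.println(formatRow("1", "A short title", "A. Author"));
        System.out.println(formatRow("27", "A somewhat longer title", "B. Writer"));
    }
}
```

The columns line up only when the text is rendered in a monospaced font, as noted above.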
Use a JTable instead (for what is apparently tabular information). See How To Use Tables for more details & working examples.
You may use HTML with a Swing component, or use a JEditorPane.
JLabel jt = new JLabel();
// a Java string literal cannot span lines, so the HTML is concatenated
jt.setText("<html>"
        + "<table border='1'>"
        + "<tr><th>No</th><th>Name</th></tr>"
        + "<tr><td>1</td><td>Mr.A</td></tr></table></html>");
You can also change the font of the JTextArea if it is allowed in your problem
textArea.setFont(new Font("monospaced", Font.PLAIN, 12));
