Trim() in Java not working the way I expect? [duplicate] - java

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Query about the trim() method in Java
I am parsing a site's usernames and other information, and each one has a bunch of spaces after it (as well as single spaces between the words).
For example: "Bob the Builder " or "Sam the welder ". The number of trailing spaces varies from name to name. I figured I'd just use .trim(), since I've used it before.
However, it's giving me trouble. My code looks like this:
for (int i = 0; i < splitSource3.size(); i++) {
    splitSource3.set(i, splitSource3.get(i).trim());
}
The result is just the same; no spaces are removed at the end.
Thank you in advance for your excellent answers!
UPDATE:
The full code is a bit more complicated, since there are HTML tags that are parsed out first. It goes exactly like this:
for (String s : splitSource2) {
    if (s.length() > "<td class=\"dddefault\">".length() && s.substring(0, "<td class=\"dddefault\">".length()).equals("<td class=\"dddefault\">")) {
        splitSource3.add(s.substring("<td class=\"dddefault\">".length()));
    }
}
System.out.println("\n");
for (int i = 0; i < splitSource3.size(); i++) {
    splitSource3.set(i, splitSource3.get(i).substring(0, splitSource3.get(i).length() - 5));
    splitSource3.set(i, splitSource3.get(i).trim());
    System.out.println(i + ": " + splitSource3.get(i));
}
}
UPDATE:
Calm down. I never said the fault lay with Java, and I never said it was a bug or broken or anything. I simply said I was having trouble with it and posted my code for you to collaborate on and help solve my issue. Note the phrase "my issue" and not "Java's issue". I have actually had the code print out
System.out.println(i + ": " + splitSource3.get(i) + "*");
in a for-each loop afterward.
This is how I knew I had a problem.
By the way, the problem has still not been fixed.
UPDATE:
Sample output (minus single quotes):
'0: Olin D. Kirkland                                          '
'1: Sophomore                                          '
'2: Someplace, Virginia  12345<br />VA SomeCity<br />'
'3: Undergraduate                                          '
EDIT: The OP rephrased the question at Query about the trim() method in Java, where the issue turned out to be Unicode whitespace characters, which String.trim() does not remove (it only strips characters up to U+0020, the ordinary space).
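Because String.trim() stops at U+0020, other Unicode spaces need a hand-written trim. The following is a minimal sketch under that assumption; `unicodeTrim` is a hypothetical helper name. It combines Character.isWhitespace with Character.isSpaceChar, because isWhitespace (and therefore Java 11's String.strip()) deliberately excludes non-breaking spaces such as U+00A0, the character behind `&nbsp;`.

```java
public class UnicodeTrim {
    // Hypothetical helper: like String.trim(), but also removes Unicode
    // space separators such as the non-breaking space U+00A0.
    static String unicodeTrim(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isTrimmable(s.charAt(start))) {
            start++;
        }
        while (end > start && isTrimmable(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    static boolean isTrimmable(char c) {
        // c <= ' ' matches what trim() already removes (controls and space);
        // isWhitespace adds other whitespace, isSpaceChar adds U+00A0 and friends.
        return c <= ' ' || Character.isWhitespace(c) || Character.isSpaceChar(c);
    }

    public static void main(String[] args) {
        String name = "Olin D. Kirkland\u00A0\u00A0\u00A0";
        System.out.println("[" + name.trim() + "]");        // trailing U+00A0 characters survive trim()
        System.out.println("[" + unicodeTrim(name) + "]");  // prints [Olin D. Kirkland]
    }
}
```

Applied to the question's loop, `splitSource3.get(i).trim()` would become `unicodeTrim(splitSource3.get(i))`.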

It just occurred to me that I used to have this sort of issue when I worked on a screen-scraping project. The key is that downloaded HTML sources sometimes contain non-printable characters that are not whitespace either. These are very difficult to copy and paste into a browser. I suspect this could have happened to you.
If my assumption is correct, then you've got two choices:
Use a binary reader to figure out what those characters are, then delete them with String.replace(). E.g.:
private static String cutCharacters(String fromHtml) {
    String result = fromHtml;
    // the control characters to remove; this could be a private static final constant too
    char[] problematicCharacters = {'\000', '\001', '\003'};
    for (char ch : problematicCharacters) {
        // String has no replace(char, String), so turn the char into a String first;
        // fromHtml itself is never modified, only the local copy in result
        result = result.replace(String.valueOf(ch), "");
    }
    return result;
}
If you find some sort of recurring pattern in the HTML to be parsed, then you can use regexes and substrings to cut out the unwanted parts. E.g.:
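import java.util.regex.Matcher;
import java.util.regex.Pattern;

private String getImportantParts(String fromHtml) {
    // keeps only runs of word characters and ordinary whitespace, so punctuation
    // such as '.' is dropped too; this could be a private static final constant as well
    Pattern p = Pattern.compile("(\\w*\\s*)");
    Matcher m = p.matcher(fromHtml);
    StringBuilder buff = new StringBuilder();
    while (m.find()) {
        buff.append(m.group(1));
    }
    return buff.toString().trim();
}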
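If you'd rather not reach for a binary reader, a minimal sketch like the one below prints the index and code point of every character outside printable ASCII, which is usually enough to decide what to pass to String.replace(). `describeHiddenChars` is a hypothetical helper name, and the sample string is made up for illustration.

```java
public class ShowHiddenChars {
    // Hypothetical helper: lists every char outside printable ASCII
    // (0x20..0x7E) together with its index and code point.
    static String describeHiddenChars(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c > 0x7E) {
                out.append(String.format("index %d: U+%04X%n", i, (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String scraped = "Sophomore\u00A0\u00A0\t";
        // reports the two U+00A0 characters and the tab at the end
        System.out.print(describeHiddenChars(scraped));
    }
}
```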

Works without a problem for me.
Here is your code, slightly refactored and (maybe) more readable:
final String openingTag = "<td class=\"dddefault\">";
final String closingTag = "</td>";
List<String> splitSource2 = new ArrayList<String>();
splitSource2.add(openingTag + "Bob the Builder " + closingTag);
splitSource2.add(openingTag + "Sam the welder " + closingTag);
for (String string : splitSource2) {
    System.out.println("|" + string + "|");
}
List<String> splitSource3 = new ArrayList<String>();
for (String s : splitSource2) {
    if (s.length() > openingTag.length() && s.startsWith(openingTag)) {
        String nameWithoutOpeningTag = s.substring(openingTag.length());
        splitSource3.add(nameWithoutOpeningTag);
    }
}
System.out.println("\n");
for (int i = 0; i < splitSource3.size(); i++) {
    String name = splitSource3.get(i);
    int closingTagBegin = name.length() - closingTag.length();
    String nameWithoutClosingTag = name.substring(0, closingTagBegin);
    String nameTrimmed = nameWithoutClosingTag.trim();
    splitSource3.set(i, nameTrimmed);
    System.out.println("|" + splitSource3.get(i) + "|");
}
I know that's not a real answer, but I cannot post comments and this code wouldn't fit in a comment, so I posted it as an answer so that Olin Kirkland can check his code.

Related

Get specific words from a string in Java

If I have the following URL:
http://www.example.com/wordpress/plugins/wordpressplugin/123/ver=1.0
How can I get the name of the plugin (simply named wordpressplugin in the URL) and the version so the output will be - wordpressplugin ver 1.0?
I am posting my comment as an answer
String s = "http://www.example.com/wordpress/plugins/wordpressplugin/123/ver=1.0";
String[] ary = s.split("/");
System.out.println(ary[5] + " " + ary[7]);
This is the easiest way given your question;
for more dynamic searching you would have to use a regex.
You may do it like so, using Regex support in Java.
String url = "http://www.example.com/wordpress/plugins/wordpressplugin/123/ver=1.0";
Pattern pattern = Pattern.compile("(.*plugins/)(.*)(/\\d{3}/)(ver.*)");
Matcher matcher = pattern.matcher(url);
if (matcher.matches()) {
    System.out.println("Plugin: " + matcher.group(2));
    System.out.println("Version: " + matcher.group(4));
}
Notice the use of capture groups. Here's the output.
Plugin: wordpressplugin
Version: ver=1.0
You should have a look into Regular Expressions (in Oracle tutorials), which are the general tool in any programming language to get/match sub-strings out of a larger string (which follows some more or less fixed format).
Because you say you are new to Java, here is a very simple answer that should suit your skills:
String url = "http://www.example.com/wordpress/plugins/wordpressplugin/123/ver=1.0";
String search = "plugins/";
int index = url.indexOf(search);
String pluginName, version;
if (index > -1)
{
    index += search.length();   // String.length() is a method, not a field
    pluginName = url.substring(index, url.indexOf("/", index + 1));
    search = "ver=";
    index = url.indexOf(search);
    if (index > -1)
    {
        version = url.substring(index + search.length());
        System.out.println(pluginName + " " + version);
    }
}
PS: This would work if and only if your url format always remains the same!
The fastest way to solve this problem is to take advantage of the split method of Strings. Just study the method below carefully; it's basic.
public String getVersionNumber(String url){
    String[] arr0 = url.split("//");
    //The code above returns an array of two strings: "http:" and "www.example.com/wordpress/plugins/wordpressplugin/123/ver=1.0"
    String[] arr1 = arr0[1].split("/");
    //The code above returns an array of six strings: "www.example.com", "wordpress", "plugins", "wordpressplugin", "123" and "ver=1.0".
    return String.format("%s %s", arr1[3], arr1[5]);
    //OUTPUT: wordpressplugin ver=1.0
    //I simply returned what I needed.
}
I hope this helps. Merry coding!
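A slightly more defensive variant, assuming the URL path always contains a `plugins` segment followed by the plugin name and ends with a `ver=...` segment: java.net.URI extracts the path, so the scheme and host don't shift the array indices, and the result matches the requested output format (`wordpressplugin ver 1.0`). `describePlugin` is a hypothetical helper name.

```java
import java.net.URI;

public class PluginInfo {
    // Hypothetical helper: assumes ".../plugins/<name>/..." and a "ver=<version>"
    // segment, as in the question's URL; anything else is rejected.
    static String describePlugin(String url) {
        String[] segments = URI.create(url).getPath().split("/");
        String name = null;
        String version = null;
        for (int i = 0; i < segments.length; i++) {
            if (segments[i].equals("plugins") && i + 1 < segments.length) {
                name = segments[i + 1];
            } else if (segments[i].startsWith("ver=")) {
                version = segments[i].substring("ver=".length());
            }
        }
        if (name == null || version == null) {
            throw new IllegalArgumentException("Unexpected URL format: " + url);
        }
        return name + " ver " + version;
    }

    public static void main(String[] args) {
        String url = "http://www.example.com/wordpress/plugins/wordpressplugin/123/ver=1.0";
        System.out.println(describePlugin(url)); // prints: wordpressplugin ver 1.0
    }
}
```

Throwing on an unexpected format is a design choice: it fails loudly instead of printing "null ver null" when the URL layout changes.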

Having trouble with string concatenation

I was trying to concatenate a string to itself + something else, like this:
String example = " ";
for (int i = 0; i < 5; i++) {
    if (condition OK) {
        example = example + "\nAnother text";
    }
}
JOptionPane.showMessageDialog(null, example);
I expected it to print " (new line)Another text(1) (new line)Another text(2)...", but it seems to keep only the last entry. For example, if the condition inside the for loop is true 3 times, it prints " (new line)Another text(3)" instead of " (new line)Another text(1) (new line)Another text(2)...".
Any idea of what may be happening?
EDIT: After realizing that my code was fine, I followed afzalex's recommendation and found that the error was in my condition. Thanks, bro!
I used the program below and got the expected output.
String example = " ";
for (int i = 0; i < 5; i++) {
    if (i == 1 || i == 3) {
        example = example + "\nAnother text";
    }
}
System.out.println(example);
Output:
Another text
Another text
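So the concatenation itself works; the problem is probably elsewhere, for example in the condition or in how JOptionPane.showMessageDialog(null, example); displays the text. If the text ends up being interpreted as HTML (Swing does this for strings that start with <html>), use <br> instead of \n to get a new line.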
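The test program above can also be written with StringBuilder, the usual choice when a string is built up inside a loop, since it avoids creating a new String on every pass. This minimal sketch reuses the answer's placeholder condition (i == 1 || i == 3) and produces the same text; `ConcatLoop` and `buildMessage` are illustrative names.

```java
public class ConcatLoop {
    // Same output as the loop above, but appending to one StringBuilder
    // instead of creating a new String each time the condition holds.
    static String buildMessage() {
        StringBuilder example = new StringBuilder(" ");
        for (int i = 0; i < 5; i++) {
            if (i == 1 || i == 3) {   // placeholder condition from the answer
                example.append("\nAnother text");
            }
        }
        return example.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildMessage());
    }
}
```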

Splitting a string based on " " and spaces [duplicate]

This question already has answers here:
Regular Expression to Split String based on space and matching quotes in java
(3 answers)
Closed 8 years ago.
I have a String str, which consists of several words separated by single spaces.
If I want to create a set or list of strings, I can simply call str.split(" ") and I get what I want.
Now, assume that str is a little more complicated, for example it is something like:
str = "hello bonjour \"good morning\" buongiorno";
In this case, I want to keep whatever is between the quotes together, so that my list of strings is:
hello
bonjour
good morning
buongiorno
Clearly, if I used split(" ") in this case it wouldn't work, because I'd get
hello
bonjour
"good
morning"
buongiorno
So, how do I get what I want?
You can create a regex that matches either a single word or a run of words between double quotes, like:
\w+|(\"\w+(\s\w+)*\")
and search for them with the Pattern and Matcher classes.
For example:
String searchedStr = "";
Pattern pattern = Pattern.compile("\\w+|(\\\"\\w+(\\s\\w+)*\\\")");
Matcher matcher = pattern.matcher(searchedStr);
while(matcher.find()){
    String word = matcher.group();
}
Edit: it now works for any number of words within the quotes. XD I forgot that at first.
You can do something like the following. First split the String on "\"", then split the remaining pieces on a space " ". The even-numbered pieces (counting from 1) are the ones that were between quotes.
public static void main(String args[]) {
    String str = "hello bonjour \"good morning\" buongiorno";
    System.out.println(str);
    String[] parts = str.split("\"");
    List<String> myList = new ArrayList<String>();
    int i = 1;
    for(String partStr : parts) {
        if(i%2 == 0){
            myList.add(partStr);
        }
        else {
            myList.addAll(Arrays.asList(partStr.trim().split(" ")));
        }
        i++;
    }
    System.out.println("MyList : " + myList);
}
and the output is
hello bonjour "good morning" buongiorno
MyList : [hello, bonjour, good morning, buongiorno]
You may be able to find a solution using regular expressions, but what I'd do is simply write a string splitter by hand.
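List<String> splitButKeepQuotes(String s, char splitter) {
    List<String> list = new ArrayList<String>();
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c == '"') {                    // compare with a char literal, not the String "\""
            inQuotes = !inQuotes;          // toggle, and drop the quote itself
        } else if (c == splitter && !inQuotes) {
            if (current.length() > 0) {    // skip empty tokens from repeated splitters
                list.add(current.toString());
                current.setLength(0);
            }
        } else {
            current.append(c);
        }
    }
    if (current.length() > 0) {
        list.add(current.toString());      // don't lose the last word
    }
    return list;
}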
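A variant of the regex approach that also drops the quote characters: each match is either a quoted run (group 1, quotes excluded) or a run of non-whitespace characters (group 2). This is a sketch assuming quotes are always balanced and never escaped; `tokens` is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteAwareSplit {
    // Either "..." (content captured in group 1) or a run of non-whitespace (group 2).
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> tokens(String str) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(str);
        while (m.find()) {
            // exactly one of the two groups participated in this match
            result.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "hello bonjour \"good morning\" buongiorno";
        System.out.println(tokens(str)); // [hello, bonjour, good morning, buongiorno]
    }
}
```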

Insert a space after every given character - java

I need to insert a space after every given character in a string.
For example "abc.def..."
Needs to become "abc. def. . . "
So in this case the given character is the dot.
My search on Google didn't turn up an answer to that question.
I really should go and get some serious regex knowledge.
EDIT : ----------------------------------------------------------
String test = "0:;1:;";
test.replaceAll( "\\:", ": " );
System.out.println(test);
// output: 0:;1:;
// so it didn't do anything
SOLUTION: -------------------------------------------------------
String test = "0:;1:;";
test = test.replaceAll( "\\:", ": " ); // Strings are immutable, so the result must be assigned back
System.out.println(test);
You could use String.replaceAll():
String input = "abc.def...";
String result = input.replaceAll( "\\.", ". " );
// result will be "abc. def. . . "
Edit:
String test = "0:;1:;";
result = test.replaceAll( ":", ": " );
// result will be "0: ;1: ;" (test is still unmodified)
Edit:
As other answers point out, String.replace() is all you need for this simple substitution. You only have to use String.replaceAll() when the pattern is a regular expression (as you suggested in your question).
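If the character to match comes from a variable, Pattern.quote() and Matcher.quoteReplacement() let replaceAll() treat it literally, so a '.' (or '$') needs no manual escaping. A minimal sketch; `spaceAfter` is a hypothetical helper name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpaceAfter {
    // Hypothetical helper: inserts a space after every occurrence of `given`.
    // Pattern.quote() escapes regex metacharacters in the search text, and
    // Matcher.quoteReplacement() escapes '$' and '\' in the replacement.
    static String spaceAfter(String input, String given) {
        return input.replaceAll(Pattern.quote(given), Matcher.quoteReplacement(given + " "));
    }

    public static void main(String[] args) {
        System.out.println(spaceAfter("abc.def...", "."));
        System.out.println(spaceAfter("0:;1:;", ":"));
    }
}
```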
You can use replace.
text = text.replace(".", ". ");
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace%28java.lang.CharSequence,%20java.lang.CharSequence%29
If you want a simple brute-force technique, the following code will do it:
String input = "abc.def...";
char given = '.';
StringBuilder output = new StringBuilder();
for (int i = 0; i < input.length(); i++) {
    char c = input.charAt(i);
    output.append(c);
    if (c == given) {                // only insert a space after the given character
        output.append(' ');
    }
}
String result = output.toString();  // "abc. def. . . "

Java Help Manipulating Anchor with Pattern

I'm having trouble accomplishing a few things with my program; I'm hoping someone can help out.
I have a String containing the source code of an HTML page.
What I would like to do is extract all instances of the following HTML and place it in an array:
<img src="http://*" alt="*" style="max-width:460px;">
So I would then have an array of X size containing values similar to the above, obviously with the src and alt attributes updated.
Is this possible? I know there are XML parsers, but the formatting is ALWAYS the same.
Any help would be greatly appreciated.
I'd suggest using an ArrayList instead of a fixed-size array, since it looks like you don't know how many matches you are going to get.
Also, it's generally not a good idea to parse HTML with a regex, but if you are sure the tags always use the same format, then I'd recommend:
Pattern pattern = Pattern.compile(".*<img src=\"http://(.*)\" alt=\"(.*)\"\\s+sty.*>", Pattern.MULTILINE);
Here is an example:
public static void main(String[] args) throws Exception {
    String web;
    String result = "";
    for (int i = 0; i < 10; i++) {
        web = "<img src=\"http://image" + i +".jpg\" alt=\"Title of Image " + i + "\" style=\"max-width:460px;\">";
        result += web + "\n";
    }
    System.out.println(result);
    Pattern pattern = Pattern.compile(".*<img src=\"http://(.*)\" alt=\"(.*)\"\\s+sty.*>", Pattern.MULTILINE);
    List<String> imageSources = new ArrayList<String>();
    List<String> imageTitles = new ArrayList<String>();
    Matcher matcher = pattern.matcher(result);
    while (matcher.find()) {
        String imageSource = matcher.group(1);
        String imageTitle = matcher.group(2);
        imageSources.add(imageSource);
        imageTitles.add(imageTitle);
    }
    for(int i = 0; i < imageSources.size(); i++) {
        System.out.println("url: " + imageSources.get(i));
        System.out.println("title: " + imageTitles.get(i));
    }
}
}
As you're getting an ArrayIndexOutOfBoundsException, it is most likely that the String array imageTitles is not big enough to hold all the ALT values found by the regex search. In this case it is probably a zero-size array.
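If the tags are not guaranteed to sit one per line, a pattern built from `[^"]*` instead of greedy `.*` keeps each match inside a single tag and a single attribute value. This sketch assumes the same attribute order as in the question (src, then alt); unlike the pattern in the earlier answer, group 1 here keeps the http:// prefix. `findImages` is a hypothetical helper name, and the HTML in main is made-up sample input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgTags {
    // [^"]* cannot run past a closing quote and [^>]* cannot run past the end
    // of the tag, so several tags on one line are still matched one by one.
    private static final Pattern IMG = Pattern.compile(
            "<img src=\"(http://[^\"]*)\" alt=\"([^\"]*)\"[^>]*>");

    // Returns {src, alt} pairs; an ArrayList grows as needed, so no
    // ArrayIndexOutOfBoundsException regardless of how many tags are found.
    static List<String[]> findImages(String html) {
        List<String[]> images = new ArrayList<>();
        Matcher m = IMG.matcher(html);
        while (m.find()) {
            images.add(new String[] {m.group(1), m.group(2)});
        }
        return images;
    }

    public static void main(String[] args) {
        String html = "<p><img src=\"http://a.jpg\" alt=\"First\" style=\"max-width:460px;\">"
                + "<img src=\"http://b.jpg\" alt=\"Second\" style=\"max-width:460px;\"></p>";
        for (String[] img : findImages(html)) {
            System.out.println("url: " + img[0] + ", title: " + img[1]);
        }
    }
}
```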
