How to remove whitespace in String imported from Excel - java

I need to remove all whitespace characters from a string, but I have not been able to.
Does anyone have an idea how to do it?
Here is my string, retrieved from an Excel file via the jxl API:
"Destination à gauche"
And here are its bytes :
6810111511610511097116105111110-96-32321039711799104101
Here is the code I use to remove the whitespace:
public static void checkEntetes(Workbook book) {
String sheetName = "mysheet";
System.out.print(sheetName + " : ");
for(int i = 0; i < getColumnMax(book.getSheet(sheetName)); i++) {
String elementTrouve = book.getSheet(sheetName).getCell(i, 0).getContents();
String fileEntete = new String(elementTrouve.getBytes()).replaceAll("\\s+","");
System.out.println("\t" + elementTrouve + ", " + bytesArrayToString(elementTrouve.getBytes()));
System.out.println("\t" + fileEntete + ", " + bytesArrayToString(fileEntete.getBytes()));
}
System.out.println();
}
And this outputs :
"Destination à gauche", 6810111511610511097116105111110-96-32321039711799104101
"Destination àgauche", 6810111511610511097116105111110-96-321039711799104101
I even tried writing the removal myself, and it still leaves a space before the 'à' character.
public static String removeWhiteChars(String s) {
String retour = "";
for(int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if(c != (char) ' ') {
retour += c;
}
}
return retour;
}

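Regular expressions to the rescue:
str = str.replaceAll("\\s+", "");
removes every run of characters that \s matches. By default in Java that is only the ASCII whitespace set (space, tab, newline, vertical tab, form feed, carriage return), so a non-breaking space (U+00A0, which shows up as byte -96 in the question's dump) is left in place. For example: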
String input = "Destination à gauche";
String output = input.replaceAll("\\s+","");
System.out.println("output is \""+output+"\"");
outputs Destinationàgauche
If your starting point is indeed the raw bytes (byte[]), you will first need to turn them into a String:
byte[] inputData = //get from somewhere
String stringBefore = new String(inputData, Charset.forName("UTF-8")); //you need to know the encoding
String withoutSpaces = stringBefore.replaceAll("\\s+","");
byte[] outputData = withoutSpaces.getBytes(Charset.forName("UTF-8"));
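The byte dump in the question shows -96 just before the 'à', which in a single-byte Latin charset is 0xA0, the non-breaking space. A minimal sketch, assuming that is the character that survives: the class and method names below are made up for illustration, and the pattern adds the Unicode space-separator category \p{Zs} (which contains U+00A0) to the plain \s class.

```java
public class WhitespaceStripper {

    // Removes ASCII whitespace (\s) plus every Unicode space separator (\p{Zs}),
    // a category that includes the non-breaking space U+00A0.
    public static String stripAll(String s) {
        return s.replaceAll("[\\s\\p{Zs}]+", "");
    }

    public static void main(String[] args) {
        // Assumed reconstruction of the Excel value: NBSP before the accented letter.
        String input = "Destination\u00A0\u00E0 gauche";

        // Plain \s+ removes only the ordinary space, so one character (the NBSP) remains.
        System.out.println(input.replaceAll("\\s+", "").length()); // 19

        // The extended class removes both kinds of space.
        System.out.println(stripAll(input).length()); // 18
    }
}
```

The hand-written loop in the question could be adapted the same way by testing Character.isSpaceChar(c) || Character.isWhitespace(c) instead of comparing against ' ' only.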

If you would rather clean the data in Excel with a formula, the TRIM function strips leading and trailing spaces and collapses repeated spaces between words into one:
+----+------------+---------------------+
| | A | B |
+----+------------+---------------------+
| 1 | =TRIM(B1) | value to trim here |
+----+------------+---------------------+
To apply this to the whole column:
1) Insert a column
2) Insert a TRIM function pointing at the cell you are trying to correct.
3) Copy formula down the page
4) Copy inserted column
5) Paste as "Values"
Reference: Question number 9578397 on stackoverflow.com

Related

Java 6 converting utf8 to iso88591 charset and ignoring unmappable characters

I have written the following function, which gets rid of characters in a string that can't be represented in ISO-8859-1:
public static String convert(String str) {
if (str.length()==0) return str;
str = str.replace("–","-");
str = str.replace("“","\"");
str = str.replace("”","\"");
return new String(str.getBytes(),iso88591charset);
}
My problem is that this doesn't behave the way I need.
When it comes across a character that has no representation, the character is converted to multiple bytes. I want that character simply omitted from the result.
I would also like to avoid needing all those replace calls.
I have been researching CharsetEncoder. It has methods like:
CharsetEncoder encoder = iso88591charset.newEncoder();
encoder.onMalformedInput(CodingErrorAction.IGNORE);
encoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
which seem to be what I want, but I have failed to even write a function using CharsetEncoder that mimics what I already have, let alone get to set those options.
Also I am restricted to Java 6 :(
Update:
I came up with a nasty solution for this, but there must be a better way to do it:
public static String convert(String str) {
if (str.length()==0) return str;
str = str.replace("–","-");
str = str.replace("“","\"");
str = str.replace("”","\"");
String str2 = "";
for (int c=0;c<str.length();c++) {
String cur = (new Character(str.charAt(c))).toString();
if (cur.equals(new String(cur.getBytes(),iso88591charset))) str2 += cur;
}
return new String(str2.getBytes(),iso88591charset);
}
One possible way:
// U+2126 - omega sign
// U+2013 - en dash
// U+201c - left double quotation mark
// U+201d - right double quotation mark
String str = "\u2126\u2013\u201c\u201d";
System.out.println("original = " + str);
str = str.replace("–", "-");
str = str.replace("“", "\"");
str = str.replace("”", "\"");
System.out.println("replaced = " + str);
StringBuilder sb = new StringBuilder();
for (char c : str.toCharArray()) {
if (c <= '\u00ff') {
sb.append(c);
}
}
System.out.println("stripped = " + sb);
output
original = Ω–“”
replaced = Ω-""
stripped = -""
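The CharsetEncoder route mentioned in the question can also be made to work on Java 6. The following is only a sketch: the class and method names are invented for illustration, and it still needs the explicit replace calls if dashes and curly quotes should be mapped to ASCII look-alikes rather than dropped.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class Latin1Filter {

    private static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");

    // Returns str with every character that ISO-8859-1 cannot represent removed.
    public static String dropUnmappable(String str) {
        // Encoders are not thread-safe, so a fresh one is created per call.
        CharsetEncoder encoder = ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        try {
            // Encoding silently skips anything unmappable because of IGNORE.
            ByteBuffer bytes = encoder.encode(CharBuffer.wrap(str));
            // Decoding the surviving bytes gives back the filtered text.
            return ISO_8859_1.decode(bytes).toString();
        } catch (CharacterCodingException e) {
            // Not expected with IGNORE, but encode() declares it.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Omega sign is dropped; the ASCII characters stay.
        System.out.println(dropUnmappable("\u2126-\"\"")); // -""
    }
}
```

For this charset the result should match the c <= '\u00ff' loop above, since ISO-8859-1 maps exactly the code points U+0000 to U+00FF.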

Replacing Strings Java

I have this function to check whether some words appear in a given line and, if so, surround them with a given char.
The code below works like a charm. However, since the words in the string array "words" are always lowercase, the words in the output are lowercase as well. How can I fix this?
The inputs:
BufferedReader in = "Hello, my name is John:";
char c = '*';
String [] words = {"hello","john"};
The desired output:
BufferedWriter out = "*Hello*, my name is *John*:";
The actual output:
BufferedWriter out = "*hello*, my name is *john*";
The code:
public void replaceString(BufferedReader in, BufferedWriter out, char c, String[] words) throws IOException {
String bold = String.valueOf(c); // the surrounding character, as a String
String line_in = in.readLine();
while (line_in != null) {
for (int j = 0; j < words.length; j++) {
line_in = line_in.replaceAll("(?i)" + words[j], bold + words[j]
+ bold);
}
out.write(line_in);
out.newLine();
line_in = in.readLine();
}
}
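Use a capturing group and reference it in the replacement:
line_in = line_in.replaceAll("(?i)(" + words[j] + ")", bold + "$1" + bold);
// "(" + words[j] + ")" captures the word exactly as it appears in the input
// "$1" in the replacement refers back to that captured text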
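As a self-contained illustration of the capture-and-reference idea, here is a small sketch that works on a single line. The class and method names are hypothetical; Pattern.quote and Matcher.quoteReplacement are added on the assumption that the words or the surrounding character might contain regex metacharacters such as '$'.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordSurrounder {

    // Surrounds every case-insensitive occurrence of each word with c,
    // keeping the capitalisation found in the input line.
    public static String surround(String line, char c, String[] words) {
        // Quote the marker so characters like '$' or '\' are taken literally.
        String mark = Matcher.quoteReplacement(String.valueOf(c));
        for (String word : words) {
            // Group 1 holds the text as it was actually written in the line.
            String regex = "(?i)(" + Pattern.quote(word) + ")";
            line = line.replaceAll(regex, mark + "$1" + mark);
        }
        return line;
    }

    public static void main(String[] args) {
        String[] words = {"hello", "john"};
        System.out.println(surround("Hello, my name is John:", '*', words));
        // *Hello*, my name is *John*:
    }
}
```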

End line StringBuilder in RandomAccessFile

I'm trying to use the RandomAccessFile class, but I have a problem with Strings.
This is the first part, which writes to a file:
public static void main(String[] args) throws IOException {
File file = new File("/home/pep/java/randomFile.dat");
RandomAccessFile randomFile = new RandomAccessFile(file, "rw");
String[] surnames = { "SMITH",
"LOMU" };
int[] dep = { 10,
20 };
Double[] salary = { 1200.50,
1200.50 };
StringBuilder buffer = null;
int n = surnames.length;
for (int i = 0; i<n; i++){
randomFile.writeInt(i+1); //ID
buffer = new StringBuilder(surnames[i]);
buffer.setLength(10); //10 characters
randomFile.writeChars(buffer.toString());
randomFile.writeInt(dep[i]);
randomFile.writeDouble(salary[i]);
}
randomFile.close();
}
In the second part, I try to read this file:
File file = new File("/home/pep/java/randomFile.dat");
RandomAccessFile randomFile = new RandomAccessFile(file, "r");
char[] surname = new char[10];
char aux;
int id, dep, pos;
Double salary;
pos = 0;
for (;;) {
randomFile.seek(pos);
id = randomFile.readInt();
for (int i = 0; i < surname.length; i++) {
aux = randomFile.readChar();
surname[i] = aux;
}
String surnameStr = new String(surname); //HERE IS THE PROBLEM!!
dep = randomFile.readInt();
salary = randomFile.readDouble();
System.out.println("ID: " + id + ", Surname: " + surnameStr + ", Departament: " + dep + ", Salary: " + salary);
pos = pos + 36; // 4 + 20 + 4 + 8
if (randomFile.getFilePointer() == randomFile.length())
break;
}
randomFile.close();
}
I expect to read:
ID: 1, Surname: SMITH, Dep: 10, Salary: 1200.50
but I get:
ID: 1, Surname: SMITH
It's as if the surname contained an end of line, because if I don't display the surname, the rest of the output is correct.
Thank you!
Where does cognom come from? [Edit: OK, I found it. It's Catalan for surname. And now the typo coming from departamento is also clear. :-]
What do you get if you insert System.out.println( Arrays.toString( surname )) before the problem line? I assume it's something like [S, M, I, T, H, [], [], [], [], []] (in Eclipse's Console view). Where [] stands for a square, i.e. a non-printable character.
What do you get if you insert System.out.println( (int) surname[5] )? I assume it's 0. And I assume this 0 value is causing the problem.
What do you get if you use a surname that's exactly 10 characters long?
Hint 1: There's a typo in Departament.
Hint 2: Give System.out.printf(...) a chance in favour of println(...).
Hint 3: The if in your solution can be shortened to the more elegant:
cognom[i] = aux != 0 ? aux : ' ';
The problem was in the char array. I changed the loop that reads the chars:
for (int i = 0; i < surname.length; i++) {
aux = randomFile.readChar();
surname[i] = aux != 0 ? aux : ' ';
}
Creating a StringBuilder and setting its length to ten pads strings shorter than ten characters with NUL characters ('\u0000'). Those are written to the file and come back when you read the name. It would be much better to create a String, pad it with spaces to ten chars, write it, then trim() the resulting String when you read it.
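The pad-with-spaces suggestion could look roughly like the sketch below. It uses a temporary file instead of the path from the question, and the class, method and constant names are invented for the example.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class PaddedRecordDemo {

    static final int NAME_CHARS = 10; // fixed width of the surname field

    // Pads with spaces (or truncates) to exactly NAME_CHARS characters.
    static String pad(String s) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < NAME_CHARS) {
            sb.append(' ');
        }
        sb.setLength(NAME_CHARS); // only truncates here, never adds NULs
        return sb.toString();
    }

    // Writes one record to a temporary file, reads it back and formats it.
    public static String roundTrip(int id, String surname, int dep, double salary) {
        try {
            File tmp = File.createTempFile("records", ".dat");
            RandomAccessFile file = new RandomAccessFile(tmp, "rw");
            try {
                file.writeInt(id);
                file.writeChars(pad(surname)); // 10 chars = 20 bytes
                file.writeInt(dep);
                file.writeDouble(salary);

                file.seek(0);
                int readId = file.readInt();
                char[] chars = new char[NAME_CHARS];
                for (int i = 0; i < chars.length; i++) {
                    chars[i] = file.readChar();
                }
                String readSurname = new String(chars).trim(); // drop the padding
                int readDep = file.readInt();
                double readSalary = file.readDouble();
                return "ID: " + readId + ", Surname: " + readSurname
                        + ", Dep: " + readDep + ", Salary: " + readSalary;
            } finally {
                file.close();
                tmp.delete();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(1, "SMITH", 10, 1200.50));
        // ID: 1, Surname: SMITH, Dep: 10, Salary: 1200.5
    }
}
```

Note that String.trim() removes every leading and trailing character with a code of U+0020 or lower, so calling trim() on the name read back would also strip trailing NUL padding from a file written the original way.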

Get the Beginning Position Of a field

I am writing code that reads a text file in this format:
City |First Name| Second Name|Last Name|
The output I currently have is :
Column 1 is 17--------City
Column 2 is 10--------First Name
Column 3 is 12--------Second Name
Column 4 is 9---------Last Name
I also need the start position of each field in the text file, for example:
Column 1 is 17--------City : Position 1
Column 2 is 10--------First Name: Position 18
Column 3 is 12--------Second Name: Position 31
Column 4 is 9---------Last Name: Position 44
Here is the code I currently have. Is there a way to achieve this?
package stanley.column.reader;
import java.io.*;
public class StanleyColumnReader {
public static void main(String[] args) throws IOException {
System.out.println("Developed By Stanley Mungai");
File f = new File("C:/File/");
if (!f.exists()) {
f.createNewFile();
} else {
f.delete();
}
String [] files = f.list();
for (int j = 0; j < files.length; j++){
FileInputStream fs = new FileInputStream("C:/File/" + files[j]);
BufferedReader br = new BufferedReader(new InputStreamReader(fs));
String result = "_result";
BufferedWriter is = new BufferedWriter(new FileWriter("C:/File/" + files[j] + result + ".txt"));
for (int i = 0; i < 0; i++) {
br.readLine();
}
String line = br.readLine();
String[] split = line.split("\\|"); // "|" alone is regex alternation, so it must be escaped
for (int i = 0; i < split.length; i++) {
int k = i + 1;
System.out.println("Calculating the size of field " + k );
is.write("Column " + k + " is " + split[i].length());
is.flush();
is.newLine();
}
}
System.out.println("Success");
System.out.println("Output Saved to C:/File");
}
}
You could do that with slightly more advanced regexp group matching and read off each group's start index, but that might be overkill for this question.
A quick and simple way that might work in your case is to use indexOf on the line.
That is, change your output to include:
" Position "+(line.indexOf(split[i])+1)
This works as long as a last name, first name and city aren't repeated on the same line...
By the way, you hardly need to flush on each line; I suggest moving it outside the loop.
The regexp solution:
//first declare the pattern once in the class
static final Pattern pattern = Pattern.compile("\\s*(.*?)\\s*\\|");
...
//instead of the split loop:
String line = "City |First Name| Second Name|Last Name| Foo |Bar |"; //br.readLine();
Matcher matcher = pattern.matcher(line);
int column = 1;
while (matcher.find()) { // each call resumes after the previous match
String match = matcher.group(1);
System.out.println("Column " + column + " is " + match.length() + "---" + match + ": Position " + (matcher.start() + 1));
column++;
}
Possibly, depending on the exact position you want, you might want to change (matcher.start()+1) to (matcher.start(1)+1)
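To make the difference between start() and start(1) concrete, here is a small runnable sketch. It is not the pattern above: it assumes every field ends with a '|' and deliberately keeps the padding spaces inside the captured group, so that the reported size is the full column width. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldPositions {

    // Group 1 is everything up to (not including) the next '|'.
    private static final Pattern FIELD = Pattern.compile("([^|]*)\\|");

    // Returns one description per field: its width and 1-based start position.
    public static List<String> describe(String line) {
        List<String> result = new ArrayList<String>();
        Matcher matcher = FIELD.matcher(line);
        int column = 1;
        while (matcher.find()) {
            String field = matcher.group(1);
            // start(1) is the 0-based index where the captured field begins.
            int position = matcher.start(1) + 1;
            result.add("Column " + column + " is " + field.length()
                    + " : Position " + position);
            column++;
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : describe("City |First Name| Second Name|Last Name|")) {
            System.out.println(s);
        }
        // Column 1 is 5 : Position 1
        // Column 2 is 10 : Position 7
        // Column 3 is 12 : Position 18
        // Column 4 is 9 : Position 31
    }
}
```

Because the group here starts where the match starts, start() and start(1) agree; with the \\s*(.*?)\\s*\\| pattern above they differ whenever a field has leading spaces.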
Is this an assignment? Please tag it properly.
You haven't said whether the delimiters in the data are "|" too, but judging by your code, I am assuming they are.
What I don't understand is how the position you mentioned for Column 3 is 31 and column 4 is 44? Column 3 should be 10+17+1 =28 and column 4 should be 10+17+12+1=40. If I am getting it wrong, you need to post your original data too.
String[] split = line.split("\\|"); // "|" alone is regex alternation, so it must be escaped
int pos = 1; // initial position
for (int i = 0; i < split.length; i++) {
System.out.println("Calculating the size of field " + (i + 1));
is.write("Column " + (i + 1) + " is " + split[i].length() + " : Position " + pos);
pos = pos + split[i].length() + 1; // starting position of the next column; +1 skips the "|" delimiter
is.flush();
is.newLine();
}
Or you could find position by using indexOf method : line.indexOf(split[i])+1
If I understand what you need, maybe you can use the indexOf method, which returns the first occurrence. After finding a pipe, search again starting just past it (Strings are immutable, so the pipe cannot be overwritten in place).
String line = br.readLine();
int idx = line.indexOf("|");
while (idx >= 0) {
System.out.println("Calculating the position " + idx);
idx = line.indexOf("|", idx + 1); // next pipe, or -1 when none is left
}

Java special characters RegEx

I want to achieve the following using a regular expression in Java:
String[] paramsToReplace = {"email", "address", "phone"};
//input URL string
String ip = "http://www.google.com?name=bob&email=okATtk.com&address=NYC&phone=007";
//output URL string
String op = "http://www.google.com?name=bob&email=&address=&phone=";
The URL can contain special characters like %
Try this expression: (email=)[^&]+ (replace email with your array elements) and replace with the group: input.replaceAll("("+ paramsToReplace[i] + "=)[^&]+", "$1");
String input = "http://www.google.com?name=bob&email=okATtk.com&address=NYC&phone=007";
String output = input;
for( String param : paramsToReplace ) {
output = output.replaceAll("("+ param + "=)[^&]+", "$1");
}
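A runnable variant of the loop above, offered as a sketch rather than a drop-in replacement. The class and method names are made up; compared with the expression above it additionally requires a '?' or '&' before the parameter name (so that "email" does not match inside a longer name), quotes the name with Pattern.quote, and uses * instead of + so already-empty values are accepted.

```java
import java.util.regex.Pattern;

public class QueryParamBlanker {

    // Empties the value of each listed query parameter, keeping "name=".
    public static String blank(String url, String[] params) {
        String out = url;
        for (String param : params) {
            // Group 1 is the separator plus "param="; the value runs up to
            // the next '&' or '#'. A '%' in the value is matched like any
            // other character.
            String regex = "([?&]" + Pattern.quote(param) + "=)[^&#]*";
            out = out.replaceAll(regex, "$1");
        }
        return out;
    }

    public static void main(String[] args) {
        String[] paramsToReplace = {"email", "address", "phone"};
        String ip = "http://www.google.com?name=bob&email=okATtk.com&address=NYC&phone=007";
        System.out.println(blank(ip, paramsToReplace));
        // http://www.google.com?name=bob&email=&address=&phone=
    }
}
```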
For the example above, you can use split:
String[] temp = ip.split("\\?name="); // "?" is a regex metacharacter and must be escaped
op = temp[0] + "?name=" + temp[1].split("&")[0] + "&email=&address=&phone=";
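Something like this?
private final static String REPLACE_REGEX = "=[^&]*&"; // [^&]* stops at the next "&"; a greedy .+ would run to the last one
ip = ip + "&";
for (String param : paramsToReplace) {
ip = ip.replaceAll(param + REPLACE_REGEX, Matcher.quoteReplacement(param + "=&"));
}
ip = ip.substring(0, ip.length() - 1); // drop the "&" appended above
P.S. This is only a concept; I didn't compile this code.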
You don't need regular expressions to achieve that:
String op = ip;
for (String param : paramsToReplace) {
int start = op.indexOf("?" + param);
if (start < 0)
start = op.indexOf("&" + param);
if (start < 0)
continue;
int end = op.indexOf("&", start + 1);
if (end < 0)
end = op.length();
op = op.substring(0, start + param.length() + 2) + op.substring(end);
}
